Recommender Systems
Alban Galland, INRIA-Saclay
18 March 2010

Introduction: What is this lecture about?
- What is the purpose of a recommender system?
- What are the key features?
- How does it work?
- What are the main challenges?
- When to use it?
- How to design it?

Content
1. Who uses a recommender system?
2. What tasks and data correspond to a recommendation problem?
3. How to do it?
   - Content-filtering algorithms
   - Collaborative-filtering algorithms (not personalized, user-based, item-based)
   - Hybrid methods
4. To go further
   - Interesting issues
   - Bibliography

1. Who uses a recommender system?
Content site
- Examples: AlloCine, Zagat, LibraryThing, Last.fm, Pandora, StumbleUpon
- Task: predict the ratings of items by a given user, or find a list of interesting items
- Data: precise content descriptions, explicit ratings for some users
[Figure: Recommendation on LibraryThing]

eCommerce site
- Examples: Amazon, Netflix
- Task: build groups of products for bundle sales or, more generally, find a list of products that the user is likely to buy
- Data: list of purchases and browsing history for all users
[Figure: Recommendation on Amazon]

eCommerce site: the Netflix challenge
- $1M prize competition
- Input: huge training dataset
- Goal: improve the root-mean-square prediction error by 10% compared to Netflix's algorithm
- 40000+ teams from 186 countries (5000+ teams with valid submissions)
- Began in October 2006; winners in June 2009

Advertisement
- Examples: Google AdSense, DoubleClick
- Task: find a list of advertisements optimized according to expected income
- Data: browsing history for all users
[Figure: Recommendation on Google]
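The goal of the Netflix challenge above is stated in terms of root-mean-square error (RMSE). As a minimal sketch of what a relative RMSE improvement means, the snippet below computes the metric on a handful of made-up ratings. The data and the function names `rmse` and `relative_improvement` are illustrative assumptions, not part of the lecture or of the competition's tooling.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_rmse, new_rmse):
    """Fraction by which new_rmse improves on baseline_rmse (0.10 means 10%)."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# Made-up ratings, for illustration only.
actual = [4, 3, 5, 2, 1]
baseline = [3, 3, 4, 3, 2]             # hypothetical reference predictions
candidate = [3.5, 3, 4.5, 2.5, 1.5]    # hypothetical improved predictions

baseline_error = rmse(baseline, actual)
candidate_error = rmse(candidate, actual)
improvement = relative_improvement(baseline_error, candidate_error)
print(f"baseline {baseline_error:.3f}, candidate {candidate_error:.3f}, "
      f"improvement {improvement:.0%}")
```

In this toy case the candidate halves every error, so the improvement is far larger than the 10% the competition asked for; on real data, gains of that size are much harder to obtain.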
2. What tasks and data correspond to a recommendation problem?

What to do with data?
Two kinds of problems with data:
- Information retrieval (IR): static content, dynamic query ⇒ model the content (organized with an index)
- Information filtering (IF): dynamic content, static query ⇒ model the query (organized as filters)
Recommendation lies between IR and IF, since the content varies slowly and the queries depend on few parameters. Methods from both IR and IF are therefore used to reduce computation at query time.

Task (1): general purpose
- Top-k filtering: list of the "best" items (main usage), or anti-spam
- Item correlation: find similar items
- Rating prediction: predict the affinity between any user-item pair (more general)

Task (2): degree of personalization
- Generic: everyone receives the same recommendations
- Demographic: everyone in the same category receives the same recommendations
- Contextual: the recommendation depends only on the current activity
- Persistent: the recommendation depends on long-term interests
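The general-purpose tasks listed in Task (1) are related: once ratings can be predicted for any user-item pair, top-k filtering reduces to ranking those predictions. The sketch below illustrates this under the assumption that predicted ratings are already available as a dictionary; the scores and the function name `top_k` are hypothetical.

```python
import heapq


def top_k(predicted, already_rated, k):
    """Return the k unseen items with the highest predicted rating."""
    candidates = [item for item in predicted if item not in already_rated]
    return heapq.nlargest(k, candidates, key=lambda item: predicted[item])


# Hypothetical predicted ratings for one user (item -> score).
predicted = {"A": 4.5, "B": 3.0, "C": 4.9, "D": 2.0, "E": 4.0}
already_rated = {"C"}  # items the user has already rated are skipped

best = top_k(predicted, already_rated, 2)
print(best)
```

`heapq.nlargest` avoids sorting the whole catalogue when k is small compared with the number of items.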
Data (1)
- Context of the current page (current request, item currently explored, and structured content about this context)
- History of the current user on the system (explicit or implicit ratings)
- History of all users on the system
- History of the current user on multiple systems, the whole web, or even on their computer
- History of all users on multiple systems, the whole web, or even on their computers

Data (2)
In general, three matrices as input:
- User attributes
- Item attributes
- Rating matrix

Explicit ratings
- Numeric ratings:
  - Numeric scale, usually with between 2 levels (thumbs up/thumbs down) and 15 levels (A+ to E-).
  - The more levels you have, the more information you get, but the more variance there is in the data.
  - Numeric ratings should be normalized.
- Partial order: comparison between two items
- Semantic information: tags, labels

Implicit ratings
- Based on interaction and time:
  - purchases
  - clicks
  - browsing (page view time)
  - cursor position on the page
- Used to generate an implicit numeric rating
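The slides state that numeric ratings should be normalized but do not say how. One common choice, assumed here purely for illustration, is to subtract each user's mean rating so that a generous rater and a strict rater with the same tastes become comparable. The data layout (user -> item -> rating) and the function name `mean_center` are hypothetical.

```python
def mean_center(ratings):
    """Subtract each user's mean rating from that user's ratings.

    ratings: dict mapping user -> dict mapping item -> numeric rating.
    Returns a new dict of the same shape; users with no ratings stay empty.
    """
    centered = {}
    for user, user_ratings in ratings.items():
        if not user_ratings:
            centered[user] = {}
            continue
        mean = sum(user_ratings.values()) / len(user_ratings)
        centered[user] = {item: r - mean for item, r in user_ratings.items()}
    return centered


# Made-up ratings: alice rates generously, bob strictly, with the same ordering.
ratings = {
    "alice": {"x": 5, "y": 4, "z": 3},
    "bob": {"x": 3, "y": 2, "z": 1},
}
centered = mean_center(ratings)
print(centered)
```

After centering, both users have identical profiles, which is the effect normalization is meant to achieve. Other schemes, such as also dividing by the user's standard deviation, are possible.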
3. How to do it?

General scope
- Purely editorial (still used for some advertisement)
- Content filtering: depends on the attributes of the items
- Collaborative filtering: depends on the ratings of all users
- Hybrid

Content-filtering algorithms
- A content-filtering algorithm usually means an algorithm based on the attributes of the items and the ratings of the targeted user.
- The preferences of users are interpreted as a function of the attributes.
- Two main methods:
  - Heuristic-based: use common information retrieval techniques presented earlier in the course (TF/IDF, cosine, clustering...)
  - Model-based: use a probabilistic model to learn to predict users' preferences from the attributes

Collaborative-filtering algorithms: direct aggregation
- A collaborative-filtering algorithm usually means an algorithm based on the rating matrix.
- The recommender system displays some summary statistics:
  - the average rating of the users
  - the average rating of professional reviewers
  - a set of reviews by users or by professional reviewers
- Some basic techniques, such as explicit voting or date, are used to rank reviews.
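The heuristic-based content-filtering method mentioned above (TF/IDF and cosine) can be sketched as follows. This is an illustration under several assumptions that are not in the slides: items are described by lists of attribute tokens, the weight of a token is its term frequency times log(N/df), and the user profile is the rating-weighted sum of the vectors of the items the user rated. All names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """docs: dict item -> list of attribute tokens. Returns item -> {token: weight}."""
    n = len(docs)
    # Document frequency: number of items whose description contains the token.
    df = Counter(tok for toks in docs.values() for tok in set(toks))
    vectors = {}
    for item, toks in docs.items():
        tf = Counter(toks)
        vectors[item] = {t: (c / len(toks)) * math.log(n / df[t])
                         for t, c in tf.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def user_profile(vectors, user_ratings):
    """Rating-weighted sum of the vectors of the items the user has rated."""
    profile = defaultdict(float)
    for item, rating in user_ratings.items():
        for tok, weight in vectors[item].items():
            profile[tok] += rating * weight
    return dict(profile)


def recommend(vectors, user_ratings, k):
    """Rank unseen items by cosine similarity to the user's profile."""
    profile = user_profile(vectors, user_ratings)
    unseen = [item for item in vectors if item not in user_ratings]
    unseen.sort(key=lambda item: cosine(profile, vectors[item]), reverse=True)
    return unseen[:k]


# Made-up item descriptions, for illustration only.
items = {
    "film_a": ["space", "adventure", "robots"],
    "film_b": ["space", "war", "robots"],
    "film_c": ["romance", "paris", "comedy"],
    "film_d": ["comedy", "wedding", "romance"],
}
vectors = tfidf_vectors(items)
print(recommend(vectors, {"film_a": 5}, 1))
```

A user who rated only `film_a` is steered towards `film_b`, the one unseen item sharing attributes with it. Note that only the targeted user's own ratings are used, which is what distinguishes content filtering from the collaborative approach based on the rating matrix of all users.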