a Taxonomy of collaborative-Based Recommender Systems Fabian P. lousame and eduardo sanchez 1 ntroduction The explosive growth in the amount of information available in the www and e emergence of e-commerce in recent years has demanded new ways to deliver personalized content. Recommender systems 51] have emerged in this context as a solution based on collective intelligence to either predict whether a particular user will like a particular item or identify the collection of items that will be of interest to a certain user. Recommender systems have an excellent ability to characterize and recommend items within huge collections of data, what makes them a computerized alternative to human recommendations. Since useful per- sonalized recommendations can add value to the user experience, some of the largest e-commerce web sites include recommender engines. Three well known examples are Amazon. com [1], LastFM [4 and Netfix 6 Although the first studies can be traced back to cognitive science, approxima tion theory and information retrieval among other fields, recommender systems became an independent research area in the mid-1990s when Resnick et al. 50 Hill et al.29 and Shardanand et al. 56 proposed recommendation techniques explicitly based on user rating information. Since then, numerous approaches have been developed that use content or historical information: user-item in teractions, explicit ratings, or web logs, among others. Nowadays, recommender systems are typically classified into the following categories content-based. if the user is recommended items that are content -similar to the items the user already liked collaborative, if the user is recommended items that people with similar tastes and preferences liked in the past hybrid, if the user is recommended items based on a combination of both nd content-based methods This chapter presents a study focused on recommender systems based on ollaborative filtering, the most successful recommendation technique to date The chapter provides the reader an overview of recommender systems based on collaborative filtering, contributes with a general taxonomy to classify the algo- rithms and approaches attending to a set of relevant features, and finally provides G. Castellano, L C. Jain, A.M. Fanelli(Eds ) Web Person in Intel. Environ., SCI 229, pp. 81-117 C Springer-Verlag Berlin Heidelberg 2009
5 A Taxonomy of Collaborative-Based Recommender Systems Fabi´an P. Lousame and Eduardo S´anchez 1 Introduction The explosive growth in the amount of information available in the WWW and the emergence of e-commerce in recent years has demanded new ways to deliver personalized content. Recommender systems [51] have emerged in this context as a solution based on collective intelligence to either predict whether a particular user will like a particular item or identify the collection of items that will be of interest to a certain user. Recommender systems have an excellent ability to characterize and recommend items within huge collections of data, what makes them a computerized alternative to human recommendations. Since useful personalized recommendations can add value to the user experience, some of the largest e-commerce web sites include recommender engines. Three well known examples are Amazon.com [1], LastFM [4] and Netflix [6]. Although the first studies can be traced back to cognitive science, approximation theory and information retrieval among other fields, recommender systems became an independent research area in the mid-1990s when Resnick et al. [50], Hill et al. [29] and Shardanand et al. [56] proposed recommendation techniques explicitly based on user rating information. Since then, numerous approaches have been developed that use content or historical information: user-item interactions, explicit ratings, or web logs, among others. Nowadays, recommender systems are typically classified into the following categories: • content-based, if the user is recommended items that are content-similar to the items the user already liked; • collaborative, if the user is recommended items that people with similar tastes and preferences liked in the past; • hybrid, if the user is recommended items based on a combination of both collaborative and content-based methods. This chapter presents a study focused on recommender systems based on collaborative filtering, the most successful recommendation technique to date. The chapter provides the reader an overview of recommender systems based on collaborative filtering, contributes with a general taxonomy to classify the algorithms and approaches attending to a set of relevant features, and finally provides G. Castellano, L.C. Jain, A.M. Fanelli (Eds.): Web Person. in Intel. Environ., SCI 229, pp. 81–117. springerlink.com c Springer-Verlag Berlin Heidelberg 2009
FP Lousame and e. sanchez some guidelines to decide which algorithm best fits on a given recommendation problem or domain. 2 Recommending Based on Collaborative Filtering The term Collaborative Filtering(CF) was first introduced by Goldberg et al 23. They presented Tapestry, an experimental mail system that combined both content-based filtering and collaborative annotations. Although the system wa enriched with collaborative information, users were required to write complex queries. The first system that automated recommendations was the groupLens system 50, 37 which helped users find relevant netnews from a huge stream of articles using ratings given by other similar users. Since then, many relevant research projects have been developed(Ringo 56, Video Recommender 29 Movielens [19, 5, Jester [24)and the results have positioned the CF techniques as the most successful ones to build recommender engines. Popular e-commerce systems, such as Amazon [1, CDNow 3 or LastFM [4, are taking advantage of the CF relies on the assumption that finding similar users to a new one and ex- mining their usage patterns leads to useful recommendations for the new user Users usually prefer items that like-minded users prefer, or even that dissimilar users don't prefer. This technology does not rely on the content descriptions of e items, but depends on preferences expressed by a set of users. These prefer ences can either be expressed explicitly by numeric ratings or can be indicated implicitly by user behaviors, such as clicking on a hyperlink, purchasing a book or reading a particular news article. CF requires no domain knowledge and offers the potential to uncover patterns that would be difficult or impossible to detect using content-based techniques. Besides that, collaborative filtering has proved its ability to identify the most appropriate item for each user, and the quality of recommendations is improved over time as long as the user database gets larger Two different approaches have been explored for building Pure CF recom- menders. The first approach, referred to as memory-based 56, 37, 15, 50, 27, 54 essentially makes rating predictions based on the entire collection of rated items Items frequently selected by users of the same group can be used to form the basis to build a list or recommended items. They produce high-quality recom- mendations but suffer serious scalability problems as the number of users and items grow. The other approach, known as model-based 56, 14, 15, 9, analyzes historical interaction information to build a model of the relations between differ ent items/users which is intended to find the recommended items. Model-based schemes produce faster recommendations than memory-based do, but requires a significant amount of time to build the models and leads to lower qualit Definitions and notati In the context of recommender systems, a dataset is defined as the collection of all transactions about the items that have been selected by a collection of users
82 F.P. Lousame and E. S´anchez some guidelines to decide which algorithm best fits on a given recommendation problem or domain. 2 Recommending Based on Collaborative Filtering The term Collaborative Filtering (CF) was first introduced by Goldberg et al. [23]. They presented Tapestry, an experimental mail system that combined both content-based filtering and collaborative annotations. Although the system was enriched with collaborative information, users were required to write complex queries. The first system that automated recommendations was the GroupLens system [50, 37] which helped users find relevant netnews from a huge stream of articles using ratings given by other similar users. Since then, many relevant research projects have been developed (Ringo [56], Video Recommender [29], Movielens [19, 5], Jester [24]) and the results have positioned the CF techniques as the most successful ones to build recommender engines. Popular e-commerce systems, such as Amazon [1], CDNow [3] or LastFM [4], are taking advantage of these engines. CF relies on the assumption that finding similar users to a new one and examining their usage patterns leads to useful recommendations for the new user. Users usually prefer items that like-minded users prefer, or even that dissimilar users don’t prefer. This technology does not rely on the content descriptions of the items, but depends on preferences expressed by a set of users. These preferences can either be expressed explicitly by numeric ratings or can be indicated implicitly by user behaviors, such as clicking on a hyperlink, purchasing a book or reading a particular news article. CF requires no domain knowledge and offers the potential to uncover patterns that would be difficult or impossible to detect using content-based techniques. Besides that, collaborative filtering has proved its ability to identify the most appropriate item for each user, and the quality of recommendations is improved over time as long as the user database gets larger. Two different approaches have been explored for building Pure CF recommenders. The first approach, referred to as memory-based [56, 37, 15, 50, 27, 54], essentially makes rating predictions based on the entire collection of rated items. Items frequently selected by users of the same group can be used to form the basis to build a list or recommended items. They produce high-quality recommendations but suffer serious scalability problems as the number of users and items grow. The other approach, known as model-based [56, 14, 15, 9], analyzes historical interaction information to build a model of the relations between different items/users which is intended to find the recommended items. Model-based schemes produce faster recommendations than memory-based do, but requires a significant amount of time to build the models and leads to lower quality recommendations. Definitions and Notation In the context of recommender systems, a dataset is defined as the collection of all transactions about the items that have been selected by a collection of users
nomy of Collaborative- Based Recommender Systems Symbols n and m will be used in this text to denote the number of distinct users and items in a particular dataset, respectively. Each dataset will be represented formally by a n x m matrix that will be referred to as the user-item matrix, A=uxI.u denotes the set of all users and I the set of all items available in he database. The value of element ak i E 1,0 denotes whether an interaction between user k and item i has been observed or not In a recommendation problem, there usually exists additional information about the utility of the user-item interactions, commonly captured as a rating that indicates how a particular user liked a particular item. This rating infor mation is represented in a different n x m matrix that will be denoted R. The rating that user k expressed for item i is in general a real number and will be referred to as Tki. Tk denotes the vector of all ratings of user k. In recommender systems terminology, the active user is the user that queries the recommender system for recommendations on some items. The symbol a will be used to refer to the active user's rating vector. By convention, if di denotes a vector that results from taking row i from a certain matrix D, d; will be used to denote the vector that results from taking column j from that matrix The symbol Ak refers to the set of items the user has already experienced and Rk is the set of items for which user k has actually given ratings. Note that RkCI and Rk CA Problem formulation In its most common formulation, the CF recommendation problem is reduced to the problem of estimating, using collaborative features, the utility for the items that have not been selected by the active user. Once these utilities for unseen items are estimated, a top-N recommendation can be built for every user, by recommending the user the items with the highest estimated values This estimation is usually computed from the ratings explicitly given by the active user to a specific set of items(rating-based filtering) but ratings could also be derived from historical data(purchases, . or from other sources of information In the rest of the chapter we will assume without loss of generality that interactions are based on rating activity. In movie recommendation, fc instance, the input to the recommender engine would be a set of movies the user has seen, with some numerical rating associated with each of these movies. The output of the recommender system would be another set of movies, not yet rated by the user, that the recommender predicts to be highly rated by the user More formally, given the user-item rating matrix R and the set of ratings a specified by the active user, the recommender engine tries to identify an ordered set of items a such that xnRk=0. To achieve this, the recommendation engine defines a function (k,j)→v(k,j)=E(rk)
A Taxonomy of Collaborative-Based Recommender Systems 83 Symbols n and m will be used in this text to denote the number of distinct users and items in a particular dataset, respectively. Each dataset will be represented formally by a n × m matrix that will be referred to as the user-item matrix, A = U×I. U denotes the set of all users and I the set of all items available in the database. The value of element ak,i ∈ {1, 0} denotes whether an interaction between user k and item i has been observed or not. In a recommendation problem, there usually exists additional information about the utility of the user-item interactions, commonly captured as a rating that indicates how a particular user liked a particular item. This rating information is represented in a different n × m matrix that will be denoted R. The rating that user k expressed for item i is in general a real number and will be referred to as rk,i. rk denotes the vector of all ratings of user k. In recommender systems terminology, the active user is the user that queries the recommender system for recommendations on some items. The symbol a will be used to refer to the active user’s rating vector. By convention, if di denotes a vector that results from taking row i from a certain matrix D, dT j will be used to denote the vector that results from taking column j from that matrix. The symbol Ak refers to the set of items the user has already experienced and Rk is the set of items for which user k has actually given ratings. Note that Rk ⊆ I and Rk ⊆ A. Problem Formulation In its most common formulation, the CF recommendation problem is reduced to the problem of estimating, using collaborative features, the utility for the items that have not been selected by the active user. Once these utilities for unseen items are estimated, a top-N recommendation can be built for every user, by recommending the user the items with the highest estimated values. This estimation is usually computed from the ratings explicitly given by the active user to a specific set of items (rating-based filtering) but ratings could also be derived from historical data (purchases, ...) or from other sources of information. In the rest of the chapter we will assume without loss of generality that interactions are based on rating activity. In movie recommendation, for instance, the input to the recommender engine would be a set of movies the user has seen, with some numerical rating associated with each of these movies. The output of the recommender system would be another set of movies, not yet rated by the user, that the recommender predicts to be highly rated by the user. More formally, given the user-item rating matrix R and the set of ratings a specified by the active user, the recommender engine tries to identify an ordered set of items X such that X ∩Rk = ∅. To achieve this, the recommendation engine defines a function ν : U×I→ (k, j) → ν(k, j) = E(rk,j ) (1)
FP Lousame and e. sanchez Input Interface CF Algorithm Item for which user Fig. 1. Illustration of the recommendation process. Given the vector of ratings of the active user, the collaborative filtering algorithm produces selecting the N items with the highest estimated predictions. that predicts the utility of the interactions between each user k and every item j. Note that for a given user k, the utilities need to be computed only for items jET-Rk. Once all utilities are predicted, recommendations to the active user re made by selecting the items with the highest estimated utility(see figure 1) The prediction computation is usually performed on a sparse user-item matrix. Typical values of sparsity are in the order of 98%, what means an almost empty interaction matrix In addition to recommender systems that predict the absolute values of rat ings, there are other proposals focused on preference-based filtering, i.e., pre- icting the relative preferences of users [18, 35, 36. These techniques predict the correct relative order of the items rather than their individual ratings 2.1 Memory-Based Collaborative Filtering Memory-based collaborative filtering is motivated from the observation that users usually trust the recommendations from like-minded neighbors. These methods are aimed at computing unknown relations between users and items by means of nearest neighbor schemes that either identify pairs of items that tend to be rated similarity or users with a similar rating history. Memory-based collaborative filtering became very popular because they are easy-to-implement very intuitive, avoid the need of training and tuning many parameters, and the user can easily understand the rationale behind each recommendation Three components characterize this approach: (1)data preprocessing, in which input data to the recommender engine is preprocessed to remove global effects to normalize ratings, etc;(2)neighborhood selection, which consists in selecting the set of K users items that are most similar to the active user to the set of items already rated by the active user; and (3) prediction computation, which generates predictions and aggregates items in a top-N recommendation. Table 1 summarizes different memory-based algorithms that are briefly explained in next
84 F.P. Lousame and E. S´anchez Fig. 1. Illustration of the recommendation process. Given the vector of ratings of the active user, the collaborative filtering algorithm produces a recommendation by selecting the N items with the highest estimated predictions. that predicts the utility of the interactions between each user k and every item j. Note that for a given user k, the utilities need to be computed only for items j ∈I−Rk. Once all utilities are predicted, recommendations to the active user are made by selecting the items with the highest estimated utility (see figure 1). The prediction computation is usually performed on a sparse user-item matrix. Typical values of sparsity are in the order of 98%, what means an almost empty interaction matrix. In addition to recommender systems that predict the absolute values of ratings, there are other proposals focused on preference-based filtering, i.e., predicting the relative preferences of users [18, 35, 36]. These techniques predict the correct relative order of the items, rather than their individual ratings. 2.1 Memory-Based Collaborative Filtering Memory-based collaborative filtering is motivated from the observation that users usually trust the recommendations from like-minded neighbors. These methods are aimed at computing unknown relations between users and items by means of nearest neighbor schemes that either identify pairs of items that tend to be rated similarity or users with a similar rating history. Memory-based collaborative filtering became very popular because they are easy-to-implement, very intuitive, avoid the need of training and tuning many parameters, and the user can easily understand the rationale behind each recommendation. Three components characterize this approach: (1) data preprocessing, in which input data to the recommender engine is preprocessed to remove global effects, to normalize ratings, etc; (2) neighborhood selection, which consists in selecting the set of K users [items] that are most similar to the active user [to the set of items already rated by the active user]; and (3) prediction computation, which generates predictions and aggregates items in a top-N recommendation. Table 1 summarizes different memory-based algorithms that are briefly explained in next subsections
A Taxonomy of Collaborative-Based Recommender Syster Table 1. Summary of memory-based algorithms based on the different components of the recommendation process Neighborhood selection Prediction computation Pearson correlation User-based Ratings Rating aggregation (default voting) Most frequent item Mean squared difference Predictability Predictability co Linear rating transforma- Item-based Conditional probability. Rating aggregation (adjusted ratings)based similarity Item-to-item coo Cluster-based (cluster-based Pearson correlation Rating aggregation nothing trust of users Trust inferences Ratings Weighted average composi-.Rating aggregation Improved Ratings Weight optimization Rating aggregation This CF approach estimates unknown ratings based on recorded ratings of like- nded users. The predicted rating of the active user for item j is a weighted sum of ratings of other user (r1-) where Uk denotes the set of users in the database that satisfy wk, 1+0. This weights can reflect distance, correlation or similarity between each user and the active user. Fk and Fi represent the mean rating of the active user k and user L, espectively. Different weighting functions can be considered. Pearson correlation, cosine vector similarity, Spearman correlation, entropy-based uncertainty, mean-square difference are some examples. The Pearson correlation(eq. 3) was the first measure used to compute these weights 50. Breese et al. [15 and Herlocker et al.[27 proved that Pearson correlation performs better than other metrics (rk.i-7k)(r;-) I Note that Pearson correlation is defined in [1,+1] and then, in order to make sense sing negative weights, ratings should be re-scaled to fit [-r, +rl
A Taxonomy of Collaborative-Based Recommender Systems 85 Table 1. Summary of memory-based algorithms based on the different components of the recommendation process Data (preprocessing) Neighborhood selection Prediction computation User-based Ratings (default voting) · Pearson correlation · Vector similarity → Inverse user frequency · Mean squared difference · Rating aggregation · Most frequent item Predictability paths Ratings · Predictability condition heuristics · Linear rating transformation Item-based Ratings (adjusted ratings) · Vector similarity · Pearson correlation · Conditional probability based similarity · Rating aggregation · Regression based Item-to-item coocurrence Cluster-based smoothing Ratings (cluster-based smoothing) · Pearson correlation · Rating aggregation Trust inferences Ratings · Compute trust of users → Pearson correlation Weighted average composition · Rating aggregation Improved neighborhood Ratings (remove global effects) · Weight optimization · Rating aggregation User-Based This CF approach estimates unknown ratings based on recorded ratings of likeminded users. The predicted rating of the active user for item j is a weighted sum of ratings of other users, νk,j = ¯rk + l∈Uk wk,l · (rl,j − r¯l) l∈Uk |wk,l| (2) where Uk denotes the set of users in the database that satisfy wk,l = 0. This weights can reflect distance, correlation or similarity between each user and the active user. ¯rk and ¯rl represent the mean rating of the active user k and user l, respectively. Different weighting functions can be considered. Pearson correlation, cosine vector similarity, Spearman correlation, entropy-based uncertainty, mean-square difference are some examples. The Pearson correlation (eq. 3)1 was the first measure used to compute these weights [50]. Breese et al. [15] and Herlocker et al. [27] proved that Pearson correlation performs better than other metrics. wk,l = i∈Rk∩Rl (rk,i − r¯k)(rl,i − r¯l) i∈Rk∩Rl (rk,i − r¯k)2 i∈Rk∩Rl (rl,i − r¯l)2 (3) 1 Note that Pearson correlation is defined in [−1, +1] and then, in order to make sense when using negative weights, ratings should be re-scaled to fit [−r,+r].