86 F.P. Lousame and E. Sánchez

Vector similarity is another weighting function that can be used to measure the similarity between users:

\[
w_{k,l} = \frac{\sum_{i \in R_k \cap R_l} r_{k,i} \, r_{l,i}}{\sqrt{\sum_{i \in R_k \cap R_l} r_{k,i}^2} \; \sqrt{\sum_{i \in R_k \cap R_l} r_{l,i}^2}} \tag{4}
\]

Though Pearson correlation and vector similarity are the most popular, other metrics are also used. For instance, Shardanand and Maes [56] used a Mean Squared Difference to compute the degree of dissimilarity between users k and l. Predictions were made by considering all users whose dissimilarity to the active user was less than a certain threshold and computing the weighted average of the ratings provided by the most similar users, where weights were inversely proportional to this dissimilarity. They also presented a Constrained Pearson correlation to take into account the positivity and negativity of ratings in absolute scales.

Most frequent item recommendation. Instead of using equation 2 to compute predictions and then constructing a top-N recommendation by selecting the highest predicted items, each similar item could be ranked according to how many similar users selected it,

\[
s_{k,j} = \sum_{l \in U_k : \, a_{l,j} = 1} 1 \tag{5}
\]

and the recommendation list would then be computed by sorting the N most frequently selected items.

Weighting Schemes

Breese et al. [15] investigated different modifications to the weighting function that have been shown to improve the performance of this memory-based approach:

Default voting was proposed as an extension of the Pearson correlation (equation 3) that improves the similarity measure in cases in which either the active user or the matching user has relatively few ratings (R_k ∩ R_l has very few items). Refer to [15] for a mathematical formulation.

Inverse user frequency tries to reduce the weights of commonly selected items, based on the idea that commonly selected items are not as useful in characterizing the user as those items that are selected less frequently.
Following the original concepts in the domain of information retrieval [10], the inverse user frequency can be defined as:

\[
f_i = \log \frac{|\{u_k\}|}{|\{u_k : i \in B_k\}|} = \log \frac{n}{n_i} \tag{6}
\]

where n_i is the number of users who rated item i and n is the total number of users in the database. To use the inverse user frequency in equation 4, the transformed rating is simply the original rating multiplied by the inverse user frequency. It can also be used in correlation, but the transformation is not direct (see Breese et al. [15] for a detailed description).
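As a concrete illustration, the vector similarity of equation 4 combined with the inverse-user-frequency transform of equation 6 can be sketched in Python. This is a minimal sketch under our own assumptions: ratings are held as a dict of per-user dicts, and the function names are illustrative, not from the cited papers.

```python
import math

def inverse_user_frequency(ratings, n_users):
    """f_i = log(n / n_i), where n_i is the number of users who rated item i (eq. 6)."""
    counts = {}
    for user_ratings in ratings.values():
        for item in user_ratings:
            counts[item] = counts.get(item, 0) + 1
    return {item: math.log(n_users / n_i) for item, n_i in counts.items()}

def vector_similarity(ra, rb, f=None):
    """Cosine similarity over co-rated items (eq. 4); pass IUF weights f to transform ratings."""
    common = set(ra) & set(rb)
    if not common:
        return 0.0
    def t(r, i):  # transformed rating: r * f_i when IUF weights are given
        return r * (f[i] if f else 1.0)
    num = sum(t(ra[i], i) * t(rb[i], i) for i in common)
    den = math.sqrt(sum(t(ra[i], i) ** 2 for i in common)) * \
          math.sqrt(sum(t(rb[i], i) ** 2 for i in common))
    return num / den if den else 0.0
```

Note that an item rated by every user gets f_i = log(n/n) = 0 and therefore contributes nothing to the similarity, which is exactly the intended effect of the transform.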
A Taxonomy of Collaborative-Based Recommender Systems 87

Predictability Paths

Aggarwal et al. [9] proposed a graph-based recommendation algorithm in which the users are represented as nodes of a graph and the edges between the nodes indicate the degree of similarity between the users. The recommendations for a user were computed by traversing nearby nodes in this graph. The graph representation has the ability to capture transitive relations which cannot be captured by nearest neighbor algorithms. The authors reported better performance than the user-based schemes.

The approach is based on the concepts of horting and predictability. The horting condition states whether there is enough overlap between each pair of users (k, l) to decide whether the behavior of one user could predict the behavior of the other. By definition, user k horts user l if the following condition is satisfied:

\[
\operatorname{card}(R_k \cap R_l) \ge \min(F \cdot \operatorname{card}(R_k),\, G) \tag{7}
\]

where F ≤ 1 and G is some predefined threshold. The predictability condition establishes that user l predicts the behavior of user k if there exists a linear rating transformation

\[
T_{s_{k,l}, t_{k,l}} : \; x_{k,j} = s \cdot r_{l,j} + t \tag{8}
\]

that carries ratings r_{l,j} of user l into ratings x_{k,j} of user k with an acceptable error. The (s, t) pair of real numbers is chosen so that transformation 8 keeps at least one value in the rating domain (see [9] for further details on the restrictions on s-t value pairs). More formally, user l predicts user k if user k horts user l (eq. 7) and if there exists a linear rating transformation T_{s,t} such that expression 9 is satisfied, with β a positive real number:

\[
\frac{\sum_{j \in R_k \cap R_l} |r_{k,j} - x_{k,j}|}{\operatorname{card}(R_k \cap R_l)} < \beta \tag{9}
\]

Each arc between users k and l indicates that user l predicts user k and therefore has an associated linear transformation T_{s_{k,l}, t_{k,l}}. Using an appropriate graph search algorithm, a set of optimal directed paths between user k and any user l that selected item j can be constructed.
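The two conditions can be made concrete with a short Python sketch. This is our own illustration: fitting (s, t) by least squares over the co-rated items is an assumption for the sake of the example; the original paper [9] instead restricts the admissible (s, t) pairs as described above.

```python
def horts(Rk, Rl, F=0.5, G=5):
    """User k horts user l if card(Rk ∩ Rl) >= min(F * card(Rk), G)  (eq. 7)."""
    return len(Rk & Rl) >= min(F * len(Rk), G)

def best_linear_transform(rk, rl):
    """Least-squares fit of x_{k,j} = s * r_{l,j} + t over co-rated items (eq. 8)."""
    common = sorted(set(rk) & set(rl))
    xs = [rl[j] for j in common]
    ys = [rk[j] for j in common]
    n = len(common)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    s = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    t = my - s * mx
    return s, t

def predicts(rk, rl, beta=1.0, **kw):
    """l predicts k: k horts l and the fitted transform has mean abs. error < beta (eq. 9)."""
    if not horts(set(rk), set(rl), **kw):
        return None
    s, t = best_linear_transform(rk, rl)
    common = set(rk) & set(rl)
    err = sum(abs(rk[j] - (s * rl[j] + t)) for j in common) / len(common)
    return (s, t) if err < beta else None
```

When `predicts` returns an (s, t) pair, that pair labels the arc l → k in the predictability graph; the thresholds F, G and β here are placeholder values.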
Each directed path allows a rating prediction computation based on the composition of transformations (eq. 8). For instance, given the directed path k → l_1 → ... → l_n with predictor values (s_{k,1}, t_{k,1}), (s_{1,2}, t_{1,2}), ..., (s_{n−1,n}, t_{n−1,n}), the predicted rating of item j will be

\[
T_{s_{k,1}, t_{k,1}} \circ \left( T_{s_{1,2}, t_{1,2}} \circ \left( \dots \circ T_{s_{n-1,n}, t_{n-1,n}}(r_{l_n, j}) \dots \right) \right)
\]

Since different paths may exist, the average of these predicted ratings is computed as the final prediction. A top-N recommendation is constructed by aggregating the N items with the highest predicted ratings.

Item-Based

The item-based algorithm is an analogous alternative to the user-based approach that was proposed by Sarwar et al. [53] to address the scalability problems of
the user-based approach. The algorithm, in its original formulation, generates a list of recommendations for the active user by selecting new items that are similar to the collection of items already rated by the user. As for the user-based approach, the item-based approach consists of two different components: the similarity computation and the prediction computation. There are different ways to compute the similarity between items. Here we present four of these methods: vector similarity, Pearson correlation, adjusted vector similarity and conditional probability-based similarity.

Vector similarity. One way to compute the similarity between items is to consider each item i as a vector in the m-dimensional user space. The similarity between any two items i and j is measured by computing the cosine of the angle between these two vectors:

\[
w_{i,j} = \frac{\sum_{k \in U_i \cap U_j} r_{k,i} \, r_{k,j}}{\sqrt{\sum_{k \in U_i \cap U_j} r_{k,i}^2} \; \sqrt{\sum_{k \in U_i \cap U_j} r_{k,j}^2}} \tag{10}
\]

where the summation extends over the users who rated both items, k ∈ U_i ∩ U_j.

Pearson correlation. Similarly to equation 3, the Pearson correlation between items i and j is given by:

\[
w_{i,j} = \frac{\sum_{k \in U_i \cap U_j} (r_{k,i} - \bar r_i)(r_{k,j} - \bar r_j)}{\sqrt{\sum_{k \in U_i \cap U_j} (r_{k,i} - \bar r_i)^2} \; \sqrt{\sum_{k \in U_i \cap U_j} (r_{k,j} - \bar r_j)^2}} \tag{11}
\]

where r̄_i and r̄_j denote the average rating of items i and j, respectively.

Adjusted vector similarity. Computing similarity between items using the vector similarity has one important drawback: the difference in the rating scale between different users is not taken into account. This similarity measure addresses the problem by subtracting the corresponding user average from each rating:

\[
w_{i,j} = \frac{\sum_{k \in U_i \cap U_j} (r_{k,i} - \bar r_k)(r_{k,j} - \bar r_k)}{\sqrt{\sum_{k \in U_i \cap U_j} (r_{k,i} - \bar r_k)^2} \; \sqrt{\sum_{k \in U_i \cap U_j} (r_{k,j} - \bar r_k)^2}} \tag{12}
\]

Conditional probability-based similarity. An alternative way to compute the similarity between each pair of items is to use a measure based on the conditional probability of selecting one of the items given that the other item was selected.
This probability can be expressed as the number of users that selected both items i and j divided by the total number of users that selected item i:

\[
w_{i,j} = P(j \mid i) = \frac{|\{u_k : i, j \in R_k\}|}{|\{u_k : i \in R_k\}|} \tag{13}
\]

Note that this similarity measure is not symmetric: P(j|i) ≠ P(i|j).
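Three of these item similarity measures (equations 10, 12 and 13) can be sketched in Python over the same dict-of-dicts rating layout used before; the layout and function names are our own illustration, not from [53].

```python
import math

def item_vectors(ratings):
    """Invert user -> {item: rating} into item -> {user: rating}."""
    items = {}
    for u, user_ratings in ratings.items():
        for i, r in user_ratings.items():
            items.setdefault(i, {})[u] = r
    return items

def cosine_sim(vi, vj):
    """Equation 10: cosine over the users who rated both items."""
    common = set(vi) & set(vj)
    if not common:
        return 0.0
    num = sum(vi[k] * vj[k] for k in common)
    den = math.sqrt(sum(vi[k] ** 2 for k in common)) * \
          math.sqrt(sum(vj[k] ** 2 for k in common))
    return num / den if den else 0.0

def adjusted_cosine_sim(vi, vj, user_mean):
    """Equation 12: subtract each user's mean rating before taking the cosine."""
    common = set(vi) & set(vj)
    num = sum((vi[k] - user_mean[k]) * (vj[k] - user_mean[k]) for k in common)
    den = math.sqrt(sum((vi[k] - user_mean[k]) ** 2 for k in common)) * \
          math.sqrt(sum((vj[k] - user_mean[k]) ** 2 for k in common))
    return num / den if den else 0.0

def conditional_prob_sim(vi, vj):
    """Equation 13: P(j|i) = |users who chose both| / |users who chose i|. Asymmetric."""
    return len(set(vi) & set(vj)) / len(vi) if vi else 0.0
```

Running `conditional_prob_sim` in both directions on the same pair of items makes the asymmetry noted above directly visible.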
To compute predictions using the item-based approach, a recommendation list is generated by ranking items with a prediction measure computed by taking a weighted average over all of the active user's ratings for items in the collection R_k:

\[
\nu_{k,j} = \frac{\sum_{i \in R_k} r_{k,i} \cdot w_{i,j}}{\sum_{i \in R_k} |w_{i,j}|} \tag{14}
\]

Model Based Interpretation

Since similarities among items do not change frequently, relations between items can be stored in a model M. This is why some researchers consider the item-based algorithm a model-based approach to collaborative filtering. Model M could contain all relations between pairs of items, but one common approach is to store, for each item i, only its top-K similar items. This parameterization of M on K is motivated by performance considerations. By using a small value of K, M would be very sparse and the similarity information could be stored in memory even in situations in which the number of items in the dataset is very large. However, if K is very small, the resulting model will contain limited information and could potentially lead to low quality recommendations (see [53] for further reading).

Item-to-Item Extension

Linden et al. [41] proposed this extension to the item-based approach, which is capable of producing recommendations in real time, scaling to massive datasets and generating high-quality recommendations. The algorithm is essentially an item-based approach but includes several modifications that make the item-to-item algorithm faster than the item-based one: (1) the similarity computation is extended only to item pairs with common users (co-occurring items) and (2) the recommendation list is computed by looking into a small set that aggregates items that were found similar to a certain basket of user selections. To determine the most similar match for a given item, the algorithm builds a co-occurrence matrix by finding items that users tend to select together.
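The prediction step of equation 14 and the top-K pruning that produces the sparse model M can be sketched as follows (a minimal sketch; `sims` maps each item to its similarity-weighted neighbors, an assumed layout):

```python
def predict(user_ratings, sims):
    """Equation 14: nu_{k,j} = sum_i r_{k,i} w_{i,j} / sum_i |w_{i,j}| over rated items i."""
    scores = {}
    for i, r in user_ratings.items():
        for j, w in sims.get(i, {}).items():
            if j in user_ratings:
                continue  # only score items the user has not rated yet
            num, den = scores.get(j, (0.0, 0.0))
            scores[j] = (num + r * w, den + abs(w))
    return {j: num / den for j, (num, den) in scores.items() if den}

def top_n(user_ratings, sims, n):
    """Top-N recommendation: the n unrated items with the highest predictions."""
    preds = predict(user_ratings, sims)
    return sorted(preds, key=preds.get, reverse=True)[:n]

def prune_model(sims, K):
    """Keep only the top-K most similar items per item (the sparse model M)."""
    return {i: dict(sorted(neigh.items(), key=lambda kv: kv[1], reverse=True)[:K])
            for i, neigh in sims.items()}
```

Pruning with a small K trades memory for coverage, which is exactly the tension discussed above.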
The similarity between two items i and j is nonzero if at least q+1 users have selected the pair (i, j), with q ≥ 0 some predefined threshold. The similarity between two items satisfying this property can be computed in various ways, but a common method is to use the cosine similarity described in equation 10. Predictions for new items are computed with equation 14 (see [41] for further details).

Cluster-Based Smoothing

Xue et al. [59] proposed a collaborative filtering algorithm that provides higher accuracy as well as increased efficiency in recommendations. The algorithm is a user-based algorithm that has been enhanced with clustering and a rating smoothing mechanism based on the clustering results. Clustering was performed using the K-means algorithm with the Pearson correlation coefficient (eq. 3) as the distance metric between users.

Data smoothing is a mechanism to fill in the missing values of the rating matrix. To do data smoothing, Xue et al. [59] made explicit use of the
clusters as the smoothing mechanism. Based on the clustering results, they applied the following smoothing strategy:

\[
r'_{k,j} =
\begin{cases}
r_{k,j} & \text{if } r_{k,j} \ne \varnothing \\
\bar r_{k,j} & \text{otherwise}
\end{cases} \tag{15}
\]

where r̄_{k,j} denotes the smoothed value for user k's rating of item j. By considering the diversity of the users, Xue et al. [59] proposed the following equation to compute the smoothed rating:

\[
\bar r_{k,j} = \bar r_k + \Delta r_{C(k),j} = \bar r_k + \frac{1}{|C(k,j)|} \sum_{l \in C(k,j)} (r_{l,j} - \bar r_l) \tag{16}
\]

where C(k) denotes the cluster of user k and C(k, j) the subset of users in cluster C(k) that rated item j. Smoothed ratings are used to compute a pre-selection of neighbors. Basically, given the active user k, a set of most similar clusters is selected to build a neighborhood of similar users. After this preselection, the similarity between each user l in the neighborhood and the active user is computed using the smoothed representation of the user ratings,

\[
w_{k,l} = \frac{\sum_{j \in R_k} \delta_{l,j} \cdot (r'_{k,j} - \bar r_k)(r'_{l,j} - \bar r_l)}{\sqrt{\sum_{j \in R_k} (r'_{k,j} - \bar r_k)^2} \; \sqrt{\sum_{j \in R_k} \delta_{l,j}^2 (r'_{l,j} - \bar r_l)^2}} \tag{17}
\]

where

\[
\delta_{l,j} =
\begin{cases}
1 - \lambda & \text{if } r_{l,j} \ne \varnothing \\
\lambda & \text{otherwise}
\end{cases} \tag{18}
\]

represents the confidence weight for user l on item j. λ ∈ [0, 1] is a parameter for tuning the weight between the user rating and the cluster rating. Predictions for the active user are computed by aggregating ratings from the top-K most similar users in the same manner as for the user-based algorithm (see equation 2):

\[
\nu_{k,j} = \bar r_k + \frac{\sum_{l \in U_k} \delta_{l,j} \cdot w_{k,l} \cdot (r_{l,j} - \bar r_l)}{\sum_{l \in U_k} \delta_{l,j} \cdot |w_{k,l}|} \tag{19}
\]

By assigning different values to λ, Xue et al. [59] adjusted the weighting scheme. For instance, if λ = 0 the algorithm only uses the original rating information for the similarity computation and prediction, whereas if λ = 1 the algorithm is a cluster-based CF that uses the cluster average ratings for the similarity and prediction computation.

Trust Inferences

This approach focuses on developing a computational model that makes it possible to explore transitive user similarities based on trust inferences. Papagelis et al.
[46] presented the concept of associations between users as an expression of established