A Taxonomy of Collaborative-Based Recommender Systems 91

trust between each other. This trust is defined in the context of similarity conditions and is computed by means of the Pearson correlation (see equation 3). The more similar two users are, the greater their established trust becomes. While the computation of trust in direct associations is based on user-to-user similarity, for length-K associations a transitive rule is adopted. According to this rule, trust is propagated through the network and associations between users are built even if they have no co-rated items. If V = {V_i; i = 1, 2, ..., K} is the set of all intermediate nodes in a trust path that connects user k with user l, then their associated inferred trust is given by:

T^{V_1→...→V_K}_{k→l} = (((T_{k→V_1} ⊕ T_{V_1→V_2}) ⊕ ...) ⊕ T_{V_{K-1}→V_K}) ⊕ T_{V_K→l}   (20)

The symbol ⊕ denotes a special operation that can be best understood for the case of only one intermediate node Z:

T^Z_{k→l} = T_{k→Z} ⊕ T_{Z→l} = δ · [ |B_{k,Z}| / (|B_{k,Z}| + |B_{Z,l}|) · |T_{k→Z}| + |B_{Z,l}| / (|B_{k,Z}| + |B_{Z,l}|) · |T_{Z→l}| ]

where B_{k,Z} = R_k ∩ R_Z, B_{Z,l} = R_Z ∩ R_l and

δ = +1 if T_{k→Z} > 0 and T_{Z→l} > 0
δ = −1 if T_{k→Z} · T_{Z→l} < 0

The inferred trust is not applicable if T_{k→Z} < 0 and T_{Z→l} < 0; in this case the length of the path between users k and l is taken to be infinite.

To build a recommendation for the active user, a collection of paths between that user and other trusted users is selected in a first step. Papagelis et al. [46] proposed different selection mechanisms, but one of the best approaches was Weighted Average Composition, which computes the trust between any two unconnected users k and l using the following equation:

T_{k→l} = ( Σ_{i=1..|P|} C^{P_i}_{k→l} · T^{P_i}_{k→l} ) / ( Σ_{i=1..|P|} C^{P_i}_{k→l} )   (21)

where C^{P_i}_{k→l} expresses the confidence of the association k → l through the path P_i,

C^{V_1→...→V_K}_{k→l} = ((C_{k→V_1} · C_{V_1→V_2} · ...) · C_{V_{K-1}→V_K}) · C_{V_K→l}   (22)

and the confidence of each direct association k → l is assumed to be directly related to the number of co-rated items between the users:

C_{k→l} = |R_k ∩ R_l| / |R_k ∩ R_{u_max}|   (23)

where u_max represents the user who rated most items in common with user k. Predictions for unseen items can be computed using equation 2, in which each weight w_{k,l} is given by equation 21.
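As an illustration, the path-composition rules above can be sketched in a few lines of Python. This is a minimal sketch with made-up trust values and co-rated counts; the function and variable names are ours, not from [46]:

```python
# Minimal sketch of the trust-propagation rules (equations 20-23).
# All numbers are illustrative.

def compose_trust(t_kz, t_zl, n_kz, n_zl):
    """The ⊕ operator for one intermediate node Z.

    t_kz, t_zl: direct trusts T_{k→Z} and T_{Z→l}
    n_kz, n_zl: co-rated counts |B_{k,Z}| and |B_{Z,l}|
    Returns None when both trusts are negative (the path length is
    considered infinite, so no trust can be inferred).
    """
    if t_kz < 0 and t_zl < 0:
        return None
    delta = 1 if (t_kz > 0 and t_zl > 0) else -1
    total = n_kz + n_zl
    return delta * (n_kz / total * abs(t_kz) + n_zl / total * abs(t_zl))

def weighted_average_composition(paths):
    """Equation 21: confidence-weighted average of per-path trusts.

    paths: list of (confidence, trust) pairs, one per selected path P_i;
    each confidence is the product of direct confidences (equation 22).
    """
    norm = sum(c for c, _ in paths)
    return sum(c * t for c, t in paths) / norm

# Two hypothetical single-intermediary paths k→Z1→l and k→Z2→l:
t1 = compose_trust(0.8, 0.6, n_kz=10, n_zl=5)   # 10/15*0.8 + 5/15*0.6 ≈ 0.733
t2 = compose_trust(0.4, -0.5, n_kz=4, n_zl=4)   # trusts disagree: delta = -1
print(weighted_average_composition([(0.9, t1), (0.3, t2)]))   # ≈ 0.4375
```

Note that δ is taken as −1 whenever the two trusts disagree in sign, mirroring the case analysis in the text.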
92 F.P. Lousame and E. Sánchez

Improved Neighborhood-Based

The success of neighborhood-based algorithms depends on the choice of the interpolation weights (equations 2, 14), which are used to compute unknown ratings from neighboring known ones. But the aforementioned user- and item-oriented approaches lack a rigorous way to derive these weights: different algorithms use different heuristics to compute them, and there is no fundamental justification for choosing one over another. Bell and Koren [13] proposed a method to learn interpolation weights directly from the ratings. Their approach improved prediction accuracy by means of two mechanisms: (1) preprocessing the user-item rating matrix, removing global effects to make the different ratings more comparable, and (2) deriving interpolation weights from the rating matrix. The preprocessing step consists of a set of rating transformations that prepare the input data: remove systematic user or item effects (to adjust for the fact that some items were mostly rated by users who tend to rate high, etc.), adjust ratings using item variables (such as the number of ratings given to an item, the average rating of an item, etc.) or adjust ratings by analyzing characteristics (such as date of rating) that may explain some of the variation in ratings². Interpolation weights are computed by modeling the relations between item j and its neighbors through the following optimization problem:

min_w Σ_{k: j ∈ R_k} ( r_{k,j} − Σ_{i ∈ R_k} w_{i,j} · r_{k,i} )²   (24)

and are used with equation 14 in order to predict r_{k,j}. The authors reported that this approach can be very successful when combined with model-based approaches that use matrix factorization techniques (see section 2.2). An alternative user-based formulation can be derived analogously by simply switching the roles of users and items.

2.2 Model-Based Collaborative Filtering

Model-based collaborative filtering first learns a descriptive model of user preferences and then uses it for predicting ratings.
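Returning to equation 24 for a moment: the weight-learning step can be sketched as an ordinary least-squares problem for a single target item j on a dense toy rating matrix. This is a deliberate simplification of Bell and Koren's method, which additionally restricts the sum to a chosen neighborhood of j and regularizes the weights; all data here is randomly generated for illustration:

```python
import numpy as np

# Sketch of equation 24: learn interpolation weights w_{i,j} for one target
# item j by least squares over users who rated j. Toy setup: a dense random
# rating matrix, so every user has rated every item.

rng = np.random.default_rng(0)
n_users, n_items = 200, 6
R = rng.integers(1, 6, size=(n_users, n_items)).astype(float)  # ratings 1..5

j = 0                                  # target item
neighbors = [1, 2, 3, 4, 5]            # candidate neighbor items of j

A = R[:, neighbors]                    # known neighbor ratings r_{k,i}
y = R[:, j]                            # targets r_{k,j}

# min_w || y - A w ||^2
w, *_ = np.linalg.lstsq(A, y, rcond=None)

# Use the learned weights to predict r_{k,j} for a hypothetical new user
# from that user's neighbor ratings alone (as in equation 14)
r_new = np.array([4, 3, 5, 2, 4], dtype=float)
print(r_new @ w)
```

Because the toy ratings are random, the learned weights carry no real signal; on actual data the same least-squares step recovers weights that reflect item-item relations.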
Many of these methods are inspired by machine learning algorithms: neural-network classifiers [14], induction rule learning [61], Bayesian networks [15], dependency networks [26], latent class models [31, 38], principal component analysis [24] and association rule mining [39]. Table 2 synthesizes some of the model-based algorithms that are described in the next subsections.

Cluster Models and Bayesian Classifiers

From a probabilistic perspective, the collaborative filtering task can be viewed as calculating the expected value of the active user's rating on an item given what we know about the user:

² Further information about the mathematical formulation of these preprocessing steps can be found in [13].
Table 2. Different model-based algorithms based on the different components of the recommendation process: data preprocessing, model building and prediction computation

| Algorithm | Data processing | Model building | Prediction computation |
| Bayesian networks | Instance-based representation | Probabilistic clustering (EM fitting); Bayesian classifier; dependency networks | Probabilistic aggregation |
| Latent class models | Binary preference representation | Latent class models (EM fitting) | Probabilistic selection |
| SVD | Low dimensional representation (SVD) | Neighborhood formation in the reduced space | User-based |
| Simple Bayesian classifier | Instance-based representation | Naive Bayes classifier | Probabilistic classification |
| Association rule mining | Binary rating representation; instance-based representation | Association rule mining | Selection based on support and confidence of rules |
| Eigentaste | PCA rating transformation; low dimensionality reduction | Recursive rectangular clustering | Most frequent item |
| PMCF | | Generative probabilistic model | Probabilistic aggregation |

ν_{k,j} = Σ_x p(r_{k,j} = x | r_k) · x   (25)

where the probability expression is the probability that the active user will give a particular rating x to item j given the previously observed ratings r_k = {r_{k,i}, i ∈ R_k}. The character x denotes rating values in the interval [r_min, r_max]. Breese et al. [15] presented two different probabilistic models for computing p(r_{k,j} = x | r_{k,i}, i ∈ R_k). In the first algorithm, users are clustered using conditional Bayesian probability, based on the idea that there are certain groups that capture common sets of user preferences.
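Once a model supplies the distribution p(r_{k,j} = x | r_k), the prediction of equation 25 is simply its expectation. A minimal sketch, with a made-up distribution standing in for a fitted model's output:

```python
# Sketch of equation 25: predict the rating of item j for the active user k
# as the expectation of the model's rating distribution. The distribution
# below is illustrative only; in practice it comes from a fitted model such
# as the cluster model or Bayesian network discussed in the text.

ratings = [1, 2, 3, 4, 5]                  # rating values x in [r_min, r_max]
p_rkj = [0.05, 0.10, 0.20, 0.40, 0.25]     # p(r_{k,j} = x | r_k), sums to 1

nu_kj = sum(x * p for x, p in zip(ratings, p_rkj))
print(nu_kj)   # ≈ 3.7
```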
The probability of observing a user belonging to a particular cluster c_s ∈ C = {C_1, C_2, ..., C_K} given a certain set of item ratings r_k is estimated from the probability distribution of ratings in each cluster:

p(c_s, r_k) = p(c_s) · ∏_i p(r_{k,i} | c_s)   (26)

The clustering solution (the parameters p(c_s) and p(r_{k,i}|c_s)) is computed from data using the expectation maximization (EM) algorithm. The second algorithm is based on Bayesian network models, where each item in the database is modeled as a node having states corresponding to the ratings of that item. The learning problem consists of building a network on these nodes such that each node has a set of parent nodes that are the best predictors for the child's rating. They presented a detailed comparison of these two model-based