TagiCoFi: Tag Informed Collaborative Filtering

Yi Zhen, Wu-Jun Li, Dit-Yan Yeung
Department of Computer Science and Engineering
Hong Kong University of Science and Technology, Hong Kong, China
yzhen@cse.ust.hk, liwujun@cse.ust.hk, dyyeung@cse.ust.hk

ABSTRACT
Besides the rating information, an increasing number of modern recommender systems also allow users to add personalized tags to items. Such tagging information may be very useful for item recommendation, because the users' interests in items are implicitly reflected by the tags that they often use. Although some content-based recommender systems have recently made preliminary attempts to utilize tagging information to improve recommendation performance, few recommender systems based on collaborative filtering (CF) have employed tagging information to help the item recommendation procedure. In this paper, we propose a novel framework, called tag informed collaborative filtering (TagiCoFi), to seamlessly integrate tagging information into the CF procedure. Experimental results demonstrate that TagiCoFi outperforms its counterpart which discards the tagging information even when it is available, and achieves state-of-the-art performance.

Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: Information Search and Retrieval—Information Filtering; H.2 [Database Management]: Database Application—Data Mining

General Terms
Algorithms

Keywords
Collaborative filtering, recommender systems, tag

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
RecSys'09, October 23-25, 2009, New York, New York, USA.
Copyright 2009 ACM 978-1-60558-435-5/09/10 ...$10.00.

1. INTRODUCTION
Since the amount of information on the Web is increasing at an astonishing rate, far faster than our ability to process it, recommendation plays an increasingly important role in helping us make effective use of the information available. Some representative examples include product recommendation in Amazon.com [14], movie recommendation in Netflix [3] and MovieLens^1 [16], reference recommendation in CiteULike^2, and bookmark recommendation in Del.icio.us^3. Existing recommender systems can be roughly divided into two major categories [1]. Content-based systems [2, 12, 15] make use of profiles of the users or products to characterize their nature. On the other hand, systems based on collaborative filtering (CF) [4, 9, 16, 17, 19] do not exploit explicit user profiles but only past activities of the users, such as their transaction history or product satisfaction expressed in ratings, to predict the future activities of the users. In recent years, CF-based systems have become more popular than content-based systems because it is much easier to collect the past activities of users than their profiles, due to privacy considerations.
In recent years, besides the ratings on the items given by the users, an increasing number of modern recommender systems also allow the users to add personalized tags^4, in the form of words or phrases, to the items. For example, users may add tags to movies in MovieLens, to web sites in Del.icio.us and to references in CiteULike. Such tagging information may be very useful for item recommendation, because the users' interests in items are implicitly reflected by the tags that they often use [21]. For example, if two users often use the tags "Oscar" and "Tom Hanks", both of them may like the movie "Forrest Gump". In fact, the effectiveness of tags in representing users' preferences or interests has been validated by Zanardi et al. on the CiteULike dataset [27]. Very recently, some content-based systems, such as those in [6, 22, 23], have made preliminary attempts to utilize tagging information to improve recommendation performance. However, there has been little work on improving CF-based systems with the help of tagging information. Because CF-based systems have become more popular than content-based systems, it would be a very worthwhile endeavor to devise novel CF techniques that can also utilize tagging information for item recommendation.

^1 http://movielens.umn.edu/
^2 http://www.citeulike.org/
^3 http://delicious.com/
^4 It should be emphasized that the setting in this paper is different from that of tag recommendation [7, 24], in which the recommended objects are tags. The recommended objects in this paper are called items, whereas tags are additional objects about the items added by the users.
Existing CF methods can be divided into two main categories [10]. Memory-based methods, such as [9, 19], try to predict new ratings by (weighted) averaging the ratings of similar users or items. On the other hand, model-based methods, such as probabilistic matrix factorization (PMF) [17], try to learn a model from data using statistical learning techniques. To the best of our knowledge, there exists only one CF method [25] which attempts to utilize tagging information to improve item recommendation. This method is a memory-based one. The experimental results in [25] show that little improvement could be achieved on item recommendation by integrating tagging information into the CF procedure under the memory-based framework.

In this paper, we propose a novel framework, called tag informed collaborative filtering (TagiCoFi), to seamlessly integrate tagging information into the model-based CF procedure. More specifically, we use tagging information to regularize the matrix factorization (MF) procedure of PMF [17], which has been demonstrated to be one of the state-of-the-art CF methods. Some promising properties of TagiCoFi are highlighted here:

• To the best of our knowledge, TagiCoFi is the first work that incorporates tagging information into a model-based CF system for item recommendation.

• TagiCoFi outperforms its counterpart, PMF, which discards the tagging information even when it is available. This shows that the tagging information does contain useful information for item recommendation and that TagiCoFi can utilize it very effectively.

• TagiCoFi can overcome, or at least alleviate, the overfitting problem [17] suffered by most MF-based CF methods due to the sparsity of the rating matrix.

• TagiCoFi can solve the cold-start problem [8, 11, 20] in that it can give recommendations to novel users who have not expressed preferences on any items.

The rest of this paper is organized as follows. In Section 2, we introduce the notation and some preliminaries. Section 3 describes the details of our model. Experimental results are presented in Section 4 and, finally, we conclude the paper in Section 5.

2. NOTATIONS AND PRELIMINARIES
In this section, we first introduce some notation used in this paper. We then briefly review PMF [17], which is closely related to our work.

2.1 Notations
We use boldface uppercase letters, such as A, to denote matrices, and boldface lowercase letters, such as b, to denote vectors. The ith row and the jth column of a matrix A are denoted as A_{i*} and A_{*j}, respectively. The (i, j)th element of A is denoted as A_{ij} and the ith element of b as b_i.

Suppose there are N users, M items and K tags. Let R be the rating matrix in which R_{ij} represents the rating of user i for item j. The matrix R is sparse because many elements are missing, and each such element R_{ij} is assigned the value of 0 to indicate that item j has not been rated by user i. Y is the indicator matrix in which Y_{ij} is an indicator variable that is equal to 1 if user i rated item j and 0 otherwise. MF-based methods [17] seek to find two low-rank matrices U \in \mathbb{R}^{D \times N} and V \in \mathbb{R}^{D \times M}, where typically D \ll N, M, and use \hat{R} = U^T V to approximate the rating matrix R. The column vectors U_{*i} and V_{*j} represent the user-specific and item-specific latent feature vectors, respectively.

Let Z be the tagging matrix, each of whose elements Z_{ik} is the tf-idf value of user i and tag k [18, 23]:

    Z_{ik} = tf(i, k) \times \log_2\left(\frac{N}{df(k)}\right),    (1)

where tf(i, k) is the normalized frequency of tag k in user i's tagging history and df(k) is the number of users who have used tag k.
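To make Equation (1) concrete, the following Python sketch builds the tagging matrix Z from a list of tagging applications. It is our own illustration: the input format, the function name, and the within-user normalization of tf(i, k) are assumptions rather than details given in the paper.

import numpy as np

def build_tagging_matrix(tagging_records, num_users, num_tags):
    # Construct Z as in Eq. (1): Z[i, k] = tf(i, k) * log2(N / df(k)).
    # tagging_records is assumed to be a list of (user_index, tag_index)
    # tagging applications; this input format is illustrative, not the
    # paper's own data structure.
    counts = np.zeros((num_users, num_tags))
    for i, k in tagging_records:
        counts[i, k] += 1

    # tf(i, k): frequency of tag k normalized within user i's tagging history
    # (one plausible reading of "normalized frequency").
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0                 # users with no tags contribute zero rows
    tf = counts / totals

    # df(k): number of distinct users who have used tag k.
    df = (counts > 0).sum(axis=0).astype(float)
    df[df == 0] = 1.0                         # unused tags get a finite idf; their tf is zero anyway
    idf = np.log2(num_users / df)

    return tf * idf                           # Z has shape (N, K)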
2.2 Probabilistic Matrix Factorization
PMF [17] seeks to derive the aforementioned low-rank matrices U and V by analyzing the rating matrix R in a probabilistic framework. The likelihood of the observed ratings R is defined as follows:

    p(R | U, V, \sigma^2) = \prod_{i=1}^{N} \prod_{j=1}^{M} \left[ \mathcal{N}(R_{ij} | U_{*i}^T V_{*j}, \sigma^2) \right]^{Y_{ij}},    (2)

where \mathcal{N}(x | \mu, \sigma^2) denotes the (univariate) Gaussian distribution with mean \mu and variance \sigma^2.

Putting zero-mean spherical Gaussian priors on the user-specific and item-specific feature vectors:

    p(U | \sigma_U^2) = \prod_{i=1}^{N} \mathcal{N}(U_{*i} | 0, \sigma_U^2 I),
    p(V | \sigma_V^2) = \prod_{j=1}^{M} \mathcal{N}(V_{*j} | 0, \sigma_V^2 I),

we can obtain the maximum a posteriori (MAP) estimates of U and V by minimizing the following objective function defined based on the sum of squared errors:

    E = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{M} Y_{ij} (R_{ij} - U_{*i}^T V_{*j})^2 + \frac{\lambda_U}{2} \mathrm{tr}(U^T U) + \frac{\lambda_V}{2} \mathrm{tr}(V^T V),    (3)

where \lambda_U = \sigma^2 / \sigma_U^2 and \lambda_V = \sigma^2 / \sigma_V^2.
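As a concrete reading of the MAP objective in Equation (3), the short Python sketch below evaluates E for given factor matrices. It is illustrative only (the paper provides no code); the optimization of this objective is subsumed by the TagiCoFi learning procedure of Section 3.3.

import numpy as np

def pmf_objective(R, Y, U, V, lambda_u, lambda_v):
    # Sum-of-squared-errors objective of Eq. (3).
    #   R: (N, M) rating matrix with 0 at unobserved entries
    #   Y: (N, M) indicator matrix (1 where a rating is observed)
    #   U: (D, N) user factors, V: (D, M) item factors
    pred = U.T @ V                                   # predicted ratings U^T V
    data_term = 0.5 * np.sum(Y * (R - pred) ** 2)    # observed squared errors only
    reg_term = 0.5 * lambda_u * np.sum(U ** 2) \
             + 0.5 * lambda_v * np.sum(V ** 2)       # tr(U^T U) is the squared Frobenius norm
    return data_term + reg_term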
3. TAG INFORMED COLLABORATIVE FILTERING
Because PMF [17] has achieved state-of-the-art performance for CF tasks, we use it as the base model and enhance it by integrating tagging information in a principled way. The result is our tag informed collaborative filtering method, which will be abbreviated as TagiCoFi in the sequel. The key idea of TagiCoFi is to use tagging information to regularize the MF procedure of PMF. More specifically, we seek to make two user-specific latent feature vectors as similar as possible if the two users have similar tagging history.

In the rest of this section, we first introduce some metrics for characterizing the similarity between users based on tagging information. We then propose our TagiCoFi model based on the computed user similarities.

3.1 Tag-based User Similarity Measures
We introduce several possible measures for characterizing user similarities based on the tagging matrix Z. Here, T^{ij} denotes the index set of tags which are used by both user i and user j.

3.1.1 Cosine Similarity
The cosine similarity is defined as follows:

    S^{cos}_{ij} = \frac{\sum_{k \in T^{ij}} Z_{ik} Z_{jk}}{\sqrt{\sum_{k \in T^{ij}} Z_{ik}^2} \sqrt{\sum_{k \in T^{ij}} Z_{jk}^2}}.    (4)

3.1.2 Pearson Similarity
The Pearson correlation coefficient between two users is defined as follows:

    \rho_1(i, j) = \frac{\sum_{k \in T^{ij}} (Z_{ik} - \bar{Z}_i)(Z_{jk} - \bar{Z}_j)}{\sqrt{\sum_{k \in T^{ij}} (Z_{ik} - \bar{Z}_i)^2} \sqrt{\sum_{k \in T^{ij}} (Z_{jk} - \bar{Z}_j)^2}},    (5)

where \bar{Z}_i = \frac{\sum_{k \in T^{ij}} Z_{ik}}{|T^{ij}|}. The Pearson similarity is then defined as:

    S^{pea}_{ij} = \frac{1}{1 + \exp(-\rho_1(i, j))}.    (6)

3.1.3 Euclidean-based Similarity
The Euclidean distance between two users is defined as follows:

    \rho_2(i, j) = \sqrt{\sum_{k \in T^{ij}} (Z_{ik} - Z_{jk})^2}.    (7)

The Euclidean-based similarity is then defined as:

    S^{euc}_{ij} = \exp\left(-\frac{[\rho_2(i, j)]^2}{2\sigma^2}\right),    (8)

where \sigma is a user-controlled parameter.

3.2 Model Formulation of TagiCoFi
Like in PMF [17], we adopt a similar MF procedure to find U and V by minimizing the following criterion function:

    \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{M} Y_{ij} (R_{ij} - U_{*i}^T V_{*j})^2 + \frac{\alpha}{2} \left[ \mathrm{tr}(U^T U) + \mathrm{tr}(V^T V) \right],    (9)

where \alpha is a regularization parameter for complexity control.

Furthermore, TagiCoFi employs the user similarities defined based on the tagging information to regularize the MF procedure, with the goal of making the user-specific latent feature vectors as similar as possible if the corresponding users have similar tagging history. We can achieve this goal by minimizing the following criterion function:

    f_1 = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} S_{ij} \| U_{*i} - U_{*j} \|^2
        = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left[ S_{ij} \sum_{d=1}^{D} (U_{di} - U_{dj})^2 \right]
        = \sum_{d=1}^{D} U_{d*} L U_{d*}^T
        = \mathrm{tr}(U L U^T),    (10)

where S_{ij} is the tag-based similarity between user i and user j computed based on one of the measures defined in Section 3.1, L = D - S is known as the Laplacian matrix [5] with D being a diagonal matrix whose diagonal elements are D_{ii} = \sum_j S_{ij}, and \mathrm{tr}(\cdot) denotes the trace of a matrix.

To integrate tagging information into the CF procedure, TagiCoFi combines the criteria (9) and (10) to give the following objective function for minimization:

    f = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{M} Y_{ij} (R_{ij} - U_{*i}^T V_{*j})^2 + \frac{\alpha}{2} \left[ \mathrm{tr}(U^T U) + \mathrm{tr}(V^T V) \right] + \frac{\beta}{2} \mathrm{tr}(U L U^T),    (11)

where \beta is an additional regularization parameter to control the contribution from the tagging information.

The formulation in (11) can be seen as an adaptation of relation regularized matrix factorization (RRMF) [13], which models relational data containing both relation information and content information. The main difference between TagiCoFi and RRMF is that TagiCoFi can handle missing data, which is one of the key characteristics of CF.
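The following Python sketch illustrates the Pearson variant (Equations (5)-(6)) together with the Laplacian L = D - S used in the regularizer of Equation (10). It is an illustration under our own assumptions (function names, the eps guard, and treating tags with nonzero Z entries as "used"); the cosine and Euclidean-based variants follow the same pattern. The O(N^2 K) cost of the pairwise loop matches the complexity analysis in Section 3.4.

import numpy as np

def pearson_similarity(Z, eps=1e-12):
    # Tag-based Pearson similarity of Eqs. (5)-(6), restricted to the tags
    # T^{ij} used by both users (here: tags with nonzero Z entries for both).
    N = Z.shape[0]
    S = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if i == j:
                continue                              # self-pairs contribute nothing to the regularizer
            shared = (Z[i] > 0) & (Z[j] > 0)          # index set T^{ij}
            if not shared.any():
                continue                              # no common tags: similarity stays 0
            zi = Z[i, shared] - Z[i, shared].mean()
            zj = Z[j, shared] - Z[j, shared].mean()
            rho = (zi @ zj) / (np.sqrt(np.sum(zi ** 2)) * np.sqrt(np.sum(zj ** 2)) + eps)
            S[i, j] = 1.0 / (1.0 + np.exp(-rho))      # logistic mapping of Eq. (6)
    return S

def laplacian(S):
    # Graph Laplacian L = D - S with D_ii = sum_j S_ij, used in tr(U L U^T).
    return np.diag(S.sum(axis=1)) - S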
3.3 Learning
The objective function in (11) can be rewritten as follows:

    f = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{M} Y_{ij} (R_{ij} - U_{*i}^T V_{*j})^2 + \frac{1}{2} \mathrm{tr}\left[ U (\alpha I + \beta L) U^T \right] + \frac{\alpha}{2} \mathrm{tr}(V^T V),    (12)

where I is the identity matrix. We use an alternating gradient descent procedure to optimize (12). More specifically, each time we fix one variable (U or V) and minimize the objective function with respect to the other one (V or U). This procedure is repeated for several iterations until some termination condition is satisfied.

To learn U, we first rewrite (12) as follows:

    f = g + h + C,    (13)

where C is a constant independent of U, and

    g = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{M} Y_{ij} (R_{ij} - U_{*i}^T V_{*j})^2,
    h = \frac{1}{2} \sum_{d=1}^{D} U_{d*} (\alpha I + \beta L) U_{d*}^T.    (14)

From (14), we can see that the rows of U in h are decoupled. Hence, we apply gradient descent to optimize one row of U at a time with the other rows fixed. Because

    \frac{\partial g}{\partial U_{di}} = \left( \sum_{j=1}^{M} Y_{ij} V_{dj}^2 \right) U_{di} - \sum_{j=1}^{M} Y_{ij} V_{dj} (R_{ij} - U_{*i}^T V_{*j} + U_{di} V_{dj}),    (15)
we have

    \frac{\partial g}{\partial U_{d*}} = W U_{d*}^T - x,    (16)

where W is an N \times N diagonal matrix with W_{ii} = \sum_{j=1}^{M} Y_{ij} V_{dj}^2, and x is an N \times 1 vector with x_i = \sum_{j=1}^{M} Y_{ij} V_{dj} (R_{ij} - U_{*i}^T V_{*j} + U_{di} V_{dj}).

Then, we can get

    \frac{\partial f}{\partial U_{d*}} = \frac{\partial g}{\partial U_{d*}} + \frac{\partial h}{\partial U_{d*}} = (W + \alpha I + \beta L) U_{d*}^T - x.    (17)

The learning process of V is different from that of U, because the columns (not rows) of V are decoupled. Hence, we apply gradient descent to optimize one column of V at a time with the other columns fixed. The gradient can be computed as follows:

    \frac{\partial f}{\partial V_{*j}} = \left( \alpha I + \sum_{i=1}^{N} Y_{ij} U_{*i} U_{*i}^T \right) V_{*j} - \sum_{i=1}^{N} Y_{ij} R_{ij} U_{*i}.    (18)

The overall learning procedure of TagiCoFi is summarized in Algorithm 1 below.

Algorithm 1 Learning procedure of TagiCoFi
 1: INPUT:
    R – rating matrix
    Z – tagging matrix
    D – number of latent features
    W – number of iterations
    δ – step size for gradient descent
 2: Compute user similarity matrix S based on Z
 3: Compute Laplacian matrix L based on S
 4: Initialize U^0, V^0
 5: for w = 1 to W do
 6:   for d = 1 to D do
 7:     U^w_{d*} ← U^{w-1}_{d*} − δ ∂f/∂U_{d*}
 8:   end for
 9:   for j = 1 to M do
10:     V^w_{*j} ← V^{w-1}_{*j} − δ ∂f/∂V_{*j}
11:   end for
12: end for
13: return U^W, V^W

3.4 Complexity Analysis
The main computation of TagiCoFi is to evaluate the gradients of the objective function with respect to the latent variables and to compute the user similarities. The time complexity of computing the gradient ∂f/∂U_{d*} is O(N^2 D) and that of ∂f/∂V_{*j} is O(NMD). The time complexity of computing the user similarities and L is O(N^2 K). Hence, the time complexity of the entire alternating gradient descent procedure is O(W(N^2 D + NMD) + N^2 K).
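For concreteness, the Python sketch below implements one possible reading of Algorithm 1 with the gradients in Equations (16)-(18). It is a sketch under our own assumptions (variable names, dense NumPy arrays, and a fixed number of iterations with no other stopping test); the paper itself gives only the pseudocode above.

import numpy as np

def train_tagicofi(R, Y, L, D=10, W=1000, delta=0.001, alpha=1.0, beta=50.0, seed=0):
    # Alternating gradient descent for the TagiCoFi objective in Eq. (12).
    #   R, Y: (N, M) rating and indicator matrices
    #   L:    (N, N) Laplacian built from the tag-based user similarities
    rng = np.random.default_rng(seed)
    N, M = R.shape
    U = rng.uniform(0.0, 1.0, size=(D, N))      # latent features initialized in [0, 1]
    V = rng.uniform(0.0, 1.0, size=(D, M))
    A = alpha * np.eye(N) + beta * L            # (alpha I + beta L), reused every iteration

    for _ in range(W):
        # Update each row U_{d*} with the other rows fixed (Eqs. (16)-(17)).
        for d in range(D):
            E = Y * (R - U.T @ V)                       # Y_ij (R_ij - U_{*i}^T V_{*j})
            W_diag = (Y * V[d] ** 2).sum(axis=1)        # diagonal of W in Eq. (16)
            x = ((E + Y * np.outer(U[d], V[d])) * V[d]).sum(axis=1)
            grad = W_diag * U[d] + A @ U[d] - x         # Eq. (17)
            U[d] -= delta * grad
        # Update each column V_{*j} with the other columns fixed (Eq. (18)).
        for j in range(M):
            rated = Y[:, j] > 0
            Uo = U[:, rated]                            # factors of users who rated item j
            grad = (alpha * np.eye(D) + Uo @ Uo.T) @ V[:, j] - Uo @ R[rated, j]
            V[:, j] -= delta * grad
    return U, V

Predictions are then read off as the inner products of the learned factors, i.e. U[:, i] @ V[:, j] estimates the rating of user i for item j.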
4. EXPERIMENTAL EVALUATION
We have conducted several experiments to compare the performance of our method with that of other methods. Through the experiments, we have tried to answer the following questions:

1. How does TagiCoFi perform in real applications when compared with state-of-the-art methods?
2. How effective are the different user similarity measures?
3. How does tagging information improve collaborative filtering?
4. How does the number of latent features used affect the performance of TagiCoFi?
5. Does TagiCoFi work for users without any training ratings?

These questions are answered separately: question 1 in Section 4.3, questions 2-4 in Section 4.4 as three different subsubsections, and question 5 in Section 4.5.

4.1 Data Set
We evaluate our algorithm on the MovieLens dataset^5, which, as far as we know, is the only publicly available dataset containing both tagging and rating information.

We first prune the dataset for our analysis. For the tagging information, we only keep those tags that are added to at least three distinct movies. As for the users, we only keep those users who used at least 3 distinct tags in their tagging history. For movies, we only keep those movies that are annotated by at least 3 distinct tags. It should be emphasized that our model still works in situations where there are users or movies with rating information only but no tagging information. For those users without any tagging information, the tag-based similarities between them and the other users are 0, which means that the last term in (11) has no effect on those users. Consequently, the recommendation result for those users without tagging information depends only on the MF procedure of the rating matrix, which is similar to the result of PMF. As the focus of this paper is on evaluating the effectiveness of tagging information in addition to rating information, we only keep the users who have both rating history and tagging history in the original rating records.

We obtain two kinds of records after pruning: the tagging records and the rating records. The tagging records include 13,431 tagging applications^6 contributed by 757 users with 2,271 distinct tags. Based on the tagging records, we construct the tagging matrix Z, whose elements are defined by Equation (1) in Section 2.1. The rating records include 167,474 ratings given by 757 users (the same as those in the tagging records) on 9,485 movies, and based on these rating records we construct the rating matrix R. More statistics about the rating matrix R are shown in Table 1, where the numbers following ± denote the standard deviations.

Table 1: Description of rating data
Statistics            Users              Movies
Min. # of ratings     20                 1
Max. # of ratings     2,634              625
Mean # of ratings     441.95 ± 420.88    35.27 ± 67.30

^5 http://www.grouplens.org/node/73
^6 If user i adds tag k on item j, we say this is a tagging application.
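A single-pass Python sketch of the pruning criteria of Section 4.1 is given below. The (user, movie, tag) record format is assumed for illustration, and the paper does not state whether the three filters are applied repeatedly until a fixed point, so this one-pass version is only an approximation.

from collections import defaultdict

def prune_tagging_records(records):
    # records: list of (user, movie, tag) tagging applications (assumed format).
    # Keep tags added to at least three distinct movies.
    tag_movies = defaultdict(set)
    for user, movie, tag in records:
        tag_movies[tag].add(movie)
    records = [r for r in records if len(tag_movies[r[2]]) >= 3]

    # Keep users who used >= 3 distinct tags and movies annotated by >= 3 distinct tags.
    user_tags, movie_tags = defaultdict(set), defaultdict(set)
    for user, movie, tag in records:
        user_tags[user].add(tag)
        movie_tags[movie].add(tag)
    return [r for r in records
            if len(user_tags[r[0]]) >= 3 and len(movie_tags[r[1]]) >= 3]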
4.2 Evaluation Metric
For consistency with experiments reported in the literature, we use the Mean Absolute Error (MAE) as the evaluation metric. MAE gives the average absolute deviation of the predictions from the ground truth:

    MAE = \frac{\sum_i \sum_j Y_{ij} |R_{ij} - \hat{R}_{ij}|}{\sum_i \sum_j Y_{ij}},

where R_{ij} and \hat{R}_{ij} are the true and predicted rating values, respectively. A smaller value of MAE indicates better performance.

In our experiments, we randomly split the rating records into two parts, each of which contains 50% of the observations in the rating matrix. One part is used as the test set, which is kept the same for all experiments. The other part is used as a pool from which training sets are generated. For example, a training set size of 20% means that 20% of the records are randomly selected from the pool to form a training set. For each training set size, we randomly generate 10 different training sets, based on which 10 experiments are performed, and the average result is reported.

4.3 Performance
In this section, we compare our method with PMF, which has been demonstrated to be one of the state-of-the-art CF methods [17]. For fairness, we perform parameter tuning in advance for each method and then use the best settings found in all the experiments. For both methods, we initialize the latent features to random numbers in [0, 1] and set the step size for gradient descent to 0.001. The parameters specific to our method are set as \alpha = 1 and \beta = 50. We find that the performance becomes stable after about 1000 rounds of gradient descent (see Figure 3). Hence, we set W = 1000 for all the following results. Furthermore, we adopt the Pearson similarity for all the experiments. The performance of the other measures will be discussed in Section 4.4.1.

The results reported in Table 2 are the average MAE values of PMF and TagiCoFi and their corresponding standard deviations. The better results are shown in bold. It is clear that TagiCoFi achieves better performance than PMF.

To evaluate how significantly TagiCoFi outperforms PMF, we have conducted paired t-tests [26] on the results of PMF and TagiCoFi. Given two approaches, say A and B, and a set of n experiments, the MAE values are obtained for both approaches, denoted by a_i and b_i for i = 1, 2, ..., n. Let d_i = a_i - b_i denote the difference of a_i and b_i, and let \bar{d} be the average of the d_i values for i = 1, 2, ..., n. The null hypothesis is \bar{d} = 0, whereas the alternative hypothesis is \bar{d} > 0. The p-value is computed using the t-statistic:

    T = \frac{\bar{d}}{s / \sqrt{n}},

where s is the standard deviation of d. A small p-value (\leq 0.01) indicates the existence of statistically significant evidence against the null hypothesis.

Table 3 shows the p-values obtained in our experiments. It is easily observed that TagiCoFi significantly outperforms PMF. Because the main difference between TagiCoFi and PMF lies in the extra tagging information used by TagiCoFi, we can conclude that the tagging information is very useful and TagiCoFi can utilize it very effectively.

Table 3: p-values for the significance tests
                 D = 5            D = 10           D = 20
20% Training     3.91 × 10^{-15}  8.27 × 10^{-17}  1.04 × 10^{-16}
40% Training     4.11 × 10^{-13}  2.10 × 10^{-16}  4.52 × 10^{-16}
60% Training     1.35 × 10^{-11}  4.20 × 10^{-12}  4.15 × 10^{-12}
80% Training     1.24 × 10^{-8}   2.85 × 10^{-12}  5.99 × 10^{-12}

In order to compare TagiCoFi with PMF more thoroughly, we compare their performance on users with different numbers of observed ratings.
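The MAE metric of Section 4.2 and the paired t-test used above can be computed as in the following Python sketch. The function names are ours, and obtaining the upper-tail p-value via scipy's t distribution is one standard way of realizing the one-sided test described in the text, not necessarily the exact procedure used in the paper.

import numpy as np
from scipy import stats

def mae(R_true, R_pred, Y):
    # Mean Absolute Error over the observed entries indicated by Y.
    return np.sum(Y * np.abs(R_true - R_pred)) / np.sum(Y)

def paired_t_test(mae_a, mae_b):
    # One-sided paired t-test: H0 is mean(d) = 0, H1 is mean(d) > 0,
    # where d_i = a_i - b_i over the same n repeated experiments.
    d = np.asarray(mae_a, dtype=float) - np.asarray(mae_b, dtype=float)
    n = d.size
    t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(n))   # T = d_bar / (s / sqrt(n))
    p_value = stats.t.sf(t_stat, n - 1)                # upper-tail probability, df = n - 1
    return t_stat, p_value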
The results are shown in Figure 1, from which we can see that TagiCoFi outperforms PMF for all users and that the improvement is more significant for users with only a few observed ratings. This is a very promising property of TagiCoFi, because users with a small number of ratings are typically new customers who have just started to use the system. If we can provide good recommendations to them, we will have a higher chance of keeping them as long-term customers; otherwise, we will likely lose them.

Figure 1: Performance improvement of TagiCoFi over that of PMF on different user rating scales, plotting the MAE improvement against the number of observed ratings for the 20% and 80% training sets (no users in a 20% training set have more than 320 observed ratings).

4.4 Sensitivity to Parameters

4.4.1 User Similarity Measures
In this section, we conduct a set of experiments to compare the effectiveness of the aforementioned user similarity measures: cosine similarity, Pearson similarity and Euclidean-based similarity. Due to the page limit, we only report results with parameters \alpha = 1, \beta = 50, D = 10 in Figure 2. We have observed the same trend under other parameter settings. From Figure 2, we see that the Pearson similarity always gives the best performance and the Euclidean-based similarity is always the worst. Although the difference between these measures is obvious, Figure 2 shows that the difference decreases as the training set size increases. One may ask whether changing the \sigma parameter in the Euclidean-based similarity measure would help. We have tuned this parameter by trying different values, but could not make it outperform the other similarity measures. Based on this analysis, we adopt