Tags Meet Ratings: Improving Collaborative Filtering with Tag-Based Neighborhood Method

Zhe Wang
National Engineering Research Center of Fundamental Software, Institute of Software, Chinese Academy of Sciences; Graduate University of Chinese Academy of Sciences
wangzhe07@iscas.ac.cn

Yongji Wang
National Engineering Research Center of Fundamental Software, Institute of Software, Chinese Academy of Sciences
ywang@itechs.iscas.ac.cn

Hu Wu
National Engineering Research Center of Fundamental Software, Institute of Software, Chinese Academy of Sciences
wuhu@itechs.iscas.ac.cn

ABSTRACT
Collaborative filtering (CF) is a method for personalized recommendation. The sparsity of rating data seriously impairs the quality of CF's recommendations. Meanwhile, more and more tag information that implies user preferences is generated by online users. Exploiting this tag data is a promising means of alleviating the sparsity problem. Although the intention is straightforward, no existing solution makes full use of tags to improve the recommendation quality of traditional rating-based collaborative filtering approaches. In this paper, we propose a novel approach that fuses a tag-based neighborhood method into traditional rating-based CF. The tag-based neighborhood method is employed to find similar users and items. This neighborhood information helps the subsequent CF procedure produce higher-quality recommendations. Our experiments show that our approach outperforms state-of-the-art ones.
ACM Classification Keywords
H.3.3 Information Storage and Retrieval: Information Search and Retrieval-Information filtering

General Terms
Algorithms, Experimentation

Author Keywords
Tags, Latent Dirichlet Allocation (LDA), Collaborative Filtering, Neighborhood Method

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Workshop SRS'10, February 7, 2010, Hong Kong, China. Copyright 2010 ACM 978-1-60558-995-4...$10.00.

INTRODUCTION
Nowadays people are inundated with choices. Personalized recommendation is a solution to this problem, and various kinds of recommender systems are employed to improve the user experience. Collaborative filtering [4, 12] is one of the most widely used techniques among them. It tries to identify users with relevant interests by computing similarities among user profiles. The idea is that one's search for information may benefit from consulting the behavior of other users who share the same or relevant interests.

Because collaborative filtering depends on the preferences of users with the same or relevant interests, the similarity computation has a significant influence on recommendation quality. Early item-based and user-based collaborative filtering approaches find similar users or items (neighbors) by calculating the Pearson correlation coefficient [23]. These approaches are efficient and effective, but simply comparing the rating records of different users or items is not enough to find the best neighbors.
If a user has few ratings, or gives ratings only to unpopular items, it is difficult for these approaches to find proper neighbors.

Recently, matrix factorization approaches have gained popularity because of their higher recommendation quality and lower online costs. One of their most significant differences from early approaches is that they extract the "features" of the users and the items. In this way, they decompose the original preference matrix into several low-rank approximations [15]. For the items, every feature reflects the preference of a group of similar users. For the users, every feature reflects their preference for a collection of similar items. By virtue of extracting user and item features, matrix factorization approaches are able to find better neighbors and hence produce better recommendations.

Despite the merits mentioned above, the existing matrix factorization approaches [6, 7, 8, 16, 26] fail to extract sufficient feature information, which reflects the problem of data sparsity: they fit the original matrix by feature extraction based only on the rating data, and the rating data are extremely sparse. If we could obtain more ratings, we would surely enhance the quality of the fitting process. From this standpoint, we propose a better collaborative filtering approach that exploits additional knowledge from tags as a supplement to ratings.

Tags are simple, ad-hoc labels assigned by users to describe or annotate any kind of resource for future retrieval. Their flexibility means they can capture a user's perspective and preferences with ease. Most recent work focuses on tag recommendation, in which the objects to recommend are the tags themselves [18, 20, 22, 27]. In the case of item recommendation, users expect specific suggestions about which items might be interesting. There are a limited number of solutions for this situation, and most of them do not generalize well to different data resources because they ignore the abundant rating data [11, 25]. In this paper, we offer a novel personalized recommendation method that suits the case where both ratings and tags are available.

Our approach still shares the main idea of the classic neighborhood method, but differs in where the neighbors are found. In traditional CF approaches, the neighbors are usually found in the ratings [1]. We do not find neighbors directly by this means. First we exploit the latent topic grouping information hidden in the tags; then we find groups of users interested in similar topics and collections of items under similar topics. To predict a user's rating for an item, we consult the ratings of both the user's and the item's neighbors by employing a neighborhood method. By taking into account both tag neighbors and rating neighbors, our method outperforms most popular CF approaches.

The structure of the rest of the paper is as follows.
In Section 2 we introduce the background and related work. In Section 3 we explain our improved collaborative filtering method in detail. In Section 4 we give two toy examples and compare our method with NMF, PMF and SVD on a popular movie dataset. Finally, we conclude the paper with some future work.

PRELIMINARIES
Rating prediction is one of the most popular means of evaluating the performance of collaborative filtering algorithms. From the rating data of most collaborative filtering datasets, we can obtain an N × M rating matrix R for N users and M items. Matrix R is defined as

$$r_{ij} = \begin{cases} \text{user}_i\text{'s rating for item}_j, & \text{if user}_i \text{ has rated item}_j, \\ 0, & \text{otherwise,} \end{cases}$$

where $i \in N^+$, $j \in M^+$ and $r_{ij} \in [1, R_{max}]$. The usual evaluation process is hold-out cross validation [5]: a certain proportion of the ratings are hidden for testing and the rest are used for training. The measures of evaluation include complexity and accuracy. Accuracy is the more important of the two because most collaborative filtering approaches run offline; it is therefore the focus of this paper.

Naive Estimates
One of the most intuitive prediction methods is to compute mean values. Taking the user's and the item's average biases into account, we get the naive estimate [8]:

$$b_{ij} = \mu + b_i + b_j, \qquad (1)$$

where $b_{ij}$ indicates the predicted rating of user$_i$ on item$_j$; $\mu$ is the global average rating; and $b_i$ and $b_j$ denote user$_i$'s and item$_j$'s average bias, respectively.

This naive method is effective and scalable, but it does not take the interaction between users into account. Every user's rating for an item influences other users' opinions of that item. This interdependence between users forms a social network [24] which connects all users together. Personalized recommendations are not delivered in isolation, but in the context of this social network [14]. The neighborhood method is one of the most effective ways to analyze this context.
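As an illustration of Eq. (1), the naive estimate can be computed from a handful of ratings as follows (a minimal sketch with our own naming; the biases here are plain mean deviations from the global average, without the regularization some variants add):

```python
def naive_estimates(ratings):
    """Build the naive predictor b_ij = mu + b_i + b_j of Eq. (1).
    `ratings` maps (user, item) -> rating."""
    mu = sum(ratings.values()) / len(ratings)  # global average rating
    dev_user, dev_item = {}, {}
    for (u, i), r in ratings.items():
        dev_user.setdefault(u, []).append(r - mu)
        dev_item.setdefault(i, []).append(r - mu)
    b_user = {u: sum(d) / len(d) for u, d in dev_user.items()}  # user bias b_i
    b_item = {i: sum(d) / len(d) for i, d in dev_item.items()}  # item bias b_j
    return lambda u, i: mu + b_user.get(u, 0.0) + b_item.get(i, 0.0)

R = {("u1", "m1"): 5, ("u1", "m2"): 3, ("u2", "m1"): 4}
predict = naive_estimates(R)
print(predict("u2", "m2"))  # -> 3.0 (mu = 4.0, b_u2 = 0.0, b_m2 = -1.0)
```

Note that the predictor falls back to the global mean plus whichever bias is known when a user or item is unseen, which is what makes this baseline usable on sparse data.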
Neighborhood Method
The aim of the neighborhood method [2] is to find the users who give similar ratings and the items which receive similar ratings. Similar past ratings suggest that future ratings will also be similar; this is the basic assumption of collaborative filtering. Because the neighborhood method digs out from the neighbors the clues that indicate the potential ratings, it produces better predictions than the naive estimate. The model of the neighborhood method unifying item-based and user-based collaborative filtering approaches is

$$\hat{r}_{ij} = b_{ij} + \sum_{h \in S^k(j;i)} \theta^i_{hj}\,(r_{ih} - b_{ih}) + \sum_{h \in S^k(i;j)} \theta^j_{ih}\,(r_{hj} - b_{hj}), \qquad (2)$$

where $\hat{r}_{ij}$ is the predicted rating; $b_{ij}$ refers to the naive estimate's prediction; $S^k(j;i)$ denotes the set of the $k$ rated items nearest to item$_j$ for a given user$_i$; $S^k(i;j)$ denotes the set of the $k$ users nearest to user$_i$ for a given item$_j$; and $\theta$ reflects the different weights of the ratings. There are several representations for the weights; cosine similarity is one of the most effective:

$$\text{cosine similarity} = \frac{a \cdot b}{\|a\|_2\,\|b\|_2},$$

where $a$ and $b$ are vectors of the same dimension.
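The cosine weighting and neighbor selection above can be sketched as follows (illustrative code under our own naming; a full implementation of Eq. (2) would additionally subtract the naive estimates and accumulate the weighted deviations):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity a.b / (||a||_2 ||b||_2) of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def nearest_neighbors(target, candidates, k):
    """Ids of the k candidates most cosine-similar to `target`;
    `candidates` maps id -> rating vector."""
    ranked = sorted(candidates, key=lambda c: cosine(target, candidates[c]),
                    reverse=True)
    return ranked[:k]

# Item rating vectors over the same three users:
items = {"i1": [5, 3, 0], "i2": [4, 3, 1], "i3": [0, 1, 5]}
print(nearest_neighbors(items["i1"], {"i2": items["i2"], "i3": items["i3"]}, 1))
# -> ['i2'] (i2's ratings align with i1's far better than i3's do)
```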
The neighborhood method finds item$_j$'s k nearest neighbors (k-NN). These neighbors suggest the potential value of $r_{ij}$ to different degrees according to their similarity with item$_j$. Although several different similarity measures can be employed, the similarity between items is represented by the distance between their rating vectors, so the similarity of items that have few common raters is structurally lower. If high-level features are extracted to represent the users and the items, similarity can be measured better this way. Matrix factorization methods learn this lesson.

Matrix Factorization
To extract high-level features, matrix factorization methods try to find the rating matrix's low-rank approximations [15, 21]. They focus on fitting the user-item rating matrix by low-rank approximation and use the fitted result to make subsequent predictions [6, 7, 8, 16]. The premise behind this low-dimensional factor model is that only a small number of factors or features influence preferences, and that a user's preference for an item is determined solely by that user's feature vector and that item's feature vector.

What is related to our work is not the basic matrix factorization methods but the recent matrix factorization methods that involve auxiliary information. [13] proposes a trust-aware collaborative filtering algorithm based on the general knowledge that people normally ask friends for recommendations. Because of its memory-based model, this algorithm suffers from huge online costs: trust values need to be computed like similarity measures. SoRec [10] fuses the existing trust-based approach with Probabilistic Matrix Factorization (PMF) [16]. This method is model-based, but it cannot be widely applied because trust information, which involves people's privacy, is a scarce resource.
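The low-rank fitting shared by the factorization methods above can be sketched with a few lines of stochastic gradient descent (a toy version under our own naming, not the exact algorithm of any cited method; real systems add regularization and bias terms):

```python
import random

def factorize(observed, n_users, n_items, rank=2, lr=0.02, epochs=500, seed=0):
    """Fit R ~= P Q^T on the observed (user, item, rating) triples
    by minimizing squared error with stochastic gradient descent."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.9) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.9) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in observed:
            err = r - sum(P[u][f] * Q[i][f] for f in range(rank))
            for f in range(rank):  # gradient step on both factor vectors
                P[u][f], Q[i][f] = (P[u][f] + lr * err * Q[i][f],
                                    Q[i][f] + lr * err * P[u][f])
    return P, Q

obs = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 1, 2)]
P, Q = factorize(obs, n_users=2, n_items=2)
# After training, the reconstruction sits close to the observed entries.
print(sum(P[0][f] * Q[0][f] for f in range(2)))
```

The learned rows of P and Q play the role of the user and item feature vectors discussed above; unobserved entries of R are then predicted by the same inner product.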
[9] proposes a relation-regularized matrix factorization method for relational data analysis, yet it is designed for making recommendations about objects that have both content and links. The idea of Collective Matrix Factorization [19] is innovative: factorizing multiple matrices simultaneously with shared parameters. The weakness of this method is that the parameter learning process is computationally costly.

TAG-BASED ITEM RECOMMENDATION
Since tags and ratings are two of the most important attributes attached to items, we propose a generalized neighborhood recommendation method that makes use of both at the same time. Our work is based on the assumption that the behaviors of tagging and rating share the same motivation: item classification. In this sense, the latent preference information found in tagging data has more power than that in rating data. Regarding tags, there are two types of recommendation: item recommendation and keyword recommendation. Our concern is item recommendation, the same as most CF recommendation methods. In the context of electronic commerce and video on demand, proper item recommendations are better since the items are overwhelmingly numerous.

Topic Finding
As with the rating data, the tag data can be represented as an n × m sparse matrix T for n users and m items:

$$t_{ij} = \begin{cases} \text{user}_i\text{'s tags for item}_j, & \text{if user}_i \text{ has tagged item}_j, \\ \text{null}, & \text{otherwise.} \end{cases}$$

The users are allowed to give more than one tag to each item, so if the tags are separated out, T becomes a three-dimensional tensor whose dimensions are user, item, and tag. This is a tough case to handle, and it is why there is little work on extracting preference information from this data resource. Innovatively, we divide T into user-tag and item-tag matrices representing the tags given by the users and the tags received by the items, respectively.
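This split of the tagging records into two bag-of-words views can be sketched as follows (illustrative code; the names are ours, not from the paper):

```python
def split_tags(records):
    """Turn (user, item, tag) triples into the two bag-of-words views:
    the tags each user has given (T^U) and each item has received (T^I)."""
    t_user, t_item = {}, {}
    for user, item, tag in records:
        t_user.setdefault(user, []).append(tag)
        t_item.setdefault(item, []).append(tag)
    return t_user, t_item

triples = [("u1", "m1", "anime"), ("u1", "m2", "fantasy"), ("u2", "m1", "anime")]
t_u, t_i = split_tags(triples)
print(t_u["u1"], t_i["m1"])  # -> ['anime', 'fantasy'] ['anime', 'anime']
```

Each value in the two maps is a bag of words in the LDA sense, so either view can be fed to a standard topic model as a document collection.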
The user-tag and item-tag matrices are denoted as $T^U$ and $T^I$, defined as follows:

$$T^U = [\,t_1, t_2, \ldots, t_n\,]^T, \qquad T^I = [\,t_1, t_2, \ldots, t_m\,]^T,$$

where $t_u$ denotes the tags user$_u$ has given, and $t_i$ denotes the tags item$_i$ has received.

Once the original tensor T is converted into the bags of words $T^U$ and $T^I$, we can apply LDA [3] to find the latent topic information therein. Processing $T^U$ and $T^I$ separately, we find their latent topics in the form of probabilities:

$$\theta^U_{ij} = p(\text{topic} = j \mid \text{user} = i), \qquad \theta^I_{ij} = p(\text{topic} = j \mid \text{item} = i),$$

where $\theta^U_{ij}$ denotes user$_i$'s probability of preferring topic$_j$ and $\theta^I_{ij}$ denotes item$_i$'s probability of being related to topic$_j$. This is a kind of "soft clustering": a user or item may fall under multiple topics with the same probability. The similarity between the row vectors of $\Theta^U$ and $\Theta^I$ reflects the users' and items' similarity more appropriately, because the clustering is based on the semantics of the tags.

The matrices $\Theta^U$ and $\Theta^I$ are not used directly for rating prediction. Because they are full matrices, which are not suitable for computation, we set a threshold value to retain the high-similarity relations and discard the others. Another important reason for this step is that, in reality, most users and items are only indirectly related to each other.

After obtaining the matrices $\Theta^U$ and $\Theta^I$, it is easy to employ k-NN clustering to find the groups whose members share the same interests or attributes.

Rating Prediction
We assume that all the users who tag items also give ratings and that all the items which are tagged also receive ratings. If some users actually fail to give either ratings or tags, we can still make use of whatever they input to the recommender system. Even with few entries, the recommender system still understands what the user wants. We hold this claim because tags are more informative. If a user puts only the single tag "anime" on some item, we can infer that this user is an animation fan. But if this user only gives a high rating to the movie "Avatar", what should we infer from that? Which group of movies does this user like: action, adventure, fantasy, or sci-fi?

Most recommender systems use the integer interval [1, R_max] to represent the users' preference for items. It is necessary to normalize the ratings into the range [0, 1], because only this interval makes sense as a probability. There are many mapping methods; one of the most widely used mapping functions is f(x) = (x − 1)/(R_max − 1). As far as we know, different mapping functions do not exert significantly different influence on the final recommendation.

Our next step is to make rating predictions based on the grouping results stated in the last section. The prediction is made according to a neighborhood method:

$$
\hat{r}_{ij} = \mu + b^*_i + b^*_j, \qquad
b^*_i = \frac{\sum_{h \in T(i)} (r_{hj} - b_{hj})\, I^R_{hj}}{\sum_{h \in T(i)} I^R_{hj}}, \qquad
b^*_j = \frac{\sum_{h \in T(j)} (r_{ih} - b_{ih})\, I^R_{ih}}{\sum_{h \in T(j)} I^R_{ih}},
\tag{3}
$$

where $b^*_i$ denotes user$_i$'s bias for the topic $T(i)$, and $b^*_j$ denotes item$_j$'s bias for the topic $T(j)$. $T(i)$ and $T(j)$ denote the topic user$_i$ is interested in and the topic item$_j$ is under, respectively. Each topic is a set which contains a number of users or items.

In addition, we give different weights to neighbors at different distances.
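For concreteness, the normalization and the unweighted prediction of Eq. (3) can be sketched as below. This is a minimal illustrative sketch, not the paper's implementation: all function and variable names are ours, `Rres` is assumed to hold precomputed residuals $r_{hj} - b_{hj}$, and `I` is the 0/1 rated-indicator $I^R$.

```python
import numpy as np

def normalize_rating(r, r_max=5):
    """Map an integer rating in [1, r_max] onto [0, 1]: f(x) = (x - 1)/(r_max - 1)."""
    return (r - 1) / (r_max - 1)

def predict(mu, Rres, I, topic_users, topic_items, i, j):
    """Unweighted topic-neighborhood prediction of Eq. (3) (hypothetical names).

    mu          -- global mean rating
    Rres[h, k]  -- precomputed residual r_hk - b_hk
    I[h, k]     -- 1 iff user h has rated item k, else 0
    topic_users -- indices of the users in topic T(i)
    topic_items -- indices of the items in topic T(j)
    Assumes at least one neighbor in each topic has a rating.
    """
    hu = np.asarray(topic_users)
    hi = np.asarray(topic_items)
    # b*_i: average residual on item j over the users in T(i) who rated j
    b_i = (Rres[hu, j] * I[hu, j]).sum() / I[hu, j].sum()
    # b*_j: average residual of user i over the items in T(j) that i rated
    b_j = (Rres[i, hi] * I[i, hi]).sum() / I[i, hi].sum()
    return mu + b_i + b_j
```

When every neighbor has rated the target, each bias term reduces to a plain average of the corresponding residuals, which is exactly the unweighted form of Eq. (3).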
The algorithm's weighted variant is

$$
\hat{r}_{ij} = \mu + b^*_i + b^*_j, \qquad
b^*_i = \frac{\sum_{h \in T(i)} (r_{hj} - b_{hj})\, S_{hi} I^R_{hj}}{\sum_{h \in T(i)} S_{hi} I^R_{hj}}, \qquad
b^*_j = \frac{\sum_{h \in T(j)} (r_{ih} - b_{ih})\, S_{hj} I^R_{ih}}{\sum_{h \in T(j)} S_{hj} I^R_{ih}},
\tag{4}
$$

where $\theta_h$, $\theta_i$ and $\theta_j$ denote the row vectors of probabilities in $\Theta$, and $S_{hi}$ (respectively $S_{hj}$) is the cosine similarity of the vectors $\theta_h$ and $\theta_i$ (respectively $\theta_h$ and $\theta_j$).

EXPERIMENTAL ANALYSIS

Dataset Description

The MovieLens dataset was created by the MovieLens movie recommender, which aims to provide an online movie recommendation service [17]. Their work is a complete system rather than a particular algorithm, so we do not delve into it. The dataset includes three files: the rating data, which contains users' ratings for movies; the tagging data, which contains movies' tags and the ids of the users who made the tags; and the movie overview data, which contains each movie's name, release year and genre. Users are allowed to give ratings and tags to the movies they have seen. The ratings are integers between 1 and 5.

We intend to leverage tag analysis to help rating prediction, so we need the movies that have both ratings and tags and the users that give both ratings and tags. After taking the intersection of the rating data and the tag data, we get the subset of the rating data that covers all the tagged movies. This subset contains 905686 ratings from 4001 users for 7600 movies. The density of the rating data is

$$
\frac{905686}{4001 \times 7600} = 2.974\%.
$$

From this subset, we randomly and independently choose 20%, 50%, 80% and 99% of the rating data as separate training sets. The remaining rating data are used as the testing sets. The experiments are all repeated five times to reduce errors.

Toy Examples

The quality of recommendation is eventually reflected in rating prediction accuracy. To obtain a clearer picture of the qualitative quality, we present two toy examples on a smaller data scale. First we extract the tags of 6 users.
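The extraction-and-LDA step of this toy example could be sketched as follows. We use scikit-learn's `LatentDirichletAllocation` as a stand-in for whichever LDA implementation the paper used (an assumption on our part), so the resulting probabilities will not reproduce the $\Theta^U$ values quoted below, although the grouping idea is the same. The six tag documents mirror the tag matrix $T^U$.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# One "document" of tags per user, mirroring the 6-user toy example.
user_tags = [
    "horror killer action horror",              # user1
    "horror action thrill action",              # user2
    "fantasy anime fantasy anime",              # user3
    "anime japanese anime fantasy",             # user4
    "documentary 911 terrorist hero",           # user5
    "historic documentary american realistic",  # user6
]

# Build the user-tag count matrix (rows: users, columns: distinct tags).
counts = CountVectorizer(token_pattern=r"\S+").fit_transform(user_tags)

# LDA with the hyper-parameter topic number fixed at 3, as in the text.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
theta = lda.fit_transform(counts)   # 6 x 3 matrix; each row sums to 1

# Each user's most probable topic is the argmax of his/her row of theta.
groups = theta.argmax(axis=1)
```

Users who share tags (user1/user2, user3/user4, user5/user6) should end up with similar rows in `theta`, which is the grouping the toy example relies on.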
The tag matrix $T^U$ is as follows:

$$
T^U =
\begin{pmatrix}
\text{horror} & \text{killer} & \text{action} & \text{horror} \\
\text{horror} & \text{action} & \text{thrill} & \text{action} \\
\text{fantasy} & \text{anime} & \text{fantasy} & \text{anime} \\
\text{anime} & \text{Japanese} & \text{anime} & \text{fantasy} \\
\text{documentary} & \text{911} & \text{terrorist} & \text{hero} \\
\text{historic} & \text{documentary} & \text{American} & \text{realistic}
\end{pmatrix}
$$

We set the hyper-parameter topic number to 3 and conduct LDA analysis to get the probabilistic matrix

$$
\Theta^U =
\begin{pmatrix}
0.345679 & 0.345679 & 0.308642 \\
0.364198 & 0.308642 & 0.308642 \\
0.308642 & 0.345679 & 0.345679 \\
0.327160 & 0.345679 & 0.327160 \\
0.345679 & 0.308642 & 0.345679 \\
0.308642 & 0.345679 & 0.345679
\end{pmatrix}.
$$

It is quite obvious that user$_1$ and user$_2$ have the same or similar interests. For both of them, the first column holds the maximum value among the three columns, which suggests that the topic they are most probably interested in is the first topic. For the same reason, user$_3$ and user$_4$
are similar, and user$_5$ and user$_6$ are alike. The subjective grouping result is exactly the same as that of a k-NN method applied to the matrix $\Theta^U$ with the same hyper-parameters. Considering the semantic meanings of these 24 tags, we conclude that the result of the analysis on the user-tag matrix is persuasive. The analysis on the item-tag matrix is effective in a similar way. Yet there are more merits in analyzing the item-tag matrix, because we can obtain the names and genres of the movies from the overview data of the dataset.

In the second example, we extract the tag data in the MovieLens dataset and get the movie-tag matrix. There are 7600 movies in this matrix. We perform the tag analysis to find the latent topics again. To leave more space for more important results, we set the desired topic number to 10. Fig. 1 presents the five most probable movies per topic.

[Figure 1. Top 5 movies under 10 different topics and their provided genre information. Genre key: 1. Animation, 2. Children, 3. Fantasy, 4. Comedy, 5. Crime, 6. Thriller, 7. Action, 8. Adventure, 9. Sci-Fi, 10. Drama, 11. Western, 12. War, 13. Musical, 14. Mystery, 15. Romance, 16. Documentary, 17. Horror.]

According to it, we have several observations:

1. There are at least 4 movies under the same genre for every topic. Topic1, Topic2 and Topic3 all have more than two such common genres. The high co-occurrence frequency of different movies under the same genre reflects a large extent of conformity. The coexistence of movie series like The Matrix and Star Wars under Topic7 illustrates that our tag analysis can find not only movies of the same genres but also movies of the same series.

2. The five movies in Topic6 all reflect big social problems: war, terrorist attacks, or social security crises. This explains why these movies of different genres fall under the same topic. The topic-finding result is not equal to the genre classification.
The topic "social problem" may be interesting to some of the users. These details are more valuable than genres for inferring the users' preferences.

3. According to the corresponding rating data, the average variance of ratings of the five movies under their