Improving Tag-Based Recommendation by Topic Diversification Christian Wartena and martin Wibbels Noway, Brouwerijstraat 1, 7523 XC Enschede, The Netherlands Christian Wartena, Martin. Wibbels lenovay nl Abstract. Collaborative tagging has emerged as a mechanism to de- scribe items in large on-line collections. Tags are assigned by users to describe and find back items, but it is also tempting to describe the users in terms of the tags they assign or in terms of the tags of the items they are interested in. The tag-based profile thus obtained can be used to recommend new items If we recommend new items by computing their similarity to the user profile or to all items seen by the user, we run into the risk of recom- mending only neutral items that are a bit relevant for each topic a user is interested in. In order to increase user satisfaction many recommender systems not only optimize for accuracy but also for diversity. Often it assumed that there exists a trade-off between accuracy and diversity In this paper we introduce topic aware recommendation algorithms. Topic aware algorithms first detect different interests in the user profile and then generate recommendations for each of these interests. We study opic aware variants of three tag based recommendation algorithms and show that each of them gives better recommendations than their base variants, both in terms of precision and recall and in terms of diversity. 1 Introduction Collaborative tagging has emerged in the past decade as a mechanism to describe items in large collections available on-line. Tags are assigned by users to describe and find back previously viewed items. Thus a ternary relation between users tags and items is established. In this paper we will investigate some possibilities to construct a user profile based on tags, to identify distinct interests in this profile, and to recommend items relevant to those interests When we use a tag based user profile an item is recommended if it is relevant to II tags in the user profile. Similarly, if we use a collaborative filtering approach, we require the item to be similar to all items in the user profile. However, for a user who has some distinct interests, an item that fits the average of all his interests might be less accurate than an item that fits exactly one of his interests Thus we expect, at least in some cases, that recommendations will improve if we identify different interests in the user profile and take these into account for the recommendation of new items. If a recommendation strategy does so, re will call this strategy topic aware. In the following we will make a number P. Clough et al.(Eds ) ECIR 2011, LNCS 6611, pp. 43-54, 2011
Improving Tag-Based Recommendation by Topic Diversification Christian Wartena and Martin Wibbels Novay, Brouwerijstraat 1, 7523 XC Enschede, The Netherlands {Christian.Wartena,Martin.Wibbels}@novay.nl Abstract. Collaborative tagging has emerged as a mechanism to describe items in large on-line collections. Tags are assigned by users to describe and find back items, but it is also tempting to describe the users in terms of the tags they assign or in terms of the tags of the items they are interested in. The tag-based profile thus obtained can be used to recommend new items. If we recommend new items by computing their similarity to the user profile or to all items seen by the user, we run into the risk of recommending only neutral items that are a bit relevant for each topic a user is interested in. In order to increase user satisfaction many recommender systems not only optimize for accuracy but also for diversity. Often it is assumed that there exists a trade-off between accuracy and diversity. In this paper we introduce topic aware recommendation algorithms. Topic aware algorithms first detect different interests in the user profile and then generate recommendations for each of these interests. We study topic aware variants of three tag based recommendation algorithms and show that each of them gives better recommendations than their base variants, both in terms of precision and recall and in terms of diversity. 1 Introduction Collaborative tagging has emerged in the past decade as a mechanism to describe items in large collections available on-line. Tags are assigned by users to describe and find back previously viewed items. Thus a ternary relation between users, tags and items is established. In this paper we will investigate some possibilities to construct a user profile based on tags, to identify distinct interests in this profile, and to recommend items relevant to those interests. When we use a tag based user profile an item is recommended if it is relevant to all tags in the user profile. Similarly, if we use a collaborative filtering approach, we require the item to be similar to all items in the user profile. However, for a user who has some distinct interests, an item that fits the average of all his interests might be less accurate than an item that fits exactly one of his interests. Thus we expect, at least in some cases, that recommendations will improve if we identify different interests in the user profile and take these into account for the recommendation of new items. If a recommendation strategy does so, we will call this strategy topic aware. In the following we will make a number P. Clough et al. (Eds.): ECIR 2011, LNCS 6611, pp. 43–54, 2011. c Springer-Verlag Berlin Heidelberg 2011
C. Wartena and m. wibbels of different algorithms for top-n recommendation topic aware by clustering the tags or items in the profile and generating separate recommendation lists for each topic cluster. The final list of recommendations is obtained by merging the topic specific lists. We do not consider rating prediction. Lists of recommended items are more interesting to a user if they are more di verse. Thus diversity is regarded as a desirable property of recommendations. In most studies on topic diversification, the diversity is increased by reordering the elements in an initial list of recommended items. If a list is constructed aiming at optimal results for precision and recall, the reordering usually causes a decrease of performance on these evaluation measures. Thus a trade-off between accuracy and diversity emerges In our approach, however, adding diversity improves pre cision and recall. We do not re-rank results that are already optimal for precision and recall, but the final diversity of the recommendation is a core property of the recommendation strategy. Thus our method is fundamentally different from the re-ranking approaches: we first distinguish different topics and then generate a list of relevant items The remainder of this paper is organized as follows. In the next section we discuss related work. In section 3 we introduce three different tag based recom- mendation algorithms In section 4 we discuss topic detection for tagging system and define topic aware versions of the tag based algorithms In section 5 we port on an evaluation of the proposed algorithms with data from Library Thing, a bookmarking service for book 2 Related work Most work on recommendation and tagging is about recommend ing tags for item prediction has received less attention. Basically, we find two approaches for tag-based item recommendation. The first approach uses tags t compute item-item or user-user similarities that then are used in classical user or item based nearest neighbor recommendation algorithms. The second approach characterizes both users and items by tag vectors, making it possible to compute the similarity between users and items. The items that are most similar to a user now are recommended One of the first papers that integrates tag-based similarities in a nearest neigh bors recommender is [1, who extend the user-item matrix with user-tag simi- larities in order to compute user-user similarities, and extend it with tag-item relations in order to compute item-item similarities. Both similarities are used to compute recommendations. This approach was refined by [2 taking also into account whether users used the tags for the same or for different items. Said et al. 3 use probabilistic latent semantic analysis to model both the user-item matrix and the item-tag matrix. By sharing the latent variables between the modelsthey are able to use the tagging information in the user-item model. Bogers and Van den Bosch [4 use the tag based similarities instead of the clas- sical similarities based on the user-item matrix. They show improvements for item-based nearest neighbor recommendation on various datasets, but did not compare their method to approaches combining both types of similarity
44 C. Wartena and M. Wibbels of different algorithms for top-n recommendation topic aware by clustering the tags or items in the profile and generating separate recommendation lists for each topic cluster. The final list of recommendations is obtained by merging the topic specific lists. We do not consider rating prediction. Lists of recommended items are more interesting to a user if they are more diverse. Thus diversity is regarded as a desirable property of recommendations. In most studies on topic diversification, the diversity is increased by reordering the elements in an initial list of recommended items. If a list is constructed aiming at optimal results for precision and recall, the reordering usually causes a decrease of performance on these evaluation measures. Thus a trade-off between accuracy and diversity emerges. In our approach, however, adding diversity improves precision and recall. We do not re-rank results that are already optimal for precision and recall, but the final diversity of the recommendation is a core property of the recommendation strategy. Thus our method is fundamentally different from the re-ranking approaches: we first distinguish different topics and then generate a list of relevant items. The remainder of this paper is organized as follows. In the next section we discuss related work. In section 3 we introduce three different tag based recommendation algorithms. In section 4 we discuss topic detection for tagging systems and define topic aware versions of the tag based algorithms. In section 5 we report on an evaluation of the proposed algorithms with data from LibraryThing, a bookmarking service for books. 2 Related Work Most work on recommendation and tagging is about recommending tags. Using tags for item prediction has received less attention. Basically, we find two approaches for tag-based item recommendation. The first approach uses tags to compute item-item or user-user similarities that then are used in classical user or item based nearest neighbor recommendation algorithms. The second approach characterizes both users and items by tag vectors, making it possible to compute the similarity between users and items. The items that are most similar to a user now are recommended. One of the first papers that integrates tag-based similarities in a nearest neighbors recommender is [1], who extend the user-item matrix with user-tag similarities in order to compute user-user similarities, and extend it with tag-item relations in order to compute item-item similarities. Both similarities are used to compute recommendations. This approach was refined by [2] taking also into account whether users used the tags for the same or for different items. Said et al. [3] use probabilistic latent semantic analysis to model both the user-item matrix and the item-tag matrix. By sharing the latent variables between the models they are able to use the tagging information in the user-item model. Bogers and Van den Bosch [4] use the tag based similarities instead of the classical similarities based on the user-item matrix. They show improvements for item-based nearest neighbor recommendation on various datasets, but did not compare their method to approaches combining both types of similarity
Improving Tag-Based Recommendation by Topic Diversification The second approach, recommendation based on the distance between an item and the tag based user profile, is e. g. followed by Firan et al. 5. The focus of their work is to determine the optimal set of tags to represent a user. The obvious idea is to represent a user by the tags that he has assigned. However, the resulting tag vector is usually too sparse to compute useful user-item similarities. Some users employ only a very small set of tags, and even for more actively tagging users it might be well the case that a relevant item is tagged only with synonyms of a tag employed by the user. Thus 5 investigate various methods to condense the user profile. The most effective method is to use the tags of all items the user has bookmarked. The same observation was also made by [6. The problem of the sparse user profiles was also identified by [7. They solve this problem not by condensing the user profile, but by taking co-occurring tags into account in the computation of similarities. For each user and each tag t a user specific tag weight is computed. The weight for t is determined by the weight of the most similar tag t' in the user profile and the similarity between t and t', where the inter tag similarity is determined by a variant of the Jaccard-coefficient. The relevance of an item finally is the sum of weights of its tags In 8 the two approaches sketched above are combined: a nearest neighbor gorithm is used to find an initial set of items and subsequently user-item simi- larities are computed for the preselected items to obtain a final recommendation The more interesting aspect of their approach is, that they replace each tag t in the user profile with the tags co-occurring with t on items tagged by the user and restrict the set of tags to the (globally) most popular tags. This results in roughly the same profiles as would be obtained by using the(most popular) tags of all items bookmarked (or tagged) by the user, as we have seen in other approaches. The problem that many recommender systems tend to recommend a set of rery similar items in order to optimize accuracy was noted by 9 and coined the portfolio effect. Reordering of recommended elements to alleviate this problem was proposed by [10. As discussed above, the reordering increases diversity, but at the cost of accuracy. An approach that is very similar to ours is proposed by Zhang and Hurley [11]. They cluster the set of items of a user and apply a item based nearest neighbor recommender to each of the clusters. Finally they merge the results of the sub-recommendation to obtain a list of recommended popular items is recommended very often at the cost of novel or less common items. The main difference with our recommendation strategy is that all item similarities are based on the user-item matrix, whereas we base similarities on descriptive meta data, especially tags. Zhang and Hurley can improve diversity of the recommendation lists in some cases while the influence of the partitioning on the precision is not very large Gemmell et al. [12 propose to cluster tags in a user profile to improve person lized search. They show that clustering improves the search results. Gemmell et al did not consider recommendation, but our approach for recommendation is both in spirit and results similar to their work. An interesting solution targeting
Improving Tag-Based Recommendation by Topic Diversification 45 The second approach, recommendation based on the distance between an item and the tag based user profile, is e.g. followed by Firan et al. [5]. The focus of their work is to determine the optimal set of tags to represent a user. The obvious idea is to represent a user by the tags that he has assigned. However, the resulting tag vector is usually too sparse to compute useful user-item similarities. Some users employ only a very small set of tags, and even for more actively tagging users it might be well the case that a relevant item is tagged only with synonyms of a tag employed by the user. Thus [5] investigate various methods to condense the user profile. The most effective method is to use the tags of all items the user has bookmarked. The same observation was also made by [6]. The problem of the sparse user profiles was also identified by [7]. They solve this problem not by condensing the user profile, but by taking co-occurring tags into account in the computation of similarities. For each user and each tag t a user specific tag weight is computed. The weight for t is determined by the weight of the most similar tag t in the user profile and the similarity between t and t , where the inter tag similarity is determined by a variant of the Jaccard-coefficient. The relevance of an item finally is the sum of weights of its tags. In [8] the two approaches sketched above are combined: a nearest neighbor algorithm is used to find an initial set of items and subsequently user-item similarities are computed for the preselected items to obtain a final recommendation. The more interesting aspect of their approach is, that they replace each tag t in the user profile with the tags co-occurring with t on items tagged by the user, and restrict the set of tags to the (globally) most popular tags. This results in roughly the same profiles as would be obtained by using the (most popular) tags of all items bookmarked (or tagged) by the user, as we have seen in other approaches. The problem that many recommender systems tend to recommend a set of very similar items in order to optimize accuracy was noted by [9] and coined the portfolio effect. Reordering of recommended elements to alleviate this problem was proposed by [10]. As discussed above, the reordering increases diversity, but at the cost of accuracy. An approach that is very similar to ours is proposed by Zhang and Hurley [11]. They cluster the set of items of a user and apply a item based nearest neighbor recommender to each of the clusters. Finally they merge the results of the sub-recommendation to obtain a list of recommended items for the user. The focus of their work is to avoid that a small set of very popular items is recommended very often at the cost of novel or less common items. The main difference with our recommendation strategy is that all item similarities are based on the user-item matrix, whereas we base similarities on descriptive meta data, especially tags. Zhang and Hurley can improve diversity of the recommendation lists in some cases while the influence of the partitioning on the precision is not very large. Gemmell et al. [12] propose to cluster tags in a user profile to improve personalized search. They show that clustering improves the search results. Gemmell et al. did not consider recommendation, but our approach for recommendation is both in spirit and results similar to their work. An interesting solution targeting
46 C. Wartena and m. wibbels at recommendation is proposed by [6 who use non-negative matrix factoriza tion to obtain a weak clustering of tags into topics. Now recommendations are computed by using the probability that a user is interested in a topic and the probability that an item is relevant to that topic. This idea comes very close to the approach for topic diversification that we propose below. However, [ 6use global tag clusters, whereas we apply clustering to the tags of each user profile Moreover, we do not sum up the probabilities for an item over all clusters. 3 Tag based recommendation In the following we will use two basic strategies for tag based recommendation We will show that both strategies can be modified easily to become topic aware and that this leads to improvement of recommendation results in all cases. The first type of algorithm we will consider is item based collaborative filtering. The directly. Therefore we call these algorithms profile based tem and a user profile In all cases the recommendations are based on the tags assigned to the item To be more precise, we will represent each item by a probability distribution over tags. Consider collections of items C=i1,.ik, tags T=ti,.ti) and users= ul,.um Let n(i, t, u) be the number of times a user u assigned tag t to item i, usually 0 or 1. To consider the tags assigned to an item and the ags assigned by a user, respectively, we let n(i,t)=∑n(t,)and (1) n(u,t)=∑n(i,t,u) Furthermore. let nr()=∑n(, ()=∑n(t)an Now we define probability distributions pr(t)and pc(ilz) on respectively the et of tags T and the corpus C that describe how tag occurrences of a given i are distributed over different tags, respectively how the occurrences of a tag z are distributed over different items: Pr(ti)=n(i, t)/nc(i) pc(it)=n(i, t)/nr(t) Finally, for each u E U let Cu be the set of items seen/ bookmarked by u. Note that in many cases a user did not tag all the items he has bookmarked
46 C. Wartena and M. Wibbels at recommendation is proposed by [6] who use non-negative matrix factorization to obtain a weak clustering of tags into topics. Now recommendations are computed by using the probability that a user is interested in a topic and the probability that an item is relevant to that topic. This idea comes very close to the approach for topic diversification that we propose below. However, [6] use global tag clusters, whereas we apply clustering to the tags of each user profile. Moreover, we do not sum up the probabilities for an item over all clusters. 3 Tag Based Recommendation In the following we will use two basic strategies for tag based recommendation. We will show that both strategies can be modified easily to become topic aware and that this leads to improvement of recommendation results in all cases. The first type of algorithm we will consider is item based collaborative filtering. The second type of algorithms uses the similarity between an item and a user profile directly. Therefore we call these algorithms profile based. In all cases the recommendations are based on the tags assigned to the items. To be more precise, we will represent each item by a probability distribution over tags. Consider collections of items C = {i1,...ik}, tags T = {t1,...tl} and users U = {u1,...um} Let n(i, t, u) be the number of times a user u assigned tag t to item i, usually 0 or 1. To consider the tags assigned to an item and the tags assigned by a user, respectively, we let n(i, t) = u∈U n(i, t, u) and (1) n(u, t) = i∈C n(i, t, u). (2) Furthermore, let nT (t) = i∈C n(i, t), (3) nC(i) = t∈T n(i, t) and (4) nU (u) = t∈T n(u, t). (5) Now we define probability distributions pT (t|i) and pC(i|z) on respectively the set of tags T and the corpus C that describe how tag occurrences of a given item i are distributed over different tags, respectively how the occurrences of a given tag z are distributed over different items: pT (t|i) = n(i, t)/nC(i), (6) pC(i|t) = n(i, t)/nT (t). (7) Finally, for each u ∈ U let Cu be the set of items seen/bookmarked by u. Note that in many cases a user did not tag all the items he has bookmarked.
Improving Tag-Based Recommendation by Topic Diversification 3.1 Item Based Collaborative Filtering The first strategy for recommendation we use is a nearest neighbor approach (13 ). Given a distance measure between items, we express the relevance of an item for a given user as the average distance between the item and all items bookmarked (or seen) by the user. Formally, we define the distance of an item i∈ C to a user u∈Uas where d(i, j)is the distance between items i and j. The items with the smallest distance are recommended to the user Given our perspective of tag distributions it is natural to use divergences for e distance between items. We will base our distance on the ensen Shannon divergence, which can be considered as a symmetrical variant of the Kullback Leibler divergence or relative entropy. Since the square root of the Jensen Shan- non divergence is a proper metric satisfying the usual axioms of non-negativity, identity of indiscernibles and triangle inequality(14), we will use d(i,j)=VJSD (PT(ti), Pr(+)) where JSD(p, q) is the Jensen Shannon divergence or information radius between probability distributions p and g. 3.2 Profile based recommendation In the nearest neighbor approach we compute the average distance of an item to the items seen by the user. Alternatively, we can compute the distance of an item to the user that also can be represented by a distribution over tags. For each user we compute the characteristic tag distribution p(tu). The obvious way to define this distribution is: Pr(tu)=n(u, t)/nu(u) (10) This allows us to define a distance between an item i and a user u by setting d(u,) D(pT(tu),pr(t)) and to recommend items with a small distance to the user. However, it was already shown by 5 that this strategy does not perform very well In our experiments the results were even worse than those btained by the simple non-personalized baseline of recommending the most popular items. The main reason for this bad performance is that the distribution is too sparse and thus many relevant items will be tagged with synonyms of the tags in the user profile, but not with exactly those tags In the following we will present two possibilities to alleviate this problem Profiles based on item tags. Firan et al.(5) propose several methods to construct better user profiles. One of the most successful ones is to use the tags of the items considered by the user. We define the item based user profile as pr(tu) PT
Improving Tag-Based Recommendation by Topic Diversification 47 3.1 Item Based Collaborative Filtering The first strategy for recommendation we use is a nearest neighbor approach ([13]). Given a distance measure between items, we express the relevance of an item for a given user as the average distance between the item and all items bookmarked (or seen) by the user. Formally, we define the distance of an item i ∈ C to a user u ∈ U as d(u, i) = j∈Cu d(j, i) |Cu| (8) where d(i, j) is the distance between items i and j. The items with the smallest distance are recommended to the user. Given our perspective of tag distributions it is natural to use divergences for the distance between items. We will base our distance on the Jensen Shannon divergence, which can be considered as a symmetrical variant of the Kullback Leibler divergence or relative entropy. Since the square root of the Jensen Shannon divergence is a proper metric satisfying the usual axioms of non-negativity, identity of indiscernibles and triangle inequality ([14]), we will use d(i, j) = JSD (pT (t|i), pT (t|j)) (9) where JSD(p, q) is the Jensen Shannon divergence or information radius between probability distributions p and q. 3.2 Profile Based Recommendation In the nearest neighbor approach we compute the average distance of an item to the items seen by the user. Alternatively, we can compute the distance of an item to the user that also can be represented by a distribution over tags. For each user we compute the characteristic tag distribution p(t|u). The obvious way to define this distribution is: pT (t|u) = n(u, t)/nU (u). (10) This allows us to define a distance between an item i and a user u by setting d(u, i) = JSD (pT (t|u), pT (t|i)) and to recommend items with a small distance to the user. However, it was already shown by [5] that this strategy does not perform very well. In our experiments the results were even worse than those obtained by the simple non-personalized baseline of recommending the most popular items. The main reason for this bad performance is that the distribution is too sparse and thus many relevant items will be tagged with synonyms of the tags in the user profile, but not with exactly those tags. In the following we will present two possibilities to alleviate this problem. Profiles based on item tags. Firan et al. ([5]) propose several methods to construct better user profiles. One of the most successful ones is to use the tags of the items considered by the user. We define the item based user profile as p T (t|u) = 1 |Cu| i∈Cu pT (t|i). (11)