ARTICLE IN PRESS YJCSS: 2537 Journal of Computer and System Sciences命(命)·命一·命 Contents lists available at sciverse Science Direct Journal of Computer and System Sciences ENCES ELSEVIER www.elsevier.com/locate/jcss Resource recommendation in social annotation systems A linear-weighted hybrid approach Jonathan Gemmell Thomas Schimoler, Bamshad Mobasher, Robin Burke Center for Web Intelligence, School of Computing. DePaul University, 243 South Wabash Avenue, Chicago, IL 60604, United States ARTICLE INFO A BSTRACT cial annotation systems enable the organization of online resources with user-defined 29 April 2011 in revised form 22 July 20 users can discover resources, organize and share their finds, and connect to other users Available online xxxx with similar interests. howey size and complexity of these systems can lead to information overload and reduced utility for users. For these reasons, researchers have ought to apply the techniques of recommender systems to deliver personalized views of social annotation systems. To date, most efforts have concentrated on the problem of tag recommendation- personalized suggestions for possible annotations. Resource Hybrid recommenders recommendation has not received the same systematic evaluation, in part because the task is inherently more complex. In this article, we provide a general formulation for the problem of resource recommendation in social annotation systems that captures these variants, and we evaluate two cases: basic resource recommendation and tag- specific resource recommendation. We also propose a linear-weighted hybrid framework for resource recommendation Using six real-world datasets, we show that its integrative pproach is essential for this recommendation task and provides the most adaptability given the varying data characteristics in different social annotation systems. We find that our algorithm is more effective than other more mathematically-complex techniques and has the additional advantages of flexibility and extensibility e 2011 Elsevier Inc. All rights reserved. 1. Introdu The surge in popularity of social media systems shows no sign of abating. These systems leverage vast amounts of user-generated content, enhancing the users ability to organize information, explore resources and build communities of like-minded individuals. One class of these applications is the social annotation system in which user-generated content takes the form of tags, arbitrary labels applied by users to online resources Social annotation systems often focus on a particular type of resource. Delicious users annotate Web pages. LastFM2 permits its users to upload and share their musical tastes. Citeulike users organize scholarly publications. In addition, tagging functions have become a common feature in many established Web sites, such as Amazon 4 You Tube and others mail addresses: gemmell@cs. depauLedu ( Gemmell), tschimolerecs depauLedu (T. Schimoler mobasherecs depauLedu(B Mobasher). burke@cs. depaul.edu(R Burke ) 2www.last.fm. www.youtube.com. 0022-0000/S- see front matter 2011 Elsevier Inc. All rights reserved. doi:10016jcss201110.006 Comput. System Sci. (2011), doi..J.Gem Please cite this article in press urce recommendation in social annotation systems: A linear-weighted hybrid approach, J 1016jcss,201110.006
JID:YJCSS AID:2537 /FLA [m3G; v 1.58; Prn:9/11/2011; 16:03] P.1 (1-15) Journal of Computer and System Sciences ••• (••••) •••–••• Contents lists available at SciVerse ScienceDirect Journal of Computer and System Sciences www.elsevier.com/locate/jcss Resource recommendation in social annotation systems: A linear-weighted hybrid approach Jonathan Gemmell ∗, Thomas Schimoler, Bamshad Mobasher, Robin Burke Center for Web Intelligence, School of Computing, DePaul University, 243 South Wabash Avenue, Chicago, IL 60604, United States article info abstract Article history: Received 29 April 2011 Received in revised form 22 July 2011 Accepted 17 October 2011 Available online xxxx Keywords: Resource recommendation Social annotation system Hybrid recommenders Social annotation systems enable the organization of online resources with user-defined keywords. Collectively these annotations provide a rich information space in which users can discover resources, organize and share their finds, and connect to other users with similar interests. However, the size and complexity of these systems can lead to information overload and reduced utility for users. For these reasons, researchers have sought to apply the techniques of recommender systems to deliver personalized views of social annotation systems. To date, most efforts have concentrated on the problem of tag recommendation – personalized suggestions for possible annotations. Resource recommendation has not received the same systematic evaluation, in part because the task is inherently more complex. In this article, we provide a general formulation for the problem of resource recommendation in social annotation systems that captures these variants, and we evaluate two cases: basic resource recommendation and tagspecific resource recommendation. We also propose a linear-weighted hybrid framework for resource recommendation. Using six real-world datasets, we show that its integrative approach is essential for this recommendation task and provides the most adaptability given the varying data characteristics in different social annotation systems. We find that our algorithm is more effective than other more mathematically-complex techniques and has the additional advantages of flexibility and extensibility. © 2011 Elsevier Inc. All rights reserved. 1. Introduction The surge in popularity of social media systems shows no sign of abating. These systems leverage vast amounts of user-generated content, enhancing the user’s ability to organize information, explore resources and build communities of like-minded individuals. One class of these applications is the social annotation system in which user-generated content takes the form of tags, arbitrary labels applied by users to online resources. Social annotation systems often focus on a particular type of resource. Delicious1 users annotate Web pages. LastFM2 permits its users to upload and share their musical tastes. Citeulike3 users organize scholarly publications. In addition, tagging functions have become a common feature in many established Web sites, such as Amazon,4 YouTube5 and others. * Corresponding author. E-mail addresses: jgemmell@cs.depaul.edu (J. Gemmell), tschimoler@cs.depaul.edu (T. Schimoler), mobasher@cs.depaul.edu (B. Mobasher), rburke@cs.depaul.edu (R. Burke). 1 delicious.com. 2 www.last.fm. 3 www.citeulike.org. 4 www.amazon.com. 5 www.youtube.com. 0022-0000/$ – see front matter © 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.jcss.2011.10.006
ARTICLE IN PRESS YJCSS: 253 J Gemmell et aL/ Journal of Computer and System Sciences··()命一· Resources Ta Fig. 1. Tag recommendation. resource3 esources Tags Fig. 2. Resource recommendation. The term folksonomy is often used to describe social annotation systems, a play on folk and taxonomy 1. These systems are popular in part because they allow users to tag resources with any tag they wish, free from any preconceived conceptual hierarchy. The aggregation of resources and tags used to describe them produces a rich information space in which users interact 3]. Users may also appreciate the ability to prepare resources for their own future access and the possibility of expressing opinions about resources [4]. These are not features present in traditional manual indexing settings The freedom and richness that social annotation systems provide do not come without a cost. Because the number of users, resources or tags in these systems is often measured in the millions, the sheer volume of data can quickly burden he user with information overload. The unrestricted nature of the tagging function is liberating, but also means that the resulting tag data will be noisy Ambiguous tags abound: one user may apply "jaguar"only to cars, another only to large felines 3. An early survey of social annotation systems identified another obstacle; while the uncontrolled vocabulary common in most systems reduces the entry cost and invites more participants, precision of stricter taxonomies [5. There is no mechanism to enforce specific semantics for tags. Redundant tag ng synonyms and mis-spellings cannot be prevented and make it more difficult for a user to choose tags on Ambiguous and redundant tags have been shown to impact the quality of recommendations [6] For these reasons, recommender systems have much to offer social annotation systems. As in e-commerce settings, recommendation function in social annotation systems is considerably more complex than in the e-commerce applications to which it has typically been applied. Research in recommendation has, for the most part, concentrated on systems in hich users' preferences with respect to items have been expressed through simple scalar values- ratings. Tags represent user interest and preference in more detail, but also make comparisons between users and between items more difficult. In addition to the increased complexity of the data, social annotation systems offer a variety of avenues for recommen- ation not found in catalog-type systems. The problem of tag recommendation(shown schematically in Fig. 1)has received a great deal of attention. The user already has a document or resource in hand, and the recommendation serves to assist in the annotation operation, helping her find tags appropriate to the resource and relevant to her own interests. Graph- based models [7], tensor factorization [8-10], and simpler linear hybrids [11-13] have all been employed to bring the three dimensions of the data to bear on the problem of finding tags Resource recommendation is less well studied, although perhaps it is more fundamental to the task of navigating through social annotation system, as opposed to the task of building one. The most basic type of resource recommendation is shown in Fig. 2. A user requests a personalized set of new and interesting resources to be recommended to him out of the universe of items known to the system. Previous work in this area has explored various methods such as tensor factoriza- tion [10], clustering[14, 15. and hybrid models [16]. However, resource recommendation within social annotation systems has yet to be studied in a systematic manner; previous work often focuses on a special case of resource recor Tasks vary according to the types of constraints associated with users' actions as they interact with the system. For example, a user may request resources similar to those found is his profile, annotated with a selected tag or related to a pa resource. Each of the cases requires a different strategy 6 While the term is relatively new. it has been argued that social annotation actually represents a renaissance of old-style manual indexing, largely Please cite this article in press as: J. Gemmell et aL., Resource recommendation in social annotation systems: A linear-weighted hybrid approach, J. Comput. System Sci. (2011). doi: 10.1016/j jcSs 2011. 10.006
JID:YJCSS AID:2537 /FLA [m3G; v 1.58; Prn:9/11/2011; 16:03] P.2 (1-15) 2 J. Gemmell et al. / Journal of Computer and System Sciences ••• (••••) •••–••• Fig. 1. Tag recommendation. Fig. 2. Resource recommendation. The term folksonomy is often used to describe social annotation systems, a play on folk and taxonomy [1].6 These systems are popular in part because they allow users to tag resources with any tag they wish, free from any preconceived conceptual hierarchy. The aggregation of resources and tags used to describe them produces a rich information space in which users can interact [3]. Users may also appreciate the ability to prepare resources for their own future access and the possibility of expressing opinions about resources [4]. These are not features present in traditional manual indexing settings. The freedom and richness that social annotation systems provide do not come without a cost. Because the number of users, resources or tags in these systems is often measured in the millions, the sheer volume of data can quickly burden the user with information overload. The unrestricted nature of the tagging function is liberating, but also means that the resulting tag data will be noisy. Ambiguous tags abound: one user may apply “jaguar” only to cars, another only to large felines [3]. An early survey of social annotation systems identified another obstacle; while the uncontrolled vocabulary common in most systems reduces the entry cost and invites more participants, it also lacks the precision of stricter taxonomies [5]. There is no mechanism to enforce specific semantics for tags. Redundant tags including synonyms and mis-spellings cannot be prevented and make it more difficult for a user to choose tags on which to search. Ambiguous and redundant tags have been shown to impact the quality of recommendations [6]. For these reasons, recommender systems have much to offer social annotation systems. As in e-commerce settings, they can alleviate information overload and combat noise by personalizing the user’s view of the system. However, the recommendation function in social annotation systems is considerably more complex than in the e-commerce applications to which it has typically been applied. Research in recommendation has, for the most part, concentrated on systems in which users’ preferences with respect to items have been expressed through simple scalar values – ratings. Tags represent user interest and preference in more detail, but also make comparisons between users and between items more difficult. In addition to the increased complexity of the data, social annotation systems offer a variety of avenues for recommendation not found in catalog-type systems. The problem of tag recommendation (shown schematically in Fig. 1) has received a great deal of attention. The user already has a document or resource in hand, and the recommendation serves to assist in the annotation operation, helping her find tags appropriate to the resource and relevant to her own interests. Graphbased models [7], tensor factorization [8–10], and simpler linear hybrids [11–13] have all been employed to bring the three dimensions of the data to bear on the problem of finding tags. Resource recommendation is less well studied, although perhaps it is more fundamental to the task of navigating through a social annotation system, as opposed to the task of building one. The most basic type of resource recommendation is shown in Fig. 2. A user requests a personalized set of new and interesting resources to be recommended to him out of the universe of items known to the system. Previous work in this area has explored various methods such as tensor factorization [10], clustering [14,15], and hybrid models [16]. However, resource recommendation within social annotation systems has yet to be studied in a systematic manner; previous work often focuses on a special case of resource recommendation. Tasks vary according to the types of constraints associated with users’ actions as they interact with the system. For example, a user may request resources similar to those found is his profile, annotated with a selected tag or related to a particular resource. Each of the cases requires a different strategy. 6 While the term is relatively new, it has been argued that social annotation actually represents a renaissance of old-style manual indexing, largely supplanted by text-based information retrieval [2]
ARTICLE IN PRESS YJCSS: 2537 /. Gemmell et aL/ Journal of Computer and System Sciences·命·(申·)·一· tt2,…t r|{t,ty,…t A Fig 3. Annotations This paper has three main aims. First, we will describe, in a formal manner, the task of resource recommendation in social annotation systems, examining current approaches and contrasting it with work on tag recommendation and other similar tasks. We then introduce a linear-weighted hybrid recommendation algorithm and show how this technique serves to combine multiple complementary components into a single integrated model that provides the most flexibility considering he unique characteristics across different social annotation systems. Finally, we show the performance of the algorithm on ix large real-world datasets, and on two different variants of the resource recommendation task. Our results show that the hybrid performs better than the individual components alone, and better than the leading factorization algorithm used in g recommendation. Moreover the hybrid has advantages over other integrative models in that it is simple, flexible and The rest of the paper is organized as follows In Section 2 we formally define the notion of lated work on recommendation techniques in social annotation systems is presented in Section 3. In Section 4, we describe our linear-weighted hybrid and the components from which it is formed, as well as the tensor factorization technique that we are using for comparison. Our experimental results and evaluation follow in Section 5. Finally, we conclude the paper with a discussion of our results 2. Background and definitions The foundation of a social annotation system is the annotation: the record of a user labeling a resource with one ore tags. A collection of annotations results in a complex network of interrelated users, resources and tags [3]. a social annotation system can be described as a four-tuple: D=(U, R, T, A), where, U is a set of users: R is a set of resources: T is a set of tags: and A is a set of annotations. Each annotation a is a tuple (u, r, Tur), where u is a user, r is a resource, and Tur is the set of tags that u has applied to r. See Fig 3. It is sometimes useful to view a social annotation system as a three-dimensional matrix, URT, in which an entry URT(u, r, t) is 1 if u has tagged r with t We define resource recommendation as the production of an ordered list of resources likely to be of interest to a particular user. To make our definition as general as possible, we include the possibility that a user may wish to impose ome requirements on such recommendations. As noted above, these requirements often naturally arise from the mode of user interaction with the system. For example, the basic personalized recommendation task usually involves a learned user profile and no additional requirements. On the other hand, in the context of navigating the system or query-based retrieval, requirements such as the tags or resources selected by the user during the current session would impose additional constraints on the set of recommended resources. For our purposes, we will assume that the ordering of recommended esources is generated by sorting the set of resources, after the system has estimated their desirability by producing real-numbered priority value for each With these considerations in mind, we can view any resource recommendation algorithm as a function φ:U×QxR→R which operates on a user u E U, a set of requirements qE Q=P(U R), and a resource re R, and produces a real- valued result p, which is the priority value of r for u: (u, g, r)=p As noted in [17, user requirements in recommendation can take a variety of forms. In general, we assume that the set equirements q can be comprised of tags, resources, or even users, though in the most typical situations, q is a singleton g or resource selected during navigation. The most basic case of resource recommendation, the one shown in Fig. 2, is the case where no requirements are imposed and the recommender must find resources based only on the identity of the user: p(u, 0, r) In this article, we will refer to this as basic resource recommendation An important special case of resource recommendation is one in which the user supplies a requirement in the form of a tag. This scenario has perhaps received the most research attention, as it arises naturally in the navigation of social annotation systems, as a user selects a tag to see what resources are related to it. In this type of browsing, users go back and forth between tags and the resources to which they have been applied, exploring both spaces in tandem ecency with no attempt at personalization. However, most authors assert personalization as a requirement for a recommender system, so nality is not, strictly speaking. a form of recommendation. Please cite this article in press as: Gemmell et al urce recommendation in social annotation systems: A linear-weighted hybrid approach, J Comput. System Sci. (2011). doi: 10.1016/ jcss. 2011.10.006
JID:YJCSS AID:2537 /FLA [m3G; v 1.58; Prn:9/11/2011; 16:03] P.3 (1-15) J. Gemmell et al. / Journal of Computer and System Sciences ••• (••••) •••–••• 3 Fig. 3. Annotations. This paper has three main aims. First, we will describe, in a formal manner, the task of resource recommendation in social annotation systems, examining current approaches and contrasting it with work on tag recommendation and other similar tasks. We then introduce a linear-weighted hybrid recommendation algorithm and show how this technique serves to combine multiple complementary components into a single integrated model that provides the most flexibility considering the unique characteristics across different social annotation systems. Finally, we show the performance of the algorithm on six large real-world datasets, and on two different variants of the resource recommendation task. Our results show that the hybrid performs better than the individual components alone, and better than the leading factorization algorithm used in tag recommendation. Moreover the hybrid has advantages over other integrative models in that it is simple, flexible and extensible. The rest of the paper is organized as follows. In Section 2 we formally define the notion of resource recommendation. Related work on recommendation techniques in social annotation systems is presented in Section 3. In Section 4, we describe our linear-weighted hybrid and the components from which it is formed, as well as the tensor factorization technique that we are using for comparison. Our experimental results and evaluation follow in Section 5. Finally, we conclude the paper with a discussion of our results. 2. Background and definitions The foundation of a social annotation system is the annotation: the record of a user labeling a resource with one or more tags. A collection of annotations results in a complex network of interrelated users, resources and tags [3]. A social annotation system can be described as a four-tuple: D = U, R, T , A, where, U is a set of users; R is a set of resources; T is a set of tags; and A is a set of annotations. Each annotation a is a tuple u,r, Tur, where u is a user, r is a resource, and Tur is the set of tags that u has applied to r. See Fig. 3. It is sometimes useful to view a social annotation system as a three-dimensional matrix, URT, in which an entry URT(u,r,t) is 1 if u has tagged r with t. We define resource recommendation as the production of an ordered list of resources likely to be of interest to a particular user. To make our definition as general as possible, we include the possibility that a user may wish to impose some requirements on such recommendations. As noted above, these requirements often naturally arise from the mode of user interaction with the system. For example, the basic personalized recommendation task usually involves a learned user profile and no additional requirements. On the other hand, in the context of navigating the system or query-based retrieval, requirements such as the tags or resources selected by the user during the current session would impose additional constraints on the set of recommended resources. For our purposes, we will assume that the ordering of recommended resources is generated by sorting the set of resources, after the system has estimated their desirability by producing a real-numbered priority value for each. With these considerations in mind, we can view any resource recommendation algorithm as a function φ : U × Q × R → R which operates on a user u ∈ U, a set of requirements q ∈ Q = P(U ∪ T ∪ R), and a resource r ∈ R, and produces a realvalued result p, which is the priority value of r for u: φ(u, q,r) = p. As noted in [17], user requirements in recommendation can take a variety of forms. In general, we assume that the set of requirements q can be comprised of tags, resources, or even users, though in the most typical situations, q is a singleton tag or resource selected during navigation. The most basic case of resource recommendation, the one shown in Fig. 2, is the case where no requirements are imposed and the recommender must find resources based only on the identity of the user: φ(u, Ø,r). In this article, we will refer to this as basic resource recommendation. 7 An important special case of resource recommendation is one in which the user supplies a requirement in the form of a tag. This scenario has perhaps received the most research attention, as it arises naturally in the navigation of social annotation systems, as a user selects a tag to see what resources are related to it. In this type of browsing, users go back and forth between tags and the resources to which they have been applied, exploring both spaces in tandem. 7 Non-personalized “recommendation” functions also exist, for example, Delicious’s “Fresh Bookmarks” on its homepage, which are sorted only by popularity and recency with no attempt at personalization. However, most authors assert personalization as a requirement for a recommender system, so such a functionality is not, strictly speaking, a form of recommendation.
ARTICLE IN PRESS YJCSS: 253 J Gemmell et al/ Journal of Computer and System Sciences However, the user can easily run into the problems of ambiguity and redundancy identified above. For example, the car fancier does not want to see pages about wildlife reserves when she clicks on the tag "jaguar " If we treat the response to tag selection as a recommendation task, then the system can make use of the user's profile to identify resources relevant to the tag as the user sees it, and ignore other interpretations. a tag-specific resource recommendation algorithm has the form p(u, It, r), where q=It) is a set containing a single tag. Resource recommendation in social annotation systems has yet to be studied in a systematic manner. Some authors assume the basic form of resource recommendation. Others assume the tag-specific variant or other forms Often algorithms designed for one constraint are not compatible to others. Adding to this confusion is the fact that algorithms which perform well for other tasks perform poorly when applied to resource recommendation. For example, several state-of-the-art tag recommendation algorithms perform poorly when applied to variations of the resource recommendation problem Efforts in designing recommenders for social annotation systems often focus around adapting more conventional forms of recommendation like those found in e-commerce systems in which the users preferences are gathered in the form of ratings. We can think of a rating-based recommender system as operating on a two-dimensional matrix, defined by users as one dimension and items as the other. Each entry in the matrix is a rating that a user has supplied for some item. a recommender system that can accurately extrapolate missing values in the matrix can generate recommended items for users to examine. In a social annotation system, there is a third dimension that of the tag. The data becomes three- dimensional, and the task of resource recommendation is that of trying to estimate what resources the user might wish to Research in ratings-based recommendation is well established and has resulted in some well-known algorithms Instance- ased collaborative filtering makes predictions based on the similarity between users [18-20] or the similarity between resources [21, 22]. Model-based collaborative filtering instead constructs a model from the data which is then queried di- rectly for recommendations. Well-known examples include graph models, matrix factorization and probabilistic models. Hybrid models have been used to combine these recommenders in various forms[23 exploiting the strength of many recommenders rather than relying on one Instance-based collaborative filtering has been modified to resource recommendation in social annotation systems extending the ratings matrix to include tag information [24, although most efforts do not assume access to ratings data. User-based collaborative filtering has also been adapted for recommending tags. Users can be modeled as a vector of re sources, a vector of tags, a combinations of the two, or feature vectors such as those calculated through singular value decomposition [25]. Item-based collaborative filtering has also been adapted for tag-recommendation [12, 13 In this work we extend these instance-based methods to resource recommendation. An advantage of these approaches is their flexibility hey are applicable to both the basic and tag-specific case of resource recommendation. Starting from the well-known PageRank algorithm [26]. researchers have derived Adapted PageRank and FolkRank [27.71 for tag recommendation. These algorithms demonstrated the importance of an integrative framework in social annotation systems: users, resources and tags were treated as nodes and were connected based on their occurrence together in generated annotations. The computational requirements of this approach however is daunting, requiring a recalculation of he PageRank vector for each query Matrix factorization approaches that have been found successful in e-commerce recommendation depend on the two- dimensional structure of the ratings matrix, in which users and resources form the axes and the values of the matrix are known ratings. These results cannot be straightforwardly extended to the three-dimensional structure of a social annotation system in which we have users, resources and tags as the three dimensions and binary values in each cell Researchers hav begun exploring tensor factorization to reduce the dimensionality of the social data. Tucker decomposition is one approach, factoring the three-dimensional tagging data into three feature spaces and a core residual tensor 10 Given a user-resource-tag triple, the model generates a likelihood score. Consequently it is possible to build a recommender that predicts one dimension of the data(i.e resources) given elements from the other two dimension (ie a user and a tag). a drawback of this model is its inflexibility; it cannot make recommendations given other constraints. Also, the model-building phase is highly computationally-intensive a pair-wise interaction tensor factorization model has also been proposed. It offers far more reasonable run times in both he construction of the model and the generation of recommendations [8, 9]. This algorithm takes an optimization approach, trying to compute the best ranking of tags given the known user-resource pairs in the data. The model is then used to recommend tags for new user-resource pairs. In this work, we invert this method constructing an ordered list of resources tag recommendation. However, due to the nature of the algorithm (it requires a user and a tag) it is not applicable to all ce recommendation tasks Probability models have been applied to social annotation systems for both tag recommendation 28 and ecommendation 29. Users, resources and tags were tied together via a single concept space. Probabilistic latent analysis has been used for resource discovery using a separate concept space for the user-tag relation and the tag relation [30]. Probabilistic models that also incorporate resource-resource affinities have been presented [31 Please cite this article in press as: Gemmell et aL, Resource recommendation in social annotation systems: A linear-weighted hybrid approach, J. Comput. System Sci. (2011). doi: 10.1016/j jcSs 2011. 10.006
JID:YJCSS AID:2537 /FLA [m3G; v 1.58; Prn:9/11/2011; 16:03] P.4 (1-15) 4 J. Gemmell et al. / Journal of Computer and System Sciences ••• (••••) •••–••• However, the user can easily run into the problems of ambiguity and redundancy identified above. For example, the car fancier does not want to see pages about wildlife reserves when she clicks on the tag “jaguar.” If we treat the response to tag selection as a recommendation task, then the system can make use of the user’s profile to identify resources relevant to the tag as the user sees it, and ignore other interpretations. A tag-specific resource recommendation algorithm has the form φ(u,{t},r), where q = {t} is a set containing a single tag. 3. Related work Resource recommendation in social annotation systems has yet to be studied in a systematic manner. Some authors assume the basic form of resource recommendation. Others assume the tag-specific variant or other forms. Often algorithms designed for one constraint are not compatible to others. Adding to this confusion is the fact that algorithms which perform well for other tasks perform poorly when applied to resource recommendation. For example, several state-of-the-art tag recommendation algorithms perform poorly when applied to variations of the resource recommendation problem. Efforts in designing recommenders for social annotation systems often focus around adapting more conventional forms of recommendation like those found in e-commerce systems in which the user’s preferences are gathered in the form of ratings. We can think of a rating-based recommender system as operating on a two-dimensional matrix, defined by users as one dimension and items as the other. Each entry in the matrix is a rating that a user has supplied for some item. A recommender system that can accurately extrapolate missing values in the matrix can generate recommended items for users to examine. In a social annotation system, there is a third dimension, that of the tag. The data becomes threedimensional, and the task of resource recommendation is that of trying to estimate what resources the user might wish to tag. Research in ratings-based recommendation is well established and has resulted in some well-known algorithms. Instancebased collaborative filtering makes predictions based on the similarity between users [18–20] or the similarity between resources [21,22]. Model-based collaborative filtering instead constructs a model from the data which is then queried directly for recommendations. Well-known examples include graph models, matrix factorization and probabilistic models. Hybrid models have been used to combine these recommenders in various forms [23] exploiting the strength of many recommenders rather than relying on one. Instance-based collaborative filtering has been modified to resource recommendation in social annotation systems by extending the ratings matrix to include tag information [24], although most efforts do not assume access to ratings data. User-based collaborative filtering has also been adapted for recommending tags. Users can be modeled as a vector of resources, a vector of tags, a combinations of the two, or feature vectors such as those calculated through singular value decomposition [25]. Item-based collaborative filtering has also been adapted for tag-recommendation [12,13]. In this work we extend these instance-based methods to resource recommendation. An advantage of these approaches is their flexibility; they are applicable to both the basic and tag-specific case of resource recommendation. Starting from the well-known PageRank algorithm [26], researchers have derived Adapted PageRank and FolkRank [27,7] for tag recommendation. These algorithms demonstrated the importance of an integrative framework in social annotation systems: users, resources and tags were treated as nodes and were connected based on their occurrence together in usergenerated annotations. The computational requirements of this approach however is daunting, requiring a recalculation of the PageRank vector for each query. Matrix factorization approaches that have been found successful in e-commerce recommendation depend on the twodimensional structure of the ratings matrix, in which users and resources form the axes and the values of the matrix are known ratings. These results cannot be straightforwardly extended to the three-dimensional structure of a social annotation system in which we have users, resources and tags as the three dimensions and binary values in each cell. Researchers have begun exploring tensor factorization to reduce the dimensionality of the social data. Tucker decomposition is one approach, factoring the three-dimensional tagging data into three feature spaces and a core residual tensor [10]. Given a user–resource–tag triple, the model generates a likelihood score. Consequently it is possible to build a recommender that predicts one dimension of the data (i.e. resources) given elements from the other two dimensions (i.e. a user and a tag). A drawback of this model is its inflexibility; it cannot make recommendations given other constraints. Also, the model-building phase is highly computationally-intensive. A pair-wise interaction tensor factorization model has also been proposed. It offers far more reasonable run times in both the construction of the model and the generation of recommendations [8,9]. This algorithm takes an optimization approach, trying to compute the best ranking of tags given the known user–resource pairs in the data. The model is then used to recommend tags for new user–resource pairs. In this work, we invert this method constructing an ordered list of resources for a given user–tag pair. Given its effectiveness, this technique is considered one of the state-of-the-art approaches for tag recommendation. However, due to the nature of the algorithm (it requires a user and a tag) it is not applicable to all resource recommendation tasks. Probability models have been applied to social annotation systems for both tag recommendation [28] and resource recommendation [29]. Users, resources and tags were tied together via a single concept space. Probabilistic latent semantic analysis has been used for resource discovery using a separate concept space for the user–tag relation and the resource– tag relation [30]. Probabilistic models that also incorporate resource–resource affinities have been presented [31] and the
ARTICLE IN PRESS YJCSS: 2537 /. Gemmell et aL/ Journal of Computer and System Sciences·命·(申·)·一· connection between users have been defined in a probabilistic manner in order to predict which communities they might join [32]. lusters of tags can represent topic areas [33. These clusters have been used as intermediaries between users ources allowing the recommendation of resources 34, 14, 15]. Such recommenders can accommodate both the ba g-specific constraints of resource recommendation. It is also possible to cluster resources generating topic-specific parti- tions [35 perhaps identifying resources based on other constraints. Clustering tags is also useful for overcoming the problem of redundancy [6. Another effort in combating redundancy as well as ambiguity constructed user-centric tag models [36 Hybrid models have been used to generate integrative models by combining several component recommenders like hose above into a larger framework. Pairs of recommenders have been combined to produce improve results in Citeu- like [37] and Bibsonomy [ 38]. Another approach demonstrated that a graph-based model may be improved by incorporating item-based collaborative filtering [12]. Hybrid models composed of both user-based and item-based collaborative filtering algorithms were shown to outperformed the state-of-the-art pair-wise interaction tensor factorization model in tag recom- mendation[13 Another work predicts user ratings in MovieLens, one of the few systems that contains both ratings and gs 39]. We build on these efforts proposing a flexible and easily extensible hybrid model for resource recommendation. Our definition of resource recommendation above centers on the function which assigns a real-valued score to each resource describing the relevance of the resource to the user(and, if supplied the requirements ) In this section, we describe he linear-weighted hybrid algorithm that we propose for this task, the components from which the hybrid is constructed and we will also describe the implementation of our comparative benchmark, an integrative approach based on tensor factorization A linear-weighted hybrid is composed of recor n components kI through Kk, whose output is combi omputing a weighted sum [23]. without loss of ge tion of the function i(u, g, r), producing output in that the a-values sum to 1. The hybrid is therefore 中(u,q,n)=∑a(u,q,n) (1) A linear-weighted hybrid of this style has a number of advantages for recommendation in social annotation domains. One is that it is inherently an extensible framework Components are free to specialize in particular dimensions of the data. If a particular application has access to additional types of data(for example, text in a document recommender)appropriate recommendation components can be added to the mix. The linear-weighted hybrid offers a way to construct integrative algorithms that take all dimensions of a social annotation system into account without requiring mathematically-complex and computationally-intensive dimensionality-reduction techniques, which are less extensible and flexible. Given a set of recommendation components, it remains to ascertain the correct ai for each component in order to maximize the effectiveness of the hybrid. We use a hill climbing technique which is both simple and efficient. A subset of the data is selected as a holdout set for learning the algorithm parameters, including the a values (See Methodology description in Section 5.2 below. The a vector is initialized with random positive numbers constrained such that the sun of the vector equals 1. The recommender then operates over the holdout set, using the remaining data as training data. the precision of recommendations is calculated as described in Section 5.2. The vector is then randomly modified and tested gain. If the accuracy is improved, the change is accepted; otherwise it is rejected. Occasionally a change to the a vector is accepted even when it does not improve the results in order to more fully explore the a space. Modifications continue until the vector stabilizes. Then the a vector is randomly reset and learning proceeds Now we turn to the nents that make up our hybrid. Many of these components rely on two-dimensional projec he URT matrix [40]. Such projections reduce the dimensionality of the data, but sacrifice some of its informational at. For example, the relation between resources and tags can be defined as rt(r, t), the number of users that have URT(u, r, t) This notion strongly resembles the "bag-of-words"vector space model [41 ] Note however that such a projection models the relationship between resources and tags in a non-personalized way. Similarly, we can produce a projection UT in which a user is modeled as a vector over the set of tags, where each weight, ur(u, t), measures how often a user applied a particular tag across all resources. In all, there are six possible two-dimensional projections: UR, UT, RU, RT, TU, TR For source recommendation, we have found four to be useful: UR, UT. RU, and RT. In the case of ur, we have not found it useful to weight resources by the number of tags a user applies, as this is not always a reliable measure of user interest. Rather we define UR to be binary, indicating whether or not the user has annotated the resource. Comput. System Sci. (2011) doi:).J.Gemmell et al Please cite this article in press urce recommendation in social annotation systems: A linear-weighted hybrid approach, J 1016jcss,201110.006
JID:YJCSS AID:2537 /FLA [m3G; v 1.58; Prn:9/11/2011; 16:03] P.5 (1-15) J. Gemmell et al. / Journal of Computer and System Sciences ••• (••••) •••–••• 5 connection between users have been defined in a probabilistic manner in order to predict which communities they might join [32]. Clusters of tags can represent topic areas [33]. These clusters have been used as intermediaries between users and resources allowing the recommendation of resources [34,14,15]. Such recommenders can accommodate both the basic and tag-specific constraints of resource recommendation. It is also possible to cluster resources generating topic-specific partitions [35] perhaps identifying resources based on other constraints. Clustering tags is also useful for overcoming the problem of redundancy [6]. Another effort in combating redundancy as well as ambiguity constructed user-centric tag models [36]. Hybrid models have been used to generate integrative models by combining several component recommenders like those above into a larger framework. Pairs of recommenders have been combined to produce improve results in Citeulike [37] and Bibsonomy [38]. Another approach demonstrated that a graph-based model may be improved by incorporating item-based collaborative filtering [12]. Hybrid models composed of both user-based and item-based collaborative filtering algorithms were shown to outperformed the state-of-the-art pair-wise interaction tensor factorization model in tag recommendation [13]. Another work predicts user ratings in MovieLens, one of the few systems that contains both ratings and tags [39]. We build on these efforts proposing a flexible and easily extensible hybrid model for resource recommendation. 4. Resource recommendation algorithms Our definition of resource recommendation above centers on the function φ, which assigns a real-valued score to each resource describing the relevance of the resource to the user (and, if supplied, the requirements). In this section, we describe the linear-weighted hybrid algorithm that we propose for this task, the components from which the hybrid is constructed, and we will also describe the implementation of our comparative benchmark, an integrative approach based on tensor factorization. 4.1. Linear-weighted hybrid A linear-weighted hybrid is composed of recommendation components κ1 through κk, whose output is combined by computing a weighted sum [23]. Without loss of generality, we will assume that each component κi has its own computation of the function φi(u, q,r), producing output in the range [0..1], and a weight αi in the same range. We further require that the α-values sum to 1. The hybrid is therefore defined as φ(u, q,r) = k i=1 αiφi(u, q,r) (1) A linear-weighted hybrid of this style has a number of advantages for recommendation in social annotation domains. One is that it is inherently an extensible framework. Components are free to specialize in particular dimensions of the data. If a particular application has access to additional types of data (for example, text in a document recommender) appropriate recommendation components can be added to the mix. The linear-weighted hybrid offers a way to construct integrative algorithms that take all dimensions of a social annotation system into account without requiring mathematically-complex and computationally-intensive dimensionality-reduction techniques, which are less extensible and flexible. Given a set of recommendation components, it remains to ascertain the correct αi for each component in order to maximize the effectiveness of the hybrid. We use a hill climbing technique, which is both simple and efficient. A subset of the data is selected as a holdout set for learning the algorithm parameters, including the α values. (See Methodology description in Section 5.2 below.) The α vector is initialized with random positive numbers constrained such that the sum of the vector equals 1. The recommender then operates over the holdout set, using the remaining data as training data. The precision of recommendations is calculated as described in Section 5.2. The vector is then randomly modified and tested again. If the accuracy is improved, the change is accepted; otherwise it is rejected. Occasionally a change to the α vector is accepted even when it does not improve the results in order to more fully explore the α space. Modifications continue until the vector stabilizes. Then the α vector is randomly reset and learning proceeds again. Now we turn to the components that make up our hybrid. Many of these components rely on two-dimensional projections of the URT matrix [40]. Such projections reduce the dimensionality of the data, but sacrifice some of its informational content. For example, the relation between resources and tags can be defined as RT(r,t), the number of users that have applied t to r. RT(r,t) = ∀u∈U URT(u,r,t) (2) This notion strongly resembles the “bag-of-words” vector space model [41]. Note however that such a projection models the relationship between resources and tags in a non-personalized way. Similarly, we can produce a projection UT in which a user is modeled as a vector over the set of tags, where each weight, UT(u,t), measures how often a user applied a particular tag across all resources. In all, there are six possible two-dimensional projections: UR, UT, RU, RT, TU, TR. For resource recommendation, we have found four to be useful: UR, UT, RU, and RT. In the case of UR, we have not found it useful to weight resources by the number of tags a user applies, as this is not always a reliable measure of user interest. Rather we define UR to be binary, indicating whether or not the user has annotated the resource.