Artif Intell Rev (2010) 33:187–209
DOI 10.1007/s10462-009-9153-2

Social tagging in recommender systems: a survey of the state-of-the-art and possible extensions

Aleksandra Klasnja Milicevic · Alexandros Nanopoulos · Mirjana Ivanovic

Published online: 21 January 2010
© Springer Science+Business Media B.V. 2010

Abstract Social tagging systems have grown in popularity on the Web in recent years because they make it simple to categorize and retrieve content using open-ended tags. The increasing number of users who provide information about themselves through social tagging activities has led to the emergence of tag-based profiling approaches, which assume that users expose their preferences for certain content through their tag assignments. Thus, tagging information can be used to make recommendations. This paper presents an overview of social tagging systems that can be used to extend the capabilities of recommender systems. Various limitations of the current generation of social tagging systems, and possible extensions that can provide better recommendation capabilities, are also considered.

Keywords Recommender systems · Social tagging · Folksonomy · Personalization

1 Introduction

The amount of information on the Web is increasing far more quickly than people can cope with. Personalized recommendation (Resnick and Varian 1997) can help people overcome the information overload problem by recommending items according to users’ interests.

A. K. Milicevic (corresponding author)
Higher School of Professional Business Studies, University of Novi Sad, Novi Sad, Serbia
e-mail: aklasnja@yahoo.com

A. Nanopoulos
Information Systems and Machine Learning Lab, University of Hildesheim, Hildesheim, Germany
e-mail: nanopoulos@ismll.de

M. Ivanovic
Faculty of Science, Department of Mathematics and Informatics, University of Novi Sad, Novi Sad, Serbia
e-mail: mira@dmi.uns.ac.rs
Recommender systems use the opinions of a community of users to help individuals in that community identify content of interest more effectively from a potentially overwhelming set of choices (Resnick et al. 1994).

One of the most successful technologies for recommender systems is collaborative filtering (Konstan et al. 2004). It is built on the assumption that people who have agreed about items in the past are likely to agree again on new items. Although this assumption works well in narrow domains, it is likely to fail in more diverse or mixed settings, because people who have similar tastes in one domain may behave quite differently in others. To improve recommendation quality, metadata such as content information about items has typically been used as additional knowledge. With the increasing popularity of collaborative tagging systems, tags could be an interesting and useful source of information for enhancing recommendation algorithms.

Collaborative tagging systems allow users to upload their resources and to label them with arbitrary words, so-called tags. The systems can be distinguished according to the kind of resources they support. Flickr¹, for instance, allows the sharing of photos, Delicious² the sharing of bookmarks, CiteULike³ and Connotea⁴ the sharing of bibliographic references, and 43Things⁵ even the sharing of goals in private life. These systems are all very similar. Once a user is logged in, he can add a resource to the system and assign arbitrary tags to it. The collection of all his assignments is his personomy; the collection of all personomies constitutes the folksonomy. The user can explore his personomy, as well as the personomies of the other users, in all dimensions: for a given user, one can see all the resources he has uploaded, together with the tags he has assigned to them (Hotho et al. 2006a,b,c).
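The personomy and folksonomy defined above can be made concrete with a small Python sketch. It assumes, as a simplification, that a folksonomy is stored as a set of (user, tag, resource) assignments; the `Assignment` type, the helper functions and the example data are hypothetical and do not reproduce the data model of Flickr, Delicious or any other system mentioned here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    """One tag assignment: `user` labelled `resource` with `tag` (hypothetical model)."""
    user: str
    tag: str
    resource: str


def personomy(folksonomy: set[Assignment], user: str) -> dict[str, set[str]]:
    """Return one user's assignments as a mapping resource -> tags they used on it."""
    by_resource: dict[str, set[str]] = defaultdict(set)
    for a in folksonomy:
        if a.user == user:
            by_resource[a.resource].add(a.tag)
    return dict(by_resource)


def folksonomy_from_personomies(folksonomy: set[Assignment]) -> set[Assignment]:
    """Rebuild the folksonomy as the union of every user's personomy."""
    users = {a.user for a in folksonomy}
    return {
        Assignment(user, tag, resource)
        for user in users
        for resource, tags in personomy(folksonomy, user).items()
        for tag in tags
    }


# Hypothetical example data, not taken from any real tagging system.
folksonomy = {
    Assignment("alice", "python", "docs.python.org"),
    Assignment("alice", "reference", "docs.python.org"),
    Assignment("alice", "travel", "flickr.com/photos/123"),
    Assignment("bob", "python", "docs.python.org"),
}

print(personomy(folksonomy, "bob"))  # {'docs.python.org': {'python'}}
```

Because a personomy is only a filter over the shared set of assignments, exploring another user's personomy is the same call with a different `user` argument, and the union over all users gives back the folksonomy.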
Besides helping users organize their personal collections, a tag can also be regarded as an expression of the user’s personal opinion, while tagging can be considered an implicit rating of, or vote on, the tagged resources or items (Liang et al. 2008). Thus, tagging information can be used to make recommendations.

In this paper, we describe social tagging systems that can be used to extend the capabilities of recommender systems. A comprehensive survey of the state-of-the-art in collaborative tagging systems and folksonomies is presented in Sect. 2. Section 3 presents a model for tagging activities. Tag-based recommender systems and different approaches to finding the best tag recommendations for items are described in Sect. 4. In Sect. 5 we identify various limitations of the current generation of folksonomy systems, and in Sect. 6 we discuss some initial approaches to extending their capabilities. Finally, Sect. 7 concludes this paper.

2 The survey of collaborative tagging systems and folksonomy

Collaborative tagging is the practice of allowing users to freely attach keywords or tags to content (Golder and Huberman 2005). Collaborative tagging is most useful when there is nobody in the “librarian” role or when there is simply too much content for a single authority to classify. People tag pictures, videos and other resources with a few keywords so that they can easily retrieve them later.

1 http://www.flickr.com, now part of Yahoo!
2 http://del.icio.us, now part of Yahoo!
3 http://www.citeulike.org.
4 http://www.connotea.org.
5 http://www.43things.com.
The following features of collaborative tagging are generally credited for its success and popularity (Mathes 2004; Quintarelli 2005; Wu et al. 2006).

• Low cognitive cost and entry barriers. The simplicity of tagging allows any Web user to classify their favourite Web resources using keywords that are not constrained by predefined vocabularies.
• Immediate feedback and communication. Tag suggestions in collaborative tagging systems give users a mechanism to communicate implicitly with each other about how to describe resources on the Web.
• Quick adaptation to changes in vocabulary. The freedom provided by tagging allows a fast response to changes in the use of language and to the emergence of new words. Terms like Web2.0, ontologies and social network can be used readily, without the need to modify any predefined scheme.
• Individual needs and formation of organization. Tagging systems provide a convenient means for Web users to organize their favourite Web resources. Moreover, as the systems develop, users are able to discover other people who are interested in similar items.

Since tags are created by individual users in a free form, one important problem in tagging is identifying the most appropriate tags while eliminating noise and spam. For this purpose, Au Yeung et al. (2007) define a set of general criteria for a good tagging system:

• High coverage of multiple facets. A good tag combination should cover multiple facets of the tagged objects. The larger the number of facets, the more likely a user is to be able to recall the tagged content.
• High popularity. If a set of tags is used by a large number of people for a particular object, these tags are more likely to identify the tagged content uniquely, and more likely to be used by a new user for the given object.
• Least effort. The number of tags needed to identify an object should be minimized, and the number of objects identified by a tag combination should be small. As a result, a user can reach any tagged object in a small number of steps via tag browsing.
• Uniformity (normalization). Since there is no universal ontology, different people can use different terms for the same concept. In general, two types of divergence can be observed: those due to syntactic variance, e.g., color, colorize, colorise, colourise; and those due to synonymy, e.g., student and pupil, which are syntactically different terms that refer to the same underlying concept. These divergences are a double-edged sword: on the one hand, they introduce noise into the system; on the other hand, they can increase recall.
• Exclusion of certain types of tags. For example, organizational tags for personal use are less likely to be shared by different users, so they should be excluded from public usage. Rather than ignoring these tags, a tagging system can include a feature that auto-completes tags as they are being typed by matching the typed prefix against tags the user has entered before. This not only improves the usability of the system but also enables the convergence of tags.

Another important aspect of tagging systems is how they operate. Marlow et al. (2006) describe some key dimensions of tagging system design that may have an immediate effect on the content and usefulness of the tags generated by the system. Some of these dimensions are listed below.
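The two remedies raised by the criteria above, normalizing syntactic variants and auto-completing a user's earlier tags by prefix, can be sketched in Python. This is a minimal illustration under stated assumptions: the variant table `VARIANTS`, the `TagAutocompleter` class and its most-frequent-first ranking are hypothetical choices, not the mechanism of Au Yeung et al. (2007) or of any deployed system, and synonyms such as student/pupil are beyond the reach of string rules like these.

```python
from collections import Counter

# Hypothetical spelling-variant table; a real system would need a far larger
# resource. Synonyms such as "student"/"pupil" cannot be caught by such rules.
VARIANTS = {"colour": "color", "colorise": "colorize", "colourise": "colorize"}


def normalize(tag: str) -> str:
    """Lower-case, trim, join inner whitespace with '-', then map known variants."""
    t = "-".join(tag.lower().split())
    return VARIANTS.get(t, t)


class TagAutocompleter:
    """Suggests a user's own earlier tags that start with the typed prefix (hypothetical)."""

    def __init__(self) -> None:
        self.history: Counter[str] = Counter()

    def record(self, tag: str) -> None:
        # Store tags under their normalized form so variants are counted together.
        self.history[normalize(tag)] += 1

    def suggest(self, prefix: str, limit: int = 5) -> list[str]:
        p = normalize(prefix)
        matches = [t for t in self.history if t.startswith(p)]
        # Assumed ranking: most frequently used first, ties broken alphabetically.
        matches.sort(key=lambda t: (-self.history[t], t))
        return matches[:limit]


completer = TagAutocompleter()
for typed in ["Python", "python", "photography", "Colour", "colourise", "Web 2.0"]:
    completer.record(typed)

print(completer.suggest("colo"))  # ['color', 'colorize']
```

Because the history is keyed by the normalized form, "Colour" and "color" are counted as one tag, so the suggestions nudge users towards a shared spelling, which is the convergence effect described above.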
2.1 Tagging rights

The permission a user has to tag resources can affect the properties of an emergent folksonomy. Systems can determine who may remove a tag. Systems can also restrict which resources users may tag, or specify different levels of permission to tag. The spectrum of tagging permissions ranges from:

a. Self-tagging: users can only tag their own contributions (e.g., Technorati⁶), through
b. Permission-based: users decide who can tag their resources (e.g., Flickr), to
c. Free-for-all: any user can tag any resource.

2.2 Tagging support

One important aspect of a tagging system is the way in which users assign tags to items. They may assign arbitrary tags without prompting, they may add tags while considering those already added to a particular resource, or tags may be proposed to them. There are three distinct categories:

a. Blind tagging: users cannot see the other tags assigned to the resource they are tagging.
b. Viewable tagging: users can see the other tags assigned to the resource they are tagging.
c. Suggestive tagging: users see suggested tags for the resource they are tagging.

2.3 Aggregation

The aggregation of tags around a given resource is an important consideration. The system may allow a multiplicity of tags for the same resource, which may result in duplicate tags from different users. Alternatively, many systems ask the group to collectively tag an individual resource. Two models of aggregation can be distinguished:

a. Bag-model: the same tag can be assigned to a resource multiple times, as in Delicious, allowing statistics to be generated and users to see whether there is agreement among taggers about the content of the resource.
b. Set-model: a tag can be applied only once to a resource, as in Flickr.

2.4 Types of object

The type of object being tagged has numerous implications for the nature of the resulting tags, and the types of resources tagged allow us to distinguish different tagging systems.
Popular systems include simple objects such as web pages, bibliographic materials, images, videos and songs. Tags for text objects and for multimedia objects may vary. In reality, any object that can be represented virtually can be tagged or used in a tagging system. For example, systems exist that let users tag physical locations or events (e.g., Upcoming⁷).

6 http://www.technorati.com.
7 http://www.upcoming.yahoo.com.

2.5 Sources of material

Some systems restrict the source through architecture (e.g., Flickr), while others restrict the source solely through social norms (e.g., CiteULike). Resources to be tagged can be supplied:
a. by the participants (YouTube⁸, Flickr, Technorati, Upcoming),
b. by the system (ESP Game⁹, Last.fm¹⁰, Yahoo! Podcasts¹¹), or
c. open to any web resource (Delicious, Yahoo! MyWeb2.0¹²).

2.6 Resource connectivity

Resources in a tagging system may be connected to each other independently of their tags. For example, Web pages may be connected via hyperlinks, or resources can be assigned to groups (e.g., photo albums in Flickr). Connectivity can be roughly categorized as linked, grouped, or none.

2.7 Social connectivity

Users of the system may be connected. Many tagging systems include social networking facilities that allow users to connect to each other based on their areas of interest, educational institutions, location and so forth. Like resource connectivity, social connectivity can be categorized as linked, grouped, or none.

The term folksonomy refers to a user-generated, distributed classification system that emerges when large communities of users collectively tag resources (Wal 2005). Folksonomies became popular on the Web with social software applications such as social bookmarking, photo sharing and weblogs, and social tagging sites such as Delicious, Flickr, YouTube and CiteULike have become popular. Commonly cited advantages of folksonomies are their flexibility, rapid adaptability, free-for-all collaborative customisation and serendipity (Mathes 2004). In general, people can use any term as a tag without necessarily understanding the exact meaning of the terms they choose. The power of folksonomies lies in the aggregation of the tagged information that one is interested in. This improves social serendipity by enabling social connections and by providing social search and navigation (Quintarelli 2005).
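How much of the aggregated, collective signal described in the paragraph above survives depends on the aggregation model of Sect. 2.3. The Python sketch below contrasts the bag-model and the set-model under simplifying assumptions: tag assignments are plain (user, tag, resource) tuples, and the helper names (`bag_aggregate`, `set_aggregate`, `agreement`) and the data are hypothetical, not taken from Delicious, Flickr or the works cited here.

```python
from collections import Counter

# A tag assignment as a plain (user, tag, resource) tuple (assumed representation).
Assignment = tuple[str, str, str]

# Hypothetical assignments; not data from Delicious, Flickr or any other system.
ASSIGNMENTS: list[Assignment] = [
    ("alice", "python", "r1"),
    ("bob", "python", "r1"),
    ("carol", "python", "r1"),
    ("carol", "tutorial", "r1"),
    ("dave", "programming", "r1"),
    ("dave", "python", "r2"),
]


def bag_aggregate(assignments: list[Assignment], resource: str) -> Counter[str]:
    """Bag-model: every user's use of a tag counts, so repeated tags accumulate."""
    return Counter(tag for _user, tag, res in assignments if res == resource)


def set_aggregate(assignments: list[Assignment], resource: str) -> set[str]:
    """Set-model: a tag is either attached to the resource or not; counts are lost."""
    return {tag for _user, tag, res in assignments if res == resource}


def agreement(assignments: list[Assignment], resource: str, tag: str) -> float:
    """Fraction of the resource's distinct taggers who chose `tag`.

    This relies on per-user repetitions of a tag, which a set-model does not keep.
    """
    taggers = {user for user, _tag, res in assignments if res == resource}
    chose_tag = {user for user, t, res in assignments if res == resource and t == tag}
    return len(chose_tag) / len(taggers) if taggers else 0.0


bag = bag_aggregate(ASSIGNMENTS, "r1")
print(bag.most_common(1))  # [('python', 3)]
print(agreement(ASSIGNMENTS, "r1", "python"))  # 0.75
```

Under the bag-model the counts show that three of the four users who tagged `r1` chose "python", the kind of agreement statistic mentioned in Sect. 2.3; the set-model keeps only the vocabulary {python, tutorial, programming}, with no indication of which tag the community favours.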
Folksonomies offer many benefits (Peters and Stock 2007). They:

• represent an authentic use of language,
• allow multiple interpretations,
• are cheap methods of indexing,
• are the only way to index mass information on the Web,
• are sources for the development of ontologies, thesauri or classification systems,
• give quality “control” to the masses,
• allow searching and, perhaps even better, browsing,
• recognize neologisms,
• can help to identify communities,
• are sources for collaborative recommender systems,
• make people sensitive to information indexing.

8 http://www.youtube.com.
9 http://www.espgame.org.
10 http://www.last.fm.
11 http://podcasts.yahoo.com.
12 http://myweb.yahoo.com.

There are two types of folksonomies: broad and narrow (Wal 2005). A broad folksonomy, like Delicious, has many people tagging the same object, and every person can tag the object with their own tags in their own vocabulary. Thus, in theory there is a great