Chapter 19 Social Tagging recommender Systems Leandro Balby Marinho, Alexandros Nanopoulos, Lars Schmidt-Thieme, Robert Jaschke, Andreas Hotho, Gerd Stumme and Panagiotis Symeonidis Abstract The new generation of Web applications known as(STS)is successfully established and poised for continued growth. STS are open and inherently social features that have been proven to encourage participation. But while STS bring new opportunities, they revive old problems, such as information overload. Rec- mender Systems are well known applications for increasing the level of relevant content over the"noise"that continuously grows as more and more content becomes available online. In StS however we face new challenges. Users are interested finding not only content, but also tags and even other users. Moreover, while tra- ditional recommender systems usually operate over 2-way data arrays, sTS data is represented as a third-order tensor or a hypergraph with hyperedges denoting(user, resource,tag)triples. In this chapter, we survey the most recent and state-of-the-art work about a whole new generation of recommender systems built to serve STS.We describe(a)novel facets of recommenders for STS, such as user, resource, and tag ecommenders, (b)new approaches and algorithms for dealing with the ternary na ure of STs data, and (c)recommender systems deployed in real world STS. More- over, a concise comparison between existing works is presented, through which we identify and point out new research directions Leandro Balby Marinho. Alexandros Nanopo Lars Schmidt-Thieme Information Systems and Machine Learning lab University of Hildesheim, Marien- burger Platz 22, 31141 Hildesheim, Germany, wwwismlluni-hildesheim. de. e-mail marinho, nanopoulos, schmidt-thieme@ismlLuni-hildesheim de Robert Jaschke. Andreas Hotho. Gerd Stumme Knowledge Data Engineering Group(KDE), University of Kassel, WilheImshoher Allee 73 34121Kassel,Germany,http://www.kde.cs.uni-kassel.de,e-mail:Gjaeschke,hotho,stumme@cs Panagiotis Symeonidis Department of Informatics, Aristotle University, 54124 Thessaloniki, Greece, e-mail: symeon@ csd auth.gr F Ricci et al. (eds ) Recommender Systems Handbook, DOI 10 1007/978-0-387-85820-3 19, o Springer Science+Business Media, LLC 2011
Chapter 19 Social Tagging Recommender Systems Leandro Balby Marinho, Alexandros Nanopoulos, Lars Schmidt-Thieme, Robert Jaschke, Andreas Hotho, Gerd Stumme and Panagiotis Symeonidis ¨ Abstract The new generation of Web applications known as (STS) is successfully established and poised for continued growth. STS are open and inherently social; features that have been proven to encourage participation. But while STS bring new opportunities, they revive old problems, such as information overload. Recommender Systems are well known applications for increasing the level of relevant content over the “noise” that continuously grows as more and more content becomes available online. In STS however, we face new challenges. Users are interested in finding not only content, but also tags and even other users. Moreover, while traditional recommender systems usually operate over 2-way data arrays, STS data is represented as a third-order tensor or a hypergraph with hyperedges denoting (user, resource, tag) triples. In this chapter, we survey the most recent and state-of-the-art work about a whole new generation of recommender systems built to serve STS. We describe (a) novel facets of recommenders for STS, such as user, resource, and tag recommenders, (b) new approaches and algorithms for dealing with the ternary nature of STS data, and (c) recommender systems deployed in real world STS. Moreover, a concise comparison between existing works is presented, through which we identify and point out new research directions. Leandro Balby Marinho · Alexandros Nanopoulos · Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Marienburger Platz 22, 31141 Hildesheim, Germany, http://www.ismll.uni-hildesheim.de, e-mail: {marinho,nanopoulos,schmidt-thieme}@ismll.uni-hildesheim.de Robert Jaschke ¨ · Andreas Hotho · Gerd Stumme Knowledge & Data Engineering Group (KDE), University of Kassel, Wilhelmshoher Allee 73, ¨ 34121 Kassel, Germany, http://www.kde.cs.uni-kassel.de, e-mail: {jaeschke,hotho,stumme}@cs. uni-kassel.de Panagiotis Symeonidis Department of Informatics, Aristotle University, 54124 Thessaloniki, Greece, e-mail: symeon@ csd.auth.gr F. Ricci et al. (eds.), Recommender Systems Handbook, DOI 10.1007/978-0-387-85820-3_19, © Springer Science+Business Media, LLC 2011 615
Leandro balby marinho et al 19.1 Introduction with the advent of affordable domestic high-speed communication facilities, in- expensive digitization devices, and the open access nature of the Web, a new and exciting family of Web applications known as Web 2.0 has been born. The underly ing idea is to decentralize and cheapen content creation, thus leading the Web into a nore open, connected, and democratic environment. In this chapter we will focus on a particular family of Web 2.0 applications known as Social Tagging Systems(STS for short ). STS assign a major role to the ordinary user, who is not only allowed to publish and edit resources, but also and more importantly, to create and share lightweight metadata in the form of freely chosen key words called tags. The expo- sure of users to both tags and resources creates a fundamental trigger for communi- cation and sharing, thus lowering the barriers to cooperation and contributing to the creation of collaborative lightweight knowledge structures known as folksonomies Some notable examples of STS are sites like Delicious, BibSonomy,, and Last.fm+, where Delicious allows the sharing of bookmarks, BibSonomy the sharing of book marks and lists of literature, and Last. fm the sharing of music. These systems are characterized by being easy to use and free to anyone willing to participate. Once a user is logged in, he can add a resource to the system, and assign arbitrary tags to it. If on the one hand this new family of applications brings new opportunities, it re- vives old problems on the other, namely the problem of information overload. Mil- lions of individual users and independent providers are flooding STS with content ind tags in an uncontrolled way, thereby lowering the potential for content retrieval and information sharing. One of the most successful approaches for increasing the level of relevant content over the"noise"that continuously grows as more and more content becomes available online lies on Recommender Systems(Rs for short). In STS however, we face several new challenges. Users are interested in finding not only but also tags, and even other users. Moreover, while traditional Rs usually operate over 2-way data arrays, folksonomy data is represented as a third- order tensor or a hypergraph with hyperedges denoting(user, resource, tag)triples. urthermore, while there is an extensive literature for rating prediction based on explicit user feedback, i.e., a numerical value denoting the degree of preference of a user for a given item, in folksonomies there are usually no ratings. Thus, before arguing why not to simply use an old solution to a recurrent problem, we need to investigate to which extent the traditional rs paradigm and approaches apply to STS Social tagging recommender systems is a young research area that has attracted significant attention recently, which is expressed by the increasing number of publi cations(e.g,[15, 11, 37, 35, 31))and is poised for continued growth. Furthermore, The term folksonomy refers to a blend of the two words folk and taxonomy, i. e, a collaborative classification system created and maintained by ordinary users. 3http://www.bibsonomy.org/ http://www.last.fml
616 Leandro Balby Marinho et al. 19.1 Introduction With the advent of affordable domestic high-speed communication facilities, inexpensive digitization devices, and the open access nature of the Web, a new and exciting family of Web applications known as Web 2.0 has been born. The underlying idea is to decentralize and cheapen content creation, thus leading the Web into a more open, connected, and democratic environment. In this chapter we will focus on a particular family of Web 2.0 applications known as Social Tagging Systems (STS for short). STS assign a major role to the ordinary user, who is not only allowed to publish and edit resources, but also and more importantly, to create and share lightweight metadata in the form of freely chosen keywords called tags. The exposure of users to both tags and resources creates a fundamental trigger for communication and sharing, thus lowering the barriers to cooperation and contributing to the creation of collaborative lightweight knowledge structures known as folksonomies1 . Some notable examples of STS are sites like Delicious2 , BibSonomy3 , and Last.fm4 , where Delicious allows the sharing of bookmarks, BibSonomy the sharing of bookmarks and lists of literature, and Last.fm the sharing of music. These systems are characterized by being easy to use and free to anyone willing to participate. Once a user is logged in, he can add a resource to the system, and assign arbitrary tags to it. If on the one hand this new family of applications brings new opportunities, it revives old problems on the other, namely the problem of information overload. Millions of individual users and independent providers are flooding STS with content and tags in an uncontrolled way, thereby lowering the potential for content retrieval and information sharing. One of the most successful approaches for increasing the level of relevant content over the “noise” that continuously grows as more and more content becomes available online lies on Recommender Systems (RS for short). In STS however, we face several new challenges. Users are interested in finding not only content, but also tags, and even other users. Moreover, while traditional RS usually operate over 2-way data arrays, folksonomy data is represented as a thirdorder tensor or a hypergraph with hyperedges denoting (user, resource, tag) triples. Furthermore, while there is an extensive literature for rating prediction based on explicit user feedback, i.e., a numerical value denoting the degree of preference of a user for a given item, in folksonomies there are usually no ratings. Thus, before arguing why not to simply use an old solution to a recurrent problem, we need to investigate to which extent the traditional RS paradigm and approaches apply to STS. Social tagging recommender systems is a young research area that has attracted significant attention recently, which is expressed by the increasing number of publications (e.g., [15, 11, 37, 35, 31]) and is poised for continued growth. Furthermore, 1 The term folksonomy refers to a blend of the two words folk and taxonomy, i.e., a collaborative classification system created and maintained by ordinary users. 2 http://delicious.com/ 3 http://www.bibsonomy.org/ 4 http://www.last.fm/
19 Social Tagging Recommender Systems real and large scale STS, such as Delicious, BibSonomy, and Last. fm, for example Iready offer some recommender services to their users, which implies an increas- ing commercial interest in the area. In this chapter we survey in a concise manner, the most recent and state-of-the-art work about a whole new generation of Rs built to serve StS. We describe: (a)novel facets of RS for STS, such as user, resource, and tag recommenders, (b) the challenges for deploying RS in real-world STS, (c) new approaches and algorithms for dealing with the inherent ternary relational data of folksonomies, and (d) approaches for tag acquisition. Emphasis is given on pre- senting a concise comparison between existing works, through which we identify and point out new research directions The chapter is structured as follows In Section 19.2 we characterize the data structure of folksonomies and point out some of the differences between the tra- ditional rs paradigm and social tagging RS In Section 19.3 we discuss the chal- lenges of deploying RS in real world STS and present the BibSonomy system as a study case. Section 19. 4 presents several families of social tagging RS, such as: graph/content-based algorithms for recommending users, resources or tags. Sec tion 19.5 provides comparisons and discussions about the algorithms presented in Section 19.4; and finally Section 19.6 closes the chapter pointing out new directions of research in this area 19.2 Social Tagging Recommenders Systems Folksonomies are the underlying structures of STS and result from the practice of collaboratively creating tags to annotate and categorize content. Tags, in general are a way of grouping content by category to make them easy to view by topic. This oach to organize a site and help users find sted in. Note that with the introduction of tags, the usual binary relation between users and resources, which is largely exploited by traditional RS, turns into a ternary between users, resources, and tags Since tags are voluntarily and freely provided by ordinary users, problems such as unwillingness to tag and diverging vocabulary can easily arise. As we will see in the course of this chapter, a possible way to address these problems is through tag RS. Tags also represent additional and personalized information about resources which if properly exploited, can eventually boost the performance of resource RS But before we delve into how rs can deal and benefit from the additional informa- tion provided by tags, we need to formally define folksonomies and its data struc orate on the differences between traditional rs and social tagging rs. and the challenges involved in deploying RS in real world STS: topics which are overed in the following section
19 Social Tagging Recommender Systems 617 real and large scale STS, such as Delicious, BibSonomy, and Last.fm, for example, already offer some recommender services to their users, which implies an increasing commercial interest in the area. In this chapter we survey in a concise manner, the most recent and state-of-the-art work about a whole new generation of RS built to serve STS. We describe: (a) novel facets of RS for STS, such as user, resource, and tag recommenders, (b) the challenges for deploying RS in real-world STS, (c) new approaches and algorithms for dealing with the inherent ternary relational data of folksonomies, and (d) approaches for tag acquisition. Emphasis is given on presenting a concise comparison between existing works, through which we identify and point out new research directions. The chapter is structured as follows. In Section 19.2 we characterize the data structure of folksonomies and point out some of the differences between the traditional RS paradigm and social tagging RS. In Section 19.3 we discuss the challenges of deploying RS in real world STS and present the BibSonomy system as a study case. Section 19.4 presents several families of social tagging RS, such as: graph/content-based algorithms for recommending users, resources or tags. Section 19.5 provides comparisons and discussions about the algorithms presented in Section 19.4; and finally Section 19.6 closes the chapter pointing out new directions of research in this area. 19.2 Social Tagging Recommenders Systems Folksonomies are the underlying structures of STS and result from the practice of collaboratively creating tags to annotate and categorize content. Tags, in general, are a way of grouping content by category to make them easy to view by topic. This is a grassroot approach to organize a site and help users find content they are interested in. Note that with the introduction of tags, the usual binary relation between users and resources, which is largely exploited by traditional RS, turns into a ternary relation between users, resources, and tags. Since tags are voluntarily and freely provided by ordinary users, problems such as unwillingness to tag and diverging vocabulary can easily arise. As we will see in the course of this chapter, a possible way to address these problems is through tag RS. Tags also represent additional and personalized information about resources, which if properly exploited, can eventually boost the performance of resource RS. But before we delve into how RS can deal and benefit from the additional information provided by tags, we need to formally define folksonomies and its data structures, elaborate on the differences between traditional RS and social tagging RS, and the challenges involved in deploying RS in real world STS; topics which are covered in the following sections
Leandro balby marinho et al 19.2.1 Folksonomy Formally, a folksonomy is a tuple F: =(U,T, R, r) where U, T, and R are non-empty finite sets, whose elements are called users, tags, and resources, resp . and Y is a ternary relation between them, i.e., YCUxTXR, whose elements are alled tag assignments. Users are typically described by their user ID, and tags may be arbitrary strings What is considered a resource depends on the type of system. For instance, in Deli- cious, the resources are URLs, in BibSonomy URLs or publication references, and in Last. fm, the resources can be artists, song tracks or albums Folksonomy data can be represented in different ways, and as we will see in Section 19.4, each representation can lead to different recommendation algorithms Folksonomies as Tensors The set of triples in Y can be represented as third-order tensors(3-dimensional arrays)A=(aul r) ERIUIXITIXIR. There are different ways to represent Y as a tensor(see left-hand sinde of Figure 19.1). Symeonidis et al. [35] for example, proposed to interpret Y as a sparse tensor in which I indicates positive feedback and 0 missing values ∫1,(at,r)∈Y 0. else Rendle et al. [26], on the other hand, distinguish between positive/negative ex amples and missing values in order to learn personalized rank tion 19.4). The idea is that positive and negative examples are only generated from observed tag assignments. Observed tag assignments are interpreted as positive feedback, whereas the non observed tag assignments of an already tagged resource are negative evidences. All other entries are assumed to be missing values(see right Note that in folksonomies, differently from typical RS, there are usually no nu- cating the explicit pre Folksonomies as Hypergraphs An equivalent, but maybe more intuitive repre entation of a folksonomy, is a tripartite(undirected)hypergraph: =(V, E), where V:= UUTUR is the set of no E: =u,t, rl(u,t, r)er is the set hyperedges(see Figure 19.2) In the original definition [12]. it is introduced additionally a subtag/supertag relation, which we omit here. The version used here is known in Formal Concept Analysis [7] as a triadic context [21 H
618 Leandro Balby Marinho et al. 19.2.1 Folksonomy Formally, a folksonomy is a tuple F := (U,T,R,Y) where • U, T, and R are non-empty finite sets, whose elements are called users, tags, and resources, resp., and • Y is a ternary relation between them, i. e., Y ⊆ U ×T ×R, whose elements are called tag assignments.5 Users are typically described by their user ID, and tags may be arbitrary strings. What is considered a resource depends on the type of system. For instance, in Delicious, the resources are URLs, in BibSonomy URLs or publication references, and in Last.fm, the resources can be artists, song tracks or albums. Folksonomy data can be represented in different ways, and as we will see in Section 19.4, each representation can lead to different recommendation algorithms. Folksonomies as Tensors The set of triples in Y can be represented as third-order tensors (3-dimensional arrays) A = (au,t,r) ∈ R |U|×|T|×|R| . There are different ways to represent Y as a tensor (see left-hand sinde of Figure 19.1). Symeonidis et al. [35], for example, proposed to interpret Y as a sparse tensor in which 1 indicates positive feedback and 0 missing values: au,t,r = { 1, (u,t,r) ∈ Y 0, else Rendle et al. [26], on the other hand, distinguish between positive/negative examples and missing values in order to learn personalized ranking of tags (see Section 19.4). The idea is that positive and negative examples are only generated from observed tag assignments. Observed tag assignments are interpreted as positive feedback, whereas the non observed tag assignments of an already tagged resource are negative evidences. All other entries are assumed to be missing values (see righthand side of Figure 19.1). Note that in folksonomies, differently from typical RS, there are usually no numerical ratings indicating the explicit preference of a user for a given resource/tag. Folksonomies as Hypergraphs An equivalent, but maybe more intuitive representation of a folksonomy, is a tripartite (undirected) hypergraph G := (V,E), where V := U∪˙ T∪˙ R is the set of nodes, and E := {{u,t,r} | (u,t,r) ∈ Y} is the set of hyperedges (see Figure 19.2). 5 In the original definition [12], it is introduced additionally a subtag/supertag relation, which we omit here. The version used here is known in Formal Concept Analysis [7] as a triadic context [21, 34]
19 Social Tagging Recommender Systems 000111 010 00|10 1110 :: 011 Fig. 19.1: Left [35]: 0/1 sparse tensor representation where positive feedback is interpreted as l and the remaining data as 0. Right [26]: Non observed tag assign- ments for a given already tagged resource are negative examples. All other entries ng values <Tag 3 Fig. 19.2: Tripartite undirected hypergraph representation of a folksonomy 19.2.2 The Traditional Recommender Systems Paradigm Recommender systems are software applications that aim at predicting the user interest for a particular resource based on a collection of user profiles, e. g, the user's history of purchase/resources'ratings, click-stream data, demographic infor mation,and so forth. Usually Rs predict ratings of resources or suggest a list of new resources that the user hopefully will like the most. Traditionally, for m users and n resources, the user profiles are represented in a sparse user-resource matrix XERXU, where denote missing values. The matrix can be decomposed Into row vectors where xu, r indicates that user u rated resource r by xu r E R. Each row vector xu cor responds thus to a user profile representing the resource's ratings of a particular user. This decomposition usually leads to algorithms that leverage user-user similarities uch as the well known user-based collaborative filtering(CF)[27]. The matrix can alternatively be represented by its column vectors
19 Social Tagging Recommender Systems 619 Fig. 19.1: Left [35]: 0/1 sparse tensor representation where positive feedback is interpreted as 1 and the remaining data as 0. Right [26]: Non observed tag assignments for a given already tagged resource are negative examples. All other entries are missing values. Fig. 19.2: Tripartite undirected hypergraph representation of a folksonomy. 19.2.2 The Traditional Recommender Systems Paradigm Recommender systems are software applications that aim at predicting the user interest for a particular resource based on a collection of user profiles, e.g., the user’s history of purchase/resources’ ratings, click-stream data, demographic information, and so forth. Usually RS predict ratings of resources or suggest a list of new resources that the user hopefully will like the most. Traditionally, for m users and n resources, the user profiles are represented in a sparse user-resource matrix X ∈ R m×n ∪ {.}, where {.} denote missing values. The matrix can be decomposed into row vectors: X := [x1,...,xm] T with xu := [xu,1,...,xu,n], for u := 1,...,m, where xu,r indicates that user u rated resource r by xu,r ∈ R. Each row vector xu corresponds thus to a user profile representing the resource’s ratings of a particular user. This decomposition usually leads to algorithms that leverage user-user similarities, such as the well known user-based collaborative filtering (CF) [27]. The matrix can alternatively be represented by its column vectors: X := [x1,...,xn] with xr := [x1,r ,...,xm,r ] T , for r := 1,...,n