Chapter 2. Related Work 26

[Figure 2.5 image: tripartite graph with node types USER, ITEM, and TAG]
Figure 2.5: Visualization of the social graph as an undirected tripartite graph of users, items, and tags.

We take the tripartite graph as the basis for our experiments in Chapter 4. It is important to note, however, that the three types of nodes in the graph (users, items, and tags) do not all play the same role. Users are active participants in the graph: they add items and label them with tags. Items and tags are passive nodes and can be seen as content bearers. A fitting metaphor here is that of the user as a predator, snaring items as his prey using tags. This implies that it is possible to represent the folksonomy as a different type of graph with directed edges, although we do not consider this in the thesis due to time constraints.

2.3 Social Bookmarking

As mentioned earlier, the Web 2.0 phenomenon caused a shift in information access and creation from local and solitary to global and collaborative. Social bookmarking websites are a prime example of this. Instead of keeping a local copy of pointers to favorite Web pages, users can store and access their bookmarks online through a Web interface on a remote Web server, accessible from anywhere. The underlying application then makes all stored information shareable among users. Besides this functionality, most social bookmarking services also let their users tag bookmarks, alongside standard metadata such as the title and a brief description of the bookmarked Web page.

The current generation of social bookmarking websites is not the first. One of the original online bookmarking services was the now defunct itList, which was created in April 1996 (De Nie, 1999). It allowed users to store and organize their bookmarks online and, if they chose to do so, share them with other users.
itList also enabled users to sort their bookmarks using a variety of categories, although it did not allow for free categorization. In the years that followed, several competing services were launched, such as Backflip, Blink, ClickMarks, and HotLink, each attempting to offer unique features to attract users (Lawlor, 2000). For instance, ClickMarks offered automatic categorization to sort bookmarks into folders. None of these services survived the bursting of the ‘dot-com’ bubble.

The second wave of social bookmarking services started in September 2003 with the creation of Delicious by Joshua Schachter (Mathes, 2004). The instant popularity of Delicious led to the launch of many other Web 2.0-style social bookmarking services, such as Diigo, Simpy, Ma.gnolia, and Mister Wong9. The main differences between the two waves of social bookmarking services are a stronger emphasis on sharing one’s bookmarks with other users and the addition of social tagging, as pioneered by Delicious. Some social bookmarking services, however, such as the GiveALink service (Stoilova et al., 2005) and StumbleUpon10, only allow users to share bookmarks, without any support for social tagging.

Below we describe the domains suited to social bookmarking (Subsection 2.3.1) and how users commonly interact with a social bookmarking system (Subsection 2.3.2). We conclude this section, and this chapter, in Subsection 2.3.3 with an overview of the three research tasks that have received the most attention in the related work.

2.3.1 Domains

So far we have defined social bookmarking services as websites that focus on the bookmarking of Web pages. In addition to Web bookmarks, there are three other domains for social ‘storage’ services that are worth mentioning. We already mentioned that some social bookmarking websites operate in the domain of scientific articles and research papers. These services are known as social reference managers or social citation managers. They allow users to store, organize, describe, and manage their collection of scientific references.
Although more general-purpose websites like Delicious could conceivably be used for this as well, social reference managers offer specialized features. These include article-specific metadata, reading lists, reading priorities, and export facilities for formats such as BibTeX, RIS, and EndNote. Examples of social reference managers include CiteULike (http://www.citeulike.org/), Connotea (http://www.connotea.org/), BibSonomy (http://www.bibsonomy.org), and refbase (http://www.refbase.org/). The first three of these support collaborative tagging.

In addition to the domains mentioned above, a third domain for social information storage and management is books. Social cataloging services allow users to store, describe, and manage books that they own or have read. Typically, social cataloging services allow users to rate books, and to tag and review them for their own benefit and the benefit of other users. They also typically use identifiers such as ISBN or ISSN numbers to automatically retrieve book metadata from centralized repositories such as the Library of Congress or Amazon. Examples include LibraryThing (http://www.librarything.com/), Shelfari (http://www.shelfari.com/), aNobii (http://www.anobii.com/), GoodReads (http://www.goodreads.com/), and WeRead (http://weread.com/). The first three support collaborative tagging.
9We refer the reader to http://en.wikipedia.org/wiki/List_of_social_bookmarking_websites for an up-to-date list of social bookmarking services.
10http://www.stumbleupon.com/
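To make the export facilities mentioned above more concrete, the following Python sketch renders a stored reference, together with the user's tags, as a BibTeX entry. It is a minimal illustration under our own assumptions. The Reference record, the citation key, and the choice to export tags as a keywords field are hypothetical. The sketch does not reproduce the exporter of CiteULike, BibSonomy, or any other service.

```python
# Minimal sketch of a BibTeX export for a stored reference.
# Assumption: the Reference record, cite key, and 'keywords' mapping are
# hypothetical and do not reproduce any particular service's exporter.
from dataclasses import dataclass, field


@dataclass
class Reference:
    key: str                                    # hypothetical citation key
    entry_type: str                             # BibTeX entry type, e.g. "article"
    fields: dict = field(default_factory=dict)  # BibTeX field name -> value
    tags: list = field(default_factory=list)    # tags assigned by the user


def to_bibtex(ref: Reference) -> str:
    """Render one reference as a BibTeX entry (no escaping of special characters)."""
    items = dict(ref.fields)
    if ref.tags:
        # Assumption: the user's tags are exported as a comma-separated keywords field.
        items["keywords"] = ", ".join(ref.tags)
    lines = [f"@{ref.entry_type}{{{ref.key},"]
    for name, value in items.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical example record.
example = Reference(
    "doe2008", "article",
    {"author": "Jane Doe", "title": "An Example Article", "year": "2008"},
    ["tagging", "folksonomy"],
)
print(to_bibtex(example))
```

Exporting tags as keywords keeps them available to the user's local reference manager after export. A real exporter would also need to escape characters that are special in LaTeX.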
The fourth domain of collaborative information sharing is formed by the so-called social news websites. These allow users to share and discover any kind of online content, with an emphasis on online news articles. After a link or news story has been submitted, other users of the website can vote on it and comment on it, and stories are ranked based on these reactions. Only the stories with the most votes appear on the front page. The most popular social news websites are Digg (http://www.digg.com/), Reddit (http://www.reddit.com/), Fark (http://www.fark.com/), and Mixx (http://www.mixx.com/).

2.3.2 Interacting with Social Bookmarking Websites

How do users typically interact with a social bookmarking website? Because Web pages are public and shared, social bookmarking websites allow for collaborative tagging. The broad folksonomy that emerges from this results in a rich network of connections between users, items, and tags, which is reflected in the navigational structure. Figure 2.6 shows the typical navigation and browsing structure of a social bookmarking website; we take the popular service Delicious as our example.

The personal profile page is the starting point for every user (top left in Figure 2.6). It lists the assigned tags and metadata for every bookmark that the user has added. In addition, each item and tag on the page is linked to a separate page that chronologically shows the activity of that tag or item. Clicking on a tag leads the user to a page that shows all other items in the system that have been annotated with that tag (bottom left). The popularity of the bookmarked item on Delicious is shown for each post. When users click on this, they are forwarded to the item’s history page (bottom right), which shows all other users who have added the selected item. For each post of that item, the history page also lists which tags and metadata were assigned to it by which user.
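The navigational structure described above follows directly from the tripartite network of users, items, and tags. The Python sketch below assumes that each post is a simple (user, item, tags) record, and it uses hypothetical example URLs. Under that assumption, the profile page, tag page, and item history page can each be served from an index built in a single pass over the posts. This is an illustration only and does not describe how Delicious implemented these pages.

```python
# Sketch: deriving profile, tag, and item-history views from posts.
# Assumption: each post is a hypothetical (user, item, tags) record; real
# services store richer metadata such as titles, descriptions, and timestamps.
from collections import defaultdict
from typing import NamedTuple


class Post(NamedTuple):
    user: str
    item: str    # e.g. a bookmarked URL
    tags: tuple  # tags this user assigned to this item


def build_indices(posts):
    """Index the user-item-tag triples for the three page types."""
    profile = defaultdict(list)       # user -> that user's posts (profile page)
    tag_page = defaultdict(set)       # tag -> items annotated with it (tag page)
    item_history = defaultdict(list)  # item -> (user, tags) per post (history page)
    for post in posts:
        profile[post.user].append(post)
        for tag in post.tags:
            tag_page[tag].add(post.item)
        item_history[post.item].append((post.user, post.tags))
    return profile, tag_page, item_history


posts = [
    Post("alice", "http://example.org/a", ("python", "nlp")),
    Post("bob", "http://example.org/a", ("nlp",)),
    Post("bob", "http://example.org/b", ("python", "graphs")),
]
profile, tag_page, item_history = build_indices(posts)

# The popularity count shown next to a post is the length of the item's history.
print(len(item_history["http://example.org/a"]))  # prints 2
print(sorted(tag_page["python"]))
```

Each index answers one page type with a single dictionary lookup. This mirrors how every link on a page leads to another view over the same set of triples.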
Another typical feature of social bookmarking websites, and of websites supporting social tagging in general, is a visual representation of all of a user’s tags, known as a tag cloud (top right). In a tag cloud, tags are sorted alphabetically, and the popularity of a tag (or rather, its usage intensity) is denoted by varying the markup of its link. Larger font sizes and/or darker font colors tend to denote more popular tags.

Some social bookmarking websites offer extra features on top of this. Diigo11, for instance, also allows its users to highlight the Web pages they bookmark and add notes to them. These highlights and notes are stored in the system and overlaid on the bookmarked Web pages when the user revisits them. Faves12 is unique in being the only social bookmarking website that allows its users to explicitly rate the items they have added, on a five-point scale. Social cataloging applications like LibraryThing and Shelfari also allow explicit ratings. Interestingly, some users find a way around the absence of such features. On Delicious, for example, several people use tags such as *, ***, or ***** to represent the quality of a bookmarked Web page.
11http://www.diigo.com/
12http://www.faves.com/
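The tag cloud markup described above can be sketched in a few lines of Python. The sketch sorts tags alphabetically and maps usage counts to font sizes. The size range of 10 to 24 pixels and the logarithmic scaling are our own illustrative assumptions, not the scheme used by Delicious or any other website. Logarithmic scaling keeps a few very frequent tags from dwarfing the rest. Darker font colors could be derived from the same weight.

```python
# Sketch of tag-cloud markup: alphabetical order, font size by usage count.
# Assumptions: the 10-24 px range and logarithmic scaling are illustrative
# choices; counts must be positive and there must be at least one tag.
import math
from collections import Counter


def tag_cloud(tag_counts, min_px=10, max_px=24):
    """Return (tag, font size in px) pairs, sorted alphabetically by tag."""
    logs = {tag: math.log(count) for tag, count in tag_counts.items()}
    low, high = min(logs.values()), max(logs.values())
    span = (high - low) or 1.0  # avoid division by zero when all counts are equal
    cloud = []
    for tag in sorted(tag_counts):
        weight = (logs[tag] - low) / span  # 0.0 = least used, 1.0 = most used
        cloud.append((tag, round(min_px + weight * (max_px - min_px))))
    return cloud


# Hypothetical tag assignments of one user.
counts = Counter(["python"] * 8 + ["nlp"] * 2 + ["graphs"])
for tag, size in tag_cloud(counts):
    print(f'<a style="font-size:{size}px">{tag}</a>')
```

With these counts, "graphs" gets the minimum size, "python" the maximum, and "nlp" falls one third of the way between them on the logarithmic scale.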
[Figure 2.6 image: screenshots of Delicious pages, labeled NAVIGATION, TAG CLOUD, TAG HISTORY, and ITEM HISTORY]
Figure 2.6: Navigation on a social bookmarking website. The starting point for every user is their profile page, which lists the bookmarks they added to their profile (top left). From there, users can browse to tag pages (bottom left), which show which other bookmarks have been tagged with the selected tag. They can also browse to item history pages (bottom right), which show all other users who have added the selected item, and what tags and metadata they assigned to it. Users can also get an overview of the tags they have used from their tag cloud view (top right), which marks up the more popular tags with a darker font color and a larger font size.

2.3.3 Research tasks

We mentioned earlier that there is not a large body of related work on the task of item recommendation on social bookmarking websites. In this section we briefly discuss the tasks and problems that have seen more attention: browsing, search, and tag recommendation13.

Browsing

One of the strengths of social tagging as a way of categorizing and describing resources is that users are free to tag without considering relationships between tags. However, certain tags will clearly be more related to each other than others. They may be synonyms or hyponyms, or there may be another kind of conceptual relationship between them. Several researchers have proposed techniques that could improve the browsing experience. One example is automatically deducing a tag hierarchy for improved browsing, as proposed by Heymann and Garcia-Molina (2006) and Li et al. (2007). A second way of improving the browsing experience is by clustering related bookmarks together, as proposed by Capocci and Caldarelli (2007) and Clements et al. (2008b). Tag clouds do not take such implicit relationships into account and simply visualize tags in an alphabetical list, highlighted according to popularity. Garama and De Man (2008) performed an extensive user study to investigate the usefulness of tag clouds for image search. They compared tag clouds as a browsing mechanism to a standard search engine for a known-item retrieval task. Success rate was higher when using the search engine, but users were more satisfied when browsing with the tag cloud. This illustrates the value of a simple representation such as the tag cloud.
13These are not the only possible tasks that can be supported on social bookmarking websites; we refer the reader to Clements (2007) for an insightful overview.

Search

A second popular research problem in social bookmarking is whether tags can be used to improve (1) retrieval of items on a social bookmarking website and (2) Web search in general. Since tags are condensed descriptions of content, they could replace or augment standard retrieval algorithms; this possibility was already hypothesized early on by, for instance, Mathes (2004). Heymann et al. (2008a) tried to answer the question ‘Can social bookmarking help Web search?’ by analyzing various properties of Delicious14. They found that Web pages posted to Delicious tend to be quite dynamic and are updated frequently.
An analysis of the tags showed that they were overwhelmingly relevant and objective. However, compared to the Web as a whole, Delicious still produces only small amounts of data, which limits its possible impact on improving Web search. The tags themselves were found to occur in over half of the pages they annotate. In 80% of the cases, they occurred in the page text, the bookmark title, or the Web page URL. Heymann et al. (2008a) therefore concluded that tags are unlikely to be more useful than a full-text search emphasizing page titles. Consequently, Web search engine performance is unlikely to be affected. Bischoff et al. (2008) repeated the work of Heymann et al. (2008a) for two other social tagging systems, Flickr and Last.fm, and were more positive about the potential of tags. They report a higher percentage of tags that do not occur in anchor and page text, especially in the music domain, where content analysis is more difficult. They conclude that for certain domains tags can have a large impact on search engine performance. Morrison (2007) directly compared the performance of social bookmarking websites with that of Web search engines and Web directories. He found only a small overlap in the results of each type of retrieval ‘tool’, with 90% of the results being unique to the system that retrieved them. The largest of the social bookmarking websites performed significantly better than the Web directories. Social bookmarking websites also outperformed search engines on dynamic queries such as current events, which confirms the findings of Heymann et al. (2008a).
14A year earlier, Yanbe et al. (2007) performed the same kind of analysis on Japanese social bookmarking websites and came to similar conclusions.
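The kind of tag-occurrence analysis reported by Heymann et al. (2008a) can be illustrated with a small Python sketch. It counts how many tag assignments appear in the page text, and how many appear in the text, the bookmark title, or the URL. The lower-cased substring matching and the toy records are assumptions made for illustration. They are not the matching procedure or data of the original study. Substring matching will also over-count: for example, "graph" matches "paragraph".

```python
# Sketch of a tag-occurrence analysis over bookmarked pages.
# Assumptions: naive lower-cased substring matching and hypothetical toy
# records; at least one tag assignment is required (no division by zero).
from dataclasses import dataclass


@dataclass
class Bookmark:
    url: str
    title: str
    text: str   # extracted page text
    tags: list  # tags assigned to this page


def occurrence_rates(bookmarks):
    """Return (fraction of tags found in the text, fraction found in text/title/URL)."""
    total = in_text = in_any = 0
    for bm in bookmarks:
        text, title, url = bm.text.lower(), bm.title.lower(), bm.url.lower()
        for tag in bm.tags:
            t = tag.lower()
            total += 1
            found_in_text = t in text
            in_text += found_in_text
            in_any += found_in_text or t in title or t in url
    return in_text / total, in_any / total


sample = [
    Bookmark("http://example.org/python-tips", "Tips",
             "A few tips for beginners.", ["python", "tips", "howto"]),
    Bookmark("http://example.org/x", "Graph drawing",
             "Drawing networks in the browser.", ["graph", "javascript"]),
]
print(occurrence_rates(sample))
```

On this toy sample, one of the five tag assignments occurs in the page text, and three occur somewhere in the text, title, or URL. A study at scale would also need tokenization and stemming rules, which are left out here.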