Social Bookmarking for Scholarly Digital Libraries

Umer Farooq, Yang Song, John M. Carroll, and C. Lee Giles
Pennsylvania State University

Social bookmarking services have recently gained popularity among Web users. Whereas numerous studies provide a historical account of tagging systems, the authors use their analysis of a domain-specific social bookmarking service called CiteULike to reflect on two metrics for evaluating tagging behavior: tag growth and tag reuse. They examine the relationship between these two metrics and articulate design implications for enhancing social bookmarking services. The authors briefly reflect on their own work on developing a social bookmarking service for CiteSeer, an online scholarly digital library for computer science.

The contemporary Web has popularized social bookmarking services, which let users specify keywords or tags for Web resources that they're interested in. Well-known examples include del.icio.us (http://del.icio.us) and Flickr (http://flickr.com), which let users tag Web sites and pictures, respectively.

One way to measure how effective social bookmarking services are is to analyze their tag vocabularies. Two commonly used metrics are tag growth, which assesses the addition of new tags to the overall tag vocabulary, and tag reuse, which looks at the recycling of existing tags. We examine the relationship between these two metrics and how a social bookmarking service can encourage different levels of tag growth and reuse through design.

Here, we focus on social bookmarking services for scholarly communities in which users collectively organize and tag intellectual resources. Using a case study of CiteULike (http://citeulike.org), a social bookmarking service for tagging scholarly papers, we analyzed tag growth and tag reuse over time. Our results provide design implications for developing and enhancing scholarly social bookmarking services. We also briefly reflect on our own work to develop a social bookmarking service for CiteSeer [1], an online scholarly digital library for computer science.

Published by the IEEE Computer Society. 1089-7801/07/$25.00 © 2007 IEEE. IEEE INTERNET COMPUTING. Social Search.
Social Bookmarking Anatomy

The basic unit of information in a social bookmarking service comprises three elements in a triple, represented as (user, resource, tag) [2]. Adapting terminology from previous work [3], this triple is called a tag application (in which a user applies a tag to a resource; in some cases, it's also called a tag post). The combination of elements in a tag application is unique; that is, if a user tags a paper twice with the same tag, it counts as only one tag application.

Resources can mean different things for different social bookmarking services. With del.icio.us, for example, the resource is a Web site; with CiteULike, it's a scholarly paper.

Adapting social bookmarking's schematic depiction from Ciro Cattuto's work [2], Figure 1 illustrates the schema for tag applications in CiteULike. This example has three tag applications: (user 1, paper "abc", tag "A"), (user 1, paper "xyz", tag "A"), and (user 2, paper "abc", tag "B").

We analyze CiteULike's data around the (user, resource, tag) elements in the tag application. We compare and contrast our results, in general, with four other social bookmarking analyses [3–6]. Table 1 briefly lists each one's purpose and how much data the researchers collected while analyzing that service; we also include the CiteULike data set we analyze in this article.

CiteULike Overview

CiteULike is a free online social bookmarking service that lets researchers share, store, and organize information about scholarly papers. Users can add links to papers on CiteULike to their own online collections and import references from other scholarly digital libraries (Figure 2a). For example, users can link to an IEEE or CiteSeer paper in their personal CiteULike collection. The service also provides additional information about the paper, such as all users' tags for that paper and the BibTeX entry.

Adding papers to a personal collection and tagging them is a two-step process.
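As a minimal sketch of the (user, resource, tag) triple described under Social Bookmarking Anatomy, the following Python fragment stores tag applications in a set so that a repeated triple counts only once. The class name, field names, and helper function are illustrative assumptions and are not taken from CiteULike.

```python
from typing import NamedTuple


class TagApplication(NamedTuple):
    """One (user, resource, tag) triple; the field names are illustrative."""
    user: str
    resource: str
    tag: str


def add_tag_application(store: set, user: str, resource: str, tag: str) -> bool:
    """Add a triple to the store; return True only if it was not already present."""
    application = TagApplication(user, resource, tag)
    if application in store:
        return False  # same user, same paper, same tag: still one tag application
    store.add(application)
    return True


# The three tag applications from the Figure 1 example.
store = set()
add_tag_application(store, "user 1", "abc", "A")
add_tag_application(store, "user 1", "xyz", "A")
add_tag_application(store, "user 2", "abc", "B")

# Repeating an existing triple leaves the store unchanged.
duplicate_added = add_tag_application(store, "user 1", "abc", "A")
print(len(store), duplicate_added)  # 3 False
```

Using a set mirrors the uniqueness rule directly; a database table with a composite key over the three columns would be another way to express the same constraint.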
Figure 1. Anatomy of a tag application in CiteULike. User 1 (blue) has two tag applications and user 2 (yellow) has one tag application. (Diagram nodes: User 1, User 2, Paper abc, Paper xyz, Tag A, Tag B.)

Table 1. Data from four social bookmarking services, compared with our CiteULike data.

| Name | Purpose | Data collection |
|---|---|---|
| Del.icio.us [4] | Collaborative tagging system for Web bookmarks | Four days (212 URLs; 19,422 bookmarks) |
| Flickr [5] | Photo-sharing system for users to store and tag their and others' personal photos | No time data (25,000 users) |
| Dogear [6] | Social bookmarking service for a large enterprise (IBM's intranet) | Eight weeks (13,174 bookmarks; 686 users) |
| MovieLens [3] | Movie recommender system that also lets users tag their favorite movies | Approximately one month (3,263 tags; 635 users) |
| CiteULike | Social bookmarking service for sharing, storing, and organizing scholarly papers | More than two years (2,011 users; 9,623 papers; 6,527 tags) |

Figure 2. The CiteULike social bookmarking service. (a) A screenshot of the Web site shows a scholarly paper tagged in CiteULike. (b) The tagging page on CiteULike.

NOVEMBER • DECEMBER 2007

When users first view the link to a favorite paper, they see everyone's tags for that paper (Figure 2a). However, to
add this paper as a favorite, users click on a link ("post a copy to your library") that takes them to a different tagging page (Figure 2b). On this page, users can optionally tag the paper to add it to their personal collection. Users can create new tags (by typing them in a textbox), which might overlap with existing tags others have used before, or they can select existing tags (clicking on a tag automatically adds it to the textbox), but only ones from their personal collections. Note that users don't have the option to select a tag from everyone's tag collection; if they want to do this, they have to remember the tag that others used (from when they first viewed the paper's link) and manually type it in, which we'll discuss in more depth later.

General User Activity

The analysis we describe here is based on data collected between 15 November 2004 and 13 February 2007. Although it would be interesting and useful to run our analysis on the whole CiteULike data set, because we are part of the CiteSeer research group, the underlying data set we had access to comprised only tag applications for papers in CiteSeer that CiteULike indexes.

Our data set contained a total of 32,242 tag applications, 2,011 distinct users, 9,623 distinct papers, and 6,527 distinct tags. The two most prolific users had 3,883 and 634 tag applications, while 42 users had 100 or more tag applications. The two most tagged papers were both coauthored by Larry Page [7, 8] and were tagged 135 and 94 times, respectively. The five most frequently used tags were clustering (245), p2p (220), logic (185), learning (175), and network (175).

The average number of tag applications per paper was 3.35 (the total tag applications divided by the total number of papers). The median and modal numbers of tag applications per paper were 2 and 1, respectively. The average number of tag applications per user was 16.03 (the total tag applications divided by the total users). However, the median and modal numbers of tag applications per user were 4 and 1, respectively. These figures are close to the ones for the MovieLens analysis [3], which reported an average of 18 tag applications per user with a median of 3.

In MovieLens, relatively few users generated most of the tag applications, approximating a power-law distribution. CiteULike's data set is similar, with $y = 790.02x^{-1.3484}$, $R^2 = 0.9225$ (the data set included 1,921 users for a range of 1 to 55 tag applications). Figure 3 shows the relationship between the number of users and the number of tag applications.

We also computed the correlation between the number of papers each user tagged and the number of distinct tags each user generated. The correlation is high (0.944), and is thus starkly different from those of other social bookmarking services. For example, in Dogear [6], the correlation between the number of tags used and the number of bookmarks created was 0.56, although it was higher for users with bookmark collections smaller than 10 (0.74). For Flickr [5], the correlation between distinct tags and photos was 0.518, and for del.icio.us [4], no strong association existed between the number of bookmarks users had created and the number of tags they used in those bookmarks.

The high correlation for CiteULike suggests a strong linear relationship between the number of papers and the number of distinct tags for each user. This relationship could be due to the fact that as users tag more papers, the number of tags in their personal tag vocabulary increases.

Tag Growth

Social bookmarking services' premise is that users collaboratively generate and reuse tags. One way to index collaboration in social bookmarking services is to look at how users create new tags over time.
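One way to picture this measure of tag growth is to record the month in which each tag first appears and then accumulate those counts over time. The sketch below assumes each record is a (date, tag) pair; that layout, the function names, and the toy data are hypothetical and do not come from the CiteULike data set.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def new_tags_per_month(applications):
    """Count, for each (year, month), the tags that first appear in that month.

    `applications` is an iterable of (day, tag) pairs, where `day` is a
    datetime.date. This record layout is an assumption for illustration.
    """
    first_seen = {}
    for day, tag in sorted(applications):
        # Records are visited in date order, so the first hit is the earliest use.
        first_seen.setdefault(tag, (day.year, day.month))
    return Counter(first_seen.values())


def cumulative(counts):
    """Return (month, running total) pairs in chronological order."""
    months = sorted(counts)
    totals = accumulate(counts[m] for m in months)
    return list(zip(months, totals))


# Toy data, not drawn from the CiteULike data set.
toy = [
    (date(2004, 11, 20), "clustering"),
    (date(2004, 12, 2), "p2p"),
    (date(2004, 12, 9), "clustering"),  # reuse of an existing tag, not a new tag
    (date(2005, 1, 15), "logic"),
    (date(2005, 1, 28), "network"),
]
monthly = new_tags_per_month(toy)
growth = cumulative(monthly)
print(growth)  # [((2004, 11), 1), ((2004, 12), 2), ((2005, 1), 4)]
```

Months in which no new tag appears are simply absent from the result in this sketch; a plot over a fixed range of months would need them filled in with zero.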
Figure 3. Number of users vs. number of tag applications. Relatively few users generated most of the tag applications. (x-axis: tag applications, 1 to 55; y-axis: users, 0 to 450.)

IEEE INTERNET COMPUTING, www.computer.org/internet/

We categorized the number of new tags per month, choosing months as the unit of temporal analysis (a finer-grained denomination, such as days or weeks, would have resulted in too many
data points to feasibly analyze visually).

One form of tag vocabulary growth occurs at a diminishing rate over time [5], which we can perhaps expect for a social bookmarking service, as it implies increasing stability in the tag vocabulary. However, for CiteULike, the tag vocabulary seems to be consistently growing. When we plotted the new tags' cumulative frequency (their aggregate summation) across time, the relationship was linear, as the green line in Figure 4 shows.

We think this consistent growth is due to the proportional increase in the number of new users. In the CiteULike data, we identified users as new when they applied a tag for the first time. We categorized new users across time (per month), and their cumulative frequency was a linear relationship (the red line in Figure 4), implying that they're also consistently growing over time.

To compare the cumulative frequencies of new tags and new users across time on the same scale, we calculated the cumulative frequency percentage. For new tags, we calculate the cumulative frequency of new tags per month as a percentage of the total number of tags; for new users, we calculate the cumulative frequency of new users per month as a percentage of the total number of users.

The cumulative frequency percentages of new tags and new users over time are almost perfectly correlated (0.997), both growing at a linear rate and dependent on each other, which is consistent with our speculation that as new users apply tags, they create new ones.

Figure 4. Cumulative frequency of new tags and new users over time. New tags and new users seem to be consistently growing in a linear fashion. (x-axis: month, November 2004 to February 2007; y-axis: cumulative frequency, 0 to 7,000; series: new tags, new users.)

Tag Reuse

For a social bookmarking service to be highly collaborative, we expect the tag vocabulary to converge and tag reuse to increase significantly over time. We can measure tag reuse in many ways; for example, a simple metric is to calculate the number of tag reuse applications:

$$\text{tag reuse applications} = \text{tag applications} - \text{distinct tags}$$

The minimum value for tag applications is the number of distinct tags, which implies that the minimum value for the number of tag reuse applications is zero (that is, there is no tag reuse). Using this metric, CiteULike had 25,715 tag reuse applications in our analysis.

This number alone says little about the amount of tag reuse, however. Thus, we use the more accurate and robust tag reuse metric Shilad Sen and colleagues developed for MovieLens [3], one that calculates the number of users per tag according to the following formula:

$$\text{tag reuse} = \frac{\sum (\text{\# of distinct users for each tag})}{\text{\# of tags}}$$

Given that each tag will have at least one associated user, the minimum value for tag reuse is 1.0 users per tag. For CiteULike, tag reuse was 1.59 users per tag. This is fairly low for tag reuse based on baseline figures from the MovieLens analysis [3].

We also calculated how many tag reuse occurrences existed for each tag (number of tag applications per tag minus one). The average number of tag reuse occurrences was 3.9; however, the median and modal numbers were both zero. This indicates that most tags weren't reused, but a few tags were reused many times.

Figure 5a shows how many tags have been reused. The x-axis indicates tag reuse occurrences, whereas the y-axis indicates the number of tags. We've sorted the data in ascending order of tag reuse occurrences. For example, data point "A" indicates that 1,014 tags were reused once; data point "B" indicates that 514 tags were reused twice, and so on. The data resembles a power-law distribution: $y = 2043.6x^{-1.6727}$, $R^2 = 0.9469$ (the data set included 3,058 tags for a range of 1 to 48 tag reuse occurrences).
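The two tag reuse metrics defined in the Tag Reuse section, together with the per-tag reuse occurrences, can be sketched in a few lines of Python. The function name and toy data are assumptions made for illustration, and the input is assumed to be non-empty. As a check against the reported totals, 32,242 tag applications minus 6,527 distinct tags gives the 25,715 tag reuse applications stated in the article.

```python
from collections import defaultdict


def tag_reuse_metrics(applications):
    """Compute tag reuse metrics for an iterable of (user, resource, tag) triples."""
    unique = set(applications)  # a repeated triple counts as one tag application
    users_by_tag = defaultdict(set)
    count_by_tag = defaultdict(int)
    for user, _resource, tag in unique:
        users_by_tag[tag].add(user)
        count_by_tag[tag] += 1

    distinct_tags = len(users_by_tag)
    # Simple metric: tag applications minus distinct tags.
    reuse_applications = len(unique) - distinct_tags
    # Users-per-tag metric: distinct users summed over tags, divided by the number of tags.
    users_per_tag = sum(len(users) for users in users_by_tag.values()) / distinct_tags
    # Reuse occurrences for a tag: its tag applications minus one.
    reuse_occurrences = {tag: n - 1 for tag, n in count_by_tag.items()}
    return reuse_applications, users_per_tag, reuse_occurrences


# Toy data: the three tag applications from the Figure 1 example plus one more.
toy = [
    ("user 1", "abc", "A"),
    ("user 1", "xyz", "A"),
    ("user 2", "abc", "B"),
    ("user 2", "xyz", "A"),
]
reuse_applications, users_per_tag, reuse_occurrences = tag_reuse_metrics(toy)
print(reuse_applications)  # 2
print(users_per_tag)       # 1.5
print(reuse_occurrences["A"], reuse_occurrences["B"])  # 2 0
```

In the toy data, tag "A" is used by two distinct users and tag "B" by one, so the users-per-tag value sits between the minimum of 1.0 and 2.0.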
We also wanted to understand how many tags users were reusing from their personal collections (that is, how much a user reuses tags he or she has applied before). The average number of tag reuse occurrences for each user was 8.5; the median and modal numbers were 5 and 1, respectively. This indicates that users were moderately reusing tags from their personal collections when tagging new papers. Figure 5b shows the results. Data point "A" indicates that 167 users reused one tag from their personal collections; data point "B" indicates that 136 users reused two tags from their personal collections, and so on. Again, the data resembled a power-law distribution: $y = 370.7x^{-1.3172}$, $R^2 = 0.8862$ (the data set included 879 users for a range of 1 to 49 tags reused).

Does CiteULike Support "Social" Bookmarking?

Although CiteULike supports tag reuse, many users didn't reuse tags from others' collections, even though they reused tags from their own. We can explain this disparity at a human–computer interaction level. Clearly, the interface that CiteULike gives users during tagging affects their tagging behavior. When users tag papers, the interface lets them conveniently select and reuse tags from their personal collections; when they want to reuse tags from outside their collections, however, they can't view them during tagging.

As mentioned previously, the only way users can deliberately reuse tags from others' collections is to remember them from when they first viewed the article link; through mere coincidence, they might also reuse a tag. Thus, CiteULike doesn't explicitly support reuse through social transactions, which would explain why such tag reuse is low.

If social bookmarking services want to encourage greater tag reuse, they should pay particular attention to interface design. For example, in CiteSeer, we're now designing an integrated tagging interface such that users can see existing tags, from both their personal collections and others'.
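To make the idea of an integrated tagging interface concrete, the sketch below gathers the tags such an interface might display for one user and one paper: the user's own tags, plus tags that other users have applied to the same paper. This is purely hypothetical and is not CiteSeer's or CiteULike's implementation; the function name, the two-group layout, and the choice to limit others' tags to the paper being tagged are all assumptions.

```python
def candidate_tags(applications, user, paper):
    """Split existing tags into the user's own tags and tags others gave this paper.

    `applications` is an iterable of (user, paper, tag) triples. The function
    name and the two-group layout are hypothetical.
    """
    personal = set()
    from_others = set()
    for owner, tagged_paper, tag in applications:
        if owner == user:
            personal.add(tag)        # any tag from the user's own collection
        elif tagged_paper == paper:
            from_others.add(tag)     # tags other users applied to this same paper
    # Avoid listing a tag twice when it is already in the personal collection.
    from_others -= personal
    return sorted(personal), sorted(from_others)


# Toy data, not drawn from any real collection.
toy = [
    ("user 1", "abc", "A"),
    ("user 1", "xyz", "A"),
    ("user 2", "abc", "B"),
    ("user 3", "abc", "A"),
    ("user 3", "xyz", "C"),
]
personal, from_others = candidate_tags(toy, "user 1", "abc")
print(personal, from_others)  # ['A'] ['B']
```

Showing the two groups side by side would remove the need for users to remember and retype tags they saw on the paper's page.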
Figure 5. Tag reuse. (a) For tag reuse occurrences, "A" indicates that 1,014 tags were reused once; "B" indicates that 514 tags were reused twice. (b) Users and the frequency of reuse occurrences from their personal collections. "A" indicates that 167 users reused one tag; "B" indicates that 136 users reused two tags. (Panel a: x-axis reuse occurrences, y-axis tags. Panel b: x-axis tags reused, y-axis users.)

Encouraging tag reuse requires not only an integrated tagging interface but also an appropriate tagging recommendation system. Not all existing tags are relevant to every paper; when the number of existing tags gets sufficiently large, users will be cognitively overloaded with respect to browsing and selecting relevant tags. Tag recommendation can address this problem by suggesting appropriate tags for papers based on several criteria. Currently, CiteULike presents the most frequently used tags (using visual enhancement, that is, a larger font size) in a user's personal collection to that user when he or she tags a paper. Although tag frequency is one heuristic for recommending tags, it doesn't have any bearing on those tags' relevance to the paper.

A more practical way to recommend tags is to compare similarities between papers and their associated tags. When a user is about to tag a new paper, an automatic tagging recommendation system can suggest relevant tags based on similarity