Www 2008/ Refereed Track: Rich Media April 21-25, 2008. Beijing, China Flickr Tag recommendation based on collective Knowledge okur Siqurbjornsson Roelof van Zwol C/Cata 1 C/Cata 1 08003 Barcelona 08003 Barcelona Spain borkur@yahoo-inc.com roelof@yahoo-inc.com ⊥ BSTRACT objects that provide little or no textual context, such Online photo services such as Flickr and Zooomr allow users bookmarks, photos and videos. to share their photos with family, friends, and the online The availability of rich media annotations is essential for community at large. An important facet of these services he cur- is that users manually annotate their photos using so called rent state-of-the-art in content-based image retrieval is pro- tags, which describe the contents of the photo or provide gressing, but has not yet succeeded in bridging the semantic additional contextual and semantical information. In this gap between human concepts, e. g, keyword-based queries and low-level visual features that are extracted from the im- ase. The contribution of our research is twofold. We ages[22]. However, the success of Flickr proves that users analyse a representative snapshot of Flickr and present the are willing to provide this semantic context through manual results by means of a tag characterisation focussing on how annotations. Recent user studies on this topic reveal that users tags photos and what information is contained in the sers do annotate their photos with the motivation to make tagging. Based on this analysis, we present and evaluate tag them better accessible to the general public [4 recommendation strategies to support the user in the photo Photo annotations provided by the user reflect the per- annotation task by recommending a set of tags that can be sonal perspective and context that is important to the photo added to the photo. The results of the empirical evaluation owner and her audience. This implies that if the same photo show that we can effectively recommend relevant tags for a would be annotated by another user it is possible that a dif- variety of photos with different levels of exhaustiveness of ferent description is produced. In Flickr, one can find many riginal tagging hotos on the same subject from many different users, which are consequentially described by a wide variety of tags Categories and subject Descriptors For example, a Flickr photo of La Sagrada Familia massive Roman Catholic basilica under construction in H.3.1 Information Storage and Retrieval]: Content Barcelona -is described by its owner using the tags Sagrada Analysis and Indexing: H 3.5 [ Information Storage and Familia, and Barcelona. Using the collective knowledge that Retrieval]: Online Information Services resides in Flickr community on this particular topic one can extend the description of the photo with the tags: Gaudi General terms Spain, Catalunya, architecture, and church. This extension provides a richer semantical description of the photo and can Algorithms, Experimentation, Performance be used to retrieve the photo for a larger range of keyword queries. Keywords The contribution of this paper is twofold. First we analyse Flickr, tag characterisation, tag recommendation, photo an- " how users tag photos"and"what kind of tags they provide", notations, collective knowledge, tag co-occurence, aggregated based on a representative snapshot of Flickr consisting of 52 tag suggestion. million publicly available photos. Second, we present four different tag recommendation strategies to support to the 1. INTRODUCTION user when annotating photos by tapping into the collective knowledge of the Flickr community as a whole. With the In recent years, tagging -the act of adding keywords incredible amount of photos being tagged by users, we can (tags)to objects- has become a popular means to anno- derive relationships between tags, using global co-occurrence tate various web resources, such as web page bookmarks [8, metrics. Given a user-defined tag and a photo, tags co- academic publications [6, and multimedia objects [ 11, 25 occurring with the user-defined tag are usually good car The tags provide meaningful descriptors of the objects, an didates for recommendation but their relevance of course allow the user to organise and index her content. This be- depends on the photo. Likewise, for a given set of user- comes even more important, when dealing with multimedia defined tags and a photo, tags co-occurring with tags in the Copyright is held by the International World Wide Web Conference Com- set are good candidates. However, in this case a tag aggre- mittee(IW3C2). Distribution of these papers is limited to classroom use. gation step is needed to produce the short list of tags that ill be recommended www 2008, April 21-25, 2008, Beijing, China. ACM978-1-60558-085-2/08/04
Flickr Tag Recommendation based on Collective Knowledge Börkur Sigurbjörnsson Yahoo! Research C/Ocata 1 08003 Barcelona Spain borkur@yahoo-inc.com Roelof van Zwol Yahoo! Research C/Ocata 1 08003 Barcelona Spain roelof@yahoo-inc.com ABSTRACT Online photo services such as Flickr and Zooomr allow users to share their photos with family, friends, and the online community at large. An important facet of these services is that users manually annotate their photos using so called tags, which describe the contents of the photo or provide additional contextual and semantical information. In this paper we investigate how we can assist users in the tagging phase. The contribution of our research is twofold. We analyse a representative snapshot of Flickr and present the results by means of a tag characterisation focussing on how users tags photos and what information is contained in the tagging. Based on this analysis, we present and evaluate tag recommendation strategies to support the user in the photo annotation task by recommending a set of tags that can be added to the photo. The results of the empirical evaluation show that we can effectively recommend relevant tags for a variety of photos with different levels of exhaustiveness of original tagging. Categories and Subject Descriptors H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing; H.3.5 [Information Storage and Retrieval]: Online Information Services General Terms Algorithms, Experimentation, Performance Keywords Flickr, tag characterisation, tag recommendation, photo annotations, collective knowledge, tag co-occurence, aggregated tag suggestion. 1. INTRODUCTION In recent years, tagging – the act of adding keywords (tags) to objects – has become a popular means to annotate various web resources, such as web page bookmarks [8], academic publications [6], and multimedia objects [11, 25]. The tags provide meaningful descriptors of the objects, and allow the user to organise and index her content. This becomes even more important, when dealing with multimedia Copyright is held by the International World Wide Web Conference Committee (IW3C2). Distribution of these papers is limited to classroom use, and personal use by others. WWW 2008, April 21–25, 2008, Beijing, China. ACM 978-1-60558-085-2/08/04. objects that provide little or no textual context, such as bookmarks, photos and videos. The availability of rich media annotations is essential for large-scale retrieval systems to work in practice. The current state-of-the-art in content-based image retrieval is progressing, but has not yet succeeded in bridging the semantic gap between human concepts, e.g., keyword-based queries, and low-level visual features that are extracted from the images [22]. However, the success of Flickr proves that users are willing to provide this semantic context through manual annotations. Recent user studies on this topic reveal that users do annotate their photos with the motivation to make them better accessible to the general public [4]. Photo annotations provided by the user reflect the personal perspective and context that is important to the photo owner and her audience. This implies that if the same photo would be annotated by another user it is possible that a different description is produced. In Flickr, one can find many photos on the same subject from many different users, which are consequentially described by a wide variety of tags. For example, a Flickr photo of La Sagrada Familia – a massive Roman Catholic basilica under construction in Barcelona – is described by its owner using the tags Sagrada Familia, and Barcelona. Using the collective knowledge that resides in Flickr community on this particular topic one can extend the description of the photo with the tags: Gaudi, Spain, Catalunya, architecture, and church. This extension provides a richer semantical description of the photo and can be used to retrieve the photo for a larger range of keyword queries. The contribution of this paper is twofold. First we analyse “how users tag photos” and “what kind of tags they provide”, based on a representative snapshot of Flickr consisting of 52 million publicly available photos. Second, we present four different tag recommendation strategies to support to the user when annotating photos by tapping into the collective knowledge of the Flickr community as a whole. With the incredible amount of photos being tagged by users, we can derive relationships between tags, using global co-occurrence metrics. Given a user-defined tag and a photo, tags cooccurring with the user-defined tag are usually good candidates for recommendation, but their relevance of course depends on the photo. Likewise, for a given set of userdefined tags and a photo, tags co-occurring with tags in the set are good candidates. However, in this case a tag aggregation step is needed to produce the short list of tags that will be recommended. 327 WWW 2008 / Refereed Track: Rich Media April 21-25, 2008. Beijing, China
Www 2008/ Refereed Track: Rich Media April 21-25, 2008. Beijing, China We evaluate the four tag recommendation strategies in an earn points suggesting same tags as another player. The experimental evaluation, by implementing a blind pooling mobile photo upload tool ZoneTag provides tag suggestion method to collect candidate tags for a given photo with user- based on personal history, geographic location and time 24 defined tags. We repeatedly measure the performance on The different approaches work on different input data and 200 randomly selected photos with a varying number of user- thus complement each other. The methods we present here defined tags per photo. The number of user-defined tags per complement the ones above since we use yet a different in photo range from a single tag for sparsely annotated photos put data, namely, the tags assigned originally by the photo to more than six tags for exhaustively annotated photos. We owner. Our method can be applied on top of any of the measure the effectiveness of the recommendation strategies tagging methods described above. using four different metrics to gain detailed insight in the Our co-occurrence analysis is related to the construction performance of term hierarchies and ontologies that have been studied in We envision two potential applications for the recommen- the information retrieval and semantic dation strategies. In one application, the recommendations 17, 21. However, in the case of Flickr, the vocabulary is are presented to the user, who can select the relevant tag unlimited, and relations between nodes in the graph have and add them to the photo. Alternatively, the recommended an uncontrolled nature. Despite these two aspects, we use gs are directly used to enrich the index of an image re. similar concepts to analyse the tag relations. trieval system. There has been some previous work on adding semantic The remainder of the paper is structured as follows. We labels to Flickr tags. Rattenbury et al. describe an approach start with discussing the related work in Section 2, followed for extracting event and place semantics of tags [18. The by the analysis of tag behaviour in Flickr in Section 3, where intuition behind their methods is that event- and place-tags we focus on tag frequencies and tag semantics. In Section 4"burst" in a specific segments of time or regions in space, we present the four tag recommendation strategies for ex respectively. Their evaluation is based on a set of geotagged tending photo annotations in Flickr. The setup of the ex- Flickr photographs. Using the method described above they perimental evaluation is described in Section 5, while the were able to achieve fairly high precision of classifying tags as results of the experiment are presented in Section 6. Fi either a place or event. The semantic tag analysis presented nally, in Section 7 we come to the conclusions and explore in this paper we complement this method using WordNet to add a richer set of semantic tags 2. RELATED WORK 3. TAG BEHAVIOUR IN FLICKR Tagging is a popular means of annotating objects on the In this section we describe the Flickr photo collection that web. A detailed account of different types of tagging sys is used for the evaluation, and we provide insights in the ems can be found in [13 and [16]. The tags have been photo tagging behaviour of users. In particular we are in- shown to be useful to give improved access to photo collec- terested in discovering "How do users tag? " and"what are tions both using temporal information 9 and geographic in- they tagging? ". Besides these two aspects, a third aspect is ormation 3. The methods we present in this paper extend of importance, when studying tag behaviour in Flickr:"Why the tagging of individual photos making them even more do people tag?". This aspect is studied thoroughly in [23 useful for the visualisation applications above 16, 14, 4. There it is concluded that users are highly driven The usefulness of tagging information depends on the mo- by social incentives. tivation of users. Ames and Naaman explore the motiva tions for tagging photographs in mobile and online media [4 3.1 Flickr photo collection Their investigation focuses on the use of the Zone Tag 2 Flickr is an online photo-sharing service that contains hun- 24] application in combination with Flickr, where users can dreds of millions of photos that are uploaded, tagged and or- phones.They finds photos to Flickr using their mobile ganised by more then 8.5 million registered Web-users. To photos for organisation for the general public. They con- times up to 12, 000 photos are being served per second, and clude that the tag-suggestion option included in ZoneTag the record for number of photos uploaded per day exceeds 2 encourages users to tag their photos. However, suggesting million photos [ 12. For the research described in this paper non-obvious tags may be confusing for users. Furthermore, we have used a random snapshot from Flickr of 52 million users may be inclined to add suggested tags, even if they are publicly available photos with annotations. The photos were not immediately relevant uploaded between February 2004 and June 2007 and each Various methods exist to(semi-automatically annotate photo has at least one user-defined tag features to semantic labels 5, 151. The mappings from visual 3.2 General Tag Characteristics photographs. In the image processing and machine learning When developing tag recommendation strategies, it is im- put a set of labelled images and try to learn which low level portant to analyse why, how, and what users are tagging visual features correspond to higher level semantic labels. The focus in this section is on how users tag their photos The mapping can then be applied to suggest labels for u The collection we use in this paper consists of over 52 labelled images based on visual features alone. For a more million photos that contain about 188 million tags in to- detailed account of content-based analysis for image annot tal, and about 3.7 million unique tags. Figure 1 shows the ion we refer to a recent overview paper by Datta et al. 7. distribution of the tag frequency on a log-log scale. The The ESP game is a tool for adding meaningful labels to im- x-axis represents the 3.7 million unique tags, ordered by ages using a computer game 23. Users play the game by scending tag frequency. The y-axis refers to the tag fre- suggesting tags for photos that appear on their screen and quency. The distribution can be modeled quite accurately
We evaluate the four tag recommendation strategies in an experimental evaluation, by implementing a blind pooling method to collect candidate tags for a given photo with userdefined tags. We repeatedly measure the performance on 200 randomly selected photos with a varying number of userdefined tags per photo. The number of user-defined tags per photo range from a single tag for sparsely annotated photos to more than six tags for exhaustively annotated photos. We measure the effectiveness of the recommendation strategies using four different metrics to gain detailed insight in the performance. We envision two potential applications for the recommendation strategies. In one application, the recommendations are presented to the user, who can select the relevant tags and add them to the photo. Alternatively, the recommended tags are directly used to enrich the index of an image retrieval system. The remainder of the paper is structured as follows. We start with discussing the related work in Section 2, followed by the analysis of tag behaviour in Flickr in Section 3, where we focus on tag frequencies and tag semantics. In Section 4 we present the four tag recommendation strategies for extending photo annotations in Flickr. The setup of the experimental evaluation is described in Section 5, while the results of the experiment are presented in Section 6. Finally, in Section 7 we come to the conclusions and explore future directions. 2. RELATED WORK Tagging is a popular means of annotating objects on the web. A detailed account of different types of tagging systems can be found in [13] and [16]. The tags have been shown to be useful to give improved access to photo collections both using temporal information [9] and geographic information [3]. The methods we present in this paper extend the tagging of individual photos making them even more useful for the visualisation applications above. The usefulness of tagging information depends on the motivation of users. Ames and Naaman explore the motivations for tagging photographs in mobile and online media [4]. Their investigation focuses on the use of the ZoneTag [2, 24] application in combination with Flickr, where users can upload and annotate photos to Flickr using their mobile phones. They find that most users are motivated to tag photos for organisation for the general public. They conclude that the tag-suggestion option included in ZoneTag encourages users to tag their photos. However, suggesting non-obvious tags may be confusing for users. Furthermore, users may be inclined to add suggested tags, even if they are not immediately relevant. Various methods exist to (semi-)automatically annotate photographs. In the image processing and machine learning communities there is work on learning mappings from visual features to semantic labels [5, 15]. The methods take as input a set of labelled images and try to learn which low level visual features correspond to higher level semantic labels. The mapping can then be applied to suggest labels for unlabelled images based on visual features alone. For a more detailed account of content-based analysis for image annotation we refer to a recent overview paper by Datta et al. [7]. The ESP game is a tool for adding meaningful labels to images using a computer game [23]. Users play the game by suggesting tags for photos that appear on their screen and earn points suggesting same tags as another player. The mobile photo upload tool ZoneTag provides tag suggestion based on personal history, geographic location and time [24]. The different approaches work on different input data and thus complement each other. The methods we present here complement the ones above since we use yet a different input data, namely, the tags assigned originally by the photo owner. Our method can be applied on top of any of the tagging methods described above. Our co-occurrence analysis is related to the construction of term hierarchies and ontologies that have been studied in the information retrieval and semantic web communities [20, 17, 21]. However, in the case of Flickr, the vocabulary is unlimited, and relations between nodes in the graph have an uncontrolled nature. Despite these two aspects, we use similar concepts to analyse the tag relations. There has been some previous work on adding semantic labels to Flickr tags. Rattenbury et al. describe an approach for extracting event and place semantics of tags [18]. The intuition behind their methods is that event- and place-tags “burst” in a specific segments of time or regions in space, respectively. Their evaluation is based on a set of geotagged Flickr photographs. Using the method described above they were able to achieve fairly high precision of classifying tags as either a place or event. The semantic tag analysis presented in this paper we complement this method using WordNet to add a richer set of semantic tags. 3. TAG BEHAVIOUR IN FLICKR In this section we describe the Flickr photo collection that is used for the evaluation, and we provide insights in the photo tagging behaviour of users. In particular we are interested in discovering “How do users tag?” and “What are they tagging?”. Besides these two aspects, a third aspect is of importance, when studying tag behaviour in Flickr: “Why do people tag?”. This aspect is studied thoroughly in [23, 16, 14, 4]. There it is concluded that users are highly driven by social incentives. 3.1 Flickr Photo Collection Flickr is an online photo-sharing service that contains hundreds of millions of photos that are uploaded, tagged and organised by more then 8.5 million registered Web-users. To get some feeling for the size of the operation, during peak times up to 12,000 photos are being served per second, and the record for number of photos uploaded per day exceeds 2 million photos [12]. For the research described in this paper we have used a random snapshot from Flickr of 52 million publicly available photos with annotations. The photos were uploaded between February 2004 and June 2007 and each photo has at least one user-defined tag. 3.2 General Tag Characteristics When developing tag recommendation strategies, it is important to analyse why, how, and what users are tagging. The focus in this section is on how users tag their photos. The collection we use in this paper consists of over 52 million photos that contain about 188 million tags in total, and about 3.7 million unique tags. Figure 1 shows the distribution of the tag frequency on a log-log scale. The x-axis represents the 3.7 million unique tags, ordered by descending tag frequency. The y-axis refers to the tag frequency. The distribution can be modeled quite accurately 328 WWW 2008 / Refereed Track: Rich Media April 21-25, 2008. Beijing, China
Www 2008/ Refereed Track: Rich Media April 21-25, 2008. Beijing, China 1e+06 Unclassifed Location or Object Person or Group Action or Event Time Other 100000 10000 1000 100 00100000 Figure 3: Most frequent WordNet categories for ure 1: Distribution of the Tag Frequency in Flickr tags by a power law[19, 1], and the probability of a tag having ally exhaustively annotated, as there are photos that have tag frequency r is proportional to x-1.15. With respect to provide useful recommendations in such a case. The tail of the tag recommendation task, the head of the power law the power law consists of more than 15 million photos with ontains tags that would be too generic to be useful as only a single tag annotated and 17 million photos having ag suggestion. For example the top 5 most frequent occur- only 2 or 3 tags. Together this already covers 64% of the ring tags are: 2006, 2005, wedding, party, and 2004. The photos. Typically, these are the cases where we expect tag very tail of the power law contains the infrequent tags that recommendation to be useful to extend the annotation of ypically can be categorised as incidentally occurring words he photo uch as mis-spellings, and complex phra To analyse the behaviour of the tag recommendation sys- ambrose tompkins, ambient vector, and more than 15.7 mil- tems for photos with different levels of exhaustiveness of the lion other tags that occur only once in this Flickr snapshot original annotation, we have defined four classes, as shown Due to their infrequent nature, we expect that these highly in Table 1. The classes differentiate from sparsely annotated specific tags will only be useful recommendations in excep- to exhaustively to exhaustively annotated photos, and take the distribution tional cases of the number of tags per photo into account as is shown in the last column of the table. In Section 6. we will use this categorisation to analyse the performance for the different annotation classes Tags per photo Photos Class i ≈15.500.000 Class I 2-3 7.500.000 Class I 4-6 ≈12,000.,000 Class IV >6 ≈7000,000 Table 1: The definition of photo-tag classes and the number of photos in each class 3.3 Tag Categorisation To answer the question " What are users tagg have mapped Flickr tags onto the WordNet broad cate- gores Figure 2: Distribution of the number of tags per egory entries are defined for a term. In that case, the tag is photo in Flickr. bound to the category with the highest ranking. Consider for example the tag London. According to wordNet, London Figure 2 shows the distribution of the number of tags per belongs to two categories: noun location, which refers to photo also follows a power law distribution. The x-axis rep- the city London, and noun person, referring to the novelist resents the 52 million photos, ordered by the number of tags Jack London. In this case the location category is ranked per photo(descending). The y-axis refers to the number higher than the person. Hence, we consider the tag London of tags assigned to the corresponding photo. The proba- to refer to the location bility of having z tags per photo is proportional to x Figure 3 shows the distribution of Flickr tags over the Again, in context of the tag recommendation task, the head nost common WordNet categories. Following this approach of the power law contains photos that are already exception- we can classify 52% of the tags in the collection, leaving 48%
1 10 100 1000 10000 100000 1e+06 1 10 100 1000 10000 100000 tag frequency tag Figure 1: Distribution of the Tag Frequency in Flickr. by a power law [19, 1], and the probability of a tag having tag frequency x is proportional to x −1.15. With respect to the tag recommendation task, the head of the power law contains tags that would be too generic to be useful as a tag suggestion. For example the top 5 most frequent occurring tags are: 2006, 2005, wedding, party, and 2004. The very tail of the power law contains the infrequent tags that typically can be categorised as incidentally occurring words, such as mis-spellings, and complex phrases. For example: ambrose tompkins, ambient vector, and more than 15.7 million other tags that occur only once in this Flickr snapshot. Due to their infrequent nature, we expect that these highly specific tags will only be useful recommendations in exceptional cases. 1 10 100 1000 1 10 100 1000 10000 100000 1e+06 number of tags photo Figure 2: Distribution of the number of tags per photo in Flickr. Figure 2 shows the distribution of the number of tags per photo also follows a power law distribution. The x-axis represents the 52 million photos, ordered by the number of tags per photo (descending). The y-axis refers to the number of tags assigned to the corresponding photo. The probability of having x tags per photo is proportional to x −0.33 . Again, in context of the tag recommendation task, the head of the power law contains photos that are already exception- 28% 16% 13% 9% 7% 27% Unclassified Location Artefact or Object Person or Group Action or Event Time Other 48% Figure 3: Most frequent WordNet categories for Flickr tags. ally exhaustively annotated, as there are photos that have more than 50 tags defined. Obviously, it will be hard to provide useful recommendations in such a case. The tail of the power law consists of more than 15 million photos with only a single tag annotated and 17 million photos having only 2 or 3 tags. Together this already covers 64% of the photos. Typically, these are the cases where we expect tag recommendation to be useful to extend the annotation of the photo. To analyse the behaviour of the tag recommendation systems for photos with different levels of exhaustiveness of the original annotation, we have defined four classes, as shown in Table 1. The classes differentiate from sparsely annotated to exhaustively annotated photos, and take the distribution of the number of tags per photo into account as is shown in the last column of the table. In Section 6, we will use this categorisation to analyse the performance for the different annotation classes. Tags per photo Photos Class I 1 ≈ 15,500,000 Class II 2 – 3 ≈ 17,500,000 Class III 4 – 6 ≈ 12,000,000 Class IV > 6 ≈ 7,000,000 Table 1: The definition of photo-tag classes and the number of photos in each class. 3.3 Tag Categorisation To answer the question “What are users tagging?”, we have mapped Flickr tags onto the WordNet broad categories [10]. In a number of cases, multiple WordNet category entries are defined for a term. In that case, the tag is bound to the category with the highest ranking. Consider for example the tag London. According to WordNet, London belongs to two categories: noun.location, which refers to the city London, and noun.person, referring to the novelist Jack London. In this case the location category is ranked higher than the person. Hence, we consider the tag London to refer to the location. Figure 3 shows the distribution of Flickr tags over the most common WordNet categories. Following this approach, we can classify 52% of the tags in the collection, leaving 48% 329 WWW 2008 / Refereed Track: Rich Media April 21-25, 2008. Beijing, China
Www 2008/ Refereed Track: Rich Media April 21-25, 2008. Beijing, China of the tags unclassified, as depicted in the in-set of Figure 3. The coefficient takes the number of intersections between When focussing on the set of classified tags, we find that the two tags, divided by the union of the two tags. The locations are tagged most frequent(28%); followed by arti- Jaccard coefficient is know to be useful to measure the sim- facts or objects(16%), people or groups(13%), actions or ilarity between two objects or sets. In general, we can use events(9%), and time(7%). The category other(27%)con- symmetric measures, like Jaccard, to induce whether two tains the set of tags that is classified by the wordNet broad ags have a similar meaning. categories, but does not belong any of the before mentioned categories. From this information, we can conclude that Asymmetric measures. Alternatively, tag co-occurrence can users do not only tag the visual contents of the photo, but be normalised using the frequency of one of the tags. For o a large extent provide a broader context in which the instance, using the equation photo was taken, such as, location, time, and actions. P(t|t1):= 4. TAG RECOMMENDATION STRATEGIES In this section we provide a detailed description of the ta It captures how often the tag ti co-occurs with tag t,nor- recommendation system. We start with a general overview malised by the total frequency of tag ti. We can interpret of the system architecture, followed by an introduction of this as the probability of a photo being annotated with tag he tag co-occurrence metrics used. Finally, we explain the t, given the it was annotated with tag ti. Several variations tag aggregation and promotion strategies that are used by of asymmetric co-occurrence measure have been proposed in the system and evaluated in the experiment literature before to build tag(or term) hierarchies [20, 17 2] 4.1 Tag Recommendation System To illustrate the difference between symmetric and asym- Figure 4 provides an overview of the tag recommenda- metric co-occurrence measures consider the tag Eiffel Tower. tion process. Given a photo with user-defined tags, an For the symmetric measure we find that the most co-occurring dered list of m candidate tags is derived for each of the tags are(in order ): Tour Eiffel, Eiffel Seine, La Tour Eiffel user-defined tags, based on tag co-occurrence. The lists of and Paris. When using the asymmetric measure the most candidate tags are then used as input for tag aggregation co-occurring tags are (in order ) Paris, france, Tour Eif- and ranking, which ultimately produces the ranked list of n fel, Eiffel and Europe. It shows that the Jaccard symmetric ecommended tags. Consider the example given in Figure 4 coefficient is good at identifying equivalent tags, like Tour there are two tags defined by the user: Sagrada Familia and Eiffel, Eiffel, and La Tour Eiffel, or picking up a close by Barcelona.For both tags, a list of 6 co-occurring tags is landmark such as the Seine. Based on this observation, it is derived. They have some tags in common, such as Spain, more likely that asymmetric tag co-occurrence will provide a Gaudi, and Catalunya, while the other candidate tags only more suitable diversity of candidate tags than its symmetric appear in one. After aggregation and ranking 5 tags are being recommended: Gaudi, Spain, Catalunya, architecture, and church. The actual number of tags being recommended 4.3 Tag aggregation and promotion should of course depend on the relevancy of the tags, and When the lists of candidate tags for each of the user- varies for each different application defined tags are known, a tag aggregation step is needed to merge the lists into a single ranking. In this section, we 4.2 Tag Co-occurrence define two aggregation methods, based on voting and sum- Tag co-occurrence is the key to our tag recommendation ming that serve this purpose. Furthermore, we implemented approach, and only works reliable when a large quantity a re-ranking procedure that promotes candidate tags having f supporting data is available. Obviously, the amount of certain properties ser-generated content that is created by Flickr users, satis- In the this section we refer to three different types of fies this demand and provides the collective knowledge base that is needed to make tag recommendation systems work in to the set of tas practise. In this sub-section we look at various methods to calculate co-occurrence coefficients between of two tags. We Candidate tags Cu is the ranked list with the top m define the co-occurrence between two tags to be the number nost co-occurring tags, for a user-defined taguE U. f photos lin our collection where both tags are used in the We denote C to refer to the union of all candidate tags same annotation for each user-defined tag uEU. Using the raw tag co-occurrence for computing the quality f the relationship between two tags is not very meaningful Recommended tags R is the ranked list of n most s these values do not take the frequency of the individual elevant tags produced by the tag recommendation sys- tags into account. Therefore it is common to normalise the CO-occurrence count with the overall frequency of the tags. For a given set of candidate tags(C)a tag aggregation There are essentially two different normalisation methods step is needed to produce the final list of recommended tags symmetric and asymmetric. R), whenever there is more than one user-defined tag. In this section, we define two aggregation strategies. One strat- Symmetric measures. According to the Jaccard coefficient egy is based on voting, and does not take the co-occurrence re can normalise the co-occurrence of two tags ti and t, by values of the candidate tags into account, while the summing calculating strategy uses the co-occurrence values to produce the final ranking. In both e apply the strategy to the top m J(t,t1;):= t:∩tl cO-occurring tags in the list 330
of the tags unclassified, as depicted in the in-set of Figure 3. When focussing on the set of classified tags, we find that locations are tagged most frequent (28%); followed by artifacts or objects (16%), people or groups (13%), actions or events (9%), and time (7%). The category other (27%) contains the set of tags that is classified by the WordNet broad categories, but does not belong any of the before mentioned categories. From this information, we can conclude that users do not only tag the visual contents of the photo, but to a large extent provide a broader context in which the photo was taken, such as, location, time, and actions. 4. TAG RECOMMENDATION STRATEGIES In this section we provide a detailed description of the tag recommendation system. We start with a general overview of the system architecture, followed by an introduction of the tag co-occurrence metrics used. Finally, we explain the tag aggregation and promotion strategies that are used by the system and evaluated in the experiment. 4.1 Tag Recommendation System Figure 4 provides an overview of the tag recommendation process. Given a photo with user-defined tags, an ordered list of m candidate tags is derived for each of the user-defined tags, based on tag co-occurrence. The lists of candidate tags are then used as input for tag aggregation and ranking, which ultimately produces the ranked list of n recommended tags. Consider the example given in Figure 4, there are two tags defined by the user: Sagrada Familia and Barcelona. For both tags, a list of 6 co-occurring tags is derived. They have some tags in common, such as Spain, Gaudi, and Catalunya, while the other candidate tags only appear in one. After aggregation and ranking 5 tags are being recommended: Gaudi, Spain, Catalunya, architecture, and church. The actual number of tags being recommended should of course depend on the relevancy of the tags, and varies for each different application. 4.2 Tag Co-occurrence Tag co-occurrence is the key to our tag recommendation approach, and only works reliable when a large quantity of supporting data is available. Obviously, the amount of user-generated content that is created by Flickr users, satis- fies this demand and provides the collective knowledge base that is needed to make tag recommendation systems work in practise. In this sub-section we look at various methods to calculate co-occurrence coefficients between of two tags. We define the co-occurrence between two tags to be the number of photos [in our collection] where both tags are used in the same annotation. Using the raw tag co-occurrence for computing the quality of the relationship between two tags is not very meaningful, as these values do not take the frequency of the individual tags into account. Therefore it is common to normalise the co-occurrence count with the overall frequency of the tags. There are essentially two different normalisation methods: symmetric and asymmetric. Symmetric measures. According to the Jaccard coefficient we can normalise the co-occurrence of two tags ti and tj by calculating: J(ti, tj ) := |ti ∩ tj | |ti ∪ tj | (1) The coefficient takes the number of intersections between the two tags, divided by the union of the two tags. The Jaccard coefficient is know to be useful to measure the similarity between two objects or sets. In general, we can use symmetric measures, like Jaccard, to induce whether two tags have a similar meaning. Asymmetric measures. Alternatively, tag co-occurrence can be normalised using the frequency of one of the tags. For instance, using the equation: P(tj |ti) := |ti ∩ tj | |ti| (2) It captures how often the tag ti co-occurs with tag tj normalised by the total frequency of tag ti. We can interpret this as the probability of a photo being annotated with tag tj given the it was annotated with tag ti. Several variations of asymmetric co-occurrence measure have been proposed in literature before to build tag (or term) hierarchies [20, 17, 21]. To illustrate the difference between symmetric and asymmetric co-occurrence measures consider the tag Eiffel Tower. For the symmetric measure we find that the most co-occurring tags are (in order): Tour Eiffel, Eiffel, Seine, La Tour Eiffel and Paris. When using the asymmetric measure the most co-occurring tags are (in order): Paris, France, Tour Eiffel, Eiffel and Europe. It shows that the Jaccard symmetric coefficient is good at identifying equivalent tags, like Tour Eiffel, Eiffel, and La Tour Eiffel, or picking up a close by landmark such as the Seine. Based on this observation, it is more likely that asymmetric tag co-occurrence will provide a more suitable diversity of candidate tags than its symmetric opponent. 4.3 Tag Aggregation and Promotion When the lists of candidate tags for each of the userdefined tags are known, a tag aggregation step is needed to merge the lists into a single ranking. In this section, we define two aggregation methods, based on voting and summing that serve this purpose. Furthermore, we implemented a re-ranking procedure that promotes candidate tags having certain properties. In the this section we refer to three different types of tags: • User-defined tags U refers to the set of tags that the user assigned to a photo. • Candidate tags Cu is the ranked list with the top m most co-occurring tags, for a user-defined tag u ∈ U. We denote C to refer to the union of all candidate tags for each user-defined tag u ∈ U. • Recommended tags R is the ranked list of n most relevant tags produced by the tag recommendation system. For a given set of candidate tags (C) a tag aggregation step is needed to produce the final list of recommended tags (R), whenever there is more than one user-defined tag. In this section, we define two aggregation strategies. One strategy is based on voting, and does not take the co-occurrence values of the candidate tags into account, while the summing strategy uses the co-occurrence values to produce the final ranking. In both cases, we apply the strategy to the top m co-occurring tags in the list. 330 WWW 2008 / Refereed Track: Rich Media April 21-25, 2008. Beijing, China
Www 2008/ Refereed Track: Rich Media April 21-25, 2008. Beijing, China User-defined Tags Candidate Tags Recommended Tags Sagrada Familia Sagrada Familia: Barcelona Gaudi Catalunya d Spain architecture architecture 正| church church 8 Barcelona G0 au6 d i Catalunya Euro Figure 4: System overview of the tag recommendation process. Vote. The voting strategy computes a score for each candi- Stability-promotion. Considered that user-defined late tag cEC, where a vote for c is cast, whenever cE Cu ags with very low collection frequency are less reliable than tags with higher collection frequency, we want to promote those tags for which the statistics are more stable. This is achieved with the following function: stability(u) kis +abs(ks -log(luD)) computed as: In principle this is a weighting function that weights score(c): =>vote(u,c), (4) he impact of the candidate tags for a given user- u∈U defined tag. lul is the collection frequency of the tag u and ks is a parameter in this function, which is de- ermined by training. The function abs(a)returns the Sum. The didate tag list of the tags lists(C), and sums also takes the union of all can- absolute value of a as over the co-occurrence valt f a candidate tag c E C as cal- Tags with very high culated as: frequency are likely to be too general for individual score():=∑(P(c),ifc∈Cn) hotos. We want to promote the descriptiveness b damping the contribution of candidate tags with a very high-frequency The function P(clu) calculates the asymmetric co-occurrence value, as defined in Equation 2. Note that the score of candi- descriptive(c) g(lcD)) date tag c is obtained by only summing over the tags cE Ct We will use these two aggregation strategies as This is another weighting function, now only applied to line for our evaluation as is presented in Section 6 re-value the weight of a candidate tag. kd is parameter in this function, and is configured by training Promotion. In Section 3 vations with respect to tagging behaviour. In this section Rank- promotion. The co-occurrence values of tag we translate these observations into a"promotion function provide good estimates of the relevance of a candi- to promote more descriptive tags for recommendation. date tag for a user-defined tag. In principle, this is From the tag frequency distribution presented in Figure 1 already used by the aggregation strategy for summing, we learnt that both the head and the tail of the power law but we observed that the co-occurrence values decline would probably not contain good tags for recommendation very fast. The rank promotion does not look at the co Tags in the tail were judged to be unstable descriptors, due ccurrence value, but at the position r of the candidate to their infrequent nature. The head on the other han tag c E Cu for a given user-defined tag u contained tags that would be too generic to be useful(2006 2005, wedding, etc. rank(u, c) k+(r-1
Recommended Tags Gaudi Spain Catalunya architecture church Candidate Tags Sagrada Familia: Barcelona Gaudi Spain architecture Catalunya church Barcelona: Spain Gaudi 2006 Catalunya Europe travel User-defined Tags Sagrada Familia Barcelona Tag Co-occurence Tag Aggregation & Ranking Figure 4: System overview of the tag recommendation process. Vote. The voting strategy computes a score for each candidate tag c ∈ C, where a vote for c is cast, whenever c ∈ Cu. vote(u, c) = 1 if c ∈ Cu 0 otherwise (3) A list of recommended tags R is obtained by sorting the candidate tags on the number of votes. A score is therefore computed as: score(c) := X u∈U vote(u, c), (4) Sum. The summing strategy also takes the union of all candidate tag lists (C), and sums over the co-occurrence values of the tags, thus the score of a candidate tag c ∈ C as calculated as: score(c) := X u∈U (P(c|u) , if c ∈ Cu) (5) The function P(c|u) calculates the asymmetric co-occurrence value, as defined in Equation 2. Note that the score of candidate tag c is obtained by only summing over the tags c ∈ Cu. We will use these two aggregation strategies as the baseline for our evaluation as is presented in Section 6. Promotion. In Section 3 we have made a number of observations with respect to tagging behaviour. In this section, we translate these observations into a “promotion function” to promote more descriptive tags for recommendation. From the tag frequency distribution presented in Figure 1, we learnt that both the head and the tail of the power law would probably not contain good tags for recommendation. Tags in the tail were judged to be unstable descriptors, due to their infrequent nature. The head on the other hand contained tags that would be too generic to be useful (2006, 2005, wedding, etc.). • Stability-promotion. Considered that user-defined tags with very low collection frequency are less reliable than tags with higher collection frequency, we want to promote those tags for which the statistics are more stable. This is achieved with the following function: stability(u) := ks ks + abs(ks − log(|u|)) (6) In principle this is a weighting function that weights the impact of the candidate tags for a given userdefined tag. |u| is the collection frequency of the tag u and ks is a parameter in this function, which is determined by training. The function abs(x) returns the absolute value of x. • Descriptiveness-promotion. Tags with very high frequency are likely to be too general for individual photos. We want to promote the descriptiveness by damping the contribution of candidate tags with a very high-frequency: descriptive(c) := kd kd + abs(kd − log(|c|)) (7) This is another weighting function, now only applied to re-value the weight of a candidate tag. kd is parameter in this function, and is configured by training. • Rank-promotion. The co-occurrence values of tags provide good estimates of the relevance of a candidate tag for a user-defined tag. In principle, this is already used by the aggregation strategy for summing, but we observed that the co-occurrence values decline very fast. The rank promotion does not look at the cooccurrence value, but at the position r of the candidate tag c ∈ Cu for a given user-defined tag u: rank(u, c) = kr kr + (r − 1) (8) 331 WWW 2008 / Refereed Track: Rich Media April 21-25, 2008. Beijing, China