5.4 Results

The tag recommendation approach presented in this work exploits training bookmark meta-information and tags, but it neither analyses document contents nor makes use of external knowledge bases to enrich the set of suggested tags. Thus, all our recommended tags belong to the training collection, and our algorithm is only suitable for Task 2 of the ECML PKDD 2009 Discovery Challenge. Table 7 shows recall, precision and F-measure values for the test datasets provided in the tasks. In Task 2, recommending 5 tags, we reach an average F-measure value of 0.3065. We obtain a precision of 42% if we only recommend one tag, and 25% when we recommend 5 tags.

Table 7. Average recall, precision and F-measure values obtained in Tasks 1 and 2 of the ECML PKDD 2009 Discovery Challenge for different numbers of recommended tags.

| Task   | Number of recommended tags | Recall | Precision | F-measure |
|--------|----------------------------|--------|-----------|-----------|
| Task 1 | 1                          | 0.0593 | 0.1810    | 0.0894    |
| Task 1 | 2                          | 0.0910 | 0.1453    | 0.1120    |
| Task 1 | 3                          | 0.1131 | 0.1233    | 0.1179    |
| Task 1 | 4                          | 0.1309 | 0.1091    | 0.1190    |
| Task 1 | 5                          | 0.1454 | 0.0991    | 0.1179    |
| Task 2 | 1                          | 0.1454 | 0.4190    | 0.2159    |
| Task 2 | 2                          | 0.2351 | 0.3477    | 0.2805    |
| Task 2 | 3                          | 0.2991 | 0.3059    | 0.3025    |
| Task 2 | 4                          | 0.3462 | 0.2716    | 0.3044    |
| Task 2 | 5                          | 0.3916 | 0.2518    | 0.3065    |

6 Conclusions and future work

In this work, we have presented a social tag recommendation model for a collaborative bookmarking system. Our approach receives as input a bookmark (of a web page or a research publication), analyses and processes its textual metadata (document title, URL, abstract and descriptions), and suggests tags relevant to bookmarks whose metadata are similar to those of the input bookmark. Besides focusing on those tags that best fit the bookmark metadata, our strategy also takes into account global characteristics of the system folksonomy. More specifically, it makes use of the tag co-occurrence graph to compute vertex centralities of related tags.
Assuming that tags with higher vertex centralities are more informative for describing the bookmark contents, our model weights the retrieved tags by their centrality values in a small co-occurrence sub-graph generated for the input bookmark. As additional features, the weighting mechanism also penalises tags that are too generic, and strengthens tags that have previously been used by the user for whom the recommendations are made.
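As a rough illustration of the weighting scheme summarised above, the following Python sketch builds a global tag co-occurrence graph from training bookmarks, scores each candidate tag by its centrality in the sub-graph induced by the candidates, and combines that score with a penalty for generic tags and a boost for tags the user has used before. This excerpt does not state which centrality measure, genericity penalty or personalisation factor the model actually uses, so those choices are assumptions made only for illustration: PageRank as the centrality measure, an IDF-style penalty, a fixed multiplicative user boost, and a product of all factors. The function names (`build_cooccurrence`, `subgraph_pagerank`, `recommend`) and parameters (`damping`, `iters`, `user_boost`, `k`) are hypothetical, and the content-based retrieval step is represented only by a precomputed dictionary of candidate tag scores.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(bookmarks):
    """Build a global tag co-occurrence graph from training bookmarks.

    Edge weight = number of bookmarks on which both tags appear.
    Also returns, for each tag, the number of bookmarks it annotates.
    """
    graph = defaultdict(dict)
    tag_freq = defaultdict(int)
    for tags in bookmarks:
        unique = sorted(set(tags))
        for t in unique:
            tag_freq[t] += 1
        for a, b in combinations(unique, 2):
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return dict(graph), dict(tag_freq)


def subgraph_pagerank(graph, candidates, damping=0.85, iters=50):
    """PageRank by power iteration on the sub-graph induced by `candidates`.

    Undirected weighted edges are treated as arcs in both directions; the
    rank held by nodes with no edges inside the sub-graph is spread uniformly.
    (Assumption: PageRank stands in for the unspecified centrality measure.)
    """
    nodes = sorted(candidates)
    n = len(nodes)
    if n == 0:
        return {}
    # Keep only edges whose both endpoints are candidate tags.
    nbrs = {t: {u: w for u, w in graph.get(t, {}).items() if u in candidates}
            for t in nodes}
    out_w = {t: sum(nbrs[t].values()) for t in nodes}
    rank = {t: 1.0 / n for t in nodes}
    for _ in range(iters):
        dangling = sum(rank[t] for t in nodes if out_w[t] == 0)
        rank = {
            t: (1 - damping) / n
            + damping * (sum(rank[u] * w / out_w[u] for u, w in nbrs[t].items())
                         + dangling / n)
            for t in nodes
        }
    return rank


def recommend(retrieved, graph, tag_freq, n_bookmarks, user_tags,
              k=5, user_boost=0.5):
    """Rank candidate tags retrieved from bookmarks similar to the input.

    retrieved: {tag: content-based relevance score}.
    weight = relevance * sub-graph centrality * genericity penalty * user boost
    (the multiplicative combination is an illustrative assumption).
    """
    centrality = subgraph_pagerank(graph, set(retrieved))
    weights = {}
    for tag, score in retrieved.items():
        # IDF-style penalty: tags on many training bookmarks are too generic.
        idf = math.log((1 + n_bookmarks) / (1 + tag_freq.get(tag, 0)))
        # Strengthen tags the target user has used before.
        boost = 1.0 + user_boost if tag in user_tags else 1.0
        weights[tag] = score * centrality[tag] * idf * boost
    # Highest weight first; ties broken alphabetically for determinism.
    return sorted(weights, key=lambda t: (-weights[t], t))[:k]


training = [
    ["python", "programming", "tutorial"],
    ["python", "web", "programming"],
    ["recipes", "cooking"],
    ["programming", "java"],
    ["web", "design", "css"],
]
graph, freq = build_cooccurrence(training)
retrieved = {"python": 1.0, "programming": 0.8, "web": 0.5, "tutorial": 0.3}
print(recommend(retrieved, graph, freq, len(training), user_tags={"web"}, k=3))
```

In this sketch, centrality is computed only on the small candidate sub-graph rather than on the whole folksonomy, so the per-request work depends on the number of retrieved tags and their neighbour lists, which is consistent in spirit with the low computational cost discussed below.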
Our approach has two main benefits: a low computational cost, and the capability of providing diversity in the recommended tag sets. On the one hand, an index of keywords and tags for the available bookmarks, together with the global tag co-occurrence graph, are the only information resources needed. On the other hand, combining content-based features, tag popularity and personalisation in the recommendation process makes it possible to suggest tags that are not only relevant to the input bookmark, but may also belong to different domains.

A main drawback of our approach is that it can only recommend tags that already exist in the system folksonomy. The suggestion of new terms, for example extracted from the bookmarked text contents or from external knowledge bases such as dictionaries or thesauri, thus remains an open research line.

More investigation is needed to improve and evaluate the effectiveness of our tag recommender. In this context, the study of alternative graph vertex centrality measures (e.g. [11]) and the exploitation of extra folksonomic information obtained from the user and item spaces (e.g. as done in [6]) are priority tasks to address in the future. Our approach should also be evaluated against other state-of-the-art techniques.

Acknowledgments. This research was supported by the European Commission under contracts FP6-027122-SALERO, FP6-033715-MIAUCE and FP6-045032 SEMEDIA. The expressed content is the view of the authors but not necessarily the view of the SALERO, MIAUCE and SEMEDIA projects as a whole.

References

1. Adomavicius, G., Tuzhilin, A. 2005. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. In IEEE Transactions on Knowledge and Data Engineering, pp. 734-749.
2. Alfonseca, E., Moreno-Sandoval, A., Guirao, J. M., Ruiz-Casado, M. 2006. The Wraetlic NLP Suite. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006).
3. Byde, A., Wan, H., Cayzer, S. 2007. Personalized Tag Recommendations via Tagging and Content-based Similarity Metrics. In Proceedings of the 2007 International Conference on Weblogs and Social Media.
4. Chirita, P. A., Costache, S., Handschuh, S., Nejdl, W. 2007. P-TAG: Large Scale Automatic Generation of Personalized Annotation TAGs for the Web. In Proceedings of the 16th International Conference on World Wide Web (WWW 2007), pp. 845-854.
5. Heymann, P., Ramage, D., Garcia-Molina, H. 2008. Social Tag Prediction. In Proceedings of the 31st Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR 2008), pp. 531-538.
6. Hotho, A., Jäschke, R., Schmitz, C., Stumme, G. 2006. Information Retrieval in Folksonomies. In Proceedings of the 3rd European Semantic Web Conference (ESWC 2006), pp. 411-426.
7. Jäschke, R., Marinho, L., Hotho, A., Schmidt-Thieme, L., Stumme, G. 2008. Tag Recommendations in Social Bookmarking Systems. In AI Communications, 21, pp. 231-247.
8. Kleinberg, J. 1999. Authoritative Sources in a Hyperlinked Environment. In Journal of the ACM, 46(5), pp. 604-632.
9. Li, J., Zha, H. 2006. Two-way Poisson Mixture Models for Simultaneous Document Classification and Word Clustering. In Computational Statistics & Data Analysis, 50(1), pp. 163-180.
10. Mishne, G. 2006. AutoTag: A Collaborative Approach to Automated Tag Assignment for Weblog Posts. In Proceedings of the 15th International Conference on World Wide Web (WWW 2006), pp. 953-954.
11. Newman, M. E. J. 2005. A Measure of Betweenness Centrality Based on Random Walks. In Social Networks, 27, pp. 39-54.
12. Page, L., Brin, S., Motwani, R., Winograd, T. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report, Stanford InfoLab.
13. Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., Riedl, J. 1994. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. In Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work (CSCW 1994), pp. 175-186.
14. Salton, G., McGill, M. J. 1983. Introduction to Modern Information Retrieval. McGraw-Hill, Inc., New York.
15. Song, Y., Zhuang, Z., Li, H., Zhao, Q., Li, J., Lee, W. C., Giles, C. L. 2008. Real-time Automatic Tag Recommendation. In Proceedings of the 31st Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR 2008), pp. 515-522.
16. Tatu, M., Srikanth, M., D'Silva, T. 2008. RSDC'08: Tag Recommendations using Bookmark Content. In Proceedings of the ECML PKDD 2008 Discovery Challenge (RSDC 2008).
17. Xu, Z., Fu, Y., Mao, J., Su, D. 2006. Towards the Semantic Web: Collaborative Tag Suggestions. In Proceedings of the WWW 2006 Workshop on Collaborative Web Tagging.
18. Zha, H., He, X., Ding, C., Simon, H. 2001. Bipartite Graph Partitioning and Data Clustering. In Proceedings of the 10th ACM International Conference on Information and Knowledge Management (CIKM 2001), pp. 25-32.
Social Tag Prediction Base on Supervised Ranking Model

Hao Cao, Maoqiang Xie, Lian Xue, Chunhua Liu, Fei Teng, and Yalou Huang

College of Software, Nankai University, Tianjin, P.R. China
{caohao, xuelianlotus, hytjfxk, nktengfei}@mail.nankai.edu.cn, {xiemq, huangyl}@nankai.edu.cn

Abstract. Recently, social tag recommendation has gained increasing attention in web research, and many approaches have been proposed, which can be classified into two types: rule-based and classification-based approaches. However, rule-based approaches require considerable expert experience and manual work, and their generalization is limited. Classification-based approaches, which transform tag recommendation into a multi-class classification problem, face some essential barriers; for example, the tag collection is not fixed. In contrast, a ranking model, in which supervised learning can be used, is more suitable. In addition, the whole tag recommendation task can be divided into 4 subtasks according to whether the user and the resource already exist in the system. Different features are constructed for different subtasks, so that the available information can be fully exploited. The experimental results show that the proposed supervised ranking model performs well on the training and test datasets of RSDC 2008, as recovered by ourselves.

1 Introduction

Tags are a new way to index web resources: they help users categorize and share resources, and later search for them. In addition, the tags assigned by a specific user reveal that user's interests; therefore, based on the tags a user has already assigned, one can find other users with similar interests, as well as resources of similar interest. For these reasons, tagging is widely used in social networks such as BibSonomy, Del.icio.us, Last.fm, etc. A tag recommendation system can suggest a few tags for a given web resource, thus saving users time and effort when they mark up resources.
Further, the recommended tags and existing tags can be used to predict the user's profile and interest in the web resource, for example, to predict what users like and dislike. Research on tag recommendation is also relevant to other applications, such as online advertising. In online advertising, we can predict which advertisements a user might be interested in with the help of the surrounding text and his browsing history. Recently, social tag recommendation has gained increasing attention in web research, and it has become a hot topic in both industry and academia. For example, tag recommendation was one of the tasks in ECML RSDC'08. Now, in ECML