where: A[1], A[2], …, A[n] are the iteratively computed profile vectors of URLs and tags; E is a unit vector representing a tag or document entry node; M is a matrix representation of the bigraph, arranged by column or row according to the selected entry node; and α and β are parameters for adjusting the spreading activation.

(Step 3) Having constructed these profiles, we now have several options for retrieval. These profiles form the basis for similarity computations and lookups for retrieval, search, and recommendations. For example, for a given document, if we want to find more documents related to it, we have three options:

(a) We can simply look up the corresponding document profile, pick the highest-weighted documents in that profile, and return that set;

(b) We can compare the corresponding document or tag profile against all other document or tag profiles in the system, find the most similar profiles, and return the matching documents;

(c) If the document is not already in the bigraph, we can first use standard information retrieval techniques (for example, cosine similarity of the document word vectors) to find the most similar document that is in our bigraph, and then use method (a) or (b) above to find related documents in our bigraph.

If, instead, we want to find tags related to a given document, we can:

(a) Look up the corresponding tag profile for that document, choose the highest-weighted tags in that profile, and return that set;

(b) Compare the corresponding document/tag profile against the profiles of all other tags in the system, find the most similar profiles, and return the matching tags;

(c) If the document is not already in the bigraph, first use a standard information retrieval technique to find the most similar document that is in our bigraph, and then use method (a) or (b) above to find related documents in our bigraph.
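Since the update equation itself does not survive in this text, the sketch below assumes a common spreading-activation form, A[k+1] = αE + βMA[k]; the function name, the toy bigraph, and the parameter defaults are illustrative, not the paper's actual code:

```python
import numpy as np

def spread_activation(M, entry, alpha=1.0, beta=0.5, n_iter=3):
    """Iteratively spread activation from a single entry node.

    M     : adjacency matrix over all bigraph nodes (URLs and tags)
    entry : index of the tag or document entry node (the unit vector E)
    alpha, beta : parameters adjusting the spreading activation
    """
    E = np.zeros(M.shape[0])
    E[entry] = 1.0
    A = E.copy()
    for _ in range(n_iter):
        # Assumed update rule: keep injecting activation at the entry
        # node while spreading a damped fraction along bigraph edges.
        A = alpha * E + beta * (M @ A)
    return A

# Toy bigraph: nodes 0-1 are URLs, nodes 2-3 are tags; URL 0 carries
# both tags, URL 1 carries only tag 2.
M = np.array([[0, 0, 1, 1],
              [0, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)

profile = spread_activation(M, entry=2)  # profile vector for tag node 2
```

After a few iterations, the profile ranks URL 0 (reachable through both tags) above URL 1, which is the kind of ordering the precomputed profiles capture.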
For a given tag, if we want to find related documents or related tags, we can again use methods (a) and (b) described above, provided the tag already exists in our bigraph. If the given tagging keyword is not in the bigraph, we can first perform a standard keyword search to find an initial set of related documents and tags, and then further refine the result set with the methods above.

Multi-word queries
For any query consisting of multiple tags, the system looks up the corresponding document profiles for those tags and simply adds these profiles together to find the most relevant documents. Given that the tag profiles are expressed as spreading activation vectors, addition simply sums the activation values, creating the equivalent of an OR query on the tag keywords. For a stricter search simulating an AND query, instead of adding the activation vectors together, we only include a result in the set when the URL has actually been tagged with all of the query keywords.

Negative Keywords
For relevance feedback, the user can tell the MrTaggy system that a tag represents concepts that she is not interested in. To handle these 'negative keywords', the system subtracts the corresponding document profile from the activation vector instead of adding it.

Figure 5. Overall dataflow and architectural diagram of the MrTaggy implementation of the TagSearch algorithm

IMPLEMENTATION BASED ON MAP-REDUCE
Having described the algorithms, we now turn to the
description of how we actually implemented the algorithm in a real system. Figure 5 shows an architecture diagram of the overall system we released on the Web, called MrTaggy.com.

First, a crawling module goes out to the Web and crawls social tagging sites, looking for tuples of the form <User, URL, Tag, Time>. We keep track of these tuples in a MySQL database. In our current system, we have roughly 140 million tuples.

A MapReduce system based on Bayesian inference and spreading activation then computes the probability of each URL or tag being relevant given a particular combination of other tags and URLs. As described above, we first construct a bigraph between URLs and tags based on the tuples and then precompute spreading activation patterns across the graph. To do this backend computation in a massively parallel way, we used the MapReduce framework provided by Hadoop (hadoop.apache.org). The results are stored in a Lucene index (lucene.apache.org) so that we can make the retrieval of spreading activation patterns as fast as possible.

Finally, a Web server serves up the search results along with an interactive frontend. The frontend responds to user interaction with the relevance feedback arrows by communicating with the Web server using AJAX techniques and animating the interface to an updated state.

In terms of data flow, when the user first issues a query, the Web server looks up the related tag recommendations as well as the URL recommendations in the Lucene index and returns the results back to the frontend client. The client presents the results to the user with the arrow buttons as relevance feedback mechanisms. When the user presses one of the arrow buttons, the client issues an updated query to the Web server, and a new result set is returned to the client.

MRTAGGY BROWSING/SEARCH INTERFACE
Having just described how the algorithm operates in the backend, we now describe the interaction of the relevance feedback part of the system.
Figure 6 shows a typical view of the tag search browser, which is available publicly at MrTaggy.com.

Figure 6. MrTaggy UI with “search tags” section for added tags and “bad tags” section for excluded tags (both on the left)
The left side of the figure shows how the system displays the recommended tags. The right side lists the top document recommendations based on the input so far.

MrTaggy provides explicit search capabilities (search box and search results list) combined with relevance feedback [1, 21] for query refinements. Users have the opportunity to give relevance feedback to the system in two different ways:

Related Page Feedback: By clicking on the downward arrow, a search result can be excluded from the results list, whereas by clicking on the upward arrow, the search result can be emphasized, which in turn emphasizes other similar Web pages.

Related Tag Feedback: The left side of the user interface presents a related tags list (see Figure 6), which is an overview of other tags related to the keywords typed into the search box. For each related tag, up and down arrows are displayed to enable the user to give relevance feedback. The arrows here can be used for query refinements, either by adding a relevant tag or by excluding an irrelevant one.

In addition, users can refine the search results using tags associated with each of the search results. During search, result snippets (see Figure 7) are displayed in the search results list. In addition to the title and the URL of the corresponding Web page, each snippet displays a series of tags instead of a short summary description. Other users have used these tags to label the corresponding Web page. When hovering over tags presented in the snippet, up and down arrows are displayed to enable relevance feedback on these tags as well.

Users’ relevance feedback actions lead to an immediate reordering or filtering of the results list, since the relevance feedback and the search result list are tightly coupled in the interface. We use animations to display the reordering of the search results, which emphasizes the changes that occurred in the result list (see video at http://www.youtube.com/watch?v=gwYbonHI5ss).
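The tight coupling between feedback clicks and the result list follows directly from the vector arithmetic described earlier: an up arrow adds a tag's activation profile to the query vector, a down arrow subtracts it, and the results are then re-sorted. A minimal sketch with made-up profile values (not real system data):

```python
import numpy as np

# Toy activation profiles over four URLs, one vector per tag.
profiles = {
    "airport": np.array([0.9, 0.4, 0.6, 0.0]),
    "travel":  np.array([0.5, 0.2, 0.4, 0.1]),
    "history": np.array([0.0, 0.3, 0.8, 0.0]),
}

def score(search_tags, bad_tags=()):
    """Add profiles of included tags, subtract those of excluded tags."""
    query = np.zeros(4)
    for tag in search_tags:
        query += profiles[tag]
    for tag in bad_tags:
        query -= profiles[tag]
    return query

def ranked(search_tags, bad_tags=()):
    """Return URL indices ordered by descending activation."""
    return list(np.argsort(-score(search_tags, bad_tags)))

# A down-arrow on "history" demotes URL 2 and reorders the list.
before = ranked(["airport", "travel"])
after = ranked(["airport", "travel"], bad_tags=["history"])
```

Because the re-ranking is just vector addition and a sort over precomputed profiles, the frontend can show the reordered list immediately after each click.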
New search results due to the refinements are marked with a yellow stripe.

Quick Links: A recently added feature of the interface is the mashup of quick search results from the Yahoo! BOSS search API. In parallel with the TagSearch process, we issue a Yahoo! search to obtain the top 1-2 results and display those results as quick links. In this way, if our bookmarks do not offer any good suggestions, the user can use these quick links to get started on their topic explorations as well.

SUMMARY OF THE EVALUATION
We recently completed a 30-subject study of MrTaggy; Kammerer et al. describe the study in detail [15]. In this study, we analyzed the interaction and UI design. The main aim was to understand whether and how MrTaggy is beneficial for domain learning.

We compared the full exploratory MrTaggy interface to a baseline version of MrTaggy that only supported traditional query-based search. In a learning experiment, we tested participants’ performance in three different topic domains and three different task types. The results are summarized below:

(1) Subjects using the full exploratory MrTaggy interface took advantage of the additional features provided by relevance feedback, without giving up their usual manual query typing behavior. They also spent more time on tasks and appeared to be more engaged in exploration than the participants using the baseline system.

(2) For learning outcomes, subjects using the full exploratory system generally wrote summaries of higher quality than baseline system users.

(3) To further gauge learning outcomes, we asked subjects to generate as many keywords relevant to the topic domain as possible within a time limit. Subjects using the exploratory system were able to generate more reasonable keywords than the baseline system users for topic domains of medium and high ambiguity, but not for the low-ambiguity domain.
Our findings regarding the use of our exploratory tag search system are promising. The empirical results show that subjects can effectively use data generated by social tagging as “navigational advice”. The tag-search browser has been shown to support users in their exploratory search process. Users’ learning and investigation activities are fostered by both the relevance feedback mechanisms and the related tag suggestions, which give scaffolding support to domain understanding. The experimental results suggest that users’ explorations in unfamiliar topic areas are supported by the domain keyword recommendations and the opportunity for relevance feedback.

Figure 7. The 3 parts of a search result snippet in the MrTaggy interface: title, tags, URL

CONCLUSION
For exploratory tasks, information seeking systems should focus on providing cues that might make these explorations
more efficient. One possible solution is to build social information seeking systems, in which search systems utilize social cues provided by a large number of other people. Social bookmarks provide one such exploratory cue that systems can utilize as navigational signposts in a topic space.

In this paper, we described the detailed implementation of the TagSearch algorithm. We also summarized a past study on the effectiveness of the exploratory tool. Since social search engines that depend on social cues rely on data quality and increasing coverage of the explorable web space, we expect that the constantly increasing popularity of social bookmarking services will improve social search browsers like MrTaggy. The results of this project point to the promise of social search to fulfill a need in providing navigational signposts to the best content.

REFERENCES
1. Ackerman, M.S. and McDonald, D.W. Answer Garden 2: merging organizational memory with collaborative help. Proc. CSCW '96, ACM Press (1996), 97-105.
2. Baeza-Yates, R. and Ribeiro-Neto, B. Modern Information Retrieval. Addison Wesley, 1999.
3. Bush, V. As We May Think. The Atlantic Monthly, 176, 1 (1945), 101-108.
4. Chi, E.H. and Mytkowicz, T. Understanding the Efficiency of Social Tagging Systems using Information Theory. Proc. Hypertext 2008, ACM Press (2008), 81-88.
5. Chi, E.H., Pirolli, P., Chen, K. and Pitkow, J. Using Information Scent to Model User Information Needs and Actions on the Web. Proc. CHI 2001, ACM Press (2001), 490-497.
6. Cutting, D., Karger, D., Pedersen, J. and Tukey, J.W. Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections. Proc. of the 15th Annual International ACM SIGIR Conference, Copenhagen, 1992.
7. Evans, B. and Chi, E.H.
Towards a Model of Understanding Social Search. Proc. CSCW 2008, ACM Press (2008), 485-494.
8. Fitzpatrick, L. and Dent, M. Automatic feedback using past queries: social searching? Proc. 20th Annual Intern. ACM SIGIR Conference, ACM Press (1997), 306-313.
9. Furnas, G.W., Fake, C., von Ahn, L., Schachter, J., Golder, S., Fox, K., Davis, M., Marlow, C. and Naaman, M. Why do tagging systems work? Extended Abstracts CHI 2006, ACM Press (2006), 36-39.
10. Furnas, G.W., Landauer, T.K., Gomez, L.M. and Dumais, S.T. The vocabulary problem in human-system communication. Communications of the ACM, 30 (1987), 964-971.
11. Glance, N.S. Community search assistant. Proc. IUI 2001, ACM Press (2001), 91-96.
12. Golder, S. and Huberman, B.A. Usage Patterns of Collaborative Tagging Systems. Journal of Information Science, 32, 2 (2006), 198-208.
13. Heymann, P., Koutrika, G. and Garcia-Molina, H. Can Social Bookmarking Improve Web Search? Proc. WSDM '08, February 11-12, 2008, Stanford, CA.
14. Hong, L., Chi, E.H., Budiu, R., Pirolli, P. and Nelson, L. SparTag.us: Low Cost Tagging System for Foraging of Web Content. Proc. AVI 2008, ACM Press (2008), 65-72.
15. Kammerer, Y., Nairn, R., Pirolli, P. and Chi, E.H. Signpost from the masses: learning effects in an exploratory social tag search browser. Proc. CHI 2009, ACM Press (2009), 625-634.
16. Kittur, A., Chi, E.H. and Suh, B. Crowdsourcing user studies with Mechanical Turk. Proc. CHI 2008, ACM Press (2008), 453-456.
17. Lee, K.J. What Goes Around Comes Around: An Analysis of del.icio.us as Social Space. Proc. CSCW '06, ACM Press (2006), 191-194.
18. Marchionini, G. Exploratory search: From finding to understanding. Communications of the ACM, 49, 4 (2006), 41-46.
19. Millen, D., Yang, M., Whittaker, S. and Feinberg, J.
Social bookmarking and exploratory search. In L. Bannon, I. Wagner, C. Gutwin, R. Harper, and K. Schmidt (eds.), Proc. ECSCW '07, Springer (2007), 21-40.
20. Raghavan, V.V. and Sever, H. On the Reuse of Past Optimal Queries. Proc. SIGIR '95, ACM Press (1995), 344-350.
21. Shneiderman, B., Byrd, D. and Croft, W.B. Clarifying search: a user-interface framework for text searches. D-Lib Magazine, 3, 1 (1997). Available at http://www.dlib.org/dlib/january97/retrieval/01shneiderman.html.
22. Simon, H.A. The structure of ill structured problems. Artificial Intelligence, 4, 3-4 (1973), 181-201.
23. White, R.W., Drucker, S.M., Marchionini, G., Hearst, M. and schraefel, m.c. Exploratory search and HCI: designing and evaluating interfaces to support exploratory search interaction. Extended Abstracts CHI 2007, ACM Press (2007), 2877-2880.
Tags Meet Ratings: Improving Collaborative Filtering with Tag-Based Neighborhood Method

Zhe Wang, National Engineering Research Center of Fundamental Software, Institute of Software, Chinese Academy of Sciences; Graduate University of Chinese Academy of Sciences. wangzhe07@iscas.ac.cn

Yongji Wang, National Engineering Research Center of Fundamental Software, Institute of Software, Chinese Academy of Sciences. ywang@itechs.iscas.ac.cn

Hu Wu, National Engineering Research Center of Fundamental Software, Institute of Software, Chinese Academy of Sciences. wuhu@itechs.iscas.ac.cn

ABSTRACT
Collaborative filtering (CF) is a method for personalized recommendation. The sparsity of rating data seriously impairs the quality of CF's recommendations. Meanwhile, online users generate more and more tag information that implies their preferences. Exploiting these tag data is a promising means of alleviating the sparsity problem. Although the intention is straightforward, no existing solution makes full use of tags to improve the recommendation quality of traditional rating-based collaborative filtering approaches. In this paper, we propose a novel approach that fuses a tag-based neighborhood method into traditional rating-based CF. The tag-based neighborhood method is employed to find similar users and items, and this neighborhood information helps the subsequent CF procedure produce higher-quality recommendations. Experiments show that our approach outperforms state-of-the-art ones.
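The fusion described in the abstract can be sketched in miniature. The following is only an illustration with hypothetical data and names; it uses raw tag-frequency cosine similarity to pick neighbors, standing in for the LDA-based tag profiles the paper actually develops, and then predicts a rating as a similarity-weighted average of those neighbors' ratings:

```python
from collections import Counter
from math import sqrt

# Hypothetical data: each user's tags, and each user's ratings (item -> score).
user_tags = {
    "u1": ["scifi", "space", "robots"],
    "u2": ["scifi", "space", "aliens"],
    "u3": ["romance", "drama"],
}
user_ratings = {
    "u1": {"m1": 5.0, "m2": 3.0},
    "u2": {"m1": 4.0, "m3": 2.0},
    "u3": {"m2": 1.0, "m3": 5.0},
}

def tag_cosine(u, v):
    """Cosine similarity between two users' tag-frequency vectors."""
    a, b = Counter(user_tags[u]), Counter(user_tags[v])
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(c * c for c in a.values()))
    nb = sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict(user, item, k=2):
    """Predict a rating as the tag-similarity-weighted average of the
    k most tag-similar users who have rated the item."""
    neighbors = sorted(
        (v for v in user_ratings if v != user and item in user_ratings[v]),
        key=lambda v: tag_cosine(user, v),
        reverse=True,
    )[:k]
    num = sum(tag_cosine(user, v) * user_ratings[v][item] for v in neighbors)
    den = sum(tag_cosine(user, v) for v in neighbors)
    return num / den if den else None

print(predict("u1", "m3"))  # → 2.0, dominated by the tag-similar user u2
```

The point of the sketch is that neighbor selection comes from tag data rather than from the sparse rating matrix, so a prediction is possible even when two users share few or no co-rated items.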
ACM Classification Keywords
H.3.3 Information Storage and Retrieval: Information Search and Retrieval - Information filtering

General Terms
Algorithms, Experimentation

Author Keywords
Tags, Latent Dirichlet Allocation (LDA), Collaborative Filtering, Neighborhood Method

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Workshop SRS'10, February 7, 2010, Hong Kong, China. Copyright 2010 ACM 978-1-60558-995-4... $10.00

INTRODUCTION
Nowadays people are inundated with choices. Personalized recommendation is a solution to this problem. Various kinds of recommender systems are employed for a better user experience, and collaborative filtering [4, 12] is one of the best techniques of choice among them. This technique tries to identify users with relevant interests by calculating similarities among user profiles. The idea is that one's search for information may benefit from consulting the behavior of other users who share the same or relevant interests.

Because collaborative filtering recommendation depends on the preferences of users with the same or relevant interests, the similarity computation has a significant influence on the quality of recommendation. Early item-based and user-based collaborative filtering approaches find similar users or items (neighbors) by calculating the Pearson correlation coefficient [23]. These approaches are efficient and effective, but simply comparing the rating records of different users or items cannot always find the best neighbors.
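The classical neighbor computation mentioned above can be sketched as follows. This is a minimal, self-contained illustration with hypothetical rating data, not the paper's implementation: it computes the Pearson correlation between two users over the items they have both rated.

```python
from math import sqrt

# Hypothetical rating records: user -> {item: rating}.
ratings = {
    "u1": {"a": 4.0, "b": 2.0, "c": 5.0},
    "u2": {"a": 5.0, "b": 1.0, "c": 4.0},  # rates like u1
    "u3": {"a": 1.0, "b": 5.0, "c": 2.0},  # rates opposite to u1
}

def pearson(u, v):
    """Pearson correlation over the items both users have rated."""
    common = list(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0  # too little overlap to correlate
    ru = [ratings[u][i] for i in common]
    rv = [ratings[v][i] for i in common]
    mu, mv = sum(ru) / len(ru), sum(rv) / len(rv)
    cov = sum((x - mu) * (y - mv) for x, y in zip(ru, rv))
    su = sqrt(sum((x - mu) ** 2 for x in ru))
    sv = sqrt(sum((y - mv) ** 2 for y in rv))
    return cov / (su * sv) if su and sv else 0.0

print(pearson("u1", "u2"))  # strongly positive: u2 is a good neighbor for u1
print(pearson("u1", "u3"))  # strongly negative: u3 rates in the opposite way
```

The early-exit when fewer than two co-rated items exist is exactly where this approach breaks down on sparse data: a user with few ratings, or ratings concentrated on unpopular items, yields little or no overlap with anyone, which is the weakness the paper's tag-based neighborhoods address.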
If a user has only a few ratings, or gives ratings only to unpopular items, it is difficult for those approaches to find proper neighbors.

Recently, matrix factorization approaches have gained popularity because of their higher recommendation quality and lower online costs. One of their most significant differences from earlier approaches is that they extract "features" of the users and the items: they decompose the original preference matrix into several low-rank approximations [15]. For the items, every feature reflects the preference of a group of similar users; for the users, every feature reflects their preference for a collection of similar items. By virtue of extracting users' and items' features, matrix factorization approaches are able to find better neighbors and hence produce better recommendations.

Despite the merits mentioned before, the existing matrix factorization approaches [6, 7, 8, 16, 26] fail to ex-