and helping users understand how the recommendation process works [88]. This point will return when we discuss Human-Recommender Interaction in Chapter 6.

3. Focus on accuracy

Many previous papers have focused on the accuracy of recommender algorithms (for example, see [13, 55, 59, 131]). The methodologies employed by these papers have stressed the importance of generating a list full of individually accurate recommendations, not an accurate recommendation list. We discuss this issue later in this chapter.

4. Focus on efficiency

Previous research has also focused on ways to make the recommendation process more efficient. Methods such as clustering [15, 147] and SVD [129] have been applied to this problem to alleviate sparsity and speed computation time; content-boosting approaches [92] have also been used to alter the ratings matrix, filling in otherwise empty cells. But what effect does making an algorithm more efficient have on the usefulness of the recommendation lists it generates?
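To make the dimensionality-reduction idea concrete, the following is a minimal sketch (not drawn from any of the cited systems) of how a truncated SVD can smooth a sparse ratings matrix; the toy matrix, the mean-filling step, and the rank k are all illustrative assumptions.

```python
# A minimal sketch (not from any cited system): truncated SVD over a small
# ratings matrix. Both the toy data and the rank k below are illustrative.
import numpy as np

# Toy user x item ratings matrix; 0 marks an unrated cell.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)

# Fill unrated cells with each item's mean rating before factoring.
item_means = R.sum(axis=0) / (R != 0).sum(axis=0)
R_filled = np.where(R == 0, item_means, R)

# Keep only the top-k singular values: a low-rank approximation that both
# shrinks the data and generalizes over the originally missing entries.
k = 2
U, s, Vt = np.linalg.svd(R_filled, full_matrices=False)
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Predicted scores for the items the first user has not yet rated.
user = 0
predictions = {item: round(R_hat[user, item], 2)
               for item in range(R.shape[1]) if R[user, item] == 0}
print(predictions)
```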
Problems with Collaborative Recommender Algorithms

Collaborative algorithms, such as Collaborative Filtering, Naive Bayes Classifiers, and PLSA, are "domain independent" in that they perform no content analysis of the items in the domain. Rather, they rely on user opinions to generate recommendations. Despite being a successful technique in many domains, collaborative algorithms have their share of shortcomings [6].

1. The Cold-Start Problem (a.k.a. the First-Rater Problem)

When a collaborative system is first created, there are many items in the system, few users, and no ratings. Without ratings, the system cannot generate recommendations and users see no benefit; without users, there is no way for new ratings to be entered into the system. When applying these algorithms to a new domain, it is valuable to seek preexisting data that can be used to seed such a database of ratings. In the case of MovieLens, the freely available EachMovie dataset was used to "jump start" the system [55]. This problem has been framed and explored in [120, 133].

2. The New-User Problem

Before a user can take advantage of a collaborative recommender system, the user must first provide their opinions. The user has to trust that the system's recommendations will be worth the effort of entering opinions; this is difficult because the trust is needed before the user starts using the system. Getting new users into a recommender is a fruitful area of research we have explored in other publications [88, 89, 117]. This problem is common to other varieties of recommenders as well, but is more severe for collaborative recommenders, since these recommenders cannot rely on content or categories to ease a user into the system.

3. The Sparsity Problem

In many domains, a user is likely to rate only a very small percentage of the available items. This can make it difficult to find agreement among individuals, since their ratings may have little overlap. Different recommender algorithms deal with this problem in various ways. Item-based CF uses similarity measures between items; if we assume there are fewer items than users in a recommender (as is commonly the case), then Item-based CF reduces the impact of sparsity on the same dataset compared to algorithms using a user-item ratings matrix. Statistics-based or latent-analysis algorithms, such as Naive Bayes and PLSI, also work in sparse situations, mining all connection data to generate recommendations [63, 75, 158].

Problems with Content-based Recommender Algorithms

Content-Based Filtering (CBF) is also commonly used in recommender systems. Applied mostly in textual domains, such as news [9], CBF recommends items to a user if those items are similar in content to items the user has liked in the past. Many algorithms, most notably TF-IDF [127], have been used in CBF systems. CBF has many strengths, including the ability to generate recommendations over all items in the domain.
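Before turning to its shortcomings, the following is a minimal sketch of the TF-IDF-plus-cosine-similarity core of a simple CBF recommender; it is not taken from any cited system, and the documents, whitespace tokenization, and weighting details are illustrative assumptions.

```python
# A minimal sketch (not from any cited system): TF-IDF vectors compared with
# cosine similarity, the core of a simple content-based recommender.
import math
from collections import Counter

docs = {
    "paper_a": "collaborative filtering for movie recommendation",
    "paper_b": "content based filtering of news articles",
    "paper_c": "movie recommendation with matrix factorization",
}

def tfidf(tokens, corpus):
    """Weight each term by its frequency and its rarity across the corpus."""
    tf = Counter(tokens)
    return {
        term: (count / len(tokens)) * math.log(len(corpus) / sum(term in d for d in corpus))
        for term, count in tf.items()
    }

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

corpus = [text.split() for text in docs.values()]
vectors = {name: tfidf(text.split(), corpus) for name, text in docs.items()}

# Recommend the unseen item whose content is closest to one the user liked.
liked = "paper_a"
scores = {name: cosine(vectors[liked], vec) for name, vec in vectors.items() if name != liked}
print(max(scores, key=scores.get))  # paper_c shares the most content with paper_a
```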
CBF also has its shortcomings [6].

1. Content limitations in domains

In non-textual domains like movies and audio, many content algorithms cannot successfully and reliably analyze item content. Rich metadata, such as actors, directors, and artists, has been improving recommendations in this area, but does not attack the problem of analyzing non-textual content.

2. Analysis of quality and taste

Subjective aspects of an item's content, such as the style and quality of the writing or the authoritativeness of the author, are hard to analyze. Writing samples can be analyzed grammatically, and thus some level of quality assessment can be achieved. But this is not a semantic analysis; the meaning of the content cannot be easily determined through automatic methods.

3. Narrow content analysis

CBF recommends items similar in content to previous items, and cannot produce recommendations for items that have different but related content. Recently, lexicons and advanced algorithms have made improvements in this area, but they are still costly to use [5, 8].

Problems with Knowledge-based Recommender Algorithms

Knowledge-based recommenders, such as Case-Based Reasoning, also have their share of problems. By using both content information and content-independent knowledge rules, these kinds of recommenders have unique qualities [17].

1. Focus on Domain Attributes

Not all domains have a rich set of attributes. In such domains, a knowledge-based recommender would have a difficult time gathering and processing knowledge states.

2. The Constraint Satisfaction Problem

By using knowledge rules to select recommendation items, it is possible to be in a state where no items satisfy the constraints imposed by the user and the rules. Learning how to relax some rules is an area of active research [14].
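As a rough illustration of what rule relaxation can look like (a sketch of the general idea only, not the approach of [14]), the following drops the lowest-priority constraint until at least one item satisfies the remaining rules; the item attributes and constraint ordering are invented for the example.

```python
# A minimal sketch (not from any cited system): a knowledge-based recommender
# that relaxes its lowest-priority constraint when nothing satisfies them all.
items = [
    {"title": "camera_a", "price": 900, "zoom": 10, "weight": 700},
    {"title": "camera_b", "price": 450, "zoom": 4, "weight": 300},
    {"title": "camera_c", "price": 300, "zoom": 3, "weight": 250},
]

# Constraints ordered from most to least important.
constraints = [
    ("price",  lambda item: item["price"] <= 500),
    ("zoom",   lambda item: item["zoom"] >= 5),
    ("weight", lambda item: item["weight"] <= 400),
]

def recommend(items, constraints):
    """Drop the least important constraint until at least one item survives."""
    active = list(constraints)
    while active:
        survivors = [i for i in items if all(check(i) for _, check in active)]
        if survivors:
            return survivors, [name for name, _ in active]
        active.pop()  # relax the lowest-priority constraint and try again
    return items, []

matches, used = recommend(items, constraints)
print([i["title"] for i in matches], "using constraints:", used)
```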
Citation Indexing and Recommending Research Papers

There have been many recommender and personalization systems targeted at recommending research papers or research colleagues. Referral Web used collaborative filtering to recommend professional referrals mined from citations and other aspects of a social network [68]. In addition to automatic citation indexing, CiteSeer provides citation analysis and recommendations for papers in its system [10, 28, 72, 73]. Recently, Relescope created personalized conference schedules for attendees based on their publication history and an analysis of the conference program [36]. While the system was well received, users had varying opinions of its usefulness. Specifically, inexperienced users (users new to the field) found it quite helpful for locating important and interesting people, but experienced users found it of little practical value even though they strongly liked the idea: not all users want the same information.

Search interfaces also exist, such as Google Scholar [45] for papers. Such interfaces are great when searching for particular items. Others have suggested using recommendation as a tool within digital libraries [40], and have even demonstrated limited implementations [64]. While we agree with the principles of this work, we note that such systems need to support a variety of users and tasks, something we are proposing in this work. Finally, as digital libraries become more a part of our lives, people will use them to share information [79], suggesting the importance of group support applications.

It is often interesting to measure the impact that a particular citation or journal has in its field [11, 31, 159]; ISI's Web of Knowledge uses one such example metric [38, 57], but not everyone agrees that it is the best metric to use [47]. Citation analysis and bibliometrics [12] can be used to create social networks between authors [134]. Not only are the patterns of authors interesting [103], there are parallels between friendship and co-citation patterns [152], and they can be used to generate recommendations [81]. These findings agree with our ideas that citation patterns can be successfully used to generate research paper recommendations.
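As a rough illustration of how citation patterns can drive recommendations (a sketch of the general idea, not the method of any cited work), the following ranks papers by the overlap of their reference lists using Jaccard similarity; the papers and references are invented for the example.

```python
# A minimal sketch (not from any cited system): recommending papers by
# citation overlap. Each paper is represented by the set of papers it cites,
# and Jaccard similarity between citation sets drives the ranking.
citations = {
    "paper_a": {"ref_1", "ref_2", "ref_3"},
    "paper_b": {"ref_2", "ref_3", "ref_4"},
    "paper_c": {"ref_5", "ref_6"},
}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(liked_paper, k=1):
    """Rank other papers by how much their citation lists overlap."""
    scores = {
        other: jaccard(citations[liked_paper], refs)
        for other, refs in citations.items()
        if other != liked_paper
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("paper_a"))  # paper_b shares two references with paper_a
```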
Theories of Information Seeking

Information seeking theory provides us with a framework to understand user information needs and context. Models of information seeking behavior, including Taylor's four stages of information need, Wilson's Mechanisms and Motivations model, Dervin's theory of Sense-Making, and Kuhlthau's Information Search Process, reveal the ways in which emotion, uncertainty, and compromise affect the quality and nature of a user's information search and its results [20, 22, 70].

In his influential paper, Taylor proposed four stages at which people perceive information needs: a visceral need, a conscious need, a formalized need, and a compromised need [142]. We will focus on the last stage: it suggests that a user will tailor (compromise) their need when explaining it to an information repository. The user will use a language that he thinks will be better suited to the repository. With a human librarian, this compromise could easily be mitigated by insight and pointed question asking [153], but it is unclear how a computer could do the same.

Expanding to the entire search process, Kuhlthau's model has six stages: Initiation, Selection, Exploration, Formulation, Collection, and Presentation [70]. Her model is important because it suggests there is an exploration aspect to all information seeking processes that happens before the detailed search begins (in Formulation and Collection). Further, she suggests that each stage has different feelings attached to it. For example, when starting a search process, many users feel nervous or concerned, but develop confidence during exploration.

Adding to this idea is Dervin's Sense-Making theory, which suggests that information seeking comes from a need to make sense of the world. As such, users will settle for answers that satisfy their emotional concern even at the expense of accuracy [20, 22]. For example, if a user is frustrated with a search, it could be because there is dissonance between the user's understanding of the world and the reality provided by her