In addition to Web browsing, IMAs have also been used to support other information-related activities, such as writing research papers. Budzik and Hammond (1999) designed an IMA called WATSON that observed user interaction with a small range of everyday applications (e.g., Netscape Navigator, Microsoft Internet Explorer, and Microsoft Word). It constructed queries based on keywords extracted from the documents or Web pages being viewed or edited by the user and sent those queries to search engines. They report that a user study showed that 80% of the participants received at least one relevant recommendation. The STUFF I’VE SEEN system designed by Dumais et al. (2003) performed a similar function, but was specifically geared towards re-finding documents or Web pages the user had seen before. Certain IMAs focus particularly on acting as writing assistants that locate relevant references, such as the REMEMBRANCE AGENT by Rhodes and Maes (2000), the PIRA system by Gruzd and Twidale (2006), and the À PROPOS project by Puerta Melguizo et al. (2008). Finally, the well-known CITESEER system (http://citeseer.ist.psu.edu/) originally also offered personalized reference recommendations by tracking the user’s interests using both content-based and citation-based features (Bollacker et al., 2000).

Significant work on recommending references for research papers has also been done by McNee (2006), who approached it as a separate recommendation task and compared different classes of recommendation algorithms through both system-based evaluation and user studies. In McNee et al. (2002), five different CF algorithms were compared on the task of recommending research papers, with the citation graph between papers serving as the matrix of ratings commonly used for CF. Here, the citation lists were taken from each paper and the cited papers were represented as the ‘items’ for the citing paper. The citing paper itself was represented as the ‘user’ in the matrix. By using the citation graph in this way, rather than representing real users in the ratings matrix, they were able to circumvent the cold-start problem; this problem is much less pronounced when recommending for social bookmarking websites, because users have already implicitly rated many items by adding them to their personal profiles. Citing papers could also be included as items if they are cited themselves. McNee et al. (2002) compared user-based and item-based filtering algorithms with a Naive Bayes classifier and two graph-based algorithms. The first graph-based algorithm ranked items on the number of co-citations with the citations referenced by a ‘user’ paper; the other considered all papers two steps away in the citation graph and ranked them on tf·idf-weighted term overlap between the paper titles. They used 10-fold cross-validation to evaluate their algorithms using a rank-based metric and found that user-based and item-based filtering performed best. In a subsequent user study, these algorithms were also the best performers, as they generated the most novel and most relevant recommendations. A similar, smaller-scale approach to using the citation graph was taken by Strohman et al. (2007), who only performed a system-based evaluation. In later work, McNee et al. (2006) compared four algorithms: user-based filtering, standard content-based filtering using tf·idf weighting, a Naive Bayes classifier, and a Probabilistic Latent Semantic Analysis algorithm. They defined four different reference recommendation subtasks: (1) filling out reference lists, (2) maintaining awareness of a research field, (3) exploring a research interest, and (4) finding a starting point for research (McNee et al., 2006). They evaluated the performance of their algorithms on these four tasks and found that user-based filtering performed best, with content-based filtering a close second.
In addition, they found that certain algorithms are better suited to certain tasks than others.
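To make this use of the citation graph concrete, the sketch below builds the ‘citing papers as users, cited papers as items’ view from a toy citation graph and ranks candidate references by their co-citation counts with a paper’s existing references, in the spirit of the first graph-based algorithm of McNee et al. (2002). The data and function names are invented for illustration; this is a minimal sketch of the idea, not a reconstruction of their implementation.

```python
from collections import defaultdict
from itertools import combinations

# Toy citation graph: each citing paper acts as a 'user', and the
# papers in its reference list act as the 'items' it has rated.
citations = {
    "paper_A": {"p1", "p2", "p3"},
    "paper_B": {"p2", "p3", "p4"},
    "paper_C": {"p3", "p4", "p5"},
}

# Count co-citations: how often two papers appear together
# in the same reference list.
cocited = defaultdict(int)
for refs in citations.values():
    for a, b in combinations(sorted(refs), 2):
        cocited[(a, b)] += 1
        cocited[(b, a)] += 1

def recommend(citing_paper, k=3):
    """Rank unseen papers by total co-citations with the papers
    the citing paper already references."""
    own_refs = citations[citing_paper]
    scores = defaultdict(int)
    for (a, b), n in cocited.items():
        if a in own_refs and b not in own_refs:
            scores[b] += n
    return sorted(scores.items(), key=lambda x: -x[1])[:k]

print(recommend("paper_A"))  # [('p4', 3), ('p5', 1)]
```

Their second graph-based algorithm would instead expand two steps out in this graph and re-rank the candidates by tf·idf-weighted term overlap between paper titles.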
Routing and Assigning Papers for Reviewing A task related to recommending references is routing and assigning papers to program committee members for review. Papers are normally assigned to reviewers manually, based on expertise keywords entered by the reviewers themselves or on other committee members’ knowledge of their expertise. Dumais and Nielsen (1992) were among the first to investigate an automatic solution to this problem of paper assignment. They acquired textual representations of the submitted papers in the form of titles and abstracts, and used Latent Semantic Indexing, a dimensionality reduction technique, to match these against representations of the reviewers’ expertise, supplied by the reviewers in the form of past paper abstracts. With their work, Dumais and Nielsen (1992) showed it was possible to automate the task acceptably. Later approaches include Yarowsky and Florian (1999), Basu et al. (2001), Ferilli et al. (2006), and Biswas and Hasan (2007). All of them use the sets of papers written by the individual reviewers as content-based expertise evidence to match those reviewers to submitted papers, using a variety of different algorithms. The most extensive work was done by Yarowsky and Florian (1999), who performed their experiments on the papers submitted to the ACL ’99 conference. They compared both content-based and citation-based evidence for allocating reviewers and found that combining both types resulted in the best performance. However, most of the work in this subfield is characterized by small data sets; we refer the reader to the references given for more information.
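The matching step of Dumais and Nielsen (1992) can be illustrated with a short sketch. Assuming scikit-learn is available, the code below represents each reviewer by the concatenated abstracts of his or her past papers, projects reviewers and submissions into a low-dimensional latent space with truncated SVD (a standard way to approximate Latent Semantic Indexing), and ranks reviewers for each submission by cosine similarity. The toy data, the number of latent dimensions, and all names are our own assumptions, not details of their system.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical reviewer profiles: concatenated past-paper abstracts.
reviewers = {
    "reviewer_1": "collaborative filtering ratings matrices neighborhood models",
    "reviewer_2": "latent semantic indexing dimensionality reduction for retrieval",
    "reviewer_3": "part-of-speech tagging and named entity recognition corpora",
}
submissions = ["a latent factor model for sparse rating matrices"]

# Build one term-document matrix over reviewers and submissions.
corpus = list(reviewers.values()) + submissions
tfidf = TfidfVectorizer().fit_transform(corpus)

# LSI step: reduce the term space to a few latent dimensions.
lsi = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)
profiles, subs = lsi[: len(reviewers)], lsi[len(reviewers):]

# Rank reviewers for each submission by similarity in LSI space.
for paper, sims in zip(submissions, cosine_similarity(subs, profiles)):
    ranking = sorted(zip(reviewers, sims), key=lambda x: -x[1])
    print(paper, "->", ranking)
```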
2.1.5 Recommendation in Context

Historically, researchers have focused on building ‘more accurate’ recommender systems, and have equated this with ‘better liked’ and ‘more useful’, without involving the users in this process (McNee, 2006). Indeed, the majority of the related work described so far has focused on experimental validation in a laboratory setting, with only a handful of small-scale user studies. We stated in the previous chapter that we also take a system-based approach to evaluation, and we will describe our reasons in more detail in Subsection 3.4.3. In the current section, we give a short overview of the most important work on how to involve the user in the recommendation process.

It is important to establish that user satisfaction is influenced by more than just recommendation accuracy. This was signaled by, among others, Herlocker et al. (2004) and Adomavicius and Tuzhilin (2005). For instance, while a good recommendation algorithm can produce personalized recommendations for each user, the type of personalization applied by an algorithm is exactly the same across all users. This is not beneficial to user satisfaction, because not every user request for recommendations is made in the same context. Different contexts can call for different types of personalization and recommendation. Depending on the situation, a user might want to fulfill quite different tasks using a recommender system, and some established algorithms have been shown to be more appropriate for certain tasks than others (McNee et al., 2006). For instance, in research paper recommendation, filling out a reference list is rather different from maintaining awareness of a research field, and the two tasks call for different recommendation algorithms. In addition, the user’s interaction with the system and satisfaction with the results depend on a variety of contextual factors, such as the user’s intentions, his emotional state, and his confidence in the system (McNee, 2006).
Context in Information Seeking and Retrieval This observation is also valid in the fields of information seeking and retrieval, where the search process is similarly influenced by the context of the user. Because of this context, the relevance of the same set of returned results for two identical queries can easily differ between search sessions. In the field of information seeking, a number of frameworks for understanding user needs and their context have been developed. Many different theories have been proposed over the years, such as the four stages of information need by Taylor (1968), the theory of sense-making by Dervin (1992), the information foraging theory by Pirolli and Card (1995), and the cognitive theory of information seeking and retrieval by Ingwersen and Järvelin (2005). Common to all of these theories is the importance of understanding the user’s information context to increase the relevance of the results (McNee, 2006). In this section, we zoom in on the cognitive theory of information seeking and retrieval (IS&R) by Ingwersen and Järvelin (2005), and describe it in the context of recommender systems. In this theory, the context of an IS&R process is represented as a nested model of seven different contextual layers, as visualized in Figure 2.2.

[Figure 2.2: A nested model of seven contextual layers for information seeking and retrieval (Ingwersen and Järvelin, 2005). Revised version adopted from Ingwersen (2006).]

This model allows us to classify different aspects of the recommendation process into different types of context. All seven contextual layers affect users in their interaction with recommender systems. Below we list the seven relevant contextual layers and give practical examples of how they could be quantified for use in experiments; a small code sketch of one such quantification follows the list.

(1) Intra-object context For recommendation, the intra-object context is the item itself and its intrinsic properties. It can cover a wide range of metadata, depending on the type of item, such as title, author, publication venue, musical genre, director, cast, etc. In the case of items with textual content, such as research papers or Web pages, it could also include the structures within the text. However, the structure of multimedia items such as movies or music is more difficult to quantify.
(2) Inter-object context The inter-object context includes the relations between items, such as citations or links between authors and documents in the case of research papers. External metadata such as movie keywords, assigned index terms, and tags can also link documents together, as can playlist structures that group together a set of songs.

(3) Session context The session context involves the user-recommender interaction process; capturing it would involve real user tests or simulations of interactions. Observing system usage patterns, such as printing behavior or reading time, would also provide session context in the case of recommending through IMAs.

(4) Individual social, systemic, conceptual, and emotional contexts If items are linked via a folksonomy, then this could serve as a possible source of social, conceptual, and even emotional context for the different documents. Rating behavior, combined with temporal information, can also serve to predict, for instance, emotional context.

(5) Collective social, systemic, conceptual, and emotional contexts An important social aspect of recommending items is finding groups of like-minded users and similar items that have historically shown the same behavior, and using them to generate new recommendations. Again, the folksonomy could provide aggregated social, conceptual, and even emotional context for the different documents.

(6) Techno-economic and societal contexts This more global form of context influences all of the lower layers, but it is as hard to capture and quantify for recommendation as it is for IS&R.

(7) Historical contexts Across the other contextual layers there operates a historical context that influences the recommendations. Activity logs of recommender systems would be an appropriate way of capturing such context, possibly allowing past recommender interactions to be replayed.
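As a small example of how such layers could be quantified, the sketch below derives inter-object context (layer 2) and collective social context (layer 5) from a toy folksonomy: items are considered related when they share tags, or when they have been bookmarked by the same users. The triples, the Jaccard weighting, and the function names are our own assumptions, intended as one possible operationalization rather than a prescribed method.

```python
from collections import defaultdict

# Toy folksonomy: (user, item, tag) triples from a social
# bookmarking website.
posts = [
    ("alice", "paper1", "recommender"),
    ("alice", "paper2", "recommender"),
    ("bob",   "paper1", "collaborative-filtering"),
    ("bob",   "paper3", "collaborative-filtering"),
    ("carol", "paper2", "evaluation"),
]

tags_per_item, users_per_item = defaultdict(set), defaultdict(set)
for user, item, tag in posts:
    tags_per_item[item].add(tag)
    users_per_item[item].add(user)

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def inter_object_context(i, j):
    """Conceptual relatedness: overlap in the tags assigned to two items."""
    return jaccard(tags_per_item[i], tags_per_item[j])

def collective_social_context(i, j):
    """Social relatedness: overlap in the users who bookmarked two items."""
    return jaccard(users_per_item[i], users_per_item[j])

print(inter_object_context("paper1", "paper2"))       # ~0.33
print(collective_social_context("paper1", "paper3"))  # 0.5
```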
Human-Recommender Interaction Neither the cognitive theory of IS&R nor the other three theories of information seeking mentioned earlier in this section were originally developed for recommender systems, so these theories are not fully applicable to the field. McNee (2006) was the first to recognize this lack of a user-centered, cognitive framework for recommendation and proposed a descriptive theory of Human-Recommender Interaction (HRI). This singular focus on recommender systems is a major advantage of HRI theory, although it has only been applied and verified experimentally on one occasion.

HRI theory is meant as a common language for all stakeholders involved in the recommendation process (users, designers, store owners, marketeers) to use for describing the important elements of interaction with recommender systems (McNee, 2006). These elements are grouped into three main pillars of HRI: (1) the recommendation dialogue, (2) the recommendation personality, and (3) the end user’s information seeking tasks. Each of these three pillars is divided into so-called aspects, which refer to the individual elements of HRI. In its current form as defined by McNee (2006), HRI theory contains a total of 21 aspects. Figure 2.3 shows these 21 HRI aspects and the three pillars they are grouped under.
[Figure 2.3: A visualization of the theory of Human-Recommender Interaction by McNee (2006). HRI theory consists of 21 interaction aspects, organized into three pillars: the recommendation dialogue (correctness, transparency, saliency, serendipity, quantity, usefulness, spread, usability), the recommendation personality (personalization, boldness, adaptability, risk taking/aversion, affirmation, pigeonholing, freshness, trust/first impressions), and the end user’s information seeking task (expectations of recommender usefulness, recommender importance in meeting need, recommender appropriateness, concreteness of task, task compromising). Figure taken from McNee (2006).]

The aspects can be seen as the ‘words’ of the shared language that the stakeholders can use to communicate about interaction with recommender systems. HRI theory states that each aspect can have both a system-centered and a user-centered perspective. This means that for most aspects it is possible to devise a metric that allows the system designer to measure the objective performance of the system on that interaction aspect. The user’s perception of how well the recommender system performs on this aspect does not necessarily have to match the outcome of these metrics, however. Both perspectives are seen as equally important in HRI theory.

We will briefly describe each of the three pillars and give a few examples of aspects belonging to those pillars. We refer to McNee (2006) for a more detailed description of all aspects and the three pillars. The first pillar, recommendation dialogue, deals with the immediate interaction between the user and the recommendation algorithm, which is cyclical in nature. An example aspect here is transparency, which means that the user should understand (at a high level) where a recommendation is coming from, for instance, in terms of how an item is similar to items that the user has rated before. Greater transparency has been shown to lead to higher acceptance of a recommender system (Herlocker et al., 2000; Tintarev and Masthoff, 2006).

The second pillar of HRI is the recommendation personality, which covers the overall impression that a user constructs of a recommender system over time. Recommender systems are meant to ‘get to know the user’, which means users can start attributing personality characteristics to the system. A negative example of a personality-related aspect is pigeonholing, where a user receives a large number of similar recommendations in a short time, which could change the user’s perception for the worse. The item-based CF algorithm, for instance, has shown an aptitude for getting stuck in ‘similarity wells’ of similar items (Rashid et al., 2002). Trust is another important aspect, and is related to the “Don’t look stupid” principle formulated by McNee et al. (2006): even a single nonsense recommendation can cause the user to lose confidence in the recommender system, even if the other recommendations are relevant.

The last HRI pillar focuses on the end user’s information seeking task and the match with the recommendation algorithm. An example is recommender appropriateness: not every