Journal of Intelligent Information Systems, 23:2, 107–143, 2004
© 2004 Kluwer Academic Publishers. Printed in The United States.

Recommender Systems Research: A Connection-Centric Survey

SAVERIO PERUGINI sperugin@cs.vt.edu
MARCOS ANDRÉ GONÇALVES mgoncalv@cs.vt.edu
EDWARD A. FOX fox@cs.vt.edu
Department of Computer Science, Virginia Tech, Blacksburg, VA 24061

Received June 5, 2002; Revised November 24, 2003; Accepted December 3, 2003

Abstract. Recommender systems attempt to reduce information overload and retain customers by selecting a subset of items from a universal set based on user preferences. While research in recommender systems grew out of information retrieval and filtering, the topic has steadily advanced into a legitimate and challenging research area of its own. Recommender systems have traditionally been studied from a content-based filtering vs. collaborative design perspective. Recommendations, however, are not delivered within a vacuum, but rather cast within an informal community of users and social context. Therefore, ultimately all recommender systems make connections among people and thus should be surveyed from such a perspective. This viewpoint is under-emphasized in the recommender systems literature. We therefore take a connection-oriented perspective toward recommender systems research. We posit that recommendation has an inherently social element and is ultimately intended to connect people either directly as a result of explicit user modeling or indirectly through the discovery of relationships implicit in extant data. Thus, recommender systems are characterized by how they model users to bring people together: explicitly or implicitly. Finally, user modeling and the connection-centric viewpoint raise broadening and social issues, such as evaluation, targeting, and privacy and trust, which we also briefly address.
Keywords: recommendation, recommender systems, small-worlds, social networks, user modeling

“What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.”
Herbert A. Simon

1. Introduction

The advent of the WWW and the concomitant increase in information available online have caused information overload and ignited research in recommender systems. By selecting a subset of items from a universal set based on user preferences, recommender systems attempt to reduce information overload and retain customers. Examples of systems include top-N lists, book (Mooney and Roy, 2000) and movie (Alspector et al., 1998) recommenders, advanced search engines (Chakrabarti et al., 1999), and intelligent avatars (André and Rist
2002). The benefits of recommendation are most salient in voluminous and ephemeral domains (e.g., news) and include ‘predictive utility’ (Konstan et al., 1997), the value of a recommendation as advice given prior to investing time, energy, and in most cases, money in consuming a product. Recommender systems harness techniques that develop a model of user preferences to predict future ratings of artifacts. The underlying algorithms to realize recommendation range from keyword matching (Housman and Kaskela, 1996) to sophisticated data mining of customer profiles (Adomavicius and Tuzhilin, 1999). Recommender systems are now widely believed to be critical to sustaining the Internet economy (Shapiro and Varian, 1999).

Researchers have identified four main dimensions to help in the study of recommender systems: how the system is (i) modeled and designed (i.e., are recommendations content-based or collaborative?), (ii) targeted (to an individual, group, or topic), (iii) built, and (iv) maintained (online vs. offline) (Mirza, 2001). Recommender systems are typically studied along the modeling dimension. The most popular (and over-emphasized) modeling dichotomy is content-based filtering (Mooney and Roy, 2000) vs. collaborative filtering (Goldberg et al., 1992). Content-based filtering involves recommending items similar to those the user has liked in the past; e.g., ‘Since you liked The Little Lisper, you also might be interested in The Little Schemer.’ Collaborative filtering, on the other hand, involves recommending items liked by users whose tastes are similar to those of the user seeking a recommendation; e.g., ‘Linus and Lucy like Sleepless in Seattle. Linus likes You’ve Got Mail. Lucy also might like You’ve Got Mail.’ Terveen and Hill survey content-based and collaborative filtering systems in a human-computer interaction (HCI) context (Terveen and Hill, 2002).
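The content-based and collaborative examples above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the set-based data layout, the feature tags attached to each title, and the crude "shares at least one item or feature" similarity rule are all hypothetical and are not taken from any of the surveyed systems.

```python
def content_based_recommend(features, liked, min_shared=1):
    """Recommend unseen items that share features with items the user liked.

    `features` maps an item to a set of descriptive tags (hypothetical here).
    """
    liked_features = set()
    for item in liked:
        liked_features |= features[item]
    return sorted(
        item
        for item, tags in features.items()
        if item not in liked and len(tags & liked_features) >= min_shared
    )


def collaborative_recommend(likes, user):
    """Recommend items liked by users who share at least one liked item with `user`.

    `likes` maps a user to the set of items that user likes. Sharing one
    item is an assumed, deliberately simple stand-in for "similar taste".
    """
    own = likes[user]
    recommended = set()
    for other, items in likes.items():
        if other != user and own & items:
            recommended |= items - own
    return sorted(recommended)


# Hypothetical feature tags, invented purely for illustration.
features = {
    "The Little Lisper": {"lisp", "recursion"},
    "The Little Schemer": {"scheme", "recursion"},
    "Sleepless in Seattle": {"romance", "film"},
}
likes = {
    "Linus": {"Sleepless in Seattle", "You've Got Mail"},
    "Lucy": {"Sleepless in Seattle"},
}

print(content_based_recommend(features, {"The Little Lisper"}))
print(collaborative_recommend(likes, "Lucy"))
```

The first call relies only on properties of the items themselves, while the second relies only on overlap between people, which is the distinction the dichotomy draws.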
Others classify recommender systems from a business-oriented perspective (Schafer et al., 1999), often based on how they are built. For instance, Schafer, Konstan, and Riedl survey recommender systems in e-commerce based on interface, technology, and recommendation discovery (Schafer et al., 1999). These researchers also cast these aspects of recommenders in a two-dimensional space of recommendation lifetime (ephemeral vs. persistent) and level of automation (manual vs. automatic), which is related to how they are maintained.

Recommender systems, however, have an inherently social element and ultimately bring people together (a viewpoint under-emphasized in the literature) and therefore should be surveyed from this perspective. Accordingly, in this survey, we take a connection-centric approach toward studying recommender systems.

To help illustrate the elusive presence of a social connectivity element, consider that the process of recommendation in a ‘brick and mortar’ setting is inherently dependent on knowledge of personal taste. For example, in a restaurant with a friend, the following dialog might arise: ‘The menu looks enticing. Since you are a returning patron, what do you recommend?’ ‘Well, since you like spicy dishes, you may enjoy the chilli chicken curry.’ A mutually reinforcing dynamic ensues. The recommender’s personal knowledge of her friend’s interests is incorporated into the recommendation process. Conversely, after a recommendation is made, the recipient’s personal knowledge of the recommender’s reputation helps him evaluate the recommendation. Recommender systems attempt to emulate and automate this naturally social process. This seemingly simple example speaks volumes about the process of making recommendations. Not only does a recommender system have an underlying social element, but its effectiveness is predicated upon its representation of
the recipient. Therefore, recommender systems involve user modeling, which includes developing a representation of user preferences and interests. User models can be constructed by explicitly soliciting feedback (e.g., asking the user to rate products or services) (Konstan et al., 1997) or gleaning implicit declarations of interest (e.g., through monitoring usage) (Terveen et al., 1997).

User modeling is directed toward developing a basis to compute overlap, and ultimately is conducted to make connections among people to drive recommendation. Once enough users are engaged and modeled to sufficiently sustain a system, connections (recommendations) can be made. Recommendations, then, are not delivered within a vacuum, but rather are cast within an ‘informal [community] of collaborators, colleagues, or friends’ (Kautz et al., 1997), known as a social network (Wasserman and Galaskiewicz, 1994).

Explicit user modeling (and correlating the resulting ratings) then can be seen as directed toward forming such connected (community) graph components. Collecting implicit declarations of preference also can be viewed as directed toward inducing social networks. This is analogous to techniques to discover existing social networks from patterns embedded in interaction (transaction) data. Therefore, an extension to traditional approaches to implicit user modeling, and an approach toward a basis to compute recommendations, entails directly exposing these self-organizing and self-maintaining social structures. Since social networks model social processes, these informal communities with shared interests are implicit in data generated automatically by electronic communications.
This extension is corroborated by a recent trend toward exploring and exploiting connections of social processes in graph representations of self-organizing structures, such as the web, as a viable and increasingly popular way to satisfy information-seeking and recommendation-oriented goals (Broder, 2003; Kleinberg, 1999; Kleinberg and Lawrence, 2001). This less invasive approach not only supersedes the need to explicitly model users individually, but also results in more natural, reflective, and fertile organizations for recommendation. Exploration of identified existing social networks fosters the discovery of serendipitous connections (Schwartz and Wood, 1993), social referrals (Kautz et al., 1997), and cyber-communities (Kumar et al., 1999), and hence offers many opportunities for recommendation. The use of social networks has expanded to many diverse application domains such as movie recommendation (Mirza et al., 2003), digital libraries (Nevill-Manning, 2001), and community-based service location (Singh et al., 2001).

This connection-oriented viewpoint and these two ways of realizing it provide the basis for this survey. We posit that recommendation has an inherently social element and is ultimately concerned with connecting people either directly as a result of explicit user modeling or indirectly through the discovery of relationships implicit in existing data (see Figure 1). We make connection-based distinctions. Systems are characterized by how they model users to bring people together: explicitly or implicitly. The goal of a recommender system, then, is to bring as many people together as possible, which also suggests a novel evaluation criterion (e.g., algorithm A connects x individuals while algorithm B connects y) (Mirza et al., 2003). Thus, while Amazon may make better book recommendations than Barnes and Noble, if they arrive at connected user components in the same manner, then in this survey they would be considered equivalent.
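The connection-based evaluation criterion mentioned above (algorithm A connects x individuals while algorithm B connects y) can be made concrete with a small sketch. The code below is an illustration under assumptions, not the procedure of Mirza et al. (2003): it links two users whenever they have rated at least `min_common` items in common, then counts how many users fall into a connected component with at least one other person. The names `connected_components`, `users_connected`, and `min_common`, as well as the toy ratings, are hypothetical.

```python
from itertools import combinations


def connected_components(ratings, min_common=1):
    """Group users into connected components of an induced social network.

    Two users are linked when they rated at least `min_common` items in
    common (an assumed linking rule chosen for illustration).
    """
    neighbors = {user: set() for user in ratings}
    for a, b in combinations(ratings, 2):
        if len(ratings[a] & ratings[b]) >= min_common:
            neighbors[a].add(b)
            neighbors[b].add(a)

    components, seen = [], set()
    for start in ratings:
        if start in seen:
            continue
        # Depth-first traversal collects everyone reachable from `start`.
        component, stack = set(), [start]
        while stack:
            user = stack.pop()
            if user in component:
                continue
            component.add(user)
            stack.extend(neighbors[user] - component)
        seen |= component
        components.append(component)
    return components


def users_connected(ratings, min_common=1):
    """Count users who share a component with at least one other user."""
    return sum(
        len(c) for c in connected_components(ratings, min_common) if len(c) > 1
    )


# Toy data: each user maps to the set of items he or she has rated.
ratings = {
    "ann": {"m1", "m2"},
    "bob": {"m2", "m3"},
    "cat": {"m3"},
    "dan": {"m9"},
}

print(users_connected(ratings, min_common=1))  # a lenient linking rule
print(users_connected(ratings, min_common=2))  # a stricter linking rule
```

Treating the two thresholds as two competing algorithms, the lenient rule brings more people together on this toy data, so it would be preferred under the criterion regardless of how good its individual recommendations are.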
Figure 1. A connection-centric view of recommendation as bringing people together into a social network (center). (left) Formation of a social network by explicitly collecting ratings or profiles. (right) Identification and discovery of a network by exposing self-organizing communities implicit in user-generated data such as communication or web logs. Although not illustrated explicitly, these two approaches may be combined.

Reader’s Guide. The balance of this survey is organized as follows. Section 2 presents a historical perspective of recommender systems and outlines their evolution from IR. Section 3 showcases approaches to creating connections for recommendation via explicit user modeling while Section 4 describes approaches to identifying social networks implicit in (usage) data to explore for recommendation. The relative lengths of these two sections reflect the emphasis each places on connections. Approaches toward identifying implicit communities and resulting systems make social networks salient and thus are treated in greater detail. User modeling and the connection-centric viewpoint raise broadening and social issues, such as evaluation, targeting, and privacy and trust, which we cursorily address in Section 5. We identify various opportunities for future research in Section 6.

2. A chequered history

While Amazon.com (Linden et al., 2003), a pioneer in the e-commerce revolution, spearheaded a movement toward recommenders and was instrumental in bringing such systems to critical mass, recommender systems research is a result of a series of shifts in information systems (IS) research. In the 1970s a great deal of IS research was
focused on IR. In this era Salton and his students developed the vector-space model (Salton et al., 1975) and the SMART system (Rocchio, 1971). Researchers modeled IR systems with large sparse (and anti-symmetric) term-document matrices which permitted document similarity to be measured by the cosine of the angle between vectors in a multi-dimensional space. Precision and recall became the two quintessential IR metrics (Salton and McGill, 1983). The emphasis of such research and systems was on satisfying short-term information-seeking goals by retrieving information deemed relevant to queries. IR research flourished in this period and many supportive techniques such as relevance feedback (Rocchio, 1971) were developed, demonstrating qualified success.

As the end of the 1970s drew near, electronic information became more abundant. The 1980s brought a rapid proliferation of information due to desktop computers and applications such as word processors and spreadsheets. In addition, the introduction of e-mail into the mainstream further exacerbated the copious amounts of text residing in computers (termed ‘electronic junk’ by Denning (1982)). The newfound ease of information generation ignited a shift in IS research initiatives. Researchers began to focus on removing irrelevant information rather than retrieving relevant information. Information categorization, routing, and filtering became of immediate importance. This first shift spawned an information filtering thread.

In 1991 Bellcore hosted a workshop on information filtering (IF) which led to the December 1992 Communications of the ACM special issue on the topic (Loeb and Terry, 1992). In this issue Belkin and Croft compared and contrasted IF and IR (Belkin and Croft, 1992).
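As a brief aside on the vector-space model and the precision and recall metrics described earlier in this section, the following minimal sketch shows how each quantity is typically computed. It assumes documents are represented as sparse dictionaries mapping terms to weights; the function names and the toy documents are hypothetical, and raw term counts stand in for whatever weighting scheme a real system would use.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector has no direction; treat as dissimilar
    return dot / (norm_u * norm_v)


def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved documents that are relevant.
    Recall: fraction of relevant documents that were retrieved."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy documents as columns of a term-document matrix (term -> weight).
doc_a = {"recommender": 1.0, "filtering": 1.0}
doc_b = {"filtering": 1.0, "retrieval": 1.0}

print(cosine_similarity(doc_a, doc_b))
print(precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"}))
```

Storing only the nonzero weights reflects the sparsity of term-document matrices noted above: most terms do not occur in most documents.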
While IR entails returning relevant information in response to short-term information-seeking goals via requests such as queries, information filtering involves removing persistent and irrelevant information over a long period of time. Information filtering systems model document features in user profiles (Mooney and Roy, 2000), which replaced terms in a modeling matrix as a result of this shift (see Table 1). Information filtering later became known as content-based filtering to the recommender system community and has been applied to recommend movies (Alspector et al., 1998) and books (Mooney and Roy, 2000). Content-based systems model content features of artifacts, rather than of documents, and recommend items by querying such product features against keywords or preferences supplied by the user (Krulwich and Burkley, 1996). SDI (Selective Dissemination of Information), one of the first information filtering systems, was based on keyword matching (Housman and Kaskela, 1996).

Table 1. Shifts in matrix models outlining the evolution of recommender systems from information retrieval.

Concept                    Modeling matrix
Information retrieval      terms × documents
Information filtering      features × documents
Content-based filtering    features × artifacts
Collaborative filtering    people × documents
Recommender systems        people × artifacts

Content-based filtering is most effective in