links to various articles, and the online photo management and sharing site Flickr hosts more than 250 million photos. This problem of having too many choices to be able to find the right ones is known as information overload Recommende ms are ont particular, recommender systems have been shown to help users find items of interest from among a large pool of potentially interesting items. Recommender systems bring mutual benefits to both users and operators of the sites with information overload. Users benefit as they do not have to sift through hundreds of thousands of available items to find interesting items. In addition, many recommender systems are, at least in part, proactive in nature in that they automatically populate a list of personalized recommendations based on the user's preferences and other similar users'input. The user may discover many new items she is curious to explore from the recommendations. On the other hand, e-commerce sites that employ recommender systems can increase sales revenue in at least two ways: a) by drawing customers'attention to items that they are likely to buy, and b) by cross-selling items. Netflirs, an online DVD rental service that uses a recommender system, is a good example of this-the] average Netfix customer rents seven DVDs a month, three times the rate at brick-and-mortar stores"(Anderson 2004). And"between 70 and 80 percent of Netflix rentals come from the company's back catalog of 38,000 films rather than recent eleases, 6 Netfix benefits because the old releases are cheaper, and the customer demand is not concentrated only on the expensive latest releases. Clear benefits like this are probably thereasonswhysomanye-commercesites-amazon.comitunEs.com,last.fm,netfix.com pandora.com,justtonameafewhavebeenmakingrecommendersystemsintegraltotheir operations. We are experiencing, more than ever, phrases such as"Get Recommendations Albums for You?,"Customers who bought items in your Shopping Cart also bought"in chttp://www.fickr.cor http://www.netfix.com http://www.nytimes.com/2006/01/23/technology/23recommend.html Reproduced with permission of the copyright owner. Further reproduction prohibited without permission
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission
various online systems and retail sites Collaborative filtering. The best-known technology for building recommender sys- ems is called collaborative filtering(CF)(Resnick et al. 1994). The main idea of collabora- tive filtering is simple-a group of users who share a common interest can help each other find stuff. That is, a user's query about how much she would like an item is resolved by con sulting other like-minded users'opinions about the item. Typically, the degree of common interest between two users is measured by examining their opinions about items they both have evaluated. Therefore, the premise is that if two users have agreed (or disagreed)on a set of items, the resulting pattern of their preferences would project onto rest of the items Researchers have attempted to view the collaborative filtering problem from various an gles and proposed algorithms accordingly, including similarity between users (resnick et al 1994), similarity between items(Sarwar et al. 2001), probabilistic latent semantic indexing (Hofmann 2004), personality diagnosis(Pennock et al. 2000), latent association between users and items in a reduced-dimension space using SVD(Sarwar et al. 2000) a generic model of recommender systems. Formally, a collaborative filtering system is a 4-tuple: <U, I,R, f>, where U represents of a set of n customers or users un), I indicates a set of m products or items a1, a2, .. am), R represents rectangular matrix of n rows and m columns that stores users' preferences on items, and f dicates the recommendation algorithm that uses the matrix r to generate recommendations (usually for the missing entries of R). Typically, each user only expresses her preferences for a small number of items. In other words, the corresponding user x item matrix R is sparse. Users' preferences can be in terms of explicit ratings on some scale including a binary like/dislike, or they can be implicit-for example, preferences garnered from a customer's her browsing patterns. A recommender system may also maintain de- mographic and other information about the users, and information about item features such n eTc thisathesis we use customers, members, and users synonymously. Products and items are used Reproduced with permission of the copyright owner. Further reproduction prohibited without permission
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission
as actors, directors, and genres in the case of a movie. This additional content information can be used to create content-based filtering(Melville, Mooney, and Nagarajan 2002; Salton and McGill 1983), which can help improve a CF system, particularly where rating data is limited or absent(e. g, newly introduced items). In this paper we consider CF systems consisting of explicit numerical ratings and no content information. Most researchers of rec ommender systems use explicit ratings and no content information. Furthermore, a number of real world large scale recommender systems including Netflir have reported to use explicit ratings only. We leave extending our work to implicit ratings-based recommender system as a future work, since there are quite a few commercial recommender systems employing mplicit ratings A CF recommender system can produce two forms of recommendations on the items the target user has not already rated: a) predicted ratings on the items called predictions within she field, and b)an ordered list of items the user might like the most. The latter type of recommendations is sometimes referred to as top-N recommendations(Sarwar et al. 2000 Sarwar et al. 2001). Note that a top-N list can be trivially constructed by first computing predictions on all items not yet rated, and then sorting the result and keeping the top N. Perhaps this is the reason why research on CF algorithms has predominantly focused on rating predictions. We study the former type of recommendations in this paper Taxonomy of collaborative filtering approaches. In machine learning terms, the problem of collaborative filtering can be treated in either of the following two ways( billsus and Pazzani 1998). First, it can be regarded as a special regression problem, where the goal is to predict a real valued number indicating how much the target user would like the item Second, it can be treated as a classification problem, where user preferences are discrete valued and the goal is to find the most probable preference (class) for the item. Since there is a rich set of solution approaches to both of these problems in the machine learning literature, attempts have been made to apply those approaches to the collaborative filtering Reproduced with permission of the copyright owner. Further reproduction prohibited without permission
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission
domain(Billsus and Pazzani 1998; Adomavicius and Tuzhilin 2005) Most of the early collaborative filtering algorithms(Shardanand and Maes 1995; Konstan et al. 1997; Resnick et al. 1994)were of Instance-based learning methods(Mitchell 1997) k-NEAREST NEIGHBOR learners, the predominant form of instance-based learning methods are often called lazy/memory-based learners. This is because these algorithms simply store the training instances(user profiles, in this context)and delay processing them until a query for classification/regression is received. Over the years, however, many other approaches have been proposed. Breese et al(Breese, Heckerman, and Kadie 1998)observed that fundamentally two types of algorithms were being proposed. They proposed a classification scheme for CF algorithms by dividing the algorithms into memory-based and model-based approaches (Breese, Heckerman, and Kadie 1998) In this thesis we perform our experiments on two Cf algorithms, namely USER-BASED kNN and ITEM-BASED knn. The algorithms chosen are frequently cited in the related lit erature and the algorithms manifest an interesting range of properties of the recommender stem itself. The basic version of the USER-BASED kNN CF algorithm is one of the earliest CF algorithms proposed(Resnick et al 1994; Shardanand and Maes 1995). However, we incorporate the improvements suggested by(Herlocker et al. 1999 ) ITEM-BASED knn is a relatively recent CF algorithm and has been reported to be successfully in use in some of the largest online recommender systems incorporating millions of users and items(linden Smith, and York 2003). We describe these algorithms in the next chapter. Both CF al based. However, as explained in the folle number of differing properties. Further, ITEM-BASED knn can be regarded as a model-based algorithm as well (Sarwar et al. 2001 ). In practice keeping a much compact item-to-item elationship model rather than the entire training data suffices without a significant drop in accuracy of recommendations Although we perform our empirical analysis on these two CF algorithms throughout Reproduced with permission of the copyright owner. Further reproduction prohibited without permission
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission
the thesis, we believe that most of our findings would generalize to other CF algorithms that maintain the following property. We require the CF algorithms to be consistent. We define a Cf algorithm to be consistent if, everything being equal, it generates the same recommendations between runs. Built-in randomizations of some of the probabilistic CF algorithms may generate different recommendations even if the dataset is unchanged between the two model-building steps. This inconsistency makes it hard to estimate the effect of users and items deterministicall 1.2 Recommender Systems and Influence Recommender systems find association among users or items from users'evaluations of items. As explained before, in collaborative filtering, recommendations for a user are com- puted based on how her like-minded users have expressed opinions on the items on which recommendations are being computed. Therefore, implicit relationships among users are formed through the way they help each other in finding the right product. In turn, these implicit relationships can be considered to have induced social networks in CF systems. Moreover, there might be a set of users in such a social network who help a large number of other users receive recommendations. We label these users as influential users. In this thesis we focus primarily on these infuential users and the impact they might have on others Status, prestige, or influence in a social network has been a subject of extensive research for a number of decades(Wasserman, Faust, and Iacobucci 1994; Freeman 1977). The following excerpt from an article published as early as 1949 is an example of how early researchers were able to realize the recursive nature of influence( Seeley 1949) we are involved in an "infinite regress" /an actor's status/ is a function of the status of those who choose him; and their/status/ is a function of those who choose them, and so ad infinitum.. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission