HYBRID RECOMMENDER SYSTEMS: SURVEY AND EXPERIMENTS

system, the agreement between a user's past ratings and the recommendations of each technique is used to select the technique to employ for the next recommendation. Switching hybrids introduce additional complexity into the recommendation process, since the switching criteria must be determined, and this introduces another level of parameterization. However, the benefit is that the system can be sensitive to the strengths and weaknesses of its constituent recommenders.

3.3. MIXED

Where it is practical to make a large number of recommendations simultaneously, it may be possible to use a mixed hybrid, where recommendations from more than one technique are presented together. The PTV system (Smyth & Cotter, 2000) uses this approach to assemble a recommended program of television viewing. It uses content-based techniques based on textual descriptions of TV shows and collaborative information about the preferences of other users. Recommendations from the two techniques are combined in the final suggested program.

The mixed hybrid avoids the 'new item' start-up problem: the content-based component can be relied on to recommend new shows on the basis of their descriptions even if they have not been rated by anyone. It does not get around the 'new user' start-up problem, since both the content and collaborative methods need some data about user preferences to get off the ground, but if such a system is integrated into a digital television, it can track what shows are watched (and for how long) and build its profiles accordingly. Like the fallback hybrid, this technique has the desirable 'niche-finding' property in that it can bring in new items that a strict focus on content would eliminate.

The PTV case is somewhat unusual because it is using recommendation to assemble a composite entity, the viewing schedule.
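The mixed hybrid described above can be sketched in a few lines. This is a minimal illustration, not the PTV implementation: the function name `mixed_hybrid`, the show titles, and the rule of interleaving the two ranked lists while skipping duplicates are all assumptions made for the example.

```python
from itertools import zip_longest


def mixed_hybrid(content_recs, collab_recs, k):
    """Present two ranked lists together.

    Items are interleaved rank by rank, duplicates are kept only once,
    and the first k items are returned. The interleaving rule is an
    assumption chosen for this sketch.
    """
    merged, seen = [], set()
    for pair in zip_longest(content_recs, collab_recs):
        for item in pair:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged[:k]


# Hypothetical inputs: "new_drama" has no ratings yet, so only the
# content-based component can suggest it.
content_recs = ["new_drama", "news_at_ten", "nature_doc"]
collab_recs = ["quiz_show", "news_at_ten", "late_film"]

schedule = mixed_hybrid(content_recs, collab_recs, k=5)
print(schedule)
```

Because the unrated show enters through the content-based list, it still reaches the combined result, which is the 'new item' behavior noted above.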
Because many recommendations are needed to fill out such a schedule, it can afford to use suggestions from as many sources as possible. Where conflicts occur, some type of arbitration between methods is required: in PTV, content-based recommendations take precedence over collaborative responses. Other implementations of the mixed hybrid, ProfBuilder (Wasfi, 1999) and PickAFlick (Burke et al., 1997; Burke, 2000), present multiple recommendation sources side-by-side. Usually, recommendation requires ranking of items or selection of a single best recommendation, at which point some kind of combination technique must be employed.

3.4. FEATURE COMBINATION

Another way to achieve the content/collaborative merger is to treat collaborative information as simply additional feature data associated with each example and use content-based techniques over this augmented data set. For example, Basu, Hirsh and Cohen (1998) report on experiments in which the inductive rule learner Ripper was applied to the task of recommending movies using both user ratings and content features, and achieved significant improvements in precision over a purely collaborative approach. However, this benefit was only achieved by hand-filtering content features. The authors found that employing all of the available content features improved recall but not precision.

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

ROBIN BURKE

The feature combination hybrid lets the system consider collaborative data without relying on it exclusively, so it reduces the sensitivity of the system to the number of users who have rated an item. Conversely, it lets the system have information about the inherent similarity of items that is otherwise opaque to a collaborative system.

3.5. CASCADE

Unlike the previous hybridization methods, the cascade hybrid involves a staged process. In this technique, one recommendation technique is employed first to produce a coarse ranking of candidates and a second technique refines the recommendation from among the candidate set. The restaurant recommender EntreeC, described below, is a cascaded knowledge-based and collaborative recommender. Like Entree, it uses its knowledge of restaurants to make recommendations based on the user's stated interests. The recommendations are placed in buckets of equal preference, and the collaborative technique is employed to break ties, further ranking the suggestions in each bucket.

Cascading allows the system to avoid employing the second, lower-priority technique on items that are already well-differentiated by the first or that are sufficiently poorly-rated that they will never be recommended. Because the cascade's second step focuses only on those items for which additional discrimination is needed, it is more efficient than, for example, a weighted hybrid that applies all of its techniques to all items.
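The bucket-and-tie-break procedure described above can be sketched as follows. This is a hedged illustration rather than EntreeC's code: the function name `cascade`, the `bucket_width` and `min_primary` parameters, and the sample scores are hypothetical.

```python
from collections import defaultdict


def cascade(candidates, primary, secondary, bucket_width=1.0, min_primary=0.0):
    """Rank candidates with a two-stage cascade.

    primary and secondary are callables mapping an item to a score.
    The primary score places items into buckets of equal preference;
    the secondary score is consulted only to order items that share a
    bucket, so it can refine the primary ranking but never overturn it.
    """
    buckets = defaultdict(list)
    for item in candidates:
        score = primary(item)
        if score < min_primary:
            continue  # too poorly rated to ever be recommended
        buckets[int(score // bucket_width)].append(item)

    ranked = []
    for key in sorted(buckets, reverse=True):
        bucket = buckets[key]
        if len(bucket) > 1:  # the second technique runs only on ties
            bucket = sorted(bucket, key=secondary, reverse=True)
        ranked.extend(bucket)
    return ranked


# Hypothetical scores: knowledge-based (primary), collaborative (secondary).
primary_scores = {"a": 4.2, "b": 4.7, "c": 3.1, "d": 0.5, "e": 4.0}
secondary_scores = {"a": 0.9, "b": 0.2, "c": 0.99, "e": 0.5}

ranked = cascade(
    list(primary_scores),
    primary_scores.get,
    lambda item: secondary_scores.get(item, 0.0),
    min_primary=1.0,
)
print(ranked)
```

In this example item "c" has the highest secondary score but stays below every item in the higher bucket, and item "d" is discarded before the second technique is ever asked about it.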
In addition, the cascade is by its nature tolerant of noise in the operation of a low-priority technique, since ratings given by the high-priority recommender can only be refined, not overturned.

3.6. FEATURE AUGMENTATION

One technique is employed to produce a rating or classification of an item and that information is then incorporated into the processing of the next recommendation technique. For example, the Libra system (Mooney & Roy, 1999) makes content-based recommendations of books based on data found in Amazon.com, using a naive Bayes text classifier. The text data used by the system includes 'related authors' and 'related titles' information that Amazon generates using its internal collaborative systems. These features were found to make a significant contribution to the quality of recommendations.

The GroupLens research team working with Usenet news filtering also employed feature augmentation (Sarwar et al., 1998). They implemented a set of knowledge-based 'filterbots' using specific criteria, such as the number of spelling errors and the size of included messages. These bots contributed ratings to the database of ratings used by the collaborative part of the system, acting as artificial users. With fairly simple agent implementations, they were able to improve email filtering.

Augmentation is attractive because it offers a way to improve the performance of a core system, like the NetPerceptions' GroupLens Recommendation Engine or a naive Bayes text classifier, without modifying it. Additional functionality is added by intermediaries who can use other techniques to augment the data itself. Note that this is different from feature combination, in which raw data from different sources is combined.

While both the cascade and augmentation techniques sequence two recommenders, with the first recommender having an influence over the second, they are fundamentally quite different. In an augmentation hybrid, the features used by the second recommender include the output of the first one, such as the ratings contributed by GroupLens' filterbots. In a cascaded hybrid, the second recommender does not use any output from the first recommender in producing its rankings, but the results of the two recommenders are combined in a prioritized manner.

3.7. META-LEVEL

Another way that two recommendation techniques can be combined is by using the model generated by one as the input for another. This differs from feature augmentation: in an augmentation hybrid, we use a learned model to generate features for input to a second algorithm; in a meta-level hybrid, the entire model becomes the input. The first meta-level hybrid was the web filtering system Fab (Balabanovic 1997, 1998). In Fab, user-specific selection agents perform content-based filtering using Rocchio's method (Rocchio 1971) to maintain a term vector model that describes the user's area of interest.
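To make the idea of a maintained term vector concrete, here is a minimal sketch of a Rocchio-style profile update in its common relevance-feedback form. It is not Fab's code: the function name `rocchio_update`, the weights `alpha`, `beta` and `gamma`, the choice to drop non-positive weights, and the sample documents are all assumptions for illustration.

```python
from collections import defaultdict


def rocchio_update(profile, liked_docs, disliked_docs,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Return an updated term-weight profile.

    new = alpha * profile
          + beta  * mean(liked document vectors)
          - gamma * mean(disliked document vectors)

    Profiles and documents are dicts mapping a term to a weight.
    Terms whose weight is not positive are dropped (an assumption of
    this sketch, not something stated in the text).
    """
    updated = defaultdict(float)
    for term, weight in profile.items():
        updated[term] += alpha * weight
    for docs, factor in ((liked_docs, beta), (disliked_docs, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, weight in doc.items():
                updated[term] += factor * weight / len(docs)
    return {term: w for term, w in updated.items() if w > 0}


# Hypothetical user profile and feedback documents.
profile = {"jazz": 1.0}
liked = [{"jazz": 1.0, "guitar": 1.0}, {"guitar": 1.0}]
disliked = [{"opera": 1.0}]

new_profile = rocchio_update(profile, liked, disliked)
print(new_profile)
```

The resulting dictionary is the kind of compact, per-user term vector that a meta-level hybrid can pass on to a second technique as input.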
Collection agents, which garner new pages from the web, use the models from all users in their gathering operations. So, documents are first collected on the basis of their interest to the community as a whole and then distributed to particular users. In addition to the way that user models were shared, Fab was also performing a cascade of collaborative collection and content-based recommendation, although the collaborative step only created a pool of documents and its ranking information was not used by the selection component.

A meta-level hybrid that focuses exclusively on recommendation is described by Pazzani (1999) as 'collaboration via content'. A content-based model is built by Winnow (Littlestone & Warmuth, 1994) for each user, describing the features that predict restaurants the user likes. These models, essentially vectors of terms and weights, can then be compared across users to make predictions. Condliff et al. (1999) have used a two-stage Bayesian mixed-effects scheme: a content-based naive Bayes classifier is built for each user and then the parameters of the classifiers are linked across different users using regression. LaboUr (Schwab et al., 2001) uses instance-based learning to create content-based user profiles, which are then compared in a collaborative manner.

The benefit of the meta-level method, especially for the content/collaborative hybrid, is that the learned model is a compressed representation of a user's interest, and a collaborative mechanism that follows can operate on this information-dense representation more easily than on raw rating data.

3.8. SUMMARY

Hybridization can alleviate some of the problems associated with collaborative filtering and other recommendation techniques. Content/collaborative hybrids, regardless of type, will always demonstrate the ramp-up problem since both techniques need a database of ratings. Still, such hybrids are popular, because in many situations such ratings already exist or can be inferred from data. Meta-level techniques avoid the problem of sparsity by compressing ratings over many examples into a model, which can be more easily compared across users. Knowledge-based and utility-based techniques seem to be good candidates for hybridization since they are not subject to ramp-up problems.

Table IV summarizes some of the most prominent research in hybrid recommender systems. For the sake of simplicity, the table combines knowledge-based and utility-based techniques (since utility-based recommendation is a special case of knowledge-based). [4]

There are four hybridization techniques that are order-insensitive: Weighted, Mixed, Switching and Feature Combination. With these hybrids, it does not make sense to talk about the order in which the techniques are applied: a CN/CF mixed system would be no different from a CF/CN one. The redundant combinations are marked in gray.

The cascade, augmentation and meta-level hybrids are inherently ordered.
For example, a feature augmentation hybrid that used a content-based recommender to contribute features to be used by a second collaborative process would be quite different from one that used collaboration first. To see the difference, consider the example of news filtering: the former case, content-based/collaborative, would correspond to a learning content-based version of the GroupLens 'filterbot' idea. The latter arrangement, collaborative/content-based, could be implemented as a collaborative system that assigns users to a clique or cluster of similar users and then uses the clique ids as input to a content-based system, using these identifiers as well as terms from the news articles to produce the final recommendation. We would expect these systems to have quite different characteristics. With cascade,

[4] [...] hybrids that combine techniques of the same type, although some do exist. PickAFlick (Burke et al., 1997), for example, is a knowledge-based/knowledge-based mixed hybrid combining two different knowledge-based strategies.