11 Case-Based Recommendation 347

text-based news stories and leverages a range of keyword-based content analysis techniques during recommendation; see, for example, [9] and Chapter 18 [8] in this book. In contrast, case-based recommender systems rely on more structured representations of item content. These representations are similar to those used to represent case knowledge in case-based reasoners. For example, they often use a set of well-defined features and feature values to describe items, rather than free-form text. This reliance on structured content means that case-based recommenders are particularly well adapted to many consumer recommendation domains, particularly e-commerce domains, where detailed feature-based product descriptions are often readily available.

Fig. 11.3. An example product case from a digital camera product catalog.

Figure 11.3 shows one such example product case from a catalog of cameras. The case is for a Canon digital camera and, as can be seen, the product details are captured using 11 different features (e.g., manufacturer, model, memory type, price, etc.) with each feature associated with one of a well-defined space of possible feature values (e.g., the manufacturer feature values are drawn from a well-defined set of possible manufacturers such as Canon, Nikon, Sony, etc.). The example also highlights how different types of features can be used within a product description. In this case, a mixture of numeric and nominal features is used. For instance, price is an example of a numeric feature, which obviously represents the cost of the camera and can take on values anywhere in a range of possible prices, from about $100 to upwards of $3000. Alternatively, memory type is a nominal feature, whose values come from a well-defined set of alternatives corresponding to the 4 or 5 different memory card options that are commonly used by digital cameras.

The Entree recommender is another good example of a case-based recommender system.
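In code, a structured case of this kind reduces to little more than a typed feature map. The sketch below is illustrative — the feature names and values are loosely modelled on the camera case of Figure 11.3 rather than copied from it:

```python
# A structured product case: a fixed set of well-defined features and
# feature values, in contrast to the free-form text used by pure
# content-based recommenders. Feature names and values here are
# illustrative, loosely based on Figure 11.3, not the full 11-feature entry.
camera_case = {
    "manufacturer": "Canon",        # nominal: one of a fixed set of makers
    "model": "EOS D60",             # nominal
    "memory_type": "CompactFlash",  # nominal: one of a few card types
    "price": 998.0,                 # numeric: roughly $100 up to $3000+
    "resolution_mp": 6.3,           # numeric: mega-pixels
}

# Nominal and numeric features will call for different similarity metrics.
nominal_features = {"manufacturer", "model", "memory_type"}
numeric_features = {"price", "resolution_mp"}
```

The split into nominal and numeric features matters because, as discussed below, each feature type is compared with its own kind of similarity metric.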
This system will be explored in more detail in Section 11.4, but suffice it to say that Entree is designed to make restaurant suggestions; see Figure 11.4. In terms of its core representation, Entree also uses a structured case format—although
348 B. Smyth

the presentation in Figure 11.12 is largely textual, the basic case representation is fundamentally feature-based—using features such as price, cuisine type, atmosphere, etc. to represent each restaurant case.

Fig. 11.4. Entree [21, 22] recommends restaurants to users based on a variety of features such as price, cuisine type, atmosphere, etc.

11.2.2 Similarity Assessment

The second important distinguishing feature of case-based recommender systems relates to their use of various sophisticated approaches to similarity assessment when it comes to judging which cases to retrieve in response to some user query. Because case-based recommenders rely on structured case representations, they can take advantage of more structured approaches to similarity assessment than their content-based cousins. For example, traditional content-based techniques tend to use keyword-based similarity metrics, measuring the similarity between a user query and a product in terms of the frequency of occurrence of overlapping query terms within the product text. If the user is looking for a “$1000 6 mega-pixel DSLR” then cameras with all of these terms will be rated highly and, depending on the strength of the similarity criterion used, if no cameras exist with all of these terms then none may be retrieved. We have already highlighted this type of retrieval inflexibility (stonewalling) as a critical problem in the introduction to this chapter. Any reasonable person would be happy to receive a recommendation for a “$900 6.2 mega-pixel digital SLR” for the above query, even though strictly speaking there is no overlap between the terms used in this description and the query.

Case-based recommenders can avail of more sophisticated similarity metrics that are based on an explicit mapping of case features and the availability of specialised feature-level similarity knowledge. An online property recommender might use case-
based techniques to make suggestions that are similar to a target query even when exact matches are not available. For example, a user who is looking for a “2 bedroom apartment in Dublin with a rent of 1150 euro” might receive recommendations for properties that match the bedroom feature and that are similar to the target query in terms of price and location; the recommendations might offer slightly higher or lower priced properties in a nearby location when no exact matches are available.

Similarity(t, c) = [∑i=1..n wi ∗ simi(ti, ci)] / [∑i=1..n wi]   (11.1)

Assessing similarity at the case level (or between the target query and a candidate case) obviously involves combining the individual feature-level similarities for the relevant features. The usual approach is to use a weighted-sum metric such as that shown in Equation 11.1. In brief, the similarity between some target query, t, and some candidate case (or item), c, is the weighted sum of the individual similarities between the corresponding features of t and c, namely ti and ci. Each weight encodes the relative importance of a particular feature in the similarity assessment process, and each individual feature similarity is calculated according to a similarity function that is defined for that feature, simi(ti, ci). For instance, looking to the property recommender example above, if rent is very important to the user then the weight associated with this feature will be higher than the weights associated with less important features. In turn, when it comes to comparing the query and a case in terms of their rent, the recommender system may draw on a specialised similarity metric designed for comparing monthly rents. A different metric might be used for comparing the number of bedrooms or the property type. We must also consider the source of the individual feature-level similarities and how they can be calculated.
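Equation 11.1 translates directly into code. The sketch below uses the property-recommender example; the feature metrics and weights are hypothetical stand-ins for the hand-crafted similarity knowledge discussed in Section 11.2.3:

```python
def weighted_similarity(target, case, metrics, weights):
    # Equation 11.1: weighted sum of per-feature similarities,
    # normalised by the total weight.
    num = sum(weights[f] * metrics[f](target[f], case[f]) for f in target)
    return num / sum(weights[f] for f in target)

def sym_numeric(t, c):
    # Symmetric numeric metric: similarity falls off equally
    # for values above and below the target.
    return 1.0 - abs(t - c) / max(t, c)

def exact_match(t, c):
    # Simplest nominal metric: 1 if the values agree, 0 otherwise.
    return 1.0 if t == c else 0.0

# Hypothetical query and candidate from the property example.
target = {"bedrooms": 2, "rent": 1150, "location": "Dublin"}
case = {"bedrooms": 2, "rent": 1200, "location": "Dublin"}
metrics = {"bedrooms": exact_match, "rent": sym_numeric, "location": exact_match}
weights = {"bedrooms": 2.0, "rent": 4.0, "location": 3.0}  # rent matters most

score = weighted_similarity(target, case, metrics, weights)
```

With these made-up weights the candidate scores roughly 0.98: a perfect match on bedrooms and location, slightly penalised on rent.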
For example, returning to our camera recommender system, consider a numeric feature such as pixel resolution. The target query and a candidate case might be compared in terms of this feature using a similarity metric with the sort of similarity profile shown in Figure 11.5(a): maximum similarity is achieved when the pixel resolution of a candidate case matches that of the target query, and for cases with higher or lower pixel resolution there is a corresponding decline in similarity. This is an example of a symmetric similarity metric because there is no bias in favour of either higher or lower resolution cases.

simprice(pt, pc) = 1 − |pt − pc| / max(pt, pc)   (11.2)

Sometimes symmetric similarity metrics are not appropriate. For instance, consider the price feature: it is reasonable to expect that a user will view cameras with prices (pc) that are lower than their target price (pt) to be preferable to cameras with higher prices, all other things being equal. The similarity metric in Equation 11.2 is used in many recommender systems (e.g., see [52, 75]) as one way to capture this notion, and the metric displays a similarity profile similar to that shown in Figure 11.5(b). For instance, consider a $1000 target price and two candidate cases, one with a price of $500 and one for $1500. In terms of the symmetric similarity metric represented by Figure 11.5(a), the latter candidate corresponds to the point x in Figure 11.5(a) and the former to point y.
Fig. 11.5. Two example similarity profiles for numeric similarity metrics: (a) corresponds to a standard symmetric similarity metric; (b) corresponds to an asymmetric metric that gives preference to feature values that are lower than the target's value.

For both cases the similarity assessment is the same, reflecting that both differ from the target price by the same amount, $500, with no preference given to whether a case is less or more expensive. These cases are plotted in the same way in Figure 11.5(b), but now we see that the more expensive case (point x) has a lower similarity than the cheaper camera (point y). Even though both candidates differ by $500, preference is given to the cheaper case.

Evaluating the similarity of non-numeric features in a meaningful way requires additional domain knowledge. For example, in a vacation recommender it might be important to be able to judge the similarities of cases of different vacation types. Is a skiing holiday more similar to a walking holiday than it is to a city break or a beach holiday? One way to make such judgments is by referring to suitable domain knowledge, such as an ontology of vacation types. In Figure 11.6 we present part of what such an ontology might look like, with different feature values represented as nodes and similar feature values grouped near to each other. In this way, the similarity between two arbitrary nodes can be evaluated as an inverse function of the distance between them or the distance to their nearest common ancestor. Accordingly, a skiing holiday is more similar to a walking holiday (they share a direct ancestor, activity holidays) than it is to a beach holiday, where the closest common ancestor is the ontology root node.

11.2.3 Acquiring Similarity Knowledge

Similarity assessment is obviously a key issue for case-based reasoning and case-based recommender systems.
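To make the nature of this similarity knowledge concrete, the sketch below hand-codes two such metrics: an asymmetric price metric that penalises dearer-than-target candidates more heavily (one possible encoding of the preference profile of Figure 11.5(b), not necessarily the exact form of Equation 11.2), and a nearest-common-ancestor measure over a toy vacation taxonomy modelled loosely on Figure 11.6:

```python
def sim_price(pt, pc):
    # Asymmetric price metric: candidates cheaper than the target (pc < pt)
    # lose similarity gently; dearer candidates are penalised twice as hard.
    # An illustrative encoding of the Figure 11.5(b) profile.
    if pc <= pt:
        return 1.0 - (pt - pc) / pt
    return max(0.0, 1.0 - 2.0 * (pc - pt) / pt)

# Toy fragment of a vacation-type ontology, after Figure 11.6.
parent = {
    "skiing": "activity", "walking": "activity",
    "beach": "relaxation", "countryside": "relaxation",
    "activity": "vacation", "relaxation": "vacation",
}

def path_to_root(node):
    # Chain of ancestors from a node up to the ontology root.
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def sim_taxonomy(a, b):
    # Similarity as an inverse function of the distance to the
    # nearest common ancestor of the two feature values.
    pa, pb = path_to_root(a), path_to_root(b)
    nca = next(x for x in pa if x in pb)
    return 1.0 / (1.0 + pa.index(nca) + pb.index(nca))
```

As the text requires, sim_taxonomy("skiing", "walking") (shared ancestor: activity) exceeds sim_taxonomy("skiing", "beach") (common ancestor only at the root), and sim_price(1000, 500) exceeds sim_price(1000, 1500).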
Of course, the availability and use of similarity knowledge (feature-based similarity measures and weighting functions) is an important distinguishing feature of case-based recommendation. Although further detailed discussion of this particular issue is beyond the scope of this chapter, it is nonetheless worth considering the origin of this knowledge in many systems. For the most part this knowledge is hand-coded: similarity tables and trees, such as the vacation ontology above, are made
available and codified by a domain knowledge expert. Similarly, importance weights might be assigned by the user at retrieval time or by the domain expert during system design. Hand-coding this knowledge is, of course, expensive, and so researchers have increasingly begun to explore how machine learning techniques can be used to relieve these knowledge acquisition costs.

Fig. 11.6. A partial ontology of vacation types can be used as the basis for similarity judgments for non-numeric features.

A number of researchers have looked at the issue of automatically learning the feature weights that are used to influence the level of importance of different features during the case similarity calculations. Wettschereck and Aha [102], for example, describe an evaluation of a number of weight-learning algorithms that are knowledge-poor, in the sense that they avoid the need for detailed domain knowledge to drive the learning process. They show how even these knowledge-poor techniques can result in significant improvements in case-based classification tasks and present a general framework for understanding and evaluating different weight-learning approaches; see also the work of [42, 81] for approaches to local weight-learning in CBR based on reinforcement learning.

Stahl [96] also looks at feature-weight learning but describes an alternative approach in which feedback is provided by a “similarity teacher” whose job it is to evaluate the ordering of a given retrieval set. For example, in a recommender context a user may play the role of the similarity teacher because her selections can be interpreted as retrieval feedback; if our user selects product number 3 in the list of recommendations first, then we can conclude that the correct ordering should have placed this product at the top of the list. Stahl's learning algorithm attempts to minimise the average ordering error in retrieval sets. The work of Cohen et al.
[25] looks at the related issue of learning to order items given feedback in the form of preference judgements. They describe a technique for automatically learning a preference function to judge how advisable it is to rank some item i ahead of item j, and go on to show how a set of items can then