L. Chen, P. Pu

navigate through the multi-dimensional space. An important interface component in FindMe is called tweaking, which allows users to critique the current recommendation by selecting one of the proposed simple tweaks (e.g., “cheaper”, “bigger” and “nicer”). When a user finds that the current recommendation falls short of her expectations and responds to a tweak, the remaining candidates are filtered to leave only those satisfying the tweak. The critique suggestions in FindMe are called unit critiques since each of them constrains a single feature at a time.

More recently, a so-called dynamic critiquing method (Reilly et al. 2004; McCarthy et al. 2004a) has been developed with the objective of automatically generating a set of compound critiques, each of which can operate over multiple features simultaneously (e.g., “Different Manufacture, Lower Resolution and Cheaper”). A live-user trial showed that integrating the dynamic critiquing method can effectively reduce users’ interaction cycles from an average of 29 when applying unit critiques only to 6 (McCarthy et al. 2005c). The compound critiques can also serve as explanations, revealing the recommendation opportunities that remain beyond the current product (Reilly et al. 2005). Therefore, we use the DynamicCritiquing system as the representative to illustrate the main components that a single-item system-suggested critiquing system may comprise.

2.1.1 DynamicCritiquing

Figure 2 shows a sample DynamicCritiquing interface where both unit and compound critiques are available to users as feedback options (Reilly et al. 2004; McCarthy et al. 2005c). The DynamicCritiquing interface contains three main components: a single item as the current recommendation, a unit critiquing area and a list of compound critiques.
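The difference between a unit critique and a compound critique, and the filtering step that a critique triggers, can be illustrated with a minimal sketch. It is not taken from FindMe or DynamicCritiquing: the `UnitCritique` class, the `apply_critique` helper, the operator symbols and the three-camera catalog are all hypothetical names and data chosen for illustration. A compound critique is modeled here simply as a list of unit critiques that must all hold relative to the current recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitCritique:
    """A constraint on one feature, relative to the current item (hypothetical)."""
    feature: str  # e.g. "price"
    op: str       # "<", ">", "=" or "!="

    def satisfied_by(self, candidate: dict, current: dict) -> bool:
        a, b = candidate[self.feature], current[self.feature]
        if self.op == "<":
            return a < b
        if self.op == ">":
            return a > b
        if self.op == "=":
            return a == b
        if self.op == "!=":
            return a != b
        raise ValueError(f"unknown operator: {self.op}")


def apply_critique(critique, candidates, current):
    """Keep only the candidates that satisfy every unit critique in `critique`."""
    return [c for c in candidates
            if all(u.satisfied_by(c, current) for u in critique)]


# Illustrative data only.
current = {"name": "A", "manufacture": "X", "resolution": 8.0, "price": 400}
candidates = [
    {"name": "B", "manufacture": "X", "resolution": 8.0, "price": 350},
    {"name": "C", "manufacture": "Y", "resolution": 6.0, "price": 300},
    {"name": "D", "manufacture": "Y", "resolution": 10.0, "price": 500},
]

# Unit critique: "cheaper".
cheaper = [UnitCritique("price", "<")]
# Compound critique: "Different Manufacture, Lower Resolution and Cheaper".
compound = [
    UnitCritique("manufacture", "!="),
    UnitCritique("resolution", "<"),
    UnitCritique("price", "<"),
]

unit_result = [c["name"] for c in apply_critique(cheaper, candidates, current)]
compound_result = [c["name"] for c in apply_critique(compound, candidates, current)]
print(unit_result)      # ['B', 'C']
print(compound_result)  # ['C']
```

In this sketch the compound critique narrows the candidate set in one step, whereas the same result with unit critiques would take several cycles.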
In the first recommendation cycle, an item that best matches the user’s initially stated preferences is returned. After each critiquing action, a new item that satisfies the user’s critique and is most similar to the previously recommended product is shown as the current recommendation.

In the unit critiquing area, the system determines a set of main features, one of which users can choose to critique at a time. For each numerical feature (e.g., price), two critiquing directions are provided: increasing the value (e.g., more expensive) or decreasing it (e.g., cheaper). For discrete features (e.g., manufacture), all of the relevant options are displayed in a drop-down menu. Therefore, this area acts more as user-initiated unit critiquing support than as a small, limited set of unit critique suggestions as in FindMe systems.

The list of three compound critiques is automatically computed by discovering the recurring subsets of unit differences between the current recommended item and the remaining products, using a data mining algorithm called Apriori (Agrawal et al. 1993). More concretely, each remaining product is first converted into a critique pattern indicating its differences from the current recommended product in terms of all main features (e.g., {(manufacture, =), (price, <), (weight, >), …}). Since there will be a number of critique patterns, one for each remaining product, the Apriori algorithm is employed to discover the frequent association rules among features within these patterns. A set of compound
critique options (as the frequent association rules) will then be produced. For example, supposing that heavier laptops are very frequently associated with cheaper prices in the remaining items, a compound critique of the form {[weight >], [price <]} (i.e., heavier and cheaper) will be generated. Thus, the DynamicCritiquing agent uses the Apriori algorithm to discover the most frequently recurring compound critiques representative of a given data set. It then favors those candidates with the lowest support values (“support value” refers to the percentage of products that satisfy the critique). This selection criterion was motivated by the fact that presenting critiques with lower support values provides a good balance between their likely applicability to the user and their ability to narrow the search (McCarthy et al. 2004a, 2005b,c).

Fig. 2 The DynamicCritiquing interface (labeled regions: one recommended item; user-initiated unit critiquing; system-suggested compound critiques)

In addition to functioning as critique suggestions, the dynamically generated compound critiques have also been regarded as explanations exposing the recommendation opportunities that exist in the available products (McCarthy et al. 2004b; Reilly et al. 2005). They may help users become familiar with the product domain and understand the relationships among different features within the alternatives. Users can then be stimulated to express more preferences or be prevented from making retrieval failures (Reilly et al. 2005).

2.2 K-item user-initiated critiquing

Instead of suggesting pre-computed critiques for users to select, the purely user-initiated critiquing approach focuses on showing examples and stimulating users to define critiques themselves. It does not limit the size of critiques a user can manipulate during each cycle, so that the user can post either unit or compound critiques
over any combination of features with freedom. In fact, the purpose of this type of critiquing support is to assist users in freely executing tradeoff navigation, which is a process shown to improve users’ decision accuracy and confidence (Pu and Kumar 2004; Pu and Chen 2005). ExpertClerk (Shimazu 2001), ATA (Automated Travel Assistant) (Linden et al. 1997) and SmartClient (Pu and Faltings 2000) are all examples of such systems. Nguyen et al. (2004) realized the idea mainly to support on-tour recommendations for mobile users.

Such a system is mainly composed of two components: a recommender agent that computes a set of k items that best match the user’s current preference model, and a critiquing component that allows the user to actively create critiquing criteria and then examine a new set of k tradeoff alternatives. ExpertClerk and ATA display three items at a time, whereas SmartClient returns seven items in its recent versions. Users can select any of the displayed items and navigate to products that offer tradeoff potential. As for the critiquing aid, ExpertClerk provides a natural language dialog to request users’ feedback, ATA is reported to have a graphical interface but without a detailed description, and SmartClient has continually improved the usability of its critiquing facility through user evaluations. We have chosen the latest version of SmartClient, called ExampleCritiquing, to explain the typical constructs of a k-item user-initiated critiquing system.

2.2.1 ExampleCritiquing

SmartClient was originally developed as an online preference-based search tool for finding flights (Pu and Faltings 2000; Torrens et al. 2002). Its elementary model is the example-and-critiquing interaction, which was subsequently applied to product catalogs of vacation packages, insurance policies, apartments, and more recently commercial products such as tablet PCs and digital cameras (Pu and Faltings 2004; Pu and Kumar 2004; Chen and Pu 2006).
In the latest ExampleCritiquing system, the recommendation part can be further divided into two sub-components: the first set of recommendations computed according to the user’s initial preferences, and the set of tradeoff alternatives recommended after each critiquing process. For example, for product catalogs of digital cameras and tablet PCs, k items (e.g., k = 7) are displayed in both cases. The number k was determined according to Faltings et al. (2004), who discussed the optimal number of displayed solutions based on catalog sizes.

In the critiquing panel (see Fig. 3), three radio buttons are placed next to each main feature, allowing users to choose to “keep” its value, “improve” it, or accept a compromised value suggested by the system (i.e., via “Take any suggestion”). In particular, users can freely compose compound critiques by combining criteria on any set of multiple features. The interface also lets users perform simple similarity-based critiquing (e.g., “show similar products with this one”) by just keeping all current values, or define concrete value improvements on features (for example, under the “Improve” drop-down menu of price, there are options “$100 cheaper”, “$200 cheaper”, etc.). This kind of critiquing support has also been called tradeoff assistance in some related literature (Pu and Kumar 2004; Chen and Pu 2006), since it is by nature meant to
facilitate a user in specifying tradeoff criteria: improving on one or several attributes that are important to her, while accepting compromised values on less important ones. Tradeoff processes involving only one feature (unit critique) or multiple features (compound critique) are respectively termed simple and complex tradeoffs by Pu and Kumar (2004).

Fig. 3 The ExampleCritiquing interfaces (labeled regions: k recommended items (k = 7); the product the user selected to critique; user-initiated critiquing facility for creating unit or compound critiques)

The search engine that computes the recommended alternatives is adjusted for different decision environments. For configurable products, it employs sophisticated constraint satisfaction algorithms and models user preferences as soft constraints (Torrens et al. 2002). For multi-attribute products, it is in theory grounded on the Weighted Additive sum rule (WADD), a compensatory decision strategy for explicitly resolving conflicting values (Payne et al. 1993). As required by WADD, the user’s preferences are structured as a set of (attribute’s acceptable value, relative importance) pairs. After a user specifies her initial preferences, all alternatives will be ranked by their weighted utilities, and the top k items best matching the user’s stated requirements will
be returned. Among the initial set of recommendations, the user either accepts a result, or takes a near solution to activate the critiquing panel (by clicking on the button “Value Comparison” along with the product, see Fig. 3). Once the critiquing criteria have been built in the critiquing panel, the system will refine the user’s preference model and adjust the relative importance of all critiqued attributes (i.e., the weight of improved attribute(s) will be increased and that of compromised attribute(s) will be decreased). The search engine will then apply a combination of the elimination-by-aspect (EBA) and WADD strategies (Payne et al. 1993). The combined strategy begins with EBA to eliminate products that do not reach the minimal acceptable value (i.e., cutoff) of the improved attribute(s); WADD is then applied to examine the remaining alternatives in more detail and select those that best satisfy all of the user’s tradeoff criteria. This example-and-critiquing process completes one cycle of interaction, and it continues as long as the user wants to refine the results.

3 Control variables

In summary, the components of both DynamicCritiquing and ExampleCritiquing can be categorized into two independent variables: the number of recommendations that users can examine at a time and on which they base their critiques, and the critiquing aid by which users can specify feedback criteria. As introduced before, two typical combinations of the two variables are single-item system-suggested critiquing and k-item user-initiated critiquing, but more combinations are possible. In this section, we mainly discuss each variable’s possible values.

3.1 Critiquing coverage (the number of recommendations)

Here, critiquing coverage refers to the number of example products that are recommended to users, from which they choose either their final choice or the object to critique.
In the ExampleCritiquing system, multiple examples are displayed during each recommendation cycle, because its objective is to stimulate users to make self-initiated critiques. In contrast, the FindMe and DynamicCritiquing agents return only one product, based on which system-suggested critiques are generated. This simple display strategy has the advantage of not overwhelming users with too much information, but it deprives users of the right to choose the product they are interested in critiquing, and potentially exposes them to the risk of engaging in a longer interaction session.

The critiquing coverage can be further separated into two sub-variables: the number of recommendations in the first round, right after users’ initial preference specification (called NIR), and the number of items (i.e., tradeoff alternatives) in each later cycle, after each critiquing action (called NCR). The two numbers can be equal or different. For example, they are both equal to 1 in DynamicCritiquing and both equal to 7 in ExampleCritiquing. It is also possible to set them differently, for example, NIR as 1 and NCR as 7 if users are only interested in the single best matching product according to their initial preferences, but would like to see multiple alternatives comparable with their critiqued reference once they critique a product.
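As a rough illustration of how NIR and NCR could parameterize a recommender, the sketch below ranks items with a WADD-style weighted sum and, after a critique, first applies an EBA-style cutoff as described in Sect. 2.2.1. Everything specific in it is an assumption made for the example and not part of any of the cited systems: the names `Coverage`, `wadd_score` and `recommend`, the integer utility functions, the rule of raising or lowering a weight by 1, the cutoff "cheaper than the critiqued item", and the four-item catalog. NCR is set to 2 instead of 7 only because the toy catalog is so small.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coverage:
    n_ir: int  # N_IR: items shown right after the initial preference specification
    n_cr: int  # N_CR: items shown after each critiquing action


def wadd_score(item, weights, utilities):
    """Weighted additive utility: sum of weight * per-attribute utility."""
    return sum(weights[a] * utilities[a](item[a]) for a in weights)


def recommend(catalog, weights, utilities, n, cutoffs=None):
    """Drop items failing any cutoff (EBA-style), then return the n best by WADD."""
    cutoffs = cutoffs or {}
    pool = [it for it in catalog
            if all(accept(it[a]) for a, accept in cutoffs.items())]
    ranked = sorted(pool, key=lambda it: wadd_score(it, weights, utilities),
                    reverse=True)
    return ranked[:n]


# Illustrative catalog and utilities (hypothetical values).
catalog = [
    {"name": "A", "price": 300, "zoom": 3},
    {"name": "B", "price": 500, "zoom": 6},
    {"name": "C", "price": 800, "zoom": 10},
    {"name": "D", "price": 200, "zoom": 1},
]
utilities = {
    "price": lambda p: (1000 - p) // 100,  # cheaper is better
    "zoom": lambda z: z,                   # more zoom is better
}
coverage = Coverage(n_ir=1, n_cr=2)

# First round: N_IR items from the initial preferences.
weights = {"price": 2, "zoom": 2}
first_round = recommend(catalog, weights, utilities, coverage.n_ir)
reference = first_round[0]

# Critique on the reference: improve price, compromise on zoom.
# Assumed update rule: +1 for the improved attribute, -1 for the compromised one.
weights = {"price": weights["price"] + 1, "zoom": weights["zoom"] - 1}
cutoffs = {"price": lambda p: p < reference["price"]}
later_round = recommend(catalog, weights, utilities, coverage.n_cr, cutoffs)

first_names = [it["name"] for it in first_round]
later_names = [it["name"] for it in later_round]
print(first_names)  # ['C']
print(later_names)  # ['D', 'A']
```

Setting `Coverage(n_ir=1, n_cr=1)` would mimic the single-item display, while equal values greater than 1 correspond to the k-item display.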