User Model User-Adap Inter (2009) 19:167–206
DOI 10.1007/s11257-008-9057-x

ORIGINAL PAPER

Interaction design guidelines on critiquing-based recommender systems

Li Chen · Pearl Pu

L. Chen (✉) · P. Pu
Human Computer Interaction Group, School of Computer and Communication Sciences,
Swiss Federal Institute of Technology in Lausanne (EPFL), 1015 Lausanne, Switzerland
e-mail: li.chen@epfl.ch
P. Pu
e-mail: pearl.pu@epfl.ch

Received: 19 September 2007 / Accepted in revised form: 25 August 2008 / Published online: 3 October 2008
© Springer Science+Business Media B.V. 2008

Abstract  A critiquing-based recommender system acts like an artificial salesperson. It engages users in a conversational dialog in which they can provide feedback, in the form of critiques, on the sample items shown to them. The feedback, in turn, enables the system to refine its understanding of the user's preferences and its prediction of what the user truly wants. The system is then able to recommend products that may better stimulate the user's interest in the next interaction cycle. In this paper, we report an extensive investigation comparing various approaches to devising the critiquing opportunities designed into these recommender systems. More specifically, we investigated two major design elements that are necessary for a critiquing-based recommender system: critiquing coverage (one vs. multiple items returned during each recommendation cycle to be critiqued) and critiquing aid (system-suggested critiques, i.e., a set of critique suggestions for users to select, vs. a user-initiated critiquing facility, i.e., support for users to create critiques on their own). Through a series of three user trials, we measured how real users reacted to systems with varied setups of the two elements. In particular, we found that giving users the choice of critiquing one of multiple items (as opposed to just one) has significantly positive impacts on increasing users' decision accuracy (particularly in the first recommendation cycle) and saving their objective effort (in the later critiquing cycles). As for critiquing aids, the hybrid design combining system-suggested critiques with user-initiated critiquing support performs best in inspiring users' decision confidence and increasing their intention to return, in comparison with either approach used exclusively. The results from our studies therefore shed light
on the design guidelines for determining the sweet spot that balances user initiative and system support in the development of an effective and user-centric critiquing-based recommender system.

Keywords  Critiquing-based recommender systems · Decision support · Preference revision · User control · Example critiquing · Dynamic critiquing · Hybrid critiquing · User evaluation · Usability · Human–computer interaction

1 Introduction

According to adaptive decision theory (Payne et al. 1993), the human decision process is inherently constructive and adapts to the current decision task and decision environment. In particular, when users are confronted with an unfamiliar product domain or a complex decision situation with overwhelming information, such as the current e-commerce environment, they are usually unable to state their preferences accurately at the outset (Viappiani et al. 2007), but instead construct them in a highly context-dependent fashion during their decision process (Tversky and Simonson 1993; Payne et al. 1999; Carenini and Poole 2002).

To assist people in making accurate as well as confident decisions, especially in complex decision settings, critiquing-based recommender systems have emerged in the form of both natural-language models (Shimazu 2001; Thompson et al. 2004) and graphical user interfaces (Burke et al. 1996, 1997; Reilly et al. 2004; Pu and Kumar 2004). This type of system has been broadly recognized as an effective feedback mechanism that can guide users efficiently toward their ideal products, which is particularly meaningful when users are searching for high-involvement products (e.g., computers, houses and cars) with the primary goal of avoiding any financial damage. Other terms for these systems are conversational recommender systems (Smyth and McGinty 2003), conversational case-based reasoning systems (Shimazu 2001), and knowledge-based recommender systems (Burke et al. 1997; Burke 2000).

More specifically, a critiquing-based recommender system acts like an artificial salesperson that engages users in a conversational dialog in which they can provide feedback in the form of critiques (e.g., "I like this laptop, but prefer something cheaper" or "with a faster processor") on one of the currently recommended items. The feedback, in turn, enables the system to predict more accurately what the user truly wants and then to return products that may better interest the user in the next conversational cycle. The main component of this interaction model is therefore that of recommendation-and-critiquing, which has also been called tweaking (Burke et al. 1997), critiquing feedback (Smyth and McGinty 2003), candidate/critiquing (Linden et al. 1997), and navigation by proposing (Shimazu 2001).

To our knowledge, the critiquing concept was first mentioned in the RABBIT system (Williams and Tou 1982) as a new interface paradigm for formulating queries to a database. In recent years, it has evolved into two principal branches. One aims to proactively generate a set of knowledge-based critiques that users may be prepared to accept as ways to improve the current product (termed system-suggested critiques in this paper). This mechanism has been adopted in FindMe systems (Burke et al. 1997)
and more recent DynamicCritiquing agents (Reilly et al. 2004; McCarthy et al. 2005c). The main advantage, as detailed in the related literature (Reilly et al. 2004; McCarthy et al. 2004b; McSherry 2004), is that system-suggested critiques can not only expose the remaining recommendation opportunities, but also potentially accelerate the user's critiquing process when they correspond well to the user's intended feedback criteria.

An alternative critiquing mechanism does not propose pre-computed critiques, but instead provides a facility that stimulates users to freely create and combine critiques themselves (so-called user-initiated critiquing support in this paper). As a typical application, the ExampleCritiquing agent has been developed for this goal; its focus is on showing examples and helping users compose their self-initiated critiques (Pu and Kumar 2004). In essence, the ExampleCritiquing agent allows users to choose which feature(s) to critique and how to critique them, under their own control. Previous work showed that it enabled users to obtain significantly higher decision accuracy and preference certainty than non-critiquing-based systems such as a ranked list (Pu and Kumar 2004; Pu and Chen 2005).

In addition to characterizing a critiquing-based recommender system by the nature of its critiquing support (i.e., system-suggested critiques or user-initiated critiquing support), another important factor is the number of items that the system returns during each recommendation cycle for users to critique. For example, FindMe and DynamicCritiquing systems return one item, whereas ExampleCritiquing agents show multiple (k) items (e.g., k = 7) per cycle. A multi-item display gives users the chance to choose the product to be critiqued after comparing several options.

Thus, a critiquing-based recommender system contains two crucial design components. One is its critiquing aid: suggesting critiques for users to select, or helping them construct their own critiques. The other is the number of recommended items (called critiquing coverage in this paper): suggesting a single product vs. multiple products for users to critique.

These options are inherently related to different levels of user control, in either the process of identifying the critiqued reference or the process of specifying concrete critiquing criteria. Indeed, perceived behavioral control has been regarded as an important determinant of user beliefs and actual behavior (Ajzen 1991). In the context of e-commerce, it has been found to have a positive effect on customers' attitudes, including their perceived ease of use, perceived usefulness and trust (Novak et al. 2000; Koufaris and Hampton-Sosa 2002). User control has also been identified as one of the fundamental principles of general user interface design (Shneiderman 1997) and Web usability (Nielsen 1994). However, few works have studied the effect of the locus of user initiative in critiquing-based recommender systems. There is indeed a complex tradeoff underlying a successful design: giving users too much control may cause them to perform unnecessarily complex critiquing, whereas giving them little or no control may force them to accept system-suggested items even when these do not match their truly intended choices. The goal of this paper is therefore to investigate the different degrees of user control vs.
system support in both critiquing aid and critiquing coverage, so as to identify the optimal combination of components that could positively influence users' actual decision performance and subjective attitudes.
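To make the two critiquing aids concrete before describing the trials, the sketch below contrasts, in simplified Python, a user-initiated unit critique with a system-suggested compound critique, and shows one naive way candidate compound suggestions could be derived from the remaining products. The names (UnitCritique, suggest_compound_critiques) and the plain pair-counting heuristic are our illustrative assumptions, not the evaluated systems' implementations; actual DynamicCritiquing agents mine compound critiques from the remaining cases with the Apriori algorithm (Reilly et al. 2004).

```python
from collections import Counter
from itertools import combinations
from typing import NamedTuple

class UnitCritique(NamedTuple):
    feature: str    # e.g., "price"
    direction: str  # "<" means "I prefer something lower", ">" means higher

def user_initiated_critique(feature, direction):
    """User-initiated aid: the user freely composes her own critique."""
    return UnitCritique(feature, direction)

def suggest_compound_critiques(current, remaining, top_n=3):
    """System-suggested aid (hypothetical simplification): count which pairs
    of feature differences between the current item and the remaining
    products co-occur most often, and offer those pairs as compound
    critiques. Assumes items are dicts of numeric features."""
    patterns = Counter()
    for item in remaining:
        diffs = {
            UnitCritique(f, "<" if item[f] < current[f] else ">")
            for f in current
            if item[f] != current[f]
        }
        # Every co-occurring pair of unit critiques is a candidate compound.
        for pair in combinations(sorted(diffs), 2):
            patterns[pair] += 1
    return [list(pattern) for pattern, _ in patterns.most_common(top_n)]

# Example: a small laptop catalog with numeric features only.
current = {"price": 1500, "weight": 3.0, "cpu": 1.8}
remaining = [
    {"price": 1200, "weight": 2.1, "cpu": 1.6},
    {"price": 1100, "weight": 2.4, "cpu": 1.6},
    {"price": 1900, "weight": 2.8, "cpu": 2.4},
]
print(suggest_compound_critiques(current, remaining))
# e.g., [[UnitCritique('cpu', '<'), UnitCritique('price', '<')], ...]
```

In this notation, the running example "I like this laptop, but prefer something cheaper" is simply UnitCritique("price", "<"); a compound critique bundles several such units, which is why a well-matched suggestion can save the user from composing each unit herself.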
To achieve our goal, we conducted a series of three trials. In the first user trial, we compared two well-known critiquing-based recommender agents, each representing a typical setup combination of critiquing coverage and critiquing aid. Concretely, one is the DynamicCritiquing system, which shows one recommended product during each interaction cycle, accompanied by a user-initiated unit critiquing area and a list of system-suggested compound critiques. The other is the ExampleCritiquing system, which returns multiple products in a display and stimulates users to build and compose critiques of one of the shown products in their own, self-motivated way. The experimental results show that the ExampleCritiquing agent achieved significantly higher decision accuracy (in terms of both objective and subjective measures) and stronger behavioral intentions (i.e., intention to purchase and to return), while requiring a lower level of interaction and cognitive effort.

In the second trial, we modified ExampleCritiquing and DynamicCritiquing to keep their critiquing coverage (i.e., the number of recommended items during each cycle) constant, so that they differed only in their critiquing aids. The results surprisingly showed no significant difference between the two modified versions in terms of either objective or subjective measures. Further analysis of participants' comments revealed the pros and cons of system-suggested critiques and user-initiated critiquing support. Additionally, combining these results with the first trial's, we found that giving users the choice of critiquing one of multiple items (as opposed to just one) has significantly positive impacts on increasing their decision accuracy and confidence, particularly in the first recommendation cycle, and on saving objective effort in the later critiquing rounds.

The third user trial was conducted to measure users' performance in a hybrid critiquing system, in which system-suggested critiques and the user-initiated critiquing aid were combined on one screen. Analysis of users' critiquing application frequency in such a system shows that user-initiated critiquing support was applied to create users' own critiques relatively more often than suggested critique options were picked. Moreover, the respective practical effects of the user-initiated and system-suggested critiquing facilities were identified: both contribute significantly to improving users' decision confidence and return intention, and system-suggested critiques are additionally effective in reducing perceived effort.

Therefore, all of our trial results suggest that giving users multiple recommended products as critiquing candidates, and providing them with both system-suggested and user-initiated critiquing aids for specifying concrete critiquing criteria, can bring substantial benefits.

Another contribution of our work is that we have established a user-evaluation framework. It contains both objective variables, such as decision accuracy, task completion time and interaction effort, and subjective measures, such as perceived cognitive effort, decision confidence and trusting intentions. All of these factors are fundamentally important, given that a recommender system's ultimate goal should be to allow its users to achieve high decision accuracy and build high trust in it, while requiring them to expend a minimal amount of effort to obtain these benefits (Häubl and Trifts 2000; Chen and Pu 2005; Pu and Chen 2005).

The rest of this paper is organized as follows.
We first introduce existing critiquing-based recommender systems, with DynamicCritiquing and ExampleCritiquing as two
representatives. According to their respective characteristics, we summarize two main elements that can be varied to reflect different degrees of user control. We then introduce a user-evaluation framework with the major dependent variables measured in our experiments. Detailed descriptions of the three user trials then follow, including their materials, recruited participants, experimental procedures, results analyses and discussions. Finally, we conclude our work and indicate its practical implications and future directions.

2 Critiquing-based recommender systems

Our investigation of existing critiquing-based recommender systems revealed that they basically follow a similar interaction model (see Fig. 1). The user first specifies her initial preferences on product attributes. The system then returns one or multiple recommended items. Either the user selects an item as her final choice and terminates the interaction, or she makes critiques, by picking system-suggested critiques or by defining critiques herself. If critiques were made, the system updates the recommendation(s) and the list of suggested critiques (if provided) in the next interaction cycle. This process continues until the user decides that she has found her most preferred product.

Most existing systems fall into two specific branches. One is called single-item system-suggested critiquing, since it recommends one item at a time and guides users to provide feedback by selecting a system-suggested critique. The other is called k-item user-initiated critiquing, because it provides multiple items during each recommendation cycle, together with a critiquing aid that assists users in choosing one product to be critiqued and creating their self-specified critiquing criteria for that product. In the following, we introduce both approaches in detail, with two typical applications as examples.

2.1 Single-item system-suggested critiquing

The FindMe system was the first known single-item system-suggested critiquing system (Burke et al. 1996, 1997). It uses knowledge about the product domain to help users

[Fig. 1 The typical interaction model of a critiquing-based recommender system. Flowchart nodes: the user's initial preferences are elicited; one or multiple example outcomes are displayed; the user either accepts an outcome (no more effort required) or applies system-suggested and/or user-initiated critiquing.]
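To make the interaction model of Fig. 1 concrete, the following minimal sketch implements the recommend-and-critique loop in Python. It is only an illustration under our own assumptions: the helper names (satisfies, recommend, critiquing_session, get_user_action) are hypothetical, critiques are treated as hard filters over the catalog, and get_user_action stands in for the entire user interface; deployed systems instead rank items with similarity or utility models and revise preferences more gradually.

```python
def satisfies(item, constraints):
    """True if the item meets every critique bound accumulated so far."""
    for feature, (direction, bound) in constraints.items():
        if direction == "<" and not item[feature] < bound:
            return False
        if direction == ">" and not item[feature] > bound:
            return False
    return True

def recommend(catalog, constraints, k):
    """Return up to k example outcomes compatible with the critiques so far.
    Single-item systems use k = 1; k-item systems such as ExampleCritiquing
    show several items per cycle (e.g., k = 7)."""
    return [item for item in catalog if satisfies(item, constraints)][:k]

def critiquing_session(catalog, get_user_action, k=7):
    """One session of the Fig. 1 loop. get_user_action stands in for the UI:
    it returns either ("accept", item) or ("critique", item, feature, dir)."""
    constraints = {}  # the user's elicited, then progressively revised, preferences
    while True:
        items = recommend(catalog, constraints, k)
        if not items:
            return None           # no product satisfies all critiques
        action = get_user_action(items)
        if action[0] == "accept":
            return action[1]      # the user found her most preferred product
        _, reference, feature, direction = action
        # e.g., "cheaper than this one": bound the feature by the critiqued item
        constraints[feature] = (direction, reference[feature])

# A scripted user who first critiques price and then accepts the top item:
actions = iter([lambda items: ("critique", items[0], "price", "<"),
                lambda items: ("accept", items[0])])
laptops = [{"price": 1500, "cpu": 1.8}, {"price": 1200, "cpu": 1.6}]
print(critiquing_session(laptops, lambda items: next(actions)(items), k=2))
# -> {'price': 1200, 'cpu': 1.6}
```

The two design elements studied in this paper map directly onto this sketch: critiquing coverage is the parameter k, and the critiquing aid determines how get_user_action produces its critique, whether by picking a system suggestion, by composing one from scratch, or both.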