Decision Support Systems 48(2010)470-479 Contents lists available at science Direct Decision Support Systems ELSEVIER journalhomepagewww.elsevier.com/locate/dss Maximizing customer satisfaction through an online recommendation system: a novel associative classification model Yuanchun Jiang a b, * Jennifer Shang Yezheng Liua. c School of management, Hefei University of Technology. Hefei, Anhui 230009, China b The Joseph M. Katz Graduate School of Business, University of Pittsburgh, Pittsburgh,PA 15260,USA Key Laboratory of process Optimization and Intelligent Decision Making, Ministry of Education, Hefei, Anhui 230009, China ARTICLE O A BSTRACT Available online 17 June 2009 Offering online personalized recommendation services helps improve customer satisfaction. Conventionally a recommendation system is considered as a success if clients purchase the recommended products. However, the act of purchasing itself does not guarantee satisfaction and a truly successful recommendation system hould be one that maximizes the customer's after-use gratification. By employing an innovative associative classification method, we are able to predict a customers ultimate pleasure. Based on customers character- ion Rating tics, a product will be recommended to the potential buyer if our model predicts his/her satisfaction level will be high. The feasibility of the proposed recommendation system is validated through laptop Inspiron 1525. o 2009 Elsevier B V. All rights reserved. 1 Introduction recommendation system as successful if customers end up purchasing the suggested product(s). However, buying a product does not neces- important factors that impact a customers product selection and satis- scenario bel e client is pleased wi Personalization of product information has become one of the most sarily imply the client is pleased with the product. Let,'s consider a faction in todays competitive and challenging market. Personalized James is in need of a laptop computer. He visits online stores to look service requires firms to understand customers and offer goods or services for information and compare prices and performance of various lap- that meet their needs. Successful firms are those that provide the right tops. Between the two laptop series, Inspiron 1525 and Aspire 5735, products to the right customers at the right time and for the right price. James is uncertain which one would best fit his needs. He decides to As a type of information technology aimed to support personalized turn to the recommendation system for help. After gaining knowledge service, recommendation systems are widely used by e-commerce of James's needs and personal profile, the system recommends the actitioners and have become an important research topic in infor- Inspiron 1525. Once James follows the advice and makes his purchase, mation sciences and decision support systems [ 25]. Recommendation the recommendation system deems that it did a great job because systems are decision aids that analyze customers prioronline behavior James bought the laptop it recommended. However, after I weeks use and present information on products to match customers preferences. of the laptop, James writes a review as follows: ".a good product, but Through analyzing the patrons purchase history or communicating with not the one I really want, "It turns out James is not content with the them, recommendation systems employ quantitative and qualitative recommendation. This exemplifies the case that a customer may have methods to discover the products that best suit the customer. Most of purchased the recommended product(s), but the recommendation the current recommendation systems recommend products that have system was not successful in pleasing the customer-its ultimate goal. a high probability of being purchased [3]. They employ content-based It is therefore clear that a customers acceptance of a recommendation filtering(CBF)[41 ] collaborative filtering (CF)[18. and other data min- is not equivalent to its success. A recommendation system must endure ing techniques, for example, decision tree [12, association rule 38. the test of time. Only when customers claim that the products are what and semantic approach[25. Other literature focuses on the influence they like after their practical usage can one claim that the system has of recommendation systems on customer's purchase behavior 3.32]. made effective recommendations. This requires not only matching They argue that the recommendation decision should be based not on customers'needs, but also satisfying customers'wants. In other words purchase probability, but rather on the sensitivity of purchase proba- the recommendation system should only recommend a product if its bility due to the recommendation action. Common wisdom regard satisfaction rating is predicted to be high. How can a customers satisfaction of a specific product be measured Corresponding author. School of Management, Hefei University of Technology, and attained? The rapid development of e-commerce affords us an opportunity to predict customers' reactions after they use a product. E-mailaddress:yuanchunjiang@gmail.com(YJiang Many online stores, such as Amazon. com and DelLcom encourage 0167-9236/S- see front matter o 2009 Elsevier B V. All rights reserved oi:10.1016/ds200906.006
Maximizing customer satisfaction through an online recommendation system: A novel associative classification model Yuanchun Jiang a,b, ⁎, Jennifer Shang b , Yezheng Liu a,c a School of Management, Hefei University of Technology, Hefei, Anhui 230009, China b The Joseph M. Katz Graduate School of Business, University of Pittsburgh, Pittsburgh, PA 15260, USA c Key Laboratory of Process Optimization and Intelligent Decision Making, Ministry of Education, Hefei, Anhui 230009, China article info abstract Available online 17 June 2009 Keywords: Online recommendation Customer satisfaction Associative classification Rating classification Offering online personalized recommendation services helps improve customer satisfaction. Conventionally, a recommendation system is considered as a success if clients purchase the recommended products. However, the act of purchasing itself does not guarantee satisfaction and a truly successful recommendation system should be one that maximizes the customer's after-use gratification. By employing an innovative associative classification method, we are able to predict a customer's ultimate pleasure. Based on customer's characteristics, a product will be recommended to the potential buyer if our model predicts his/her satisfaction level will be high. The feasibility of the proposed recommendation system is validated through laptop Inspiron 1525. © 2009 Elsevier B.V. All rights reserved. 1. Introduction Personalization of product information has become one of the most important factors that impact a customer's product selection and satisfaction in today's competitive and challenging market. Personalized service requires firms to understand customers and offer goods or services that meet their needs. Successful firms are those that provide the right products to the right customers at the right time and for the right price. As a type of information technology aimed to support personalized service, recommendation systems are widely used by e-commerce practitioners and have become an important research topic in information sciences and decision support systems [25]. Recommendation systems are decision aids that analyze customer's prior online behavior and present information on products to match customer's preferences. Through analyzing the patron's purchase history or communicating with them, recommendation systems employ quantitative and qualitative methods to discover the products that best suit the customer. Most of the current recommendation systems recommend products that have a high probability of being purchased [3]. They employ content-based filtering (CBF) [41], collaborative filtering (CF) [18], and other datamining techniques, for example, decision tree [12], association rule [38], and semantic approach [25]. Other literature focuses on the influence of recommendation systems on customer's purchase behavior [3,32]. They argue that the recommendation decision should be based not on purchase probability, but rather on the sensitivity of purchase probability due to the recommendation action. Common wisdom regards a recommendation system as successful if customers end up purchasing the suggested product(s). However, buying a product does not necessarily imply the client is pleased with the product. Let's consider a scenario below. James is in need of a laptop computer. He visits online stores to look for information and compare prices and performance of various laptops. Between the two laptop series, Inspiron 1525 and Aspire 5735, James is uncertain which one would best fit his needs. He decides to turn to the recommendation system for help. After gaining knowledge of James's needs and personal profile, the system recommends the Inspiron 1525. Once James follows the advice and makes his purchase, the recommendation system deems that it did a great job because James bought the laptop it recommended. However, after 1week's use of the laptop, James writes a review as follows: “…a good product, but not the one I really want.” It turns out James is not content with the recommendation. This exemplifies the case that a customer may have purchased the recommended product(s), but the recommendation system was not successful in pleasing the customer—its ultimate goal. It is therefore clear that a customer's acceptance of a recommendation is not equivalent to its success. A recommendation system must endure the test of time. Only when customers claim that the products are what they like after their practical usage can one claim that the system has made effective recommendations. This requires not only matching customers' needs, but also satisfying customers' wants. In other words, the recommendation system should only recommend a product if its satisfaction rating is predicted to be high. How can a customer's satisfaction of a specific product be measured and attained? The rapid development of e-commerce affords us an opportunity to predict customers' reactions after they use a product. Many online stores, such as Amazon.com and Dell.com encourage Decision Support Systems 48 (2010) 470–479 ⁎ Corresponding author. School of Management, Hefei University of Technology, Hefei, Anhui 230009, China. E-mail address: yuanchunjiang@gmail.com (Y. Jiang). 0167-9236/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.dss.2009.06.006 Contents lists available at ScienceDirect Decision Support Systems j o u r n a l h om e p a g e : www. e l s ev i e r. c om / l o c a t e / d s s
Y Jiang et al. / Decision Support Systems 48(2010)470-479 The proposed model Which laptop will please the Which laptop the customer may customer most? be interested in buying Note Good Fig 1. Differences between the existing recommendation systems and the proposed model customers to write online reviews on their websites; information from 2. 1. Recommendation systems these reviews is then often used to support a firms product strategy and customer relationship management [ 11, 13 In the online reviews. Since the development of the first recommendation system by customers can discuss their needs, preferences, personal profile, and Goldberg and colleagues [17 various recommendation systems and voice their opinions about a product as poor, average, or good. From such related technologies such as CBF and CF[18, 41] have been reported. need-rating data, it is easy to obtain personalized information and Among them, the user-based collaborative filtering (CF)[23is customers'after-usesatisfactionleveloftheproductUsingpersonalsuccessfullyadoptedbyAmazoncomandDell.com.Itfindsasimilar information and responses, the online store can more accurately predict user group for the target buyer and recommends products that have customers' true sentiments toward a specific product, and recommend a been rated by users in the reference group but not yet viewed by the more suitable product for the potential customer to enjoy. target buyer. However, the user-based CF has some limitations. One is This research proposes a rating classification model to estimate a its difficulty in measuring the similarities between users, and the potential customer's satisfaction level. It builds a rating classifier for a other is the scalability issue. As the number of customers and products product by discovering rules from the need-rating database collected increases, the computation time of algorithms grows exponentially for the product. The rules imply the Co-relationship between cus- [21. The item-based CF[16 was proposed to overcome the scalability tomers' needs, preferences, demographic profile, and their ratings for problem as it calculates item similarities in an offline basis. It assumes sifier will predict his/her response toward the recommended product related to the items that he/she has already purchased Gre similar or the product. For a new customer with specific characteristics, the clas- that a user will be more likely to purchase items that and categorize it into certain class labels, such as poor, average and good. The predicted ratings estimate the customer's satisfaction level for the product. Differences between the existing recommendation Table 1 systems and the proposed one are illustrated in Fig. 1. Summary of research methods on recommendation system and associative classification. This research proposes a novel associative classification model, which (a) Motivation and objectives of warious recommendation systems first mines multi-class classification information from need-rating data, Literature Motivation then constructs a rating classifier, and finally predicts customers' ratings (2, 10, 16, 17,20,, 23,411 Which products meet the Recommend product for products. We organize the rest of the paper as follows In Section 2 we customer's preferences best? vith high probability review the literature of recommendation systems and associative 3.32 What is the influence of Recommend produ classification models. Section 3 proposes the innovative methodology to which are receptive to ustomer's purchase behavior? the recommendation. ddress the rating classification problem. A case study used to illustrate the This paper Which products can achieve a effectiveness of the proposed model is given in Section 4. Section 3 high after-use satisfaction level? with high after-use comprises the Summary, conclusions, and future research. satisfaction leveL (b) Comparing associative classification models 2. Literature review class rules lation system and associative classification. A summary of relatin- [26,31,36.371 The literature review focuses on two perspectives: the recomm research methods are given in Table 1 and explained in detail below. and this paper√
customers to write online reviews on their websites; information from these reviews is then often used to support a firm's product strategy and customer relationship management [11,13]. In the online reviews, customers can discuss their needs, preferences, personal profile, and voice their opinions about a product as poor, average, or good. From such need-rating data, it is easy to obtain personalized information and customers' after-use satisfaction level of the product. Using personal information and responses, the online store can more accurately predict customers' true sentiments toward a specific product, and recommend a more suitable product for the potential customer to enjoy. This research proposes a rating classification model to estimate a potential customer's satisfaction level. It builds a rating classifier for a product by discovering rules from the need-rating database collected for the product. The rules imply the co-relationship between customers' needs, preferences, demographic profile, and their ratings for the product. For a new customer with specific characteristics, the classifier will predict his/her response toward the recommended product and categorize it into certain class labels, such as poor, average and good. The predicted ratings estimate the customer's satisfaction level for the product. Differences between the existing recommendation systems and the proposed one are illustrated in Fig. 1. This research proposes a novel associative classification model, which first mines multi-class classification information from need-rating data, then constructs a rating classifier, and finally predicts customers' ratings for products. We organize the rest of the paper as follows. In Section 2 we review the literature of recommendation systems and associative classification models. Section 3 proposes the innovative methodology to address the rating classification problem. A case study used toillustrate the effectiveness of the proposed model is given in Section 4. Section 3 comprises the Summary, conclusions, and future research. 2. Literature review The literature review focuses on two perspectives: the recommendation system and associative classification. A summary of relative research methods are given in Table 1 and explained in detail below. 2.1. Recommendation systems Since the development of the first recommendation system by Goldberg and colleagues [17], various recommendation systems and related technologies such as CBF and CF [18,41] have been reported. Among them, the user-based collaborative filtering (CF) [23] is successfully adopted by Amazon.com and Dell.com. It finds a similar user group for the target buyer and recommends products that have been rated by users in the reference group but not yet viewed by the target buyer. However, the user-based CF has some limitations. One is its difficulty in measuring the similarities between users, and the other is the scalability issue. As the number of customers and products increases, the computation time of algorithms grows exponentially [21]. The item-based CF [16] was proposed to overcome the scalability problem as it calculates item similarities in an offline basis. It assumes that a user will be more likely to purchase items that are similar or related to the items that he/she has already purchased. Fig. 1. Differences between the existing recommendation systems and the proposed model. Table 1 Summary of research methods on recommendation system and associative classification. (a) Motivation and objectives of various recommendation systems Literature Motivation Objective [2,10,16,17,20,21,23,41] Which products meet the customer's preferences best? Recommend products with high probability. [3,32] What is the influence of recommendation systems on customer's purchase behavior? Recommend products which are receptive to the recommendation. This paper Which products can achieve a high after-use satisfaction level? Recommend products with high after-use satisfaction level. (b) Comparing associative classification models Literature Mine multiclass rules Classify using multiple rules Provide classification reasons [26,31,36,37] √ [24] √ [27] and this paper √√ √ Y. Jiang et al. / Decision Support Systems 48 (2010) 470–479 471
Y Jiang et al. Decision Support Systems 48(2010)470-479 The content-based filtering(CBF)method applies content analysis to the classification results of distinct rules. CSMC is useful rget items. Target items are described by their attributes, such as color, multiple rules-especially conflicting ones-to multi-class classification. shape, and material. The users profile is constructed by analyzing his/her However, it has deficiencies as well. That is, even though CSMc retains ponses to questionnaires, his/her rating of products, and navigation multi-class information in the process of rule acquisition, much of it is story. The recommendation system proposes items that have high lost in rule pruning. Consequently, the evidence bodies used in therating correlations with a users profile. However, a pure CBF system also has its classification are often inaccurate and classification results suffer. shortcomings. One is that users can only receive recommendations similar Moreover, CSMC employs a rough set method to derive evidence to their earlier experiences. The other is that some items, such as music, weights. Although using evidence weights may significantly improve photographs, and multimedia, are hard to analyze [10]. Based on CF and classification accuracy. its computation is very time-consuming and CBE, new data mining techniques employing decision tree, association negatively affects the efficiency of classification. ule, regression model, and Markov chain have been introduced to In this research, we propose a new algorithm to address the rating recommend movies and books [ 2], support one-to-one online market- classification problem. Compared with CSMC, the proposed algorith Unlike those recommending products based on likelihood of pur- and it can derive attribute weights much more efficiently. Detaileng ing 21, and attract customers for the tourism industry [20 can preserve most of the useful multi-class information after pru chase, Bodapati 3 argued that the recommendation decision should our algorithm are discussed next. also examine a customer's sensitivity to such a recommendation. He built a model to measure the role of recommendation systems in 3. The proposed methodology modifying customers' purchase behavior relative to what the custom- ers would have done without such recommendation interventions. 3. 1. The solution framework Although the extant recommendation systems may recommend ac ceptable products to customers, they share a common view that the To construct an efficient and powerful rating classifier for a specific act of purchase itself equates to the customers'satisfaction, which product, we follow three phases as outlined in Fig. 2. The first phase is could be far from the truth, as evidenced by James example earlier. to mine need-rating rules, where retaining multi-class information is the main task. Since conflicts and ambiguities are often present in the 2. 2. Associative classification and the combination strategy for multi- need-rating database, the rules discovered must be able to deal with contradictory facts and uncertainties, and to arrive at multi-class in formation. The second phase calculates the weight for each rule. A Classification is an important management task. Many methods such weight measures the importance of a need-rating rule in the rating as the agent-based approach 34, decision tree 30, and data classification problem. In this phase, computational efficiency is crucial, envelopment analysis(developed by professor Cooper[14))have been due to the presence of various rules useful for classifier construction. proposed to solve the decision analysis problems in various fields As- The last phase is to develop the rating classifier, whose goal is to re- sociative classification is a relatively new classification method whose commend products that yield high customer satisfaction. In this phase, aim is to apply the Apriori algorithm 1 to mine association rules and all factors having influences on the potential customers'satisfaction onstruct associative classifiers [26]. Rule mining will find the asso- levels are considered. Since customers of the same need may voice very ations between attributes(rule preconditions)and ratings (results). In different opinions for the same product, it is important to make multi associative classification, the support degree is defined as the ratio of the class ratings available. Predicting after-use ratings along with corres- number of objects satisfying a specific rule precondition and having a ponding likelihoods provides potential customers a valuable purchase pecific rating result over the total number of objects in the database. guideline, which significantly enhances the odds of customer satisfaction. The confidence degree is similar to the support degree except that the Although traditional associative classification methods could generate number of all objects satisfying the specific rule precondition is used as reasonable classification results, they suffer from three weak points when the denominator. The discovered rules are pruned to attain a minimal classifying customers' ratings. First, they do not accommodate conflicting rule set necessary to cover training data and achieve sufficient ratings. Customers of the same needs and preferences may have very accuracy [37]. Although associative classification methods may derive differentopinions(ratings)for the same product. As discussed in Section 3, more accurate classification results than other methods, they have a traditional associative classification methods resolve the situation by few drawbacks 27. First is related to multi-class classification: The retaining only therule that has the highest confidence level As a result, not associative classification methods available today do not have enough enough multi-class information is preserved to deal with the conflict multi-class information to build multi-class classifiers because all nature. The second weakness is the prediction tactic To predict accurately conflicting rules are removed 31. For example, P1-Cl and P1- C2 are a classifier needs to consider a customers needs, preferences, and two conflicting rules having the same precondition Pi but different demographics. However, traditional prediction relies only on one optimal classifications, c and C2, with confidence degrees of 51% and 49%, rule, which is not enough to attain accurate classification. Thirdly, respectively. Traditional associative classification methods will delete traditional methods classify each case without explanations, that is, they ule P1- C2 because its confidence level is lower than that of P1-Cr. simply list the applicable rule. Such classification is somewhat arbitrary. When conflicts occur, only the rule with the highest confidence level is ambiguous, and irrational. Since customers of the same need may give retained; the competing rules having lower probabilities are all different ratings to the same product, a classification method must removed. Another flaw is that they cannot easily identify an optimal capable of predicting the probabilities of attaining different customer maximizes the measure, such as the support degree, confidence deficiency. Details are describ algorithm aims to overcome the above rule when classifying a new case [24. An optimal rule is the one that ratings. The proposed classificatio degree, or interesting degree as defined by the user. The difficulty with the traditional methods is that different measures may result in 3. 2. Mine need-rating rules different optimal rules. To overcome the above weakness, Liu and his colleagues [27 have The need-rating data is first organized into a data table I=(0, AuC proposed a combination strategy for multi-class classification(CSMC). where O is the set of objects(customers), o is the number of customers in CSMC retains most of the conflicting rules and employs multiple asso- L. A is the set of attributes, A=(Al. Ah AyI. Al is the number of ciation rules to construct classifiers for new cases. After acquiring attributes in A, each Ah. h=1, 2,A, is a factor/criterion or a customer conflicting rules and calculating their weights, the evidential reasoning characteristic. C corresponds to a set of class labels, C=(Cr. cg-.. ga. q approach proposed by Yang and colleagues[40 is employed to combine is the number of dasses in l, each cg,g=1, 2,C denotes a rating grade
The content-based filtering (CBF) method applies content analysis to target items. Target items are described by their attributes, such as color, shape, and material. The user's profile is constructed by analyzing his/her responses to questionnaires, his/her rating of products, and navigation history. The recommendation system proposes items that have high correlations with a user's profile. However, a pure CBF system also has its shortcomings. One is that users can only receive recommendations similar to their earlier experiences. The other is that some items, such as music, photographs, and multimedia, are hard to analyze [10]. Based on CF and CBF, new data mining techniques employing decision tree, association rule, regression model, and Markov chain have been introduced to recommend movies and books [2], support one-to-one online marketing [21], and attract customers for the tourism industry [20]. Unlike those recommending products based on likelihood of purchase, Bodapati [3] argued that the recommendation decision should also examine a customer's sensitivity to such a recommendation. He built a model to measure the role of recommendation systems in modifying customers' purchase behavior relative to what the customers would have done without such recommendation interventions. Although the extant recommendation systems may recommend acceptable products to customers, they share a common view that the act of purchase itself equates to the customers' satisfaction, which could be far from the truth, as evidenced by James' example earlier. 2.2. Associative classification and the combination strategy for multiclass classification Classification is an important management task. Many methods such as the agent-based approach [34], decision tree [30], and data envelopment analysis (developed by professor Cooper [14]) have been proposed to solve the decision analysis problems in various fields. Associative classification is a relatively new classification method whose aim is to apply the Apriori algorithm [1] to mine association rules and construct associative classifiers [26]. Rule mining will find the associations between attributes (rule preconditions) and ratings (results). In associative classification, the support degree is defined as the ratio of the number of objects satisfying a specific rule precondition and having a specific rating result over the total number of objects in the database. The confidence degree is similar to the support degree except that the number of all objects satisfying the specific rule precondition is used as the denominator. The discovered rules are pruned to attain a minimal rule set necessary to cover training data and achieve sufficient accuracy [37]. Although associative classification methods may derive more accurate classification results than other methods, they have a few drawbacks [27]. First is related to multi-class classification: The associative classification methods available today do not have enough multi-class information to build multi-class classifiers because all conflicting rules are removed [31]. For example, P1→c1 and P1→c2 are two conflicting rules having the same precondition P1 but different classifications, c1 and c2, with confidence degrees of 51% and 49%, respectively. Traditional associative classification methods will delete rule P1→c2 because its confidence level is lower than that of P1→c1. When conflicts occur, only the rule with the highest confidence level is retained; the competing rules having lower probabilities are all removed. Another flaw is that they cannot easily identify an optimal rule when classifying a new case [24]. An optimal rule is the one that maximizes the measure, such as the support degree, confidence degree, or interesting degree as defined by the user. The difficulty with the traditional methods is that different measures may result in different optimal rules. To overcome the above weakness, Liu and his colleagues [27] have proposed a combination strategy for multi-class classification (CSMC). CSMC retains most of the conflicting rules and employs multiple association rules to construct classifiers for new cases. After acquiring conflicting rules and calculating their weights, the evidential reasoning approach proposed by Yang and colleagues [40]is employed to combine the classification results of distinct rules. CSMC is useful since it applies multiple rules—especially conflicting ones—to multi-class classification. However, it has deficiencies as well. That is, even though CSMC retains multi-class information in the process of rule acquisition, much of it is lost in rule pruning. Consequently, the evidence bodies used in the rating classification are often inaccurate and classification results suffer. Moreover, CSMC employs a rough set method to derive evidence weights. Although using evidence weights may significantly improve classification accuracy, its computation is very time-consuming and negatively affects the efficiency of classification. In this research, we propose a new algorithm to address the rating classification problem. Compared with CSMC, the proposed algorithm can preserve most of the useful multi-class information after pruning and it can derive attribute weights much more efficiently. Details of our algorithm are discussed next. 3. The proposed methodology 3.1. The solution framework To construct an efficient and powerful rating classifier for a specific product, we follow three phases as outlined in Fig. 2. The first phase is to mine need-rating rules, where retaining multi-class information is the main task. Since conflicts and ambiguities are often present in the need-rating database, the rules discovered must be able to deal with contradictory facts and uncertainties, and to arrive at multi-class information. The second phase calculates the weight for each rule. A weight measures the importance of a need-rating rule in the rating classification problem. In this phase, computational efficiency is crucial, due to the presence of various rules useful for classifier construction. The last phase is to develop the rating classifier, whose goal is to recommend products that yield high customer satisfaction. In this phase, all factors having influences on the potential customers' satisfaction levels are considered. Since customers of the same need may voice very different opinions for the same product, it is important to make multiclass ratings available. Predicting after-use ratings along with corresponding likelihoods provides potential customers a valuable purchase guideline, which significantly enhances the odds of customer satisfaction. Although traditional associative classification methods could generate reasonable classification results, they suffer from three weak points when classifying customers' ratings. First, they do not accommodate conflicting ratings. Customers of the same needs and preferences may have very different opinions (ratings) for the same product. As discussedin Section 3, traditional associative classification methods resolve the situation by retaining only the rule that has the highest confidencelevel. As a result, not enough multi-class information is preserved to deal with the conflict nature. The second weakness is the prediction tactic. To predict accurately, a classifier needs to consider a customer's needs, preferences, and demographics. However, traditional prediction relies only on one optimal rule, which is not enough to attain accurate classification. Thirdly, traditional methods classify each case without explanations, that is, they simply list the applicable rule. Such classification is somewhat arbitrary, ambiguous, and irrational. Since customers of the same need may give different ratings to the same product, a classification method must be capable of predicting the probabilities of attaining different customer ratings. The proposed classification algorithm aims to overcome the above deficiency. Details are described next. 3.2. Mine need-rating rules The need-rating data is first organized into a data table I=(O, A∪C), whereOis the set of objects (customers), |O| is the number of customers in I. A is the set of attributes, A={A1,…, Ah,…, A|A|}, |A| is the number of attributes in A, each Ah, h=1,2,…,|A|, is a factor/criterion or a customer characteristic. C corresponds to a set of class labels, C={c1,…, cg,…, c|C|}, |C| is the number of classes in I, each cg, g=1,2,…,|C|, denotes a rating grade. 472 Y. Jiang et al. / Decision Support Systems 48 (2010) 470–479
Y Jiang et al. / Decision Support Systems 48(2010)470-479 Need- rating data composed of customers'needs, preferences, personal profile and ratings about the Mine need-rating rules and retain useful 2Organize all rules into a set of classification Imulti-class information xperts, with conflicting rules being in one expert 3 Prune redund 4 Calculate the weights of evidence bodies using Measure the importance of need-rating rules K nthe neural network approach 5 Find the necessary classification experts for the potential customer's ratings 6 Apply the integrated strategy to predict ratings of the potential customers General phases The proposed algorithm Fig. 2. The proposed rating classification framework. In data table L customers are described by customer characteristics, Ri consists of all the need-rating rules which have the same pre- preferences, and ratings for the product. For example, a middle-aged condition but different ratings. That is, it includes all the multi-class customer who prefers a laptop with average central processing unit(CPU) information associated with Pim. For convenience, we rewrite Ri as: peed, good battery life, and rates the laptop Inspiron a good R:P→E, where product can be described as follows: (Age= Midle A' cPU= Average'A'Battery =Good,)A 'Rating Good. P=Pum: E-G, 1. on/ 1 )V-vI(Gum, con/um)V-VI(GLIRAI, COn/ 1R )v(B, conte) The rule mining algorithm first scans the data table once to mine confum is the confidence degree of Pim=CLm, which is larger than the need-rating rules involving one attribute in rule preconditions. It then minimal confidence threshold, and e is the frame of discernment recursively combines the need-rating rules generated to extract rules defined in the evidence theory. The confi e is the belief degree assigned involving more attributes. The support and confidence degrees forrul to catchall(default) terms since they cannot be classified to any spe- are calculated simultaneously. Any rules with support and confidence cific ratings by rules in R. degrees larger than the threshold are saved as need-rating rules For example,P1→c;P1→c2andP1→c3 are three conflicting rules with the same precondition. Their confidence degrees are 49%, 43%, and 8% confi e=1-2 confi.m respectively. If the minimal confidence threshold is 40%, then rules P1→C1andP1→c2 would be retained, whereas p1→c3 is pruned. After a comprehensive set of rules are mined, the redundant ones In evidence theory, evidence bodies are given by experts and are deleted. The proposed prune process eliminates redundancy to consist of hypothesis-probability pairs. For example, the evidence ensure that succinct need-rating rules are derived, and all necessary body may look like: (Good, 0.6), (Average, 0.3).(poor, 0.1). In the multi-class information is preserved experts opinion, the probability of receiving a good evaluation is 60% an average evaluation is 30%, and a poor evaluation is 10%. Recall that 3.2. 1. Derivation of classification experts confim can be treated as a belief degree given by expert R to a hypo- Let r be the set of need-rating rules discovered from data table l: thesized rating Cum, based on observed attribute values in P. Therefore, Ei is regarded as the evidence body provided by classification expert R R={P1→C1,-,Pa→ca-,PR→c1 The set of need-rating rules is transformed into a set of classi fication experts, R=[RI.Rm. R.. Rr), which satisfy the following where R is the number of rules in R Precondition Pa consists of the constraints attribute-values in data table l, and result ca is the corresponding rating. For any subset R of the rule setR, Ri= r R1={B1→C1,-,Pm→Cm,P1→c1} (2) RnR =p, for any i and j, i*j. where Ril is the number of rules in R Pim-Cum is the mth rule in R. R is a classification expert in the rating classification problem if the fol- The set of evidence bodies corresponding to R is denoted as E lowing constraints are satisfied E…,E…,E1……ErJ. To this point, we have transformed all the need rating rules into T independent classification experts. However, not all (1)P1=-=Pm=-PR classification experts are necessary for the recommendation system; For any other rules(Pa→cd)∈R,P≠Pm some are redundant. In the next subsection, we develop pruning i≠-≠Cm≠-≠CR methods to remove the redundant classification experts
In data table I, customers are described by customer characteristics, preferences, and ratings for the product. For example, a middle-aged customer who prefers a laptop with average central processing unit (CPU) speed, good battery life, and rates the laptop Inspiron 1525 as a good product can be described as follows: ð 0 Age = Middle0 ∧ 0 CPU = Average0 ∧ 0 Battery = Good0 Þ ∧ 0 Rating = Good0 : The rule mining algorithm first scans the data table once to mine need-rating rules involving one attribute in rule preconditions. It then recursively combines the need-rating rules generated to extract rules involving more attributes. The support and confidence degrees for rules are calculated simultaneously. Any rules with support and confidence degrees larger than the threshold are saved as need-rating rules. For example, P1→c1, P1→c2, and P1→c3 are three conflicting rules with the same precondition. Their confidence degrees are 49%, 43%, and 8%, respectively. If the minimal confidence threshold is 40%, then rules P1→c1 and P1→c2 would be retained, whereas P1→c3 is pruned. After a comprehensive set of rules are mined, the redundant ones are deleted. The proposed prune process eliminates redundancy to ensure that succinct need-rating rules are derived, and all necessary multi-class information is preserved. 3.2.1. Derivation of classification experts Let R be the set of need-rating rules discovered from data table I: R = fP1→c1; ⋯; Pd→cd; ⋯; Pj Rj→cj Rj g where |R| is the number of rules in R. Precondition Pd consists of the attribute-values in data table I, and result cd is the corresponding rating. For any subset Ri of the rule set R, Ri = fPi;1→ci;1; ⋯; Pi;m→ci;m; ⋯; Pi; j Ri j→ci; j Ri j g where |Ri| is the number of rules in Ri, Pi,m→ci,m is the mth rule in Ri. Ri is a classification expert in the rating classification problem if the following constraints are satisfied: (1) Pi,1=⋯=Pi,m=⋯Pi,|Ri | (2) For any other rules (Pd→cd)∉Ri, Pd≠Pi,m (3) ci,1≠⋯≠ci,m≠⋯≠ci,|Ri |. Ri consists of all the need-rating rules which have the same precondition but different ratings. That is, it includes all the multi-class information associated with Pi,m. For convenience, we rewrite Ri as: Ri: Pi→Ei, where Pi = Pi;m; Ei = ðci;1; confi;1Þ∨⋯∨ðci;m; confi;mÞ∨⋯∨ðci; jRi j ; confi; j Ri j Þ∨ðΘ; confi;Θ Þ confi,m is the confidence degree of Pi,m→ci,m, which is larger than the minimal confidence threshold, and Θ is the frame of discernment defined in the evidence theory. The confi,Θ is the belief degree assigned to catchall (default) terms since they cannot be classified to any specific ratings by rules in Ri: confi;Θ = 1− ∑ jRi j m= 1 confi;m In evidence theory, evidence bodies are given by experts and consist of hypothesis–probability pairs. For example, the evidence body may look like: {(Good, 0.6), (Average, 0.3), (poor, 0.1)}. In the expert's opinion, the probability of receiving a good evaluation is 60%, an average evaluation is 30%, and a poor evaluation is 10%. Recall that confi,m can be treated as a belief degree given by expert Ri to a hypothesized rating ci,m, based on observed attribute values in Pi. Therefore, Ei is regarded as the evidence body provided by classification expert Ri. The set of need-rating rules is transformed into a set of classi- fication experts, R={R1,…,Ri,…, Rj,…, RT}, which satisfy the following constraints: (1) ∪ T i= 1 Ri = R (2) Ri∩Rj = φ, for any i and j, i≠j. The set of evidence bodies corresponding to R is denoted as E= {E1,…,Ei,…, Ej,…, ET}. To this point, we have transformed all the needrating rules into T independent classification experts. However, not all classification experts are necessary for the recommendation system; some are redundant. In the next subsection, we develop pruning methods to remove the redundant classification experts. Fig. 2. The proposed rating classification framework. Y. Jiang et al. / Decision Support Systems 48 (2010) 470–479 473
Y Jiang et al. Decision Support Systems 48(2010)470-479 3.2.2. Classification expert pruning through an improved algorithm Many methods such as support vector machine(SVM)[33] and The goal of pruning classification experts is to generate a minimal information theory 9 have been used to measure attribute weights set of classification experts which can cover all customers in data table Neural network is one of the most efficient methods for deriving the L. Before presenting the pruning algorithm, we define the order of a factor weights [19, 22]. For a rating classification problem, the number relationship on classification experts. of possible attribute sets in rule preconditions totals 2A-1, which is For two classification experts R and rr is said to be inferior to R, equivalent to the number of times traditional data mining techniques that is, Ri>R, if need to train the datasets in order to obtain the weight for each evi- dence body. This is very time-consuming and computationally pro- 1 PiCP hibitive for real-time recommendation systems. In this paper, we For V(Cin. conf,n)EE. =(Cim, confim)EE, CLm is the same employ a NN-based method, which requires training only once with need-rating data to derive the evidence weights. For need-rating data L, we first find a trained neural network N The relationship R>R implies that classification experts R and R with the entire set of attributes A, A=(A1. AhAva), as its input provide the same classification outcome, but R gives the classification Then the output ratings for lol customers in data table I, denoted as information based on a simpler precondition. The classification expert NC=(ncI.,nck. nq o), nck EC, k=1,2,.Ol can be calculated. Sup- Ry is more restrictive but not more powerful relative to R. Hence, R, is pose the real ratings of the customers are RC=[rcr.,rck,, qo) redundant and should be removed to improve the quality of the clas- where rck eC is the rating of the kth customer in O. The network's sifier Fig 3 shows the pruning algorithm used to improve the quality accuracy, denoted do can be calculated as follow of classification fication expert has more complex preconditions but does not provide k=1 added information, then delete it. A simpler precondition is favorable since it provides a more powerful classification capability. The next The weights of evidence bodies can then be computed according to pruning step uses the data covering method [26]. Classification expert the classification experts in CES. Suppose the precondition of classi- Rs is necessary for customer c if Rs is a matching classification expert fication expert R consists of B, B=(A A. 2..AijB the accuracy d forc and there is not another matching classification expert whose pre- without the attributes in Bi are computed by simply setting the con- condition includes that of Rs. Furthermore, we remove the classifica- nection weights from the input attributes ( A, A, 2.A B of the trained tion experts which do not meet the support or confidence threshold. network to zero. Finally, the difference between di and do is found to measure the influence of attribute set (A, A, ,. ALIB) to the classi 3.3. Measure the importance of evidence bodies fication. The greater the influence of the attribute set is to the classi cation, the bigger is its weight. After classification experts are generated, the next step is to deter- Key steps of the nn algorithm are outlined below. mine the weights of the evidence bodies given by classification experts In the traditional evidence theory, evidence weights are consistent (1)Let A=(Al., Ah,..., Av be the set of all input attributes and be the re with human experts'knowledge and experience. If a human expert is(2)Train the neural network N to maximize the network accuracy authoritative and familiar with the decision problem, the evidence body given by her/ him will be reliable. In this research, the evidence do with A as input such that it achieves a network as accurately bodies are derived from different attribute sets. Thus we use the weights of attributes to infer the importance of evidence bodies (3)For i= 1, 2,.,ICESI. let N be a network whose weights are as (a) For all the inputs except (A, A,2,.A, JB,), assign the weights of Ni equal to the weights of N put: The classification expert set(R)and data table(D) )Set the weights from input (AL -A,,Ai B to zero Output: Final classification expert set(CES) Compute the output of n N, denoted as NC=(ncLIs, nuke, ncuol, and the accuracy of Ni, denot The pruning algorithm (4)Compute the influence of B, to the network accuracy w =do-dA. I. For each classification expert R in CES (5)If i>CESL, go to(6), otherwise, set i=i+ 1 and go to (3) if彐R∈CES,R (6) The derived W.,Win., WcEs are the weights of the evidence CES= CES\R, bodies, where wi is the weight of evidence body E. end if The proposed NN method it possible to derive the End for quickly and efficiently without falling into the trap of local mi It therefore attains more accurate evidence weights and impro 2. For each classification expert R, in CES efficiency of the rating classification model For each customer c in data table I 3. 4. Construct rating classifier for potential customers If R, is a necessary for cO,← count+ End if Given the classification experts, evidence bodies, and evidence End fo g system is ready to predict the ratings of End for customers. For a potential customer c, the system identifies the neces- Delete the classification experts in CES which sary classification experts first. For example, R and R are two matching dissatisfy the following constraint: classification experts whose preconditions are 'Age= Middle'A count.> threshold CPU=AverageA 'Battery=Good for R, and Age=Middle'ACPU= Average'for R. R could be extracted from customers whose precon Fig 3. The classification expert pruning algorithm. ditions satisfy Age= Middle’∧tPU= Average'∧ Battery= Good'or
3.2.2. Classification expert pruning through an improved algorithm The goal of pruning classification experts is to generate a minimal set of classification experts which can cover all customers in data table I. Before presenting the pruning algorithm, we define the order of a relationship on classification experts. For two classification experts Ri and Rj, Rj is said to be inferior to Ri, that is, Ri≻Rj, if 1. Pi⊂Pj 2. For ∀ (cj,n, confj,n)∈Ej, ∃ (ci,m, confi,m)∈Ei, ci,m is the same as cj,n and confj,n= confi,m. The relationship Ri≻Rj implies that classification experts Ri and Rj provide the same classification outcome, but Ri gives the classification information based on a simpler precondition. The classification expert Rj is more restrictive but not more powerful relative to Ri. Hence, Rj is redundant and should be removed to improve the quality of the classifier. Fig. 3 shows the pruning algorithm used to improve the quality of classification experts. The first step prunes inferior classification experts. A classification expert Rj should be pruned if ∃Ri∈CES, Ri≻Rj. That is, if the classi- fication expert has more complex preconditions but does not provide added information, then delete it. A simpler precondition is favorable since it provides a more powerful classification capability. The next pruning step uses the data covering method [26]. Classification expert Rs is necessary for customer c if Rs is a matching classification expert for c and there is not another matching classification expert whose precondition includes that of Rs. Furthermore, we remove the classification experts which do not meet the support or confidence threshold. 3.3. Measure the importance of evidence bodies After classification experts are generated, the next step is to determine the weights of the evidence bodies given by classification experts. In the traditional evidence theory, evidence weights are consistent with human experts' knowledge and experience. If a human expert is authoritative and familiar with the decision problem, the evidence body given by her/him will be reliable. In this research, the evidence bodies are derived from different attribute sets. Thus we use the weights of attributes to infer the importance of evidence bodies. Many methods such as support vector machine (SVM) [33] and information theory [9] have been used to measure attribute weights. Neural network is one of the most efficient methods for deriving the factor weights [19,22]. For a rating classification problem, the number of possible attribute sets in rule preconditions totals 2|A| −1, which is equivalent to the number of times traditional data mining techniques need to train the datasets in order to obtain the weight for each evidence body. This is very time-consuming and computationally prohibitive for real-time recommendation systems. In this paper, we employ a NN-based method, which requires training only once with need-rating data to derive the evidence weights. For need-rating data I, we first find a trained neural network N with the entire set of attributes A, A= {A1,…, Ah,…, A|A|}, as its input. Then the output ratings for |O| customers in data table I, denoted as NC= {nc1,…, nck,…, nc|O|}, nck∈C, k= 1,2,…,|O|, can be calculated. Suppose the real ratings of the customers are RC= {rc1,…, rck,…, rc|O|}, where rck∈C is the rating of the kth customer in O. The network's accuracy, denoted d0 can be calculated as follows: d0 = ∑ jOj k= 1 vk; vk = 1 If nck is the same as rck 0 Otherwise : ( The weights of evidence bodies can then be computed according to the classification experts in CES. Suppose the precondition of classi- fication expert Ri consists of Bi, Bi= {Ai,1,Ai,2,…Ai,|Bi |}, the accuracy di without the attributes in Bi are computed by simply setting the connection weights from the input attributes {Ai,1,Ai,2…Ai,|Bi |}of the trained network to zero. Finally, the difference between di and d0 is found to measure the influence of attribute set {Ai,1,Ai,2,…Ai,|Bi |} to the classi- fication. The greater the influence of the attribute set is to the classi- fication, the bigger is its weight. Key steps of the NN algorithm are outlined below. (1) Let A= {A1,…, Ah,…, A|A|} be the set of all input attributes and RC be the real ratings. (2) Train the neural network N to maximize the network accuracy d0 with A as input such that it achieves a network as accurately as possible. (3) For i= 1, 2,…, |CES|, let Ni be a network whose weights are as follows: (a) For all the inputs except {Ai,1,Ai,2,…Ai,|Bi |}, assign the weights of Ni equal to the weights of N. (b) Set the weights from input {Ai,1,Ai,2,…Ai,|Bi |}to zero. Compute the output of network Ni, denoted as NCi= {nci,1,…, nci,k,…, nci,|O|}, and the accuracy of Ni, denoted as di. (4) Compute the influence of Bi to the network accuracy: wi=d0 − di. (5) If i≥|CES|, go to (6), otherwise, set i=i+ 1 and go to (3). (6) The derived {w1,…, wi,…, w|CES|} are the weights of the evidence bodies, where wi is the weight of evidence body Ei. The proposed NN method makes it possible to derive the weights quickly and efficiently without falling into the trap of local minimum. It therefore attains more accurate evidence weights and improves the efficiency of the rating classification model. 3.4. Construct rating classifier for potential customers Given the classification experts, evidence bodies, and evidence weights, the recommendation system is ready to predict the ratings of customers. For a potential customer c, the system identifies the necessary classification experts first. For example, Ri and Rj are two matching classification experts whose preconditions are ‘Age=Middle’ ∧ ‘CPU=Average’ ∧ ‘Battery=Good’ for Ri, and ‘Age=Middle’ ∧ ‘CPU= Average’ for Rj. Rj could be extracted from customers whose preconFig. 3. The classification expert pruning algorithm. ditions satisfy ‘Age=Middle’ ∧ ‘CPU=Average’ ∧ ‘Battery=Good’ or 474 Y. Jiang et al. / Decision Support Systems 48 (2010) 470–479