Expert Systems with Applications 38 (2011) 15344–15355
Contents lists available at ScienceDirect
journal homepage: www.elsevier.com/locate/eswa

An implementation and evaluation of recommender systems for traveling abroad

Dong-Her Shih a, David C. Yen b,⇑, Ho-Cheng Lin a, Ming-Hung Shih c

a Department of Information Management, National Yunlin University of Science and Technology, 123, Section 3, University Road, Douliu, Yunlin, Taiwan, ROC
b Department of DSC & MIS, Farmer School of Business, Miami University, Oxford, OH 45056, USA
c Department of Electrical and Computer Engineering, NC State University, Raleigh, NC 27695, USA

⇑ Corresponding author. Tel.: +1 513 529 4827; fax: +1 513 529 9689.
E-mail addresses: shihdh@yuntech.edu.tw (D.-H. Shih), yendc@muohio.edu (D.C. Yen), g9423719@yuntech.edu.tw (H.-C. Lin), dannysmh@gmail.com (M.-H. Shih).

0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2011.06.030

Keywords: Content-based filtering; Collaborative filtering; Hybrid filtering; Taxonomy; Recommender system; Bayes' theorem

Abstract

The improvement of information technology means that storage is no longer a problem. In addition, the birth of the Internet has made information transfer faster than ever, which makes life more convenient. However, the ever-growing volume of information results in a new problem: information overload. Today, many more people are traveling abroad since they no longer have to work on weekends, and traveling abroad has become a trend. There are more than a hundred countries in the world worth visiting, and so much information is available that a traveler's decision becomes extremely difficult. In our research, we implement the three most common recommender system techniques in order to recommend to customers which countries are the best travel destinations for them, saving travelers a lot of time when deciding where to go. From our experiment and evaluation, we find that a hybrid recommender system is the better recommendation technique on our abroad-travel database, and that it overcomes the shortcomings of the content-based filtering and collaborative filtering approaches.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Due to an explosion of e-commerce in recent years, the rapid spread of the Internet has made our world move faster than ever. For firms this makes it easy to develop a one-to-one marketing business style. One of the important issues is that companies should establish relationships with their customers and provide appropriate information and products that match customers' interests. The need for new marketing strategies in e-commerce, such as one-to-one marketing, Web personalization, and customer relationship management, has been stressed both in research and in practice (Mobasher, Cooley, & Srivastava, 2000; Sarwar, Karypis, Konstan, & Riedl, 2000).

It is important to interact with customers and provide them with personalized service and communication. Such customer interactions can transform customer information into quality services or products (Weng & Liu, 2004). For customer relationship management, one-to-one marketing is one of the most effective approaches to enhance customer satisfaction, loyalty, and reputation.

Because of the rapid spread of the Internet, information overload has become a serious problem. One way to overcome this problem is to develop an intelligent recommender system that provides personalized information services (Schafer, Konstan, & Riedl, 2001): it retrieves the information the consumer desires and helps the consumer determine which product to buy. A recommender system is an information filtering technique that applies data analysis to the problem of helping customers find the products they would like to purchase by producing a predicted likeliness score or a list of recommended products for a given customer (Sarwar et al., 1998). Recommender systems have been used in many Websites to recommend various items including movies, music, news, articles, books, software, computers, etc. (see Fig. 1). There are three approaches to building recommender systems: content-based filtering (CBF), collaborative filtering (CF), and hybrid filtering.

One advantage of a personalized recommender system is that consumers can immediately access the information they are interested in and save the time they would otherwise spend reading through excess information. Enterprises, in turn, can collect customers' buying behaviors and then develop appropriate marketing strategies to attract different customers and efficiently deliver the information they are interested in. Customer satisfaction and loyalty will thus be increased, and the increase in customers' visiting frequency can further create more transaction opportunities and benefit Internet enterprises.

Many more people are traveling abroad since they no longer have to work during the weekends, which has led to a rapid increase in the traveling population. The importance of leisure time is increasing, and there is a tendency toward traveling
abroad. According to the Tourism Bureau of Taiwan, 8.2 million people are traveling abroad, and the number has been growing, as shown in Fig. 2 (Source: taiwan.net.tw).

Fig. 1. www.amazon.com.

Fig. 2. Statistics of nationals traveling abroad between 1992 and 2009.

In this paper, we have implemented three approaches for building recommender systems (content-based recommending, collaborative filtering, and hybrid filtering) to recommend countries to travel to. In our experiment we use real data to evaluate these three approaches and determine which one is better.

This paper has three primary research contributions:

1. Develop a recommender technique for on-line traveling e-business.
2. Present a hybrid method, a collaborative filtering method, and a content-based method, and discuss their advantages and disadvantages.
3. Evaluate the effect of different variables in these three methods.

The remainder of the paper is organized as follows. In Section 2, related work is reviewed, including personalization and recommender systems. The elementary theoretical background is provided in Section 3, followed by Section 4 explaining the experiment and results. Finally, the conclusion is given in Section 5.

2. Related work

Recommender systems apply data analysis techniques to the problem of helping users find the items they would like to purchase at e-commerce sites by producing a predicted likeliness score or a list of top-N recommended items for a given user. A recommendation system can provide personalized information services in different ways, depending on whether the system has been recording and analyzing a customer's previous preferences. Hence, there are three general types of recommender systems: the content-based approach, the collaborative filtering approach, and the hybrid filtering approach. Among them, collaborative filtering is the most popular personalized recommendation method and is widely used in recommender systems.

2.1. Personalization

The term 'personalization' is often used in the context of recommender systems that selectively promote products to end-users based on the analysis of earlier interactions (Schafer, Konstan, & Riedl, 1999). Personalization means that a Website can cater to a customer's unique and particular needs. Mobasher et al. (2000), Mobasher, Dai, and Luo (2002), and Mobasher, Dai, Luo, and Nakagawa (2001) defined Web personalization as an act of response according to the individual user's interests and hobbies in Internet usage. Through personalization, businesses can predict a customer's behavior from past purchasing records and demographic data. Accordingly, companies can develop more appropriate marketing strategies to fit each customer by providing suitable information and products/services. Customer satisfaction and loyalty can thus be enhanced, and the increase in each customer's visiting frequency can further create more transaction opportunities and benefit Internet businesses (Lee, Liu, & Lu, 2002).

2.2. Recommender systems

A recommender system is defined as a system that recommends an appropriate product or service to certain customers according to their needs. Today, more and more researchers are studying recommender systems. The most important factor in a recommender system is how it analyzes customers' behavior, so that the system can recommend products based on an accurate estimation approach (Sarwar et al., 2000, 2001). The key similarity measures used in recommender systems include cosine similarity, Pearson correlation, the NB classifier, and Euclidean distance.
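As an illustration of three of the similarity measures named above (cosine similarity, Pearson correlation, and Euclidean distance), the following Python sketch computes each one for two equal-length rating vectors. It is not taken from the paper: the function names are assumptions chosen for illustration, and returning 0.0 for a zero-norm or zero-variance vector is a convention adopted here rather than part of the definitions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention: undefined cosine is reported as 0
    return dot / (norm_u * norm_v)


def pearson_correlation(u, v):
    """Pearson correlation coefficient of two equal-length rating vectors."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0  # assumed convention: constant vectors carry no signal
    return cov / math.sqrt(var_u * var_v)


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length rating vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Cosine similarity and Pearson correlation grow with agreement, whereas Euclidean distance shrinks, so a system that ranks neighbors by distance has to sort in the opposite direction.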
Recommender systems are often used in e-commerce: Websites suggest products or services to their customers and provide them with information that fits their needs. A growing number of e-commerce businesses are adopting recommender system technologies in their Websites. The three most famous such Websites are Amazon.com, eBay, and google.com.

Most recommendation techniques fall into two categories, namely content-based filtering and collaborative filtering (special issue on information filtering). Recently, hybrid methods have become a significant recommendation technique. Therefore, three major approaches are used for processing input data and formulating the prediction: collaborative filtering (CF), content-based filtering (CBF), and the hybrid filtering approach.

2.2.1. Content-based filtering

Content-based filtering makes predictions by analyzing a user's previous preferences or interests, which are the obvious indicators of the user's future behavior. CBF requires that items are described by features, and is typically applied to text-based documents or in domains with structured data (Khribi, Jemni, & Nasraoui, 2009; Pazzani, 1999). The relevance of a given content item to the user's interest profile is then measured by the similarity of the item to that profile. Finally, items that have a high degree of similarity to the user's interest profile are recommended to the user. For example, content-based filtering has been utilized in book recommendation tasks (Mooney & Roy, 2000), using features such as title, author, or theme. In such cases, the user's previous preferences on the respective features are used to filter the available books and recommend the most relevant ones to the user. Content-based filtering is typically applied to recommend products that have analyzable content or descriptions, such as books (Mooney & Roy, 2000).

A customer's personal information is first collected, and then the system infers the customer's preferences by analyzing and modeling the available personal information. Once the consumer's personal information is obtained, the recommender system can construct a computational model to predict the user's preference for other items of the same application domain. In fact, the work of recommendation can be regarded as classification: using already known information to build a model that predicts unknown events (Lee et al., 2002).

2.2.2. Collaborative filtering

Collaborative filtering is a method for calculating the expected user preference for a product, using evaluations by, or the preferences of, other users who have experienced the product (Billsus & Pazzani, 1998; Goldberg, Nichols, Oki, & Terry, 1992; Konstan et al., 1997). CF is designed for the less frequently purchased products. It is currently widely applied to various products such as music or movies (Billsus & Pazzani, 1998; Goldberg et al., 1992; Konstan et al., 1997). The basic input data consist of the preference matrix between users and products; besides explicit user preferences, a purchasing intention or an implicit preference, such as an inquiry or a visit, may be used as input. Similarity among users is calculated by the Pearson correlation coefficient or the cosine measure (Konstan et al., 1997; Mild & Natter, 2002; Sarwar et al., 2000). Based on the similarity calculation and similar features, we can find the neighbors of a particular user. We can then calculate a user's preference for a product based on his or her average preference for other products and his or her neighbors' preferences for the product (Khribi et al., 2009; Konstan et al., 1997; Mild & Natter, 2002; Sarwar et al., 2000).
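The neighborhood-based prediction described above can be sketched as follows. The text does not give a formula at this point, so the sketch assumes the mean-centered weighted average commonly used in neighborhood-based CF (as in the GroupLens work of Resnick et al., 1994), with Pearson correlation over co-rated items as the similarity. The `ratings` dictionary, the destination names, and the function names are hypothetical and serve only as an illustration.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users rated (dicts: item -> rating)."""
    common = [item for item in ratings_a if item in ratings_b]
    if len(common) < 2:
        return 0.0  # assumed convention: too little overlap means no similarity
    mean_a = mean([ratings_a[i] for i in common])
    mean_b = mean([ratings_b[i] for i in common])
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den = math.sqrt(
        sum((ratings_a[i] - mean_a) ** 2 for i in common)
        * sum((ratings_b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(ratings, active, item, k=2):
    """Predict the active user's rating of `item` from the k most similar raters."""
    mean_active = mean(list(ratings[active].values()))
    # Candidate neighbors: every other user who has rated the target item.
    candidates = [
        (pearson(ratings[active], user_ratings), user)
        for user, user_ratings in ratings.items()
        if user != active and item in user_ratings
    ]
    neighbors = sorted(candidates, key=lambda pair: pair[0], reverse=True)[:k]
    norm = sum(abs(sim) for sim, _ in neighbors)
    if norm == 0:
        return mean_active  # no usable neighbor: fall back to the user's average
    # Each neighbor contributes its deviation from its own average rating.
    weighted = sum(
        sim * (ratings[user][item] - mean(list(ratings[user].values())))
        for sim, user in neighbors
    )
    return mean_active + weighted / norm


# Hypothetical preference matrix: user -> {destination: rating on a 1-5 scale}.
ratings = {
    "alice": {"Japan": 5, "France": 3, "Italy": 4},
    "bob": {"Japan": 4, "France": 2, "Italy": 3, "Spain": 4},
    "carol": {"Japan": 1, "France": 5, "Italy": 2, "Spain": 1},
}

print(predict(ratings, "alice", "Spain", k=2))
```

Centering each neighbor's rating on that neighbor's own average is what lets a negatively correlated user still contribute: a below-average rating from a user with opposite tastes pushes the prediction up.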
In collaborative filtering, the nearest neighbor algorithm requires computation that grows with both the number of customers and the number of products, and it suffers from a sparsity problem: if there are few user preferences, its recommendation performance is low (Sarwar et al., 2000).

Collaborative filtering based on the user (CF-U) (Resnick, Iacovou, Suchak, Bergstrom, & Riedl, 1994; Sarwar et al., 2000; Shardanand & Maes, 1995) is the most successful recommending technique to date, and is extensively used in many commercial recommender systems (Liu, Lai, & Lee, 2009; Shih, Chiang, & Lin, 2008). Recommender systems based on CF-U compute the top-N recommended items for an active user as follows. First, they identify the k most similar users in the database. This is often done by modeling users and items with the vector-space model, which is widely used for information retrieval (Sarwar et al., 2000). In this model each of the n users, as well as the active user, is treated as a vector in the m-dimensional item space, and the similarity of the active user to existing users is measured by computing the cosine between these vectors or their correlation.

To address the scalability concerns of CF-U algorithms and to provide better explanations of recommendations to users, collaborative filtering based on the item (CF-I) has been developed (Billsus & Pazzani, 1998; Sarwar et al., 2000). These approaches analyze the user-item matrix to identify relations between the different items, and then use these relations to compute the list of top-N recommendations.

2.2.3. Hybrid

Hybrid recommender systems combine two or more recommendation techniques to gain better performance with fewer of the drawbacks of any individual one (Liu et al., 2009). Most commonly, collaborative filtering is combined with some other technique in an attempt to avoid the ramp-up problem. Balabanović and Shoham (1997) apply a "selection agent", which chooses between content-based filtering and CF as the recommendation algorithm. Pazzani (1999) shows a hybrid approach to recommendation that uses more of the available information and consequently gives more precise recommendations. The strengths of the different approaches can be complementary.

The domains in which all these approaches have been applied are shown in Table 1.

Table 1. Classification of recommender system.

| Method | Domain | Authors |
|---|---|---|
| Content-based filtering | Products | Lawrence, Almasi, Kotlyar, Viveros, and Duri (2001) |
| | e-Commerce | Lee et al. (2002) |
| | e-Learning | Khribi et al. (2009) |
| Collaborative filtering | Movies | Resnick et al. (1994) and Kim and Yum (2005) |
| | Products | Shardanand and Maes (1995), Linden, Smith, and York (2003), Cho and Kim (2005), Choi, Kang, and Jeon (2006) and Liu et al. (2009) |
| | Music | Hayes and Cunningham (2004) |
| | Wallpaper | Kim, Lee, Cho, and Kim (2004) |
| | Software | Akinaga et al. (2005) |
| | News | Lee and Park (2007) |
| | Music | Li et al. (2007) |
| | Spam | Shih et al. (2008) |
| | e-Learning | Khribi et al. (2009) |
| Hybrid | Movies | Schein, Popescul, Ungar, and Pennock (2002) |
| | Products | Liu and Shih (2005) and Liu et al. (2009) |

3. Methodology

In this paper, we have implemented three generalized recommending techniques for recommending to customers which countries are the best travel destinations for them. Travelers can thus save a lot of time by removing hesitation and can make a quicker, more efficient decision. In every recommending
technique, we have designed a recommendation process that follows the filtering technique and is divided into two phases. Phase one is the learning phase, and phase two is the recommending test phase. The detailed recommending methodology is described as follows.

3.1. Content-based filtering

Analyzing log data according to the attributes of items and user performance, in order to provide users with recommendation results, is usually called content-based filtering (Li, Smith, Bergman, & Castelli, 1998). As we know, content-based filtering is the earliest recommendation method. Unfortunately, this method can only recommend items that are related to history data. Hence, we designed a recommendation process that follows content-based filtering and divided it into two phases. Phase one is the learning phase and phase two is the recommending test phase. The detailed structure is shown in Fig. 3.

3.1.1. Phase I

First, we pre-process the raw data. The data are divided into two parts: learning data and recommending test data. According to the differences between the traveling locations, we adopt a taxonomy that classifies the traveling locations into five continents (America, Oceania, Europe, Asia, and Africa). The next step is analyzing the relation between the traveling locations. In addition, we adopted the C5.0 decision tree algorithm to classify the learning data based on customer performances. The basic input data consist of gender, age, constellations and selling place, and the output data consist of the traveling locations. Finally, this generates a decision model.

In most Web retailers, a product taxonomy is available. A product taxonomy is practically represented as a tree that classifies a set of products at a low level into a more general product at a higher level. The leaves of the tree denote the product instances, and non-leaf nodes denote product classes obtained by combining several nodes at a lower level into one parent node.
The root node, labeled "All", denotes the most general product class. Fig. 4 shows an example of a product taxonomy for a fashion Web retailer.

Applications of product taxonomy in data mining have been emphasized by many researchers. Therefore, we propose a traveling location hierarchy for content-based filtering (see Fig. 5). A decision tree is a tree in which each non-leaf node denotes a test on an attribute of cases, each branch corresponds to an outcome of the test, and each leaf node denotes a class prediction. The quality of a decision tree depends on both the classification accuracy and the size of the tree. Well-known decision tree induction algorithms include CHAID (Kass, 1980), CART (Breiman et al., 1984), C4.5 (Quinlan, 1993) and QUEST (Loh & Shih, 1997). Applications of decision tree based classification include target marketing, churn prediction, medical diagnosis and so on. A commercial version of C5.0 in the data mining package Clementine 7.0 is used in our study.

3.1.2. Phase II

We use the recommending test data as input and the performance of customers as output. The data are then processed by the decision model. Through the decision tree algorithm, the system generates an output for each record: the continents that fit the user. Combining this output with marketing policies yields the recommendation result. Here we apply the policy of recommending the top 2 traveling locations in every continent.

3.2. Collaborative filtering

Cho and Kim (2005) state that collaborative filtering is one of the most successful recommending methods. The method also fits various data sources such as movies, Websites, products, software, etc. For collaborative filtering, we designed a recommendation process and again divided it into two phases. Phase one is the learning phase and phase two is the recommending test phase. The detailed structure is shown in Fig. 6.
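Returning briefly to the Phase II policy of Section 3.1.2 (predict a continent, then recommend the top 2 traveling locations in it), the following Python snippet is a minimal sketch of that final step. The ranking criterion (frequency in the learning data), the `history` sample, the `CONTINENT_OF` subset and the function name `top_locations` are assumptions made for illustration; the text does not state how the top locations are chosen, and the continent would come from the C5.0 decision model rather than being passed in by hand.

```python
from collections import Counter

# Location -> continent for a small subset of the traveling location taxonomy.
CONTINENT_OF = {
    "China": "Asia",
    "Japan": "Asia",
    "Thailand": "Asia",
    "Canada": "America",
    "North Europe": "Europe",
    "East Europe": "Europe",
}


def top_locations(history, continent, n=2):
    """Return the n most frequent locations of one continent in the learning data.

    Ranking by visit frequency is an assumption of this sketch.
    """
    counts = Counter(loc for loc in history if CONTINENT_OF.get(loc) == continent)
    return [loc for loc, _ in counts.most_common(n)]


# Hypothetical learning data: one destination per past trip.
history = ["Japan", "China", "Japan", "Thailand", "China", "Japan",
           "Canada", "North Europe"]

# Suppose the decision model predicted "Asia" for a new customer.
recommendation = top_locations(history, "Asia")
print(recommendation)
```

Keeping the continent prediction and the top-2 policy separate mirrors the description above, where the decision model output is only turned into a recommendation once a marketing policy is applied.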
Fig. 3. Our proposed structure of content-based filtering.
Fig. 4. An example of product taxonomy.

3.2.1. Phase I

At first, the data are divided into two parts: learning data and recommending test data. We adopt the k-means algorithm to cluster the preprocessed data according to the attributes gender, age, constellations, selling place and locus of going abroad (as shown in Fig. 10). The basic input data consist of gender, age, constellations, selling place and locus of going abroad, and the output data are the traveling locations (Fig. 7).
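The clustering step can be illustrated with a plain k-means loop. The sketch below is not the authors' implementation: the numeric encoding of the five attributes, the sample customers, the choice of k = 2 and the deterministic initialisation from the first k records are all hypothetical, and treating nominal codes such as constellations as numbers is a simplification.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means on equal-length numeric tuples.

    Centroids start at the first k points so the run is deterministic;
    this initialisation is an assumption of the sketch.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers encoded as
# (gender, age, constellation, selling place, first destination code).
customers = [
    (0, 22, 3, 1, 13),
    (1, 58, 7, 2, 5),
    (1, 25, 4, 1, 8),
    (0, 61, 8, 2, 3),
    (0, 24, 2, 1, 13),
    (1, 60, 9, 2, 5),
]

labels, centroids = kmeans(customers, k=2)
print(labels)
```

In this toy data the age column dominates the Euclidean distance, which is one reason a real pipeline would scale or weight the attributes before clustering.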
According to the differences between loci, we replace each traveling location with a number, as shown in Table 2. For example, Hong Kong and Macao are coded as 1. Therefore, a locus of China → Vietnam → Japan is encoded as (13, 19, 8).

Next, we calculate the support value of the attributes as shown in Table 3 and sort all the values. Then, by looking up the ROC (Edwards & Barron, 1994) weight table, we obtain a weight for every attribute. The order is gender, age, constellations, selling place, locus of going abroad.

Barron and Barrett's development of a formally justifiable solution to the task of turning rankings of weights into weights, and even more their demonstration of the quality of the result, motivated Edwards and Barron (1994) to define SMARTER. They call their weights Rank Order Centroid, or ROC, weights. Following Edwards and Barron (1994), the notation here is identical to Barron and Barrett's, except that the number of attributes is called K rather than n.

The key ideas of the Barron–Barrett derivation are quite simple. If nothing were known about the weights except their sum, set at 1 by convention, then the set of possible non-negative weight vectors would be any that have that sum. If you had no prior reason to prefer one weight vector to another, it would be natural (and error-minimizing) to use equal weights. The point describing equal weights in the hyper-surface (simplex) of all possible weights is its centroid.

Knowing the rank order of the weights adds to the argument of the preceding paragraph by changing the geometric description of the set of acceptable weights, that is, the simplex.
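As a hedged illustration of how a rank order is turned into weights, the sketch below assumes the standard rank order centroid formula, w_i = (1/K) * sum over j = i..K of 1/j, where i is the rank of the attribute and K the number of attributes. The function name `roc_weights` is hypothetical; the attribute order is the one stated above, and the actual weights used in the study come from the ROC weight table.

```python
from fractions import Fraction


def roc_weights(k):
    """Rank order centroid weights for k ranked attributes (rank 1 first).

    Uses w_i = (1/k) * sum_{j=i}^{k} 1/j, kept as exact fractions.
    """
    return [
        sum(Fraction(1, j) for j in range(i, k + 1)) / k
        for i in range(1, k + 1)
    ]


# Attribute order as given in the text, most important first.
attributes = ["gender", "age", "constellations", "selling place",
              "locus of going abroad"]
weights = dict(zip(attributes, roc_weights(len(attributes))))

for name, w in weights.items():
    print(f"{name}: {float(w):.4f}")
```

For K = 5 the formula gives 137/300, 77/300, 47/300, 27/300 and 12/300, which sum to 1 and decrease with rank, as expected for the centroid of the rank-constrained simplex.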
All
  America: America (9); Canada (17)
  Oceania: the Pacific Ocean island (3); New Zealand & Australia (5)
  Europe: East Europe (2); Spain, Portugal & Morocco (15); North Europe (16); Mid-west Europe (18)
  Asia: Hong Kong & Macao (1); Malaysia & Singapore (6); Indonesia (7); Philippines, Cambodia & Vietnam (19); Thailand (11); South Asia (12); China (13); the Middle East (14); Japan (8)
  Africa: Africa (10)
Fig. 5. Traveling location taxonomy.

Fig. 6. Our proposed structure of collaborative filtering.

It is straightforward to specify the corner points of the smaller simplex consistent