Using Genetic Algorithms for Personalized Recommendation Chein-Shung hwang, Yi-Ching Su, and Kuo-Cheng tseng Dept of Information Management Chinese Culture University, Taipei, Taiwan Dept of Information Management Chinmin Institute of Technology, Miaoli, Taiwan cshwangafaculty. pccu. edu. tw, poohhh@ms. chinmin ed tw. tikicams. chinmin. edu. tw Abstract. with the high-speed development of customer service orientation, it is essential that the enterprises must find and understand customers'interests and preferences and then provide for suitable products or services. Recom- mender systems provide one way of circumventing this problem. This paper de- scribes a new recommender system, which employs a genetic algorithm to learn personal preferences of customers and provide tailored suggestions. Keywords: Recommender systems: generic algorithm; collaborative filterin 1 Introduction The explosive growth of the world-wide-web has led to an influx of users and conse- quently, a huge increase in the volume of available on-line data. The volume of things is considerably more than any person can possibly filter through to find the ones that he/she will like. Recommender systems have emerged in response to the information overloaded problem. Most Personalized Recommender systems adopt two types of techniques: collaborative filtering approach and content-based approach. Collabora tive filtering approach finds other users that have shown similar tastes to the given user and recommends what they have liked to that user. But it is not well-suited to locating information for a specific content information need. On the other hand, con- tent-based approach recommends items based on the item contents that the user has liked in the past. Combining with content-based approach can eliminate the shortcom- ings of collaborative filtering approach and provide better recommendations [1] Many hybrid recommender systems have been developed for e-commerce applica tions. The typical steps of recommender systems can be described as follows. First customer preference profiles in terms of product features are analyzed and extracted from transaction file and product file. Second, a data mining technique is used to find similar customers who have shown similar interests as on-line customers. Finally, a list of recommendations is provided and can be further adjusted by the subsequent customers' feedbacks. However, for different application strategies, each preference J -S. Pan, S.M. Chen, and N.T. Nguyen(Eds ) ICCCI 2010, Part IL, LNAI 6422, Pp. 104-112, 2010 o Springer-Verlag Berlin Heidelberg 2010
J.-S. Pan, S.-M. Chen, and N.T. Nguyen (Eds.): ICCCI 2010, Part II, LNAI 6422, pp. 104–112, 2010. © Springer-Verlag Berlin Heidelberg 2010 Using Genetic Algorithms for Personalized Recommendation Chein-Shung Hwang1 , Yi-Ching Su2 , and Kuo-Cheng Tseng2 1 Dept. of Information Management, Chinese Culture University, Taipei, Taiwan 2 Dept. of Information Management, Chinmin Institute of Technology, Miaoli, Taiwan cshwang@faculty.pccu.edu.tw, poohhh@ms.chinmin.ed.tw, tikic@ms.chinmin.edu.tw Abstract. With the high-speed development of customer service orientation, it is essential that the enterprises must find and understand customers' interests and preferences and then provide for suitable products or services. Recommender systems provide one way of circumventing this problem. This paper describes a new recommender system, which employs a genetic algorithm to learn personal preferences of customers and provide tailored suggestions. Keywords: Recommender systems; generic algorithm; collaborative filtering. 1 Introduction The explosive growth of the world-wide-web has led to an influx of users and consequently, a huge increase in the volume of available on-line data. The volume of things is considerably more than any person can possibly filter through to find the ones that he/she will like. Recommender systems have emerged in response to the information overloaded problem. Most Personalized Recommender systems adopt two types of techniques: collaborative filtering approach and content-based approach. Collaborative filtering approach finds other users that have shown similar tastes to the given user and recommends what they have liked to that user. But it is not well-suited to locating information for a specific content information need. On the other hand, content-based approach recommends items based on the item contents that the user has liked in the past. Combining with content-based approach can eliminate the shortcomings of collaborative filtering approach and provide better recommendations [1]. Many hybrid recommender systems have been developed for e-commerce applications. The typical steps of recommender systems can be described as follows. First, customer preference profiles in terms of product features are analyzed and extracted from transaction file and product file. Second, a data mining technique is used to find similar customers who have shown similar interests as on-line customers. Finally, a list of recommendations is provided and can be further adjusted by the subsequent customers' feedbacks. However, for different application strategies, each preference
Using Genetic Algorithms for Personalized Recommendation 105 feature may be associated with different importance. Most of the systems either ig- nored or used a fixed weight for each feature, which often caused a poor recommen dation performance Genetic algorithms are adaptive algorithms based on the Darwinian principle of natural selection and are often used to solve optimization problems. In this paper, we propose a hybrid recommender system which uses genetic algorithms for feature weighting. The proposed system consists of three modules In PGM(Profile Genera- tion Module), the customers transaction data are analyzed to establish the customers preference profile candidate table. In NSM(Neighborhood Selection Module), a clus tering method is first adopted to segment customers into groups using the profile candidate table. The genetic algorithm is then used to fine-tune profile matching for each active customer. Finally, in the RC Model(Recommendation Module), a list of recommendation is derived and presented. This will enable the recommender system to make more accurate predictions of users' likes and dislikes, and hence will provide better recommendations to users 2 Research Background Recommender systems have been successfully applied in a number of difference applications such as recommending movies, books, music and products. There are two major techniques used in recommender systems [2-4], content-based approach and collaborative filtering approach 2.1 Collaborative Filtering The term collaborative filtering was coined by the Goldberg et al. [5], the developers of the first recommender system-Tapestry. Tapestry allows users to annotate docu ments that they read. Users can then retrieve a document based on the content of the document or other users opinions in terms of annotation on that document. However, the recommendations are not automated and require users to explicitly define their collaborative relationships. GroupLens[6][7] provides an automated recommendations using a neighborhood-based algorithm. The system uses the ratings of items to find people who are most similar to you and use their opinions for recommendations GroundLens provides personalized predictions for Usenet news articles, while other systems use this approach for recommending movies, music, jokes and web pages. The original collaborative filtering algorithm contains two main steps: neighbor hood formation and generation of recommendation. Neighborhood formation finds a set of users known as neighbors that have similar preference ratings. Common simi- ity metrics used include Pe correlation, mean squared difference, and vector similarity. In the generation of recommendation step, the system then computes the predicted ratings on items the active user has not yet seen based on his neighbors ratings for those items. Finally, the system derives and sorts a set of recommendations by the predicted ratin
Using Genetic Algorithms for Personalized Recommendation 105 feature may be associated with different importance. Most of the systems either ignored or used a fixed weight for each feature, which often caused a poor recommendation performance. Genetic algorithms are adaptive algorithms based on the Darwinian principle of natural selection and are often used to solve optimization problems. In this paper, we propose a hybrid recommender system which uses genetic algorithms for feature weighting. The proposed system consists of three modules. In PGM (Profile Generation Module), the customer’s transaction data are analyzed to establish the customers’ preference profile candidate table. In NSM (Neighborhood Selection Module), a clustering method is first adopted to segment customers into groups using the profile candidate table. The genetic algorithm is then used to fine-tune profile matching for each active customer. Finally, in the RC Model (Recommendation Module), a list of recommendation is derived and presented. This will enable the recommender system to make more accurate predictions of users' likes and dislikes, and hence will provide better recommendations to users. 2 Research Background Recommender systems have been successfully applied in a number of difference applications such as recommending movies, books, music and products. There are two major techniques used in recommender systems [2-4], content-based approach and collaborative filtering approach. 2.1 Collaborative Filtering The term collaborative filtering was coined by the Goldberg et al. [5] , the developers of the first recommender system – Tapestry. Tapestry allows users to annotate documents that they read. Users can then retrieve a document based on the content of the document or other users’ opinions in terms of annotation on that document. However, the recommendations are not automated and require users to explicitly define their collaborative relationships. GroupLens[6][7] provides an automated recommendations using a neighborhood-based algorithm. The system uses the ratings of items to find people who are most similar to you and use their opinions for recommendations. GroundLens provides personalized predictions for Usenet news articles, while other systems use this approach for recommending movies, music, jokes and web pages. The original collaborative filtering algorithm contains two main steps: neighborhood formation and generation of recommendation. Neighborhood formation finds a set of users known as neighbors that have similar preference ratings. Common similarity metrics used include Pearson correlation, mean squared difference, and vector similarity. In the generation of recommendation step, the system then computes the predicted ratings on items the active user has not yet seen based on his neighbors’ ratings for those items. Finally, the system derives and sorts a set of recommendations by the predicted ratings
106 C.-S. Hwang, Y -C. Su, and K -C. tseng 2. 2 Content-Based Recommendation Content-based recommendations are based on information on the content of items rather than on other users'opinions Content-based approach to recommendations has been adopted in information retrieval community and employs similar techniques. In content-based recommendations, every item is represented by a feature vector or an attribute profile. Each feature can be based on a numerical or nominal scale represent ing different information of items such as color, type, or price. a dissimilar- ty/similarity measure may be used to measure the similarity of features between items. However, people may place different importance on different attributes when judging the similarity between items. Each feature may be assigned a weight repre- senting the importance of users toward that feature. Accordingly, the overall similar- ity between items is measured as the weighted sum of individual attributes. One drawbacks of content-based approach is that users need to explicitly specify the weights of features 2.3 Genetic algorithms Genetic algorithms(GAs)[8(9] are stochastic search techniques that guide tion of solutions using th of evolution and natural genetics. E research has been per ing the robust properties of genetic and demonstrating their cross a broad spectrum of optimization prob- lems, including feature selection and weighting tasks. GAs are modeled loosely on the principles of the evolution via natural selection, employing a population of individu- als that undergo selection in the presence of variation-inducing operators such as mutation and crossover. A fitness function is used to evaluate individuals, and repro- ductive success varies with fitness A general algorithm is started with a set of solutions (represented by chromo somes)called population. An initial population is created from a random selection of solutions. At every evolutionary step(generation), the solutions in the current popula tion are evaluated according to some predefined quality criterion, referred to as the fitness, or fitness function. Solutions from one population are taken and used to form a new population(next generation). Solutions(parents) are selected according to thei fitness-the more suitable they are the more chances they have to reproduce. These solutions then "reproduce" to create one or more new solutions(offspring), after which the offspring are produced by crossover or mutation randomly. The process of fitness-dependent selection and application of genetic operators to generate successive generations of solutions is repeated many times until a satisfactory solution is found. The basic steps of GAs are outlined as follows: 1. [Initialization) Randomly generate an initial population of solutions and evaluate the fitness function 2. [New population] Create a new population by repeating following steps until the new population is complete 2.1[Selection] Select two parent solutions from a population according to their fit ess (the better fitness the bigger chance to be selected)
106 C.-S. Hwang, Y.-C. Su, and K.-C. Tseng 2.2 Content-Based Recommendation Content-based recommendations are based on information on the content of items rather than on other users’ opinions. Content-based approach to recommendations has been adopted in information retrieval community and employs similar techniques. In content-based recommendations, every item is represented by a feature vector or an attribute profile. Each feature can be based on a numerical or nominal scale representing different information of items such as color, type, or price. A dissimilarity/similarity measure may be used to measure the similarity of features between items. However, people may place different importance on different attributes when judging the similarity between items. Each feature may be assigned a weight representing the importance of users toward that feature. Accordingly, the overall similarity between items is measured as the weighted sum of individual attributes. One drawbacks of content-based approach is that users need to explicitly specify the weights of features. 2.3 Genetic Algorithms Genetic algorithms (GAs) [8][9] are stochastic search techniques that guide a population of solutions using the principles of evolution and natural genetics. Extensive research has been performed exploiting the robust properties of genetic algorithms and demonstrating their capabilities across a broad spectrum of optimization problems, including feature selection and weighting tasks. GAs are modeled loosely on the principles of the evolution via natural selection, employing a population of individuals that undergo selection in the presence of variation-inducing operators such as mutation and crossover. A fitness function is used to evaluate individuals, and reproductive success varies with fitness. A general algorithm is started with a set of solutions (represented by chromosomes) called population. An initial population is created from a random selection of solutions. At every evolutionary step (generation), the solutions in the current population are evaluated according to some predefined quality criterion, referred to as the fitness, or fitness function. Solutions from one population are taken and used to form a new population (next generation). Solutions (parents) are selected according to their fitness - the more suitable they are the more chances they have to reproduce. These solutions then "reproduce" to create one or more new solutions (offspring), after which the offspring are produced by crossover or mutation randomly. The process of fitness-dependent selection and application of genetic operators to generate successive generations of solutions is repeated many times until a satisfactory solution is found. The basic steps of GAs are outlined as follows: 1. [Initialization] Randomly generate an initial population of solutions and evaluate the fitness function 2. [New population] Create a new population by repeating following steps until the new population is complete 2.1[Selection] Select two parent solutions from a population according to their fitness (the better fitness, the bigger chance to be selected)
Using Genetic Algorithms for Personalized Recommendation 107 2.2( Crossover] With a crossover probability cross over the parents to form a new offspring. If no crossover was performed, offspring is an exact copy of parents 2.3[Mutation] With a mutation probability mutate new offspring at each locus(po sition in chromosome) 2.4Accepting] Place new offspring in a new population 3. Evaluation] Compute the fitness values for the new population of N solutions 4. [Test] If the stopping criterion is met, stop, and return the best solution in current opulation 5. Loop] Go to step 2 3 System Architecture In this paper, we propose a hybrid recommender system that uses genetic algorithms for feature weighting to find similar who may share the same interests as the active customers, and capture the potential needs of customers. The proposed system consists of three modules, as shown in Fig. I Product Customer Customer files Transacton Profle generatior Module candidate table Nei ghbomood Top N neigh Recommendation Recommend it Fig. 1. System architecture 3.1 Profile Generation Module (PGm) The goal of PGM is to create the preference profile for each customer. The first step in PGM is to build the product profile from the product database. Each product profile is characterized by its feature values and defined as a binary vector as
Using Genetic Algorithms for Personalized Recommendation 107 2.2[Crossover] With a crossover probability cross over the parents to form a new offspring. If no crossover was performed, offspring is an exact copy of parents. 2.3[Mutation] With a mutation probability mutate new offspring at each locus (position in chromosome). 2.4[Accepting] Place new offspring in a new population 3. [Evaluation] Compute the fitness values for the new population of N solutions 4. [Test] If the stopping criterion is met, stop, and return the best solution in current population 5. [Loop] Go to step 2. 3 System Architecture In this paper, we propose a hybrid recommender system that uses genetic algorithms for feature weighting to find similar customers who may share the same interests as the active customers, and capture the potential needs of customers. The proposed system consists of three modules, as shown in Fig. 1. Fig. 1. System architecture 3.1 Profile Generation Module ( PGM ) The goal of PGM is to create the preference profile for each customer. The first step in PGM is to build the product profile from the product database. Each product profile is characterized by its feature values and defined as a binary vector as
108 C.-S. Hwang, Y.-C. Su, and K.-C. tseng P=(,f,,f where n is the number of product features(n=12, in our experiment. ) Each feature is assigned a value of l if a product possesses that feature, and 0 otherwise. To fully understand which product features a customer is of interest, in the second step, we compute the customer product preference profile from the customer trans tion data and the product profile. The product preference profile of customer k is described as CPP=(CPP,CPP,…,CPP) where CPP represents the preference profile of feature i for customer k and is ob- tained by summing up the feature information from the products purchased by cus tomer k. CPp can be defined as CPP= where T represents the set of products purchased by customer k. Finally, the customer preference profile is generated by integrating three portions f information: the customer product preference profile, the customer information and customer purchasing behavior. The customer preference profile is defined as CP=(CPP, CIk, CT) =CPk,CP2,…,CP) where CI,=(CIi, CIF) contains customer ks information including member level and gender and CT: =(R,, Fr, M, contains the information about customer k's purchasing behavior measured by recent Table I shows an example of the customer preference profile Table 1. The Customer Preference Profile 是 [0O2|10.33103301007800019
108 C.-S. Hwang, Y.-C. Su, and K.-C. Tseng ( , , , ) n j j j j P f f " f 1 2 = (1) where n is the number of product features (n=12, in our experiment.). Each feature is assigned a value of 1 if a product possesses that feature, and 0 otherwise. To fully understand which product features a customer is of interest, in the second step, we compute the customer product preference profile from the customer transaction data and the product profile. The product preference profile of customer k is described as ( , , , ) n CPPk CPPk CPPk " CPPk 1 2 = (2) where i CPPk represents the preference profile of feature i for customer k and is obtained by summing up the feature information from the products purchased by customer k. i CPPk can be defined as k j T i j i k T f CPP k ∑∈ = (3) where Tk represents the set of products purchased by customer k. Finally, the customer preference profile is generated by integrating three portions of information: the customer product preference profile, the customer information and customer purchasing behavior. The customer preference profile is defined as ( , , , ) ( , , ) 1 2 17 k k k k k k k CP CP CP CP CPP CI CT = " = (4) where ( , ) 1 2 CI k = CI k CI k contains customer k’s information including member level and gender and ( , , ) CTk = Rk Fk Mk contains the information about customer k’s purchasing behavior measured by the RFM (recency, frequency, and monetary) information. Table 1 shows an example of the customer preference profile. Table 1. The Customer Preference Profile C_no 1 f 2 f … 12 f level sex Recency Frequency Monetary 002 1 0.33 … 0.33 0 1 0.078 0 0.019