ARTICLE N PRESS xpert Systems with Applications xxx(2011)xXX-XXX Contents lists available at ScienceDirect Expert Systems with Applications ELSEVIER journalhomepagewww.elsevier.com/locate/eswa Making use of associative classifiers in order to alleviate typical drawbacks in recommender systems Joel Pinho Lucas", Saddys Segrera, Maria N Moreno Departamento de informatica y Automatica, Universidad de salamanca, Plaza de la Merced s/n 37008,. Salamanca, spain ARTICLE FO ABSTRACT Nowadays, there is a constant need for perso in e-commerce systems. Recommender systems make suggestions and provide information Is available, however, many recommender tech- Associative classification niques are still vulnerable to some shorto his work, we analyze how methods employed in these systems are affected by some typical drawbacks. Hence, we conduct a case study using data gath- ed from real recommender systems in order to investigate what machine learning methods can all ate such drawbacks. Due to some especial features inherited by associative classifiers, particular attention to this category of methods to test their capability of dealing with typical G 2011 Elsevier Ltd. All right 1 Introduction other domains with similar drawbacks(Moreno, Ramos, Garcia, Toro, 2008), we try classification based on association, which is a Nowadays, e-commerce systems present loads of products machine learning technique that combines concepts from classifi- available for sale. Thus, users would probably have difficulty to cation and association, in recommender systems. As it will be de- choose the products they prefer and, consequently, to purchase scribed along the next sections, such technique may present them. Due to such facts and to a more and more competitive indus- several advantages if applied for building a recommender model try. these systems need to personalize their products'presentation. Moreover, in a preliminary study made in Lucas, Segrera, and A way to reach such personalization is by means of therecom- Moreno(2008), we proved that classification based on association mender systems", which according to Bae and Kim(2010) have have a promising potential in the recommender systems domain. emerged in e-commerce applications to provide advice to cus- n order to perform such analysis, we accomplished a case study tomer about items they might wish to purchase or examine. In this using two recommender systems databases. The first consists of nse, recommender systems aim at enabling the creation of a new ratings of movies made by movielens users and the second con- ore personally designed for each consumer (Schafer, Konstan, sists of book ratings made the BookCrossing community. We con- sidered essential to use MovieLens data on the case study we Taking into account that machine learning techniques are ap- propose, because almost every case study on recommender sys- plied for identifying patterns with different purposes, such as clas- tems we found employed mainly movie rating datasets(mostly sification or prediction and knowledge discovery, according MovieLens). Cheung, Kwok, Law, and Tsui(2003), these techniques can lowever, employing just one type of data from a single domain cessfully applied to predict users' preferences in recommender sys- may limit the scope of many case studies. This shortcoming on this tems. Machine learning methods can provide several research field is also confirmed by herlocker, Konstan, Terveen, provements on these systems and then provide more and Riedl(2004), who argues that the lack of variety in publicly ization. However, even employing machine learning methods, rec- available collaborative filtering datasets (particularly with signi ommender systems still suffer from innumerous limitations and cant numbers of ratings) remains one of the most significant chal may be very susceptible to produce errors. lenges in the field. In this way, we decided to employ, in addition to In this work we investigate the main shortcoming presented by a classic movie rating data, a different data base within a different recommender systems and how they affect recommendations' domain in order to enhance the quality of the case study presented quality. Afterwards, we analyze how they may be alleviated. There- in this work. fore, after obtaining successful results with association rules Herlocker et al.(2004)also affirm that even early I recognized that, in a recommender scenario it can be Corresponding author. Tel. +34 923294653: fax: +34 923294514 able to measure how often the system leads its users to wrong choices and that accuracy differences are usually tiny even when ucas).saddyseusales(S. Segrera) they are measurable. Based on such observations, we consider other 0957-4174/s- see front matter o 2011 Elsevier Ltd. All rights reserved doi:10.1016/eswa2011.07.136 Please cite this article in press as: Pinho Lucas, J, et al. Making use of associative classifiers in order to alleviate typical drawbacks in recommender sys tems.Expert Systems with Applications(2011). doi:10.1016/jeswa2011.07.136
Making use of associative classifiers in order to alleviate typical drawbacks in recommender systems Joel Pinho Lucas ⇑ , Saddys Segrera, María N. Moreno Departamento de Informática y Automática, Universidad de Salamanca, Plaza de la Merced s/n 37008, Salamanca, Spain article info Keywords: Recommender systems Associative classification Sparsity abstract Nowadays, there is a constant need for personalization in e-commerce systems. Recommender systems make suggestions and provide information about items available, however, many recommender techniques are still vulnerable to some shortcomings. In this work, we analyze how methods employed in these systems are affected by some typical drawbacks. Hence, we conduct a case study using data gathered from real recommender systems in order to investigate what machine learning methods can alleviate such drawbacks. Due to some especial features inherited by associative classifiers, we give a particular attention to this category of methods to test their capability of dealing with typical drawbacks. 2011 Elsevier Ltd. All rights reserved. 1. Introduction Nowadays, e-commerce systems present loads of products available for sale. Thus, users would probably have difficulty to choose the products they prefer and, consequently, to purchase them. Due to such facts and to a more and more competitive industry, these systems need to personalize their products’ presentation. A way to reach such personalization is by means of the ‘‘recommender systems’’, which according to Bae and Kim (2010) have emerged in e-commerce applications to provide advice to customer about items they might wish to purchase or examine. In this sense, recommender systems aim at enabling the creation of a new store personally designed for each consumer (Schafer, Konstan, & Riedl, 2001). Taking into account that machine learning techniques are applied for identifying patterns with different purposes, such as classification or prediction and knowledge discovery, according to Cheung, Kwok, Law, and Tsui (2003), these techniques can be successfully applied to predict users’ preferences in recommender systems. Machine learning methods can provide several improvements on these systems and then provide more personalization. However, even employing machine learning methods, recommender systems still suffer from innumerous limitations and may be very susceptible to produce errors. In this work we investigate the main shortcoming presented by recommender systems and how they affect recommendations’ quality. Afterwards, we analyze how they may be alleviated. Therefore, after obtaining successful results with association rules in other domains with similar drawbacks (Moreno, Ramos, García, & Toro, 2008), we try classification based on association, which is a machine learning technique that combines concepts from classifi- cation and association, in recommender systems. As it will be described along the next sections, such technique may present several advantages if applied for building a recommender model. Moreover, in a preliminary study made in Lucas, Segrera, and Moreno (2008), we proved that classification based on association have a promising potential in the recommender systems domain. In order to perform such analysis, we accomplished a case study using two recommender systems databases. The first consists of ratings of movies made by MovieLens users and the second consists of book ratings made the BookCrossing community. We considered essential to use MovieLens data on the case study we propose, because almost every case study on recommender systems we found employed mainly movie rating datasets (mostly MovieLens). However, employing just one type of data from a single domain may limit the scope of many case studies. This shortcoming on this research field is also confirmed by Herlocker, Konstan, Terveen, and Riedl (2004), who argues that the lack of variety in publicly available collaborative filtering datasets (particularly with signifi- cant numbers of ratings) remains one of the most significant challenges in the field. In this way, we decided to employ, in addition to a classic movie rating data, a different data base within a different domain in order to enhance the quality of the case study presented in this work. Herlocker et al. (2004) also affirm that even early researchers recognized that, in a recommender scenario, it can be more valuable to measure how often the system leads its users to wrong choices and that accuracy differences are usually tiny even when they are measurable. Based on such observations, we consider other 0957-4174/$ - see front matter 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2011.07.136 ⇑ Corresponding author. Tel.: +34 923294653; fax: +34 923294514. E-mail addresses: joelpl@usal.es (J. Pinho Lucas), saddys@usal.es (S. Segrera), mmg@usal.es (M.N. Moreno). Expert Systems with Applications xxx (2011) xxx–xxx Contents lists available at ScienceDirect Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa Please cite this article in press as: Pinho Lucas, J., et al. Making use of associative classifiers in order to alleviate typical drawbacks in recommender systems. Expert Systems with Applications (2011), doi:10.1016/j.eswa.2011.07.136
ARTICLE IN PRESS netrics, besides accuracy, to evaluate algorithms in a er systems usually do not employ only collaborative filtering meth- He d ods. Likewise, content-based methods are usually not employed ot effective due to the lack of mecha- res. Hence, each of these ap- Bae Kim, 2010). g methods are nender sys- ntages formance d items to users
metrics, besides accuracy, to evaluate algorithms in a recommender context. Hence, we also consider the false positive occurrence and number of rules generated (if applicable for the algorithm being analyzed). Moreover, we also took into account some implicit features related to shortcomings of recommender systems. In the next section we present the main classes of methods employed in recommender systems. Subsequently, we depict the most important ones. At this time, we also describe the use of association rules for classification problems. In Section 4, we quote and explain the main shortcomings and research challenges related to recommender systems. Finally, Section 5 contains the case study accomplished in order to experiment methods’ capability for alleviating effects of shortcomings produced in recommender systems. 2. Types of recommender methods Methods employed in recommender systems have their foundations in different areas of knowledge. Cheung et al. (2003) and Lee, Kim, and Rhee (2001) classify recommender systems depending on the type of method used for making recommendations. The two main categories of recommender methods are collaborative filtering algorithms and content-based approaches. Content-based methods compare text documents to a specific user profile, where web objects are recommended to a user based on those he has been interested in the past (Lee et al., 2001). On the other hand, collaborative filtering methods were proposed aiming to provide more consistency to recommendations by means of information related to the social context of each user. To do so, the recommendation will be given to the active user according to item’s opinions given by other users who have similar profile (similar preferences about other items). The collaborative filtering approach was originally based on the nearest neighbor algorithm (Sarwar, Karypis, Konstan, & Reidl, 2001), which recommends products to a target user according to the opinions of users who have similar purchase patterns. Thus, recommended products will be the ones liked by users with similar interests. This approach appeals to the notion that when we are looking for information, we often seek the advice of friends with similar tastes or other people whose judgment we trust (Condliff, Lewis, Madigan, & Posse, 1999). In this way, information about items that other people have already found and evaluated is taken into account. Breese, Heckerman, and Kadie (1998) classified collaborative filtering methods into two groups: memory-based methods and model-based methods. In memory-based methods the nearest neighbors of a target user is found by matching the opinions of such user to the opinions of all system’s users. On the other hand, model-based methods build a predictive model by means of a training set comprising opinions acquired from a small portion of the system’s users. Such methods have been developed more recently in order to avoid the sparsity problem, which usually arises when memory-based methods are employed, because e-commerce systems generally offer millions of products for sale, so that it is not feasible to obtain opinions about all of them (Sarwar, Karypis, Konstan, & Riedl, 2000). Current recommender systems usually do not hold a substantial number of evaluations comparing with the number of items available in the system. As a consequence, the system seldom encompasses a complete database with evaluations of all items available. That is why model-based methods generally employ a training set with evaluations gathered from just a part of users of the system. Even though, generally the number of evaluations available is proportionally short and, as result, it is still necessary to develop more techniques to solve such shortcoming. Due to difficulties on obtaining considerable information of evaluation from users about system’s items, current recommender systems usually do not employ only collaborative filtering methods. Likewise, content-based methods are usually not employed solely, because they are not effective due to the lack of mechanisms to extract Web objects features. Hence, each of these approaches has its strengths and weaknesses (Bae & Kim, 2010). Therefore, content-based and collaborative filtering methods are commonly combined or employed together in recommender systems. Combining different methods to overcome disadvantages and limitations of a single method may improve the performance of recommenders (Göksedef & Gündüz-Ögüdücü, 2010). 3. Methods employed in recommender systems Seeing that recommender systems recommend items to users based on ratings or past customer behavior and also that there are usually several items and users, they need to be grouped in order to make recommendations feasible. In this way, classification methods are mostly employed to classify every user and/or item in one of those groups. Techniques used for classification are considered predictive methods because they aim at predicting the value of an attribute, called label-attribute, in a certain dataset. Each value of the label-attribute must be discrete since it is responsible for representing a class. The prediction of the label-attribute value is achieved by means of other attributes’ values (the descriptive attributes). The values of these attributes must be known in all samples, so that they may build a training set defining clearly the characteristics of all classes. Thus, classification is considered as a supervised learning approach. Furthermore, a test set is used to verify the consistence of the learning model. In this section we will describe the main techniques employed in recommender systems, which may be either supervised or unsupervised learning techniques. In the next subsection we depict the most common machine learning methods employed in recommender systems. Subsequently, we describe some foundations of association rules and how they can be employed in these systems. Finally, in Section 3.3, we describe classification based on association methods. 3.1. Machine learning methods According to Bae and Kim (2010), most researchers have been using data mining techniques in recommender systems aiming at predicting the customer’s future behavior and to increase the chance of repurchasing. In this way, and since data mining approaches are basically made of machine learning methods, we may say that such methods are quite popular in the recommender systems area. Billsus and Pazzani (1998) were the first to apply machine learning techniques in recommender systems, the authors proposed to transform the traditional collaborative filtering recommender problem (nearest neighbor) into a machine learning problem. To do so, authors firstly employed neural networks and considered the recommender problem as a classification problem. In this sense, a neural network constructs a model, defining classes of items, for recommendation using the ratings given by users and then classifying the non-rated items. More recent neural networks approaches (Chou, Li, Chen, & Wu, 2010) uses consumer knowledge upon items in order to provide personalization in ecommerce systems. They assume that costumers have some amount of experience or information about items they are using or plan to purchase. In this context, collaborative filtering may be seen as a prediction task, because the basic idea is to predict how a user will rate a new item (not rated before). That is the reason why this new approach for collaborative filtering was called item-based methods, because recommendations are given based 2 J. Pinho Lucas et al. / Expert Systems with Applications xxx (2011) xxx–xxx Please cite this article in press as: Pinho Lucas, J., et al. Making use of associative classifiers in order to alleviate typical drawbacks in recommender systems. Expert Systems with Applications (2011), doi:10.1016/j.eswa.2011.07.136
ARTICLE IN PRESS even us sil g fu uzzy ns than other methods racy than classic orative fil- sociation rule sociation rules which is very e they can cluster. R m 1, also liked item nder sys-
on items’ features. However, nowadays item-based collaborative methods are generally known as model-based methods. Since most machine learning techniques employed in recommender systems are supervised learning methods, they need to be ran off-line in order to build a learning model used for classifi- cation. Thus, great part of the processing is performed off-line and hence, these methods do not require much on-line processing time, letting recommendations to be provided faster. Moreover, they may reduce shortcomings caused by sparsity, because they do not need ratings of all items available in order to build the classi- fication model. However, as highlighted in Section 2, due to difficulties on obtaining considerable information of evaluation from users about system’s items, model-based methods are commonly mixed with content-based methods in order to enhance recommendation’s quality. Despite being the first machine learning method applied for collaborative filtering, nowadays neural networks are not vastly employed in recommender systems. The ‘‘black box’’ learning approach of such method, where no information known except input and output, is a crucial issue for building recommender models. In addition, the training time of a neural network is costly, which might be a critical issue in current recommender systems, whose data, according to Koren (2010), is changing over time and models should be continuously updated to reflect its present nature. As classifiers may be implemented employing many different machine-learning techniques, naturally recommender systems expanded the model-based techniques they apply. Probably the most popular machine learning method for recommender systems are the Bayesian networks. They are an effective method employed in data mining that are widely applied for building recommender models (Condliff et al., 1999). The use of Bayesian networks for collaborative filtering was first suggested in Breese et al. (1998). It makes use of a training set to build a probabilistic structure for predicting user ratings. A distinct Bayesian network is used for representing every user who recommendations will be provided to and a learning algorithm is responsible for processing every network. Each node of the network represents, by means of a decision tree, a domain item and the edges represent information about the user. The states of a node correspond to rating values for the item being represented by the node. Since the classification model used for recommendations need to be build and, depending on the amount of data of the system, it may be very costly. On the other hand, the output model is small and its efficiency is similar to the nearest neighbor algorithm. However, it may still be not effi- cient enough depending on the number of items, because in order to predict the rating of a given item, the algorithm needs to calculate the conditional probabilities of all possible ratings for such item given all possible ratings for other items. Another popular data mining technique widely employed in recommender systems, as a model-based method, is clustering. It consists on performing non-supervised learning in order to identify groups of users who appear to have similar preferences. Therefore, recommendations provided to the active user will be related to the opinions (or ratings) given by users of the group he owns to. Most clustering algorithms require a distance metric or similarity metric, obtained by computing similarity between items, to guide the clustering process (Connor & Herlocker, 2001). Depending on the system, sometimes, after the clusters being created, opinions of other users may be averaged in the cluster in order to provide recommendations for the active user. Nevertheless, clustering techniques may use fuzzy logics and then every user may own to more than one cluster. In this way, each user will have partial participation in several clusters. At this point, a membership measure is assigned to each user in every cluster. Recommendation will be an average across the clusters, weighted by the membership value. However, even using fuzzy logics, according to Breese et al. (1998), clustering techniques usually produce less-personal recommendations than other methods, and in some cases, the clusters have worse accuracy than classic collaborative filtering (i.e., nearest neighbor) algorithms. Therefore, nowadays clustering techniques are usually applied in collaborative filtering together with other methods. In this context, they are applied as a first step in order to distribute the data for different recommender methods or to reduce the initial dataset. While dividing the population into clusters may hurt the accuracy of recommendations to users near the fringes of their assigned cluster, pre-clustering may be a worthwhile trade-off between accuracy and throughput (Schafer, 2005). Taking into account machine learning techniques mainly applied in collaborative filtering methods, the Support Vector Machines (SVMs) are quite frequent. Such technique consists in a supervised learning method for building a lineal classifier. In order to build a SVM, every user is seen as a vector composed by ratings about items. Such vectors are associated to a geometric space in which a hyperplane of separation between the possible classes is built. In this context, such classes are related to groups of users with similar preferences. Unlike other learning methods, SVM’s performance is related not to the number of features in the system, but to the margin with which it separates the data (Cheung et al., 2003). Association rule-based methods for classification are also employed in recommender systems. They are being more and more popular due to some benefits they present for the collaborative filtering context. In the next subsection we detail association rule discovery applied for recommender systems. 3.2. Association rules As well as most data mining techniques, association rules induction algorithms can be employed to enhance personalization in recommendation systems, such as the ones in Lee et al. (2001), Lazcorreta, Botella, and Fernández-Caballero (2008). Association rules were first introduced by Agrawal, Imielinski, and Swami (1993) aiming at discovering consuming patterns in retail databases. Thus, the task of discovering association rules in retail data was termed as ‘‘market basket analysis’’. Data stored in market basket are items bought on a per-transaction basis for a retailing organization. The representation of an association rule may be declared as A ? B, where A and B are item sets. Such representation states that, in a transaction, the occurrence of all items from ‘‘A’’ (antecedent side of the rule) results in the occurrence of items belonging to ‘‘B’’ (consequent side of the rule), such as A # I and B # I, where ‘‘I’’ is an item set. An association rule describes an association relation between item sets that occurs together on transactions of a data set. Thus, association, unlike classification, is not considered as a prediction task, because it aims at describing data. Therefore, association rule mining is not a machine learning method. There are several association rule mining algorithms available in the literature, such as ECLAT (Zaki, Parthasarathy, Ogihara, & Li, 1997), DIC (Brin, Motwani, Ullman, & Tsur, 1997) and FP-Growth (Han, Pei, & Yin, 2000). However, the Apriori (Agrawal et al., 1993) is certainly the most popular, and widely employed, association rules discovery algorithm nowadays (Neves, 2003). Its concepts and techniques are used by almost every algorithm proposed currently, which, are mostly mere extension of the Apriori (Neves, 2003). Our motivation to apply association rules in recommender systems is based on the structure of these rules, which is very appropriated for recommendation purposes, because they can learn patterns like ‘‘70% of users who liked item 1, also liked item J. Pinho Lucas et al. / Expert Systems with Applications xxx (2011) xxx–xxx 3 Please cite this article in press as: Pinho Lucas, J., et al. Making use of associative classifiers in order to alleviate typical drawbacks in recommender systems. Expert Systems with Applications (2011), doi:10.1016/j.eswa.2011.07.136
2". In this context, the rules can denot er on different beings can easier comprehend le algorithms than an output As highli ree data n recommender recommendations. An error in a
2’’. In this context, the rules can denote two types of associations: between users and items (in the case of content-based methods) and associations between items or users (in the case of collaborative filtering methods). The first type of association rules in recommender systems consists on associate active user’s profile data with items’ features. The second one, employed in collaborative filtering, consists on associating data of the active user with data of other users or on associating data of items the active user is interested in with data of other items available on the system. In Sun, Kong, and Chen (2005), for example, association rules are applied to mine relationships between items and then make the prediction about an item for an active user by computing the weighted sum of the ratings given by the user about the items similar to the target item. Géry and Haddad (2003) propose the use of implicit opinions (instead of ratings) of users. The problem of finding Web pages visited together is similar to find associations rules among itemsets in transaction databases. In most cases, rules may be discovered offline by processing data related to users’ opinions. Recommender models based on association rules are easy to be interpreted and are usually faster to be built than most machine learning models. A recommender model using Bayesian networks, for example, considers the conditional probabilities of all the possible opinions for an item given all possible opinions for other items. Moreover, association rules may alleviate the sparsity drawback, which will be described into the next section, because rules do not need to consider all opinions from users in order to build a recommender model. Furthermore, the system may take into account just the best rules for the recommender model. Rules may be evaluated and ranked by means of statistical measures like support and confidence. In the case study depicted in the next section, we will analyze in a more detailed way how association rules can reduce sparsity. However, Zhang and Chang (2005) affirm that recommender models built employing only association generally present poorquality recommendations. In Zhang and Chang (2005), association rules were combined with sequential rules to enhance recommendations efficiency. While in Lee et al. (2001), a method combining the nearest neighbor algorithm with association rules was developed. Such method employs information acquired from transactions of groups of users with similar preferences (neighbors) in order to discover association rules about Web objects. Alternatively, other machine learning methods could be applied for recommender systems, as the one proposed by Liu, Hsu, and Ma (1998). In such work, authors adapted association rules in order to play the role of a classifier. Such method will be described in the next subsection and it will be tried for recommender systems in Section 5. 3.3. Classification based on association methods As highlighted in the previous subsection, association rules aim at describing data and, consequently, they are seen a non-supervised learning method. On the other hand, a classification method is seen as a prediction technique, because it aims at predicting the value of an attribute (label) in a data set. The joining of concepts from classification and association (Liu et al., 1998) is an alternative approach for performing classification tasks, where association rules are employed as the basis of a classification method. Seeing that association models are commonly more effective than classifi- cation models, a crucial matter that encourages the use of association rules in classification is the high computational cost that current classification methods present. Several works (Li, Han, & Pei, 2001; Liu et al., 1998; Thabtah, Cowling, & Peng, 2005; Yin & Han, 2003) verify that classification based on association methods presents higher accuracy than traditional classification methods. Differing from association rules, decision trees, for example, do not consider simultaneous correspondences occurring on different attribute values. Moreover, human beings can easier comprehend an output provided by association rule algorithms than an output provided by usual classification techniques, such as artificial neural networks (Sarwar et al., 2000). Thabtah et al. (2005) sustain that a few accurate and effective classifiers based on associative classification have been presented recently, such as CBA (Classification Based in Association) (Liu et al., 1998), CPAR (Classification based on Predictive Association Rules) (Yin & Han, 2003), and CMAR (Classification based on Multiple class-Association Rules) (Li et al., 2001). Taking into account that for classification rule mining there is one and only one predetermined target, while for association rule mining the target of discovery is not pre-determined (Liu et al., 1998), it is necessary to constrain the rules’ consequent terms to encompass only one attribute. Thus, the consequent term of an association rule will represent the target, or class, attribute. Therefore, such rule can play a prediction role in a given system: in order to classify an item, the rule’s properties are matched to every rule’s antecedents and the attribute value of the consequent term (from one or more selected rules) will be the predicted class. Generally, the classification model is a set of ordered rules. In the CBA algorithm, for example, the rules are ordered by means of the confidence measure and it uses only one rule for performing classification. However, in this case some scenario in which could exist multiples rules with similar confidence measures may occur and, at the same time, with greatly different support measures. Hence, a rule A with much higher confidence than a rule B could be the one chosen for classification even if B had a much higher support (Li et al., 2001). The MCAR algorithm solves such drawback by means of an approach that considers, in addition to the confidence, the rules’ support. The CMAR algorithm has a fine approach for selecting association rules for classification, instead of using just one rule, it makes use of all rules that match the case to be classified. If the consequent term of all selected rules is the same, the predicted class will obviously be the value of the rules’ consequent term. Though, in a different scenario, rules are divided in groups according to the consequent terms’ values. The value chosen for classification is acquired through the group in which its elements hold the highest correlation value depending on the weighted v2 measure. Similarly to CMAR, the CPAR algorithm also divides rules in groups, though, instead of using all rules that match to the object to be predicted, it uses the ‘‘k’’ best rules that represent each class. Afterwards, the algorithm chooses a group, by means of the Laplace Accuracy measure, that will be the one used for classification. The drawbacks presented by association rules induction algorithms are, in general, the same ones of classification based on association algorithms. A critical drawback of these algorithms is due to those rules that have few attributes. Seeing that such rules expresses narrow information, an object which has few attributes would be ineffectively classified. Another critical drawback is due to the large number of rules that algorithms commonly produce (Sarwar et al., 2001), as a consequence, much of them do not supply relevant information or are contradictory. Such drawback is a critical issue related to associative classifiers, because the performance of the algorithm may be affected when retrieving, storing, pruning and sorting a large number of rules (Li et al., 2001). The CMAR algorithm tries to solve such drawback by implementing a FP-Tree data structure to store the association rules’ frequent itemsets. 4. Recommender systems shortcomings Shortcomings inherited by methods employed in recommender systems are reflected in erroneous recommendations. An error in a 4 J. Pinho Lucas et al. / Expert Systems with Applications xxx (2011) xxx–xxx Please cite this article in press as: Pinho Lucas, J., et al. Making use of associative classifiers in order to alleviate typical drawbacks in recommender systems. Expert Systems with Applications (2011), doi:10.1016/j.eswa.2011.07.136
ARTICLE N PRESS J Pinho Lucas et aL/ Expert Systems with Applications xxx(2011)xxx-xXx recommendation may be presented as a false negative or a false are 1000 stores, 52 weeks in a year, 500,000 customers and 10,000 positive. The first one consists of products that were not recon products, the dataset has1000×52×10000×500000= mended, though the consumer would have liked them. The second 260,000,000, 000,000 potential cells. However, it might only have one consists of recommended products, though the consumer does 1, 872,000,000 populated cells, because there are 450,000 custom- not like them. According to Sarwar et al. (2000), the false positives ers shopping on average 26 times a year, buying 40 products at just are more critical because they will lead to angry consumers More- 2 stores. So, the dataset is 0.00036% sparse (936,000,000 over, even early researchers recognized that when recommender 260,000, 000,000,000)x 100) approaches are used to support decisions, it can be more valuable to measure how often the system leads its users to wrong choices 4.2. Scalability (Herlocker et al. 2004). Consequently, recommender method hould concern mostly on avoiding false positives Scalability is another drawback of recommender systems re- However, Claypool et al.(1999) have suggested that recom- sulted by the huge number of items available in it. Scalability in mender methods, unlike humans, have difficulty in distinguishing recommender systems includes both very large problem sizes between high-quality and low-quality information relating to the and real-time latency requirements ( Schafer et al, 2001). One same topic. Therefore, it might not be effective to provide recom- example of such requirements may be a scenario in which there mendations when evaluations acquired from users are not taken is a recommender system connected to a Web site needing to pre to account on the other hand the use of information from user vide recommendations within some milliseconds and. at the same evaluations probably provide more false positives. In order to re- time, serve thousands of users simultaneously. Thus, a major chal duce false positives, methods employed in recommender systems lenge in recommender systems nowadays, according to Schafer must avoid the occurrence of some typical drawbacks. Next sub- et al. (2001), is to adapt data mining techniques to meet simulta- sections will describe the four most critical drawbacks that may neously low latency and high throughput(amount of data flowing occur in these systems. in a system) requirements in order to attract and serve a huge number of users. In the recommender system context, the through- 4.1. Data sparsity put may be measured by the number of users and items that the system is able to support without affecting efficiency. Probably the biggest challenge recommender systems have Efficiency is a key feature for recommender systems, because nowadays is related to data sparsity problem due to the huge they need to supply fast feedback to their users. Generally, short- amount of data available in current recommender systems. Funda- comings resulted from scalability do not occur in model based mentally, sparsity occurs because the number of ratings needed to methods, because in these methods, differently from other classes build a prediction model is greater than the number of the ratings of methods, the computer data processing is usually not done dur- obtained from users. Moreover, most recommender techniques re- ing run time. quire user explicit expression of personal preferences for item Scalability may turn into a major concern for the efficiency of a Nevertheless, methods for obtaining ratings implicitly have been system, because some techniques, like searching for the nearest developed in order to add more ratings and reduce sparsity. How- neighbor, for example, may be unfeasible to be employed in sys- ever, even with the use of up to date methods(including data min- tems having a huge data base. a typical web-based recommender ing methods), sparsity still remain a critical drawback for system running only the nearest neighbor algorithm will probably recommender systems due to the extensive number of items avail- suffer serious scalability problems(Sarwar et al., 2001). ble. This is a significant problem because, in practice, it is usually costly and difficult to collect sufficient data for all users(Ahn, Kan 4.3. Early rater problem lee, 2010). According to Sarwar et al. (2001), active users may have purchased less than 1% of the items available in a system. This Despite that the drawbacks described before may be minimized neans that in a recommender system of movies owning 500,000 by means of usage of model-based methods, there are other shor items, for example, an active user would be able to rate 5000 mov- comings that may occur along with these methods. The Early Rater es, nevertheless, we cannot expect that all users of the system (or First-Rater) Problem(Claypool et al., 1999: Condliff et al, 1999 watch 5000 movies and provide ratings to all of them. In addition, is an example of drawback associated with every class of collabo- rating schemes can only be applied to homogeneous domains and rative filtering methods Such drawback arises when it is impossi- the number of ratings eligible to be used by the system is even ble to offer recommendations about an item that was just more restricted incorporated in the system and therefore, has few (or even nor Model based methods reduce shortcomings derived from spar- evaluations from users. In fact, the early rater problem is directly sity, however it is still necessary to have a certain minimum num- linked with sparsity, because when a system has a high numbe ber of ratings in order to build an estimation model of ratings. In of items, probably most of these items have never received any any case, owning ratings of just 1% of systems items may be, evaluation. Conceptually, the early-rater problem can be viewed according to the technique used, scarce to build a reliable model. as a special instance of the sparsity problem( huang, Chen, Zeng In this work we also analyze how the performance of classifiers 2004). may be affected by sparsity. Therefore, we address some questions anwar et al. (2001) affirm that current recommender syst related to sparsity: how we can nominate whether a dataset is depend on the altruism of a set of users who are willing to rate parse or not and if it is possible to measure the degree of sparsity many items without receiving many recommendations. Econo- of a dataset. Due to practical reasons sometimes industry and acad- mists have speculated that even if rating required no effort at all emy, evaluate sparsity considering the number of NULL NA values many users would choose to delay considering items to wait for presented by a certain dataset. Hence sparsity may be seen as den their neighbors to provide them with recommendations( Avery sity, which reflects both the overall size of recommenders item Zeckhauser, 1997). Thus, it is needed to find a way to encourage pace and the degree in which users have explored it(Herlocker users to made evaluations about items available in the system. etal.2004) Analogously, such drawback also occurs with a l In this context, in Rittman(2005) an example about a dataset the system, because since there is no information about him, it with four attributes in a retail market scenario is described. The would be impossible to determine his behavior in order to pre ttributes are: store, week in a year, costumer and product. If there him recommendations. Actually this scenario of the early rater Please cite this article in press as: Pinho Lucas, J, et al. Making use of associative classifiers in order to alleviate typical drawbacks in recommender sy tems.Expert Systems with Applications(2011). doi:10.1016/jeswa2011.07.136
recommendation may be presented as a false negative or a false positive. The first one consists of products that were not recommended, though the consumer would have liked them. The second one consists of recommended products, though the consumer does not like them. According to Sarwar et al. (2000), the false positives are more critical because they will lead to angry consumers. Moreover, even early researchers recognized that when recommender approaches are used to support decisions, it can be more valuable to measure how often the system leads its users to wrong choices (Herlocker et al., 2004). Consequently, recommender methods should concern mostly on avoiding false positives. However, Claypool et al. (1999) have suggested that recommender methods, unlike humans, have difficulty in distinguishing between high-quality and low-quality information relating to the same topic. Therefore, it might not be effective to provide recommendations when evaluations acquired from users are not taken into account. On the other hand, the use of information from user evaluations probably provide more false positives. In order to reduce false positives, methods employed in recommender systems must avoid the occurrence of some typical drawbacks. Next subsections will describe the four most critical drawbacks that may occur in these systems. 4.1. Data sparsity Probably the biggest challenge recommender systems have nowadays is related to data sparsity problem due to the huge amount of data available in current recommender systems. Fundamentally, sparsity occurs because the number of ratings needed to build a prediction model is greater than the number of the ratings obtained from users. Moreover, most recommender techniques require user explicit expression of personal preferences for items. Nevertheless, methods for obtaining ratings implicitly have been developed in order to add more ratings and reduce sparsity. However, even with the use of up to date methods (including data mining methods), sparsity still remain a critical drawback for recommender systems due to the extensive number of items available. This is a significant problem because, in practice, it is usually costly and difficult to collect sufficient data for all users (Ahn, Kang, & Lee, 2010). According to Sarwar et al. (2001), active users may have purchased less than 1% of the items available in a system. This means that in a recommender system of movies owning 500,000 items, for example, an active user would be able to rate 5000 movies, nevertheless, we cannot expect that all users of the system watch 5000 movies and provide ratings to all of them. In addition, rating schemes can only be applied to homogeneous domains and the number of ratings eligible to be used by the system is even more restricted. Model based methods reduce shortcomings derived from sparsity, however it is still necessary to have a certain minimum number of ratings in order to build an estimation model of ratings. In any case, owning ratings of just 1% of system’s items may be, according to the technique used, scarce to build a reliable model. In this work we also analyze how the performance of classifiers may be affected by sparsity. Therefore, we address some questions related to sparsity: how we can nominate whether a dataset is sparse or not and if it is possible to measure the degree of sparsity of a dataset. Due to practical reasons sometimes industry and academy, evaluate sparsity considering the number of NULL/NA values presented by a certain dataset. Hence, sparsity may be seen as density, which reflects both the overall size of recommender’s item space and the degree in which users have explored it (Herlocker et al., 2004). In this context, in Rittman (2005) an example about a dataset with four attributes in a retail market scenario is described. The attributes are: store, week in a year, costumer and product. If there are 1000 stores, 52 weeks in a year, 500,000 customers and 10,000 products, the dataset has 1000 52 10,000 500,000 = 260,000,000,000,000 potential cells. However, it might only have 1,872,000,000 populated cells, because there are 450,000 customers shopping on average 26 times a year, buying 40 products at just 2 stores. So, the dataset is 0.00036% sparse ((936,000,000/ 260,000,000,000,000) 100). 4.2. Scalability Scalability is another drawback of recommender systems resulted by the huge number of items available in it. Scalability in recommender systems includes both very large problem sizes and real-time latency requirements (Schafer et al., 2001). One example of such requirements may be a scenario in which there is a recommender system connected to a Web site needing to provide recommendations within some milliseconds and, at the same time, serve thousands of users simultaneously. Thus, a major challenge in recommender systems nowadays, according to Schafer et al. (2001), is to adapt data mining techniques to meet simultaneously low latency and high throughput (amount of data flowing in a system) requirements in order to attract and serve a huge number of users. In the recommender system context, the throughput may be measured by the number of users and items that the system is able to support without affecting efficiency. Efficiency is a key feature for recommender systems, because they need to supply fast feedback to their users. Generally, shortcomings resulted from scalability do not occur in model based methods, because in these methods, differently from other classes of methods, the computer data processing is usually not done during run time. Scalability may turn into a major concern for the efficiency of a system, because some techniques, like searching for the nearest neighbor, for example, may be unfeasible to be employed in systems having a huge data base. A typical web-based recommender system running only the nearest neighbor algorithm will probably suffer serious scalability problems (Sarwar et al., 2001). 4.3. Early rater problem Despite that the drawbacks described before may be minimized by means of usage of model-based methods, there are other shortcomings that may occur along with these methods. The Early Rater (or First-Rater) Problem (Claypool et al., 1999; Condliff et al., 1999) is an example of drawback associated with every class of collaborative filtering methods. Such drawback arises when it is impossible to offer recommendations about an item that was just incorporated in the system and, therefore, has few (or even none) evaluations from users. In fact, the early rater problem is directly linked with sparsity, because when a system has a high number of items, probably most of these items have never received any evaluation. Conceptually, the early-rater problem can be viewed as a special instance of the sparsity problem (Huang, Chen, & Zeng, 2004). Sarwar et al. (2001) affirm that current recommender systems depend on the altruism of a set of users who are willing to rate many items without receiving many recommendations. Economists have speculated that even if rating required no effort at all, many users would choose to delay considering items to wait for their neighbors to provide them with recommendations (Avery & Zeckhauser, 1997). Thus, it is needed to find a way to encourage users to made evaluations about items available in the system. Analogously, such drawback also occurs with a new user joining the system, because since there is no information about him, it would be impossible to determine his behavior in order to provide him recommendations. Actually, this scenario of the early rater J. Pinho Lucas et al. / Expert Systems with Applications xxx (2011) xxx–xxx 5 Please cite this article in press as: Pinho Lucas, J., et al. Making use of associative classifiers in order to alleviate typical drawbacks in recommender systems. Expert Systems with Applications (2011), doi:10.1016/j.eswa.2011.07.136