A Multilayer Ontology-based Hybrid Recommendation Model

Iván Cantador, Alejandro Bellogín, Pablo Castells
Escuela Politécnica Superior, Universidad Autónoma de Madrid
Campus de Cantoblanco, 28049, Madrid, Spain
{ivan.cantador, alejandro.bellogin, pablo.castells}@uam.es

Abstract

We propose a novel hybrid recommendation model in which user preferences and item features are described in terms of semantic concepts defined in domain ontologies. The exploitation of meta-information describing the recommended items and user profiles in a general, portable way, along with the capability of inferring knowledge from the relations defined in the ontologies, are the key aspects of the presented proposal. Taking advantage of the enhanced semantic representation, user profiles are compared at a finer grain size than they are in usual recommender systems. More specifically, the concept, item, and user spaces are clustered in a coordinated way, and the resulting clusters are used to find similarities among individuals at multiple semantic layers. Such layers correspond to implicit Communities of Interest (CoI), and enable collaborative recommendations of enhanced precision. Our approach is tested in two sets of experiments: one including profiles manually defined by real users and another with automatically generated profiles based on data from the IMDb and MovieLens datasets.

Keywords: hybrid recommender systems, communities of interest, ontology, user profiling

1. Introduction

Recommender systems emerged in the early nineties as a thriving research area on its own, distinct from other related fields in Artificial Intelligence and Information Retrieval. The area has undergone a considerable leap in significance and potential value since then, with the boost of digital content and online businesses involving stocks of goods of different sorts. The volume, growth rate, ubiquity of access, and to a large extent unstructured nature of worldwide content challenge the limits of human processing capabilities and information access technologies, putting at stake the effective utility of content, despite its actual value. It is in such settings that recommender systems can make a valuable contribution, by proactively scanning the space of choices and predicting the potential usefulness of items for each particular user, without needing users to explicitly specify their needs or query for items of whose existence they cannot be aware beforehand.

Recommender systems are based on the principle that users with common traits (in their demographic data, behaviour, taste, opinions, etc.) may enjoy similar items. However, in typical approaches, the comparison between users is done globally, in such a way that partial, but strong and useful similarities might be missed. For instance, two people may have a highly coincident taste in cinema, but a very divergent one in sports. The opinions of these people on movies could be highly valuable for each other, but risk being ignored by many collaborative recommender systems, because the global similarity between the users might be low.

In our proposal we argue for the distinction of different layers within the interests and preferences of users, as a useful refinement to produce better recommendations. Depending on the current context, only a specific subset of the segments (layers) of a user profile is considered in order to establish her similarities with other people when a recommendation has to be performed. Such models of induced user networks or communities, partitioned at different common semantic layers, can be exploited in the recommendation processes in order to produce more accurate and context-sensitive results.

Our approach is based on an ontological representation of the domain of discourse where user interests are defined. The ontological space takes the shape of a semantic network of interrelated domain concepts, and the user profiles are initially described as weighted lists measuring the user interests for those concepts. We propose here to exploit the links between users and concepts to extract relations among users according to common interests. Analysing the structure of the domain ontology and taking into account the semantic preference weights of the user profiles, we cluster the domain concept space, and generate groups of interests shared by certain users. Thus, those users who share interests of a specific concept cluster are connected in the corresponding community, where their preference weights measure the degree of membership to that cluster.

The rest of the paper has the following structure. Section 2 describes the different types of recommender systems and their current limitations, and indicates which of them are addressed by our proposal. Section 3 is dedicated to the underlying ontology-based knowledge representation and basic content retrieval of our proposal. We describe how item features and user preferences are expressed in terms of domain ontologies, how they are extended using the semantic relations of those structures, and how they are
exploited for basic content-based recommendations. The mechanism to cluster the concept space in several layers of shared semantic interests for building multi-level relations between users is presented in Section 4. The exploitation of the derived communities to enhance collaborative filtering is described in Section 5. The empirical evaluation of that model is presented in Section 6. As already mentioned, two different experiments are described: one using user profiles manually defined by real users and another conducted with artificial user profiles built from data of the well-known IMDb and MovieLens repositories. Section 7 summarises related work, and finally, we conclude with some discussions and future research lines in Section 8.

2. Background

The recommendation problem can be formulated as follows [1]. Let $U = \{u_1, u_2, \ldots, u_M\}$ be the set of all users registered in the system, and let $I = \{i_1, i_2, \ldots, i_N\}$ be the set of all possible items that can be recommended. Let $g(u_m, i_n)$ be a utility function that measures the gain or usefulness of item $i_n$ to user $u_m$, i.e., $g : U \times I \rightarrow R$, where $R$ is a totally ordered set (e.g. non-negative integers or real numbers within a certain range). Then, for each user $u_m \in U$, we aim to choose the item $i^{\max, u_m} \in I$ that maximises the user's utility. More formally:

$$\forall u_m \in U, \quad i^{\max, u_m} = \arg\max_{i_n \in I} g(u_m, i_n)$$

The utility of an item is usually represented by a rating, measuring how much a specific user is (or is predicted to be) interested in a specific item. Depending on the application, the ratings can either be specified by the users, or computed by the application. Each element of the user space $U$ can be described with a profile that might include several demographic characteristics, such as gender, age, nationality, marital status, etc., or some information about the user's tastes, interests and preferences. Analogously, each element of the item space $I$ can be described with a set of characteristics. For example, in a movie recommender system, movies can be described not only by their titles, but also by their genres, principal actors, directors, etc.

Figure 1 shows a general schema of a recommendation process. Firstly, the system manually or automatically captures the target user's preferences, building her personal profile. These preferences are defined as explicit content features of the preferred items, evaluations or ratings of those items, or as implicit tastes/interests information acquired from the user's behaviour or utilisation of the system. Once the user profile is created, it is somehow compared against the items stored in the system, and those items which are most appropriate are recommended. Depending on the algorithm implemented to choose the most appropriate items, we shall distinguish several categorisations for the recommender systems, and identify different subsets of items that can be retrieved.

Figure 1 General process followed by a recommender system

In this scenario, the main difficulty lies in that the utility function $g$ is usually not defined in the entire $U \times I$ space, but only on some subset of it. In recommender systems, the utility function is defined only on the items that have been previously rated by the users, and it has to be extrapolated to the whole $U \times I$ space. Thus, based on the mechanism in which item ratings are estimated for different users, the following two main types of recommender systems can be distinguished: 1) content-based recommender systems, in which the user is recommended items similar to those he preferred in the past, and 2) collaborative filtering systems, in which the user is recommended items that people with similar tastes and preferences liked in the past. Due to the limitations of each of the above strategies, combinations of them have been investigated in the so-called hybrid recommender systems, empirically demonstrating their better effectiveness.
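The formulation above (a utility $g$ known only on part of $U \times I$, extrapolated to the rest, followed by an arg max per user) can be illustrated with a minimal Python sketch. It is not the model proposed in this paper: the estimator used to extrapolate $g$ (the mean rating of the item) is a deliberately naive stand-in, the names `item_mean_estimator` and `best_item` are hypothetical, and restricting the arg max to items the user has not rated yet is an assumption of the example.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional, Tuple

# Known part of the utility function: (user, item) -> rating.
Ratings = Dict[Tuple[Hashable, Hashable], float]


def item_mean_estimator(ratings: Ratings) -> Callable[[Hashable, Hashable], float]:
    """Build a naive, illustrative g(u, i): the known rating when it exists,
    otherwise the mean rating of the item (0.0 for items nobody has rated)."""
    sums: Dict[Hashable, float] = {}
    counts: Dict[Hashable, int] = {}
    for (_, item), rating in ratings.items():
        sums[item] = sums.get(item, 0.0) + rating
        counts[item] = counts.get(item, 0) + 1

    def g(user: Hashable, item: Hashable) -> float:
        if (user, item) in ratings:
            return ratings[(user, item)]
        if item in counts:
            return sums[item] / counts[item]
        return 0.0

    return g


def best_item(
    user: Hashable,
    items: Iterable[Hashable],
    ratings: Ratings,
    g: Callable[[Hashable, Hashable], float],
) -> Optional[Hashable]:
    """Return arg max of g(user, i) over the items the user has not rated yet
    (assumption of this sketch), or None when there is no such item."""
    candidates = [i for i in items if (user, i) not in ratings]
    if not candidates:
        return None
    return max(candidates, key=lambda i: g(user, i))


# Toy data, invented for the example.
ratings: Ratings = {
    ("ana", "m1"): 5, ("ana", "m2"): 2,
    ("ben", "m2"): 4, ("ben", "m3"): 5,
    ("eva", "m3"): 3, ("eva", "m4"): 1,
}
items = ["m1", "m2", "m3", "m4"]
g = item_mean_estimator(ratings)
print(best_item("ana", items, ratings, g))
```

Content-based, collaborative and hybrid systems differ precisely in how the estimator `g` is built; the arg max step stays the same.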
Nowadays, the interest in recommender systems is on the rise, and they constitute an integral part of a number of important websites like MovieLens [27], where movies are recommended, Amazon.com [34], where recommendations of books, CDs, and other products are made, or Google News Personalization [20], a system for recommending news. In all of them, the use of recommendation methods has been very successful. However, the current generation of recommender systems still requires further improvements to make the recommendation algorithms more effective and applicable to a broader range of real-world applications [1][10]. As we explain later, these improvements include, among others, the application of strategies that address situations in which few ratings are available over certain items, the use of recommendations with greater flexibility and interpretability for the users, and the study of more scalable algorithms that make it possible to produce recommendations not only for a single user, but also for a group of people with similar tastes and interests.

The strategies to confront the previous and other aspects are currently open research issues in the field. Here, we propose the use of Semantic Web technologies to address some of them. Specifically, we present a hybrid recommendation model based on ontologies which offers a novel contribution to the scientific community that works on recommender systems. The opportunity to add meta-
information to the descriptions of the recommended items and the preferences of the users, together with the capability of inferring knowledge from the relations existent in the used domain ontologies are the key aspects of the presented proposal.

Before introducing our recommendation model and its benefits, we briefly describe the characteristics of content-based, collaborative filtering and hybrid recommender systems, and explain their main current limitations.

2.1. Content-based recommender systems

Content-based approaches to recommendation making [7][8][31][33][43][46] build on the conjecture that a person likes items with features similar to those of other items he or she liked in the past [54]. Thus, the utility gain function $g(u_m, i_n)$ of item $i_n \in I$ for user $u_m \in U$ is estimated based on the utilities $g(u_m, i_l)$ assigned by user $u_m$ to items $i_l$ that are "similar" to item $i_n$. For instance, in order to suggest movies to user $u_m$, a content-based recommender system would try to understand the commonalities among movies user $u_m$ has previously evaluated positively: specific genres, preferred actors and directors, etc.

In content-based recommender systems, items are suggested according to a comparison between their descriptions and the user profiles, which contain information about the users' tastes, interests, and needs. Data structures for both of these components are created using features extracted from the content of the items. A weighting scheme is often used for providing high weights to the most discriminating features and preferences, and low weights to the less informative ones.

Figure 2 shows the general process followed by a content-based recommender system. Firstly, the user's preferences are established according to the content features of those items preferred/selected by her. The preferences existing in this profile are compared against the features of the items stored in the system choice set, and the items whose features are most similar to the user's content-based preferences are finally retrieved. Note that in this scenario only the items that share content-based features with the user profiles can be suggested, reducing drastically the set of items that might be recommended to each individual user.

Figure 2 Content-based recommendations

More formally, and following the notation used in [1], let $Content(i_n)$ be the content description of item $i_n \in I$, i.e., the set of content features characterising $i_n$ that are used to determine the appropriateness of the item for the different users. This description is usually represented as a vector of real numbers (weights), in which each component measures the "importance" (or "informativeness") of the corresponding feature in the item content description:

$$Content(i_n) = \mathbf{i}_n = (i_{n,1}, i_{n,2}, \ldots, i_{n,K}) \in \mathbb{R}^K$$

Since content-based recommender systems were designed mostly to recommend textual items, the contents of the items are usually described with keywords. Hence, for example, the content-based component of the Fab system [5] represents web page contents in terms of the 128 most informative words.

Analogously, let $ContentBasedUserProfile(u_m)$ be the content-based preferences of user $u_m \in U$, i.e., the weighted item content features that describe the tastes, interests and needs of the user:

$$ContentBasedUserProfile(u_m) = \mathbf{u}_m = (u_{m,1}, u_{m,2}, \ldots, u_{m,K}) \in \mathbb{R}^K$$

The utility gain of item $i_n$ for user $u_m$ is then calculated with a score function that combines the different item description and user profile components:

$$g(u_m, i_n) = score(ContentBasedUserProfile(u_m), Content(i_n)) \in R$$

Different content-based recommendation approaches have been proposed in the literature to formulate the previous expression.
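As one possible instance of the score function above, the following Python sketch uses the cosine similarity between the user profile vector and the item content vector, a heuristic borrowed from information retrieval. It is only an illustration under stated assumptions: the function name `cosine_score`, the three features (weights for comedy, drama and science fiction) and all weights are invented for the example and do not come from the paper.

```python
import math
from typing import Dict, List, Sequence


def cosine_score(profile: Sequence[float], content: Sequence[float]) -> float:
    """Cosine of the angle between a user profile and an item description,
    both given as K-dimensional weight vectors. Returns 0.0 if either
    vector is all zeros, since the cosine is undefined in that case."""
    if len(profile) != len(content):
        raise ValueError("profile and content must have the same length K")
    dot = sum(u * i for u, i in zip(profile, content))
    norm_u = math.sqrt(sum(u * u for u in profile))
    norm_i = math.sqrt(sum(i * i for i in content))
    if norm_u == 0.0 or norm_i == 0.0:
        return 0.0
    return dot / (norm_u * norm_i)


# Hypothetical K = 3 feature space: (comedy, drama, science fiction).
user_profile: List[float] = [0.9, 0.1, 0.0]
item_contents: Dict[str, List[float]] = {
    "film_a": [1.0, 0.0, 0.0],  # pure comedy
    "film_b": [0.0, 1.0, 0.0],  # pure drama
    "film_c": [0.0, 0.0, 1.0],  # pure science fiction
}

# Rank items by decreasing utility g(u_m, i_n) = cosine_score(u_m, i_n).
ranked = sorted(
    item_contents,
    key=lambda name: cosine_score(user_profile, item_contents[name]),
    reverse=True,
)
print(ranked)
```

Note that `film_c` shares no feature with the profile and therefore scores zero, which is the behaviour described above: only items sharing content features with the user profile can be suggested.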
Basically, these techniques are classified into heuristic-based and model-based approaches. The former calculate utility predictions based on heuristic formulas that are inspired mostly by information retrieval methods, such as the cosine similarity measure. The latter, on the other hand, obtain utility predictions based on a model learned from the underlying data using statistical learning and machine learning models, such as Bayesian classifiers, clustering algorithms, decision trees, and artificial neural networks.

For both types of techniques, several limitations have been identified in the literature [1][5][10]. We describe some of them next.

• Restricted content analysis. Content-based recommendations are restricted by the features that are explicitly associated with the items to be recommended. For example, content-based movie recommendations can only be based on written materials about a movie: actors' names, plot summaries, cinematographic genres, etc. The effectiveness of these techniques thus depends on the descriptive data available. Therefore, in order to have a sufficient set of features, the content should either be in a form that can be automatically parsed by a computer or in a form in which the features can be manually extracted in an easy way. In many cases, these situations are very difficult to achieve. There
are some domains that have an inherent problem with automatic feature extraction, and it is often not practical to assign features by hand due to limited resources. For instance, it is much harder to apply automatic feature extraction methods to multimedia data, e.g., graphical images, video streams, and audio recordings.
• Content overspecialisation. Content-based recommender systems only retrieve items that score highly against a specific user profile. They cannot recommend items that are different from anything the user has seen before. Thus, for example, a person with no experience in Spanish cuisine would never receive recommendations for even the best Spanish restaurant in town.
• Cold-start: new user problem. A user has to rate a sufficient number of items before a content-based recommender system can really understand her preferences and present her with reliable recommendations. A new user with no or very few ratings may not receive any accurate recommendations.
• Portfolio effect: non-diversity problem. In certain cases, items should not be recommended if they are too similar to something the user has already seen. For example, it is not necessarily a good idea to recommend all movies by Antonio Banderas to a user who liked one of them in the past, and it may not be appropriate to recommend several news articles describing the same event.

2.2. Collaborative filtering recommender systems

Collaborative filtering (CF) techniques [22][30][47][48][51][53] match people with similar preferences in order to make recommendations. Unlike content-based methods, collaborative recommender systems try to predict the utility of items for a particular user according to the items previously evaluated by other users. In other words, the utility gain function $g(u_m, i_n)$ of item $i_n \in I$ for user $u_m \in U$ is estimated based on the utilities $g(u_l, i_n)$ assigned to item $i_n$ by those users $u_l$ that are “similar” to user $u_m$.
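One common way to make this estimate concrete, given here only as an illustrative assumption and not necessarily the formulation adopted in this work, is a similarity-weighted average over a neighbourhood of similar users. The neighbourhood symbol $\hat{U}_m$ and the similarity function $\mathrm{sim}$ are introduced for this sketch and do not appear in the original text:

```latex
g(u_m, i_n) \approx
  \frac{\sum_{u_l \in \hat{U}_m} \mathrm{sim}(u_m, u_l)\, g(u_l, i_n)}
       {\sum_{u_l \in \hat{U}_m} \left| \mathrm{sim}(u_m, u_l) \right|}
```

Under this reading, users that are more similar to $u_m$ contribute more to the predicted utility, and users outside $\hat{U}_m$ contribute nothing.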
The great power of CF approaches relative to content-based ones is their “outside the box” recommendation ability [10], i.e., the chance of recommending items that do not share content features expressed in the user profiles. For example, it may be that listeners who enjoy free jazz also enjoy avant-garde classical music, but a content-based recommender trained on the preferences of a free jazz aficionado would not be able to suggest items in the classical music realm, since none of the features (performers, instruments, repertoires) associated with items in the different categories would be shared. Only by looking outside the preferences of the individual can such suggestions be made.

In CF systems, the users express their preferences by rating items. The ratings submitted by a user are thus used as an approximate representation of her tastes, interests and needs in the domain of application. These ratings are matched against ratings submitted by all other users, obtaining the user’s set of “nearest neighbours”. The items that were rated highly by the user’s nearest neighbours and were not rated by the user will finally be recommended. The way in which the user’s “neighbours” are determined, and the strategy followed to combine the ratings of such users, differentiate the existing CF approaches.

With the above ideas, the definitions of the user profile and the item description given for content-based recommender systems differ from those associated with CF recommender systems. Specifically, let

$$\mathit{CollaborativeUserProfile}(u_m) = \mathbf{r}_m = (r_{m,1}, r_{m,2}, \ldots, r_{m,N}) \in \mathbb{R}^N$$

be the collaborative profile of user $u_m$, constituted by the set of ratings provided by the user to the $N$ items stored in the system, and let

$$\mathit{Ratings}(i_n) = \mathbf{r}_n = (r_{1,n}, r_{2,n}, \ldots, r_{M,n}) \in \mathbb{R}^M$$

be the set of ratings $r_{m,n} \in \mathbb{R}$ assigned to item $i_n$ by the $M$ users registered in the system.
In both of the above definitions, if user $u_m$ has not rated item $i_n$, then $r_{m,n} = \emptyset$. The utility gain of item $i_n$ for user $u_m$ is then computed by a score function that combines the different user profile and item description components:

$$g(u_m, i_n) = \mathit{score}(\mathit{CollaborativeUserProfile}(u_m), \mathit{Ratings}(i_n)) \in \mathbb{R}$$

The different formulations given for the previous expression [9][48][51][53] have led to two main categories of CF techniques: user-based and item-based approaches.

User-based CF approaches compare the active user’s ratings with those of other users to identify a group of similar people. The highest rated items of that group will be recommended to that user.

Figure 3 User-based collaborative filtering recommendations. The items preferred by the most similar users are recommended to the active user.

Item-based CF approaches, on the other hand, take each item of the active user’s list of rated items, and recommend other items that seem to be similar to that item according to other users’ ratings.
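The user-based scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy rating matrix, the choice of cosine similarity over co-rated items, the similarity-weighted average used as the score function, and all function names are hypothetical. Unrated items (the empty rating) are represented by `None`.

```python
import math

# Hypothetical toy rating matrix: one row per user u_m, one column per item i_n.
# None stands for the empty rating of an item the user has not rated.
RATINGS = {
    "alice": {"item1": 5, "item2": 3, "item3": 4, "item4": None},
    "bob":   {"item1": 4, "item2": 2, "item3": 5, "item4": 4},
    "carol": {"item1": 1, "item2": 5, "item3": 2, "item4": 1},
}


def collaborative_user_profile(user):
    """Row of the matrix: the ratings given by one user to all items."""
    return RATINGS[user]


def ratings(item):
    """Column of the matrix: the ratings given to one item by all users."""
    return {user: profile[item] for user, profile in RATINGS.items()}


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity restricted to the items rated by both users."""
    common = [i for i in profile_a
              if profile_a[i] is not None and profile_b.get(i) is not None]
    if not common:
        return 0.0
    dot = sum(profile_a[i] * profile_b[i] for i in common)
    norm_a = math.sqrt(sum(profile_a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(profile_b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def score(user, item, k=2):
    """Estimate g(u_m, i_n) from the k most similar users who rated the item."""
    profile = collaborative_user_profile(user)
    neighbours = sorted(
        ((cosine_similarity(profile, collaborative_user_profile(other)), rating)
         for other, rating in ratings(item).items()
         if other != user and rating is not None),
        reverse=True,
    )[:k]
    total_similarity = sum(sim for sim, _ in neighbours)
    if total_similarity == 0:
        return None  # no usable neighbour: the item cannot be scored
    return sum(sim * rating for sim, rating in neighbours) / total_similarity


def recommend(user, k=2):
    """Rank the items the user has not rated by their estimated utility."""
    unrated = [i for i, r in collaborative_user_profile(user).items() if r is None]
    scored = [(score(user, i, k), i) for i in unrated]
    return [i for s, i in sorted((p for p in scored if p[0] is not None), reverse=True)]


print(score("alice", "item4"))
print(recommend("alice"))
```

In this toy data, alice's co-rated items are closer to bob's than to carol's, so the estimate for `item4` is pulled toward bob's rating. An item-based variant would compare the columns returned by `ratings` instead of the user rows.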
Figure 4 Item-based collaborative filtering recommendations. The items which have been most similarly evaluated are recommended to the active user.

Pure collaborative filtering recommender systems address some of the weaknesses of content-based approaches. Since collaborative strategies make use of other users’ recommendations (ratings), they can deal with any kind of content and recommend any item, even the ones that are dissimilar to those seen in the past. However, collaborative techniques suffer from their own limitations [1][5][10], as described next.
• Sparse rating problem. In collaborative filtering systems, the number of ratings already obtained is usually very small compared to the number of ratings that need to be predicted. In practice, many commercial systems, such as Amazon.com, which recommends books, or CDNow.com, which recommends music albums, are used to evaluate very large datasets where even active users may have rated well under 1% of the existing items [50]. The success of collaborative filtering recommendations depends on the availability of a critical mass of users. They are based on the overlap in ratings across users and have difficulties when the space of ratings is sparse, i.e., when few users have rated the same items. There might be many items that have been rated by only a few people, and these items would be recommended very rarely, even if those few users gave high ratings to them. Moreover, if the set of items changes too rapidly, old ratings will be of little value to new users, who will not be able to have their ratings compared to those of the existing users.
• Cold-start: new user problem. Collaborative filtering strategies learn the users’ preferences only from the ratings they have given. When a new user utilises the system, no personal ratings are available for her, and no proper recommendations can be made.
Because recommendations follow from a comparison between the target user and other users, based solely on the accumulation of ratings, if few ratings are available it might become very difficult to categorise the user’s interests.
• Cold-start: new item problem. Collaborative filtering recommender systems rely only on users’ preferences to make recommendations, and do not make use of content information about the existing items. Until a new item is rated by a substantial number of users, the recommender system is not able to recommend it, so a recent item that has not obtained many ratings cannot be easily recommended. This problem shows up in domains such as news articles, where there is a constant stream of new items and each user only rates a few [52].
• Gray sheep problem. For a user whose tastes are unusual compared to the rest of the population, there will not be any other users who are particularly similar, leading to poor recommendations. Collaborative recommenders work best for a user who fits into a cluster with many neighbours of similar tastes. However, the techniques do not work well for the so-called “Gray sheep”, those people who fall on a border between two cliques of users. This is also a problem for demographic systems that attempt to categorise users according to personal characteristics.
• Portfolio effect: non-diversity problem. Since collaborative filtering systems’ knowledge about content is purely derived from user choices, recommendations are strongly biased toward what has been chosen (or recommended) in the past, resulting in frequent recommendations of just the most popular items. This could make collaborative filtering a poor discovery tool for the end user, often failing to produce an interesting diversity of recommended content.

2.3. Hybrid recommender systems

Hybrid recommender systems [3][5][17][25][30][38][45][55] combine content-based and collaborative filtering techniques under a single framework, mitigating inherent limitations of either paradigm. Thus, hybrid recommendations are generated taking into account both descriptive features and ratings.

Numerous ways of combining content-based and collaborative filtering information are conceivable [1][10]. Among them, the most widely adopted is the so-called “collaborative via content” paradigm [45], where content-based profiles are built to detect similarities among users. Based on the taxonomy of hybridization methods given in [10], hybrid recommender systems can be classified as follows:
• Weighted hybrid recommenders. These systems suggest items with scores that are obtained from the results of all their individual recommendation techniques. Those results are usually merged by linear combination or vote consensus schemes. The benefit