Semantic Reasoning: A Path to New Possibilities of Personalization*

Yolanda Blanco-Fernández, José J. Pazos-Arias, Alberto Gil-Solla, Manuel Ramos-Cabrer, and Martín López-Nores

Department of Telematics Engineering, University of Vigo, 36310, Spain
{yolanda,jose,agil,mramos,mlnores}@det.uvigo.es

* Work funded by the Ministerio de Educación y Ciencia (Gobierno de España) research project TSI2007-61599, by the Consellería de Educación e Ordenación Universitaria (Xunta de Galicia) incentives file 2007/000016-0, and by the Programa de Promoción Xeral da Investigación de la Consellería de Innovación, Industria e Comercio (Xunta de Galicia) PGIDIT05PXIC32204PN.

S. Bechhofer et al. (Eds.): ESWC 2008, LNCS 5021, pp. 720–735, 2008. © Springer-Verlag Berlin Heidelberg 2008

Abstract. Recommender systems address the current information overload by automatically selecting items that match the personal preferences of each user. The so-called content-based recommenders suggest items similar to those the user liked in the past, by resorting to syntactic matching mechanisms. The rigid nature of such mechanisms leads them to recommend only items that bear a strong resemblance to those the user already knows. In this paper, we propose a novel content-based strategy that diversifies the offered recommendations by employing reasoning mechanisms borrowed from the Semantic Web. These mechanisms discover extra knowledge about the user’s preferences, thus favoring more accurate and flexible personalization processes. Our approach is generic enough to be used in a wide variety of personalization applications and services, in diverse domains and recommender systems. The proposed reasoning-based strategy has been empirically evaluated with a set of real users. The results obtained show computational feasibility and significant increases in recommendation accuracy w.r.t. existing approaches where our reasoning capabilities are disregarded.

1 Introduction

Recommender systems provide personalized advice to users about items or services they might be interested in. Currently, these tools are gaining momentum in the Digital Revolution, helping people efficiently manage content overload and reducing complexity when searching for relevant information.

To fulfill these personalization needs, three main components are required in a recommender system: (i) a database where the available items are stored, (ii) personal profiles where the users’ preferences are modeled, and (iii) recommendation strategies aimed at selecting personalized suggestions for each individual. The first such strategy was the so-called content-based filtering, which suggests to a user items similar to those he/she liked in the past. In spite of its accuracy, this technique is limited by the similarity metrics it employs. These metrics are based on rigid syntactic approaches
that only detect similarity between items that share all or some of their attributes [1]. Consequently, traditional content-based approaches lead to overspecialized suggestions including only items that bear a strong resemblance to those the user already knows (i.e. items with attributes defined in his/her profile).

To fight overspecialization, researchers devised a new strategy named collaborative filtering, based on offering to each user items that were appealing to others with similar preferences (named neighbors). Collaborative filtering reduces the effects of overspecialization by considering other users’ interests, but it also introduces new limitations, such as scalability problems, difficulties in selecting each user’s neighborhood when the available preferences are sparse (commonly named the sparsity problem), and privacy concerns related to the confidentiality of the users’ personal data (see [1] for details).

Bearing in mind the severe drawbacks of the collaborative solutions, we propose a novel content-based strategy that exploits the main strengths of this personalization paradigm and overcomes the overspecialized nature of its recommendations. For that purpose, our strategy diversifies the offered suggestions without resorting to other users’ preferences, thus protecting their privacy. Specifically, we fight the syntactic limitations of the existing content-based approaches by employing two reasoning techniques borrowed from the Semantic Web field: the so-called semantic associations [3] and Spreading Activation techniques (henceforth, SA techniques) [7]. Instead of using the traditional syntactic similarity metrics, these associations trace semantic bonds between the user’s preferences and the items available in the recommender system, which are previously formalized in a domain ontology along with their semantic annotations. Next, SA techniques efficiently explore these semantic relationships and discover new knowledge related to the users’ interests. This knowledge permits our strategy to compare the user’s preferences with the available items in a more flexible way, thus offering more accurate recommendations.

Although the adopted reasoning mechanisms have been widely used in the Semantic Web [3,14,15], their internals must be adapted to fulfill the personalization requirements of a recommender system. So, these mechanisms must make it possible to: (i) automatically learn new knowledge about the users’ preferences from their feedback, and (ii) dynamically adapt the strategy as these preferences evolve.

In spite of the generality of our reasoning-based approach, in this paper we have adopted a specific context with the goal of describing in detail its use in a domain where the information overload is noticeable. Specifically, we have exploited the reasoning capabilities of our content-based strategy in order to enhance the recommendations offered to viewers of Interactive Digital TV (IDTV). Today, TV viewers are exposed to overwhelming amounts of information, and challenged by the plethora of interactive functionality provided by the current digital receivers. As there are hundreds of channels with an abundance of programs available, it is likely that appealing TV programs go unnoticed. To assist these viewers, it is possible to take advantage of the personalization capabilities provided by a TV recommender system, which sifts through the myriad of programs available in the digital stream and selects those that match the viewers’ preferences by using our reasoning-based strategy.

This paper is organized as follows: Sect. 2 describes the two key elements in our reasoning framework: (i) the ontology where the domain knowledge is formalized, including the available TV programs and their semantic descriptions, and (ii) the user
modeling approach employed to create the users’ profiles. Next, Sect. 3 describes how the semantic associations and SA techniques are exploited in our content-based strategy. Then, an example where a set of TV programs is suggested to a given viewer is presented in Sect. 4. The tests carried out to validate our reasoning-based approach are explained in detail in Sect. 5. Finally, Sect. 6 draws some conclusions and points out possible lines of further work.

2 Domain Ontology and User Modeling

2.1 The Domain Ontology

Two elements are needed to formalize the IDTV domain by an ontology: (i) the semantic descriptions of the TV programs that can be suggested, and (ii) a language expressive enough to represent the concepts (i.e. classes and their instances) and relationships (i.e. hierarchical links and properties) identified in the domain. In our approach, the semantic descriptions have been extracted from TV-Anytime metadata specifications [6], whereas the OWL (DL) language has been selected due to its expressive capability, which allows formalizing concepts and expressions not supported in RDF and RDFS.

Starting from TV-Anytime metadata, we have defined and included in our OWL ontology several hierarchies of classes and properties, as well as specific instances of them, as shown in the TV ontology depicted in Fig. 1. The considered TV programs (identified by unique IDs) have been automatically extracted from the Internet Movie DataBase (IMDB) and the BBC web server¹, and are represented as specific instances belonging to a hierarchy of genres organized in several levels (e.g. fiction, leisure, romance, etc.), as shown at the top of Fig. 1. The main attributes of these programs (e.g. involved credits, topics and places, intended audience, intention, etc.) are also instances related to them by labeled properties. These attributes also belong to hierarchically organized classes.
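The structure just described (a genre hierarchy, program instances, and attribute instances linked by labeled properties) can be illustrated with a minimal Python sketch. It does not reproduce the OWL ontology of the paper: it uses plain dictionaries and triples, the program, genre and property names are borrowed from Fig. 1, and the specific links between them, as well as the helper functions `ancestors`, `attributes` and `programs_sharing`, are assumptions made only for illustration.

```python
from collections import defaultdict

# Genre hierarchy as child -> parent links (the role played by subclass
# links in the ontology). Names come from Fig. 1; the exact parent of
# each genre is assumed here.
SUBCLASS_OF = {
    "Fiction Contents": "TV Contents",
    "Non Fiction Contents": "TV Contents",
    "Leisure Contents": "TV Contents",
    "Drama": "Fiction Contents",
    "Action": "Fiction Contents",
    "Tourism": "Leisure Contents",
}

# Program instance -> genre class (assumed classification).
INSTANCE_OF = {
    "Jerry Maguire": "Drama",
    "The Last Samurai": "Action",
    "Welcome to Tokyo": "Tourism",
}

# Labeled properties as (subject, property, object) triples (assumed links).
TRIPLES = [
    ("Jerry Maguire", "HasActor", "Tom Cruise"),
    ("Jerry Maguire", "HasDirector", "Cameron Crowe"),
    ("The Last Samurai", "HasActor", "Tom Cruise"),
    ("The Last Samurai", "HasPlace", "Tokyo"),
    ("Welcome to Tokyo", "HasPlace", "Tokyo"),
]


def ancestors(cls):
    """Return the superclasses of cls, from nearest to most general."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def attributes(program):
    """Group the attribute instances of a program by property label."""
    grouped = defaultdict(list)
    for subject, prop, obj in TRIPLES:
        if subject == program:
            grouped[prop].append(obj)
    return dict(grouped)


def programs_sharing(value):
    """Return, in sorted order, the programs linked to an attribute instance."""
    return sorted({s for s, _, o in TRIPLES if o == value})


if __name__ == "__main__":
    print(ancestors(INSTANCE_OF["Jerry Maguire"]))
    print(attributes("Jerry Maguire"))
    print(programs_sharing("Tokyo"))
```

In this toy graph, a fiction film and a tourism program are connected through the shared place instance even though they belong to different genre branches, which is the kind of link that purely attribute-matching metrics within a single genre would not surface.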
As some of these classes are already defined in existing conceptualizations, we have imported ontologies about different domains such as sports, geographical information, and credits involved in TV programs (e.g. actors), among others².

¹ See http://www.imdb.com and http://backstage.bbc.co.uk/data/7DayListingData for details.
² These ontologies were extracted from the DAML repository located at www.daml.org/ontologies and converted to the OWL language by means of a tool developed by the MINDSWAP Research Group (see http://www.mindswap.org/2002/owl.shtml for details).

2.2 Our User Modeling Approach

Our approach models the users’ profiles by reusing the knowledge available in the domain ontology, which is why we call them ontology-profiles. Specifically, we propose a semantic model for each user that gives information about: (i) the TV programs that were appealing or uninteresting for him/her (named positive and negative preferences, respectively), (ii) their main attributes, and (iii) the genres under which these programs are classified in the TV ontology (see the top of Fig. 1). This user modeling approach provides a formal representation of the users’ preferences, making it possible to reason about them and discover additional knowledge about their interests. Such knowledge makes it possible
to compare, in a more effective way, the users’ preferences with the available items, thus leading to personalization processes that are more accurate than the traditional syntactic approaches [1]. In this regard, note that our ontology-profiles greatly improve on other approaches based on flat lists, which are not well structured to favor the discovery of new knowledge (see [4] for details).

[Figure: graph of the TV ontology showing genre classes (TV Contents, Fiction Contents, Non Fiction Contents, Leisure Contents, Drama, Romance, Action, History, Tourism, ...), program instances ID1 to ID11 (e.g. Jerry Maguire, Vanilla Sky, The Last Samurai, Welcome to Tokyo), attribute instances (e.g. Tom Cruise, Tokyo, Vietnam War), and properties such as HasActor, HasDirector, HasTopic and HasPlace.]
Fig. 1. Excerpt from classes, properties and instances in our TV ontology

Fulfilling the goals of our personalization strategy requires identifying the interest of the user in both the TV programs defined in his/her profile and their attributes and genres. Specifically, these Degrees Of Interest (named DOI indexes and belonging to [-1,1]) can be explicitly stated by the user or automatically inferred from his/her viewing behavior (e.g. programs accepted or rejected after recommendations, viewing time for each suggested program, etc.). Once the DOI indexes of each program in the user’s profile have been established, we compute the indexes corresponding to their attributes and to
the genres under which these programs are classified in the ontology. This computation mechanism, omitted here due to space limitations, is explained in detail in [5].

Although other ontology-based proposals have been devised in the literature, our user modeling approach differs to a great extent from these existing works. As an example, note the Quickstep system proposed by Middleton in [10], which suggests research papers according to the users’ interests. The main difference between our work and Quickstep is related to the knowledge used for modeling purposes. In fact, Quickstep uses a simple taxonomy of research categories for representing the papers each user appreciates, whereas our proposal exploits the whole knowledge formalized in the ontology, making it possible to carry out reasoning processes that discover extra information about users’ preferences. The same limitation can be identified in the system proposed in [16], which recommends books according to the user preferences. There, knowledge discovery is based on analyzing only hierarchical relationships, thus hampering more complex inference processes such as those pursued in our work.

3 Our Reasoning-Based Strategy

Our personalization strategy suggests programs that are semantically associated with the contents the viewer has liked in the past, improving on the syntactic similarity metrics adopted in the traditional content-based methods. Specifically, our strategy consists of two stages, named filtering phase and recommendation phase, respectively, which are sketched next and fully described in Sect. 3.1 and Sect. 3.2.

– Filtering phase. This stage selects in the OWL ontology instances of classes and properties that are relevant for the user, by considering his/her personal preferences. Next, our reasoning-based approach infers semantic associations among the selected entities identifying specific TV programs. These hidden associations, which we borrow from [3], are discovered from the hierarchical links and properties defined in the domain ontology.
– Recommendation phase. The inferred knowledge is processed in the second phase by employing SA techniques. This intelligent mechanism works as a concept explorer, as it detects concepts that are closely related to the user’s preferences by exploring the entities and semantic associations inferred during the filtering phase.

3.1 Filtering Phase

Firstly, our strategy locates in the domain ontology the programs that were (un)appealing to the user (defined in his/her profile). Next, it successively traverses the properties bound to these programs until reaching new class instances (nodes referring to programs, actors, topics, etc.) in the ontology. To guarantee computational feasibility, we have developed a controlled inference mechanism that works as follows. As new nodes are reached from a given instance, our approach first quantifies their relevance for the user. Then, the nodes whose relevance indexes are lower than a specific threshold (whose value depends on both the domain ontology and the recommender system that adopts our content-based strategy; in our tests in the DTV field, we have used values around 0.65) are ignored, in such a