A flexible semantic inference methodology to reason about user preferences in knowledge-based recommender systems

Yolanda Blanco-Fernández a,*, José J. Pazos-Arias a, Alberto Gil-Solla a, Manuel Ramos-Cabrer a, Martín López-Nores a, Jorge García-Duque a, Ana Fernández-Vilas a, Rebeca P. Díaz-Redondo a, Jesús Bermejo-Muñoz b

a ETSE Telecomunicación, Campus Universitario, Vigo 36310, Spain
b Telvent, Ronda de Tamargillo, 29, Sevilla 41006, Spain

Received 22 December 2006; received in revised form 18 June 2007; accepted 28 July 2007
Available online 9 August 2007

Abstract

Recommender systems arose with the goal of helping users search in overloaded information domains (like e-commerce, e-learning or Digital TV). These tools automatically select items (commercial products, educational courses, TV programs, etc.) that may be appealing to each user, taking into account his/her personal preferences. The personalization strategies used to compare these preferences with the available items suffer from well-known deficiencies that reduce the quality of the recommendations. Most of the limitations arise from using syntactic matching techniques, because they miss a lot of useful knowledge during the recommendation process. In this paper, we propose a personalization strategy that overcomes these drawbacks by applying inference techniques borrowed from the Semantic Web. Our approach reasons about the semantics of items and user preferences to discover complex associations between them. These semantic associations provide additional knowledge about the user preferences, and permit the recommender system to compare them with the available items in a more effective way. The proposed strategy is flexible enough to be applied in many recommender systems, regardless of their application domain. Here, we illustrate its use in AVATAR, a tool that selects appealing audiovisual programs from among the myriad available in Digital TV.

© 2007 Elsevier B.V. All rights reserved.
Keywords: Recommender systems; Semantic Web; Ontologies; Semantic reasoning; Content-based filtering

Knowledge-Based Systems 21 (2008) 305–320. doi:10.1016/j.knosys.2007.07.004

Work supported by the Spanish Ministry of Education and Science Projects TSI2004-03677 and TSI2007-61599, and by the Xunta de Galicia project PGIDIT05PXI32204PN.

* Corresponding author. Tel.: +34 986813967; fax: +34 986812100.
E-mail addresses: yolanda@det.uvigo.es (Y. Blanco-Fernández), jose@det.uvigo.es (J.J. Pazos-Arias), agil@det.uvigo.es (A. Gil-Solla), mramos@det.uvigo.es (M. Ramos-Cabrer), mlnores@det.uvigo.es (M. López-Nores), jgd@det.uvigo.es (J. García-Duque), avilas@det.uvigo.es (A. Fernández-Vilas), rebeca@det.uvigo.es (R.P. Díaz-Redondo), jesus.bermejo@telvent.abengoa.com (J. Bermejo-Muñoz).

1. Introduction

In recent years, we have witnessed an exponential growth in the amount of information available in diverse domains (like e-commerce, e-learning or Digital TV). This overload is hard for users to digest, and they have difficulty finding really relevant items (e.g. commercial products, educational courses, TV programs). Recommender systems arose in the mid-1990s to provide assistance in these searching tasks; to this aim, they automatically select those items that may be appealing to each user considering the preferences defined in his/her personal profile.

The first recommendation strategies were the so-called content-based methods [2,44]. These methods suggest items similar to the ones that each user liked in the past. The similarity metrics employed limit the quality of the offered recommendations, because they are based on more or less sophisticated matching techniques that miss a lot of knowledge during the personalization process. In fact, due to their syntactic nature, the existing metrics only detect
similarity between items that share the same attributes or features. This causes overspecialized recommendations that only include items very similar to those the user already knows.

Instead of fighting the overspecialization of content-based methods, researchers proposed new personalization strategies, such as collaborative filtering [21,19,37] and hybrid approaches mixing both techniques [4,9,40]. To diversify the offered recommendations, collaborative recommender systems suggest to each user items that were appealing to other users with similar tastes. Despite its success in many application domains, collaborative filtering has two serious drawbacks. On the one hand, it requires knowing many user profiles in order to elaborate accurate recommendations for a given user. On the other, its applicability and quality are limited by the so-called sparsity problem, which occurs when the available data are insufficient for identifying similar users¹ [9].

In order to develop an approach that provides high-quality recommendations (even when data are sparse), this paper rescues the content-based methods by developing effective mechanisms against overspecialization. Specifically, our approach overcomes this problem, which stems from the syntactic nature of the existing similarity metrics, by borrowing inference techniques from the Semantic Web. This initiative is based on describing the Web resources by metadata, so that computers can understand their semantics and infer relationships between them. Such a reasoning² process requires formalizing the semantic descriptions of the resources in a commonly agreed and reusable way; for that purpose, the Semantic Web community resorts to ontologies, i.e. conceptualizations that identify typical concepts and relationships in a specific application domain.
Concepts are identified by means of classes, and relationships are represented as properties (both hierarchically organized); in addition, specific instances of classes and properties are included in this formal knowledge base.³

Bearing in mind the results achieved in the Semantic Web, our proposal includes reasoning about the semantic descriptions of the items available in the recommender system (formalized in a domain ontology), and inferring implicit semantic relationships between them. Such relationships diversify the recommendations because they make it possible to establish correspondences between the user preferences and other items appealing to him/her that do not necessarily share the same features. In other words, our content-based strategy suggests items semantically related to those the user liked in the past, not just items similar to his/her preferences.

In order to discover as much knowledge about the user preferences as possible, our approach looks for relationships hidden in the system ontology. Even though such relationships are already defined in the Semantic Web community, they have never been included in a personalization environment like a recommender system. For that reason, we develop a new inference methodology to take advantage of the benefits of semantic reasoning in our recommendation strategy. The main features of this methodology are the following:

• Firstly, it explores the knowledge base extensively and discovers the hidden relationships from the entities (classes and their instances), properties and hierarchical links explicitly formalized in it. Such relationships provide the knowledge missed by the current approaches and make it possible to compare the user preferences with the available items in a more effective way, beyond the traditional syntactic similarity metrics.
• Secondly, our reasoning methodology ensures that the inferred relationships are relevant for the user, and that they adapt in a flexible way as his/her preferences evolve over time. Thereby, our content-based strategy guarantees the diversity of the recommendations without risking their personalized nature.
• Finally, our methodology is not tied to a specific domain, as it is flexible enough to be used in diverse personalization applications. In this paper, we illustrate its use in AVATAR, a tool that selects appealing audiovisual programs from among the myriad available in Digital TV.

This paper is organized as follows: Section 2 reviews the best-known matching techniques used in traditional content-based approaches, as well as the proposals defined in the Semantic Web for querying ontologies and inferring knowledge from them. Section 3 describes the two key elements of our semantic reasoning methodology, applied in the AVATAR system: the ontology about the TV domain and the user profiles that store their personal preferences. The semantic relationships adopted in our approach and the inference mechanism from the system knowledge base are detailed in Sections 4 and 5, respectively. Section 6 shows an example of the reasoning-based recommendations elaborated by AVATAR. Section 7 compares our proposal with other similar approaches defined in the literature. Finally, Section 8 draws some conclusions and describes our future work.

¹ As the amount of products available in the collaborative recommender system increases, it is less likely that two users rate the same products in their profiles. Since there is no overlap among their profiles, it is hard to find users with similar tastes.
² We take reasoning as a synonym for inference.
³ The terms ontology and knowledge base are used without distinction.

2. Background

2.1. Conventional matching techniques

As we mentioned in the previous section, content-based recommender systems resort to matching techniques to compare the user preferences to the available items. The
most widely used techniques nowadays fall into two categories:

(i) Techniques based on cosine similarity: This approach represents each available item and the user profile as two vectors of features/attributes (e.g. in the TV domain, genre, topic, actors, directors, ...), which have a set of possible values (<drama, action, ...>, <war, traveling, ...>, <Morgan Freeman, Tom Cruise, ...>, <Clint Eastwood, Alejandro Amenábar, ...>). The similarity between both vectors is computed as the cosine of the angle between them [24,4]. This approach only detects similarity when the considered item has exactly the same features defined in the user profile, thus preventing any semantic reasoning process.

(ii) Techniques based on automatic classifiers: Automatic classifiers are based on diverse machine learning methods, such as neural networks [7], decision trees [30], Bayesian networks [44], and association rules [39,5,24]. These are computational models that classify a given input in a specific category considering a predefined training set. In the TV domain, this set stores the user preferences (programs he/she liked in the past and their semantic attributes), the inputs are the features of a given TV program, and the output is a category that determines whether this program is appealing or unappealing for the user. For that purpose, the classifiers consider the occurrence patterns of the input features in the training set; thus, they only take into account the syntax of these attributes and disregard their meaning (i.e. their semantics).

The syntactic nature of the aforementioned techniques causes the overspecialization present in the current content-based recommender systems. Our strategy overcomes that limitation by considering the descriptions of the TV programs and inferring semantic relationships between them. These relationships provide the recommender system with additional knowledge about the user preferences and thus support more effective personalization processes.
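The limitation of technique (i) can be illustrated with a minimal sketch. It assumes binary feature weights and a sparse dictionary representation; the function name and the feature labels (e.g. `genre:drama`) are hypothetical choices for illustration, not taken from any system described here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse feature vectors (dicts)."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for empty vectors (an assumption)
    return dot / (norm_u * norm_v)

# Hypothetical user profile and two candidate programs.
profile = {"genre:drama": 1.0, "topic:war": 1.0, "actor:Tom Cruise": 1.0}
same_features = {"genre:drama": 1.0, "topic:war": 1.0, "actor:Tom Cruise": 1.0}
related_only = {"genre:action": 1.0, "topic:traveling": 1.0,
                "actor:Morgan Freeman": 1.0}

sim_same = cosine_similarity(profile, same_features)
sim_related = cosine_similarity(profile, related_only)
```

With these toy vectors, the program sharing every feature scores close to 1, while the program with no literal feature in common scores exactly 0, whatever semantic relationship may exist between its features and those in the profile.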
This knowledge makes it possible to find TV programs appealing to the user that do not have the same attributes defined in his/her profile, but rather other features semantically related to them.

In order to carry out this semantic reasoning process, our recommender system requires capabilities for querying the classes, properties and instances defined in a knowledge ontology, and also for inferring from them the aforementioned relationships. In the next section, we review some of the existing approaches that tackle these issues, and also identify the limitations that hinder their use in our reasoning methodology.

2.2. Queries and inferences from ontologies

In order to share the knowledge formalized in an ontology, a normalized format is required. Several ontology languages have been proposed in the Semantic Web community, such as RDF(S) [8], DAML [11], OIL [14], DAML + OIL [12] and OWL [22]. OWL is currently the most widely used solution because it supports certain features missing in other formats, like disjointness and Boolean combinations of classes, cardinality restrictions and specific characteristics of properties (transitive, unique, inverse, ...).

Many authors have proposed different approaches for querying ontologies conforming to these normalized formats. The focus has been put on RDF and RDFS, since these were the most widespread formats before OWL appeared.⁴ Despite the vast number of existing approaches (e.g. RQL [20], SquishQL [25], TRIPLE [38], RDQL [36]), none of these languages supports the inference of complex semantic relationships like those pursued in our approach. In fact, the query paradigm offered in these formats is: "Get all the instances related to A by means of the relationship R".
This paradigm is not appropriate for our reasoning for two main reasons: (i) it forces the user to know the underlying ontology in detail to specify correct queries, and (ii) it only considers relationships explicitly formalized in the knowledge base, thus missing others hidden in it.

To the best of our knowledge, the kind of relationships we pursue has only been considered in the context of the research project SemDis.⁵ This project proposes an approach for querying an RDF ontology and discovering complex semantic associations between two instances A and B specified by the user [1]. In order to infer such associations, the authors of [1] identify both instances in the knowledge base and explore (successively) the chains of properties that start from them. Traversing these property sequences, their approach reaches new instances implicitly related to those specified by the user. Several semantic associations are defined in [1] from these property sequences and from the relationships between them.

Our recommendation strategy adopts the associations defined in SemDis because they favor the discovery of hidden knowledge about the user preferences, and contribute to more effective matching processes. These associations and the relationships between the property sequences that originate them will be detailed in Section 4.

Although our approach borrows the semantic associations, we cannot reuse the mechanism defined in SemDis for discovering such associations in the RDF(S) ontology. This is because the SemDis query paradigm does not consider the personalization requirements present in a recommender system. Instead of a paradigm "Get all the semantic associations between A and B", our methodology requires a paradigm "Get the instances related to A and the semantic associations established between them".
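The difference between the two query paradigms can be sketched on a toy set of triples. This is only an illustration under stated assumptions: the triples, the function names and the breadth-first strategy with a `max_len` bound are hypothetical, and the sketch is neither the SemDis mechanism nor the controlled inference used in AVATAR.

```python
from collections import deque

# Toy (subject, property, object) triples; hypothetical data.
TRIPLES = [
    ("Cast Away", "HasActor", "Tom Hanks"),
    ("Tom Hanks", "ActorIn", "Saving Private Ryan"),
    ("Saving Private Ryan", "HasTopic", "World War II"),
]

def query_explicit(triples, a, r):
    """'Get all the instances related to A by means of the relationship R'."""
    return [o for s, p, o in triples if s == a and p == r]

def related_instances(triples, a, max_len=3):
    """Explore the chains of properties that start from A (breadth-first).

    Returns {instance: chain of properties that reaches it}, which
    approximates 'get the instances related to A and the associations
    established between them'.
    """
    outgoing = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((p, o))
    found = {}
    queue = deque([(a, [])])
    while queue:
        node, chain = queue.popleft()
        if len(chain) == max_len:
            continue  # do not extend chains beyond the bound
        for p, o in outgoing.get(node, []):
            if o != a and o not in found:
                found[o] = chain + [p]
                queue.append((o, chain + [p]))
    return found

explicit = query_explicit(TRIPLES, "Cast Away", "HasActor")
chains = related_instances(TRIPLES, "Cast Away")
```

Here the explicit query only returns the actor directly linked to the program, and requires the caller to know the property name in advance. The chain exploration also reaches the second movie and its topic, together with the property chain that connects each of them to the starting instance.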
In AVATAR, A refers to the TV programs the user liked in the past, and the retrieved instances identify the programs finally suggested. In order to guarantee computational viability, we use a controlled inference mechanism that explores the knowledge base by selecting instances significant to the user and omitting those which are totally irrelevant. Thanks to this mechanism, our methodology guarantees that: (i) the discovered associations find programs appealing to the user, and (ii) the reasoning process adapts as his/her preferences evolve. This way, our content-based strategy achieves a balance between the diversification and the personalization of the offered recommendations.

Before detailing the reasoning methodology applied in AVATAR, the next section describes: (i) the TV ontology used in this system and (ii) the profiles that store the viewers' preferences, modeled from the knowledge represented in the ontology.

⁴ The query languages defined for OWL are valid for RDF(S) as well.
⁵ Additional information about SemDis can be found in http://lsdis.cs.uga.edu/Projects/SemDis.

3. The reasoning framework

3.1. The TV ontology

Since AVATAR is a TV recommender system, our methodology requires an ontology that formalizes the concepts and relationships typical in the TV domain. This information has been extracted from TV-Anytime [41], a specification that provides detailed semantic descriptions about generic audiovisual programs.

Our TV ontology has been implemented in OWL. Specifically, it includes a set of classes (representing program genres, topics, credits, geographical and temporal information, etc.) and properties that establish relationships among them. Besides, our ontology defines hierarchical relationships among classes, and among properties. In fact, it contains several hierarchies defined from the TV-Anytime metadata: hierarchies of genres (action movies, nature documentaries, sports, etc.), hierarchies of topics (war, travel, etc.), geographical hierarchies (countries, cities, etc.), hierarchies of credits (actors, directors, hosts, etc.), etc.

In order to reason about specific TV programs and to infer semantic associations among them, it is necessary to add specific instances of the classes and properties defined in the OWL ontology. Specifically, the TV programs are represented as instances of the classes defined in the hierarchy of genres, and each one is given a unique reference (henceforth, ID). The semantic attributes of these programs (topics, geographical and temporal information, credits, etc.) are also defined as instances belonging to the remaining hierarchies of classes mentioned before. These characteristics are linked to each program by means of properties, as shown in the excerpt of the ontology represented in Fig. 1.⁶

The properties are the key elements of our semantic reasoning methodology. Specifically, our approach explores both these properties and the instances (and classes) joined by them, with the goal of uncovering meaningful semantic associations hidden in the knowledge base. To this aim, we adopt the notion of property sequence defined in the SemDis project.

⁶ Note that owl:ObjectProperty identifies a property between two nodes labeled with a name, whereas rdf:typeOf represents the relationship between a class and one of its instances. For simplicity, we omitted some classes and rdf:typeOf links in Fig. 1 (e.g. links between some specific actors and the class Starring Actors).

3.2. Property sequences

Let $\mathcal{C}$ and $\mathcal{P}$ be the respective sets of all classes and all properties defined in our ontology. Given a property $P \in \mathcal{P}$, its domain (denoted by domain(P)) limits the entities of $\mathcal{C}$ to which P can be applied, and its range (denoted by range(P)) indicates the entities of $\mathcal{C}$ that P may take as its value.

In [1] a property sequence PS is defined as a finite set of properties $[P_1, \ldots, P_N]$ that join several classes defined in the ontology. This can be formally expressed as follows:

\[ PS = \{ [P_1, \ldots, P_N] \mid P_j \in \mathcal{P}\ \ \forall\, 1 \le j \le N,\ \ \mathrm{range}(P_i) = \mathrm{domain}(P_{i+1})\ \ \forall\, 1 \le i < N \} \qquad (1) \]

Example 1. For instance, in Fig. 1, it is possible to identify the property sequence PS = [HASACTOR, ACTORIN, HASTOPIC] joining the classes Adventure Movies, Starring Actors, Drama Movies, and War Topics.

An instance of PS (denoted by ps)⁷ is defined as the set of properties that join specific instances of the classes contained in PS. We use $y \prec Y$ to represent that y is an instance of Y, Y being an entity (class or property) in the ontology. According to this notation, given $PS = [P_1, \ldots, P_N]$, we define its instance ps as follows:

\[ ps = \{ [p_1, \ldots, p_N] \mid p_j \prec P_j\ \ \forall\, 1 \le j \le N \} \qquad (2) \]

Example 2. In Fig. 1, it is also possible to identify an instance of that property sequence. In this case, the instance ps = [HasActor, ActorIn, HasTopic] links the nodes Cast Away, Tom Hanks, Saving Private Ryan, and World War II.

⁷ For simplicity, we use the term property sequence to refer both to the sequences and to their instances. The sequences are denoted by uppercase letters (i.e. PS), and their instances by lowercase ones (ps).

Our approach adopts from [1] the following definitions:

(1) The origin and the terminus of a sequence are the first and the last nodes contained in it, respectively.
(1.1) As $X \in \mathcal{C}$ can be the origin of several property sequences, we use $PS_X$ to identify a sequence originated in X
gested. In order to guarantee the computational viability, we use a controlled inference mechanism that explores the knowledge base by selecting instances significant to the user and omitting those which are totally irrelevant. Thanks to this mechanism, our methodology guarantees that: (i) the discovered associations find programs appealing to the user, and (ii) the reasoning process adapts as his/her preferences evolve. This way, our content-based strategy achieves a balance between the diversification and the personalization of the offered recommendations. Before detailing the reasoning methodology applied in AVATAR, the next section describes: (i) the TV ontology used in this system and (ii) the profiles that store the viewers’ preferences, modeled from the knowledge represented in the ontology. 3. The reasoning framework 3.1. The TV ontology Since AVATAR is a TV recommender system, our methodology requires an ontology that formalizes the concepts and relationships typical in the TV domain. This information has been extracted from TV-Anytime [41], a specification that provides detailed semantic descriptions about generic audiovisual programs. Our TV ontology has been implemented in OWL. Specifically, it includes a set of classes (representing program genres, topics, credits, geographical and temporal information, etc.) and properties that establish relationships among them. Besides, our ontology defines hierarchical relationships among classes, and among properties. In fact, it contains several hierarchies defined from the TV-Anytime metadata: hierarchies of genres (action movies, nature documentaries, sports, etc.), hierarchies of topics (war, travel, disasters, etc.), hierarchies of geographical information (countries, cities, etc.), hierarchies of credits (actors, directors, hosts, etc.), etc. 
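As a rough illustration of the class hierarchies described above, the following Python sketch stores a fragment of a genre hierarchy as child-to-parent links and walks it upwards. The leaf class names come from Fig. 1; the parent classes "Movies" and "Genres" and both helper functions are assumptions made only for this example, not part of the AVATAR ontology or of any OWL API.

```python
# Hypothetical fragment of the genre hierarchy (child -> parent).
# "Movies" and "Genres" are assumed parent classes, used only for illustration.
SUBCLASS_OF = {
    "Movies": "Genres",
    "Adventure Movies": "Movies",
    "Drama Movies": "Movies",
    "Romance Movies": "Movies",
}


def ancestors(cls):
    """Return the superclasses of cls, nearest first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_subclass_of(cls, candidate):
    """True if candidate is a (direct or indirect) superclass of cls."""
    return candidate in ancestors(cls)
```

A real deployment would read these links from the OWL file rather than hard-code them; the dictionary merely makes the hierarchical structure explicit.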
In order to reason about specific TV programs and to infer semantic associations among them, it is necessary to add specific instances of the classes and properties defined in the OWL ontology. Specifically, the TV programs are represented as instances of the classes defined in the hierarchy of genres, and each one is given a unique reference (henceforth, ID). The semantic attributes of these programs (topics, geographical and temporal information, credits, etc.) are also defined as instances belonging to the remaining hierarchies of classes mentioned before. These characteristics are linked to each program by means of properties, as shown in the excerpt of the ontology represented in Fig. 1 (see footnote 6).

6 Note that owl:ObjectProperty identifies a property between two nodes labeled with a name, whereas rdf:typeOf represents the relationship between a class and one of its instances. For simplicity, we omitted some classes and rdf:typeOf links in Fig. 1 (e.g. links between some specific actors and the class Starring Actors).

The properties are the key elements of our semantic reasoning methodology. Specifically, our approach explores both these properties and the instances (and classes) joined by them, with the goal of uncovering meaningful semantic associations hidden in the knowledge base. To this aim, we adopt the notion of property sequence defined in the SemDis project.

3.2. Property sequences

Let $\mathcal{C}$ and $\mathcal{P}$ be, respectively, the sets of all classes and all properties defined in our ontology. Given a property $P \in \mathcal{P}$, its domain (denoted by domain(P)) limits the entities of $\mathcal{C}$ to which P can be applied, and its range (denoted by range(P)) indicates the entities of $\mathcal{C}$ that P may take as its value.

• In [1], a property sequence PS is defined as a finite set of properties $[P_1, \ldots, P_N]$ that join several classes defined in the ontology. This can be formally expressed as follows:

$$PS = \{[P_1, \ldots, P_N] \;/\; P_j \in \mathcal{P} \;\; \forall\, 1 \le j \le N,\;\; \mathrm{range}(P_i) = \mathrm{domain}(P_{i+1}) \;\; \forall\, 1 \le i < N\} \tag{1}$$

Example 1. For instance, in Fig. 1, it is possible to identify the property sequence PS = [HASACTOR, ACTORIN, HASTOPIC] joining the classes Adventure Movies, Starring Actors, Drama Movies, and War Topics.

• An instance of PS (denoted by ps) is defined as the set of properties that join specific instances of the classes contained in PS (see footnote 7). We use $y \xrightarrow{\text{rdf:typeOf}} Y$ to represent that y is an instance of Y, where Y is an entity (class or property) in the ontology. According to this notation, given $PS = [P_1, \ldots, P_N]$, we define its instance ps as follows:

$$ps = \{[p_1, \ldots, p_N] \;/\; p_j \xrightarrow{\text{rdf:typeOf}} P_j \;\; \forall\, 1 \le j \le N\} \tag{2}$$

7 For simplicity, we use the term property sequence to refer both to the sequences and to their instances. The sequences are represented by uppercase letters (i.e. PS), and their instances by lowercase ones (ps).

Example 2. In Fig. 1, it is also possible to identify an instance of the property sequence PS used in Example 1. In this case, the instance ps = [HasActor, ActorIn, HasTopic] links the nodes Cast Away, Tom Hanks, Saving Private Ryan, and World War II.

Our approach also borrows from [1] the following definitions:

(1) The origin and the terminus of a sequence are the first and the last nodes contained in it, respectively.

(1.1) As $X \in \mathcal{C}$ can be the origin of several property sequences, we use $PS^X$ to identify a sequence originating in X:
$$PS^X = \{[P_1, \ldots, P_M] \;/\; P_t \in \mathcal{P} \;\; \forall\, 1 \le t \le M,\;\; X \in \mathcal{C},\;\; \mathrm{domain}(P_1) = X,\;\; \mathrm{range}(P_i) = \mathrm{domain}(P_{i+1}) \;\; \forall\, 1 \le i < M\}$$

(1.2) Analogously, if x is an instance of class $X \in \mathcal{C}$, we use $ps^x$ to identify an instance of the sequence $PS^X$ originating in x. This way, given $PS^X = [P_1, \ldots, P_M]$, $ps^x$ is defined as follows:

$$ps^x = \{[p_1, \ldots, p_M] \;/\; x \xrightarrow{\text{rdf:typeOf}} X,\;\; p_i \xrightarrow{\text{rdf:typeOf}} P_i \;\; \forall\, 1 \le i \le M\}$$

(2) The length of ps is the number of properties contained in the sequence. In our approach, the sequence with the minimum length between two given instances is named the geodesic sequence.

(3) Like in SemDis, we define two functions to access the classes $C_j$ of the property sequence $PS^X$ (function $PS^X$.NodesOfPS()), and the nodes $c_j$ of one of its instances $ps^x$ (function $ps^x$.PSNodesSequence()). On one hand, given $PS^X = [P_1, \ldots, P_M]$, we define the first function as:

$$PS^X.\mathrm{NodesOfPS}() = \{[C_1, \ldots, C_{M+1}] \;/\; C_t \in \mathcal{C} \;\; \forall\, 1 \le t \le M+1,\;\; C_1 = X,\;\; C_j = \mathrm{domain}(P_j) \;\; \forall\, 1 \le j \le M,\;\; C_{M+1} = \mathrm{range}(P_M)\} \tag{3}$$

On the other hand, given $ps^x = [p_1, \ldots, p_M]$ (an instance of the sequence $PS^X = [P_1, \ldots, P_M]$) and given that $PS^X$.NodesOfPS() $= [C_1, \ldots, C_{M+1}]$, we define the function $ps^x$.PSNodesSequence() as follows:

$$ps^x.\mathrm{PSNodesSequence}() = \{[c_1, \ldots, c_{M+1}] \;/\; c_1 = x,\;\; c_j \xrightarrow{\text{rdf:typeOf}} C_j \;\; \forall\, 1 \le j \le M+1\} \tag{4}$$

Example 3. Let us consider in Fig. 1 two (see footnote 8) property sequences originating in x = Cast Away: $ps^x_1$ = [HasDirector] (length 1), and $ps^x_2$ = [HasActor, ActorIn, HasTopic] (length 3). The nodes included in both sequences are $ps^x_1$.PSNodesSequence() = {Cast Away, Robert Zemeckis} and $ps^x_2$.PSNodesSequence() = {Cast Away, Tom Hanks, Saving Private Ryan, World War II}.

8 In the following, we will use a subindex to identify each sequence ps originated in a given node.

Fig. 1. Subset of instances, classes and properties in our OWL ontology.
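The definitions in Eqs. (1), (3) and (4), together with the notion of geodesic sequence, can be sketched in a few lines of Python. This is only an illustrative approximation under stated assumptions: the dictionaries, the triple list and all function names are hypothetical, the domain and range of HASDIRECTOR are read off Fig. 1 rather than stated in the text, and a breadth-first search stands in for whatever search procedure SemDis or AVATAR actually uses.

```python
from collections import deque

# Class-level properties: name -> (domain, range). The first three follow
# Example 1; HASDIRECTOR is an assumption read off Fig. 1.
PROPERTIES = {
    "HASACTOR": ("Adventure Movies", "Starring Actors"),
    "ACTORIN": ("Starring Actors", "Drama Movies"),
    "HASTOPIC": ("Drama Movies", "War Topics"),
    "HASDIRECTOR": ("Adventure Movies", "Directors"),
}

# Instance-level links from Examples 2 and 3: (subject, property, object).
TRIPLES = [
    ("Cast Away", "HasActor", "Tom Hanks"),
    ("Tom Hanks", "ActorIn", "Saving Private Ryan"),
    ("Saving Private Ryan", "HasTopic", "World War II"),
    ("Cast Away", "HasDirector", "Robert Zemeckis"),
]


def is_property_sequence(seq):
    """Eq. (1): every P_j is known and range(P_i) == domain(P_{i+1})."""
    if not seq or any(p not in PROPERTIES for p in seq):
        return False
    return all(PROPERTIES[a][1] == PROPERTIES[b][0]
               for a, b in zip(seq, seq[1:]))


def nodes_of_ps(seq):
    """Eq. (3): [domain(P_1), ..., domain(P_M), range(P_M)]."""
    return [PROPERTIES[p][0] for p in seq] + [PROPERTIES[seq[-1]][1]]


def ps_nodes_sequence(x, path):
    """Eq. (4): nodes visited by an instance path (list of triples) from x."""
    nodes = [x]
    for subj, _prop, obj in path:
        if subj != nodes[-1]:
            raise ValueError("path is not a chain starting at x")
        nodes.append(obj)
    return nodes


def geodesic(x, y):
    """Shortest instance path (as triples) from x to y, or None if none exists."""
    queue = deque([(x, [])])
    seen = {x}
    while queue:
        node, path = queue.popleft()
        if node == y:
            return path
        for triple in TRIPLES:
            if triple[0] == node and triple[2] not in seen:
                seen.add(triple[2])
                queue.append((triple[2], path + [triple]))
    return None
```

Under these assumptions, `geodesic("Cast Away", "World War II")` returns the three-link path of Example 2, and passing that path to `ps_nodes_sequence` reproduces the node list given for the second sequence in Example 3. The rdf:typeOf check of Eq. (2), which ties each property instance to its class-level property, is omitted here for brevity.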