3.2. User modeling technique

Reasoning about a user’s preferences requires a formal representation that includes semantic descriptions of the items that are appealing or unappealing to him/her (named positive and negative preferences, respectively). These descriptions permit a recommender system to learn new knowledge about the user’s interests, which is not possible with many of the existing user modeling techniques:

– Some existing works define overly simple user models, containing only flat lists of keywords (e.g. attributes) or ratings referred to each item defined in the user’s profile [11,24,39]. These proposals provide little knowledge about the user’s preferences, and therefore hamper the application of advanced reasoning processes.
– Other, more sophisticated proposals take advantage of the hierarchical structures defined in an ontology to model the user’s preferences [27,42,19]. In these works, profiles do not contain the specific items the user (dis)liked in the past, but the classes under which these items are categorized in a hierarchy. The main drawback of this approach is that it only explores the hierarchical structure of the domain and misses the semantic descriptions of the items, which are especially useful for user modeling tasks and for subsequent reasoning processes, as we will describe throughout the paper.

Bearing in mind that the descriptions required by our reasoning mechanisms are already defined in the domain ontology, we propose to model the user’s preferences by reusing the knowledge formalized in it. The resulting models are named ontology-profiles and store the interest of the user in: (i) the attributes of the items which are (un)interesting for him/her and (ii) the hierarchy of classes under which these items are categorized.
This approach has two main advantages for a recommender system:

– On the one hand, the formal representation of the user’s profile allows the system to reason about his/her preferences and to compare them effectively against the available items, thus favoring more accurate personalization processes.
– On the other hand, we provide the system with a very detailed model of the user’s interests, while not requiring that the classes, properties and instances that identify these preferences be stored in each profile. Thus, we significantly reduce the storage capacity needed in the reasoning-driven recommender system.

Fig. 3. Our ontology-based approach for modeling users in a TV recommender system. (The figure links the IDs and DOI indexes stored in each profile to instances in the domain ontology; its labels include the classes TV Contents, Fiction, Leisure, Drama, Adventure, Cookery, Tourism and Gardening, attributes such as Actor, Director, Topic and Intended Audience, and the relations InstanceOf, SubClassOf and Properties.)

Please cite this article in press as: Y. Blanco-Fernández et al., Exploring synergies between content-based filtering and Spreading Activation techniques in knowledge-based recommender systems, Inform. Sci. (2011), doi:10.1016/j.ins.2011.06.016

To this aim, we use the domain ontology as a common knowledge repository, keeping only two elements in the user’s profile: unique references (denoted by IDs)
that identify the items the user (dis)liked, and his/her specific level of interest in each one of them. These references make it possible to locate in the ontology the items defined in the user’s profile and to query their semantic descriptions (i.e. attributes and hierarchical classes) over the conceptualization, as shown in Fig. 3 for a recommender system in the TV domain. Note that our modeling technique does not consider a flat list of attributes referred to the user’s preferences, but rather exploits the structure of the domain ontology and the relationships existing among these attributes in order to learn knowledge about his/her interests and use it during the personalization process.

Obviously, recommender systems require the users to define some initial preferences to start working. Considering the users’ involvement, the goal is to provide a user-friendly interface to alleviate their initialization burden. Our user modeling technique exploits the hierarchical structure of the underlying ontology for that purpose. Specifically, a list of classes/subclasses and specific instances referred to the items to be recommended (e.g. programs in the TV domain) is shown to the user, who can identify his/her positive and negative preferences by assigning ratings to each specific item. The hierarchy of classes displayed is self-explanatory (see bottom of Fig. 3), so that the users can easily browse it and feel free to rate as many items as they want.

After the profile initialization, it is necessary to measure the user’s level of interest in each item included in his/her profile. To this aim, we have defined the so-called DOI indexes (Degree Of Interest) in the range [-1, 1], with -1 representing the greatest disliking and 1 the greatest liking. These indexes can be either explicitly entered by the user or inferred automatically by the recommender system from the relevance feedback provided after recommendations.
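The profile described above can be illustrated with a minimal sketch. It assumes a toy in-memory ontology; the names ONTOLOGY, OntologyProfile, rate, description and attribute_doi, as well as the item data, are hypothetical and do not come from the paper. The helper attribute_doi applies the averaging rule that the next paragraph states for attributes; the upward propagation through the class hierarchy taken from [42] is not reproduced here.

```python
from dataclasses import dataclass, field
from statistics import mean

# Shared domain ontology (hypothetical toy data): item ID -> semantic description.
ONTOLOGY = {
    "ID1": {"class": "Drama", "attributes": {"actor_x", "topic_crime"}},
    "ID2": {"class": "Cookery", "attributes": {"topic_cooking"}},
    "ID3": {"class": "Drama", "attributes": {"actor_x"}},
}


@dataclass
class OntologyProfile:
    """Keeps only item IDs and DOI indexes; descriptions stay in the ontology."""
    doi: dict = field(default_factory=dict)

    def rate(self, item_id, value):
        # A rating is accepted only for known items and DOI values in [-1, 1].
        if item_id not in ONTOLOGY:
            raise KeyError(f"unknown item: {item_id}")
        if not -1.0 <= value <= 1.0:
            raise ValueError("DOI index must lie in [-1, 1]")
        self.doi[item_id] = value

    def description(self, item_id):
        # Resolve the ID against the shared ontology instead of copying data.
        return ONTOLOGY[item_id]

    def attribute_doi(self, attribute):
        # Average of the DOIs of the rated items linked to the attribute.
        linked = [value for item_id, value in self.doi.items()
                  if attribute in ONTOLOGY[item_id]["attributes"]]
        return mean(linked) if linked else None


profile = OntologyProfile()
profile.rate("ID1", 1.0)
profile.rate("ID2", -0.5)
profile.rate("ID3", 0.5)
print(profile.attribute_doi("actor_x"))  # 0.75
```

Because the profile holds only references, any change to an item's description in the ontology is visible to every profile that rates that item, with no per-user copy to update.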
The DOI index computed for each item is also used to set the ratings corresponding to its attributes and to the classes under which the item is categorized in the ontology. Specifically, the DOI of an attribute is taken as the average of the DOIs of the items it is linked to. Similarly, the DOIs of the most specific classes are computed as the average of the DOIs of the items classified under them. Then, we propagate these values upwards in the hierarchy until reaching its root class. For that purpose, we adopt the approach proposed in [42], which leads to higher DOI indexes for the superclasses closer to the leaf class whose value is being propagated, and lower ones for the classes which are closer to the root of the hierarchy. Besides, the higher the DOI of a given class and the lower its number of siblings, the higher the value propagated to its superclass.

4. Using content-based filtering in tandem with SA techniques

As we mentioned in the introduction, our content-based strategy is divided into two phases, named the pre-filtering phase and the recommendation phase. Even though the pre-filtering phase has been detailed in [9], in this section we summarize the main aspects of this process with the goal of clarifying how the user’s Ontology of Interest is selected (Section 4.1) and how it is processed by SA techniques in the recommendation phase of the strategy (Sections 4.2, 4.3, 4.4).

Regarding SA techniques, we extend traditional approaches by overcoming the limitations pointed out in Section 2.2.2, which hamper their adoption in a recommender system where the focus must be put on the user’s preferences:

On the one hand, our approach extends the simple relationships considered by traditional SA techniques by considering both the properties defined in the ontology and the semantic associations inferred from them.
This rich variety of relationships makes it possible to establish links that propagate the relevance of the items selected by the pre-filtering phase, leading to diverse, enhanced recommendations.

On the other hand, to fulfill the personalization requirements of a recommender system, our link weighting process does not depend only on the two nodes joined by the considered link, but also on (the strength of) their relationship to the items defined in the user’s profile. This way, the links of the network created for the user are updated as our strategy learns new knowledge about his/her preferences, thus leading to tailor-made recommendations after the spreading process.

Once the principles of our SA approach have been sketched, we focus on the processes required for its use in our content-based strategy: (i) selection of the user’s Ontology of Interest, (ii) creation of the user’s SA network, (iii) weighting of its links, (iv) processing of the network by SA techniques, and (v) selection of our reasoning-based recommendations.

4.1. Pre-filtering phase: creating the user’s Ontology of Interest

Our pre-filtering phase decides which instances of classes and properties from the domain ontology must be included in the user’s Ontology of Interest because they are relevant for him/her. For that purpose, we first locate in the domain ontology the items that are (un)appealing to the user (defined in his/her profile). Next, we successively traverse the properties bound to these items until reaching new class instances in the ontology, referred to other items and their attributes.
In order to guarantee computational feasibility, we have developed a controlled inference mechanism that progressively filters out the instances of classes and properties that do not provide useful knowledge for the personalization process:

As new nodes are reached from a given instance, we first quantify their relevance for the user by an index named semantic intensity (denoted by kSem(n) for node n), whose computation process will be described in this section.
Next, the nodes whose intensity indexes are not greater than a specific threshold are disregarded, so that our inference mechanism continues traversing only the properties that lead to new nodes from those that are relevant for the user.

In order to measure the semantic intensity of a node n, we take into account various ontology-dependent pre-filtering criteria, so that the more significant the relationship between a given node and the user’s preferences, the higher the resulting value. Some of these criteria (described in detail in [9]) are summarized next:

(1) Length of the property sequence that makes it possible to reach the node starting from the user’s preferences. The longer this sequence, the lower the semantic intensity of the node, because its relationship to the user’s preferences is less significant due to the presence of many intermediate nodes.
(2) Existence of hierarchical relationships between the node and the user’s preferences. The intensity of a node increases when it is possible to find a common ancestor between it and the user’s preferences in the hierarchies defined in the ontology.
(3) Existence of implicit relationships between the node and the user’s preferences detected by graph theory betweenness. In graph theory [16], the betweenness among three nodes is high when most of the paths from the first node to the second one also include the third node. Therefore, a high value of betweenness implies that the involved nodes are strongly related. In our approach, these nodes are the user’s preferences and the class instance whose relevance is being measured.

Once the nodes related to the user’s preferences (and also the properties linking them to each other) have been selected, our strategy infers semantic associations between the instances referred to items that can be recommended.
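The controlled traversal and the first two criteria above can be sketched as follows. This is only an illustration under stated assumptions: the actual computation of the semantic intensity is given in [9] and is not reproduced here, so the scoring function, its decay and bonus parameters, the threshold value and the toy graph (including the nodes tourism_india and travel_show) are all hypothetical. Criterion (3), betweenness, is left out for brevity.

```python
from collections import deque


def semantic_intensity(path_length, has_common_ancestor, decay=0.5, bonus=0.2):
    # Hypothetical score: criterion (1) lowers it as the property sequence
    # grows; criterion (2) raises it when a common ancestor exists.
    score = decay ** path_length
    if has_common_ancestor:
        score += bonus
    return min(score, 1.0)


def prefilter(graph, ancestors, preferences, threshold=0.3):
    """Breadth-first traversal that expands only nodes above the threshold."""
    pref_ancestors = set().union(*(ancestors.get(p, set()) for p in preferences))
    selected = {p: 1.0 for p in preferences}
    queue = deque((p, 0) for p in preferences)
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in selected:
                continue
            shared = bool(ancestors.get(neighbour, set()) & pref_ancestors)
            score = semantic_intensity(dist + 1, shared)
            # Nodes at or below the threshold are disregarded and not expanded.
            if score > threshold:
                selected[neighbour] = score
                queue.append((neighbour, dist + 1))
    return selected


# Toy Ontology fragment: node -> nodes reachable through one property.
graph = {
    "hells_kitchen": ["cooking"],
    "cooking": ["indian_culinary_specialties"],
    "indian_culinary_specialties": ["tourism_india"],
    "tourism_india": ["travel_show"],
}
# Superclasses of each item (hypothetical hierarchy).
ancestors = {
    "hells_kitchen": {"Cookery", "Leisure"},
    "indian_culinary_specialties": {"Cookery", "Leisure"},
    "travel_show": {"Tourism", "Leisure"},
}

selected_nodes = prefilter(graph, ancestors, ["hells_kitchen"])
print(sorted(selected_nodes))
```

In this toy run the node tourism_india falls below the threshold, so the traversal stops there and travel_show is never reached, which is the pruning effect that keeps the inference computationally feasible.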
As per the categorization of semantic associations described in Section 2.1, we detect the following relationships between the items defined in the user’s Ontology of Interest:

First, ρ-path associations between the items that are joined by a property sequence in the Ontology of Interest, as happens with the programs Hell’s Kitchen and Indian culinary specialties in Fig. 2, which are linked by the instance cooking in the ontology.

Second, ρ-join associations between, for instance, the items whose attributes belong to a union class in the ontology. As an example, the programs Renaissance sculpture and The Art of ceramics in Fig. 2 are associated because both are about plastic arts that are strongly related to each other (as shown in the class hierarchy of the figure, sculpture and ceramics belong to the union class Plastic arts).

Starting from the user’s Ontology of Interest and the semantic associations inferred among its nodes, we create the user’s SA network, whose knowledge is explored during the second phase of the strategy by exploiting the inference capabilities provided by SA techniques.

4.2. Creation of the user’s SA network

The user’s SA network can be easily built starting from his/her Ontology of Interest. Specifically, the nodes of this network are the class instances selected by the pre-filtering phase of our strategy. The knowledge learned in this first phase also helps to identify the links that relate the nodes to each other, which make it possible to carry out the inference processes that lead to recommendations. In this regard, our SA approach defines two kinds of links:

Real links. These links model the knowledge that is explicitly represented in the user’s Ontology of Interest. Specifically, we consider a real link in the user’s SA network for each one of the property instances included in his/her Ontology.

Virtual links. These links refer to relationships inferred from the Ontology of Interest.
In this group, we include both simple hierarchical relationships and the complex semantic associations discovered from the properties and hierarchical links of the user’s Ontology of Interest. According to the nature of both relationships, we identify two kinds of virtual links:

– Associative virtual links. We consider an associative virtual link between each pair of items related by ρ-path or ρ-join associations. For instance, from the associations depicted in Fig. 1, we define three associative virtual links: between items i1 and i5, due to the ρ-path association; between items i1 and i6, due to ρ-join; and between items i5 and i8, again due to ρ-join.
– Hierarchical virtual links. We consider a hierarchical virtual link between the two instances belonging to the union class that causes ρ-join associations. For instance, in Fig. 1 it is possible to establish a virtual link between items i3 and i7, which are classified under the union class C.

We define a new type of structure (named virtual path) starting from ρ-join associations existing between two specific items. This structure makes it possible to go from one item to the other by crossing a minimum number of real links and the hierarchical link that originates the ρ-join association between the two items. The length of the virtual path is defined as the number of real links contained in it. As an example, in Fig. 1 it is possible to find a virtual path (of length 3) between items i1 and i6, which