Information Sciences xxx (2011) xxx–xxx
Contents lists available at ScienceDirect
Information Sciences
journal homepage: www.elsevier.com/locate/ins

Exploring synergies between content-based filtering and Spreading Activation techniques in knowledge-based recommender systems ☆

Yolanda Blanco-Fernández ⇑, Martín López-Nores, Alberto Gil-Solla, Manuel Ramos-Cabrer, José J. Pazos-Arias
ETSE Telecomunicación, Campus Universitario, Vigo 36310, Spain

Article info

Article history:
Received 21 September 2009
Received in revised form 9 June 2011
Accepted 10 June 2011
Available online xxxx

Keywords:
Personalization
Content-based filtering
Semantic reasoning
Spreading Activation techniques

Abstract

Recommender systems fight information overload by automatically selecting items that match the personal preferences of each user. The so-called content-based recommenders suggest items similar to those the user liked in the past, using syntactic matching mechanisms. The rigid nature of such mechanisms leads to recommending only items that bear strong resemblance to those the user already knows. Traditional collaborative approaches address overspecialization by considering the preferences of other users, which causes other severe limitations. In this paper, we avoid the intrinsic pitfalls of collaborative solutions and diversify the recommendations by reasoning about the semantics of the user's preferences. Specifically, we present a novel content-based recommendation strategy that resorts to semantic reasoning mechanisms adopted in the Semantic Web, such as Spreading Activation techniques and semantic associations. We have adapted these mechanisms to fulfill the personalization requirements of recommender systems, making it possible to discover extra knowledge about the user's preferences and leading to more accurate and diverse suggestions. Our approach is generic enough to be used in a wide variety of domains and recommender systems. The proposal has been preliminarily evaluated by statistics-driven tests involving real users in the recommendation of Digital TV content. The results reveal the users' satisfaction regarding the accuracy and diversity of the reasoning-driven content-based recommendations.

© 2011 Elsevier Inc. All rights reserved.

☆ Work funded by the Ministerio de Educación y Ciencia (Gobierno de España) Research Project TIN2010-20797.
⇑ Corresponding author. E-mail address: yolanda@det.uvigo.es (Y. Blanco-Fernández).
0020-0255/$ - see front matter © 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.ins.2011.06.016
Please cite this article in press as: Y. Blanco-Fernández et al., Exploring synergies between content-based filtering and Spreading Activation techniques in knowledge-based recommender systems, Inform. Sci. (2011), doi:10.1016/j.ins.2011.06.016

1. Introduction

Recommender systems provide personalized advice to users about items they might be interested in. These tools are already helping people efficiently manage content overload and reduce complexity when searching for relevant information. To fulfill these personalization needs, three main components are required: (i) a database that stores characterizations of the available items, (ii) profiles that model the users' preferences, and (iii) recommendation strategies that make personalized suggestions to each individual.

The first recommendation strategy was content-based filtering [41,30], which consists of suggesting items similar to those the user liked in the past. In spite of its accuracy, this technique is limited by the similarity metrics employed, which are based on rigid syntactic approaches that can only detect similarity between items that share all or some of their attributes [1]. Consequently, traditional content-based approaches lead to overspecialized suggestions that include only items bearing strong resemblance to those the user already knows (i.e. items bound to the attributes defined in his/her profile).
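To make the overspecialization problem concrete, the following minimal sketch scores candidate items by the attributes they share with a user profile. The Jaccard coefficient, the item titles and the attribute sets are illustrative assumptions chosen for this example; they are not the metric or the data of any system cited in this paper.

```python
def jaccard(a: set, b: set) -> float:
    """Purely syntactic similarity: shared attributes over all attributes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical user profile and catalogue (all names are made up).
profile = {"documentary", "archeology", "travel"}
catalogue = {
    "Ancient Tombs": {"documentary", "archeology"},
    "Potholing Basics": {"sports", "caving"},
    "Greek Islands": {"tourism", "greece"},
}

# Score every item against the profile and rank from best to worst.
scores = {title: jaccard(profile, attrs) for title, attrs in catalogue.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions, only the item that repeats attributes from the profile receives a non-zero score. Items that might be related in meaning but share no attribute with the profile are indistinguishable from unrelated ones, which is the rigidity the text attributes to syntactic matching.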
In order to fight overspecialization, researchers devised collaborative filtering [36,25,29], whose basic idea is to move beyond the experience of an individual user's profile and instead draw on the experiences of a community of like-minded users (his/her neighbors), and they even combined content-based and collaborative filtering in hybrid approaches [6,22,33,13,40]. Even though collaborative (and hybrid) approaches mitigate the effects of overspecialization by considering the interests of other users, they bring in new limitations, such as the sparsity problem (the difficulty of selecting each individual's neighborhood when the knowledge about the users' preferences is sparse), privacy concerns bound to the confidentiality of the users' personal data, and scalability problems stemming from the management of many user profiles (instead of just one profile, as in content-based approaches).

The contribution of our paper is a content-based strategy that diversifies the recommendations by exploiting semantic reasoning about the user's interests, instead of considering other individuals' preferences. This way, we overcome the overspecialization effects without suffering from the intrinsic limitations of collaborative and hybrid solutions. Specifically, our reasoning mechanisms have been borrowed from the area of the Semantic Web, an initiative that is based on (i) describing Web resources by semantic annotations (metadata), (ii) formalizing this knowledge in a domain ontology that represents concepts and relationships by classes and properties, respectively, and (iii) carrying out reasoning processes about the ontology in order to infer semantic relationships among the annotated resources.

Broadly speaking, our content-based strategy suggests items which are semantically related to the user's preferences, instead of offering items with the same attributes that appear in his/her profile. For example, in the TV field, a viewer who has enjoyed documentaries about traveling and archeology might receive as recommendations programs about potholing (a hobby strongly related to the study of ancient graves) or about Greece (a country of deep-rooted archeological tradition). Our domain-independent strategy consists of two stages that adopt semantic associations [4] and Spreading Activation techniques (henceforth, SA techniques) [14] as reasoning mechanisms:

(1) Firstly, the pre-filtering phase selects an excerpt from the domain ontology that comprises only instances of classes and properties that are significant for the user (because they are closely related to his/her preferences). For that reason, this excerpt is named the user's Ontology of Interest. Then, we infer hidden semantic associations among the items included in the user's Ontology of Interest, starting from the hierarchical relationships and properties formalized in it.

(2) Next, the recommendation phase processes the discovered knowledge and provides the personalized recommendations. To this aim, we rely on SA techniques as computational mechanisms able to efficiently explore a generic network of nodes interconnected by links, and to detect concepts that are strongly related to each other. In our approach, the considered network corresponds to the user's Ontology of Interest, while the strongly related nodes are his/her preferences and the items to be suggested.

The filtering criteria employed to delimit the user's Ontology of Interest have been described in detail in [9]. For that reason, here we focus on the second phase of our strategy. Specifically, our main research contribution consists of extending traditional SA techniques so that the personalization requirements of a recommender system can be considered. To this aim, our improved SA techniques must fulfill the following requirements:

– Firstly, our SA mechanisms must enable our strategy to discover useful knowledge for the recommendation process by reasoning about the semantics of the user's Ontology of Interest.
– Secondly, the knowledge inferred by the SA mechanisms must serve to increase the diversity of the offered content-based recommendations.
– Lastly, our SA approach must automatically learn the user's preferences from the feedback provided after recommendations, and thereafter update his/her personal profile accordingly. This way, our reasoning-based suggestions evolve as the user's preferences change over time, thus reinforcing his/her confidence in our personalization strategy.

This paper is organized as follows. The next two sections provide the background necessary to understand our approach: Section 2 explains the internals of semantic associations and highlights the limitations of traditional SA techniques for personalization purposes, while Section 3 presents the two essential components of our reasoning framework: the domain ontology and the user profiles. Next, Section 4 details the internals of our two-phase recommendation strategy, exploring synergies between our improved SA techniques and content-based filtering in the selection of diverse recommendations. Afterwards, Section 5 provides an example of our strategy in the scope of Digital TV, where we highlight how to exploit our reasoning capabilities to select TV programs among the myriad available in the digital stream. Next, Section 6 presents the experimental evaluation of our approach and discusses scalability and computational feasibility concerns. Finally, Section 7 summarizes the conclusions from our work and motivates possible lines of further research.

2. Background on semantic reasoning

In this section, we describe the internals of the semantic reasoning mechanisms exploited in our recommendation strategy: semantic associations and SA techniques. Very briefly, the associations make it possible to interrelate the items available in the
recommender system, whereas the SA techniques serve to discover new knowledge about the users' preferences from the inferred associations and the concepts formalized in the domain ontology.

2.1. Semantic associations

The semantic associations employed in our reasoning approach have been borrowed from [4], where Anyanwu and Sheth defined the relationships that can be established between two specific class instances in an ontology. In order to categorize these associations, they resorted to a structure named property sequence, which consists of a set of class instances linked to each other by means of properties. The first class instance in the sequence is the origin, the last one is the terminus, and the length of the sequence is the number of properties included in it. The semantic associations introduced in [4] are defined next with the aid of Fig. 1:

– ρ-path association. Two class instances i1 and i5 are ρ-pathAssociated in an ontology if it is possible to find a property sequence whose origin is i1 and whose terminus is i5 (or vice versa). The longer the property sequence linking both class instances, the less significant the relationship between them, due to the presence of many intermediate nodes.
– ρ-join association. Two class instances are ρ-joinAssociated if both are origins (e.g. i1 and i6 in Fig. 1) or termini (i5 and i8) of two property sequences containing instances belonging to a common class C (named the union class).

Fig. 1. Semantic associations adopted in our reasoning-driven approach. [Diagram not reproduced. Recoverable labels: a property sequence PS = [p0, p1, p2, p3] over the instances i1 to i5, with i1 marked as origin and i5 as terminus, annotated ρ-path(i1, i5); a second panel that adds the instances i6, i7 and i8 and the properties p4 and p5, where i3 and i7 are instances belonging to class C, annotated ρ-join(i1, i6) and ρ-join(i5, i8).]

2.2. Spreading Activation techniques

SA techniques are computational mechanisms able to efficiently explore huge generic networks of nodes interconnected by links. According to the guidelines established in [14], these techniques work as follows:

– Each node is associated with a weight (called the activation level) that grows with its relevance in the network: the more relevant the node, the higher its activation level. Besides, each link joining two nodes has a weight whose value is proportional to the strength of the relationship existing between both nodes.
– Initially, a set of nodes is selected and the nodes connected to them by links (named neighbor nodes) are activated. In this process, the activation levels of the initially selected nodes are spread until reaching their neighbors in the network. The activation level of a reached node is typically computed by considering the levels of its neighbors and the weights assigned to the links that join them to each other. Consequently, the more relevant the neighbors of a given node (i.e. the higher their activation levels) and the stronger the relationship between the node and its neighbors (i.e. the higher the weights of the links between them), the more relevant the node will be in the network.
– The spreading process is repeated until reaching all the nodes of the network. In the end, the highest activation levels correspond to the nodes that are most closely related to the initially selected ones.

Since the spreading process makes it possible to reach nodes that are not directly joined to the initially selected ones, SA techniques carry out inference processes where new knowledge is learned. To harness these inferential capabilities, several algorithms have been proposed for exploring a knowledge network and extracting the most significant concepts formalized in it. In
literature, many applications resort to the so-called Hopfield Net algorithm due to its beneficial properties of search and knowledge discovery, as explained in [23].

2.2.1. The Hopfield Net algorithm

The Hopfield Net algorithm is based on a neural network that provides two capabilities especially relevant for the spreading process: parallel search and convergence (see [12] for details). On the one hand, the search capabilities allow the algorithm to activate all the nodes of the network in parallel in each iteration, computing the activation level of each node from the levels of the remaining nodes in the network. On the other, the Hopfield Net algorithm traverses the nodes successively (iteration by iteration) until their activation levels converge to stable values.

The algorithm works as follows. First, an activation level of 1 is assigned to the initially activated nodes, and a level of 0 to the remaining nodes of the network. Next, the initial activation levels are spread through the network, and the levels of all the nodes are computed by means of the sigmoid function $f_S$ included in Eq. (1):

$$A_j(t+1) = f_S\left(\sum_{i=0}^{n-1} A_i(t)\, w_{ij}\right), \quad 0 \le j \le n-1 \qquad (1)$$

In this expression:
– $A_j(t+1)$ is the activation level of the node $j$ in iteration $t+1$,
– $A_i(t)$ is the activation level of the node $i$ in iteration $t$,
– $n$ is the number of nodes in the network,
– $w_{ij}$ is the weight of the link between the nodes $i$ and $j$, with $w_{ij} = 0$ if there is no link between them in the network,
– $f_S(x) = \dfrac{1}{1 + \exp\left(\frac{\theta_1 - x}{\theta_2}\right)}$, where $\theta_1$ is a configurable threshold, and $\theta_2$ is a parameter used to modify the shape of the sigmoid function $f_S(x)$.

The spreading process is repeated until the activation levels of all the nodes reach stable values, as indicated by Eq. (2), where $\xi$ is a configurable parameter taking very low values:

$$\sum_{j=0}^{n-1} \left| A_j(t+1) - A_j(t) \right| \le \xi \qquad (2)$$

2.2.2. Limitations of traditional SA techniques in the personalization field

We have identified two severe drawbacks that prevent us from exploiting the inferential capabilities of traditional SA techniques in our reasoning-driven recommendation strategy. These drawbacks lie in (i) the kind of links modeled in the considered network and (ii) the weighting processes of those links.

On the one hand, the kind of links modeled is closely related to the richness of the reasoning carried out during the spreading process. These links establish the paths along which the relevance of the initially activated nodes is propagated to other nodes closely related to them. For that reason, some significant nodes may never be detected, simply because no link reaches them in the network. Existing SA techniques (see examples in [32,35,23,37]) model very simple relationships, which lead to poor inferences and prevent the discovery of the knowledge hidden behind more complex associations.

The second limitation of traditional SA approaches is related to the weighting of the links modeled in the network. According to the guidelines described in Section 2.2, these weights remain invariable over time, because their values depend either on the existence of a relationship between the two linked nodes or on the strength of this relationship. This static weighting process is not appropriate for our personalization process, where the weights assigned to the links of the user's network must make it possible to (i) learn his/her preferences automatically from the feedback provided after recommendations and (ii) adapt the spread-based inference process dynamically as these preferences evolve.

In Section 4, we will explain how our reasoning-driven approach overcomes these limitations by extending traditional SA techniques so that they can be adopted in a content-based recommender system.
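As an illustration of the Hopfield Net spreading process of Section 2.2.1, the following Python sketch iterates Eq. (1) until the stopping condition of Eq. (2) holds. It is a minimal sketch rather than the implementation used in this work: the function names, the five-node example network and its weights, the parameter values (θ1 = 0.5, θ2 = 0.1, ξ = 10^-4) and the iteration cap max_iters are all assumptions introduced here for illustration.

```python
import math


def sigmoid(x, theta1, theta2):
    """Sigmoid f_S of Eq. (1): theta1 is the threshold, theta2 shapes the curve."""
    return 1.0 / (1.0 + math.exp((theta1 - x) / theta2))


def hopfield_spread(weights, initially_active, theta1=0.5, theta2=0.1,
                    xi=1e-4, max_iters=1000):
    """Spread activation over a weighted network.

    weights[i][j] is the weight of the link between nodes i and j
    (0 when the nodes are not linked). Returns (levels, converged).
    The default parameter values are illustrative assumptions.
    """
    n = len(weights)
    # Initial state: 1 for the initially activated nodes, 0 for the rest.
    levels = [1.0 if j in initially_active else 0.0 for j in range(n)]
    for _ in range(max_iters):
        # Eq. (1): every node is updated in parallel from the levels at time t.
        new_levels = [
            sigmoid(sum(levels[i] * weights[i][j] for i in range(n)),
                    theta1, theta2)
            for j in range(n)
        ]
        # Eq. (2): stop when the total change does not exceed the threshold xi.
        change = sum(abs(a - b) for a, b in zip(new_levels, levels))
        levels = new_levels
        if change <= xi:
            return levels, True
    return levels, False


# Hypothetical network: nodes 0, 1 and 2 are mutually linked,
# nodes 3 and 4 are linked to each other only.
W = [
    [0.0, 0.9, 0.8, 0.0, 0.0],
    [0.9, 0.0, 0.7, 0.0, 0.0],
    [0.8, 0.7, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.9, 0.0],
]
levels, converged = hopfield_spread(W, initially_active={0})
print(converged, [round(a, 3) for a in levels])
```

In this toy network, the activation injected in node 0 reaches the nodes linked to it (1 and 2), whereas nodes 3 and 4, which no path connects to node 0, keep low levels. The loop is capped by max_iters because the sketch does not assume that the parallel updates settle for every choice of weights and parameters.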
Before presenting those extensions in Section 4, the next section describes the procedures we have followed to formalize the domain ontology and to model the user profiles.

3. Background on our reasoning-driven personalization framework

3.1. The domain ontology

In the field of the Semantic Web, an ontology characterizes the concepts typical in a domain and their relationships by means of classes and properties, respectively, which are organized hierarchically [8]. Besides, the ontology is populated
by including specific instances of classes and properties. In the context of a recommender system, class instances represent the available items and their attributes, whereas property instances link items and attributes to each other. We depict in Fig. 2 a brief excerpt from an ontology for the TV domain, defined from the TV-Anytime specification (a collection of metadata providing detailed descriptions of generic audiovisual contents [38]). In this figure, it is possible to identify several class instances that refer to specific TV programs, which belong to a hierarchy of genres (e.g. Fiction, Sports, Music, Leisure). The attributes of these TV contents (e.g. cast, intended audience, topics) are also identified by hierarchically organized classes, and related to each program by means of labeled properties (e.g. hasActor, hasIntendedAudience, isAbout).

Ontologies have become the cornerstone of the Semantic Web for two reasons. On the one hand, formal conceptualizations enable inference processes to discover new knowledge from the represented information. On the other, ontologies facilitate automated knowledge sharing, by allowing easy reuse between users and software agents. This feature facilitates the development of ontologies, which would otherwise be a tedious task. Nowadays, there exist repositories containing multiple and very diverse ontologies (e.g. SchemaWeb¹), as well as numerous management tools providing useful functionalities for development tasks (e.g. merging of multiple ontologies, consistency checking, discovery of equivalent classes, reuse of concept descriptions, automatic categorization of instances in the appropriate classes via logic-based reasoners [3,17], etc.). In sum, by reusing the concepts and relationships formalized in publicly available ontologies and resorting to the existing management tools, it is possible to create a domain ontology for reasoning purposes with acceptable effort.
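The roles of classes, class instances and property instances described above can be sketched with plain Python data structures. This is only an illustration under stated assumptions: the names are borrowed from the labels visible in Fig. 2 and in the text, but the specific class memberships and links below (as well as the "Topics" class, the "City tourism" instance and the helper functions) are hypothetical examples, not a transcription of the actual ontology.

```python
# Class hierarchy: each class points to its direct superclass (None for a root).
# Hypothetical excerpt, loosely following the genre hierarchy of Fig. 2.
SUBCLASS_OF = {
    "TV Contents": None,
    "Fiction Contents": "TV Contents",
    "Leisure Contents": "TV Contents",
    "Drama": "Fiction Contents",
    "Tourism": "Leisure Contents",
}

# Class instances: items (TV programs) and attributes, each typed by a class.
# These memberships are illustrative assumptions.
INSTANCE_OF = {
    "Hamlet": "Drama",
    "Inside New York": "Tourism",
    "New York": "USA cities",
    "City tourism": "Topics",
}

# Property instances: labeled links between items and their attributes.
PROPERTY_INSTANCES = [
    ("Inside New York", "hasPlace", "New York"),
    ("Inside New York", "isAbout", "City tourism"),
]


def classes_of(instance):
    """Return the class of an instance followed by all of its superclasses."""
    chain = []
    cls = INSTANCE_OF[instance]
    while cls is not None:
        chain.append(cls)
        cls = SUBCLASS_OF.get(cls)  # None when the class has no known parent
    return chain


def attributes_of(item):
    """Group the attributes linked to an item by property label."""
    grouped = {}
    for subject, prop, obj in PROPERTY_INSTANCES:
        if subject == item:
            grouped.setdefault(prop, []).append(obj)
    return grouped


print(classes_of("Inside New York"))
print(attributes_of("Inside New York"))
```

Keeping the hierarchy (SUBCLASS_OF) separate from the labeled links (PROPERTY_INSTANCES) mirrors the distinction drawn in the text between hierarchical organization and the properties that relate items to attributes.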
There exist several standard implementation languages for ontology development. The first proposals were RDF [7] and RDFS [10], which added a formal semantics to the purely syntactic specifications provided in XML. Next, DAML [15] and OIL [18] arose, which were eventually merged and standardized by the W3C as OWL [26], currently the most expressive language, which includes three sub-levels (Lite, DL and Full). The language to use when applying our reasoning-driven approach depends on the knowledge and expressiveness needs of the domain considered and of the recommender system.

Fig. 2. Subset of classes (top), properties and specific instances (bottom) defined in an ontology about the TV domain. Property labels used in the figure: [1] IsAbout, [2] HasActor, [3] HasPlace, [4] HasPeriod, [5] HasPresenter, [6] HasCRID.

¹ Available at http://www.schemaweb.info/schema/BrowseSchema.aspx.