Exploiting Synergy between Ontologies and Recommender Systems

Stuart E. Middleton, Harith Alani, David C. De Roure
Intelligence, Agents and Multimedia Group
Department of Electronics and Computer Science
University of Southampton
Southampton, SO17 1BJ, UK
{sem99r,ha,dder}@ecs.soton.ac.uk

ABSTRACT
Recommender systems learn about user preferences over time, automatically finding things of similar interest. This reduces the burden of creating explicit queries. Recommender systems do, however, suffer from cold-start problems, where no information is available early on upon which to base recommendations. Semantic knowledge structures, such as ontologies, can provide valuable domain knowledge and user information. However, acquiring such knowledge and keeping it up to date is not a trivial task, and user interests are particularly difficult to acquire and maintain. This paper investigates the synergy between a web-based research paper recommender system and an ontology containing information automatically extracted from departmental databases available on the web. The ontology is used to address the recommender system’s cold-start problem. The recommender system addresses the ontology’s interest-acquisition problem. An empirical evaluation of this approach is conducted and the performance of the integrated systems measured.

General Terms
Design, Experimentation.

Keywords
Cold-start problem, interest-acquisition problem, ontology, recommender system.

1. INTRODUCTION
The mass of content available on the World-Wide Web raises important questions over its effective use. Search engines filter web pages that match explicit queries, but most people find articulating exactly what they want difficult. The result is large lists of search results that contain a handful of useful pages, defeating the purpose of filtering in the first place.
Recommender systems [23] learn about user preferences over time and automatically find things of similar interest, thus reducing the burden of creating explicit queries. They dynamically track users as their interests change. However, such systems require an initial learning phase where behaviour information is built up to form a user profile. During this initial learning phase performance is often poor due to the lack of user information; this is known as the cold-start problem [17].

There has been increasing interest in developing and using tools for creating annotated content and making it available over the semantic web. Ontologies are one such tool, used to maintain and provide access to specific knowledge repositories. Such sources could complement the behavioural information held within recommender systems, by providing some initial knowledge about users and their domains of interest. It should thus be possible to bootstrap the initial learning phase of a recommender system with such knowledge, easing the cold-start problem.

In return for any bootstrap information the recommender system could provide details of dynamic user interests to the ontology. This would reduce the effort involved in acquiring and maintaining knowledge of people’s research interests. To this end we investigate the integration of three components: Quickstep, a web-based recommender system; an ontology for the academic domain; and OntoCoPI, a community of practice identifier that can pick out similar users.

2. RECOMMENDER SYSTEMS
People may find articulating what they want hard, but they are good at recognizing it when they see it. This insight has led to the utilization of relevance feedback [24], where people rate web pages as interesting or not interesting and the system tries to find pages that match the interesting, positive examples and do not match the not interesting, negative examples.
With sufficient positive and negative examples, modern machine learning techniques can classify new pages with impressive accuracy. Such systems are called content-based recommender systems.

Another way to recommend pages is based on the ratings of other people who have seen the page before. Collaborative recommender systems do this by asking people to explicitly rate pages and then recommending new pages that similar users have rated highly. The problem with collaborative filtering is that there is no direct reward for providing examples, since they only help other people. This leads to initial difficulties in obtaining a sufficient number of ratings for the system to be useful.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission by the authors.
Semantic Web Workshop 2002, Hawaii, USA
Copyright by the authors
Hybrid systems, attempting to combine the advantages of content-based and collaborative recommender systems, have proved popular to date. The feedback required for content-based recommendation is shared, allowing collaborative recommendation as well. We use the Quickstep [18] hybrid recommender system in this paper to recommend on-line research papers.

2.1 The Cold-start Problem
One difficult problem commonly faced by recommender systems is the cold-start problem [17], where recommendations are required for new items or users for whom little or no information has yet been acquired. Poor performance resulting from a cold start can deter user uptake of a recommender system. This effect is thus self-destructive: the recommender never achieves good performance because users never use it for long enough. We will examine two types of cold-start problem.

The new-system cold-start problem is where there are no initial ratings by users, and hence no profiles of users. In this situation most recommender systems have no basis on which to recommend, and hence perform very poorly.

The new-user cold-start problem is where the system has been running for a while and a set of user profiles and ratings exist, but no information is available about a new user. Most recommender systems perform poorly in this situation too.

Collaborative recommender systems fail to help in cold-start situations, as they cannot discover similar user behaviour because there is not enough previously logged behaviour data upon which to base any correlations. Content-based and hybrid recommender systems perform a little better since they need just a few examples of user interest in order to find similar items.

No recommender system can cope alone with a total cold start, however, since even content-based recommenders require a small number of examples on which to base recommendations. We propose to link together a recommender system and an ontology to address this problem.
The ontology can provide a variety of information on users and their publications. Publications provide important information about what interests a user has had in the past, and so provide a basis upon which to create initial profiles that can address the new-system cold-start problem. Personnel records allow similar users to be identified. This will address the new-user cold-start problem by providing a set of similar users on which to base a new-user profile.

3. ONTOLOGIES
An ontology is a conceptualisation of a domain into a human-understandable, but machine-readable format consisting of entities, attributes, relationships, and axioms [12]. Ontologies can provide a rich conceptualisation of the working domain of an organisation, representing the main concepts and relationships of the work activities. These relationships could represent isolated information such as an employee’s home phone number, or they could represent an activity such as authoring a document, or attending a conference.

In this paper we use the term ontology to refer to the classification structure and instances within the knowledge base.

The ontology used in our work is designed to represent the academic domain, and was developed by Southampton’s AKT team (Advanced Knowledge Technologies [20]). It models people, projects, papers, events and research interests. The ontology itself is implemented in Protégé 2000 [10], a graphical tool for developing knowledge-based systems. It is populated with information extracted automatically from a departmental personnel database and publication database. The ontology consists of around 80 classes, 40 slots and over 13000 instances, and is focused on people, projects, and publications.

3.1 The Interest-acquisition Problem
People’s areas of expertise and interests are an important type of knowledge for many applications, for example expert finders [9].
Semantic web technology can be a good source of such information, but usually requires substantial maintenance to keep the web pages up-to-date. The majority of web pages receive little maintenance, holding information that does not date quickly. Since interests and areas of expertise are dynamic in nature they are not often held within web pages. It is thus particularly difficult for an ontology to acquire such information; this is the interest-acquisition problem.

Many existing systems force users to perform self-assessment to gather such information, but this has numerous disadvantages [5]. Lotus have developed a system that monitors user interaction with a document to capture interests and expertise [16]. Their system does not, however, consider the online documents that users browse.

This paper investigates linking an ontology with a recommender system to help overcome the interest-acquisition problem. The recommender system will regularly provide the ontology with interest profiles for users, obtained by monitoring user web browsing and analysing feedback on recommended research papers.

4. RELATED WORK
Collaborative recommender systems utilize user ratings to recommend items liked by similar people. PHOAKS [26] is an example of a collaborative filtering system, recommending web links mentioned in newsgroup articles. Only newsgroups with at least 20 posted web links are considered by PHOAKS, avoiding the cold-start problems associated with newer newsgroups containing fewer messages. GroupLens [14] is an alternative example, recommending newsgroup articles. GroupLens reports two cold-start problems in its experimental analysis: users abandoned the system before they had provided enough ratings to receive recommendations, and early adopters of the system received poor recommendations until enough ratings were gathered.
These systems are typical of collaborative recommenders, where a cold start makes early recommendation poor until sufficient people have provided ratings.

Content-based recommender systems recommend items with similar content to things the user has liked before. An example of a content-based recommender is Fab [4], which recommends web pages. Fab needs a few early ratings from each user in order to create a training set. ELFI [25] is another content-based recommender, recommending funding information from a database. ELFI observes users using a database and infers both positive and negative examples of interest from this behaviour. Both these systems are typical of content-based recommender systems, requiring users to use the system for an initial period of time before the cold-start problem is overcome.
Personal web-based agents such as Letizia [15], Syskill & Webert [21] and Personal Webwatcher [19] track the user’s browsing and formulate user profiles. Profiles are constructed from positive and negative examples of interest, obtained from explicit feedback or heuristics analysing browsing behaviour. They then suggest which links are worth following from the current web page by recommending page links most similar to the user’s profile. Just like a content-based recommender system, a few examples of interest must be observed or elicited from the user before a useful profile can be constructed.

Ontologies can be used to improve content-based search, as seen in OntoSeek [13]. Users of OntoSeek navigate the ontology in order to formulate queries. Ontologies can also be used to automatically construct knowledge bases from web pages, such as in Web-KB [8]. Web-KB takes manually labelled examples of domain concepts and applies machine-learning techniques to classify new web pages. Neither system, however, captures dynamic information such as user interests.

Also of relevance are systems such as CiteSeer [6], which use content-based similarity matching to help search for interesting research papers within a digital library.

5. THE QUICKSTEP RECOMMENDER SYSTEM
Quickstep [18] is a hybrid recommender system, addressing the real-world problem of recommending on-line research papers to researchers. User browsing behaviour is unobtrusively monitored via a proxy server, logging each URL browsed during normal work activity. A nearest-neighbour algorithm classifies browsed URLs based on a training set of labelled example papers, storing each new paper in a central database. The database of known papers grows over time, building a shared pool of knowledge. Explicit feedback and browsed URLs form the basis of the interest profile for each user. Figure 1 shows an overview of the Quickstep system.
[Figure 1: block diagram linking the World Wide Web, Users, Profiles, Classifier, Recommender and Classified papers]

Figure 1. The Quickstep recommender system

Each day a set of recommendations is computed, based on correlations between user interest profiles and classified paper topics. Any feedback offered by users on these recommendations is recorded when the user looks at them. Users can provide new examples of topics and correct paper classifications where wrong. In this way the training set, and hence classification accuracy, improves over time.

Quickstep bases its user interest profiles on an ontology of research paper topics. This allows inferences from the ontology to assist profile generation; in our case topic inheritance is used to infer interest in super-classes of specific topics. Sharing interest profiles with the AKT ontology is not difficult since they are explicitly represented using ontological terms.

Previous trials [18] of Quickstep used hand-crafted initial profiles, based on interview data, to cope with the cold-start problem. Linking Quickstep with the AKT ontology automates this process, allowing a more realistic cold-start solution that will scale to larger numbers of users.

5.1 Paper classification algorithm
Every research paper within Quickstep’s central database is represented using a term frequency vector. Terms are single words within the document, so term frequency vectors are computed by counting the number of times words appear within the paper. Each dimension within a vector represents a term. Dimensionality reduction on vectors is achieved by removing common words found in a stop-list and stemming words using the Porter [22] stemming algorithm. Quickstep uses vectors with 10,000-15,000 dimensions.

Once added to the database, papers are classified using an IBk [1] classifier boosted by the AdaBoostM1 [11] algorithm.
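
The term frequency representation described in this section can be illustrated with a minimal sketch. It is not Quickstep's code: the stop-list is a tiny hypothetical one, and `crude_stem` is only a stand-in for the Porter stemmer, which is too long to reproduce here. The function names are likewise assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-list; the list Quickstep uses is not given.
STOP_WORDS = frozenset({"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"})


def crude_stem(word):
    """Stand-in for the Porter stemmer: strips a few common suffixes only."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_frequency_vector(text, stem=crude_stem):
    """Count stemmed, non-stop-list words; each distinct term is one dimension."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(stem(w) for w in words if w not in STOP_WORDS)


vector = term_frequency_vector(
    "Agents are learning; the agent learned to recommend papers."
)
```

Storing the vector sparsely, as a mapping from term to count, avoids materialising thousands of mostly zero dimensions; a real stemmer could be passed in through the `stem` parameter.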
The IBk classifier is a k-Nearest Neighbour type classifier that uses example documents, called a training set, added to a vector space. Figure 2 shows the basic k-Nearest Neighbour algorithm. The closeness of an unclassified vector to its neighbours within the vector space determines its classification.

$$w(d_a, d_b) = \sqrt{\sum_{j=1}^{T} (t_{ja} - t_{jb})^2}$$

where
$w(d_a, d_b)$: kNN distance between documents a and b
$d_a, d_b$: document vectors
$T$: number of terms in document set
$t_{ja}$: weight of term j in document a

Figure 2. k-Nearest Neighbour algorithm

Classifiers like k-Nearest Neighbour allow more training examples to be added to their vector space without the need to rebuild the entire classifier. They also degrade well, so even when incorrect the class returned is normally in the right “neighbourhood” and so at least partially relevant. This makes k-Nearest Neighbour a robust choice of algorithm for this task.

Boosting works by repeatedly running a weak learning algorithm on various distributions of the training set, and then combining the classifiers produced by the weak learner into a single composite classifier. The “weak” learning algorithm here is the IBk classifier. Figure 3 shows the AdaBoostM1 algorithm.
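
Returning to the k-Nearest Neighbour step of Figure 2, the distance measure and a simple classification rule could be sketched as follows. This is an illustration under stated assumptions, not the IBk implementation: the majority vote, the value k=3, the sparse dictionary vectors and the toy training set are all hypothetical choices, and boosting is left out.

```python
import math
from collections import Counter


def distance(doc_a, doc_b):
    """Euclidean distance between two sparse term-weight vectors (Figure 2)."""
    terms = set(doc_a) | set(doc_b)
    return math.sqrt(sum((doc_a.get(t, 0) - doc_b.get(t, 0)) ** 2 for t in terms))


def classify(unlabelled, training_set, k=3):
    """Return the majority topic among the k training vectors nearest to `unlabelled`.

    `training_set` is a list of (vector, topic) pairs; the voting rule is an assumption.
    """
    nearest = sorted(training_set, key=lambda pair: distance(unlabelled, pair[0]))[:k]
    votes = Counter(topic for _, topic in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: term -> weight dictionaries with hand-assigned topics.
training = [
    ({"agent": 3, "negotiate": 1}, "Agents"),
    ({"agent": 2, "auction": 2}, "Agents"),
    ({"neural": 3, "train": 2}, "Machine Learning"),
    ({"classifier": 2, "train": 3}, "Machine Learning"),
]

new_paper = {"agent": 2, "negotiate": 2}
topic = classify(new_paper, training, k=3)
```

Because the training set is just a list, a newly labelled example can be appended without rebuilding anything, which mirrors the incremental property noted in the text.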
Initialise all values of $D$ to $1/N$
Do for $t = 1..T$
    call weak-learn($D_t$)
    calculate error $e_t$
    calculate $\beta_t = e_t / (1 - e_t)$
    calculate $D_{t+1}$

$$\text{classifier} = \arg\max_{c \in C} \sum_{t \,:\, \text{result class on iteration } t \text{ is } c} \log \frac{1}{\beta_t}$$

$D_t$   class weight distribution on iteration t
$N$   number of classes
$T$   number of iterations
weak-learn($D_t$)   weak learner with distribution $D_t$
$e_t$   weak-learn error on iteration t
$\beta_t$   error adjustment value on iteration t
classifier   final boosted classifier
$C$   all classes

Figure 3. AdaBoostM1 boosting algorithm

AdaBoostM1 has been shown [11] to improve the performance of weak learner algorithms, particularly for stronger learning algorithms like k-Nearest Neighbour. It is thus a sensible choice to boost our IBk classifier.

5.2 User profiling algorithm

The profiling algorithm performs correlation between paper topic classifications and user browsing logs. Whenever a research paper is browsed that has been classified as belonging to a topic, it accumulates an interest score for that topic. Explicit feedback on recommendations also accumulates interest value for topics. The current interest of a topic is computed using the inverse time weighting algorithm shown in Figure 4.
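The inverse time weighting described above can be sketched as follows, using the interest values listed in Figure 4. The event tuple layout, the integer day numbers and the one-day minimum age (to avoid division by zero for events logged today) are assumptions for illustration, not details given in the paper.

```python
from collections import defaultdict

# Interest values as listed in Figure 4.
INTEREST_VALUES = {
    "paper_browsed": 1,
    "recommendation_followed": 2,
    "topic_rated_interesting": 10,
    "topic_rated_not_interesting": -10,
}


def topic_interest(events, today):
    """Sum interest value / days old over all logged events, per topic.

    events is a list of (topic, event_type, day) tuples, where day and today
    are integer day numbers (an assumed representation).
    """
    interest = defaultdict(float)
    for topic, event_type, day in events:
        # Assumption: an event logged today counts as one day old.
        days_old = max(today - day, 1)
        interest[topic] += INTEREST_VALUES[event_type] / days_old
    return dict(interest)


# Hypothetical log for one user.
log = [
    ("Agents", "paper_browsed", 8),
    ("Agents", "recommendation_followed", 9),
    ("Ontologies", "topic_rated_not_interesting", 5),
]
profile = topic_interest(log, today=10)
```

Older events contribute less because each value is divided by its age, so a profile built this way drifts towards the user's recent behaviour.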
$$\text{Topic interest} = \sum_{n=1}^{\text{no. of instances}} \frac{\text{Interest value}(n)}{\text{days old}(n)}$$

Interest values:
Paper browsed = 1
Recommendation followed = 2
Topic rated interesting = 10
Topic rated not interesting = -10

Figure 4. Profiling algorithm

An is-a hierarchy of research paper topics is held so that super-class relationships can be used to infer broader topic interest. When a specific topic is browsed, fractional interest is inferred for each super-class of that topic, using a $1/2^{\text{level}}$ weighting, where 'level' refers to how many classes up the is-a tree the super-class is from the original topic. Figure 5 shows a section from the research paper topic ontology.

[Figure 5: is-a hierarchy diagram. Topic labels shown: Artificial Intelligence, Hypermedia, E-Commerce, Interface Agents, Mobile Agents, Multi-Agent-Systems, Recommender Systems, Agents, Belief Networks, Fuzzy, Game Theory, Genetic Algorithms, Genetic Programming, Knowledge Representation, Information Filtering, Information Retrieval, Machine Learning, Natural Language, Neural Networks, Philosophy [AI], Robotics [AI], Speech [AI], Vision [AI], Text Classification, Ontologies, Adaptive Hypermedia, Hypertext Design, Industrial Hypermedia, Literature [hypermedia], Open Hypermedia, Spatial Hypertext, Taxonomic Hypertext, Visualization [hypertext], Web [hypermedia], Content-Based Navigation, Architecture [open hypermedia].]

Figure 5. Section of the research paper topic ontology

5.3 Recommendation algorithm

Recommendations are formulated from a correlation between the user's current topics of interest and papers classified as belonging to those topics. A paper is only recommended if it does not appear in the user's browsed URL log, ensuring that recommendations have not been seen before. For each user, the top three interesting topics are selected with 10 recommendations made in total.
Papers are ranked in order of the recommendation confidence before being presented to the user. Figure 6 shows the recommendation algorithm.

Recommendation confidence = classification confidence * topic interest value

Figure 6. Recommendation algorithm

6. ONTOCOPI

The Ontology-based Communities of Practice Identifier (OntoCoPI) [2] is an experimental system that uses the AKT ontology to help identify communities of practice (CoP). The community of practice of a person is taken here to be the closest group of people, based on specific features they have in common with that given person. A community of practice is thus an informal group of people who share some common interest in a particular practice [7] [27]. Workplace communities of practice improve organisational performance by maintaining implicit knowledge, helping the spread of new ideas and solutions, acting as a focus for innovation and driving organisational strategy.

Identifying communities of practice is an essential first step to understanding the knowledge resources of an organisation [28]. Organisations can bring the right people together to help the identified communities of practice to flourish and expand, for example by providing them with appropriate infrastructure and giving them support and recognition. However, community of practice identification is currently a resource-heavy process
largely based on interviews, mainly because of the informal nature of such community structures, which are normally hidden within and across organisations.

OntoCoPI is a tool that uses ontology-based network analysis to support the task of community of practice identification. A breadth-first spreading activation algorithm is applied by OntoCoPI to crawl the ontology network of instances and relationships to extract patterns of certain relations between entities relating to a community of practice. The crawl can be limited to a given set of ontology relationships. These relationships can be traced to find specific information, such as who attended the same events, who co-authored papers and who are members of the same project or organisation. Communities of practice are based on informal sets of relationships, while ontologies are normally made up of formal relationships. The hypothesis underlying OntoCoPI is that some informal relationships can be inferred from the presence of formal ones. For instance, if A and B have no formal relationships, but they have both authored papers with C, then that could indicate a shared interest.

One of the advantages of using an ontology to identify communities of practice, rather than other traditional information networks [3], is that relationships can be selected according to their semantics, and can have different weights to reflect relative importance. For example, the relations of document authorship and project membership can be selected if it is required to identify communities of practice based on publications and project work. OntoCoPI allows manual selection of relationships or automatic selection based on the frequency of relationship use within the knowledge base. Selecting the right relationships and weights is an experimental process that is dependent on the ontology structure, the type and amount of information in the ontology, and the type of community of practice required.
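To make the idea of a relationship-limited, weighted breadth-first crawl concrete, a rough sketch is given below. It is not OntoCoPI's actual algorithm, whose details are not given here: the triple representation, the multiplicative weighting, the depth limit and all names are hypothetical.

```python
from collections import defaultdict, deque


def spread_activation(triples, start, relation_weights, max_depth=4):
    """Breadth-first spreading activation over (source, relation, target) triples.

    Only relations present in relation_weights are traversed (links are treated
    as undirected). Activation is multiplied by the relation weight at each
    step and accumulated on every node reached; each node is expanded once.
    """
    neighbours = defaultdict(list)
    for source, relation, target in triples:
        if relation in relation_weights:
            weight = relation_weights[relation]
            neighbours[source].append((target, weight))
            neighbours[target].append((source, weight))

    activation = defaultdict(float)
    visited = {start}
    queue = deque([(start, 1.0, 0)])
    while queue:
        node, value, depth = queue.popleft()
        if depth == max_depth:
            continue
        for other, weight in neighbours[node]:
            if other == start:
                continue
            passed = value * weight
            activation[other] += passed
            if other not in visited:
                visited.add(other)
                queue.append((other, passed, depth + 1))
    return dict(activation)


# Hypothetical instances: A and B never co-author, but both write with C.
triples = [
    ("A", "author_of", "P1"),
    ("C", "author_of", "P1"),
    ("B", "author_of", "P2"),
    ("C", "author_of", "P2"),
]
scores = spread_activation(triples, "A", {"author_of": 0.5})
people = {name: scores.get(name, 0.0) for name in ("B", "C")}
ranked = sorted(people, key=people.get, reverse=True)
```

In this toy network C ranks above B for person A, and B is reached only through the shared co-author, which is the kind of inferred informal link the hypothesis above describes. Restricting `relation_weights` to other relations removes that path entirely.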
When working with a new community of practice, some experiments will be needed to see which relationships are relevant to the desired community of practice, and how to set relative weights. In the experiments described in this paper, certain relationships were selected manually and weighted based on our preferences. Further trials are needed to determine the most effective selection.

7. INTEGRATION OF THE TWO TECHNOLOGIES

We have investigated the integration of the ontology, OntoCoPI and the Quickstep recommender system to provide a solution to both the cold-start problem and the interest-acquisition problem. Figure 7 shows our experimental systems after integration.

[Figure 7: diagram linking the AKT Ontology, Quickstep and OntoCoPI. Labelled flows: user interest profiles, user publications, user and domain knowledge, communities of practice.]

Figure 7. Ontology and recommender system integration

Upon start-up, the ontology provides the recommender system with an initial set of publications for each of its registered users. Each user's known publications are then correlated with the recommender system's classified paper database, and a set of historical interests is compiled for that user. These historical interests form the basis of an initial profile, overcoming the new-system cold-start problem. Figure 8 details the initial profile algorithm. As per the Quickstep profiling algorithm, fractional interest in a topic's super-classes is inferred when a specific topic is added.

$$\text{topic interest}(t) = \sum_{n=1}^{\text{publications belonging to class } t} \frac{1}{\text{publication age}(n)}$$

t = <research paper topic>
new-system initial profile = (t, topic interest(t))*

Figure 8. New-system initial profile algorithm

When the recommender system is up and running and a new user is added, the ontology provides the historical publication list of the new user and the OntoCoPI system provides a ranked list of similar users. The initial profile of the new user is formed from a correlation between historical publications and any similar user profiles. This algorithm is detailed in Figure 9, and addresses the new-user cold-start problem.
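The new-system initial profile of Figure 8, together with the super-class inference borrowed from the Quickstep profiling algorithm, might be sketched as below. The unit of publication age (years, with a minimum of one), the `parent_of` mapping and the example hierarchy are assumptions; the paper does not specify them.

```python
from collections import defaultdict


def new_system_initial_profile(publications, parent_of, current_year):
    """Build (topic, topic interest) pairs from a user's classified publications.

    publications is a list of (topic, year) pairs. parent_of maps a topic to
    its super-class in an acyclic is-a hierarchy (root topics are absent).
    """
    profile = defaultdict(float)
    for topic, year in publications:
        # Assumption: age is measured in years, with a minimum of one.
        age = max(current_year - year, 1)
        interest = 1.0 / age
        profile[topic] += interest
        # Fractional interest for each super-class, weighted by 1/2^level.
        level = 1
        parent = parent_of.get(topic)
        while parent is not None:
            profile[parent] += interest / (2 ** level)
            level += 1
            parent = parent_of.get(parent)
    return dict(profile)


# Hypothetical hierarchy and publication list for one user.
parent_of = {
    "Recommender Systems": "Agents",
    "Agents": "Artificial Intelligence",
}
publications = [("Recommender Systems", 2000), ("Recommender Systems", 1998)]
initial = new_system_initial_profile(publications, parent_of, current_year=2002)
```

Recent publications dominate the profile, and the broader topics receive progressively smaller shares, so a user with no browsing history still starts with a graded set of interests.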