Ontological User Profiling in Recommender Systems

STUART E. MIDDLETON, NIGEL R. SHADBOLT AND DAVID C. DE ROURE
Intelligence, Agents, Multimedia Group, University of Southampton
________________________________________________________________________
We explore a novel ontological approach to user profiling within recommender systems, working on the problem of recommending on-line academic research papers. Our two experimental systems, Quickstep and Foxtrot, create user profiles from unobtrusively monitored behaviour and relevance feedback, representing the profiles in terms of a research paper topic ontology. A novel profile visualization approach is taken to acquire profile feedback. Research papers are classified using ontological classes, and collaborative recommendation algorithms are used to recommend papers seen by similar people on their current topics of interest. Two small-scale experiments, with 24 subjects over 3 months, and a large-scale experiment, with 260 subjects over an academic year, are conducted to evaluate different aspects of our approach. Ontological inference is shown to improve user profiling, external ontological knowledge is used to successfully bootstrap a recommender system, and profile visualization is employed to improve profiling accuracy. The overall performance of our ontological recommender systems is also presented and favourably compared to other systems in the literature.
Categories and Subject Descriptors: I.2.6 [Artificial Intelligence]: Learning - Knowledge acquisition; I.2.11 [Artificial Intelligence]: Distributed Artificial Intelligence - Intelligent agents; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Information filtering, Relevance feedback
General Terms: Algorithms, Measurement, Design, Experimentation
Additional Key Words and Phrases: Agent, Machine learning, Ontology, Personalization, Recommender systems, User profiling, User modelling
________________________________________________________________________
This research was supported by EPSRC studentship award number 99308831 and the Interdisciplinary Research Collaboration in Advanced Knowledge Technologies (AKT) project GR/N15764/01.
Authors' addresses: Intelligence, Agents, Multimedia Group, Department of Electronics and Computer Science, University of Southampton, Southampton, SO17 1BJ, UK. Authors' email: {sem99r,nrs,dder}@ecs.soton.ac.uk.
Submitted 3/10/02, Revision 6/4/03, Final revision 29/9/03
Permission to make digital/hard copy of part of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.
© 2001 ACM 1073-0516/01/0300-0034 $5.00
________________________________________________________________________

1. INTRODUCTION

The mass of content available on the World-Wide Web raises important questions over its effective use. The web is largely unstructured, with pages authored by many people on a diverse range of topics, making simple browsing too time-consuming to be practical. Web page filtering has thus become necessary for most web users.

Search engines are effective at filtering pages that match explicit queries. Unfortunately, people find articulating what they want explicitly difficult, especially if forced to use a limited vocabulary such as keywords. As such, search queries are often
poorly formulated, and result in large lists of search results that contain only a handful of useful pages.

The semantic web offers the potential for help, allowing more intelligent search queries based on web pages marked up with semantic metadata. Semantic web technology is, however, very dependent on the degree to which authors annotate their web pages, and automatic web page annotation is still in its infancy. Annotation requires selflessness in authors because the annotations provided will only help other people searching their web pages. Because of this, the vast majority of web pages are not annotated, and in the foreseeable future will remain so. The semantic web can thus only be of limited benefit to the problem of effective searching.

Recommender systems go some way to addressing these issues. We present a novel ontological approach to user profiling within recommender systems. Two recommender systems are built, called Quickstep and Foxtrot, and three experiments are conducted to evaluate different aspects of their performance. Quickstep uses ontological inference to improve profiling accuracy and integrates an external ontology for profile bootstrapping. Foxtrot enhances the Quickstep system by employing the novel idea of visualizing user profiles to acquire direct profile feedback.

This section discusses our chosen problem domain and our general approach to ontological recommendation, along with related work. In section 2 we describe the Quickstep recommender system and an experiment to show how inference can improve user profiling and hence recommendation accuracy. Section 3 details an integration between the Quickstep recommender system and an external ontology, along with an experiment to demonstrate its effectiveness at bootstrapping profiles. In section 4 the Foxtrot recommender system is described, with an experiment to demonstrate how profile visualization can be used to acquire feedback and hence improve profile accuracy. Lastly, in section 5 we bring this work together, collating the evidence found to support an ontological approach to user profiling within recommender systems, and discuss future work.

1.1 Recommender systems

People find articulating what they want hard, but they are very good at recognizing it when they see it. This insight has led to the utilization of relevance feedback, where people rate web pages as ‘interesting’ or ‘not interesting’ and the system tries to find pages that match the ‘interesting’, positive examples and do not match the ‘not interesting’, negative examples. With sufficient positive and negative examples, modern machine learning techniques can classify new pages with impressive accuracy; in some
cases text classification accuracy exceeding human capability has been demonstrated [Larkey 1998].

Obtaining sufficient examples is difficult, however, especially when trying to obtain negative examples. The problem with asking people for examples is that the cost, in terms of time and effort, of providing the examples generally outweighs the reward people will eventually receive. Negative examples are particularly unrewarding, since there could be many irrelevant items to any typical query.

Unobtrusive monitoring provides positive examples of what the user is looking for, without interfering with the user’s normal work activity. Heuristics can also be applied to infer negative examples from observed behaviour, although generally with less confidence. This idea has led to content-based recommender systems, which unobtrusively watch user behaviour and recommend new items that correlate with a user’s profile.

Another way to recommend items is based on the ratings provided by other people who have liked the item before. Collaborative recommender systems do this by asking people to rate items explicitly and then recommend new items that similar users have rated highly. An issue with collaborative filtering is that there is no direct reward for providing examples since they only help other people. This leads to initial difficulties in obtaining a sufficient number of ratings for the system to be useful, a problem known as the cold-start problem [Maltz and Ehrlich 1995].

Hybrid systems, attempting to combine the advantages of content-based and collaborative recommender systems, have proved popular to date. The feedback required for content-based recommendation is shared, allowing collaborative recommendation as well.

1.2 User profiling

User profiling is typically either knowledge-based or behaviour-based. Knowledge-based approaches engineer static models of users and dynamically match users to the closest model. Questionnaires and interviews are often employed to obtain this user knowledge. Behaviour-based approaches use the user’s behaviour as a model, commonly using machine-learning techniques to discover useful patterns in the behaviour. Behavioural logging is employed to obtain the data from which to extract patterns. [Kobsa 1993] provides a good survey of user modelling techniques.

The user profiling approach used by most recommender systems is behaviour-based, commonly using a binary class model to represent what users find interesting and uninteresting. Machine-learning techniques are then used to find potential items of
interest with respect to the binary model. There are many effective machine learning algorithms based on two classes. A binary profile does not, however, lend itself to sharing examples of interest or integrating any domain knowledge that might be available. [Sebastiani 2002] provides a good survey of current machine learning techniques.

1.3 Ontologies

An ontology is a conceptualisation of a domain into a human-understandable, but machine-readable format consisting of entities, attributes, relationships, and axioms [Guarino and Giaretta 1995]. Ontologies can provide a rich conceptualisation of the working domain of an organisation, representing the main concepts and relationships of the work activities. These relationships could represent isolated information such as an employee’s home phone number, or they could represent an activity such as authoring a document, or attending a conference.

We use the term ontology to refer to the classification structure and instances within a knowledge base.

1.4 Problem domain

The web is increasingly becoming the primary source of research papers for the modern researcher. With millions of research papers available over the web from thousands of web sites, finding the right papers and being informed of newly available papers is a problematic task. Browsing this many web sites is too time-consuming, and search queries are only fully effective if an explicit search query can be formulated for what you need. All too often papers are missed.

We address the problem of recommending on-line research papers to the academic staff and students at the University of Southampton. Academics need to search for explicit research papers and be kept up-to-date on their own research areas when new papers are published. We examine an ontological recommender system approach to support these two activities. Unobtrusive monitoring methods are preferred because researchers have their normal work to perform and would not welcome interruptions from a new system. Very high accuracy on recommendations is not required since users will have the option to simply ignore poor recommendations.

Real world knowledge acquisition systems are both tricky and complex to evaluate [Shadbolt et al. 1999]. Many evaluations are performed with user log data, simulating real user activity, or with standard benchmark collections, such as newspaper articles over a period of one year, that provide a basis for comparison with other systems. Although these evaluations are useful, especially for technique comparison, it is important to back them up with real world studies so we can see how the benchmark tests generalize to the
real world setting. Similar problems are seen in the agent domain where, as Nwana [1996] argues, it has yet to be conclusively demonstrated that people really benefit from agent-based information systems. This is why a real problem has been chosen upon which to evaluate our work.

1.5 Related work

GroupLens [Konstan et al. 1997] is an example of a collaborative filter, recommending newsgroup articles based on a Pearson-r correlation of other users’ ratings. Fab [Balabanović and Shoham 1997] is a content-based recommender, recommending web pages based on a nearest-neighbour algorithm working with each individual user’s set of positive examples. The Quickstep and Foxtrot systems are hybrid recommender systems, combining both these types of approach.

Personal web-based agents such as NewsDude and Daily Learner [Billsus and Pazzani 2000], Personal WebWatcher [Mladenić 1996] and NewsWeeder [Lang 1995] build profiles from observed user behaviour. These systems filter news stories/web pages and recommend unseen ones based on content, using k-Nearest Neighbour, naïve Bayes and TF-IDF machine learning techniques. Individual sets of positive and negative examples are maintained for each user’s profile. In contrast, by using an ontology to represent user profiles we pool these limited training examples, sharing between users examples of each class.

Ontologies are used to improve content-based search, as seen in OntoSeek [Guarino et al. 1999]. Users of OntoSeek navigate the ontology in order to formulate queries. Ontologies are also used to automatically construct knowledge bases from web pages, such as in Web-KB [Craven et al. 1998]. Web-KB takes manually labelled examples of domain concepts and applies machine-learning techniques to classify new web pages. Neither system, however, captures dynamic information such as user interests.

Digital libraries, such as CiteSeer [Bollacker et al. 1998], classify and store research papers. Typically such libraries are manually created and manually categorized. While our systems are digital libraries, their content is dynamically and autonomously updated from the browsing behaviour of their users.

[Mladenić and Stefan 1999] provides a good survey of text-learning and agent systems, including content-based and collaborative approaches. The systems most related to Quickstep and Foxtrot are Entrée [Burke 2000], which uses a knowledge base and case-based reasoning to recommend restaurant data, and RAAP [Delgado et al. 1998], which uses simple categories to represent user profiles with unshared individual training sets for each user. None of these systems use an ontology to explicitly represent user profiles.
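
The collaborative filtering idea described in this section, recommending items on the basis of a Pearson-r correlation between users' ratings, can be illustrated with a minimal Python sketch. It is not code from GroupLens, Quickstep or Foxtrot. The function names, the dictionary-of-ratings layout and the sample ratings are all assumptions made for the example, and the prediction rule (the target user's mean rating plus a correlation-weighted average of other users' deviations from their own means) is one common formulation, not a description of any cited system.

```python
from math import sqrt


def pearson_r(ratings_a, ratings_b):
    """Pearson-r correlation computed over the items both users have rated.

    Each argument is a dict mapping item id -> numeric rating (assumed layout).
    Returns 0.0 when the correlation is undefined (fewer than two co-rated
    items, or one user gave identical ratings to all co-rated items).
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    a = [ratings_a[item] for item in common]
    b = [ratings_b[item] for item in common]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    variance_a = sum((x - mean_a) ** 2 for x in a)
    variance_b = sum((y - mean_b) ** 2 for y in b)
    if variance_a == 0 or variance_b == 0:
        return 0.0
    return covariance / sqrt(variance_a * variance_b)


def predict_rating(ratings, target, item):
    """Predict the target user's rating for an item they have not rated.

    Assumes the target has rated at least one item. Falls back to the
    target's own mean rating when no correlated user has rated the item.
    """
    target_ratings = ratings[target]
    target_mean = sum(target_ratings.values()) / len(target_ratings)
    weighted_sum = 0.0
    weight_total = 0.0
    for user, user_ratings in ratings.items():
        if user == target or item not in user_ratings:
            continue
        weight = pearson_r(target_ratings, user_ratings)
        if weight == 0.0:
            continue
        user_mean = sum(user_ratings.values()) / len(user_ratings)
        # Negative weights invert the other user's deviation from their mean.
        weighted_sum += weight * (user_ratings[item] - user_mean)
        weight_total += abs(weight)
    if weight_total == 0.0:
        return target_mean
    return target_mean + weighted_sum / weight_total


# Hypothetical sample data: three users rating papers on a 1-5 scale.
ratings = {
    "alice": {"p1": 5, "p2": 3, "p3": 4},
    "bob": {"p1": 4, "p2": 2, "p3": 3, "p4": 5},
    "carol": {"p1": 1, "p2": 5, "p3": 2, "p4": 1},
}
```

With this sample data, alice and bob rank their co-rated papers identically, so `pearson_r(ratings["alice"], ratings["bob"])` is 1.0, whereas alice and carol are negatively correlated. Calling `predict_rating(ratings, "alice", "p4")` therefore yields a prediction above alice's mean rating. The sketch also makes the cold-start problem visible: a user with no co-rated items correlates with nobody and simply receives their own mean.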