Capturing knowledge of user preferences: ontologies in recommender systems

Stuart E. Middleton, David C. De Roure and Nigel R. Shadbolt
Department of Electronics and Computer Science
University of Southampton
Southampton, SO17 1BJ, UK
Email: {sem99r,dder,nrs}@ecs.soton.ac.uk

ABSTRACT

Tools for filtering the World Wide Web exist, but they are hampered by the difficulty of capturing user preferences in such a dynamic environment. We explore the acquisition of user profiles by unobtrusive monitoring of browsing behaviour and application of supervised machine-learning techniques coupled with an ontological representation to extract user preferences. A multi-class approach to paper classification is used, allowing the paper topic taxonomy to be utilised during profile construction. The Quickstep recommender system is presented, and two empirical studies evaluate it in a real work setting, measuring the effectiveness of using a hierarchical topic ontology compared with an extendable flat list.

Keywords

Ontology, recommender system, user profiling, machine learning

INTRODUCTION

The mass of content available on the World-Wide Web raises important questions over its effective use. With largely unstructured pages authored by a massive range of people on a diverse range of topics, simple browsing has given way to filtering as the practical way to manage web-based information, and for most of us that means search engines.

Search engines are very effective at filtering pages that match explicit queries. Unfortunately, most people find articulating what they want extremely difficult, especially if forced to use a limited vocabulary such as keywords. The result is large lists of search results that contain only a handful of useful pages, defeating the purpose of filtering in the first place.

Recommender Systems Can Help

People may find articulating what they want hard, but they are very good at recognizing it when they see it. This insight has led to the utilization of relevance feedback, where people rate web pages as interesting or not interesting, and the system tries to find pages that match the interesting examples (positive examples) and do not match the not interesting examples (negative examples). With sufficient positive and negative examples, modern machine learning techniques can classify new pages with impressive accuracy.

Obtaining sufficient examples is difficult, however, especially when trying to obtain negative examples. The problem with asking people for examples is that the cost, in terms of time and effort, of providing them generally outweighs the reward users will eventually receive. Negative examples are particularly unrewarding, since there could be many irrelevant items for any typical query.

Unobtrusive monitoring provides positive examples of what the user is looking for, without interfering with the user's normal activity. Heuristics can also be applied to infer negative examples, although generally with less confidence. This idea has led to content-based recommender systems, which unobtrusively watch users browse the web and recommend new pages that correlate with a user profile.

Another way to recommend pages is based on the ratings of other people who have seen the page before. Collaborative recommender systems do this by asking people to explicitly rate pages, and then recommending new pages that similar users have rated highly. The problem with collaborative filtering is that there is no direct reward for providing examples, since they only help other people.
This leads to initial difficulties in obtaining a sufficient number of ratings for the system to be useful.

Hybrid systems, attempting to combine the advantages of content-based and collaborative recommender systems, have proved popular to date. The feedback required for content-based recommendation is shared, allowing collaborative recommendation as well. A hybrid approach is used by our Quickstep recommender system.

This work follows the tradition of over 30 years of knowledge acquisition. Knowledge acquisition above the normal workflow is intrusive and counterproductive. We present a system with a low level of intrusiveness, driven by people making explicit choices that reflect the real world, to capture profiles.
The Problem Domain

As the trend to publish research papers on-line increases, researchers are increasingly using the web as their primary source of papers. Typical researchers need to know about new papers in their general field of interest, and about older papers relating to their current work. In addition, researchers' time is limited, as browsing competes with other tasks in the work place. It is this problem our Quickstep recommender system addresses.

Since researchers have their usual work to perform, unobtrusive monitoring methods are preferred; otherwise they will be reluctant to use the system. Also, very high recommendation accuracy is not critical as long as the system is deemed useful to them.

Evaluation of real world knowledge acquisition systems, as Shadbolt [21] discusses, is both tricky and complex. Many evaluations are performed with user log data (simulating real user activity) or with standard benchmark collections. Although these evaluations are useful, especially for technique comparison, they must be backed up by real world studies so we can see how the benchmark tests generalize to the real world setting. Similar problems are seen in the agent domain where, as Nwana [16] argues, it has yet to be conclusively demonstrated whether people really benefit from such information systems. This is why we have chosen a real problem upon which to evaluate our Quickstep recommender system.

User Profiling in Recommender Systems

User modelling is typically either knowledge-based or behaviour-based. Knowledge-based approaches engineer static models of users and dynamically match users to the closest model. Behaviour-based approaches use the user's behaviour itself as a model, often using machine-learning techniques to discover useful patterns of behaviour. Kobsa [10] provides a good survey of user modelling techniques.

The typical user profiling approach for recommender systems is behaviour-based, using a binary model representing what users find interesting and uninteresting. Machine-learning techniques are then used to assess potential items of interest with respect to the binary model. There are many effective machine learning algorithms based on two classes. Sebastiani [20] provides a good survey of current machine learning techniques, and De Roure [5] a review of recommender systems.

Although more difficult than the binary case, we chose to use a multi-class behavioural model. This allows the classes to represent paper topics, and hence domain knowledge to be used when constructing the user profile. We thus bring together ideas from knowledge-based and behaviour-based modelling to address the problem domain.

Ontology Use and the World Wide Web

Ontologies are used both to structure the web, as in Yahoo's search space categorization, and to provide a common basis for understanding between systems, such as in the knowledge query modelling language (KQML). In-depth ontological representations are also seen in knowledge-based systems, which use relationships between web entities (bookmarks, web pages, page authors etc.) to infer facts about given situations. We use an ontology to investigate how domain knowledge can help in the acquisition of user preferences.

Overview of the Quickstep System

Quickstep unobtrusively monitors user browsing behaviour via a proxy server, logging each URL browsed during normal work activity. A machine-learning algorithm classifies browsed URLs overnight, and saves each classified paper in a central paper store.
Explicit feedback and browsed topics form the basis of the interest profile for each user.

Each day a set of recommendations is computed, based on correlations between user interest profiles and classified paper topics. Any feedback offered on these recommendations is recorded when the user looks at them. Users can provide new examples of topics and correct paper classifications where wrong. In this way the training set improves over time.

[Figure 1: The Quickstep system]

Empirical Evaluation

The current literature lacks many clear results as to the extent to which knowledge-based approaches assist real-world systems, where noisy data and differing user opinions exist. For this reason we decided to compare the use of an ontology against a simple flat list, to provide some empirical evidence as to the effectiveness of this approach.

Two experiments are detailed within this paper. The first has 14 subjects, all using the Quickstep system for a period of 1.5 months. The second has 24 subjects, again over a period of 1.5 months.

Both experiments divide the subjects into two groups. The first group uses a flat, extensible list of paper topics. Any new examples, added via explicit feedback, use this flat list to select from.
The users are free to add to the list as needed. The second group uses a fixed-size topic ontology (based on the dmoz open directory project hierarchy [6]). Topics are selected from a hierarchical list based on the ontology. Interest profiles of this group take into account the super classes of any browsed topics.

Performance metrics are measured over the duration of the trial, so the effectiveness of the two approaches can be compared.

APPROACH

The Quickstep System

Quickstep is a hybrid recommendation system, combining both content-based and collaborative filtering techniques. Since both web pages and user interests are dynamic in nature, catalogues, rule-bases and static user profiles would quickly become out of date. A recommender system approach thus appeared well suited to our problem.

Explicit feedback on browsed papers would be too intrusive, so unobtrusive monitoring is used, providing positive examples of pages the user typically browses. Many users will be using the system at once, so it is sensible to share user interest feedback and maintain a common pool of labelled example papers (provided by the users as examples of particular paper topics).

Since there are positive examples of the kind of papers users are interested in, we have a labelled training set. This is ideal for supervised learning techniques, which require each training example to have a label (the labels are then used as classification classes). The alternative, unsupervised learning, is inherently less accurate since it must compute likely labels before classification (e.g. clustering techniques). We use a term vector representation, common in machine learning, to represent a research paper. A term vector is a list of word weights, derived from the frequency with which each word appears within the paper.

We could have used a binary classification approach, with classes for "interesting" and "not interesting". This would have led to profiles consisting of two term vectors, one representing the kind of thing the user is interested in (computed from the positive examples) and the other what the user is not interested in (computed from the negative examples). Recommendations would be those page vectors most similar to the interesting class vector. The binary case is the simplest class representation, and consequently produces the best classification results when compared with multi-class methods.

One problem with such a representation is that the explicit knowledge of which topics the user is interested in is lost, making it hard to benefit from any prior knowledge we may have about the domain (such as the paper topics). With Quickstep, we have chosen a multi-class representation, with each class representing a research paper topic. This allows profiles that consist of a human-understandable list of topics. The classifier assigns each paper a class based on which class vector it is most similar to. Recommendations are selected from papers classified as belonging to a topic of interest.

The profile itself is computed from the correlation between browsed papers and paper topics. This correlation leads to a topic interest history, and a simple time-decay function allows current topics to be computed.

Details of Specific Techniques Used

Research Paper Representation

Research papers are represented as term vectors, with term frequency / total number of terms used for a term's weight.
To reduce the dimensionality of the vectors, term frequencies of less than 2 are removed, standard Porter stemming [18] is applied to remove word suffixes, and the SMART [22] stop list is used to remove common words such as "the". These measures are commonly used in information systems; van Rijsbergen [24] and Harman [9] provide a good discussion of these issues.

Vectors with 10-15,000 terms were used in the trials, along with training set sizes of about 200 vectors. Had we needed more dimensionality reduction, the popular term frequency-inverse document frequency (TF-IDF) weighting could be used (with term weights below a threshold being removed), or latent semantic indexing (LSI).

Only Postscript and PDF formats (and compressed formats) are supported, to avoid noisy HTML pages. This makes classification easier, at the expense of HTML-only papers.

Research Paper Classification

The classification requirements are for a multi-class learning algorithm learning from a multi-labelled training set. To learn from a training set, inductive learning is required. There are quite a few inductive learning techniques to choose from, including information theoretic ones (e.g. the Rocchio classifier), neural networks (e.g. backpropagation), instance-based methods (e.g. nearest neighbour), rule learners (e.g. RIPPER), decision trees (e.g. C4.5) and probabilistic classifiers (e.g. naive Bayes). Multiple classifier techniques such as boosting [7] exist as well, and have been shown to enhance the performance of individual classifiers.

After reviewing and testing many of the above options, we decided to use a nearest neighbour technique. The nearest neighbour approach is well suited to our problem, since the training set must grow over time and consists of multi-class examples. Nearest neighbour algorithms also degrade well, with the next closest match being reported if the correct one is not found. The IBk algorithm [1] we chose outperformed naive Bayes and a J48 decision tree in our tests. We also use the boosting technique AdaBoostM1 [7], which works well for multi-class problems if the boosted classifier is strong enough. We found that boosting always improved the base classifier's performance in our tests.
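As a concrete illustration, the sketch below builds this term-vector representation and classifies a paper with an inverse-distance-weighted k-nearest-neighbour vote in the spirit of IBk [1]. It is a minimal sketch, not the authors' implementation: the tiny stemmer and stop list are illustrative stand-ins for Porter stemming [18] and the SMART stop list [22], and all function names are our own.

import math
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for"}  # stand-in stop list

def stem(word):
    # Crude suffix stripping as a stand-in for the Porter stemmer.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def term_vector(text):
    # Weight = term frequency / total number of terms; frequencies < 2 dropped.
    tokens = [stem(t) for t in text.lower().split()
              if t.isalpha() and t not in STOP_WORDS]
    total = len(tokens)
    counts = Counter(tokens)
    return {t: f / total for t, f in counts.items() if f >= 2}

def distance(v1, v2):
    # Euclidean distance over the union of terms of two sparse vectors.
    terms = set(v1) | set(v2)
    return math.sqrt(sum((v1.get(t, 0.0) - v2.get(t, 0.0)) ** 2 for t in terms))

def classify(paper_vector, training_set, k=5):
    # training_set is a list of (term_vector, topic) examples. Each of the k
    # nearest examples votes for its topic, weighted by inverse distance so
    # that distant neighbours are less likely to be chosen.
    nearest = sorted(training_set,
                     key=lambda example: distance(paper_vector, example[0]))[:k]
    votes = defaultdict(float)
    for vector, topic in nearest:
        votes[topic] += 1.0 / (distance(paper_vector, vector) + 1e-9)
    return max(votes, key=votes.get)

A real deployment would use an established implementation such as IBk (and a boosting wrapper such as AdaBoostM1) rather than this toy classifier; the mechanics are described in the next paragraphs.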
Nearest neighbour algorithms represent instances of documents as term vectors within a term vector space. Proximity of vectors within this term vector space indicates similarity. To classify a new paper, the vector distance from each example instance is calculated, and the closest neighbours are returned as the most likely classes. Inverse distance weighting is used to decrease the likelihood of choosing distant neighbours.

AdaBoostM1 extends AdaBoost to handle multi-class cases, since AdaBoost itself is a binary classifier. AdaBoostM1 repeatedly runs a weak learning algorithm (in this case the IBk classifier) for a number of iterations over various parts of the training set. The classifiers produced (specialized for particular classes) are combined to form a single composite classifier at the end.

Profiling Algorithm

The profiling algorithm performs correlation between the paper topic classifications and user browsing logs. Whenever a research paper with a classified topic is browsed, it accumulates an interest score for that topic. Explicit feedback on recommendations also accumulates interest values for topics. The current interest of a topic is computed using the inverse time weighting algorithm below, applied to the user feedback instances:

Topic interest = Σ(n = 1..no. of instances) Interest value(n) / days old(n)

Interest values: Paper browsed = 1, Recommendation followed = 2, Topic rated interesting = 10, Topic rated not interesting = -10

The profile for each user consists of a list of topics and the current interest values computed for them. The interest value weighting was chosen to provide sufficient weight for an explicit feedback instance to dominate for about a week, but after that browsed URLs would again become dominant. In this way, the profile will adapt to changing user interests as the trial progresses.

Profile = (<user>, <topic>, <topic interest value>)*
e.g. ((someone, hypertext, -2.4)
      (someone, agents, 6.5)
      (someone, machine learning, 1.33))

If the user is using the ontology-based set of topics, all super classes gain a share when a topic receives some interest. The immediate super class receives 50% of the main topic's value, the next super class 25%, and so on until the most general topic in the is-a hierarchy is reached. In this way, general topics are included in the profile rather than just the most specific ones, producing a more rounded profile.

Recommendation Algorithm

Recommendations are formulated from a correlation between the user's current topics of interest and papers classified as belonging to those topics. A paper is only recommended if it does not appear in the user's browsed URL log, ensuring that recommendations have not been seen before. For each user, the top three interesting topics are selected, with 10 recommendations made in total (making a 4/3/3 split of recommendations). Papers are ranked in order of recommendation confidence before being presented to the user:

Recommendation confidence = classification confidence * topic interest value

The classification confidence is computed from the AdaBoostM1 algorithm's class probability value for that paper (somewhere between 0 and 1).
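To make the profiling and ranking steps concrete, the sketch below computes the time-decayed topic interest, propagates shares up an is-a chain, and ranks unseen papers by recommendation confidence. The data structures and helper names are our own illustrative assumptions, not the Quickstep implementation.

from dataclasses import dataclass

# Interest values from the profiling algorithm above.
EVENT_WEIGHTS = {"browsed": 1, "followed": 2,
                 "interesting": 10, "not interesting": -10}

@dataclass
class FeedbackEvent:
    topic: str
    kind: str       # a key of EVENT_WEIGHTS
    days_old: int   # age of the feedback instance in days (>= 1)

def topic_interest(events, topic):
    # Topic interest = sum of Interest value(n) / days old(n) over all instances.
    return sum(EVENT_WEIGHTS[e.kind] / max(e.days_old, 1)
               for e in events if e.topic == topic)

def propagate_to_superclasses(profile, is_a_chain):
    # is_a_chain lists a topic followed by its super classes, most specific
    # first. Each super class receives half the previous share of the main
    # topic's value: 50%, then 25%, and so on up the hierarchy.
    value, share = profile.get(is_a_chain[0], 0.0), 0.5
    for parent in is_a_chain[1:]:
        profile[parent] = profile.get(parent, 0.0) + value * share
        share /= 2
    return profile

def rank_recommendations(papers, profile, browsed_urls):
    # papers: (url, topic, classification_confidence) triples. Unseen papers
    # on topics of interest are ranked by
    # recommendation confidence = classification confidence * topic interest.
    candidates = [(conf * profile.get(topic, 0.0), url)
                  for url, topic, conf in papers
                  if url not in browsed_urls and profile.get(topic, 0.0) > 0]
    return [url for score, url in sorted(candidates, reverse=True)]

For example, with the paper's hypothetical chain ["interface agents", "agents"], propagate_to_superclasses would credit "agents" with 50% of the interest accumulated by "interface agents".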
Research Paper Topic Ontology

The research paper topic ontology is based on the dmoz [6] taxonomy of computer science topics. It is an is-a hierarchy of paper topics, up to 4 levels deep (e.g. an "interface agents" paper is-a "agents" paper). Pre-trial interviews formed the basis for deciding which additional topics would be required. An expert review by two domain experts validated the ontology for correctness before use in the trials.

Feedback and the Quickstep Interface

Recommendations are presented to the user via a browser web page. The web page applet loads the current recommendation set and records any feedback the user provides. Research papers can be jumped to, opening a new browser window to display the paper URL. If the user likes or dislikes the paper topic, the interest feedback combo-box allows "interested" or "not interested" to replace the default "no comment". Finally, the topic of each paper can be changed by clicking on the topic and selecting a new one from a popup menu. The ontology group has a hierarchical popup menu; the flat list group has a single level popup menu.

[Figure 2: Quickstep's web-based interface]
New examples can be added via the interface, with users providing a paper URL and a topic label. These are added to the group's training set, allowing users to teach the system new topics or improve classification of old ones.

All feedback is stored in log files, ready for the profile builder's run. The feedback logs are also used as the primary metric for evaluation. Interest feedback, topic corrections and jumps to recommended papers are all recorded.

EVALUATION

Details of the Two Trials

Two trials were conducted to assess empirically both the overall effectiveness of the Quickstep recommender system and to quantify the effect made by use of the ontology.

The first trial used 14 subjects, consisting of researchers from the IAM research laboratory. A mixture of 2nd year postgraduates up to professors was taken, all using the Quickstep system for a duration of 1.5 months. The second trial used 24 subjects, 14 from the first trial and 10 more 1st year postgraduates, and lasted for 1.5 months. Some minor interface improvements were made to make the feedback options less confusing. The pre-trial interview obtained details from subjects such as area of interest and expected frequency of browser use.

The purpose of the two trials was to compare a group of users using an ontology labelling strategy with a group of users using a flat list labelling strategy. Subject selection for the two groups balanced the groups as much as possible, evening out topics of interest, browser use and research experience (in that order of importance). Both groups had the same number of subjects in them (7 each for the pilot trial, 12 each for the main trial).

In the first trial, a bootstrap of 103 example papers covering 17 topics was used. The bootstrap examples were obtained from bookmarks requested during the pre-trial interview. In the second trial, a bootstrap of 135 example papers covering 23 topics was used. The bootstrap training set was updated to include examples from the final training sets of the first trial. The first trial's classified papers were also kept, allowing a bigger initial collection of papers from which to recommend in the second trial.

Both groups had their own separate training set of examples, which diverged in content as the trial progressed. The classifier was run twice for each research paper, classifying once with the flat list group's training set and once with the ontology group's training set. The classifier algorithm was identical for both groups; only the training set changed.

The system interface used by both groups was identical, except for the popup menu for choosing paper topics. The ontology group had a hierarchical menu (using the ontology); the flat list group had a single layer menu.

The system recorded the times the user declared an interest in a topic (by selecting "interesting" or "not interesting"), jumps to recommended papers, and corrections to the topics of recommended papers. These feedback events were date stamped and recorded in a log file for later analysis, along with a log of all recommendations made. Feedback recording was performed automatically by the system, whenever the subjects looked at their recommendations.

Experimental Data

Since feedback only occurs when subjects check their recommendations, the data collected occurs at irregular dates over the duration of the trial. Cumulative frequency of feedback events is computed over the period of the trial, allowing trends to be seen as they develop during the trial.
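As a small illustration of how such cumulative figures can be computed, the sketch below walks the date-stamped feedback log and normalizes the running event count by the recommendations made up to each date (the normalization rationale follows in the next paragraph). The log format assumed here is hypothetical, not the trial's actual analysis code.

def cumulative_ratios(events, recommendations_to_date):
    # events: date-ordered list of (date, is_good_event) feedback log entries.
    # recommendations_to_date: mapping from date to the number of
    # recommendations made up to and including that date (assumed available).
    # Returns (date, ratio) points: cumulative good events, normalized.
    points, good_so_far = [], 0
    for date, is_good in events:
        good_so_far += 1 if is_good else 0
        points.append((date, good_so_far / recommendations_to_date[date]))
    return points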
Since the total number of jumps and topics differs between the two groups, the figures presented are normalized by dividing by the number of topics (or recommendations) up to that date. This avoids bias towards the group that provided feedback most frequently.

Figure 3 shows the topic interest feedback results. Topic interest feedback is where the user comments on a recommended topic, declaring it "interesting" or "not interesting". If no feedback is offered, the result is "no comment".

Topic interest feedback is an indication of the accuracy of the current profile. When a recommended topic is correct for a period of time, the user will tend to become content with it and stop rating it as "interesting". On the other hand, an uninteresting topic is likely to always attract a "not interesting" rating. Good topics are defined as either "no comment" or "interesting" topics. The cumulative frequency figures are presented as a ratio of the total number of topics recommended. The not interesting ratio (bad topics) can be computed from these figures by subtracting the good topic values from 1.

The ontology groups show 7% and 15% higher topic acceptance in the two trials. In addition to this trend, the first trial ratios are about 10% lower than the second trial ratios.

Figure 4 shows the jump feedback results. Jump feedback is where the user jumps to a recommended paper by opening it via the web browser. Jumps are correlated with topic interest feedback, so a good jump is a jump to a paper on a good topic. Jump feedback is an indication of the quality of the recommendations being made, as well as the accuracy of the profile. The cumulative frequency figures are presented as a ratio of the total number of recommendations made. There is a small 1% improvement in good jumps by the ontology group. Both trials show between 8% and 10% of recommendations leading to good jumps.

Figure 5 shows the topic correction results. Topic corrections are where the user corrects the topic of a recommended paper by providing a new one. A topic correction will add to or modify a group's training set, so that the classification for that group will improve. The number