E-business course reading: Finding Experts on the Semantic Desktop.pdf


Finding Experts on the Semantic Desktop

Gianluca Demartini and Claudia Niederée
L3S Research Center, Leibniz Universität Hannover
Appelstrasse 9a, 30167 Hannover, Germany
{demartini,niederee}@L3S.de

Abstract. Expert retrieval has attracted considerable attention because of the large economic impact it can have on enterprises. The classical dataset on which to perform this task is the company intranet (i.e., personal pages, e-mails, documents). We propose a new system for finding experts in the user’s desktop content. By looking at the user’s private documents and e-mails, the system builds expert profiles for all the people named on the desktop. This allows the search system to focus on the user’s topics of interest, thus generating satisfactory results on topics well represented on the desktop. We show, with an artificial test collection, that desktop content is appropriate for finding experts on the topics the user is interested in.

1 Introduction

Finding people who are experts on certain topics is a search task that has been investigated mainly in the enterprise context. Especially in big enterprises, topic areas can vary widely, partly because of diverse and distributed data sources. This peculiarity of enterprise datasets can strongly affect the quality of the results of the expert finding task [15, 16].

It is important to provide enterprise managers with high-quality expert recommendations. Managers need to build new project teams and to find people who can solve problems; therefore, a high-precision tool for finding experts is needed. Moreover, managers are not the only ones who need to find experts. In a highly collaborative environment where team members are willing to share and to help each other, all employees should be able to find out which colleague to ask for help in solving issues.

If we want to achieve high-quality results while searching for experts, considering the user’s desktop content makes the search much more focused on the user’s interests, also because the desktop dataset will contain much more expertise evidence (on such topics) than the rest of the public enterprise intranet.

Classic expert search systems [9, 30, 21, 25, 26, 17] work on the entire enterprise knowledge available. This means that they use shared repositories, e-mail histories, forums, wikis, databases, personal home pages, and all the data that an enterprise creates and stores. This forces the system to consider a huge variety of topics, for example, from accountability to IT-specific issues. Our solution


focuses on using the user’s desktop content as expertise evidence, allowing the system to focus on the user’s topics of interest and thus providing high-quality results for queries about such topics.

The system we propose first indexes the desktop content, also using the metadata annotations produced by the Social Semantic Desktop system Nepomuk [19]. Our expert search system then creates a vector space that includes the documents and the people present in the desktop content. After this step, when the desktop user issues a query of the type “Find experts on the topic...” + keywords, the system shows a ranked list of people whom the user can contact for help.

Preliminary experiments show the high precision of the expert search results on topics that are covered by the desktop content. A limitation of our system is that it can return only people who are present on the user’s desktop. Therefore, performance is poor when the desktop content (i.e., number of items and people) is limited, as for example for new employees, or when the queries differ from the main topics represented on the desktop.

The main contributions of the paper are:
– the description of how the Beagle++ system creates metadata regarding documents and people (Section 2.1).
– a new system for finding experts on a semantic desktop (Section 2.2).
– the description of possible test datasets: one composed of fictitious data and one containing real desktop content (Section 3).
– preliminary experimental results showing how a focused dataset leads to high-quality expert search results (Section 4).
– a review of the previous systems and formal models presented in the field of expert search and Personal Information Management (PIM) (Section 5).

2 System Architecture

2.1 Generating Metadata about People

In order to identify possible expert candidates and link them to desktop items, we used extractors from the Beagle++ Desktop Search Engine¹ ² [13, 8]. These extractors identify the authors of documents and e-mails by analysing the structure and the content of each file. For storing the produced metadata (see Figure 1) we employ the RDF repository developed in the Nepomuk project [19], which is based on Sesame³ for storing, querying, and reasoning about RDF and RDF Schema, as well as on Lucene⁴, which is integrated with the Sesame framework via the LuceneSail [27], for full-text search.

An additional step is the entity linkage applied to the identified candidates. For example, a person is described by an e-mail address in e-mails, whereas in a publication by the author’s name. Other causes for the appearance of different

¹ http://beagle2.kbs.uni-hannover.de
² http://www.youtube.com/watch?v=Ui4GDkcR7-U
³ http://www.openrdf.org
⁴ http://lucene.apache.org
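
The idea stated in the introduction, a vector space that contains both documents and people and that is queried with keywords to obtain a ranked list of people, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the TF-IDF weighting, the choice to represent a person as the sum of the vectors of the items linked to them, the cosine scoring, the function names, and the toy desktop items with the made-up people alice, bob, and carol.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_person_vectors(items):
    """Build one TF-IDF vector per person.

    items: list of (text, [people]) pairs, one per desktop item.
    Assumption: a person's vector is the sum of the TF-IDF vectors
    of all items that person is linked to.
    """
    doc_counts = [Counter(tokenize(text)) for text, _ in items]
    doc_freq = Counter()
    for counts in doc_counts:
        doc_freq.update(counts.keys())
    n_items = len(items)
    idf = {term: math.log(1 + n_items / df) for term, df in doc_freq.items()}

    people = defaultdict(Counter)
    for counts, (_, names) in zip(doc_counts, items):
        for name in names:
            for term, tf in counts.items():
                people[name][term] += tf * idf[term]
    return dict(people), idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_experts(query, person_vectors, idf):
    """Return people ranked by similarity to the keyword query.

    People with a zero score are dropped, so a topic that never
    occurs on the desktop yields an empty list.
    """
    q = {term: idf[term] for term in tokenize(query) if term in idf}
    scored = [(cosine(q, vec), name) for name, vec in person_vectors.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ranked if score > 0]


# Toy desktop content (hypothetical items and people).
items = [
    ("Ontology design for knowledge management", ["alice"]),
    ("Notes on ontology alignment and reasoning", ["alice", "bob"]),
    ("Budget planning for the travel reimbursement", ["carol"]),
    ("Information retrieval evaluation with test collections", ["bob"]),
]
person_vectors, idf = build_person_vectors(items)
print(find_experts("ontology reasoning", person_vectors, idf))
```

In this sketch a query on a topic that no desktop item mentions returns no one, which mirrors the limitation noted in the introduction: only people present on the user’s desktop, and only topics represented there, can be retrieved.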


A client application can then use the Nepomuk Expert Recommendation service (which implements the system described in this paper) by providing a keyword query taken from the user. A screenshot of a possible client application is shown in Figure 2. In the top-left corner the user can enter a keyword query and choose to look for experts. In the central panel a ranked list of people is presented as the result of the query. In the right pane, resources related to the selected expert are shown.

3 Desktop Search Evaluation Datasets

Evaluating the effectiveness of desktop search algorithms is a difficult task because of the lack of standard test collections. The main problem in building such a test collection is the privacy concerns that data providers might have about sharing personal data. The privacy issue is a major one, as it impedes the diffusion of personal desktop data among researchers. Some solutions for overcoming these problems have been presented in previous work [11, 12].

In this section we describe two possible datasets for evaluating the effectiveness of finding experts using desktop content as evidence of expertise. One is a fictitious desktop dataset representing two hypothetical personas. This dataset has been manually created in the context of the Nepomuk project with the goal of providing a publicly available desktop dataset with no privacy concerns. At present, access to the actual data is still restricted. The second one is a set of real desktop data provided by 14 employees of a research center.

3.1 Fictitious Data

In order to obtain reproducible and comparable experimental results there is a need for a common test collection, that is, a set of resources, queries, and relevance assessments that are publicly available. In the case of PIM, the privacy issue of sharing personal data has to be faced. To solve this issue, the team working on the Nepomuk project has created a collection of desktop items (i.e., documents, e-mails, contacts, calendar items, ...) for some imaginary personas representing hypothetical desktop users. In this paper we describe two desktop collections built in this context.

The first persona is called Claudia Stern⁶. She is a project manager and her interests are mainly ontologies, knowledge management, and information retrieval. Her desktop contains 56 publications about her interests, 36 e-mails, 19 Word documents about project meetings and deliverables, 12 slide presentations, 17 calendar items, 2 contacts, and an activity log of 122 actions collected while a trip was being arranged (i.e., flight booking, hotel reservation, search for shopping places). These resources have been indexed using the Beagle++ system, obtaining a total of 22588 RDF triples, which have been stored in the RDF repository.

The second persona is called Dirk Hagemann⁷. He works for the project that Claudia manages and his interests are similar to those of Claudia. His desktop

⁶ http://dev.nepomuk.semanticdesktop.org/wiki/Claudia
⁷ http://dev.nepomuk.semanticdesktop.org/wiki/Dirk