Requirements for expertise location systems in biomedical science and the Semantic Web

Titus Schleyer1, Heiko Spallek1, Brian S. Butler2, Sushmita Subramanian3, Daniel Weiss4, M. Louisa Poythress5, Phijarana Rattanathikum6, Gregory Mueller7

1 School of Dental Medicine and 2 Joseph M. Katz Graduate School of Business, University of Pittsburgh, Pittsburgh, PA
{titus, hspallek, bbutler}@pitt.edu
3 Intel Corporation, Santa Clara, CA
4 The MITRE Corporation, Bedford, MA
5 Brulant, Inc., Beachwood, OH
6 Adobe Systems Incorporated, San Jose, CA
7 DeepLocal, Inc., Pittsburgh, PA

Abstract. Recent trends in science are increasing the need for researchers to form collaborations. To date, however, electronic systems have played only a minor role in helping scientists do so. This study used a literature review, and contextual inquiries and semistructured interviews with biomedical scientists to develop a preliminary set of requirements for electronic systems designed to help optimize how biomedical researchers choose collaborators. We then reviewed the requirements in light of emerging research on expertise location using the Semantic Web. The requirements include aspects such as comprehensive, complete and up-to-date online profiles that are easy to create and maintain; the ability to exploit social networks when searching for collaborators; information to help gauge the compatibility of personalities and work styles; and recommendations for effective searching and making “non-intuitive” connections between researchers. The Semantic Web offers significant opportunities for operationalizing the requirements, for instance through aggregating profile data from disparate sources, annotating contributions to social media using methods such as Semantically Interlinked Online Communities, and concept-based querying using ontologies. Future work should validate the preliminary requirements and explore in detail how the Semantic Web can help address them.

Keywords: expertise location, requirements, Semantic Web, biomedical research

1 Introduction

Increased collaboration across all fields of biomedical science has emerged as one possible way to achieve greater success and progress in combating disease and improving health. “Team science,” “networked science” and inter/multi-disciplinary research [1] are terms used to denote collaborative approaches expected to solve research problems of ever-growing complexity. Programmatic initiatives such as the
Roadmap (NIH Roadmap for Medical Research, http://nihroadmap.nih.gov/) and the Clinical and Translational Science Award (CTSA; http://www.ctsaweb.org/) programs of the National Institutes of Health (NIH) demonstrate that funding agencies and research organizations are not just passively observing this trend, but are actively encouraging it. In the process, many academic/research institutions are extending the scale and scope of their research portfolio [2] and the numbers of their research faculty, thus making more individuals available for collaboration, either locally or remotely. As a wider range of collaborations is becoming recognized as valuable, many researchers are beginning to expand their collaborative horizons. At the same time, the Internet is making locating collaborators easier. In fact, modern communication and collaborative technologies increase the number of potential collaborators by making feasible many remote collaborations once considered impractical.

At the same time, expertise location has been, and continues to be, a significant challenge for many organizations [3,4]. Scientists often turn to colleagues or the published literature to find collaborators [5]. However, these approaches do not scale well in the context of an increasing pool of potential collaborators. As the universe of potential collaborators and information about them grows, the time and effort needed to evaluate each collaborative opportunity remain the same.

A newer method for finding collaborators is to use databases of researchers partially or exclusively designed for the purpose. Knowledge management systems of this type, which include “expertise locating systems,” [6] “knowledge communities,” [7] and “communities of practice,” [8] all embody, to varying degrees, the ability to find experts and, by extension, potential collaborators. The CSCW literature contains numerous examples of such systems [9-12]. Most of these systems are designed to help a person solve a specific problem at a particular point in time. However, scientists seeking collaborators face a bigger challenge. Not only are they looking for the most qualified expert, but they also plan to enter into a more or less long-term relationship. Evaluating an individual’s promise for such a relationship requires information, engagement and effort much beyond what is needed for finding an expert for singular (or even episodic) problem-solving.

Only a few reports of expertise location systems in academia have been published [11,13]. While many commercial offerings, such as the Community of Science (COS; www.cos.com), LinkedIn (www.linkedin.com), Index Copernicus Scientists (scientists.indexcopernicus.com), BiomedExperts (www.biomedexperts.com) and Research Crossroads (www.researchcrossroads.com), purport to make it easier for scientists to find collaborators, no reports in the literature describe how well these systems actually do so.

The Semantic Web is a technology with significant promise to ameliorate the expertise location problem [14]. As individuals create an increasing number of “digital trails” of their work processes and products, more information about their activities and relationships becomes computationally accessible. However, expertise location systems that leverage data from the Semantic Web must be constructed with the needs and requirements of the end user in mind. We therefore have organized this paper in two parts. We first present a set of preliminary requirements for expertise location
systems for biomedical scientists. Second, we discuss the requirements in light of technological capabilities and challenges of the Semantic Web.

2 Methods

This study drew on several methodological approaches in order to develop a rich understanding of how scientific collaborations are established and what requirements should inform the design of expertise location systems. The methods we used included (1) affinity diagramming of issues in scientific collaboration; (2) a literature review of expertise location in computer-supported cooperative work and other disciplines; (3) contextual inquiries with 10 biomedical scientists; and (4) findings from 30 semistructured interviews with biomedical scientists from a variety of disciplines.

To develop the affinity diagram, the members of the project team (which consisted of all authors) recorded thoughts, ideas and observations regarding the establishment of scientific collaborations and then took turns arranging them into naturally-forming categories. The team then rearranged the groups to form a hierarchy that revealed the major issues of the domain. The most prominent groups were then adopted as the foci of exploratory investigations, specifically the literature search and contextual inquiries.

We searched the literature using keywords including “expertise locating systems,” “expertise location systems,” “expertise management systems,” “knowledge communities,” “knowledge management” and “knowledge management systems,” “communities of practice,” and “virtual communities” in the field of biomedical research, informatics, computer science and information science. The databases we searched were MEDLINE, the ISI Web of Science, the ACM Portal and the IEEE Digital Library (all available years).

Contextual inquiry (CI) [15] sessions were performed with ten researchers from a range of disciplines and levels of seniority at Carnegie Mellon University and the University of Pittsburgh.
Because we could not directly observe researchers in the process of forming collaborations, we mainly focused on retrospective accounts. The contextual inquiries were complemented by findings from 30 semistructured interviews with scientists. The interviews focused on current and previous collaborations, locating collaborators, solving problems in research, and information needs and information resource use of participants. Four faculty researchers (including three authors: TKS, HS, BB) and one staff member conducted the interviews individually with a convenience sample of scientists from the six Health Science Schools at the University of Pittsburgh.

While conducting our background studies, we formulated a running list of requirements for systems that help optimize how scientists choose collaborators. We generated this list using an approach similar to grounded theory [16], in which models and hypotheses are progressively inferred from the data. We kept a record of the evidence that supported each requirement, e.g. statements of our study participants or findings from the literature, as well as of factors that would modify its validity or applicability. The studies conducted as part of this project were approved by the University of Pittsburgh Institutional Review Board (IRB approval numbers: 0612065 and PRO07050299).

Once the list of requirements was final, we reviewed the literature about the Semantic Web with a particular focus on expertise location. We used this literature to inform the discussion of the capabilities and challenges of the Semantic Web in light of the requirements we formulated.

3 Results

3.1 Preliminary requirements for expertise location systems in biomedical science

The following 10 requirements for expertise location systems have been ordered loosely in an attempt to group related items.

(1) The effort required to create and update an online profile should be commensurate with the perceived benefit of the system.

Many current online networking systems for scientists, such as the COS, require a significant amount of effort to create and maintain a comprehensive profile. Many scientists considered this investment of time and effort difficult to justify as there is no clear gain to being part of the system. Only a few researchers we interviewed, specifically junior ones or those new to the organization, indicated that COS and/or the Faculty Research Interests Project (FRIP) at the University of Pittsburgh [11] helped them find collaborators. Several commented that they had tried to use COS and/or FRIP, but abandoned them when their attempt at finding a collaborator through them was not successful.

(2) Online profiles should present rich and comprehensive information about potential collaborators in an organized manner to reduce the effort involved in making collaboration decisions.

The Internet makes a significant amount of information available about individual scientists, but unfortunately in a very fragmented and inhomogeneous manner. Our background research showed that at present, researchers sometimes use multiple information sources such as MEDLINE, Google Scholar, the ISI Web of Science and other databases to evaluate a potential collaborator.
Retrieving, collating and reviewing information from these sources, however, often takes more time and effort than the individual is willing to expend. An expertise location system should collate and organize this information and present it to collaboration seekers in an easy-to-use format in order to reduce the effort involved in choosing collaborators.

(3) Online profiles should be up-to-date, because some information they contain has a short lifespan.

At its core, choosing a collaborator is an attempt to predict how someone else will behave in the future. While knowledge about past behavior can be useful for doing so, the value of this information declines with time. Out-of-date profiles reduce the usefulness of information that collaboration seekers require. On the other hand, not all information in a profile is subject to the same rate of decay. Information about professional degrees of a collaborator tends to be relatively static, while publication topics and activity may not always reflect an individual's current research focus and productivity.

(4) Researchers should be able to exploit their own and others’ social networks when searching for collaborators.

Social networks have been suggested as important structures for finding expertise and information [17]. Established researchers often use existing connections with colleagues as their primary resource for locating new collaborators. Junior researchers, with few or no contacts within the desired field, may have significant difficulty initiating collaborations that way. Many scientists in our study indicated they are more likely to contact a colleague who they think will know someone with the required expertise than cold-call a stranger. In addition, many emphasized the key role that deans, department heads and other well-connected individuals in the organization play in helping establish collaborations. The advantages of a mediated form of contact are that it may make it more likely that two parties will be compatible, increase the chances of a timely response, and provide a less intimidating method of contact.

(5) The system should model proximity, which influences the potential success of collaboration in several respects.

Physical proximity, social proximity, organizational proximity, and proximity in terms of shared research interests are all aspects of “proximity” that can affect the outcome of collaborations. Physical proximity provides access to potential collaborators, and allows the collaboration seeker to make informal and unobtrusive assessments about compatibility.
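
As a rough illustration of the mediated contact described under requirement (4), the sketch below searches a small acquaintance graph for the shortest chain of introductions between a collaboration seeker and a target expert. It is a minimal sketch under stated assumptions, not a method used in this study: the function name `introduction_path`, the adjacency-list representation and the example contacts are all hypothetical, and a breadth-first search stands in for whatever a real system might use.

```python
from collections import deque


def introduction_path(contacts, seeker, target):
    """Return the shortest chain of acquaintances from seeker to target.

    `contacts` is a hypothetical adjacency list mapping each person to the
    people they know. Returns None when no chain of introductions exists.
    """
    if seeker == target:
        return [seeker]
    previous = {seeker: None}  # person -> who led us to them
    queue = deque([seeker])
    while queue:
        person = queue.popleft()
        for neighbor in contacts.get(person, ()):
            if neighbor in previous:
                continue  # already reached by a chain that is no longer
            previous[neighbor] = person
            if neighbor == target:
                # Walk back from the target to the seeker, then reverse.
                path = [neighbor]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Invented example data: a junior researcher reaches a statistician
# through a department head and a dean.
contacts = {
    "junior_researcher": ["department_head"],
    "department_head": ["junior_researcher", "imaging_expert", "dean"],
    "dean": ["department_head", "statistician"],
    "imaging_expert": ["department_head"],
    "statistician": ["dean"],
}

print(introduction_path(contacts, "junior_researcher", "statistician"))
```

In this invented example the chain runs through the department head and the dean, mirroring the role that participants attributed to well-connected individuals in the organization.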
In the absence of physical proximity, shared research interests and/or common organizational or research communities can serve as surrogates.

(6) The system should facilitate the assessment of personal compatibility, similarity of work styles and other “soft” traits influencing collaborations.

Our background research indicated that personal compatibility and similar work style are important factors determining the success of collaborations. The literature also indicates that more than a simple overlap of interests is needed to create a successful collaboration. Expertise location systems should therefore facilitate an assessment of these factors, for instance, by identifying social connections.

(7) Social networks solely based on co-authorship may only partially describe a researcher’s collaborative network.

Previous attempts to automatically describe a researcher’s collaborative network based on co-authorship of papers were only partially successful [18,19]. Although co-authorship seems to be a good starting point for describing a collaboration network, it should be supplemented and validated by other data. Ideally, expertise location sys-