J.-w. Ahn and P. Brusilovsky

other types of user interfaces and visualizations, including force-directed visualization and NE-based personalized browsing/searching. At the same time, more sophisticated concept-based user modeling ideas are being investigated.

References

1. Ahn, J., Brusilovsky, P.: Adaptive visualization of search results: Bringing user models to visual analytics. Information Visualization 8(3), 167–179 (2009)
2. Ahn, J., Brusilovsky, P., Grady, J., He, D., Florian, R.: Semantic annotation based exploratory search for information analysts. In: Information Processing and Management (in press, 2010)
3. Ahn, J., Brusilovsky, P., He, D., Grady, J., Li, Q.: Personalized web exploration with task models. In: Huai, J., Chen, R., Hon, H.W., Liu, Y., Ma, W.Y., Tomkins, A., Zhang, X. (eds.) Proceedings of the 17th International Conference on World Wide Web, WWW 2008, Beijing, China, April 21–25, pp. 1–10. ACM, New York (2008)
4. Bier, E.A., Ishak, E.W., Chi, E.: Entity workspace: An evidence file that aids memory, inference, and reading. In: Mehrotra, S., Zeng, D.D., Chen, H., Thuraisingham, B.M., Wang, F.Y. (eds.) ISI 2006. LNCS, vol. 3975, pp. 466–472. Springer, Heidelberg (2006)
5. Chen, C.C., Chen, M.C., Sun, Y.: Pva: a self-adaptive personal view agent system. In: KDD ’01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 257–262. ACM Press, New York (2001)
6. Davies, D.L., Bouldin, D.W.: A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell. PAMI 1(2), 224–227 (1979)
7. Florian, R., Hassan, H., Ittycheriah, A., Jing, H., Kambhatla, N., Luo, X., Nicolov, H., Roukos, S., Zhang, T.: A statistical model for multilingual entity detection and tracking. In: Proceedings of the Human Language Technologies Conference (HLT-NAACL’04), Boston, MA, USA, May 2004, pp. 1–8 (2004)
8. Gauch, S., Speretta, M., Chandramouli, A., Micarelli, A.: User profiles for personalized information access. In: Brusilovsky, P., Kobsa, A., Nejdl, W. (eds.) Adaptive Web 2007. LNCS, vol. 4321, pp. 54–89. Springer, Heidelberg (2007)
9. Gentili, G., Micarelli, A., Sciarrone, F.: Infoweb: An adaptive information filtering system for the cultural heritage domain. Applied Artificial Intelligence 17(8–9), 715–744 (2003)
10. Hanani, U., Shapira, B., Shoval, P.: Information filtering: Overview of issues, research and systems. User Modeling and User-Adapted Interaction 11(3), 203–259 (2001)
11. Korfhage, R.R.: To see, or not to see – is that the query? In: SIGIR ’91: Proceedings of the 14th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 134–141. ACM, New York (1991)
12. Kumaran, G., Allan, J.: Text classification and named entities for new event detection. In: SIGIR ’04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 297–304. ACM, New York (2004)
13. Leuski, A., Allan, J.: Interactive information retrieval using clustering and spatial proximity. User Modeling and User-Adapted Interaction 14(2–3), 259–288 (2004)
Can Concept-Based User Modeling Improve Adaptive Visualization?

14. Magnini, B., Strapparava, C.: User modelling for news web sites with word sense based techniques. User Modeling and User-Adapted Interaction 14(2), 239–257 (2004)
15. Marchionini, G.: Exploratory search: from finding to understanding. Commun. ACM 49(4), 41–46 (2006)
16. Micarelli, A., Gasparetti, F., Sciarrone, F., Gauch, S.: Personalized search on the world wide web. In: Brusilovsky, P., Kobsa, A., Nejdl, W. (eds.) Adaptive Web 2007. LNCS, vol. 4321, pp. 195–230. Springer, Heidelberg (2007)
17. Mihalcea, R., Moldovan, D.L.: Document indexing using named entities. Studies in Informatics and Control 10(1), 21–28 (2001)
18. Olsen, K.A., Korfhage, R., Sochats, K.M., Spring, M.B., Williams, J.G.: Visualization of a document collection: The vibe system. Information Processing and Management 29(1), 69–81 (1993)
19. Petkova, D., Croft, B.W.: Proximity-based document representation for named entity retrieval. In: CIKM ’07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, pp. 731–740. ACM, New York (2007)
20. Pretschner, A., Gauch, S.: Ontology based personalized search. In: 11th IEEE Intl. Conf. on Tools with Artificial Intelligence (ICTAI’99), Chicago, IL, pp. 391–398 (1999)
21. Roussinov, D., Ramsey, M.: Information forage through adaptive visualization. In: DL ’98: Proceedings of the third ACM conference on Digital libraries, pp. 303–304. ACM, New York (1998)
Interweaving Public User Profiles on the Web

Fabian Abel, Nicola Henze, Eelco Herder, and Daniel Krause

IVS – Semantic Web Group & L3S Research Center, Leibniz University Hannover, Germany
{abel,henze,herder,krause}@l3s.de

Abstract. While browsing the Web, providing profile information in social networking services, or tagging pictures, users leave a plethora of traces. In this paper, we analyze the nature of these traces. We investigate how user data is distributed across different Web systems, and examine ways to aggregate user profile information. Our analyses focus on both explicitly provided profile information (name, homepage, etc.) and activity data (tags assigned to bookmarks or images). The experiments reveal significant benefits of interweaving profile information: more complete profiles, advanced FOAF/vCard profile generation, disclosure of new facets about users, higher level of self-information induced by the profiles, and higher precision for predicting tag-based profiles to solve the cold start problem.

1 Introduction

In order to adapt functionality to individual users, systems need information about their users [1]. The Web provides opportunities to gather such information: users leave a plethora of traces on the Web, varying from profile data to tags. In this paper we analyze the nature of these distributed user data traces and investigate the advantages of interweaving publicly available profile data originating from different sources: social networking services (Facebook, LinkedIn), social media services (Flickr, Delicious, StumbleUpon, Twitter) and others (Google). The main research question that we will answer in this paper is the following: what are the benefits of aggregating these public user profile traces?

In our experiments we analyze the characteristics of both traditional profiles, which are explicitly filled in by end users with information about their names, skills or homepages (see Sect. 3), and rather implicitly generated tag-based profiles (see Sect. 4). We show that the aggregation of profile data reveals new facets about the users and present approaches to leverage the additional information gained by profile aggregation. We made all approaches and findings presented in this paper available to the public via the Mypes¹ service: it enables users to inspect their distributed profiles and provides access to the aggregated and semantically enriched profiles via a RESTful API.

¹ http://mypes.groupme.org/

P. De Bra, A. Kobsa, and D. Chin (Eds.): UMAP 2010, LNCS 6075, pp. 16–27, 2010. © Springer-Verlag Berlin Heidelberg 2010
2 Related Work

Connecting data from different sources and services is in line with today’s Web 2.0 trend of creating mashups of various applications [2]. Support for the development of interoperable services is provided by initiatives such as the dataportability project², standardization of APIs (e.g. OpenSocial) and authentication and authorization protocols (e.g. OpenID, OAuth), as well as by (Semantic) Web standards such as RDF, RSS and specific Microformats. Further, it is becoming easier to connect distributed user profiles, including social connections, due to the increasing take-up of standards like FOAF [3], SIOC³, or GUMO [4]. Conversion approaches allow for flexible user modeling [5]. Solutions for user identification form the basis for personalization across application boundaries [6]. Google’s Social Graph API⁴ enables application developers to obtain the social connections of an individual user across different services. Generic user modeling servers such as CUMULATE [7] or PersonIs [8], as well as frameworks for mashing up profile information [9], are appearing that facilitate the handling of aggregated user data. Given these developments, it becomes more and more important to investigate the benefits of user profile aggregation in the context of today’s Web scenery.

In [10], Szomszor et al. present an approach to combine profiles generated in two different tagging platforms to obtain richer interest profiles; Stewart et al. demonstrate the benefits of combining blogging data and tag assignments from Last.fm to improve the quality of music recommendations [11]. In this paper we not only analyze the benefits of aggregating tag-based user profiles [12, 13], which we enrich with WordNet⁵ facets, but also consider explicitly provided profiles coming from five different social networking and social media services.

² http://www.dataportability.org/
³ http://rdfs.org/sioc/spec/
⁴ http://socialgraph.apis.google.com
⁵ http://wordnet.princeton.edu/

3 Traditional Profile Data on the Web

Currently, users need to manually enter their profile attributes in each separate Web system. These attributes, such as the user’s full name, current affiliations, or the location they are living at, are particularly important for social networking services such as LinkedIn or Facebook, but may be considered less important in services such as Twitter. In our analysis, we measure to which degree users fill in their profile attributes in different services. To investigate the benefits of profile aggregation we address the following questions.

1. How detailed do users fill in their public profiles at social networking and social media services?
2. Does the aggregated user profile reveal more information about a particular user than the profile created in some specific service?
3. Can the aggregated profile data be used to enrich an incomplete profile in an individual service?
4. To which extent can the service-specific profiles and the aggregated profile be applied to fill up standardized profiles such as FOAF [3] and vCard [14]?

3.1 Dataset

To answer the questions above, we crawled the public profiles of 116032 distinct users via the Social Graph API. People who have a Google account can explicitly link their different accounts and Web sites; the Social Graph API allows developers to look up the different accounts of a particular user. On average, the 116032 users linked 1.26 accounts, while 70963 did not link any account.

For our analysis of traditional profiles we were interested in popular services where users can have public profiles. We therefore focused on the social networking services Facebook and LinkedIn, as well as on Twitter, Flickr, and Google. Figure 1(a) lists the number of public profiles and the concrete profile attributes we obtained from each service. We did not consider private information, but only crawled attributes that were publicly available. Among the users for whom we crawled the Facebook, LinkedIn, Twitter, Flickr, and Google profiles were 338 users who had an account at all five services.

3.2 Individual Profiles and Profile Aggregation

The completeness of the profiles varies from service to service. The public profiles available in the social networking sites Facebook and LinkedIn are filled in more completely than the Twitter, Flickr, or Google profiles (see Fig. 1(b)). Although Twitter does not ask for many attributes in its user profile, users completed their profiles only up to 48.9% on average. In particular the location and homepage (which can also be a URL to another profile page, such as MySpace) are omitted most often. By contrast, the average Facebook and LinkedIn profiles are filled up to 85.4% and 82.6% respectively.
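The completeness percentages above can be read as the share of a service's profile attributes that carry a non-empty value. The following is a minimal sketch under that assumption; the text does not give a formal definition, and the attribute list and example profile are hypothetical (the attributes actually crawled are listed in Fig. 1(a)).

```python
from typing import Mapping, Optional, Sequence


def completeness(profile: Mapping[str, Optional[str]],
                 attributes: Sequence[str]) -> float:
    """Return the fraction of `attributes` that have a non-empty value.

    Assumption: an attribute counts as filled when its value is present
    and not blank. The paper may apply a stricter notion of
    "meaningful" values.
    """
    if not attributes:
        raise ValueError("attributes must not be empty")
    filled = sum(1 for name in attributes
                 if (profile.get(name) or "").strip())
    return filled / len(attributes)


# Hypothetical attribute schema and profile, for illustration only.
TWITTER_ATTRIBUTES = ["name", "location", "homepage", "description", "picture"]
example_profile = {
    "name": "Jane Doe",
    "location": "",          # left blank by the user
    "homepage": None,        # not provided
    "description": "Researcher",
    "picture": "http://example.org/jane.png",
}

score = completeness(example_profile, TWITTER_ATTRIBUTES)
print(f"{score:.1%}")
```

Averaging this score over all crawled profiles of one service would yield a per-service figure comparable in spirit to the percentages reported above.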
Obviously, some user data is replicated at multiple services: name and profile picture are specified at nearly all services, and location was provided at 2.9 out of five services on average. However, inconsistencies can be found in the data: for example, 37.3% of the users’ full names in Facebook are not exactly the same as the ones specified at Twitter.

For each user we aggregated the public profile information from Facebook, LinkedIn, Twitter, Flickr, and Google, i.e. for each user we gathered attribute-value pairs and mapped them to a uniform user model. Aggregated profiles reveal more facets (17 distinct attributes) about the users than the public profiles available in each separate service. On average, the completeness of the aggregated profile is 83.3%: more than 14 attributes are filled with meaningful values. By comparison, the average number of filled attributes is 7.6 for Facebook, 8.2 for LinkedIn and 3.3 for Flickr. Aggregated profiles thus reveal significantly more information about the users than the public profiles of the single services.

Further, profile aggregation enables completion of the profiles available at the specific services. For example, by enriching the incomplete Twitter profiles with