M. Bramer (Ed.): Artificial Intelligence, LNAI 5640, pp. 193-216, 2009. © Springer-Verlag Berlin Heidelberg 2009

Intelligent User Profiling

Silvia Schiaffino1,2 and Analía Amandi1,2

1 ISISTAN Research Institute, Universidad Nacional del Centro de la Provincia de Buenos Aires, Campus Universitario, Argentina
2 CONICET, Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina
{sschia,amandi}@exa.unicen.edu.ar

Abstract. User profiles or user models are vital in many areas in which it is essential to obtain knowledge about users of software applications. Examples of these areas are intelligent agents, adaptive systems, intelligent tutoring systems, recommender systems, intelligent e-commerce applications, and knowledge management systems. In this chapter we study the main issues regarding user profiles from the perspectives of these research fields. We examine what information constitutes a user profile; how the user profile is represented; how the user profile is acquired and built; and how the profile information is used. We also discuss some challenges and future trends in the intelligent user profiling area.

1 Introduction

A profile is a description of someone containing the most important or interesting facts about him or her. In the context of users of software applications, a user profile or user model contains essential information about an individual user. The motivation for building user profiles is that users differ in their preferences, interests, background and goals when using software applications. Discovering these differences is vital to providing users with personalized services.

The content of a user profile varies from one application domain to another. For example, if we consider an online newspaper domain, the user profile contains the types of news (topics) the user likes to read, the types of news (topics) the user does not like to read, the newspapers he usually reads, and the user's reading habits and patterns. In a calendar management domain the user profile contains information about the dates and times when the user usually schedules each type of activity in which he is involved, the priorities each activity feature has for the user, the relevance of each user contact, and the user's scheduling and rescheduling habits. In other domains personal information about the user, such as name, age, job, and hobbies, might be important.

Not only does the content of user profiles differ from one domain to another, but so does the way in which the information they contain is acquired. The content of a user profile can be explicitly provided by the user, or it may have to be learned using some intelligent
technique. User profiling implies inferring unobservable information about users from observable information about them, that is, their actions or utterances (Zukerman and Albrecht, 2001). A wide variety of Artificial Intelligence techniques have been used for user profiling, such as case-based reasoning (Lenz et al., 1998; Godoy et al., 2004), Bayesian networks (Horvitz et al., 1998; Conati et al., 2002; Schiaffino and Amandi, 2005; Garcia et al., 2007), association rules (Adomavicius and Tuzhilin, 2001; Schiaffino and Amandi, 2006), genetic algorithms (Moukas, 1996; Yannibelli et al., 2006), and neural networks (Yasdi, 1999; Villaverde et al., 2006), among others.

The purpose of obtaining user profiles also differs across the areas that use them. In adaptive systems, the user profile is used to provide the adaptation effect, that is, to behave differently for different users (Brusilovsky and Millán, 2007). In intelligent agents, particularly in interface agents, the user profile is used to provide personalized assistance to users with respect to some software application (Maes, 1994). In intelligent tutoring systems, the user profile or student model is used to guide students in their learning process according to their knowledge and learning styles (Garcia et al., 2007). In e-commerce applications the user or customer profile is used to make personalized offers and to suggest or recommend products the user is supposed to like (Adomavicius and Tuzhilin, 2001). In knowledge management systems, the skills a user or employee has, the roles he takes within an organization, and his performance in these roles are used by managers or project leaders to assign him to the job position that suits him best (Sure et al., 2000). In recommender systems the user profile contains ratings for items like movies, news or books, which are used to recommend potentially interesting items to him and to other users with similar tastes or interests (Resnick and Varian, 1997).

In this chapter we study user profiles from the different perspectives mentioned above. In Section 2 we describe what information constitutes a user profile. In Section 3 we examine the different ways in which we can acquire information about a user and then build a user profile. Section 4 focuses on intelligent user profiling techniques. Finally, Section 5 presents some future trends.

2 User Profile Contents

A user profile is a representation of information about an individual user that is essential for the (intelligent) application we are considering. This section describes the most common contents of user profiles: user interests; the user's knowledge, background and skills; the user's goals; user behaviour; the user's interaction preferences; the user's individual characteristics; and the user's context. We analyze and provide examples for the different contents in areas like intelligent agents, adaptive systems, intelligent tutoring systems, recommender systems, and knowledge management systems.
2.1 Interests

User interests are one of the most important (and typically the only) parts of the user profile in information retrieval and filtering systems, recommender systems, some interface agents, and adaptive systems that are information-driven such as encyclopedias, museum guides, and news systems (Brusilovsky and Millán, 2007). Interests can represent news topics, web page topics, document topics, work-related topics or hobby-related topics. Sometimes user interests are classified as short-term interests or long-term interests. A user's interest in football may be a short-term interest if the user reads or listens to news about this topic only during the World Cup, or a long-term interest if the user is always interested in this topic. For example, NewsDude (Billsus and Pazzani, 1999), an interface agent that learns about a user's interests in daily news stories, considers information about recent events as short-term interests, and a user's general preferences for news stories as long-term interests.

The most common representations of user interests are keyword-based models. In these models interests are represented by weighted vectors of keywords. Weights traditionally represent the relevance of the word for the user or within the topic. These representations are common in the Information Filtering and Information Retrieval areas. For example, Letizia (Lieberman et al., 2001a), a browsing assistant, uses TF-IDF (term frequency/inverse document frequency) vectors to model user interests. In this technique the weight of each word is calculated by comparing the word frequency in a document against the word frequency in all the documents in a corpus (Salton and McGill, 1983). This technique is also used in NewsDude (Billsus and Pazzani, 1999), where news stories are converted to TF-IDF vectors.

A more powerful representation of user interests is through topic hierarchies (Godoy et al., 2004). Each node in the hierarchy represents a topic of interest for a user, which is defined by a set of representative words. This representation technique is important when we want to model not only general user interests such as sports or economy, but also the sub-topics of these interests that are relevant to a given user. For example, the user profile can indicate that a certain user is interested in documents talking about a famous football player and not in sports or football in general. An example of a topic hierarchy containing a user's interests is shown in Figure 1.

Often, a topic ontology is used as the reference to construct a user interest profile. An ontology is a conceptualization of a domain into a human-understandable but machine-readable format consisting of entities, attributes, relationships, and axioms (Guarino and Giaretta, 1995). For instance, in Quickstep (Middleton et al., 2004), the authors represent user profiles in terms of a research paper topic ontology. This recommender system was built to help researchers in a computer science laboratory setting, and it uses ontological inference to assist the profiling process. Similarly, in (Liang et al., 2007) students' interests within an e-learning system are determined using a topic ontology.
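
As an illustration of the keyword-based representation described in this section, the following minimal sketch computes TF-IDF vectors for a tiny corpus and averages the vectors of the stories a user has read into an interest profile. It is not the implementation used by Letizia or NewsDude: the weighting variant (relative term frequency times the logarithm of the inverse document frequency), the averaging step, the function names `tf_idf_vectors` and `interest_profile`, and the sample corpus are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document (each document is a token list)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        vectors.append({
            # Relative term frequency times log inverse document frequency
            # (one common TF-IDF variant, assumed here).
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def interest_profile(read_vectors):
    """Average the vectors of the documents a user has read (an assumed aggregation)."""
    totals = Counter()
    for vector in read_vectors:
        for term, weight in vector.items():
            totals[term] += weight
    return {term: weight / len(read_vectors) for term, weight in totals.items()}


# Hypothetical corpus of three tokenized news stories.
corpus = [
    ["football", "world-cup", "FIFA"],
    ["football", "tennis"],
    ["economy", "dollar"],
]
vectors = tf_idf_vectors(corpus)
profile = interest_profile(vectors[:2])  # the user read the first two stories
```

With this weighting, a term that occurs in every document of the corpus receives weight zero, so the profile emphasizes the words that distinguish the stories the user actually read.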
[Figure 1: a topic hierarchy rooted at ROOT, with the labels "User Reading Experiences" and "User Topics of Interest". Each topic node carries a relevance value and a set of weighted keywords: (Relevance 0.5) economy 0.9, finances 0.8, dollar 0.8; (Relevance 0.7) championship 0.9, team 0.8, player 0.7; (Relevance 0.1) politics 0.8, vote 0.9, president 0.7; (Relevance 0.4) tennis 1.0, Wimbledon 0.7, ATP 0.9; (Relevance 0.3) football 1.0, world-cup 0.8, FIFA 0.8.]

Fig. 1. Hierarchical representation of a user's interests

2.2 Knowledge, Background and Skills

The knowledge the user has about the application domain, his background experience and his skills are important features within user profiles in different areas. In intelligent tutoring systems and adaptive educational systems, the student's knowledge about the subject taught is vital to provide proper assistance to the student or to adapt the content of courses accordingly. This knowledge can be represented in different ways. The most common representation is through a model that keeps track of the student's knowledge about every element in the course knowledge base. The idea is to mark each knowledge item X with a value calculated as "student knowledge of X". The value could be binary (knows - does not know), qualitative (good - average - bad) or quantitative, assigned as a probability of the student's familiarity with the item X. For instance, in Cumulate (Brusilovsky et al., 2005), the state of a student's knowledge is represented as a weighted overlay model covering a set of topics, and each educational activity can contribute to only one topic.

Another way of representing a user's knowledge is through errors or misconceptions. In addition to (or instead of) modelling what the user knows, some works focus on modelling what the user does not know. For example, in (Chen and Hsieh, 2005) the authors aim at diagnosing learners' common misconceptions during learning processes. They try to discover relationships between misconceptions.

Also, in many applications, the user's knowledge about the underlying domain is important. Some systems categorize users as expert, intermediate, or novice, depending on how well they know the application domain. For example, MetaDoc (Boyle and Encarnacion, 1994) considers the knowledge users have about Unix, which is the underlying application domain in this system.
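
The overlay idea described in this section, where each knowledge item X is marked with a value for "student knowledge of X", can be sketched as follows. This is only an illustrative sketch and not the model used in Cumulate: the class name `OverlayModel`, the update rule (moving the estimate a fixed fraction toward 1 or 0 after each observed outcome), and the thresholds used for the binary and qualitative views are all hypothetical choices.

```python
from dataclasses import dataclass, field


@dataclass
class OverlayModel:
    """Maps each knowledge item to an estimated probability that the student knows it."""
    knowledge: dict = field(default_factory=dict)

    def record_outcome(self, item, success, rate=0.3):
        # Assumed update rule: move the estimate a fraction `rate`
        # toward 1.0 after a success and toward 0.0 after a failure.
        current = self.knowledge.get(item, 0.0)
        target = 1.0 if success else 0.0
        self.knowledge[item] = current + rate * (target - current)

    def knows(self, item, threshold=0.5):
        # Binary view (knows / does not know); the threshold is hypothetical.
        return self.knowledge.get(item, 0.0) >= threshold

    def qualitative(self, item):
        # Qualitative view (good / average / bad); the cut-offs are hypothetical.
        p = self.knowledge.get(item, 0.0)
        if p >= 0.7:
            return "good"
        if p >= 0.4:
            return "average"
        return "bad"


model = OverlayModel()
model.record_outcome("loops", success=True)
model.record_outcome("loops", success=True)
```

Keeping the quantitative value as the stored state and deriving the binary and qualitative views from it means the three representations mentioned above stay consistent with one another.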
Furthermore, user skills are key in areas like Knowledge Management. Within this area, skill management systems serve as technical platforms for mostly, though not exclusively, corporate-internal marketplaces for skills and know-how. The systems are typically built on top of a database that contains profiles of employees and applicants. In this domain, profiles consist of numerous values for different skills and may be represented as vectors. In (Sure et al., 2000) the authors use the integers "0" (no knowledge), "1" (beginner), "2" (intermediate) and "3" (expert) as skill values. Examples of skills are "Programming in Y" or "Administration of Server X".

Finally, the user's background refers to those user characteristics that are not directly related to the application domain. For instance, if we consider a tutoring system, the user's job or profession, his work experience, his traveling experience, and the languages he speaks, among other information, constitute the user's background. As an application example, in (Cawsey et al., 2007) the authors describe an adaptive information system in the healthcare domain that considers users' literacy and medical background to provide them with information that they can understand. The representation of users' background and skills is commonly done via stereotypes. We discuss them in Section 3.4.

2.3 Goals

Goals represent the user's objective or purpose with respect to the application he is working with, that is, what the user wants to achieve. Goals are target tasks or subtasks at the focus of a user's attention (Horvitz et al., 1998). If the user is browsing the Web, his goal is obtaining relevant information (this type of goal is known as an information need). If the user is working with an e-learning system, his goal is learning a certain subject. In a calendar management system, the user's goals are scheduling new events or rescheduling conflicting events.

Determining what a user wants to do is not a trivial task. Plan recognition is a technique that aims at identifying the goal or intention of a user from the tasks he performs. In this context, a task corresponds to an action the user can perform in the software application, and a goal is a higher-level intention of the user, which will be accomplished by carrying out a set of tasks. Systems using plan recognition observe the input tasks of a user and try to find all possible plans by which the observed tasks can be explained. These possible explanations or candidate plans are narrowed as the user continues performing further tasks. Plan recognition has been applied in different areas such as intelligent tutoring (Greer and Kohenn, 1995), interface agents (Lesh et al., 1999; Armentano and Amandi, 2006), and collaborative planning (Huber and Durfee, 1994).

Goals or intentions can be represented in different ways. Figure 2 shows a Bayesian network representation of a user's intentions in a calendar domain (Armentano and Amandi, 2006). In this representation, nodes represent user tasks and arcs represent probabilistic dependencies between tasks. Given evidence of a task performed by the user, the system can infer the next (most probable) task, and