algorithms and metrics for different recommendation methods. The functionality provided by the framework is basically the following:

- Management of user models (layer 1).
- Memory-based recommendation (layer 2).
- Model-based recommendation (layer 3).
- Social analysis of virtual communities (layer 4).
- Recommendation engine.

The framework is implemented in Java, using a client-server architecture. It can be seen as a service provider, that is, a server that provides recommender system capabilities to various remote clients, namely recommender agents. The client application or agent communicates with the server part of the framework to, for example, send user ratings and receive collaborative recommendations. The algorithms, techniques and data structures used to make recommendations are on the server side. The user model is on the client side, since it is usually application dependent, and it is sent to the recommendation engine on the server side (after some transformation, if necessary) so that it can be used to make suggestions.

The framework allows the creation of recommender agents from scratch as well as the integration of more complex recommender agents with the purpose of enriching their functionality. To this end, as shown in layer 1 in Figure 2, the framework supports a number of standard user models that can be used to create agents without further implementation effort. More specific, domain-dependent user models can be added by specializing the supported models and providing a means to assess their similarity both with the items to be recommended and with other models. In section 4 the integration of different types of user models is exemplified by three instantiations of the framework. The comparison of user models enables agents to add collaborative recommendation by finding a set of users that have similar characteristics or have a history of agreeing with the active user (that is, they rate items similarly).
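As an illustration of what "specializing a supported model and providing a means to assess similarity" could look like, the following is a minimal Java sketch. The interface `UserModel`, the class `KeywordUserModel`, and the methods `similarityTo` and `match` are hypothetical names chosen for this example; they are not taken from the framework, whose actual class signatures are not given in the text. Cosine similarity is used here only as one plausible comparison for keyword-based models.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical contract: a user model is comparable with other models and with items. */
interface UserModel<I> {
    double similarityTo(UserModel<I> other); // model-to-model comparison
    double match(I item);                    // model-to-item comparison
}

/** Assumed specialization: a content-based model made of weighted keywords. */
final class KeywordUserModel implements UserModel<Map<String, Double>> {
    private final Map<String, Double> weights = new HashMap<>();

    KeywordUserModel add(String keyword, double weight) {
        weights.put(keyword, weight);
        return this;
    }

    @Override
    public double similarityTo(UserModel<Map<String, Double>> other) {
        if (!(other instanceof KeywordUserModel)) {
            throw new IllegalArgumentException("can only compare keyword models");
        }
        return cosine(weights, ((KeywordUserModel) other).weights);
    }

    @Override
    public double match(Map<String, Double> item) {
        // Items are assumed to be described by keyword weights as well.
        return cosine(weights, item);
    }

    /** Cosine of the angle between two sparse vectors; 0 if either one is empty. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    private static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }
}
```

Under these assumptions, an agent would only need to supply the two comparison methods for its own model type; everything that consumes similarities could remain unchanged.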
Multiple algorithms and metrics are implemented in the framework for establishing the neighborhood of users and combining the preferences of neighbors for prediction. Thus, by only specifying the mechanism for comparing user models, already developed personal agents can take advantage of collaborative recommendations using a memory-based algorithm. This functionality is provided by layer 2.

In the next framework layer we find the model-based collaborative filtering algorithms, which provide item recommendations by first extracting a model of users. The model inference is performed by machine learning algorithms such as clustering, Bayesian networks, or rule-based approaches. The algorithms in this layer can be used in combination with the memory-based algorithms mentioned above. For example, user clustering can be used to narrow the search for neighbors in a collaborative algorithm.

The recommendation engine is in charge of handling the recommendations generated by the different recommendation approaches or by a combination of them (e.g. content-based and collaborative recommendation). This engine enables the development of agents that proactively recommend interesting information to users, generate recommendations on demand, or both. In addition, the recommendation engine collects the feedback from users, which is used to update the user models, and records the activity of users in the system.

INTR 20,1 34
As shown in layer 4 in the figure, the knowledge about the activities of users registered by the recommendation engine can be analyzed from a social point of view. Thus, the last layer of the framework contains algorithms and techniques for social data analysis such as those included in the Social Data Mining (Amento et al., 2003) and Social Network Analysis (Sabater and Sierra, 2002) areas. Furthermore, the data about user activities serve as a source for generating diverse visualizations to explore and interpret the behavior of the community. Each component of the proposed framework is detailed in the following subsections.

3.1 Management of user models
As we have said before, a user model is a representation of a user's interests, habits and preferences in a given domain. The representation formalism of a user model varies from one application to another. Our approach provides a number of standard representations for user models that try to capture the approaches most widely used by recommender agents. We consider three main categories of user models within recommender agents: content-based user models, item-based (or collaborative) user models, and demographic user models. Each type of user model has its own representation and requires a different method to compare it against other models or against items to recommend. The following sections describe the representations modeled in the framework.

3.1.1 Content-based user models. Content-based user models are built from the observation of the interaction of a user with an underlying application. Depending on the domain, different representations for the user model can be found. In our framework we consider three main representation formalisms for this kind of user model: a feature vector; a classifier denoting the relation between a set of features and a set of classes or categories; and a hierarchy of classifiers. In addition, new representations for user models can be easily added.
One of the most popular representations of items describes them through their main characteristics. This representation is known as a feature vector. For example, a scientific paper can be described by its authors, an abstract, the publication date, the journal or conference where it was published, and a set of keywords, among others. Web pages or text documents are represented as a set of relevant words, each having a frequency value. The user model is then a vector of relevant words representing the user's interests.

In some domains, the items are classified or categorized according to the attributes or features describing them by using a classifier inferred from a set of examples. Thus, our approach also provides a representation for those user models in which the user model holds the structure of a classifier that categorizes examples into a set of classes (e.g. a decision tree). In turn, the classifiers may form a hierarchy to distinguish hierarchical classes and categories. For example, the topics of interest of a user may be organized into a hierarchy in which different levels of abstraction in the user preferences can be modeled.

3.1.2 Item-based user models. The idea underlying collaborative filtering is to recommend items that were interesting to other users who are similar to the user the agent is assisting. The goal is to obtain the utility or rating a user would give to an item, given information about the ratings provided by similar users.

User modeling approaches 35
Thus, the user models in collaborative filtering do not model the contents of the items a user is interested in, namely documents, movies, or books, but the evaluation or rating the user has assigned to these items. Such a user model is composed of a set of name-value pairs in which the name represents an item under consideration and the value is a rating provided for the item.

3.1.3 Demographic user models. Demographic data about users can also be used to recommend potentially interesting items to them. Demographic information may include attributes such as sex, age, city, nationality, job, and hobbies, among other features that may be relevant to the application domain. A demographic user model is generally obtained from the information explicitly given by the user through a user interface provided for that purpose.

Figure 3 shows the different user models proposed by our approach. A recommender agent that wants to define its own user model should implement a class inheriting from one of the classes shown in Figure 3 (HierarchicalUM, for example). Similarly, new algorithms to build the content-based user models can be defined. Our framework provides a set of well-known Machine Learning (decision trees, naïve Bayes, etc.) and Information Retrieval (Rocchio, tf-idf, etc.) algorithms that agent developers can use.

3.2 Memory-based recommendation
In order to make collaborative recommendations, a subset of users out of the whole population has to be chosen based on their similarity with the active user, and a weighted combination of their ratings is used to generate predictions. Neighborhood-based or user-based collaborative filtering is performed in three steps: weighting all users according to their similarity with the active user, selecting a subset of these users, and computing predictions based on the ratings given by this group of users. For each of these steps different algorithms and techniques are implemented in the framework.
In the first step, memory-based algorithms utilize either a metric of comparison for content-based user models or the entire user-item matrix to estimate a neighborhood of users that resemble the active one. This process results in a user-user matrix of similarities in which a row represents a user and columns hold the distance/similarity with the remaining users.

Figure 3. User model representations provided by the framework
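The user-user matrix described above can be sketched as follows. This is an illustrative example rather than the framework's implementation: the class `SimilarityMatrix`, its methods, and the conversion of Euclidean distance into a similarity via 1 / (1 + d) are assumptions made for this sketch. User models are assumed to be dense vectors of normalized numerical attributes of equal length, and the metric is assumed to be symmetric.

```java
import java.util.function.ToDoubleBiFunction;

final class SimilarityMatrix {

    /**
     * Builds an n x n matrix in which entry [i][j] is the similarity between
     * user i and user j according to the supplied (assumed symmetric) metric.
     */
    static double[][] build(double[][] models,
                            ToDoubleBiFunction<double[], double[]> metric) {
        int n = models.length;
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double s = metric.applyAsDouble(models[i], models[j]);
                sim[i][j] = s;
                sim[j][i] = s; // symmetry: compute each pair only once
            }
        }
        return sim;
    }

    /** Maps Euclidean distance d to a similarity in (0, 1] as 1 / (1 + d). */
    static double euclideanSimilarity(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            double d = a[k] - b[k];
            sum += d * d;
        }
        return 1.0 / (1.0 + Math.sqrt(sum));
    }
}
```

Passing the metric as a parameter mirrors the idea that an agent only specifies how its user models are compared; for instance, `SimilarityMatrix.build(models, SimilarityMatrix::euclideanSimilarity)` could be swapped for any other pairwise function without touching the matrix construction.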
In the first case, content-based models represented as feature vectors are compared through the values of the different attributes representing each user model. For vectors of normalized numerical attributes, several common distance/similarity functions are provided by the framework, including the Euclidean distance, the Manhattan distance, and the cosine similarity. Thus, an agent representing the interests of a user by a single vector of keywords, such as Letizia, can be straightforwardly integrated in the framework by using the cosine similarity to compare user models, gaining the ability to recommend the information discovered by other users. We can observe these components in Figure 4. More specific methods can be defined to compare more complex, specialized user models. For instance, a demographic user model can be compared to another demographic user model by using the similarity functions used for feature vectors or by defining a similarity measure, possibly combining numerical and nominal attributes weighted according to their importance.

In the second case, that is, for item-based models, neighbors are identified by comparing the ratings of all users with the ratings given by the active user to items. The most common metrics are implemented in the framework for this purpose, including the mean squared difference, the Pearson correlation coefficient and the Spearman rank correlation. Also, significance weighting (Herlocker et al., 1999) is implemented to add a certain level of trust to neighbor correlations. Further correlation measures can be defined by extending the framework at this point. The information available in the user-user similarity matrix allows the selection of the most similar users, whose opinions are then used for prediction.
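The item-based comparison can be illustrated with a short sketch of the Pearson correlation combined with significance weighting. The class and method names are hypothetical, and several choices are assumptions of this example rather than statements about the framework: user models are maps from item names to ratings, means are computed over co-rated items only, undefined correlations are reported as 0, and the correlation is scaled by n / cutoff when two users share fewer than `cutoff` co-rated items (a cutoff of 50 is the value usually associated with Herlocker et al., 1999).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RatingCorrelation {

    /** Items rated by both users. */
    static List<String> coRated(Map<String, Double> a, Map<String, Double> b) {
        List<String> common = new ArrayList<>();
        for (String item : a.keySet()) {
            if (b.containsKey(item)) {
                common.add(item);
            }
        }
        return common;
    }

    /** Pearson correlation over co-rated items; 0 when it is undefined. */
    static double pearson(Map<String, Double> a, Map<String, Double> b) {
        List<String> common = coRated(a, b);
        int n = common.size();
        if (n < 2) {
            return 0.0;
        }
        double meanA = 0.0;
        double meanB = 0.0;
        for (String item : common) {
            meanA += a.get(item);
            meanB += b.get(item);
        }
        meanA /= n;
        meanB /= n;
        double cov = 0.0;
        double varA = 0.0;
        double varB = 0.0;
        for (String item : common) {
            double da = a.get(item) - meanA;
            double db = b.get(item) - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        if (varA == 0.0 || varB == 0.0) {
            return 0.0; // one user rated all common items identically
        }
        return cov / Math.sqrt(varA * varB);
    }

    /** Devalues correlations that rest on fewer than `cutoff` co-rated items. */
    static double weightedPearson(Map<String, Double> a, Map<String, Double> b,
                                  int cutoff) {
        int n = coRated(a, b).size();
        double factor = Math.min(1.0, (double) n / cutoff);
        return factor * pearson(a, b);
    }
}
```

The weighting step reflects the intuition in the text: a perfect correlation computed from only three shared items is trusted far less than the same correlation computed from many.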
The selection of neighbors can be achieved by using correlation-thresholding (Shardanand and Maes, 1995), which selects all users whose correlation is above a certain absolute threshold, and best-n-neighbors, which selects the best n correlates for a given n (Herlocker et al., 1999). Once a neighborhood of users is formed, different algorithms can be used to combine the preferences of neighbors to produce a prediction or top-N recommendations for the active user. The provided methods to combine the ratings of the neighbor users are the weighted average of the ratings used in Ringo

Figure 4. Memory-based and model-based algorithms