L. M. de Campos et al. group members to collaboratively specify their preferences and to reach an agree ment about an overall solution. In this case, a group profile is obtained through the nteraction of the members, taking into account the systems current recommenda tion which is obtained by aggregating individual preferences for each dimension In the article(Kudenko et al. 2003), a system is presented to help a group of users reach a joint decision based on individual user preferences CB+AR+PM: Intrigue(Ardissono et al. 2003)recommends tourist attractions for heterogeneous groups that include homogeneous subgroups where the members have similar preferences. In this system, the users record their preferences for a series of tourist attractions, and recommendations(obtained using a fuzzy AND) are then merged using a weighted scheme where each weight represents the rele vance of the corresponding subgroup(for instance, a subgroup could be particularly influential since it represents a large portion of the group) explains their recommendations, it has no means of interacting with the user CF+AR+PM: Polylens(O'Connor et al. 2001), an extension of MovieLens(Her locker et al. 2004), recommends movies to groups of users. This system uses a nearest neighbor-based algorithm to find the individuals with the most similar tastes to those of each group member and to obtain recommendations for every user. The voting preferences of these individuals are then merged according to the principle of least misery(minimum criterion). Under the same classification( Chen et al 2008)uses genetic algorithms to learn the group rating for an item that best fits the existing ratings for the item given by the individuals and the subgroups. The idea is that it is possible to learn how the user interacts from the known group ratings The proposed algorithm therefore recommends items based on the group's previous ratings for similar items. 2.1.1 The role of uncertainty As far as the authors are aware, the role of uncertainty in group recommending pro- cesses has not been considered. Nevertheless, many papers have been published which tackle this problem when recommendations are made to individual users(Zuker man and Albrecht 2001; Albrecht and Zukerman 2007). Focusing on probabilistic approaches, those relating to the one presented in this paper include content-based RSs(Mooney and Roy 2000: de Campos et al. 2005), collaborative filterin (Breese et al. 1998; Schiaffino and Amandi 2000: Butz 2002; Lekakos and 2007: Miyahara and Pazzani 2000: Heckerman et al. 2001)and hybrid methods(Pope- scu et al. 2001: de Campos et al. 2006). In terms of the group's process, the treatment of uncertainty is, however, a well known problem in other disciplines and so in this section we will review those papers which focus on the combination of probabilistic information from a purely approach(see Clemen and winkler 1999: Genest and Zidek 1986). In general, we might consider these methods as analytical models operating on the individual prob ability distributions to produce a single"combined"probability distribution. These approaches can generally be further distinguished into axiomatic approaches(by con- sidering a set of assumptions that the combination criteria might satisfy) and Bayesian approaches
212 L. M. de Campos et al. group members to collaboratively specify their preferences and to reach an agreement about an overall solution. In this case, a group profile is obtained through the interaction of the members, taking into account the system’s current recommendation which is obtained by aggregating individual preferences for each dimension. In the article (Kudenko et al. 2003), a system is presented to help a group of users reach a joint decision based on individual user preferences. – CB+AR+PM: Intrigue (Ardissono et al. 2003) recommends tourist attractions for heterogeneous groups that include homogeneous subgroups where the members have similar preferences. In this system, the users record their preferences for a series of tourist attractions, and recommendations (obtained using a fuzzy AND) are then merged using a weighted scheme where each weight represents the relevance of the corresponding subgroup (for instance, a subgroup could be particularly influential since it represents a large portion of the group). Although the system explains their recommendations, it has no means of interacting with the user. – CF+AR+PM: Polylens (O’Connor et al. 2001), an extension of MovieLens (Herlocker et al. 2004), recommends movies to groups of users. This system uses a nearest neighbor-based algorithm to find the individuals with the most similar tastes to those of each group member and to obtain recommendations for every user. The voting preferences of these individuals are then merged according to the principle of least misery (minimum criterion). Under the same classification, (Chen et al. 2008) uses genetic algorithms to learn the group rating for an item that best fits the existing ratings for the item given by the individuals and the subgroups. The idea is that it is possible to learn how the user interacts from the known group ratings. The proposed algorithm therefore recommends items based on the group’s previous ratings for similar items. 2.1.1 The role of uncertainty As far as the authors are aware, the role of uncertainty in group recommending processes has not been considered. Nevertheless, many papers have been published which tackle this problem when recommendations are made to individual users (Zukerman and Albrecht 2001; Albrecht and Zukerman 2007). Focusing on probabilistic approaches, those relating to the one presented in this paper include content-based RSs (Mooney and Roy 2000; de Campos et al. 2005), collaborative filtering RSs (Breese et al. 1998; Schiaffino and Amandi 2000; Butz 2002; Lekakos and Giaglis 2007; Miyahara and Pazzani 2000; Heckerman et al. 2001) and hybrid methods (Popescu et al. 2001; de Campos et al. 2006). In terms of the group’s process, the treatment of uncertainty is, however, a wellknown problem in other disciplines and so in this section we will review those papers which focus on the combination of probabilistic information from a purely statistical approach (see Clemen and Winkler 1999; Genest and Zidek 1986). In general, we might consider these methods as analytical models operating on the individual probability distributions to produce a single “combined” probability distribution. These approaches can generally be further distinguished into axiomatic approaches (by considering a set of assumptions that the combination criteria might satisfy) and Bayesian approaches: 123
Uncertainty in group recommending Axiomatic approach: the following common functions deal with belief aggregation: (i)Linear Opinion Pool where the group probability, Pr(G), is obtained as the weighted arithmetic average over the individual probabilities, Pr(vi),i= 1,……,n,i.e.Pr(G)=∑=1UPr(V), with wi being the weights totaling one () Logarithmic Opinion Pool(weighted geometric average)defined as Pr(G) IIs Pr(viu, with a being a normalization constant and the weights wi (called expert weights) are typically restricted to total one. If the weights are equal to 1/n, then the combined distribution is proportional to the geometric Bayesian Approach( Genest and Zidek 1986; Clemen and Winkler 1999): this has been used to combine expert information by taking into account the so-called Naive Bayes assumption. In our context, in order to obtain efficient combinations, indi- vidual opinions are assumed to be conditionally independent given the group vote 3 Modeling group decision networks The purpose of this paper is to develop a general methodology based on the Bayes ian network(BN) formalism for modeling those uncertainties that appear in both the interactions between group members and the processes leading to the final choice or recommendation. For example, let us imagine that we want to advise a group of tourists to visit a particular monument or not. In such a situation, we should assume hat the individuals in the group are unfamiliar with the monument (or item to be recommended). Each group member might speculate about their possible preference for visiting this monument and this is necessarily uncertain. Nevertheless, the group recommendations must be obtained by aggregating these preferences Individual preferences can be computed by considering two alternatives: the first considers content information(such as a description of the monument, location, etc. nd the second considers how people with similar tastes rated this monument in the ast(for instance, dislike or like). This is the approach followed in this paper where the similarity between users will be computed by considering how common items have been rated. Following the classification presented in Sect. 2, our GRS can therefore be categorized as CF+AR+PM As a collaborative approach, our model will inherit most of the disadvantages of classical collaborative filtering approaches. For example, the system cannot draw any inferences about items for which it has not yet gathered sufficient information,i.e.we also have the First-Rater problem. Similarly, we also inherit the Cold-Start problem it is difficult to recommend items to new users who have not submitted any ratings. Without any information about the user, the system is unable to guess user preferences and generate recommendations until a few items have been rated. For our information sources, we will consider a database of ratings r(which is usually extremely sparse) to store user ratings for the observed items. For example Table 2 shows the ratings given by each user Ui for an item /j using the values I dislike and 2=like(the value '-represents the fact that the user has not seen the item
Uncertainty in group recommending 213 – Axiomatic approach: the following common functions deal with belief aggregation: (i) Linear Opinion Pool where the group probability, Pr(G), is obtained as the weighted arithmetic average over the individual probabilities, Pr(Vi),i = 1,..., n, i.e. Pr(G) = n i=1 wi Pr(Vi), with wi being the weights totaling one. (ii) Logarithmic Opinion Pool (weighted geometric average) defined as Pr(G) = α n i=1 Pr(Vi)wi , with α being a normalization constant and the weights wi (called expert weights) are typically restricted to total one. If the weights are equal to 1/n, then the combined distribution is proportional to the geometric average. – Bayesian Approach (Genest and Zidek 1986; Clemen and Winkler 1999): this has been used to combine expert information by taking into account the so-called Naive Bayes assumption. In our context, in order to obtain efficient combinations, individual opinions are assumed to be conditionally independent given the group vote. 3 Modeling group decision networks The purpose of this paper is to develop a general methodology based on the Bayesian network (BN) formalism for modeling those uncertainties that appear in both the interactions between group members and the processes leading to the final choice or recommendation. For example, let us imagine that we want to advise a group of tourists to visit a particular monument or not. In such a situation, we should assume that the individuals in the group are unfamiliar with the monument (or item to be recommended). Each group member might speculate about their possible preference for visiting this monument and this is necessarily uncertain. Nevertheless, the group recommendations must be obtained by aggregating these preferences. Individual preferences can be computed by considering two alternatives: the first considers content information (such as a description of the monument, location, etc.) and the second considers how people with similar tastes rated this monument in the past (for instance, dislike or like). This is the approach followed in this paper where the similarity between users will be computed by considering how common items have been rated. Following the classification presented in Sect. 2, our GRS can therefore be categorized as CF+AR+PM. As a collaborative approach, our model will inherit most of the disadvantages of classical collaborative filtering approaches. For example, the system cannot draw any inferences about items for which it has not yet gathered sufficient information, i.e. we also have the First-Rater problem. Similarly, we also inherit the Cold-Start problem since it is difficult to recommend items to new users who have not submitted any ratings. Without any information about the user, the system is unable to guess user preferences and generate recommendations until a few items have been rated. For our information sources, we will consider a database of ratings R (which is usually extremely sparse) to store user ratings for the observed items. For example, Table 2 shows the ratings given by each user Ui for an item Ij using the values 1 = dislike and 2 = like (the value − represents the fact that the user has not seen the item). 123
214 L. M. de Campos et al. Table 2 database of user h2b456 12222 1-1-22 22211 山12 U1222 In order to achieve this objective, our aim is to build a bn where two compo- nents might be considered. The first, described in Sect. 3. 1 relates to the collaborative component of the recommender system. Both the topology of this component and the probability values will be learned from a set of past user ratings, and this will be used to compute a probability distribution representing the preferences of each group member for a given item. The second component will be used to merge these preferences in order to reach the final group opinion. This component is modeled using a Bn with a fixed structure given the group members, and the weights will be computed based on he ratings provided by the group members(see Sect. 3.2) 3. 1 BN-based collaborative component In this section, we will briefly describe this component(those readers interested in further details can consult(de Campos et al. 2008)). Our objective is to model how each user should rate an item. In order to represent relationships between users, we shall include a node, Ui, for each user in the system. We use l to denote the set of user nodes, i.e.u=(U1,., Un. The user variable Ui will therefore represent the probability distribution associated to its rating pattern. For instance, using the data in Table 2, each node will store two probability values representing the probability of U; liking(Pr(Ui= 2))or disliking(Pr(Ui= 1)an item. In order to facilitate the presence of dependence relationships between individuals in the model(to avoid a possibly complex network topology ) we propose that a new set of nodes y be included to denote collaborative ratings. There is one collaborative node for each user in the system, i.e. V=(V1, V2,..., Vn). These nodes will represent a probability distribution over ratings, and they will therefore take their values in the same domain as l 3.1.1 Learning stage Given an active user, the parent set of the variable Va in the graph, Pa(va), will be learnt from the database of votes, R. This set will contain those user variables, Ubel where Ua and Ub are most similar in taste, i. e the best neighbors for the active user Given a similarity measure, the set Pa(va)can therefore be obtained by using a thresh old or by only considering the first p variables in the ranking(see Fig. 2). It should be
214 L. M. de Campos et al. Table 2 Database of user ratings U0 U1 U2 U3 U4 U5 I1 112211 I2 1–2–22 I3 112112 I4 2–1–12 I5 221111 I6 22–222 I7 2–––12 In order to achieve this objective, our aim is to build a BN where two components might be considered. The first, described in Sect. 3.1 relates to the collaborative component of the recommender system. Both the topology of this component and the probability values will be learned from a set of past user ratings, and this will be used to compute a probability distribution representing the preferences of each group member for a given item. The second component will be used to merge these preferences in order to reach the final group opinion. This component is modeled using a BN with a fixed structure given the group members, and the weights will be computed based on the ratings provided by the group members (see Sect. 3.2). 3.1 BN-based collaborative component In this section, we will briefly describe this component (those readers interested in further details can consult (de Campos et al. 2008)). Our objective is to model how each user should rate an item. In order to represent relationships between users, we shall include a node, Ui , for each user in the system. We use U to denote the set of user nodes, i.e. U = {U1,..., Un}. The user variable Ui will therefore represent the probability distribution associated to its rating pattern. For instance, using the data in Table 2, each node will store two probability values representing the probability of Ui liking (Pr(Ui = 2)) or disliking (Pr(Ui = 1)) an item. In order to facilitate the presence of dependence relationships between individuals in the model (to avoid a possibly complex network topology), we propose that a new set of nodes V be included to denote collaborative ratings. There is one collaborative node for each user in the system, i.e. V = {V1, V2,..., Vn}. These nodes will represent a probability distribution over ratings, and they will therefore take their values in the same domain as U. 3.1.1 Learning stage Given an active user, the parent set of the variable Va in the graph, Pa(Va), will be learnt from the database of votes, R. This set will contain those user variables, Ub ∈ U, where Ua and Ub are most similar in taste, i.e. the best neighbors for the active user. Given a similarity measure, the set Pa(Va) can therefore be obtained by using a threshold or by only considering the first p variables in the ranking (see Fig. 2). It should be 123
Fig 2 Collaborative Recommending Sy sten 1)(v2)(3)(V4 noted that we do not include the links between Ui->Vi, Vi, since we are modeling a collaborative rating scheme where(assuming that the item being recommended ha not been observed by the active user)the predicted rating will only depend on those ratings given by its neighbors The similarity measure proposed in this paper is a combination of two different, but complementary, criteria: vote correlation between common items and the overlap sim(Ua, Ub)=abs (PCC(Ua, Ub))x D(Ua, Ub) The first criterion, which is normally used as the basis for calculating the weights in different collaborative systems, attempts to capture those similar users, i.e. those with the highest absolute value of pearsons correlation coefficient defined as PCC(Ua, Ub)= (ra. j-Fa)(rb, j-Fb) 7b)2 where the summations overj are over those items for which users Ua and Ub have recorded votes and Ta is the mean vote for user Ua. It should be noted that PCC ranges from +1 to-1: +l means that there is a perfect positive linear relationship between users;-I means that there is a perfect negative linear relationship: a correlation of 0 means that there is no relationship. Therefore, when there are no common items in Ua and Ub voting records, then PCC(Ua, Ub)=0 by default. In our approach, by using the absolute value of PCC, abs (PCC), we consider that both positively(those with similar ratings) and negatively correlated users(those with opposite tastes)might help- to predict an active user's final rating The second criterion tries to penalize those highly correlated neighbors which are based on very few co-rated items, which have proved to be bad predictors(Herlocker et al. 1999). We might therefore take into account the number of items that both a and Ub rated simultaneously, i. e. their overlap degree. In particular, we consider that the quality of Ub as the parent of variable Ua is directly related with the probability of a user Ua rating an item which has been also rated by Ub. This criterion can be defined by the following expression 2 For instance. if whenever Ub rates as like Ua rates with dislike, then knowing that U, had rated item with like provides information about Uas possible rating
Uncertainty in group recommending 215 Fig. 2 Collaborative Recommending System Topology U2 V0 V1 V2 V3 V4 V5 U0 U1 U3 U4 U5 noted that we do not include the links between Ui −→ Vi, ∀i, since we are modeling a collaborative rating scheme where (assuming that the item being recommended has not been observed by the active user) the predicted rating will only depend on those ratings given by its neighbors. The similarity measure proposed in this paper is a combination of two different, but complementary, criteria: vote correlation between common items and the overlap degree, i.e. sim(Ua, Ub) = abs(PCC(Ua, Ub)) × D(Ua, Ub) (1) The first criterion, which is normally used as the basis for calculating the weights in different collaborative systems, attempts to capture those similar users, i.e. those with the highest absolute value of Pearson’s correlation coefficient defined as PCC(Ua, Ub) = j(ra,j − r a)(rb,j − r b) j(ra,j − r a)2 j(rb,j − r b)2 (2) where the summations over j are over those items for which users Ua and Ub have recorded votes and r a is the mean vote for user Ua. It should be noted that PCC ranges from +1 to −1: +1 means that there is a perfect positive linear relationship between users; −1 means that there is a perfect negative linear relationship; a correlation of 0 means that there is no relationship. Therefore, when there are no common items in Ua and Ub voting records, then PCC(Ua, Ub) = 0 by default. In our approach, by using the absolute value of PCC, abs(PCC), we consider that both positively (those with similar ratings) and negatively correlated users (those with opposite tastes) might help2 to predict an active user’s final rating. The second criterion tries to penalize those highly correlated neighbors which are based on very few co-rated items, which have proved to be bad predictors (Herlocker et al. 1999). We might therefore take into account the number of items that both Ua and Ub rated simultaneously, i.e. their overlap degree. In particular, we consider that the quality of Ub as the parent of variable Ua is directly related with the probability of a user Ua rating an item which has been also rated by Ub. This criterion can be defined by the following expression: 2 For instance, if whenever Ub rates as like Ua rates with dislike, then knowing that Ub had rated an item with like provides information about Ua’s possible rating. 123
216 L. M. de Campos et al. D(Ua, Ub) JI(Ub) where I(U) is the set of items rated by user U in the data set. It should be noted that we are not considering the particular votes, merely whether the users rated an item or 3.2 Modeling the group component As mentioned previously, since groups are usually created by their members, we shall not consider how groups are formed or how they are managed. We shall therefore assume that we know the composition of the groups, and our problem is to study how this information can be represented in the Bn and also how to predict ratings for We propose to identify a group G as a new node in the BN. Since the recommen- dations are made by considering the preferences of its members, we propose that the parents(Pa(G))of the group node(G)will be the set of nodes in V representing its individuals. In this case, we are modeling that the predictions of the grou ill depend on the collaborative predictions obtained for each of its members. Figure 3 illustrates a group Ga with three members: V1, V2, and V3. We use dashed lines to epresent user-group relations since we assume that the composition of the group is known In this paper, we will focus on how different aggregation strategies can be repre sented in our BN-based model. In order to maintain generality (so that the proposed aggregation mechanisms can be applied in more general situations ) we will use the following independence assumption: given that we know the opinion(ratings)of all the group members, group opinion does not change(it is independent)if the state of any other variable in system Xi is known, i.e. I(G, XiIPa(g)), vxi Pa(Gi. It is important to remember that in certain domains this restriction might be very restric tive. For example, it might also be possible to consider other factors that would affect the group rating such as the context. Nevertheless, the study of how to include these factors in the model is beyond the scope of this paper. 3 For example, it might be used to combine multiple classifiers(Kittler et al. 1998; Abellan and Masegosa 2007)where the new cases will be classified by considering all the results obtained by each classifier
216 L. M. de Campos et al. Fig. 3 Modeling groups Ga U0 U1 U2 U3 U4 U5 V1 V2 V3 V4 V5 D(Ua, Ub) = |I(Ua) ∩ I(Ub)| |I(Ub)| . where I(U) is the set of items rated by user U in the data set. It should be noted that we are not considering the particular votes, merely whether the users rated an item or not. 3.2 Modeling the group component As mentioned previously, since groups are usually created by their members, we shall not consider how groups are formed or how they are managed. We shall therefore assume that we know the composition of the groups, and our problem is to study how this information can be represented in the BN and also how to predict ratings for groups. We propose to identify a group G as a new node in the BN. Since the recommendations are made by considering the preferences of its members, we propose that the parents (Pa(G)) of the group node (G) will be the set of nodes in V representing its individuals. In this case, we are modeling that the predictions of the group’s ratings will depend on the collaborative predictions obtained for each of its members. Figure 3 illustrates a group Ga with three members: V1, V2, and V3. We use dashed lines to represent user-group relations since we assume that the composition of the group is known. In this paper, we will focus on how different aggregation strategies can be represented in our BN-based model. In order to maintain generality (so that the proposed aggregation mechanisms can be applied in more general situations3), we will use the following independence assumption: given that we know the opinion (ratings) of all the group members, group opinion does not change (it is independent) if the state of any other variable in system Xi is known, i.e. I(G, Xi|Pa(G)), ∀Xi ∈/ Pa(Gi). It is important to remember that in certain domains this restriction might be very restrictive. For example, it might also be possible to consider other factors that would affect the group rating such as the context. Nevertheless, the study of how to include these factors in the model is beyond the scope of this paper. 3 For example, it might be used to combine multiple classifiers (Kittler et al. 1998; Abellán and Masegosa 2007) where the new cases will be classified by considering all the results obtained by each classifier. 123