Uncertainty in group recommending 217

Table 3 Stored probability values

P(U0), P(U1), P(U2), P(U3), P(U4), P(U5)
P(V1|U0, U2): P(v1,1|1, 1), P(v1,1|1, 2), P(v1,1|2, 1), P(v1,1|2, 2)
P(V2|U1, U3, U4): P(v2,1|1, 1, 1), P(v2,1|1, 1, 2), P(v2,1|1, 2, 1), P(v2,1|1, 2, 2), P(v2,1|2, 1, 1), P(v2,1|2, 1, 2), P(v2,1|2, 2, 1), P(v2,1|2, 2, 2)
P(V3|U2, U4, U5): P(v3,1|1, 1, 1), P(v3,1|1, 1, 2), P(v3,1|1, 2, 1), P(v3,1|1, 2, 2), P(v3,1|2, 1, 1), P(v3,1|2, 1, 2), P(v3,1|2, 2, 1), P(v3,1|2, 2, 2)
P(Ga|V1, V2, V3): P(ga,1|1, 1, 1), P(ga,1|1, 1, 2), P(ga,1|1, 2, 1), P(ga,1|1, 2, 2), P(ga,1|2, 1, 1), P(ga,1|2, 1, 2), P(ga,1|2, 2, 1), P(ga,1|2, 2, 2)

In order to complete the BN-based model, it is necessary to estimate the local probabilities that must be stored in the nodes. In particular, each node Xi has a set of conditional probability distributions, P(xi|pa(Xi)) (except root nodes, which store marginal probability distributions).4 For each possible configuration pa(Xi) of the parent set Pa(Xi), these distributions quantify the effect that the parents have on the node Xi. In our case, these probabilities are used to encode both the strength of the user-user interactions and the processes leading to the final choice or recommendation for the group. Table 3 shows the probability distributions stored in the example of Fig. 3, where for instance P(V2|1, 1, 2) represents P(V2|u1,1, u3,1, u4,2). The method of assessing the particular values will be discussed in Sect. 4.

3.3 How to predict the group rating: inference

Once the BN is completed, it can be used to perform inference tasks. In our case, we are interested in predicting the group's rating for an unobserved item, I. As evidence, we will consider how this product was rated in the past.5 The problem therefore comes down to computing the conditional (posterior) probability distribution for the target group Ga given the evidence, i.e. Pr(Ga|ev). For instance, let us assume that we want to predict the rating given by Ga in Fig. 3 to item I7.
If we look at Table 2, the evidence is ev = {U0 = 2, U4 = 1, U5 = 2} and the problem is to compute Pr(Ga = 1|u0,2, u4,1, u5,2). Since the BN is a concise representation of a joint distribution, we could propagate the observed evidence through the network towards the group variables. This propagation implies a marginalization process (summing out over the uninstantiated variables).

4 Throughout this paper we will use upper-case letters to denote variables and lower-case letters to denote a particular instantiation. More specifically, we use vi to denote a general value of variable Vi and vi,j to indicate that Vi takes the jth value.

5 It should be noted that we consider that no member of the group has observed the item beforehand, and therefore the evidence is over the values taken by the variables in U. In the case of a group member (let us say Ui) having also previously rated I, we shall instantiate both node Ui and Vi to the value of the given ratings. The instantiation of Vi implies that there is no uncertainty about its rating when the information is combined at group level. Nevertheless, the computations are more complex in this situation.
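This propagation can be sketched as a brute-force sum over the joint distribution. The structure below follows the example network of Fig. 3 (V1 depends on U0 and U2; V2 on U1, U3, U4; V3 on U2, U4, U5; Ga on V1, V2, V3; binary rating scale), but all numeric probabilities are invented placeholders, since Table 3 only names the distributions, and the function names are ours:

```python
from itertools import product

R = (1, 2)  # binary rating scale, r = 2

# --- Hypothetical local distributions (placeholders for Table 3) ---
# Marginals P(Uk = 1); P(Uk = 2) is the complement.
prior = {"U0": 0.5, "U1": 0.4, "U2": 0.6, "U3": 0.5, "U4": 0.3, "U5": 0.7}

def cpt(n_parents):
    """Generate valid placeholder values P(Vi = 1 | parents): the more
    parents voted 1, the likelier a vote of 1."""
    return {cfg: (list(cfg).count(1) + 0.5) / (n_parents + 1)
            for cfg in product(R, repeat=n_parents)}

p_v1 = cpt(2)  # P(V1 = 1 | U0, U2)
p_v2 = cpt(3)  # P(V2 = 1 | U1, U3, U4)
p_v3 = cpt(3)  # P(V3 = 1 | U2, U4, U5)
p_ga = cpt(3)  # P(Ga = 1 | V1, V2, V3)

def p(table, value, cfg):
    q = table[cfg]
    return q if value == 1 else 1.0 - q

def pu(name, value):
    q = prior[name]
    return q if value == 1 else 1.0 - q

def posterior_ga(ev):
    """Pr(Ga | ev) by summing out every uninstantiated variable."""
    score = {1: 0.0, 2: 0.0}
    u0, u4, u5 = ev["U0"], ev["U4"], ev["U5"]
    # Sum over the unobserved users U1, U2, U3 and over V1, V2, V3, Ga.
    for u1, u2, u3, v1, v2, v3, ga in product(R, repeat=7):
        joint = (pu("U0", u0) * pu("U1", u1) * pu("U2", u2) *
                 pu("U3", u3) * pu("U4", u4) * pu("U5", u5) *
                 p(p_v1, v1, (u0, u2)) *
                 p(p_v2, v2, (u1, u3, u4)) *
                 p(p_v3, v3, (u2, u4, u5)) *
                 p(p_ga, ga, (v1, v2, v3)))
        score[ga] += joint
    z = score[1] + score[2]
    return {s: score[s] / z for s in R}

post = posterior_ga({"U0": 2, "U4": 1, "U5": 2})
```

Note that the factors for the observed users are constants and cancel in the normalization; only the unobserved variables genuinely contribute to the exponential cost discussed next.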
218 L. M. de Campos et al.

A general scheme might be:

Pr(Ga = s|ev) = Σ_{v1,...,vk} Pr(Ga = s|v1,...,vk) Pr(v1,...,vk|ev)    (3)

where v1,...,vk represents a given configuration of the collaborative variables (the parent set Pa(Ga)) and the sum is over the exponential number of possible configurations; this requires an exponential time, O(r^{|Pa(Ga)|}), with r being the number of candidate ratings. Considering that the evidence belongs to U, the joint probability over the collaborative variables might be computed as

Pr(v1,...,vk|ev) = Σ_{u−} Π_{i=1}^{k} Pr(Vi = vi|u−, ev) Pr(u−)    (4)

where the sum is over all the possible configurations, u−, of the set of uninstantiated user variables, denoted by U−, also requiring an exponential time, O(r^{|U−|}). These computations should be performed for each group variable whenever there is an item to be recommended. As there is usually a large set of groups in the system, this process becomes computationally expensive, and reducing computational complexity becomes a key design parameter, especially if the objective is to obtain scalable strategies which can be implemented in real-time applications.

In order to tackle this problem, we propose the use of canonical models to represent the conditional probability distributions. By means of these models, we can reduce the number of probability values stored and develop specific inference algorithms. In those cases where the computation of Pr(V1,..., Vk|ev) is complicated, we also propose to approximate these values by using extra independence assumptions (see Sect. 5).

4 Estimating the strength of the users' interactions

In terms of assessing the probability values, we must distinguish between the roots of the graph, i.e. the nodes in U, and the remaining nodes. In particular, for every user node Uk, we need to assess the prior probability distribution over the user's rating pattern, i.e. the probability of user Uk rating with a given value s, 1 ≤ s ≤ r.
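Such priors can be estimated as relative frequencies of the votes in each user's history. As a sketch, the rating vectors below are hypothetical (Table 2 is not reproduced in this section) but are chosen to match the counts used in the worked example that follows: U3 with four ratings, two of them equal to 1, and U5 with seven ratings, two of them equal to 1:

```python
from collections import Counter
from fractions import Fraction

# Hypothetical rating histories (stand-ins for Table 2): the vote counts
# match the worked example in the text; the item order is invented.
ratings = {
    "U3": [1, 1, 2, 2],           # four rated items, two votes equal to 1
    "U5": [1, 1, 2, 2, 2, 2, 2],  # seven rated items, two votes equal to 1
}

def prior(user, s):
    """Pr(Uk = s): relative frequency of vote s in the user's history."""
    votes = ratings[user]
    return Fraction(Counter(votes)[s], len(votes))

p3 = prior("U3", 1)  # Fraction(1, 2)
p5 = prior("U5", 1)  # Fraction(2, 7), about 0.286
```

Fraction keeps the estimates exact; by construction the probabilities of each user's possible votes sum to one.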
For example, considering the relative frequency and the data in Table 2, we will obtain Pr(U3 = 1) = 2/4 = 0.5 and Pr(U5 = 1) = 2/7 = 0.286.

For each non-root variable, we must store an exponential number of conditional probability distributions: one probability distribution for each possible configuration of its parent set. The assessment, storage, and manipulation of these probability values can be quite complex, especially if we consider that the number of similar users (the parents of the nodes in V) and the size of the groups might be large when real group recommending applications are considered. We therefore propose the use of different canonical models to represent these conditional probabilities. By using this representation, it might be possible to reduce the problem of data sparsity (it is quite probable that
many configurations lack data), leading to important savings in storage (we only need to store a linear number of probability values) and more efficient inference algorithms (see Sect. 5).

4.1 Probabilities of the collaborative component

The probabilities in the collaborative component (the nodes in V) might be estimated from the data set of past user ratings. For a given node Vi, we must define the conditional probability distribution Pr(vi,j|pa(Vi)) for each configuration pa(Vi). We propose the use of the following canonical model (studied in detail in de Campos et al. 2008), where an additive behavior of the collaborative nodes is assumed, thereby enabling the data sparsity problem to be tackled:

Definition 1 (Canonical weighted sum) Let Xi be a node in a BN, let Pa(Xi) be the parent set of Xi, and let Yk be the kth parent of Xi in the BN. By using a canonical weighted sum, the set of conditional probability distributions stored at node Xi is represented by means of

Pr(xi,j|pa(Xi)) = Σ_{Yk∈Pa(Xi)} w(yk,l, xi,j)    (5)

where yk,l is the value that variable Yk takes in the configuration pa(Xi), and w(yk,l, xi,j) are weights (effects) measuring how this lth value of variable Yk describes the jth state of node Xi. The only restriction that we must impose is that the weights are a set of non-negative values verifying

Σ_{j=1}^{r} Σ_{Yk∈Pa(Xi)} w(yk,l, xi,j) = 1, for every configuration pa(Xi).

It is interesting to note that by defining how to compute the weights w(yk,l, xi,j), we can control individual bias6 and the relative quality (importance) of the parents for the predicted variable, Xi.

The problem now is how to estimate the weights given by similar users, i.e. Ub ∈ Pa(Va). Following de Campos et al. (2008), we consider that w(ub,t, va,s) (i.e. the effect of user Ub rating with value t when it comes to predicting the rating of Va) can be computed by means of

w(ub,t, va,s) = wb,a · (N*(ub,t, ua,s) + 1/r) / (N*(ub,t) + 1),  1 ≤ t, s ≤ r    (6)

where N*(ub,t, ua,s) is the number of items from the set I(Ua) ∩ I(Ub) that, having been voted with value t by user Ub, have also been voted with value s by user Ua.

6 Bias refers to a user's preference for a particular vote (some users tend to rate with high values whereas others prefer to use lower ones) and his or her ability to predict the judgments of Xi.
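Definition 1 and Eq. (6) can be sketched together as follows. The co-rated items and the similarity weight wb,a below are invented and the helper names are ours; the sketch also illustrates why the normalization restriction holds: for each parent value t, the weights of Eq. (6) sum over s to wb,a, so a CPT built by the weighted sum is a proper distribution whenever the similarities of the parents sum to one.

```python
r = 2                 # number of candidate ratings
R = range(1, r + 1)

def eq6_weights(w_ba, votes_b, votes_a):
    """Eq. (6): w(ub_t, va_s) = w_ba * (N*(ub_t, ua_s) + 1/r) / (N*(ub_t) + 1)."""
    common = votes_b.keys() & votes_a.keys()  # I(Ua) ∩ I(Ub)
    w = {}
    for t in R:
        n_t = sum(1 for i in common if votes_b[i] == t)         # N*(ub_t)
        for s in R:
            n_ts = sum(1 for i in common
                       if votes_b[i] == t and votes_a[i] == s)  # N*(ub_t, ua_s)
            w[(t, s)] = w_ba * (n_ts + 1.0 / r) / (n_t + 1.0)
    return w

def weighted_sum_cpt(parent_weights, cfg):
    """Definition 1: Pr(Va = s | pa) = sum of w(yk_l, va_s) over the parents."""
    return {s: sum(wk[(cfg[k], s)] for k, wk in enumerate(parent_weights))
            for s in R}

# Invented example: Ub and Ua co-rated five items, similarity w_ba = 0.5.
votes_b = {"i1": 1, "i2": 1, "i3": 2, "i4": 2, "i5": 2}
votes_a = {"i1": 1, "i2": 2, "i3": 2, "i4": 2, "i5": 1}
w = eq6_weights(0.5, votes_b, votes_a)

# Two parents with similarity 0.5 each (the same table is reused for brevity);
# the resulting distribution sums to one because 0.5 + 0.5 = 1.
cpt = weighted_sum_cpt([w, w], cfg=(1, 2))
```

The Laplace-style correction (+1/r in the numerator, +1 in the denominator) keeps every weight strictly positive even when the two users share few co-rated items, which is how this canonical model copes with data sparsity.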