rti.htel,2(2):40-55,2009

Phase II: Recommendation Process for the Active User

Step 1: Choosing the Appropriate Clusters

The cluster to be chosen depends upon the utility of the cluster, which in turn depends on two factors, viz., the density of the cluster and its similarity with the active user profile. The probability that cluster i is chosen for generating recommendations at time t is given by Eq. 2:

$$P_i(t) = \frac{\mathrm{pher}_i(t) \cdot \mathrm{sim}_i}{\sum_{j=1}^{K} \mathrm{pher}_j(t) \cdot \mathrm{sim}_j} \qquad (2)$$

Where:
$\mathrm{sim}_i$ = Value of the similarity function measuring the similarity between the active user profile and the centroid of the ith cluster
$\mathrm{pher}_i(t)$ = Amount of pheromone associated with the ith cluster at time t
$K$ = Number of clusters

The density of a cluster is determined by its pheromone value. During Phase I, the pheromone value is initialized using Eq. 1. Then, when the recommendations are generated, the pheromone value is updated based on the weighted rating quality of the items in the clusters.

The similarity function measures the similarity between the active user profile and the centroid of the cluster. There are a number of possible measures for computing the similarity, for example the Euclidean distance metric, cosine/vector similarity and the Pearson correlation metric. The Euclidean distance between the active user profile and the centroid of the cluster is given by Eq. 3:

$$\mathrm{Dist}(\mathrm{Cent}_i, U) = \sqrt{\sum_{j=1}^{d} (\mathrm{Cent}_{ij} - U_j)^2} \qquad (3)$$

Where:
$d$ = Dimensionality of the data, i.e., the number of attributes
$\mathrm{Cent}_i$ = Centroid of cluster i
$U$ = Active user profile
$\mathrm{Cent}_{ij}$ = jth attribute of the centroid profile in cluster i
$U_j$ = jth attribute of the active user profile

Therefore, the similarity measure is estimated by Eq. 4 as follows:

$$\mathrm{Sim}_i(\mathrm{Cent}_i, U) = \frac{1}{\mathrm{Dist}(\mathrm{Cent}_i, U)} \qquad (4)$$

The similarity measure of the active user profile is calculated with each cluster in order to find [...]

The amount of pheromone $\mathrm{pher}_i(t)$ available on each cluster provides an indirect form of communication, suggesting the best cluster to be chosen by the ARS for recommendation. The pheromone updating strategy is explained in detail in a later subsection.
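The computation in Eqs. 2 to 4 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it assumes Eq. 4 is the reciprocal of the Euclidean distance, the function names are hypothetical, the small `eps` guard against a zero distance is an addition not found in the text, and the pheromone values and profiles are made-up numbers.

```python
import math


def euclidean_distance(centroid, user):
    """Eq. 3: Euclidean distance between a cluster centroid and the user profile."""
    return math.sqrt(sum((c - u) ** 2 for c, u in zip(centroid, user)))


def similarity(centroid, user, eps=1e-9):
    """Eq. 4, read here as the reciprocal of the distance (assumption).

    eps is a hypothetical guard so an exact match does not divide by zero.
    """
    return 1.0 / (euclidean_distance(centroid, user) + eps)


def cluster_probabilities(pheromones, centroids, user):
    """Eq. 2: pheromone-weighted similarity, normalized over all clusters."""
    weights = [p * similarity(c, user) for p, c in zip(pheromones, centroids)]
    total = sum(weights)
    return [w / total for w in weights]


# Made-up example: three clusters with equal pheromone, two attributes.
user = [0.0, 0.0]
centroids = [[3.0, 4.0], [0.0, 1.0], [0.0, 2.0]]
pheromones = [0.5, 0.5, 0.5]

probs = cluster_probabilities(pheromones, centroids, user)
print([round(p, 4) for p in probs])
```

With equal pheromone values, the ordering of the probabilities is driven by similarity alone, so the cluster whose centroid is nearest to the user profile gets the largest share. Unequal pheromone values would shift probability toward denser clusters.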
The clusters whose probability value lies in the interval $(\text{highest probability} - 0.1) \le \text{probability} \le \text{highest probability}$ are chosen for generating recommendations for the active user, instead of only the cluster with the highest probability. This overcomes the limitation of CF-based recommender systems, where recommendations are provided based solely on the opinion of the users with the most similar preferences, and provides the active
user with a good set of alternative recommendations.

Ratings given by the active user for jokes 1 to 10 (J1-J10), normalized in the range 0 to 1, are shown in Table 5. A rating of 0 indicates that the active user has not rated jokes 3, 4, 6 and 9. Normalized data of the active user in the range 0 to 1 is shown in Table 6. Table 7 shows the process of choosing clusters: the active user profile is compared with the centroid profile of each cluster to obtain the computed probability $P_i$, for $i = 1, \ldots, K$. Clusters 2 and 3 are chosen, since their probability $P_i$ lies in the range $(0.4163 - 0.1) \le P_i \le 0.4163$.

Table 6: Normalized data of active user in the range 0 to 1
0.250  0.27

Table 7: Process of choosing clusters
                            K=1      K=2      K=3
[...] 0.547 0.3635
Probability function (P)    0.1692   0.4163   0.4146
Clusters chosen             K=2, 3

Table 8: Rating quality computed for jokes
Jokes unrated by active user    Q (K=3)
[...] 0.6114 0.1310

Step 2: Computing the Rating Quality of Items in Each Chosen Cluster

Rating quality depends on the number of users in the cluster who have rated the item, the individual ratings for the item in the given rating matrix, and how close the ratings provided by the users are to each other. The rating quality of the item is computed by Eq. 5:

$$Q = \frac{UB + \mathrm{avg\_rating}}{2 \times UB \times \sqrt{\mathrm{var}}} \qquad (5)$$

Where:
$UB$ = Upper bound of the rating
$\mathrm{avg\_rating}$ = Average rating of the item in the chosen cluster
$\mathrm{var}$ = Variance of the ratings given by individual users for the item in the chosen cluster

As avg_rating tends to become equal to UB, $(UB + \mathrm{avg\_rating}) / (2 \times UB)$ will be close to 1, indicating that users have provided good quality ratings. The Standard Deviation (SD) is computed as the square root of the variance. A large SD indicates that the ratings are far from the mean, and a small SD indicates that they are clustered closely around the mean. A higher value of Q indicates good rating quality.

Step 3: Computing Ratings of Items

Once the quality (Q) of each item unrated by the active user is computed in the chosen clusters, the clusters in which Q for each item lies in the interval $(\text{highest } Q - 0.1) \le Q \le \text{highest } Q$ are further selected for computing the rating, instead of only the cluster containing the highest Q. The rating of each item is then computed from the selected clusters as the weighted average of the ratings, using the following Eq. 6