178 SHAHABI AND CHEN analyzing his/her navigation behaviors. To accomplish this goal, Yoda performs a two-step process. During the offline process, Yoda clusters the Web usage data collected on the browser side and generates, for each navigation-pattern cluster, a corresponding recommendation list (termed the expert’s wish-list) and a centroid. Moreover, Yoda also maintains wish-lists obtained from other kinds of experts, such as human experts and the cluster representatives of user ratings. Later, during the online recommendation process, the system first acquires the initial confidence values by softly classifying¹ the active user into the navigation-pattern clusters. Note that the confidence values between the user and the other experts are not estimated at this step because that estimation is comparatively time-consuming. Subsequently, Yoda uses these confidence values to generate the customized recommendation (termed the user wish-list) by a weighted aggregation of the experts’ wish-lists. As the user receives the user wish-list and continues to navigate the web-site, Yoda corrects the confidence values in a background process, using the user’s follow-up navigation behavior and the GA-based learning mechanism.

4. System design

In this section, we provide a detailed description of Yoda’s components. Since phase I is based on our previous work [25], here we elaborate more on phases II and III of Yoda. Phase III is described in Section 5.

4.1. Phase I: Obtaining user perception

Yoda uses the client-side tracking mechanism proposed in [27] to capture the view time, hit count, and visiting sequence of the web-pages (items) within a web-site. These features reflect users’ interests in items. To analyze these features and infer user interests, Yoda employs the Feature Matrices (FM) model, which we introduced in [25]. FM is a set of hypercube data structures that can represent various aggregated access features with any required precision.
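The following Python sketch illustrates, under simplifying assumptions, how the three tracked features (hit count, view time, and visiting sequence) might be accumulated for a single user. The class `FeatureMatrices` and its method `record_session` are hypothetical names introduced only for illustration. The sketch keeps just per-page hit counts, total view time, and first-order page-to-page transitions, whereas the FM model of [25] generalizes such features to hypercubes of higher order and arbitrary precision.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FeatureMatrices:
    """Hypothetical, simplified stand-in for one user's FM (first order only)."""

    # hits[page]: how many times the page was visited (hit count)
    hits: dict = field(default_factory=lambda: defaultdict(int))
    # view_time[page]: total seconds spent on the page (view time)
    view_time: dict = field(default_factory=lambda: defaultdict(float))
    # transitions[(a, b)]: how often page b was visited right after page a (sequence)
    transitions: dict = field(default_factory=lambda: defaultdict(int))

    def record_session(self, clicks):
        """Update the features from one session, given as an ordered list of
        (page, seconds) pairs reported by a client-side tracker."""
        previous = None  # transitions are never linked across sessions
        for page, seconds in clicks:
            self.hits[page] += 1
            self.view_time[page] += seconds
            if previous is not None:
                self.transitions[(previous, page)] += 1
            previous = page


# Example: one session on a hypothetical music web-site
fm = FeatureMatrices()
fm.record_session([("home", 5.0), ("jazz", 40.0), ("home", 3.0), ("jazz", 20.0)])
print(dict(fm.hits))         # {'home': 2, 'jazz': 2}
print(dict(fm.view_time))    # {'home': 8.0, 'jazz': 60.0}
print(dict(fm.transitions))  # {('home', 'jazz'): 2, ('jazz', 'home'): 1}
```

Storing transitions in a sparse dictionary rather than a dense page-by-page array is only a choice made for this sketch; for a small site the two representations hold the same information.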
With FM, the patterns of both a single user and a cluster of users are modeled. Here, Yoda uses FM to model the navigation pattern of each active user individually, and the aggregated navigation pattern of each cluster is generated by clustering a collection of user navigation behaviors. Yoda also applies a similarity measure, termed Projected Pure Euclidean Distance (PPED) [25], to evaluate the similarity of a user’s navigation pattern to a cluster’s navigation pattern. Thus, Yoda can quantify the confidence value of a user with respect to each navigation-pattern cluster. However, because PPED applies only to navigation behaviors modeled by FM, Yoda cannot acquire, at this step, the confidence values of a user with respect to the recommendation lists of the other experts.

4.2. Phase II: Ranking the items

Two tasks in Yoda involve ranking the items. The first is generating the experts’ recommendations, which are lists of ranked items produced by either human experts, clusters
AN ADAPTIVE RECOMMENDATION SYSTEM 179 of users, or clusters of navigation patterns. In our previous work [24], a content analysis technique was proposed to abstract common interests from navigation patterns. With this technique, the system can generate a list of ranked items for each navigation-pattern cluster. For brevity, we describe this technique only briefly here and focus instead on the second type of ranking task: generating the user wish-list online. To describe this method properly, we first formally define some necessary terms.

Definition 4.1. An item $i$ is an instance of a product, service, etc. that is presented on a web-site. Items are described by their properties, which are abstract perceptual features, and by the corresponding degrees of those properties:

$$ i = \{ (p, \tilde{p}_i) \mid p \in P,\ \tilde{p}_i \text{ is the degree of } i \text{ to } p \} \in I \qquad (1) $$

For example, for a music CD as an item, the “styles” of the music, its “ratings”, and its “popularity” can be considered properties of the item. Since most properties are perceptual, asking for precise values to rate them is inappropriate; instead, we use a limited number of fuzzy sets ($f \in F$) to evaluate the properties. This design also eases the content analysis in the later process.

Definition 4.2. A wish-list, $I_x$, for user/expert $x$ is defined as:

$$ I_x = \{ (i, v_x(i)) \mid i \text{ is an item},\ v_x(i) \in [0, 1] \} \qquad (2) $$

where the preference value $v_x(i)$ measures the probability that item $i$ is of interest to user/expert $x$.

Definition 4.3. A cluster browse-list, $B_k$, for navigation-pattern cluster $k$ is a list of items visited by all users in this cluster.

Definition 4.4. A confidence value $\pi$ for a user $u$ to an expert $e$ is formally defined as

$$ \pi : u \in U,\ e \in E \rightarrow b_{u,e} $$

where $E$ denotes the set of experts in the system and $U$ represents the set of users who have assigned reference confidence values to experts. Note that the value of $b_{u,e}$ is represented as a fuzzy term for two reasons.
First, fuzzy terms are a more appropriate way to describe human judgment. Second, restricting values to a limited set of fuzzy terms expedites the online computation.

4.2.1. Generating navigation-pattern cluster wish-lists. Yoda represents the aggregated interests of the users in each cluster by a set of property values (PVs), termed the favorite PVs of the cluster. These favorite PVs indicate the degrees of the properties emphasized by the majority of users in the cluster. To extract these values from the cluster’s navigation data, we design a voting procedure, defined as follows.

Definition 4.5. The favorite PV, $F_p(k)$, identifies the likelihood of cluster $k$ being interested in property $p$ of the items and is extracted from the cluster browse-list $B_k$ through