ADOMAVICIUS AND TUZHILIN spurious correlation or is simply not relevant to the application at hand. In the next section we present methods for validating behavioral rules in user profiles 3. Validation of user profiles A common way to perform the post-analysis of data mining results is to let the domain expert perform this task, and several data mining systems support this capability. For Mine Set(Brunk et al., 1997) provides a wide range of visual ization techniques the end-user to examine visually the results discovered by its data mining tools and thus evaluate the quality of these results In our approach, individual rules discovered during the data mining stage are validated by the expert, and, depending on how well they represent the actual behaviors of the users, some rules are"accepted"and some"rejected"by the expert. Then the accepted rules form the behavioral profiles of users One of the main issues about validating individual rules of users by a human expert is salability. In many e-commerce personalization applications the number of users tends to be very large. For example, the number of registered users at major Web sites is measured in millions. If we discover a hundred rules per customer on average, then the total number of rules for such sites would be measured in hundreds of millions Therefore. it would be impossible for a human expert to validate all the discovered rules on a one-by-one basis in uch applications We address this problem by providing a framework allowing the human expert validate large numbers of rules(instead of individual rules )at a time with relatively little input from the expert. This is done by applying different rule validation operators that are described in Section 4. Then rule validation becomes an iterative process and is described in figure I In particular, the profile building activity is divided into two phases. In Phase I, the data mining phase, rules describing behaviors of individual users are generated from the users transactional data as was described in Section 2.2 Phase II constitutes the rule validation process. Rule validation, unlike rule discovery (Phase D), is not performed separately for each user, but rather for all users at once. The Validation operator t Phase l1 Figure 1. The profile building process
38 ADOMAVICIUS AND TUZHILIN spurious correlation or is simply not relevant to the application at hand. In the next section we present methods for validating behavioral rules in user profiles. 3. Validation of user profiles A common way to perform the post-analysis of data mining results is to let the domain expert perform this task, and several data mining systems support this capability. For example, MineSet (Brunk et al., 1997) provides a wide range of visualization techniques allowing the end-user to examine visually the results discovered by its data mining tools and thus evaluate the quality of these results. In our approach, individual rules discovered during the data mining stage are validated by the expert, and, depending on how well they represent the actual behaviors of the users, some rules are “accepted” and some “rejected” by the expert. Then the accepted rules form the behavioral profiles of users. One of the main issues about validating individual rules of users by a human expert is scalability. In many e-commerce personalization applications the number of users tends to be very large. For example, the number of registered users at major Web sites is measured in millions. If we discover a hundred rules per customer on average, then the total number of rules for such sites would be measured in hundreds of millions. Therefore, it would be impossible for a human expert to validate all the discovered rules on a one-by-one basis in such applications. We address this problem by providing a framework allowing the human expert validate large numbers of rules (instead of individual rules) at a time with relatively little input from the expert. This is done by applying different rule validation operators that are described in Section 4. Then rule validation becomes an iterative process and is described in figure 1. In particular, the profile building activity is divided into two phases. In Phase I, the data mining phase, rules describing behaviors of individual users are generated from the users’ transactional data as was described in Section 2.2. Phase II constitutes the rule validation process. Rule validation, unlike rule discovery (Phase I), is not performed separately for each user, but rather for all users at once. The Figure 1. The profile building process
RULE-BASED USER MODELS reason we propose performing rule validation collectively (rather than individually)for all users is that there are usually many similar or even identical rules across different users For example, the rule" when on the Netgrocer com Web sil 1.sel ALl392 usually spends more than $100 on groceries "can be common to many users. I addition, although rules"when user ALW392 comes to our Web site from site Y, she usually returns back to site y immediately, and"when user KTL158 comes to our Web site from site Z, she usually returns back to site Z immediately, are not identical, they are quite"similar and can be examined by the expert together. The collective rule validation allows one to deal with such common rules once, thus significantly reducing validation effort. Therefore in the beginning of Phase Il, rules from all the users are collected into one set. Each rule is tagged with the Id of the user to which it belongs, so that each accepted rule could be put into the profile of that user at the end of the validation phase After rules from all users are collected into one set, the rule validation process is performed as a second part of Phase Il. This process is described in figure 2. All rules discovered during Phase I(denoted by Rall in figure 2)are considered unvalidated. The human expert selects various validation operators and applies them successively to the set of unvalidated rules The application of each validation operator results in validation of some of the rules. In particular, some rules get accepted and some rejected(sets Oacc and Ore in figure 2).Then he next validation operator would be applied to the set of the remaining unvalidated rules (set Rum ). This validation process stops when the Terminate ValidationProcess condition is met. This condition is set by the human expert and is discussed later in this sectio After the validation process is stopped, the set of all the discovered rules (rall)is split into three disjoint sets: accepted rules(racc), rejected rules(rrej), and possibly some remaining unvalidated rules(Rum ). At the end of Phase Il all the accepted rules are put into the behavioral profiles of their respective users. This is possible, because all the rules have been tagged with the user id in the beginning of phase ll as described above As was already stated above and shown in figure 2, various validation operators are successively applied to the set of the unvalidated rules until the stopping criterion Terminate dationProcess is reached. The stopping criterion can be specified by the expert and may include such conditions as(a)only few rules remain unvalidated, (b)only few rules are being validated at a time by one or several validation operators, and(c)the total elapsed validation time exceeds the predetermined validation time mput: Set of all discovered rules Ral Output: Mutually disjoint sets of rules Race, Rref, Rumm such that Rall= Roce U Rre U rum 1)Rwr: Rall, Race: =o, Rref: =w (2) while(not Terminate Validation Processo)begin Expert selects a validation operator (say, O) from the set of perators (4) O is applied to Rwr. Result: disjoint sets Oace and Or Figure 2. An algorithm for the rule validation process
RULE-BASED USER MODELS 39 reason we propose performing rule validation collectively (rather than individually) for all users is that there are usually many similar or even identical rules across different users. For example, the rule “when shopping on the NetGrocer.com Web site on weekends, user ALW392 usually spends more than $100 on groceries” can be common to many users. In addition, although rules “when user ALW392 comes to our Web site from site Y, she usually returns back to site Y immediately,” and “when user KTL158 comes to our Web site from site Z, she usually returns back to site Z immediately,” are not identical, they are quite “similar” and can be examined by the expert together. The collective rule validation allows one to deal with such common rules once, thus significantly reducing validation effort. Therefore, in the beginning of Phase II, rules from all the users are collected into one set. Each rule is tagged with the ID of the user to which it belongs, so that each accepted rule could be put into the profile of that user at the end of the validation phase. After rules from all users are collected into one set, the rule validation process is performed as a second part of Phase II. This process is described in figure 2. All rules discovered during Phase I (denoted by Rall in figure 2) are considered unvalidated. The human expert selects various validation operators and applies them successively to the set of unvalidated rules. The application of each validation operator results in validation of some of the rules. In particular, some rules get accepted and some rejected (sets Oacc and Orej in figure 2). Then the next validation operator would be applied to the set of the remaining unvalidated rules (set Runv). This validation process stops when the Terminate ValidationProcess condition is met. This condition is set by the human expert and is discussed later in this section. After the validation process is stopped, the set of all the discovered rules (Rall) is split into three disjoint sets: accepted rules (Racc), rejected rules (Rrej), and possibly some remaining unvalidated rules (Runv). At the end of Phase II all the accepted rules are put into the behavioral profiles of their respective users. This is possible, because all the rules have been tagged with the user ID in the beginning of Phase II as described above. As was already stated above and shown in figure 2, various validation operators are successively applied to the set of the unvalidated rules until the stopping criterion Terminate ValidationProcessis reached. The stopping criterion can be specified by the expert and may include such conditions as (a) only few rules remain unvalidated, (b) only few rules are being validated at a time by one or several validation operators, and (c) the total elapsed validation time exceeds the predetermined validation time. Input: Set of all discovered rules Rall. Output: Mutually disjoint sets of rules Racc, Rrej, Runv, such that Rall = Racc ∪ Rrej ∪ Runv. (1) Runv := Rall, Racc := ∅, Rrej := ∅. (2) while (not TerminateValidationProcess()) begin (3) Expert selects a validation operator (say, O) from the set of available validation operators. (4) O is applied to Runv. Result: disjoint sets Oacc and Orej. (5) Runv := Runv − Oacc − Orej, Racc := Racc ∪ Oacc, Rrej := Rrej ∪ Orej. (6) end Figure 2. An algorithm for the rule validation process
ADOMAVICIUS AND TUZHILIN In this section we described the overall validation process. We present the detailed des- cription of various specific validation operators in the next section 4. Validation operators As stated in Section 3, validation operators provide a way for the domain expert to examine multiple rules at a time. This examination process can be performed in the following two ways. First, the expert may already know some types of rules that he or she wants to amine and accept or reject based on the prior experience. Therefore, it is important to provide capabilities allowing him or her to specify such types of rules in advance. In this section,we present template- and interestingness-based filtering operators that serve this purpose. Second, the expert may not know all the relevant types of rules in advance, and it is important to provide methods that group discovered rules into classes that he or she an subsequently examine and validate. In this section we also present the similarity-based ule grouping operator that serves this purpose. In addition, we describe other operators that can be used in the validation process, including visualization, statistical analysis, and browsing op Although our validation methods are general and can be applied to several forms of conjunctive rules, we focus mainly on association rules with discrete values in this paper 4. 1. Similarity-based rule grouping As pointed out in Section 3, there can be many"similar"rules among all the discovered rules, and it would be useful for the domain expert to evaluate all these similar rules together rather than individually. In order to do this, some similarity measure that would allow grouping similar rules together needs to be specified In this paper, we propose a method to specify such a similarity measure using attribute hierarchies. An attribute hierarchy is organized as a tree by the human expert in the beginning of the validation process. The leaves of the tree consist of all the attributes of the data set to which rule discovery methods were applied, i.e, all the attributes that can potentially be present in the discovered rules. The non-leaf nodes in the tree are specified by the human expert and are obtained by combining several lower-level nodes into one parent node. For instance, figure 3 presents an example of such a hierarchy, where nodes Al and A2 are combined into node a6 and nodes a3. a4 and a5 into node a7. and then nodes a6 Al A2 A3 A4 As Al A2 A3 A4 As Al 42/A3 A Al A2 A3 A4 A5 Figure 3. An example of an attribute hierarchy for similarity-based grouping
40 ADOMAVICIUS AND TUZHILIN In this section we described the overall validation process. We present the detailed description of various specific validation operators in the next section. 4. Validation operators As stated in Section 3, validation operators provide a way for the domain expert to examine multiple rules at a time. This examination process can be performed in the following two ways. First, the expert may already know some types of rules that he or she wants to examine and accept or reject based on the prior experience. Therefore, it is important to provide capabilities allowing him or her to specify such types of rules in advance. In this section, we present template- and interestingness-based filtering operators that serve this purpose. Second, the expert may not know all the relevant types of rules in advance, and it is important to provide methods that group discovered rules into classes that he or she can subsequently examine and validate. In this section we also present the similarity-based rule grouping operator that serves this purpose. In addition, we describe other operators that can be used in the validation process, including visualization, statistical analysis, and browsing operators. Although our validation methods are general and can be applied to several forms of conjunctive rules, we focus mainly on association rules with discrete values in this paper. 4.1. Similarity-based rule grouping As pointed out in Section 3, there can be many “similar” rules among all the discovered rules, and it would be useful for the domain expert to evaluate all these similar rules together rather than individually. In order to do this, some similarity measure that would allow grouping similar rules together needs to be specified. In this paper, we propose a method to specify such a similarity measure using attribute hierarchies. An attribute hierarchy is organized as a tree by the human expert in the beginning of the validation process.1 The leaves of the tree consist of all the attributes of the data set to which rule discovery methods were applied, i.e., all the attributes that can potentially be present in the discovered rules. The non-leaf nodes in the tree are specified by the human expert and are obtained by combining several lower-level nodes into one parent node. For instance, figure 3 presents an example of such a hierarchy, where nodes A1 and A2 are combined into node A6 and nodes A3, A4 and A5 into node A7, and then nodes A6 Figure 3. An example of an attribute hierarchy for similarity-based grouping
RULE-BASED USER MODELS and A7 are combined into node A8. Another example of an attribute hierarchy is presented in figure 9. We call non-leaf nodes of an attribute hierarchy aggregated attributes The attribute hierarchy is used for determining similar rules and grouping them together More specifically, the semantics of the similarity-based grouping operator is defined as follows 1. Specifying rule aggregation level. Rules are grouped by specifying the level of rule aggregation in the attribute hierarchy which is provided by the human expert. Such a specification is called a cut, and it forms a subset of all the nodes of the tree(leaf and non-leaf), such that for every path from a leaf node to the root, exactly one node on such path belongs to this subset. Therefore, given a cut, every leaf node has its corresponding cut node. Given a cut C, we define for any leaf node Xi its corresponding cut node cutc(i) as follows cutc(Xi= cutc(parent(Xi)), otherwise Figure 3 presents several different cuts of an attribute hierarchy that are represented by shaded regions. For example, for the cut from figure 3(c), cut3c(A2)= A2 and cut3c(A3)= A7. Moreover, the cut node of any leaf node can be calculated in constant time by implementing a straightforward lookup table for that cut 2. Aggregating rules. Given a cut C, a rule x1A…∧k→Xk+1A…^ X, is aggregated by performing the following syntactic transformation cltC(1A…∧X→X+1A…A cutc(1)A…Actc(Xk)→culc(Xk+1)A… A cuIc(Y where cutc(Xi) maps each leaf node of the attribute hierarchy into its corresponding cut node as described in Step l above. The resulting rule is called an aggregated rule Since several different leaf nodes can have the same cut node, sometimes after aggre gating a rule we can get multiple instances of the same aggregated attribute in the body or in the head of the rule. In this case we simply eliminate those extra instances of an attribute. Consider, for example, the rule A2AA3A A4= A5. By applying cut(c)from fgue3 to this rule, we will get the aggregated rule a2∧47∧7→A7, and by re- moving duplicate terms A7 in the body of the rule we finally get A2 A A7= A7.2 Given a cut, the computational complexity of a single rule aggregation is linearly proportional to the size of the rule (i.e, total number of attributes in the rule), as will be 3. Grouping rules. Given a cut C, we can group a set of rules S into groups by applying C to every rule in S as described in Step 2 above. When a cut is applied to a set of rules different rules can be mapped into the same aggregated rule. For example, consider rules a2∧A3∧A4→A5andA2∧A5→A3. After applying cut(c) from figure3to both of them, they are mapped into the same rule A2 A A7= A7. More generally, we can group a set of rules based on the cut C as follows. Two rules R, and R2 belong to
RULE-BASED USER MODELS 41 and A7 are combined into node A8. Another example of an attribute hierarchy is presented in figure 9. We call non-leaf nodes of an attribute hierarchy aggregated attributes. The attribute hierarchy is used for determining similar rules and grouping them together. More specifically, the semantics of the similarity-based grouping operator is defined as follows. 1. Specifying rule aggregation level. Rules are grouped by specifying the level of rule aggregation in the attribute hierarchy which is provided by the human expert. Such a specification is called a cut, and it forms a subset of all the nodes of the tree (leaf and non-leaf), such that for every path from a leaf node to the root, exactly one node on such path belongs to this subset. Therefore, given a cut, every leaf node has its corresponding cut node. Given a cut C, we define for any leaf node Xi its corresponding cut node cutC(Xi) as follows: cutC(Xi) = ( Xi, if Xi ∈ C cutC(parent(Xi)), otherwise Figure 3 presents several different cuts of an attribute hierarchy that are represented by shaded regions. For example, for the cut from figure 3(c), cut3c(A2) = A2 and cut3c(A3) = A7. Moreover, the cut node of any leaf node can be calculated in constant time by implementing a straightforward lookup table for that cut. 2. Aggregating rules. Given a cut C, a rule X1 ∧···∧ Xk ⇒ Xk+1 ∧···∧ Xl is aggregated by performing the following syntactic transformation: cutC(X1 ∧···∧ Xk ⇒ Xk + 1 ∧···∧ Xl) = cutC(X1) ∧···∧ cutC(Xk) ⇒ cutC(Xk + 1) ∧···∧ cutC(Xl) where cutC(Xi) maps each leaf node of the attribute hierarchy into its corresponding cut node as described in Step 1 above. The resulting rule is called an aggregated rule. Since several different leaf nodes can have the same cut node, sometimes after aggregating a rule we can get multiple instances of the same aggregated attribute in the body or in the head of the rule. In this case we simply eliminate those extra instances of an attribute. Consider, for example, the rule A2∧ A3∧ A4 ⇒ A5. By applying cut (c) from figure 3 to this rule, we will get the aggregated rule A2 ∧ A7 ∧ A7 ⇒ A7, and by removing duplicate terms A7 in the body of the rule we finally get A2 ∧ A7 ⇒ A7.2 Given a cut, the computational complexity of a single rule aggregation is linearly proportional to the size of the rule (i.e., total number of attributes in the rule), as will be described later. 3. Grouping rules. Given a cut C, we can group a set of rules S into groups by applying C to every rule in S as described in Step 2 above. When a cut is applied to a set of rules, different rules can be mapped into the same aggregated rule. For example, consider rules A2 ∧ A3 ∧ A4 ⇒ A5 and A2 ∧ A5 ⇒ A3. After applying cut (c) from figure 3 to both of them, they are mapped into the same rule A2 ∧ A7 ⇒ A7. More generally, we can group a set of rules based on the cut C as follows. Two rules R1 and R2 belong to