An Organizational Coevolutionary Algorithm for Classification Licheng Jiao,Senior Member:IEEE Jing Liu'Weicai Zhong Institute of Intelligent Information Processing,Xidian University,Xi'an 710071,China Abstract-Taking inspiration from the interacting process among organizations in human societies,a new classification algorithm,organizational coevolutionary algorithm for classification (OCEC),is proposed with the intrinsic properties of classification in mind.The main difference between OCEC and the available classification approaches based on evolutionary algorithms (EAs)is its use of a bottom-up search mechanism.OCEC causes the evolution of sets of examples,and at the end of the evolutionary process, extracts rules from these sets.These sets of examples form organizations.Because organizations are different from the individuals in traditional EAs,three evolutionary operators and a selection mechanism are devised for realizing the evolutionary operations performed on organizations.This method can avoid generating meaningless rules during the evolutionary process.An evolutionary method is also devised for determining the significance of each attribute,on the basis of which,the fitness function for organizations is defined.In experiments,the effectiveness of OCEC is first evaluated by multiplexer problems.Then OCEC is compared with several well-known classification algorithms on 12 benchmarks from the UCI repository datasets and multiplexer problems.Moreover,OCEC is applied to a practical case,radar target recognition problems.All results show that OCEC achieves a higher predictive accuracy and a lower computational cost.Finally,the scalability of OCEC is studied on synthetic datasets.The number of training examples increases from 100 000 to 10 million,and the number of attributes increases from 9 to 400.The results show that OCEC obtains a good scalability. Index Terms-Data mining,classification,organization,evolutionary algorithms,coevolution. Corresponding author,e-mail:neouma@163.com,telephone:86-029-88202661 1
1 An Organizational Coevolutionary Algorithm for Classification Licheng Jiao, Senior Member, IEEE Jing Liu1 Weicai Zhong Institute of Intelligent Information Processing, Xidian University, Xi’an 710071, China AbstractTaking inspiration from the interacting process among organizations in human societies, a new classification algorithm, organizational coevolutionary algorithm for classification (OCEC), is proposed with the intrinsic properties of classification in mind. The main difference between OCEC and the available classification approaches based on evolutionary algorithms (EAs) is its use of a bottom-up search mechanism. OCEC causes the evolution of sets of examples, and at the end of the evolutionary process, extracts rules from these sets. These sets of examples form organizations. Because organizations are different from the individuals in traditional EAs, three evolutionary operators and a selection mechanism are devised for realizing the evolutionary operations performed on organizations. This method can avoid generating meaningless rules during the evolutionary process. An evolutionary method is also devised for determining the significance of each attribute, on the basis of which, the fitness function for organizations is defined. In experiments, the effectiveness of OCEC is first evaluated by multiplexer problems. Then OCEC is compared with several well-known classification algorithms on 12 benchmarks from the UCI repository datasets and multiplexer problems. Moreover, OCEC is applied to a practical case, radar target recognition problems. All results show that OCEC achieves a higher predictive accuracy and a lower computational cost. Finally, the scalability of OCEC is studied on synthetic datasets. The number of training examples increases from 100 000 to 10 million, and the number of attributes increases from 9 to 400. The results show that OCEC obtains a good scalability. Index TermsData mining, classification, organization, evolutionary algorithms, coevolution. 1 Corresponding author, e-mail: neouma@163.com, telephone: 86-029-88202661
I.INTRODUCTION Evolutionary algorithms (EAs)[1],based on an analogy to natural evolution,have recently gained increasing interest.They are suitable for solving complex or ill-defined problems and have been successfully applied to the fields of numerical optimization, combinatorial optimization,machine learning,neural networks,and many other engineering problems [2]-[7].Classification is one of the fundamental tasks of data mining [8]-[13]and can be described as follows [14].The input data,also called the training set,consist of examples,each example having several attributes.Additionally,each example is tagged with a special class name.The objective of classification is to analyze the input data and to develop an accurate description for each class using the attributes present in the data.The class descriptions are used to classify future test data for which the class names are unknown. Applications of classification include credit approval,medical diagnosis,store location,etc. This paper introduces a new evolutionary algorithm for classification. A.Related work Classification has been studied extensively and the application of EAs to this field was initiated in the 1980s [15],[16].Holland [15]and Smith [16]proposed two basic reference approaches,namely,the Michigan and Pittsburgh approaches,respectively.The Michigan approach maintains a population of individual rules which compete with each other for space and priority in the population.In contrast,the Pittsburgh approach maintains a population of variable-length rule sets which compete with each other with respect to performance on the domain task. Neither of the approaches is perfect.The Michigan approach,which converges more 2
2 I. INTRODUCTION Evolutionary algorithms (EAs) [1], based on an analogy to natural evolution, have recently gained increasing interest. They are suitable for solving complex or ill-defined problems and have been successfully applied to the fields of numerical optimization, combinatorial optimization, machine learning, neural networks, and many other engineering problems [2]-[7]. Classification is one of the fundamental tasks of data mining [8]-[13] and can be described as follows [14]. The input data, also called the training set, consist of examples, each example having several attributes. Additionally, each example is tagged with a special class name. The objective of classification is to analyze the input data and to develop an accurate description for each class using the attributes present in the data. The class descriptions are used to classify future test data for which the class names are unknown. Applications of classification include credit approval, medical diagnosis, store location, etc. This paper introduces a new evolutionary algorithm for classification. A. Related work Classification has been studied extensively and the application of EAs to this field was initiated in the 1980s [15], [16]. Holland [15] and Smith [16] proposed two basic reference approaches, namely, the Michigan and Pittsburgh approaches, respectively. The Michigan approach maintains a population of individual rules which compete with each other for space and priority in the population. In contrast, the Pittsburgh approach maintains a population of variable-length rule sets which compete with each other with respect to performance on the domain task. Neither of the approaches is perfect. The Michigan approach, which converges more
rapidly,fails to learn good solutions for complex problems,whereas the Pittsburgh approach solves more difficult problems at a relatively high computational cost.Present,it seems that both approaches have their problems and advantages.This has prompted many researchers to develop new approaches along each one or hybrid ones,by selecting good features from both approaches and by avoiding the difficulties. Choenni et al.in [17]-[20]proposed some algorithms based on the Michigan approach. They have made many improvements.They modified the individual encoding method to use nonbinary representation,and did not encode the consequents of rules into the individuals. Moreover,they used extended version of crossover and mutation operators suitable to their representations,did not allowing rules to be invoked as a result of the invocation of other rules,and defined fitness functions in terms of some measures of the classification performance. De Jong et al.in [21]proposed an algorithm based on the Pittsburgh approach,GABIL. GABIL can continually learn and refine classification rules from its interaction with the environment.By incorporating a genetic algorithm(GA)as the underlying adaptive search mechanism,GABIL is able to construct a concept learning system that has a simple,unified architecture with several important features. In order to alleviate the disadvantages of the Michigan and Pittsburgh approaches,some hybrid Michigan/Pittsburgh methodologies have been proposed,for example,COGIN [22], JoinGA [23],REGAL [24],and G-Net [25]. COGIN [22]is an inductive system based on GAs that exploits the conventions of induction from examples.Its novelty lies in the use of training set coverage to simultaneously
3 rapidly, fails to learn good solutions for complex problems, whereas the Pittsburgh approach solves more difficult problems at a relatively high computational cost. Present, it seems that both approaches have their problems and advantages. This has prompted many researchers to develop new approaches along each one or hybrid ones, by selecting good features from both approaches and by avoiding the difficulties. Choenni et al. in [17]-[20] proposed some algorithms based on the Michigan approach. They have made many improvements. They modified the individual encoding method to use nonbinary representation, and did not encode the consequents of rules into the individuals. Moreover, they used extended version of crossover and mutation operators suitable to their representations, did not allowing rules to be invoked as a result of the invocation of other rules, and defined fitness functions in terms of some measures of the classification performance. De Jong et al. in [21] proposed an algorithm based on the Pittsburgh approach, GABIL. GABIL can continually learn and refine classification rules from its interaction with the environment. By incorporating a genetic algorithm (GA) as the underlying adaptive search mechanism, GABIL is able to construct a concept learning system that has a simple, unified architecture with several important features. In order to alleviate the disadvantages of the Michigan and Pittsburgh approaches, some hybrid Michigan/Pittsburgh methodologies have been proposed, for example, COGIN [22], JoinGA [23], REGAL [24], and G-Net [25]. COGIN [22] is an inductive system based on GAs that exploits the conventions of induction from examples. Its novelty lies in the use of training set coverage to simultaneously
promote competition in various classification niches within the model and constrain overall model complexity. JoinGA [23]is a combination of the Michigan and Pittsburgh approaches together with a symbiotic niching component that can be used in multimodal classification.On the level of fixed-length individuals,JoinGA uses normal genetic operators.In order to be able to deal with multimodal concepts,JoinGA uses a new level of operators above the basic genetic level. The operators on this level put together and divide groups of individuals,building families out of single,cooperating individuals.In this way JoinGA keeps the Michigan type effective fixed-length individuals and corresponding simple crossover operations.JoinGA also avoids the problems of multiple solutions by using the Pittsburgh type families,so that the system may converge to a single highly fit family. REGAL [24],a distributed GA-based system,designed for learning first order logic concept descriptions from examples.The population constitutes a redundant set of partial concept descriptions,and each evolves separately.G-Net [25]is a descendant of REGAL,and consistently achieves a better performance.The main features of the system include robustness with respect to parameter settings,use of the minimum description length criterion coupled with a stochastic search bias,use of coevolution as a high-level control strategy,the ability to face problems requiring structured representation languages,and the suitability to parallel implementation on a network of workstations. Recently,many new approaches based on EAs for the classification task have been proposed.XCS [26],[27]acts as a reinforcement learning agent.It differs from traditional approaches in several respects [27].First,XCS has a simplified structure since it does not
4 promote competition in various classification niches within the model and constrain overall model complexity. JoinGA [23] is a combination of the Michigan and Pittsburgh approaches together with a symbiotic niching component that can be used in multimodal classification. On the level of fixed-length individuals, JoinGA uses normal genetic operators. In order to be able to deal with multimodal concepts, JoinGA uses a new level of operators above the basic genetic level. The operators on this level put together and divide groups of individuals, building families out of single, cooperating individuals. In this way JoinGA keeps the Michigan type effective fixed-length individuals and corresponding simple crossover operations. JoinGA also avoids the problems of multiple solutions by using the Pittsburgh type families, so that the system may converge to a single highly fit family. REGAL [24], a distributed GA-based system, designed for learning first order logic concept descriptions from examples. The population constitutes a redundant set of partial concept descriptions, and each evolves separately. G-Net [25] is a descendant of REGAL, and consistently achieves a better performance. The main features of the system include robustness with respect to parameter settings, use of the minimum description length criterion coupled with a stochastic search bias, use of coevolution as a high-level control strategy, the ability to face problems requiring structured representation languages, and the suitability to parallel implementation on a network of workstations. Recently, many new approaches based on EAs for the classification task have been proposed. XCS [26], [27] acts as a reinforcement learning agent. It differs from traditional approaches in several respects [27]. First, XCS has a simplified structure since it does not
have an internal message list.In addition,XCS uses a modification of Q-learning instead of the "bucket brigade"[15].Most importantly,in XCS,classifier fitness is based on the accuracy of the classifier's payoff prediction instead of the prediction itself.XCS represents a major development in learning classifier systems research,and has proved effective in many domains [28]. GEP [10]is a new approach for discovering classification rules by using genetic programming with linear representation.The antecedent of discovered rules may involve many different combinations of attributes.To guide the search process,[10]suggested a fitness function considering both the rule consistency gain and completeness.A multiclass classification problem was formulated as a combination of multiple two-class problems by using the one against all learning method.Compact rule sets were subsequently evolved using a two-phase pruning method based on the minimum description length principle. DMEL [11]handles classification problems of which the accuracy of each prediction needs to be estimated.DMEL searches through the possible rule space using an evolutionary approach that has the following characteristics:1)the evolutionary process begins with the generation of an initial set of simple,one-condition rules;2)interestingness measure is used for identifying interesting rules;3)fitness of a chromosome is defined in terms of the probability that the attribute values of a record can be correctly determined using the rules it encodes;and 4)the likelihood of predictions made is estimated so that subscribers can be ranked according to their likelihood to churn. In economics,Coase in [29]explains the sizing and formation of organizations from the framework of transaction costs.This concept was introduced to the GA-based classifiers by 5
5 have an internal message list. In addition, XCS uses a modification of Q-learning instead of the “bucket brigade” [15]. Most importantly, in XCS, classifier fitness is based on the accuracy of the classifier’s payoff prediction instead of the prediction itself. XCS represents a major development in learning classifier systems research, and has proved effective in many domains [28]. GEP [10] is a new approach for discovering classification rules by using genetic programming with linear representation. The antecedent of discovered rules may involve many different combinations of attributes. To guide the search process, [10] suggested a fitness function considering both the rule consistency gain and completeness. A multiclass classification problem was formulated as a combination of multiple two-class problems by using the one against all learning method. Compact rule sets were subsequently evolved using a two-phase pruning method based on the minimum description length principle. DMEL [11] handles classification problems of which the accuracy of each prediction needs to be estimated. DMEL searches through the possible rule space using an evolutionary approach that has the following characteristics: 1) the evolutionary process begins with the generation of an initial set of simple, one-condition rules; 2) interestingness measure is used for identifying interesting rules; 3) fitness of a chromosome is defined in terms of the probability that the attribute values of a record can be correctly determined using the rules it encodes; and 4) the likelihood of predictions made is estimated so that subscribers can be ranked according to their likelihood to churn. In economics, Coase in [29] explains the sizing and formation of organizations from the framework of transaction costs. This concept was introduced to the GA-based classifiers by