Chapter Two: Computational Intelligence Environment

[Figure 2.3 shows a database supplying a dataset of input vectors to an adaptive system; an unsupervised adaptation algorithm adjusts the system's parameters.]

Figure 2.3 Unsupervised adaptation example. An arrow going through the adaptive system box indicates the ability to adjust the parameters of the system.

Examples of unsupervised adaptation are two types of neural network we discuss in this book: self-organizing feature maps and learning vector quantization neural networks, which we examine in Chapter 6, Neural Network Implementations. When a set of patterns is presented to either of these types of network, the adaptation algorithm clusters patterns that are similar, perhaps subject to some constraints. With the proper algorithm and constraints, the output distribution will accurately represent the probability distribution of the input patterns, but there is no hint of a "teacher" telling the network what the answer is pattern by pattern, or even a "critic" giving the network qualitative fitness hints.

Summary

In summary, what are the differences, and the implications of these differences, among the three types of adaptation? Our thoughts on this comprise a thread that runs through the book. For now, we confine our comments to a few relatively straightforward observations.

What does it mean to use a "teacher," a "critic," or a "dataset"? A teacher has detailed input/output information, which consists of a number of specific examples. Typically, the more of these examples that are available, the better a system will be able to adapt to emulate the structure underlying them. This is not always true, of course. For instance, it is impossible to build a multiclass classifier if all of your
examples are from one class. (A multiclass classifier specifies which of several output classes represents an input pattern best. For example, a medical diagnostic classifier decides which disease in its inventory best represents a given set of medical symptoms comprising an input pattern.) So the distribution of the input/output patterns over the problem space is important.

A critic has some notion that one solution is qualitatively better than another, but can't calculate a fitness metric specific to the problem. Furthermore, a critic doesn't inherently know where an optimum is, or even if there is one; a teacher may know the optimum location of a solution in the problem space.

The dataset is just that: a dataset. There is no fitness information, qualitative or quantitative, within it.

Does that make one kind of adaptation, say supervised, better than another, say unsupervised? We believe that one kind can be better than another only when considered from the perspective of a specific application. If all we have is a dataset with no fitness information, then we will use unsupervised adaptation to find features, or clusters, in the data. We can then apply other analytic techniques to these clusters or features. Even if we have output information with our input vectors, we may use unsupervised adaptation to find new ways to look at the data or as a sort of preprocessing step to reduce the problem's dimensionality to facilitate a supervised adaptation application.

Now that we've looked at the three main types of adaptation, we look at the spaces in which these adaptation methods operate.

Three Spaces of Adaptation

No matter which type of adaptation is implemented, we typically refer to three kinds of space when we work with adaptive systems. We call them input parameter space, system output space, and fitness space.
As there is no standard terminology, however, other authors call our input parameter space problem space, and our system output space function space.

The input parameter space is defined by the dynamic ranges of the input variables. In general, these dynamic ranges are specified. However, sometimes all we have to work with are example patterns, and we may not have a valid basis for constraining the input parameters to the ranges represented by the example vectors.

The system output space is defined by the dynamic range(s) of the output variable(s). It is not unusual for the output dynamic ranges to be specified as either a hard or a soft constraint. (A hard constraint is one that cannot be violated; a soft constraint can be violated, but a penalty is applied to the system performance measure.) We prefer to name this space system "output" rather than "function" since it is common not to know what function, if any, is represented by the data. Often, we aren't interested in finding the function, at least not as our first objective.
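The hard/soft constraint distinction can be sketched in code. The following is a hypothetical illustration, not from the book: the function names, the output range [0, 10], and the penalty weight are our own choices. A hard constraint simply forbids out-of-range outputs, while a soft constraint allows them but charges a penalty against the performance measure.

```python
def apply_hard_constraint(output, lo=0.0, hi=10.0):
    """Hard constraint: outputs outside [lo, hi] are not allowed,
    so clip the value back into the feasible range."""
    return max(lo, min(hi, output))

def penalized_performance(output, performance, lo=0.0, hi=10.0, weight=5.0):
    """Soft constraint: out-of-range outputs are permitted, but the
    amount of violation (scaled by a penalty weight) is subtracted
    from the system performance measure."""
    violation = max(0.0, lo - output) + max(0.0, output - hi)
    return performance - weight * violation

print(apply_hard_constraint(12.3))        # clipped to 10.0
print(penalized_performance(12.3, 8.0))   # roughly -3.5: 8.0 minus penalty 5.0 * 2.3
```

Which form to use is a modeling decision: clipping keeps every candidate feasible, while the penalty lets an adaptive algorithm explore infeasible solutions at a cost.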
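To make the three spaces concrete, here is a small Python sketch of the two examples this section uses: maximizing sin(πx/256) for integer x in [0, 255], where output space and fitness space coincide, and minimizing the sum of three squared variables, each in [-10, 10], where the output in [0, 300] must be transformed into a fitness value. The function names are ours, and the transform 1/(1 + abs(output)) is one simple choice; the added 1 keeps the fitness finite, and equal to 1.0, at a perfect answer.

```python
import math

def fitness_sin(x):
    """Example 1: output space and fitness space coincide.
    Integer input x in [0, 255]; output sin(pi*x/256) lies in [0, 1]."""
    return math.sin(math.pi * x / 256)

def fitness_sum_squares(xs):
    """Example 2: minimize x1^2 + x2^2 + x3^2 with each xi in [-10, 10].
    The raw system output lies in [0, 300]; transform it to a fitness
    in (0, 1], where 1.0 means a perfect answer (output of 0)."""
    output = sum(x * x for x in xs)
    return 1.0 / (1.0 + abs(output))

# Maximum fitness of example 1 occurs at x = 128: sin(pi/2) = 1.
print(fitness_sin(128))                   # 1.0
# Worst case of example 2: every xi at a boundary of its range.
print(fitness_sum_squares([10, 10, 10]))  # 1/301, fairly close to 0
# Perfect answer: every xi equal to 0.
print(fitness_sum_squares([0, 0, 0]))     # 1.0
```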
The fitness space is the space we use to define the "goodness" of the solutions (in the output space) generated by the adaptive system. It is common practice to scale the fitness to values between 0 and 1, with the optimal value being 0 or 1 depending on whether the goal is to minimize or maximize the fitness value.

Sometimes the fitness space and the system output space are the same. A simple example of this is maximizing the function sin(πx/256) for integer values of x between 0 and 255 (the input parameter space). This is the example we use in Chapter 3 to illustrate the step-by-step process of a genetic algorithm. In this case, the output values vary between 0 and 1, and the maximum fitness value of 1 occurs at an input value of 128.

In general, however, the system output and fitness values do not coincide. Consider another simple example of minimizing the sum of x_i^2 for i = 1 to 3, given a dynamic range for each x_i of [-10, 10]. In this case, the system output space is [0, 300]. We often transform the output space to a better representation for the purposes of calculating fitness, frequently in the range of [0, 1]. One possible simple fitness function is 1/(1 + abs(output)), which ranges from 1/301 (fairly close to 0) for the worst output to 1.0 for a perfect answer.

Always keep these three spaces of adaptation in mind. And always know which one you are dealing with!

Now that you have some understanding of the concept of adaptation, with its three main types and three spaces, we'll discuss another concept central to computational intelligence: self-organization.

Self-organization and Evolution

Although self-organization's inclusion as a key concept in computational intelligence is, for the authors, relatively recent, the term self-organization was apparently used for the first time in the literature relevant to computational intelligence by W. Ross Ashby (Ashby 1945, 1947).
He first used the term "self-organization" in his 1947 paper, but he was writing about the same concept in 1945. He cited the nervous system as an example of self-organization. He wrote that the nervous system, when in contact with a new environment, tends to develop an internal organization that leads to behavior that is adapted to that environment. (Note the reference to adaptation!)

Ashby maintained that self-organization has two methods of implementation (Dyson 1997). The first is illustrated by a system that starts with its parts separate (so that the behavior of each is independent of the others' states) and whose parts then act so that they change in order to form connections. An example of the second is a system whose interconnected components become organized in a productive or meaningful way. An example is an infant's brain, where self-organization is
achieved less by the growth of new connections and more by allowing meaningless connections to die out.

Farley was an early contributor to the investigation of self-organizing systems. In Farley and Clark (1954), the subject is the simulation of self-organizing systems by digital computer. In Farley (1960), he said that self-organizing systems "automatically organize themselves to classify environmental inputs into recognizable percepts or 'patterns,'" and that "this self-organizing ability is called 'learned perception.'" Kleyn (1963), another early contributor, wrote: "A system is said to be self-organizing if, after observing the input and output of an unknown phenomenon (transfer relation), the system organizes itself into a simulation of the unknown phenomenon."

Today, there are almost as many ways to define self-organization as there are writers on the subject, but summaries of attributes and descriptions of self-organization often include the following points (Kennedy, Eberhart, and Shi 2001):

■ Self-organizing systems usually exhibit what appears to be spontaneous order.
■ Self-organization can be viewed as a system's incessant attempts to organize itself into ever more complex structures, even in the face of the incessant forces of dissolution described by the second law of thermodynamics.
■ The overall system state of a self-organizing system is an emergent property of the system.
■ Interconnected system components become organized in a productive or meaningful way based on local information; global dynamics emerge from local rules.
■ Complex systems can self-organize.
■ The self-organization process works near the "edge of chaos."

Bonabeau et al. (1999) define self-organization as "a set of dynamical mechanisms whereby structures appear at the global level of a system from interactions among its lower-level components.
The rules specifying the interactions among the system's constituent units are executed on the basis of purely local information, without reference to the global pattern, which is an emergent property of the system rather than a property imposed on the system by an external ordering influence." This definition illustrates the close ties between self-organization and the emergent property of a system.

Examples of self-organization are all around us. A simple example is the formation of ice crystals on the surface of water as it begins to freeze. Another simple example happens in a salt solution when the water is dried and crystals are observed forming. Yet another example is the often complex and beautiful patterns generated
by cellular automata (CAs), which are specified by very simple mathematical functions. These CAs are not programmed to produce these patterns; rather, the patterns are an emergent feature of the system.

As a more complex example, the evolution of the human brain has been described as a self-organizing process (McKee 2000). McKee uses the term autocatalysis to describe how the design of an organism's features at one point in time affects or even determines the kinds of designs it can change into later. Thus the evolution of the organism is determined not only by selection pressures but by the constraints and opportunities offered by the structures that have evolved so far (Kennedy, Eberhart, and Shi 2001).

The concept of self-organization has had a profound effect on how the authors view evolution, and the way evolution is viewed has had a profound effect on how we perceive computational intelligence. The following section reviews this new perspective of evolution and illustrates why we believe that evolutionary computation provides the foundation of computational intelligence.

Evolution beyond Darwin

What is usually described as the Darwinian view of evolution is perhaps better described as the neo-Darwinian view. For example, chromosomes weren't even known in Darwin's time, so the prevailing view is a sort of amalgam of Darwinian and Mendelian ideas. (In 1865 Gregor Johann Mendel, an Augustinian priest in the Brno Monastery in the Czech Republic, described to the Brno Natural Science Society the transfer of genetic material in pea plants. Unfortunately, the fundamental importance of Mendel's finding was not understood by the Society. Until about 1900 it was not recognized that Mendel had discovered the "law of heredity.")

The neo-Darwinian view of evolution reflects three main observations. First is that chromosome composition is determined by the parents (at least in animals and humans).
Second is that random mutation expands the search space of the species, providing the desirable attribute of diversity. Third is that fitter individuals have a higher probability of surviving to the next generation.

According to modern researchers, including Kauffman (1993, 1995), there are two fundamental shortcomings of the existing theory. The first is that the origin of life by "chance" or mutation is highly improbable in the time frame of earth's history. The second is that evolution of complex life forms solely through mutation is also highly improbable. A detailed discussion of these points is beyond the scope of this book, but Kauffman (1993, 1995) offers compelling arguments.

This leads to a new view of evolution, in which, due primarily to self-organization, complex systems can "appear" over a relatively short time frame compared with