3. The function used to measure the performance of the system (which we call the fitness function) is very complex and nonlinear, having many local optima and/or discontinuities.

4. The fitness function landscape of global and local optima varies with time and over the problem space.

5. A complex and changing environment in which the system exists.

We are making certain assumptions when we say that a system is adaptive. First, we assume that the system is converging to a sufficiently good solution. Second, we assume that adaptive processes drastically shorten the time required to arrive at a solution when compared with enumerative methods that must explore significant portions of the problem space (Kennedy, Eberhart, and Shi 2001).

We believe that most engineering and computer science applications are driven by what we call the law of sufficiency: If a solution is good enough, fast enough, and cheap enough, it is sufficient. (Being good enough simply means it meets specifications.) We believe that for most "optimization" applications, it is more appropriate to use the term "adaptation" because we generally do not actually find the optimum solution and often do not even know where it is.

In the remainder of this section, we look at adaptation from three perspectives. First, we examine and compare the concepts of adaptation and learning. Next, we review the three main types of adaptation paradigm: supervised adaptation, unsupervised adaptation, and reinforcement adaptation. Finally, we consider the three spaces with which we must deal when working with adaptive systems: problem space, function space, and fitness space.

Adaptation versus Learning

The preceding definitions of adaptation describe and apply to computational intelligence systems extremely well. Too often, the process of altering structures such as neural networks, evolutionary computation tools, and fuzzy systems is described as learning. The word learning, in fact, appears throughout this book. This usage is in accordance with that of many researchers.

Learning, however, is defined as "knowledge or skill acquired by instruction or study," and the synonym listed for learning is knowledge. Likewise, to learn is defined as "to gain knowledge or understanding of or skill in by study, instruction, or experience" (Mish 2001).

Instead, learning is what an entire intelligent system does. All of the main components of an intelligent system participate in the learning process; all exchange information with the component of the system that is the repository of the system's knowledge. Learning thus applies to the entire intelligent system, while adaptation mainly applies to the portion of the system we address in this book: the portion where computational intelligence exists.
Adaptation must overcome numerous barriers, including local optima and nonlinearities. The problem hyperspace landscape (topography, environment) is constantly changing. The adaptive systems with which we are dealing are complex, and the fitness or performance measure is often complicated and varying over time. Adaptive systems answer this challenge by progressively modifying population structures, using a set of operators that themselves evolve (adapt) over time. These adaptive processes drastically shorten the time required to arrive at a solution when compared with enumerative methods that must explore significant portions of the problem space.

As you continue through this chapter, you will see that we assert that adaptation is arguably the most appropriate term for what computational intelligence systems do. In fact, it is not too much of a stretch to say that computational intelligence and adaptation (with self-organization) are synonymous. Adaptation, thus, is the leitmotif of this book.

Three Types of Adaptation

There are various ways to categorize adaptation.[1] Each of the following sections discusses one of three categories pertinent to computational intelligence: supervised adaptation, reinforcement adaptation, and unsupervised adaptation.[2]

[1] In many textbooks, the title of this section would be "Three Types of Learning." Based, however, on the reasoning earlier in this section, we generally use the term adaptation in this book to describe what computational intelligence systems do. We realize that this is somewhat unconventional, but we believe that the reasoning is sound, and that "adaptation," more accurately than "learning," describes what is going on in a computationally intelligent system.

[2] Other authors might call these supervised learning, reinforcement learning, and unsupervised learning.

Note that in all three cases we separate the adaptation algorithm from the adaptive system. Usually, the algorithm is used to adapt (tune) the system and is then removed. The adaptive system (with its parameters frozen) then responds to input vectors from the environment. This is traditionally called offline adaptation. Sometimes the adaptation algorithm, or a portion of it, remains active as the system is used. This is traditionally called online adaptation. Unlike offline adaptation, there are various degrees of online adaptation.

Supervised Adaptation

Compared to the other two categories of adaptation, supervised adaptation is well defined. A "teacher" that provides relevant input/output (I/O) examples is always present. In addition, it has a number of characteristics, including:

■ Adaptation is often carried out one step (iteration) at a time. The system adapts so that it emulates the training I/O examples while acquiring the ability to generalize.
■ The system's performance metric is often inversely proportional to some function of the sum of errors over the I/O examples. Examples include sum-squared error, mean-squared error, and sum of absolute error. The supervised adaptation algorithm often uses information about the gradient of the error, on an error surface that is averaged over all I/O examples, to adapt the current point.

An example of supervised adaptation appears in Figure 2.1. In Figures 2.1 through 2.3, an arrow going through the adaptive system box indicates the ability to adjust the parameters of the system. Supervised adaptation often results in an adaptive system that is used for what is, or amounts to, function approximation. The system is good at mapping input vectors to output vectors over its domain.

[Figure 2.1 Supervised adaptation example. An arrow going through the adaptive system box indicates the ability to adjust the parameters of the system. The block diagram shows the environment and a "teacher" (a dataset with I/O examples) supplying an input (state) vector and desired outputs (responses); the error values between the desired and actual system outputs drive the supervised adaptation algorithm, which adjusts the adaptive system.]

One example of supervised adaptation that we examine in this book is a neural network adapted by the back-propagation algorithm. Input patterns for which the output patterns are known are presented to the network. The difference between what was expected at each output and what was actually there (defined as the error) is calculated for each output and each pattern. Some function of the error at each output is then used to adjust system parameters. In the case of a neural network, the weights of the network are adjusted in an attempt to minimize the error.
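To make this error-driven weight adjustment concrete, here is a minimal sketch of supervised adaptation of a single linear unit by gradient descent on the mean-squared error. This is not the full back-propagation algorithm, only the same idea applied to the simplest possible "network"; the dataset, learning rate, and iteration count are illustrative assumptions, not values from the text.

```python
import numpy as np

# Minimal sketch: supervised adaptation of a single linear unit by gradient
# descent on mean-squared error (MSE). The "teacher" is the dataset of
# input/output examples; the error between desired and actual outputs is
# used to adjust the weights, one small step (iteration) at a time.

# Illustrative I/O examples (assumed): inputs with a bias term, desired outputs.
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])          # last column acts as a bias input
d = np.array([1.0, 1.0, 2.0, 0.0])       # desired (target) outputs

w = np.zeros(X.shape[1])                 # adjustable parameters (weights)
eta = 0.05                               # learning rate (assumed)

for step in range(500):                  # adaptation one iteration at a time
    y = X @ w                            # actual system outputs
    error = d - y                        # error values supplied by the "teacher"
    mse = np.mean(error ** 2)            # performance metric (MSE)
    grad = -2.0 * X.T @ error / len(d)   # gradient of MSE w.r.t. the weights
    w -= eta * grad                      # adjust weights to reduce the error

print("final weights:", w, "final MSE:", mse)
```

The back-propagation algorithm described above applies the same idea to multilayer networks by propagating the error gradient backward through the layers.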
Reinforcement Adaptation

Reinforcement adaptation of a system is achieved through its interaction with a "critic" that provides heuristic reinforcement information. An illustration of reinforcement adaptation appears in Figure 2.2. The input variable information often includes the dynamic range of each variable and perhaps other variable information such as the precision required. Some sort of goal or fitness metric is also necessary. For example, in a multiple-city delivery-scheduling problem (e.g., the traveling salesman problem), the goal may be to minimize the total distance traveled to visit all of the cities. The critic provides some fitness measure based on the goal, for example, a scaled number inversely proportional to the total distance traveled. So, although some kind of goal or fitness metric is required, the fitness cannot be obtained directly; the critic offers only a suggestion of how good a solution is relative to other solutions. (A direct fitness metric is possible only with supervised adaptation.)

[Figure 2.2 Reinforcement adaptation example. An arrow going through the adaptive system box indicates the ability to adjust the parameters of the system. The block diagram shows the environment's "critic" supplying input variable and fitness information along with the input (state) vector; heuristic reinforcement information about the system output drives the reinforcement adaptation algorithm, which adjusts the adaptive system.]

Of the three types of adaptation, reinforcement adaptation is most closely related to biological systems. One very simple illustration is that animals (including humans) tend to avoid behavior that causes them discomfort and tend to seek or repeat behavior that brings them comfort. Reinforcement adaptation has roots in the optimal control theory area called dynamic programming (Bellman 1957). Sequential decision making obtains much of its mathematical foundation from dynamic programming.
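To make the critic's role concrete for the traveling salesman example above, here is a minimal sketch of such a critic: it scores a candidate tour with a number inversely proportional to the total distance traveled, but it never supplies the correct tour itself. The city coordinates and the scaling constant are illustrative assumptions, not values from the text.

```python
import math
import random

# Minimal sketch of a "critic" for the multiple-city delivery (traveling
# salesman) example: the goal is to minimize total tour distance, and the
# critic returns only a scaled fitness number inversely proportional to that
# distance. It never says what the correct tour is, only how good a candidate
# tour is relative to others.

cities = [(0, 0), (3, 4), (6, 1), (7, 7), (2, 8)]   # assumed (x, y) positions

def tour_length(order):
    """Total distance of a closed tour visiting the cities in the given order."""
    total = 0.0
    for i in range(len(order)):
        x1, y1 = cities[order[i]]
        x2, y2 = cities[order[(i + 1) % len(order)]]   # return to the start
        total += math.hypot(x2 - x1, y2 - y1)
    return total

def critic_fitness(order, scale=100.0):
    """Heuristic reinforcement: larger is better, inversely proportional to distance."""
    return scale / tour_length(order)

# Two candidate tours; the critic can only say which one is better.
tour_a = [0, 1, 2, 3, 4]
tour_b = random.sample(range(len(cities)), len(cities))
print("fitness A:", critic_fitness(tour_a), "fitness B:", critic_fitness(tour_b))
```

An adaptation algorithm would use these scores only to compare candidate solutions against one another, not to compute an exact error for each input.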
Characteristics of reinforcement adaptation often (but not always) include:

■ The system often deals with a time series of input (state) vectors, waiting until the sequence is complete to judge the fitness of the system.

■ The critic looks only at the outcomes (the results), not at some error measure due to each input.

An example of a paradigm using reinforcement adaptation is particle swarm optimization, which is introduced in Chapter 3. A particle swarm explores the problem space, keeping track of the fitness of its particles and also remembering where in the problem space the best solutions have so far been found. We probably do not know where the optimal solution is. We may not even know whether a single optimal solution exists (there may be multiple optima). There may be a number of constraints, making the problem very complex. All we can tell the system is whether one solution is better than another; sometimes, as in the case of particle swarm optimization, we can calculate how much better it is. But that's about the extent of it.

Unsupervised Adaptation

In the case of unsupervised adaptation, no external teacher or critic is involved in system adaptation. Instead, a dataset comprising example vectors of the system's variable parameters is provided, and that dataset is operated on by the unsupervised adaptation algorithm. A representation of unsupervised adaptation appears in Figure 2.3. Characteristics of unsupervised adaptation algorithms include:

■ There is no indication of fitness whatsoever incorporated into the unsupervised adaptation algorithm. It just plods along with blinders on, executing its job, which may involve clustering or "competitive learning."

■ The interpretation of what the unsupervised algorithm did, how well it did it, and whether it is even appropriate and/or usable is done after the algorithm stops running. This offline evaluation is typically done by a human or other intelligent system.

Clustering aggregates similar input patterns into distinct, mutually exclusive subsets referred to as clusters. As stated by Anderberg (1973), "the objective is to group either the data units or the variables into clusters such that elements within a cluster have a high degree of 'natural association' among themselves while the clusters are 'relatively distinct' from one another." Clustering is generally considered a two-phase process. In the first phase, the number of clusters in the data is determined or assumed. The second phase assigns each data point (pattern) to a single cluster.
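As an illustration of the two-phase clustering process just described, here is a minimal sketch of a k-means-style procedure, one common clustering approach; the number of clusters, the sample patterns, and the iteration count are illustrative assumptions rather than anything prescribed by the text.

```python
import random

# Minimal two-phase clustering sketch (k-means style).
# Phase 1: the number of clusters k is assumed (here, k = 2).
# Phase 2: each data point (pattern) is assigned to exactly one cluster,
# and cluster centers are refined; no teacher or critic is involved.

data = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8),      # assumed example patterns
        (8.0, 8.0), (8.5, 7.5), (7.8, 8.2)]
k = 2                                            # assumed number of clusters

def distance2(a, b):
    """Squared Euclidean distance between two 2-D patterns."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

random.seed(0)
centers = random.sample(data, k)                 # initial cluster centers

for _ in range(10):                              # a few refinement passes
    clusters = [[] for _ in range(k)]
    for point in data:                           # assign each pattern to the
        idx = min(range(k),                      # nearest cluster center
                  key=lambda j: distance2(point, centers[j]))
        clusters[idx].append(point)
    for j, members in enumerate(clusters):       # recompute each center as the
        if members:                              # mean of its assigned patterns
            centers[j] = (sum(p[0] for p in members) / len(members),
                          sum(p[1] for p in members) / len(members))

print("cluster centers:", centers)
print("cluster sizes:", [len(c) for c in clusters])
```

Judging whether the resulting clusters are meaningful, the offline evaluation mentioned above, is left to a human or another intelligent system after the algorithm stops running.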