Theme Feature

Artificial Neural Networks: A Tutorial

Anil K. Jain, Michigan State University
Jianchang Mao and K.M. Mohiuddin, IBM Almaden Research Center

These massively parallel systems with large numbers of interconnected simple processors may solve a variety of challenging computational problems. This tutorial provides the background and the basics.

Numerous advances have been made in developing intelligent systems, some inspired by biological neural networks. Researchers from many scientific disciplines are designing artificial neural networks (ANNs) to solve a variety of problems in pattern recognition, prediction, optimization, associative memory, and control (see the "Challenging problems" sidebar).

Conventional approaches have been proposed for solving these problems. Although successful applications can be found in certain well-constrained environments, none is flexible enough to perform well outside its domain. ANNs provide exciting alternatives, and many applications could benefit from using them [3].

This article is for those readers with little or no knowledge of ANNs to help them understand the other articles in this issue of Computer. We discuss the motivations behind the development of ANNs, describe the basic biological neuron and the artificial computational model, outline network architectures and learning processes, and present some of the most commonly used ANN models. We conclude with character recognition, a successful ANN application.

WHY ARTIFICIAL NEURAL NETWORKS?
The long course of evolution has given the human brain many desirable characteristics not present in von Neumann or modern parallel computers. These include

• massive parallelism,
• distributed representation and computation,
• learning ability,
• generalization ability,
• adaptivity,
• inherent contextual information processing,
• fault tolerance, and
• low energy consumption.

It is hoped that devices based on biological neural networks will possess some of these desirable characteristics.

Modern digital computers outperform humans in the domain of numeric computation and related symbol manipulation. However, humans can effortlessly solve complex perceptual problems (like recognizing a man in a crowd from a mere glimpse of his face) at such a high speed and extent as to dwarf the world's fastest computer. Why is there such a remarkable difference in their performance? The biological neural system architecture is completely different from the von Neumann architecture (see Table 1). This difference significantly affects the type of functions each computational model can best perform.

Numerous efforts to develop "intelligent" programs based on von Neumann's centralized architecture have not resulted in general-purpose intelligent programs. Inspired by biological neural networks, ANNs are massively parallel computing systems consisting of an extremely large number of simple processors with many interconnections. ANN models attempt to use some "organizational" principles believed to be used in the human brain.
Challenging problems

Let us consider the following problems of interest to computer scientists and engineers.

Pattern classification
The task of pattern classification is to assign an input pattern (like a speech waveform or handwritten symbol) represented by a feature vector to one of many prespecified classes (see Figure A1). Well-known applications include character recognition, speech recognition, EEG waveform classification, blood cell classification, and printed circuit board inspection.

Clustering/categorization
In clustering, also known as unsupervised pattern classification, there are no training data with known class labels. A clustering algorithm explores the similarity between the patterns and places similar patterns in a cluster (see Figure A2). Well-known clustering applications include data mining, data compression, and exploratory data analysis.

Function approximation
Suppose a set of n labeled training patterns (input-output pairs), {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, have been generated from an unknown function μ(x) (subject to noise). The task of function approximation is to find an estimate, say μ̂, of the unknown function μ (Figure A3). Various engineering and scientific modeling problems require function approximation.

Prediction/forecasting
Given a set of n samples {y(t_1), y(t_2), ..., y(t_n)} in a time sequence t_1, t_2, ..., t_n, the task is to predict the sample y(t_{n+1}) at some future time t_{n+1}. Prediction/forecasting has a significant impact on decision-making in business, science, and engineering. Stock market prediction and weather forecasting are typical applications of prediction/forecasting techniques (see Figure A4).

Optimization
A wide variety of problems in mathematics, statistics, engineering, science, medicine, and economics can be posed as optimization problems. The goal of an optimization algorithm is to find a solution satisfying a set of constraints such that an objective function is maximized or minimized. The Traveling Salesman Problem (TSP), an NP-complete problem, is a classic example (see Figure A5).

Content-addressable memory
In the von Neumann model of computation, an entry in memory is accessed only through its address, which is independent of the content in the memory. Moreover, if a small error is made in calculating the address, a completely different item can be retrieved. Associative memory, or content-addressable memory, as the name implies, can be accessed by content. The content in the memory can be recalled even by a partial input or distorted content (see Figure A6). Associative memory is extremely desirable in building multimedia information databases.

Control
Consider a dynamic system defined by a tuple {u(t), y(t)}, where u(t) is the control input and y(t) is the resulting output of the system at time t. In model-reference adaptive control, the goal is to generate a control input u(t) such that the system follows a desired trajectory determined by the reference model. An example is engine idle-speed control (see Figure A7).

Figure A. Tasks that neural networks can perform: (1) pattern classification; (2) clustering/categorization; (3) function approximation; (4) prediction/forecasting; (5) optimization (a TSP example); (6) retrieval by content; and (7) control (engine idle speed). (Adapted from the DARPA Neural Network Study.)
Modeling a biological nervous system using ANNs can also increase our understanding of biological functions. State-of-the-art computer hardware technology (such as VLSI and optical) has made this modeling feasible.

Table 1. Von Neumann computer versus biological neural system.

                        Von Neumann computer          Biological neural system
Processor               Complex                       Simple
                        High speed                    Low speed
                        One or a few                  A large number
Memory                  Separate from a processor     Integrated into processor
                        Localized                     Distributed
                        Noncontent addressable        Content addressable
Computing               Centralized                   Distributed
                        Sequential                    Parallel
                        Stored programs               Self-learning
Reliability             Very vulnerable               Robust
Expertise               Numerical and symbolic        Perceptual problems
                        manipulations
Operating environment   Well-defined,                 Poorly defined,
                        well-constrained              unconstrained

A thorough study of ANNs requires knowledge of neurophysiology, cognitive science/psychology, physics (statistical mechanics), control theory, computer science, artificial intelligence, statistics/mathematics, pattern recognition, computer vision, parallel processing, and hardware (digital/analog/VLSI/optical). New developments in these disciplines continuously nourish the field. On the other hand, ANNs also provide an impetus to these disciplines in the form of new tools and representations. This symbiosis is necessary for the vitality of neural network research. Communications among these disciplines ought to be encouraged.

Brief historical review
ANN research has experienced three periods of extensive activity. The first peak in the 1940s was due to McCulloch and Pitts' pioneering work [4]. The second occurred in the 1960s with Rosenblatt's perceptron convergence theorem [5] and Minsky and Papert's work showing the limitations of a simple perceptron [6]. Minsky and Papert's results dampened the enthusiasm of most researchers, especially those in the computer science community. The resulting lull in neural network research lasted almost 20 years. Since the early 1980s, ANNs have received considerable renewed interest. The major developments behind this resurgence include Hopfield's energy approach [7] in 1982 and the back-propagation learning algorithm for multilayer perceptrons (multilayer feed-forward networks) first proposed by Werbos [8], reinvented several times, and then popularized by Rumelhart et al. [9] in 1986. Anderson and Rosenfeld [10] provide a detailed historical account of ANN developments.

Biological neural networks
A neuron (or nerve cell) is a special biological cell that processes information (see Figure 1). It is composed of a cell body, or soma, and two types of out-reaching tree-like branches: the axon and the dendrites. The cell body has a nucleus that contains information about hereditary traits and a plasma that holds the molecular equipment for producing material needed by the neuron. A neuron receives signals (impulses) from other neurons through its dendrites (receivers) and transmits signals generated by its cell body along the axon (transmitter), which eventually branches into strands and substrands. At the terminals of these strands are the synapses. A synapse is an elementary structure and functional unit between two neurons (an axon strand of one neuron and a dendrite of another). When the impulse reaches the synapse's terminal, certain chemicals called neurotransmitters are released. The neurotransmitters diffuse across the synaptic gap, to enhance or inhibit, depending on the type of the synapse, the receptor neuron's own tendency to emit electrical impulses. The synapse's effectiveness can be adjusted by the signals passing through it so that the synapses can learn from the activities in which they participate. This dependence on history acts as a memory, which is possibly responsible for human memory.

Figure 1. A sketch of a biological neuron.

The cerebral cortex in humans is a large flat sheet of neurons about 2 to 3 millimeters thick with a surface area of about 2,200 cm², about twice the area of a standard computer keyboard. The cerebral cortex contains about 10¹¹ neurons, which is approximately the number of stars in the Milky Way. Neurons are massively connected, much more complex and dense than telephone networks. Each neuron is connected to 10³ to 10⁴ other neurons. In total, the human brain contains approximately 10¹⁴ to 10¹⁵ interconnections.

Neurons communicate through a very short train of pulses, typically milliseconds in duration. The message is modulated on the pulse-transmission frequency. This frequency can vary from a few to several hundred hertz, which is a million times slower than the fastest switching speed in electronic circuits. However, complex perceptual decisions such as face recognition are typically made by humans within a few hundred milliseconds. These decisions are made by a network of neurons whose operational speed is only a few milliseconds. This implies that the computations cannot take more than about 100 serial stages.
In other words, the brain runs parallel programs that are about 100 steps long for such perceptual tasks. This is known as the hundred step rule [12]. The same timing considerations show that the amount of information sent from one neuron to another must be very small (a few bits). This implies that critical information is not transmitted directly, but captured and distributed in the interconnections; hence the name connectionist model, used to describe ANNs.

Interested readers can find more introductory and easily comprehensible material on biological neurons and neural networks in Brunak and Lautrup [11].

ANN OVERVIEW

Computational models of neurons
McCulloch and Pitts [4] proposed a binary threshold unit as a computational model for an artificial neuron (see Figure 2).

Figure 2. McCulloch-Pitts model of a neuron.

This mathematical neuron computes a weighted sum of its n input signals, x_j, j = 1, 2, ..., n, and generates an output of 1 if this sum is above a certain threshold u. Otherwise, an output of 0 results. Mathematically,

    y = θ( Σ_{j=1}^{n} w_j x_j - u ),

where θ(·) is a unit step function at 0, and w_j is the synapse weight associated with the jth input. For simplicity of notation, we often consider the threshold u as another weight w_0 = -u attached to the neuron with a constant input x_0 = 1. Positive weights correspond to excitatory synapses, while negative weights model inhibitory ones. McCulloch and Pitts proved that, in principle, suitably chosen weights let a synchronous arrangement of such neurons perform universal computations. There is a crude analogy here to a biological neuron: wires and interconnections model axons and dendrites, connection weights represent synapses, and the threshold function approximates the activity in a soma. The McCulloch and Pitts model, however, contains a number of simplifying assumptions that do not reflect the true behavior of biological neurons.

The McCulloch-Pitts neuron has been generalized in many ways. An obvious one is to use activation functions other than the threshold function, such as piecewise linear, sigmoid, or Gaussian, as shown in Figure 3. The sigmoid function is by far the most frequently used in ANNs. It is a strictly increasing function that exhibits smoothness and has the desired asymptotic properties. The standard sigmoid function is the logistic function, defined by

    g(x) = 1 / (1 + exp(-βx)),

where β is the slope parameter.

Figure 3. Different types of activation functions: (a) threshold, (b) piecewise linear, (c) sigmoid, and (d) Gaussian.
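The threshold unit and its generalizations are easy to state in code. The following Python sketch is illustrative only and not taken from the article; the weights, thresholds, and slope values are arbitrary choices made for the example. It implements a McCulloch-Pitts neuron and the four activation-function shapes of Figure 3.

```python
import math

def mcculloch_pitts(x, w, u):
    """Binary threshold unit: output 1 if the weighted sum of inputs exceeds threshold u."""
    s = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if s - u > 0 else 0

# Common generalizations of the threshold activation (compare Figure 3).
def threshold(v):
    return 1.0 if v >= 0.0 else 0.0

def piecewise_linear(v):
    # A unit-slope ramp clipped to the range [0, 1].
    return min(1.0, max(0.0, v + 0.5))

def logistic(v, beta=1.0):
    # Standard sigmoid g(x) = 1 / (1 + exp(-beta * x)).
    return 1.0 / (1.0 + math.exp(-beta * v))

def gaussian(v, sigma=1.0):
    return math.exp(-(v * v) / (2.0 * sigma * sigma))

# Example: with weights 1, 1 and threshold 1.5, the unit fires only when
# both inputs are active, that is, it computes a logical AND.
print(mcculloch_pitts([1, 1], w=[1.0, 1.0], u=1.5))   # -> 1
print(mcculloch_pitts([1, 0], w=[1.0, 1.0], u=1.5))   # -> 0
```

The AND example illustrates the remark about suitably chosen weights: a single threshold unit already computes simple logic functions, and synchronous arrangements of such units can be composed into more general computations.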
Network architectures
ANNs can be viewed as weighted directed graphs in which artificial neurons are nodes and directed edges (with weights) are connections between neuron outputs and neuron inputs.

Based on the connection pattern (architecture), ANNs can be grouped into two categories (see Figure 4):

• feed-forward networks, in which graphs have no loops, and
• recurrent (or feedback) networks, in which loops occur because of feedback connections.

In the most common family of feed-forward networks, called the multilayer perceptron, neurons are organized into layers that have unidirectional connections between them. Figure 4 also shows typical networks for each category.

Figure 4. A taxonomy of feed-forward and recurrent/feedback network architectures. Feed-forward networks include the single-layer perceptron, the multilayer perceptron, and radial basis function nets; recurrent/feedback networks include competitive networks, Kohonen's SOM, Hopfield networks, and ART models.

Different connectivities yield different network behaviors. Generally speaking, feed-forward networks are static; that is, they produce only one set of output values rather than a sequence of values from a given input. Feed-forward networks are memory-less in the sense that their response to an input is independent of the previous network state. Recurrent, or feedback, networks, on the other hand, are dynamic systems. When a new input pattern is presented, the neuron outputs are computed. Because of the feedback paths, the inputs to each neuron are then modified, which leads the network to enter a new state.
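The static-versus-dynamic distinction can be made concrete with a small sketch. The code below is illustrative only; the weights and the single-unit feedback loop are invented for the example and are not taken from the article. A feed-forward pass maps the current input directly to an output, while a recurrent unit carries a state that its feedback connection keeps modifying.

```python
import math

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

def feedforward(x, w_hidden, w_out):
    """One hidden layer, no loops: the output depends only on the current input."""
    hidden = [logistic(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return logistic(sum(w * h for w, h in zip(w_out, hidden)))

def recurrent_step(x, state, w_in, w_back):
    """Feedback connection: the new state depends on the input and the previous state."""
    return logistic(w_in * x + w_back * state)

# Static behavior: the same input always produces the same output.
w_hidden = [[0.5, -0.3], [0.8, 0.1]]
w_out = [1.0, -1.0]
print(feedforward([1.0, 0.5], w_hidden, w_out))
print(feedforward([1.0, 0.5], w_hidden, w_out))   # identical to the line above

# Dynamic behavior: presenting the same input repeatedly moves the unit
# through a sequence of different states.
state = 0.0
for _ in range(3):
    state = recurrent_step(1.0, state, w_in=0.7, w_back=0.9)
    print(state)
```

Presenting the same input repeatedly to the recurrent unit yields a changing sequence of outputs, which is exactly the sense in which feedback networks are dynamic systems rather than memory-less mappings.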
Different network architectures require appropriate learning algorithms. The next section provides an overview of learning processes.

Learning
The ability to learn is a fundamental trait of intelligence. Although a precise definition of learning is difficult to formulate, a learning process in the ANN context can be viewed as the problem of updating network architecture and connection weights so that a network can efficiently perform a specific task. The network usually must learn the connection weights from available training patterns. Performance is improved over time by iteratively updating the weights in the network. ANNs' ability to automatically learn from examples makes them attractive and exciting. Instead of following a set of rules specified by human experts, ANNs appear to learn underlying rules (like input-output relationships) from the given collection of representative examples. This is one of the major advantages of neural networks over traditional expert systems.

To understand or design a learning process, you must first have a model of the environment in which a neural network operates; that is, you must know what information is available to the network. We refer to this model as a learning paradigm. Second, you must understand how network weights are updated, that is, which learning rules govern the updating process. A learning algorithm refers to a procedure in which learning rules are used for adjusting the weights.

There are three main learning paradigms: supervised, unsupervised, and hybrid. In supervised learning, or learning with a "teacher," the network is provided with a correct answer (output) for every input pattern. Weights are determined to allow the network to produce answers as close as possible to the known correct answers. Reinforcement learning is a variant of supervised learning in which the network is provided with only a critique on the correctness of network outputs, not the correct answers themselves. In contrast, unsupervised learning, or learning without a teacher, does not require a correct answer associated with each input pattern in the training data set. It explores the underlying structure in the data, or correlations between patterns in the data, and organizes patterns into categories from these correlations. Hybrid learning combines supervised and unsupervised learning: part of the weights are usually determined through supervised learning, while the others are obtained through unsupervised learning.

Learning theory must address three fundamental and practical issues associated with learning from samples: capacity, sample complexity, and computational complexity.

Capacity concerns how many patterns can be stored, and what functions and decision boundaries a network can form.

Sample complexity determines the number of training patterns needed to train the network to guarantee a valid generalization. Too few patterns may cause "over-fitting," wherein the network performs well on the training data set but poorly on independent test patterns drawn from the same distribution as the training patterns, as in Figure A3.

Computational complexity refers to the time required for a learning algorithm to estimate a solution from training patterns. Many existing learning algorithms have high computational complexity. Designing efficient algorithms for neural network learning is a very active research topic.

There are four basic types of learning rules: error-correction, Boltzmann, Hebbian, and competitive learning.

ERROR-CORRECTION RULES. In the supervised learning paradigm, the network is given a desired output for each input pattern. During the learning process, the actual output y generated by the network may not equal the desired output d. The basic principle of error-correction learning rules is to use the error signal (d - y) to modify the connection weights to gradually reduce this error.

The perceptron learning rule is based on this error-correction principle. A perceptron consists of a single neuron with adjustable weights, w_j, j = 1, 2, ..., n, and threshold u, as shown in Figure 2. Given an input vector x = (x_1, x_2, ..., x_n)^t, the net input to the neuron is

    v = Σ_{j=1}^{n} w_j x_j - u.
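As a concrete illustration of the error-correction principle, here is a minimal perceptron training sketch in Python. It is not the article's own algorithm listing: the learning rate, the toy AND data set, and the stopping rule are assumptions made for the example. Only the thresholded net input and the idea of adjusting each weight in proportion to the error signal (d - y), here via the standard update w_j := w_j + η(d - y)x_j with the threshold folded in as w_0 = -u, come from the text.

```python
def perceptron_output(x, w, u):
    """Output 1 if the net input (weighted sum minus threshold u) is positive, else 0."""
    v = sum(wj * xj for wj, xj in zip(w, x)) - u
    return 1 if v > 0 else 0

def train_perceptron(samples, eta=0.1, epochs=100):
    """Error-correction learning: nudge each weight by eta * (d - y) * x_j.

    The threshold u is treated as an extra weight w0 = -u with constant input
    x0 = 1, as described in the text.
    """
    n = len(samples[0][0])
    w = [0.0] * (n + 1)                    # w[0] plays the role of -u
    for _ in range(epochs):
        errors = 0
        for x, d in samples:
            xa = [1.0] + list(x)           # augmented input with x0 = 1
            v = sum(wj * xj for wj, xj in zip(w, xa))
            y = 1 if v > 0 else 0
            if y != d:
                errors += 1
                w = [wj + eta * (d - y) * xj for wj, xj in zip(w, xa)]
        if errors == 0:                    # every training pattern classified correctly
            break
    return w

# Toy linearly separable problem (logical AND).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(data)
print([perceptron_output(x, w[1:], -w[0]) for x, _ in data])   # -> [0, 0, 0, 1]
```

For linearly separable data such as this toy problem, Rosenblatt's perceptron convergence theorem, mentioned in the historical review above, guarantees that this loop terminates with weights that classify every training pattern correctly.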