if our hardware breaks down "slightly" the TLU may still function perfectly well as a result of its nonlinear functionality. Suppose now that, instead of the weights being altered, the input signals have become degraded in some way, due to noise or a partial power loss, for example, so that what was previously "1" is now denoted by 0.8, and "0" becomes 0.2. The resulting TLU function is shown in Table 2.3.

Table 2.3 TLU with degraded signal input.

x1    x2    Activation    Output
0.2   0.2   0.2           0
0.2   0.8   0.8           1
0.8   0.2   0.2           0
0.8   0.8   0.8           1

Once again the resulting TLU function is the same, and a similar reasoning applies involving the nonlinearity implied by the threshold. The conclusion is that the TLU is robust in the presence of noisy or corrupted signal inputs. The reader is invited to examine the case where both weights and signals have been degraded in the way indicated here. Of course, if the weights or signals are changed by too great an amount, the TLU will eventually respond incorrectly. In a large network, as the degree of hardware and/or signal degradation increases, the number of TLUs giving incorrect results will gradually increase too. This process is called "graceful degradation" and should be compared with what happens in conventional computers, where alteration to one component or loss of signal strength along one circuit board track can result in complete failure of the machine.
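To make this concrete, here is a minimal sketch in Python. The weights w = (0, 1) and threshold 0.5 are an assumption chosen to reproduce the activations in Table 2.3, not values stated in this passage; any weight-threshold pair with a comparable margin would behave the same way.

```python
def tlu(x, w=(0.0, 1.0), theta=0.5):
    """Threshold logic unit: output 1 iff the weighted sum of inputs reaches theta."""
    a = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if a >= theta else 0

clean    = [(0, 0), (0, 1), (1, 0), (1, 1)]
degraded = [(0.2, 0.2), (0.2, 0.8), (0.8, 0.2), (0.8, 0.8)]   # "0" -> 0.2, "1" -> 0.8

for xc, xd in zip(clean, degraded):
    print(xd, tlu(xd), tlu(xc) == tlu(xd))   # True on every row: degradation leaves outputs unchanged
```

Because every degraded activation still falls on the same side of the threshold as its clean counterpart, the hard-limiting output absorbs the corruption entirely.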
2.4 Non-binary signal communication

The signals dealt with so far (for both real and artificial neurons) have taken on only two values. In the case of real neurons these are the action-potential spiking voltage and the axon-membrane resting potential. For the TLUs they were conveniently labelled "1" and "0" respectively. Real neurons, however, are believed to encode their signal values in the patterns of action-potential firing rather than simply by the presence or absence of a single such pulse. Many characteristic patterns are observed (Conners & Gutnick 1990), of which two common examples are shown in Figure 2.5.

Figure 2.5 Neural firing patterns.

Part (a) shows a continuous stream of action-potential spikes, while (b) shows a pattern in which a series of pulses is followed by a quiescent period, with this sequence repeating itself indefinitely. A continuous stream as in (a) can be characterized by the frequency of occurrence of action potentials in pulses per second, and it is tempting to suppose that this is, in fact, the code being signalled by the neuron. This was convincingly demonstrated by Hartline (1934, 1940) for the optic neurons of the horseshoe crab Limulus, in which he showed that the rate of firing increased with the visual stimulus intensity. Although many neural codes are available (Bullock et al. 1977), the frequency code appears to be used in many instances.

If f is the frequency of neural firing then we know that f is bounded below by zero and above by some maximum value fmax, which is governed by the duration of the interspike refractory period.
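As a rough worked example (the 1 ms refractory period below is an assumed figure for illustration, not one given in the text), the upper bound follows directly from the refractory period: at most one spike can occur per refractory interval.

```python
refractory_period_s = 1e-3         # assumed absolute refractory period of 1 ms
f_max = 1.0 / refractory_period_s  # at most one spike per refractory interval
print(f_max)                       # 1000.0 pulses per second
```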
There are now two ways we can code for f in our artificial neurons. First, we may simply extend the signal representation to a continuous range and directly represent f as our unit output. Such signals can certainly be handled at the input of the TLU, as we remarked in examining the effects of signal degradation. However, the use of a step function at the output limits the signals to be binary so that, when TLUs are connected in networks (and they are working properly), there is no possibility of continuously graded signals occurring. This may be overcome by "softening" the step function to a continuous "squashing" function so that the output y depends smoothly on the activation a. One convenient form for this is the logistic sigmoid (or sometimes simply "sigmoid") shown in Figure 2.6.

Figure 2.6 Example of squashing function: the sigmoid.

As a tends to large positive values the sigmoid tends to 1 but never actually reaches this value. Similarly it approaches, but never quite reaches, 0 as a tends to large negative values. It is of no importance that the upper bound is not fmax, since we can simply multiply the sigmoid's value by fmax if we wish to interpret y as a real firing rate. The sigmoid is symmetric about the y-axis value of 0.5; the corresponding value of the activation may be thought of as a reinterpretation of the threshold and is denoted by θ. The sigmoid function is conventionally designated by the Greek lower case sigma, σ, and finds mathematical expression according to the relation

y = \sigma(a) = \frac{1}{1 + e^{-(a - \theta)/\rho}}    (2.4)

where e ≈ 2.7183 is a mathematical constant which, like π, has an infinite decimal expansion. The quantity ρ (Greek rho) determines the shape of the function, large values making the curve flatter while small values make the curve rise more steeply. In many texts this parameter is omitted, so that it is implicitly assigned the value 1. By making ρ progressively smaller we obtain functions that look ever closer to the hard-limiter used in the TLU, so that the output function of the latter can be thought of as a special case.
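A minimal sketch of Equation (2.4) in Python; the sample activations and the particular values of θ and ρ are illustrative choices, not values from the text.

```python
import math

def sigmoid(a, theta=0.0, rho=1.0):
    """Logistic sigmoid of Equation (2.4): rises smoothly from 0 to 1,
    crossing 0.5 at a = theta; rho controls how steeply it rises."""
    return 1.0 / (1.0 + math.exp(-(a - theta) / rho))

# Smaller rho makes the curve steeper, approaching the TLU's hard-limiter.
for rho in (1.0, 0.25, 0.05):
    print(rho, [round(sigmoid(a, theta=0.5, rho=rho), 3) for a in (0.0, 0.5, 1.0)])
```

With ρ = 0.05 the outputs at a = 0 and a = 1 are already indistinguishable from 0 and 1 to three decimal places, which is the hard-limiting behaviour just described.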
The reference to θ as a threshold then becomes more plausible, as it takes on the role of the same parameter in the TLU.

Artificial neurons or units that use the sigmoidal output relation are referred to as being of the semilinear type. The activation is still given by Equation (2.2) but the output is now given by (2.4). They form the bedrock of much work in neural nets, since the smooth output function facilitates their mathematical description. The term "semilinear" comes from the fact that we may approximate the sigmoid by a continuous, piecewise-linear function, as shown in Figure 2.7. Over a significant region of interest, at intermediate values of the activation, the output function is a linear relation with non-zero slope.

Figure 2.7 Piecewise-linear approximation of sigmoid.
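A sketch of one such piecewise-linear approximation; the ramp width used here is an arbitrary choice, since the text does not specify the breakpoints in Figure 2.7.

```python
def piecewise_linear(a, theta=0.0, width=1.0):
    """Ramp approximation to the sigmoid: 0 below the ramp, 1 above it,
    and a straight line of non-zero slope at intermediate activations."""
    lo, hi = theta - width / 2.0, theta + width / 2.0
    if a <= lo:
        return 0.0
    if a >= hi:
        return 1.0
    return (a - lo) / (hi - lo)   # the linear region around the threshold
```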
As an alternative to using continuous or analogue signal values, we may emulate the real neuron and encode a signal as the frequency of occurrence of a "1" in a pulse stream, as shown in Figure 2.8.

Figure 2.8 Stream of output pulses from a stochastic node.

Time is divided into discrete "slots" and each slot is filled with either a 0 (no pulse) or a 1 (pulse). The unit output is formed in exactly the same way as before but, instead of sending the value of the sigmoid function directly, we interpret it as the probability of emitting a pulse or "1". Processes that are governed by probabilistic laws are referred to as stochastic, so that these nodes might be dubbed stochastic semilinear units; they produce signals quite close in general appearance to those of real neurons. How are units downstream that receive these signals supposed to interpret their inputs? They must now integrate over some number, N, of time slots. Thus, suppose that the afferent node is generating pulses with probability y. The expected value of the number of pulses over this time is yN but, in general, the number actually produced, N1, will not necessarily be equal to this. The best estimate a node receiving these signals can make is the fraction, N1/N, of 1s during its integration time. The situation is like that in a coin-tossing experiment: the underlying probability of obtaining a "head" is 0.5, but in any particular sequence of tosses the number of heads Nh is not necessarily one-half of the total. As the number N of tosses increases, however, the fraction Nh/N will eventually approach 0.5.
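The following sketch simulates both halves of this process, the stochastic node's pulse stream and the downstream estimate N1/N; the output probability y = 0.7 and the slot counts are arbitrary illustrative values.

```python
import random

random.seed(0)   # fixed seed so the illustration is reproducible

def pulse_stream(y, n_slots):
    """Fill each time slot with a pulse (1) independently with probability y."""
    return [1 if random.random() < y else 0 for _ in range(n_slots)]

y = 0.7                            # the sigmoid output being signalled
for n in (10, 100, 10_000):
    n1 = sum(pulse_stream(y, n))   # N1, the number of pulses actually produced
    print(n, n1 / n)               # the estimate N1/N approaches y as N grows
```

Just as with the coin tosses, short integration times give noisy estimates of y, and the receiving unit can trade response speed for accuracy by integrating over more slots.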