Annu. Rev. Neurosci. 1995. 18:193-222
Copyright © 1995 by Annual Reviews Inc. All rights reserved
0147-006X/95/0193505.00
www.annualreviews.org/aronline Annual Reviews Annu. Rev. Neurosci. 1995.18:193-222. Downloaded from arjournals.annualreviews.org by University of California - San Diego on 01/05/07. For personal use only.

NEURAL MECHANISMS OF SELECTIVE VISUAL ATTENTION

Robert Desimone¹ and John Duncan²

¹Laboratory of Neuropsychology, NIMH, Building 49, Room 1B80, Bethesda, Maryland 20892, and ²MRC Applied Psychology Unit, 15 Chaucer Road, Cambridge CB2 2EF, England

KEY WORDS: vision, cortex, primates, visual search, neglect

INTRODUCTION

The two basic phenomena that define the problem of visual attention can be illustrated in a simple example. Consider the arrays shown in each panel of Figure 1. In a typical experiment, before the arrays were presented, subjects would be asked to report letters appearing in one color (targets, here black letters) and to disregard letters in the other color (nontargets, here white letters). The array would then be briefly flashed, and the subjects, without any opportunity for eye movements, would give their report. The display mimics our usual cluttered visual environment: It contains one or more objects that are relevant to current behavior, along with others that are irrelevant.

The first basic phenomenon is limited capacity for processing information. At any given time, only a small amount of the information available on the retina can be processed and used in the control of behavior. Subjectively, giving attention to any one target leaves less available for others. In Figure 1, the probability of reporting the target letter N is much lower with two accompanying targets (Figure 1a) than with none (Figure 1b).

The second basic phenomenon is selectivity, the ability to filter out unwanted information. Subjectively, one is aware of attended stimuli and largely unaware of unattended ones. Correspondingly, accuracy in identifying an attended stimulus may be independent of the number of nontargets in a display (Figure 1a vs 1c) (see Bundesen 1990, Duncan 1980).

[Figure 1, panels a, b, c: brief letter displays; image not reproduced]
Figure 1 Displays demonstrating limited processing capacity and selectivity in human vision. Subjects are shown the displays briefly and asked to report only the black letters. Limited capacity is shown by reduced accuracy as the number of targets is increased (compare b and a). Selectivity is shown by the negligible impact of nontargets (compare a and c).

Taken together, such results suggest the following general model (Broadbent 1958; Neisser 1967; Treisman 1960, 1993). At some point (or several points) between input and response, objects in the visual input compete for representation, analysis, or control. The competition is biased, however, towards information that is currently relevant to behavior. Attended stimuli make demands on processing capacity, while unattended ones often do not.

In the following sections, we first outline the major behavioral characteristics of competition and consider the limitations within the nervous system that make competition necessary. We then describe selectivity, or how the competition may be resolved, at both the behavioral and neural level. To some extent, our account builds on early models of biased competition by Walley & Weiden (1973) and Harter & Aine (1984). The approach we take differs from the standard view of attention, in which attention functions as a mental spotlight enhancing the processing (and perhaps binding together the features) of the illuminated item. Instead, in the model we develop, attention is an emergent property of many neural mechanisms working to resolve competition for visual processing and control of behavior.

COMPETITION

Behavioral Data

In one simple type of experiment, two objects are presented in the visual field. Subjects must identify some property of both objects, with a separate response for each. Such studies reveal several important facts. First, dividing attention between two objects almost always results in poorer performance than focusing attention on one. Identifying simple properties of each object, such as size, brightness, orientation, or spatial position, gives much the same result as identifying more complex properties such as shape (see Duncan 1984, 1985, 1993).
Footnote: A possible exception is simple detection of simultaneous energy onsets or offsets (Bonnel et al 1992).

Second, as long as the experiment uses brief stimulus exposures and measures the accuracy of stimulus identification, the major performance limitation appears to occur at stimulus input rather than at subsequent short-term storage and response. For example, interference from processing two objects is abolished if they are shown one after the other, with an interval of perhaps a second between them (Duncan 1980), even though the two responses called for must still be remembered and made together at the end of the trial.

Third, interference is independent of eye movements. Even when gaze is held at fixation throughout, it is easier to identify one object in the periphery than two.

Fourth, interference is largely independent of the spatial separation between two objects, at least when the field is otherwise empty (Sagi & Julesz 1985, Vecera & Farah 1994). Though attention is sometimes seen as a mental spotlight illuminating or selecting information from a restricted region of visual space (Eriksen & Hoffman 1973, Posner et al 1980), performance seems not to depend on the absolute spatial distribution of information.

An enduring issue is the underlying reason for between-object competition. It has often been argued that full visual analysis of every object in a scene would be impossibly complex (Broadbent 1958, Tsotsos 1990). On this view, competition reflects a limit on visual identification capacity. Equally strong, however, has been the view that competition concerns control of response systems (Allport 1980, Deutsch & Deutsch 1963). Certainly, some response activation often occurs from objects a person has been told to ignore (Eriksen & Eriksen 1974), which shows that unwanted information is not entirely filtered out in early vision. Very probably, competition between objects occurs at multiple levels between sensory input and motor output (Allport 1993).
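A toy capacity-sharing model can mimic three results: the capacity and selectivity effects of Figure 1, and the removal of interference by sequential presentation (Duncan 1980). This is a minimal sketch under our own assumptions, not a model from the paper:

- A fixed processing rate (`CAPACITY`, a hypothetical value) is divided among the items in a display in proportion to their attentional weights.
- Nontargets get zero weight, standing in for idealized selectivity.
- An item processed at rate r for a brief exposure t is identified with probability 1 - exp(-r t).

```python
import math

# Hypothetical parameters, chosen only for illustration (not taken from the paper).
CAPACITY = 20.0  # total processing rate shared by all items in a display (items/s)
EXPOSURE = 0.1   # duration of the brief flash (s)

TARGET, NONTARGET = 1.0, 0.0  # attentional weights; 0.0 = nontargets perfectly filtered


def p_identify(weights, index, capacity=CAPACITY, exposure=EXPOSURE):
    """Toy probability that item `index` is identified during one brief exposure.

    Assumptions: capacity is split in proportion to attentional weight, and an
    item processed at rate r for t seconds is identified with p = 1 - exp(-r * t).
    """
    total = sum(weights)
    if total <= 0 or weights[index] <= 0:
        return 0.0
    rate = capacity * weights[index] / total
    return 1.0 - math.exp(-rate * exposure)


alone = p_identify([TARGET], 0)
three_targets = p_identify([TARGET] * 3, 0)
three_targets_plus_nontargets = p_identify([TARGET] * 3 + [NONTARGET] * 4, 0)

print(f"one target alone:                {alone:.3f}")
print(f"three targets together:          {three_targets:.3f}")
print(f"three targets plus 4 nontargets: {three_targets_plus_nontargets:.3f}")
# Sequential presentation: each target is flashed on its own, so each gets
# the full capacity and accuracy matches the single-target case.
print(f"targets shown one at a time:     {alone:.3f} each")
```

Giving nontargets a small positive weight (for example, `NONTARGET = 0.1`) shows how imperfect filtering would let nontargets reintroduce some interference.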
Neural Basis for Competition

If the nervous system had unlimited capacity to process information in parallel throughout the visual field, competition between objects would presumably be necessary only at final motor output stages. Before discussing these motor stages, we first consider what limitations in the visual system make competition necessary at the input.

Objects in the visual field compete for processing within a network of 30 or more cortical visual areas (Desimone & Ungerleider 1989, Felleman & Van Essen 1991). These areas appear to be organized within two major corticocortical processing pathways, or streams, each of which begins with the primary visual cortex, or V1 (see Figure 2). The first, a ventral stream, is directed into the inferior temporal cortex and is important for object recognition. The other, a dorsal stream, is directed into the posterior parietal cortex and is important for spatial perception and visuomotor performance (Ungerleider & Haxby 1994, Ungerleider & Mishkin 1982). Since competition impacts object recognition, we would expect to find one basis for it in the ventral stream.

Figure 2 Striate cortex, or V1, is the source of two cortical visual streams. A dorsal stream is directed into the posterior parietal cortex and underlies spatial perception and visuomotor performance. A ventral stream is directed into the inferior temporal cortex and underlies object recognition. Both streams have further projections into prefrontal cortex. Adapted from Mishkin et al (1983) and Wilson et al (1993). For a "wiring diagram" of the areas and connections of the two streams, see Desimone & Ungerleider (1989) and Felleman & Van Essen (1991).
The ventral stream includes specific anatomical subregions of area V2 (thin and interstripe regions), area V4, and areas TEO and TE in the inferior temporal (IT) cortex (see Desimone & Ungerleider 1989). As one proceeds from one area to the next along this pathway, neuronal properties change in two obvious ways. First, the complexity of visual processing increases. For example, whereas many V1 cells function essentially as local spatiotemporal energy filters, V2 neurons may respond to virtual or illusory contours in certain figures (von der Heydt et al 1984), and IT neurons respond selectively to global or overall object features, such as shape (Desimone et al 1984, Schwartz et al 1983, Tanaka et al 1991). Second, the receptive field size of individual neurons increases at each stage. As one moves from V1 to V4 to TEO to TE, typical receptive fields in the central field representation are on the order of 0.2, 3, 6, and 25° in size, respectively (see Boussaoud et al 1991, Ungerleider & Desimone 1989). Large receptive fields may contribute towards the recognition
of objects over retinal translation (Gross & Mishkin 1977, Lueschow et al 1994).

These receptive fields can be viewed as a critical visual processing resource, for which objects in the visual field must compete (Desimone 1992, Olshausen et al 1993, Tsotsos 1990). If one were to add ever more independent objects to a V4 or IT receptive field, the information available about any one of them would certainly decrease. If, for example, a color-sensitive IT neuron were to integrate wavelength over its large receptive field, one might not be able to tell from that cell alone whether a given level of response was due to, say, one red object or two yellow ones or three green ones at different locations in the field. Such ambiguity may be responsible for the interference effects found in divided attention.

This ambiguity may be reduced, in part, by linking objects and their features to retinal locations. It is sometimes presumed that location information is absent from the ventral "what" stream altogether and must be supplied by the dorsal "where" stream. In fact, the ventral stream itself contains information about the retinal location of complex object features. V4 and TEO neurons process relatively sophisticated information about object shape (Desimone & Schein 1987, Gallant et al 1993, Tanaka et al 1991) and have retinotopically organized receptive fields (Boussaoud et al 1991, Gattass et al 1988). At any given retinotopic locus in these areas, receptive fields show considerable scatter. One could, in principle, derive information about the relative locations of nearby features from a population of cells with partially overlapping fields, in the same way that one could derive information about a specific color from a population of neurons with broad but different color tuning. Similarly, although receptive fields in IT cortex may span 20-30 degrees or more, they are not homogeneous.
Typically, the fields have a hot spot at the center of gaze, which may extend asymmetrically into the upper or lower contralateral visual field. Although the stimulus preferences of IT neurons remain the same over large retinal regions, for a large minority of cells the absolute response to a given stimulus changes significantly with retinal location. That is, cells are tuned to retinal location in the same way that they are tuned to other object features (Desimone et al 1984, Lueschow et al 1994, Schwartz et al 1983; also see Chelazzi et al 1993a). Thus, in principle, objects and their locations might be linked to some extent within the ventral stream. Even so, parallel processing across the visual field is likely to be limited.

To sum up, retinal location, as with other object features, is coarsely coded in the ventral stream. Information about more than one object may, to some extent, be processed in parallel, but the information available about any given object will decline as more and more objects are added to receptive fields. Therefore, objects must compete for processing in the ventral stream, and the visual system should use any information it has about relevant objects to bias the competition in their favor. This issue, which we term selectivity, is considered in later sections.

If the dorsal stream receives its visual input in parallel to the ventral stream, as the anatomy suggests (Desimone & Ungerleider 1989), then it is presumably faced with competition among objects as well.
As in IT cortex, receptive fields in posterior parietal cortex are very large, and it seems likely that increasing the number of independent objects in the visual field will eventually exceed the capacity of parietal cortex to extract the locations of each of them in parallel. Likewise, neural systems for visuomotor control must also deal with competition, to the extent that distractors are not already filtered out of the visual input (e.g. Munoz & Wurtz 1993a,b). Ultimately, for example, it is possible to move the eyes to only one target at a time. A critical issue is how selectivity is coordinated across the different systems so that the same target object is selected for perceptual and spatial analysis as well as for motor control.

SELECTIVITY: SCREENING OUT UNWANTED STIMULI

Behavioral Data

The ability to screen out irrelevant objects (Figure 1) is not absolute. It is easy in some cases and difficult in others. Visual search illustrates this well: the subject detects or identifies a single target presented in an array of nontargets. Examples are shown in Figure 3. In easy cases, the target appears to "pop out" of the array, as if attention were drawn directly to it (Donderi & Zelnicker 1969, Treisman & Gelade 1980). Under such circumstances, the number of nontargets has little effect on the speed or accuracy of target detection or identification. In hard cases, however, nontargets are not filtered out well. In these instances, the number of nontargets in the display has a large effect on performance. An increase of 50 ms in target detection time for each nontarget added to the array is typical (Treisman & Gelade 1980), though in fact this value varies widely and continuously from one task to another (Treisman & Gormican 1988).

[Figure 3, panels a and b: search displays; image not reproduced]
Figure 3 Selectivity in visual search. Target pop-out is revealed when the target is a mismatching element in an otherwise homogeneous field (panel a). Search is also extremely easy, however, whenever targets and nontargets are highly discriminable. Pop-out can also be based on more complex properties (panel b; search for the single digit).

According to the biased competition model, targets and nontargets compete for processing capacity in visual search. One factor influencing selectivity is bottom-up bias. It is very easy, for example, to find a unique target in an array of homogeneous nontargets (Figure 3a), perhaps reflecting an enduring competitive bias towards local inhomogeneities (Sagi & Julesz 1984). There may be similar biases towards sudden appearances of new objects in the visual field (Jonides & Yantis 1988) and towards objects that are larger, brighter, faster-moving, etc (Treisman & Gormican 1988).

An attentional system, however, would be of little use if it were entirely dominated by bottom-up biases. What is needed is a way to bias competition towards whatever information is relevant to current behavior. That is, one needs top-down control in addition to bottom-up, stimulus-driven biases. Correspondingly, there are many cases of easy search that do not depend on local inhomogeneity or sudden target onset. A colored target in a multicolored display, for example, may show good pop-out if the colors are highly discriminable (Duncan 1989). At least after a little practice, pop-out can be obtained during search for a single digit among letters (Figure 3b) (see Egeth et al 1972, Schneider & Shiffrin 1977).

Even when target selection is guided by top-down control, the ability to find targets still depends on bottom-up stimulus factors, especially the visual similarity of targets to nontargets. Provided that targets and nontargets are sufficiently different, however, easy search can be based on many different visual attributes. These include simple features, such as size or color, and more complex conjunctions of these features (Duncan & Humphreys 1989, McLeod et al 1988, Wolfe et al 1989). Conjunction search provides a good example of the importance of similarity.
In Figures 4a and b, the target is a large, white vertical bar. This target is much harder to find in Figure 4a, where each nontarget shares two properties with the target, than in Figure 4b, where only one property is shared (Quinlan & Humphreys 1987). Indeed, the latter case can give excellent pop-out; a similar result can be produced simply by increasing the discriminability of each conjunction's component features (Wolfe et al 1989).

Such results suggest the following model of biased competition. Depending on the task, any kind of input can be behaviorally relevant: objects of a certain kind, objects with a certain color or motion, objects in a certain location, etc. Some kind of short-term description of the information currently needed must be used to control competitive bias in the visual system, such that inputs matching that description are favored in the visual cortex (Bundesen 1990, Duncan & Humphreys 1989). This short-term description has been called the attentional template (Duncan & Humphreys 1989); it may be seen as one aspect of working memory (Baddeley 1986). The template can specify any property of required input: shape, color, location, etc.

[Figure 4, panels a to d: search displays; image not reproduced]
Figure 4 (a, b) Discriminability between targets and nontargets in conjunction search. Searching for a large, white vertical bar is harder when nontargets share two (panel a) rather than one (panel b) property with the target. In the latter case good pop-out can be obtained. (c, d) Novelty bias. It is easier to find a single inverted letter among upright nontargets (panel c) than the reverse (panel d).

Visual search is easy if targets and nontargets are easily discriminable.
In this case, nontargets are poor matches to the attentional template and receive a weak competitive bias. Thus, the time it takes to find the target may be independent of the number of nontargets in the display. By contrast, search is difficult if nontargets are similar to the target. In this case, the competitive advantage of the target is reduced because each nontarget shares in the bias provided by the attentional template. Thus, each nontarget added to the display interferes with target detection. Alternative, serial-search accounts are considered below.

A great deal of work has dealt specifically with spatial selection, i.e. selection based on some cue to the location of target information (Eriksen & Hoffman 1973, Posner et al 1980, Sperling 1960). Indeed, spatial selection is often treated as a special case. We do not review this work in detail; it was covered earlier by Posner & Petersen (1990), and Colby (1991) has reviewed the neural mechanisms of spatial selection. Certainly, however, space is only one of the many cues that can be used in efficient target selection. A general account of selectivity must deal with both spatial and nonspatial cases. In terms of the biased competition model, prior knowledge of the target's spatial location is just another type of attentional template that can be used to bias competition in favor of the target.

A final consideration is bias derived from long-term memory. One interesting case is bias to novelty. As shown in Figures 4c and d, for example, it is much easier to find an inverted (novel) target among upright (familiar) nontargets (Figure 4c) than the reverse (Figure 4d) (Reicher et al 1976). In fact, the time it takes to find an inverted character may be independent of the number of upright ones in a display (Wang et al 1992), which implies that multiple objects have parallel access to memory and that familiarity is a type of object feature that can be used to bias attentional competition. A second consideration is long-term learned importance. In a busy room, attention can be attracted by the sound of one's own name spoken nearby (Moray 1959). Similarly, long practice with one set of visual targets makes them hard to ignore when they are subsequently made irrelevant (Shiffrin & Schneider 1977). Thus, the top-down selection bias of a current task can sometimes be overturned by information of long-term or general significance acting in a bottom-up fashion. In the next sections we consider both bottom-up and top-down mechanisms for resolving competition.

Bottom-Up Neural Mechanisms for Object Selection

The first neural mechanisms for resolving competition we consider are those that derive from the intrinsic or learned biases of the perceptual systems towards certain types of stimuli. We describe them here as bottom-up processes, not because they do not involve feedback pathways in visual cortex (they may well do so) but because they appear to be largely automatic processes that are not dependent on cognition or task demands. Stimuli that stand out from their background are processed preferentially at nearly all levels of the visual system. In visual cortex, the responses of many cells to an otherwise optimal stimulus within their classically defined receptive field may be completely suppressed if similar stimuli are within a large surrounding region (for reviews see Allman et al 1985, Desimone et al 1985).
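The surround suppression described above can be caricatured with a toy divisive model. This is only a sketch under stated assumptions: the Gaussian orientation-similarity function, the divisive form, and the names `similarity`, `suppressed_response`, `width_deg`, and `k` are hypothetical choices made for illustration, not a model given in the review. A cell's response to an optimal center stimulus is divided by a term that grows with the number of similar items in its surround, so an item that matches its neighbours is strongly suppressed while an odd item among homogeneous neighbours is not.

```python
import math


def similarity(a_deg: float, b_deg: float, width_deg: float = 20.0) -> float:
    """Hypothetical Gaussian similarity between two orientations in degrees.

    Orientation is periodic over 180 degrees, so 0 and 180 count as identical.
    """
    d = abs(a_deg - b_deg) % 180.0
    d = min(d, 180.0 - d)
    return math.exp(-(d ** 2) / (2.0 * width_deg ** 2))


def suppressed_response(center_deg: float, surround_degs: list[float], k: float = 0.5) -> float:
    """Toy response to an optimal center stimulus (drive fixed at 1.0).

    Assumed divisive form: drive / (1 + k * summed similarity of surround items),
    so more numerous and more similar neighbours produce more suppression.
    """
    suppression = k * sum(similarity(center_deg, s) for s in surround_degs)
    return 1.0 / (1.0 + suppression)


if __name__ == "__main__":
    vertical_surround = [0.0] * 8  # eight vertical neighbours
    print(suppressed_response(0.0, vertical_surround))   # vertical among verticals: 0.2
    print(suppressed_response(90.0, vertical_surround))  # horizontal odd-one-out: ~0.9998
    print(suppressed_response(0.0, [0.0] * 4))           # sparser surround: ~0.333
```

In this caricature the odd-one-out keeps nearly its full response while each homogeneous item is suppressed, which is one way a local inhomogeneity could win the competition; halving the number of neighbours also weakens the suppression, qualitatively mirroring a density effect. The specific numbers depend entirely on the assumed parameters.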
The greater the density of stimuli in the surround, the greater the suppression (Knierim & Van Essen 1992). In the middle temporal area (MT), for example, a cell that normally responds to vertically moving stimuli within its receptive field may be unresponsive if the same stimuli are part of a larger moving pattern covering the receptive field and surround (Allman et al 1985, Tanaka et al 1986). These mechanisms almost certainly contribute to the pop-out effects of targets in visual search.

As indicated above, the visual system also seems to be biased towards new objects or objects that have not been recently seen. Thus, the temporal context of a stimulus may contribute as much to its saliency as its spatial context. In the temporal domain, stimuli stored in memory may function as the temporal surround, or context, against which the present stimulus is compared. Striking examples of such temporal interactions have been found in the anteroventral portion of IT cortex. Most studies in this region recorded cells while monkeys performed delayed matching-to-sample (DMS) tasks with either novel or familiar stimuli. In DMS, a sample stimulus is followed by one or more test stimuli, and the animal signals when a test stimulus matches the sample. For up to a third of the cells in this region, responses to novel sample stimuli become suppressed as the animal acquires familiarity with them (Fahy et al 1993, Li et al 1993, Miller et al 1991, Riches et al 1991). The cells are not novelty detectors, in that they do not respond to any novel stimulus. Rather, they remain stimulus selective both before and after the visual experience.
In fact, this shrinkage in the population of activated neurons as stimuli become familiar may increase the selectivity of the overall neuronal population for those stimuli. As one learns the critical features of a new stimulus, cells activated in a nonspecific fashion drop out of the activated pool of cells (Li et al 1993), leaving those that are most selective. There is also direct evidence that some IT cells selective for faces become more tuned to a familiar face following experience (Rolls et al 1989). An effect akin to the novelty effect is also found for familiar stimuli that have been seen recently. When a test stimulus matches the previously seen sample in the DMS trial, responses to that stimulus tend to be suppressed (Miller et al 1991, 1993; also see Baylis & Rolls 1987, Eskandar et al 1992, Fahy et al 1993, Riches et al 1991). Although it was originally proposed that this suppressive effect was dependent on active working memory for the sample, recent work has shown it to be an automatic outcome of any stimulus repetition (Miller & Desimone 1994). For many cells, this suppression occurs even if the repeated stimuli differ in size or appear in different retinal locations (Lueschow et al 1994). Thus, the detection of novelty and recency apparently occurs at a high level of stimulus representation. Taken together, the results indicate that both novel stimuli and stimuli that have not been recently seen will have a larger neural signal in the visual cortex, giving them a competitive advantage in gaining control over attentional and orienting systems. This would explain the bias towards novelty in the human behavioral data described above. The longer the organism attends to the object, the more knowledge about the object is incorporated into the structure of the cortex; this reduces the visual signal. It will also reduce the drive on the orienting system so that the organism is free to orient to the next new object (Li et al 1993, Desimone et al 1994). 
This view is compatible with Adaptive Resonance Theory (Carpenter & Grossberg 1987), in which novel stimuli activate attentional systems that allow new long-term memories to be formed. Consistent with these neurophysiological results in animals, a reduction in neural activation with stimulus repetition in human subjects has been seen both in event-related potentials of the temporal cortex (Begleiter et al 1993) and in brain-imaging studies (Squire et al 1992).
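The competitive consequence of repetition suppression summarized above can be illustrated with an equally simple sketch. It assumes, purely for illustration, that each prior exposure multiplies a stimulus's signal by a fixed factor `decay` and that orienting goes to whichever item in a display has the largest signal; the class `FamiliarityBias`, its methods, and the stimulus labels are hypothetical, and the review specifies neither a decay rate nor a selection rule. Because stimuli are keyed by identity alone, the suppression is invariant to size and retinal position by construction, loosely echoing the size- and position-tolerant suppression reported for IT cells.

```python
from collections import Counter


class FamiliarityBias:
    """Toy model: the neural signal for a stimulus shrinks with each repeated exposure.

    `decay` is a hypothetical per-exposure suppression factor (not given in the text).
    Stimuli are keyed by identity only, so suppression ignores size and position.
    """

    def __init__(self, decay: float = 0.5) -> None:
        self.decay = decay
        self.exposures = Counter()  # identity -> number of prior exposures

    def signal(self, stimulus: str) -> float:
        # 1.0 for a novel stimulus; decay ** n after n prior exposures.
        return self.decay ** self.exposures[stimulus]

    def observe(self, stimulus: str) -> None:
        # Any repetition, attended or not, makes the stimulus more familiar.
        self.exposures[stimulus] += 1

    def winner(self, display: list[str]) -> str:
        # The item with the largest signal captures orienting; ties go to the first listed.
        return max(display, key=self.signal)


if __name__ == "__main__":
    model = FamiliarityBias()
    for _ in range(3):
        model.observe("face_A")                          # face_A becomes familiar
    print(model.signal("face_A"))                        # 0.125
    print(model.winner(["face_A", "face_B"]))            # face_B (novel) wins
    for _ in range(3):
        model.observe("face_B")                          # now face_B is familiar too
    print(model.winner(["face_A", "face_B", "face_C"]))  # face_C, the next new object
```

Once the novel item has been seen repeatedly, its signal falls to the level of the other familiar items, leaving orienting free to go to the next new object. A fuller account would need graded similarity between stimuli and recovery of responses over time, neither of which this sketch attempts to model.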