Early and Late Selection Theories of Attention

(May 1994)

This essay is about attention. Specifically, it is concerned with some of the theories that have attempted to explain attention using ideas from information processing theory. Of the most influential theories in the field, the majority fall into two broad categories: "bottleneck" theories and capacity model theories. Although the latter are arguably the more favoured nowadays, it is the former upon which we shall concentrate, since they have probably been the most influential.


It is worth noting at the outset that both bottleneck and capacity theories are based on the idea that humans have limited information processing capacity: i.e. we are never able to deal with all of the inputs that continuously flood into our processing systems from our senses and memory, and even if we were, we are limited in the number of motor responses we can make.

One can describe bottleneck theories as a strong version of this limited capacity idea, in that only one message at a time can enter consciousness, since at some point processing is reduced to a single channel. Capacity models, on the other hand, are a weaker version, in that information can be processed via many channels but there is a fixed capacity limit to be distributed amongst the channels.

What is Attention?

Before launching into a detailed analysis of the three most influential bottleneck theories which emerged in the late 1950s and early 1960s, it would be helpful to establish a perspective from which to make the comparison. As a start, therefore, it does not seem unreasonable to ask a fundamental question: What is attention? Without a good idea of what attention is, and especially its relationship to perception and consciousness, we are in a poor position to compare theories.

A very early definition (recently described as an "elegant summary"; Underwood, 1993) is that of William James, over 100 years ago:

"It is the taking possession in the mind, in clear and vivid form, of one out of several simultaneous possible objects or trains of thought. Focalization, concentration of consciousness are of its essence. It implies withdrawal from some things in order to deal effectively with others" (James, 1890).


James's definition neatly exposes a problem that dogs the field of attention: the relationship between attention and consciousness. Dixon (1981) is almost a lone voice in squarely confronting this issue, noting that one legacy of behaviourism is that the conscious mind is virtually a taboo subject. Add to this the idea that the effects of attention are largely preconscious, hence dangerously close to the 'unconscious' of psychoanalytic theory, and one has a double taboo that makes many researchers uneasy and obscures much that is important and interesting.


One outcome of this unease is the varying ways in which researchers in the field conceptualise 'attention' and 'consciousness', often doing so in order to skirt around some of the fundamental problems just described. As we shall see later, when comparing Deutsch and Deutsch's (1963) theory with the rest, the tangled notions and vague definitions of attention, awareness, mental effort/concentration and consciousness can lead to severe problems, not least that researchers coming from different conceptual frameworks seem at times to be talking different languages.


Some theorists have made an explicit differentiation between attention and consciousness. For example, Johnston and Heinz (1978) view attention as "the systematic admission of perceptual data into consciousness . . . the process whereby perception is biased toward or against specific inputs". This seems a compelling idea, and it fits especially well with the bottleneck theories of selective attention; unfortunately, there are some problems with it. Firstly, attention seems sometimes to be under conscious volition and at other times (often annoyingly) not. This raises a key question: under what circumstances is attention under conscious control? A second problem is that the view of attention as a kind of selection process before ideas are allowed into consciousness (where, presumably, most high-level semantic processing, planning and decision making is performed) does not fit with certain empirical findings, especially in the field of subliminal perception. Dixon (1981), in a major review of the field, proposes a model of the mind as an information processor in which there are two separate systems: one involving consciousness/awareness, and the other involving preconscious (or unconscious) processing. If we ally this to the popular view of attention as "the concentration and focusing of mental effort" (Matlin, 1983), then we have a notion of attention which can be quite separate from consciousness. In other words, if attention is a concentration of mental effort, and mental effort can be exerted unconsciously, then attention, at least in part, acts separately from consciousness.


As an alternative, attention can be viewed in the highly restrictive sense of being necessary only for information that is novel and important. This introduces the ideas of skill acquisition and automaticity, whereby well-practised actions are performed without conscious control (which Reason, 1979, calls open loop control). As we shall see later, this has important implications for the efficacy of the ubiquitous dichotic listening paradigm. Treisman (1988) takes this view, seeing attention as being necessary for the integration of the perceptual features that form objects, i.e. it is required to combine otherwise separate features that we have not already "chunked" (to borrow a word from memory research) together by previous repeated exposure.


The third problem with the Johnston and Heinz definition is more practical, in that not everybody has adopted it (e.g. the 'mental effort' definition quoted above). This may seem an obvious point, but it introduces an important and often overlooked idea: that different theories/models of attention necessarily entail different definitions of what attention is. It is all too easy to impose one's own conceptualisations on certain words (e.g. "attention", "awareness" and "consciousness") and compare theories using language which severely distorts the original theorists' conceptualisations. It is as if one were to compare and contrast a Jackson Pollock with a Raphael on the basis of one's own notions of "beauty", "meaning" and "harmony".


Theories of Attention

From issues associated with a definition of "attention", let me now move on to compare the clearest and most influential bottleneck theories of selective attention: those of Broadbent (1958), Treisman (1960), and Deutsch and Deutsch (1963).


At a basic level, these theories appear to place the locus of the bottleneck in a person's information processing system either 'early' (perceptual limitations) or 'late' (response limitations). We shall see later that this may be too crude an assessment, and shall also briefly consider other possibilities arising from the work of Johnston and his colleagues, which have been seen as resolving this 'early' versus 'late' problem.


For both historic and conceptual reasons, the easiest way to start a comparison of the three theories is to look at them in two pairs: firstly Broadbent and Treisman; secondly Treisman and Deutsch & Deutsch.


Comparing Broadbent's and Treisman's theories

Comparing Broadbent's and Treisman's theories is a relatively straightforward affair, since Treisman's model is a direct amendment of Broadbent's. Let me firstly give a brief outline of each model.


Broadbent's model: Incoming stimuli, briefly held in a sensory register, undergo preattentive analysis by a selective filter on the basis of their physical characteristics. Those stimuli selected pass along a (very) limited capacity channel to a detection device where semantic analysis takes place. Those stimuli not selected ('filtered' out) are not analysed for meaning and do not reach consciousness. This is, therefore, an early selection theory, and an 'all or nothing' view of perception.
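Broadbent's all-or-nothing filter can be expressed as a minimal sketch. All names and values below (the `channel` field, the `semantic_analysis` placeholder) are hypothetical illustrations, not part of Broadbent's own formulation; the point is simply that selection happens on a physical characteristic before any analysis of meaning.

```python
def semantic_analysis(stimulus):
    # Placeholder for Broadbent's detection device, where meaning
    # is extracted from a stimulus that has passed the filter.
    return {"word": stimulus["word"], "meaning": "analysed"}

def broadbent_filter(stimuli, attended_channel):
    """All-or-nothing selection: only stimuli matching the attended
    physical characteristic (here, the ear of arrival) pass along the
    limited-capacity channel to semantic analysis."""
    selected = [s for s in stimuli if s["channel"] == attended_channel]
    # Rejected stimuli are filtered out entirely: in this model they
    # receive no semantic analysis and never reach consciousness.
    return [semantic_analysis(s) for s in selected]

stimuli = [
    {"channel": "left", "word": "dogs"},
    {"channel": "right", "word": "six"},
]
print(broadbent_filter(stimuli, "left"))  # only the left-ear word is analysed
```

The crucial (and, as the findings below show, problematic) feature is that the rejected message leaves no trace at all beyond the filter.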


Treisman's model: Incoming stimuli, briefly held in a sensory register, undergo preattentive analysis by an attenuation filter on the basis of crude physical characteristics (the information resulting from this analysis is available to conscious perception and for reporting by the subject, regardless of what happens to the message beyond this point). Those stimuli selected (attended to) pass along a limited capacity channel to a detection device (a pattern recognizer, comprising a number of 'dictionary' units) where semantic analysis takes place. Unattended stimuli are attenuated (the signal strength is lowered) before passing along the limited capacity channel to the detection device, where they are semantically processed if they meet certain criteria. This is, therefore, an early selection theory, and an attenuation model of attention.


These theories have far more ideas in common than they do differences, yet it is the differences which are the key aspects. The two major differences are outlined in the following two paragraphs.


1. Broadbent's filter is all-or-nothing (it does not allow through unattended messages), whereas Treisman's filter allows unattended messages through, but in an attenuated form. Treisman proposed this amendment to account for a number of empirical findings which were not explained by Broadbent. For instance, Moray (1959) had found that "subjectively 'important' messages such as a person's own name can penetrate the block [the all-or-nothing filter]: thus a person will hear instructions if they are presented with his own name as part of the rejected message". Similarly, Oswald et al. (1960) found that a person's own name and critical names presented to a sleeping subject elicited a clench response which had been previously conditioned. Treisman (1960), using a dichotic listening procedure with shadowing, found that if the sentences in the two ears are suddenly switched, the subject shadows one or two words of the unattended message before reverting to the attended ear. Clearly, certain unattended messages can be processed semantically, hence the need to modify the physical-characteristics filter.


2. Broadbent's is a simple single-filter model, whereas Treisman's can be thought of as a two-stage filtering process: firstly, filtering on the basis of incoming channel characteristics, and secondly, filtering by the threshold settings of the dictionary units. Treisman's explanation of the way these threshold settings perform a filtering operation accounts for the findings of Moray, Oswald and Treisman described above, and many other similar findings. The dictionary units have the two important properties that their thresholds differ, and are variable. Some units, those which respond to biologically (or emotionally) important signals, have permanently lowered thresholds. Hence, even a heavily attenuated signal (because it is not being attended to) can trigger a unit which is 'tuned' to that signal. This explains why one's own name can attract one's attention in a previously unattended message. On a more biological level, it explains the sensitivity that mothers have for the noises their babies make, even when virtually out of earshot. In addition to these semi-permanent threshold differentials, there is transient variation in thresholds due to the expectations of the subject, i.e. the context. The occurrence of a particular signal will, if it triggers a dictionary unit, lower the thresholds for other signals which in the past have been associated with it. Hence, the units for highly probable words (e.g. those halfway through a sentence in Treisman's 1960 experiment) are more likely to fire even if their signal is attenuated.
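The interplay of attenuation and variable thresholds can be sketched as follows. All the numbers (the attenuation factor, the individual thresholds, the effect of context) are invented for illustration; Treisman's model specifies the qualitative relationships, not these values.

```python
ATTENUATION = 0.3        # assumed multiplier on unattended signal strength
DEFAULT_THRESHOLD = 0.5  # assumed firing threshold for an ordinary word

# Hypothetical 'dictionary units': one's own name has a permanently
# lowered threshold, as an emotionally important signal.
dictionary = {
    "table": {"threshold": DEFAULT_THRESHOLD},
    "anne":  {"threshold": 0.2},
}

def detect(word, attended, expected=()):
    """A dictionary unit fires if the (possibly attenuated) signal
    strength reaches its threshold; context transiently lowers the
    threshold of expected words."""
    unit = dictionary.get(word)
    if unit is None:
        return False
    strength = 1.0 if attended else ATTENUATION
    threshold = unit["threshold"]
    if word in expected:  # contextual priming (assumed halving of threshold)
        threshold *= 0.5
    return strength >= threshold

# An unattended common word fails to fire its unit...
print(detect("table", attended=False))                       # False
# ...but one's own name breaks through even when unattended.
print(detect("anne", attended=False))                        # True
# A word made highly probable by context also fires when unattended.
print(detect("table", attended=False, expected=("table",)))  # True
```

Note that the filter never blocks a message outright: every outcome above is decided by the thresholds operating on an attenuated signal, which is exactly what distinguishes Treisman's model from Broadbent's.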


Comparing Treisman's model and Deutsch and Deutsch's model

Let us move on to a more problematical task: comparing Treisman's model with that of Deutsch and Deutsch (1963). There are two key points to consider. Firstly, Deutsch and Deutsch's theory is based on the same empirical data as Treisman's theory, but reinterpreted from a different perspective. Secondly, Deutsch and Deutsch saw Treisman's model as flawed in that the dual filtering process is redundant: why have the lower level filter, if the same job can be done by suitably controlling the thresholds of the dictionary units? Let me briefly outline Deutsch and Deutsch's theory.


Deutsch and Deutsch: Almost every incoming message in a sensory register reaches "the same perceptual and discriminatory mechanisms whether attention is paid to it or not; and such information is then grouped or segregated by these mechanisms". Discriminatory (or classifying) mechanisms become excited by particular attributes of the incoming message depending on preset weightings of importance. The discriminatory mechanism with the highest weighting will transfer this weighting to the other classifying mechanisms with which it has been grouped or segregated. Since there will normally be activity in a number of classifying mechanisms, a "diffuse and non-specific system is necessary" which takes up a level, at any one time, corresponding to the level of the 'highest' discriminatory mechanism. This highest level sets a criterion against which all other levels are compared. Hence, only the discriminatory mechanism with the highest level activates the appropriate outputs (storage, motor response) and inhibits the outputs associated with the other discriminatory mechanisms. Further, the general level of arousal will alter access to output systems. Hence, for a low level of arousal (e.g. sleep), only very important messages will be able to alter storage/motor response.


At first sight, and accounting for major differences in terminology, this looks very similar to Treisman's theory. The 'discriminatory mechanisms' with their 'weightings of importance' look remarkably similar to Treisman's 'dictionary units' and 'thresholds'. One obvious difference is that Deutsch and Deutsch have no low level (physical characteristics) filter. Another difference is that all the 'discriminatory mechanisms' (including physical and semantic classifying structures) in Deutsch and Deutsch's theory are triggered, whereas Treisman has only those 'dictionary units' with signals above a threshold being triggered. However, since Deutsch and Deutsch have only the highest level being capable of further processing (output response), this amounts to the same idea as Treisman's, whose threshold settings are altered depending on which units have been triggered. So what is all the fuss about? Apart from the lack of a separate low level filter, both models have pattern recognition units/mechanisms that select one or two highly salient signals depending on bottom-up (physical) and top-down (contextual) features, which are then passed on to the appropriate response mechanisms.


The question then is: Why is Treisman's theory seen as typifying 'early' selection and Deutsch and Deutsch's as typifying 'late' selection? Support for the idea that the two theories are basically the same comes from Moray (1969, p. 35), who, in reviewing early and late selection theories, quotes a personal communication from J. A. Deutsch in 1968: "Deutsch does not agree that their model is a response selection model, but regards it as selecting incoming signals in the same sense in which Treisman's model does".


So why this misinterpretation in analysing the two theories? I believe the problem stems from the fact that the two theories come from different areas of psychology, and consequently are couched in different language and conceptualisations. Treisman is wholeheartedly a cognitive psychologist, using the tricks of the trade (dichotic listening, shadowing, reaction times, memory tests) to tease out the way different units in the mind process information. Deutsch and Deutsch, on the other hand, are from the biological school (and, I believe, have never touched a pair of dichotic listening headphones in their lives). To use an artificial intelligence analogy, one could say that Treisman favours a traditional serial processing approach, whereas Deutsch and Deutsch are from the connectionist school of parallel distributed processing and neural networks. The difference is highlighted when one reads their original papers. Whereas Treisman's is written in "information processing speak", Deutsch and Deutsch's language is in terms of organisms, behaviours and neurophysiology (in fact their original paper contains a substantial section on the neurophysiological evidence relevant to their ideas). It is an astonishing, and little known, fact that Deutsch and Deutsch's original (1963) paper on attention contains neither the word "consciousness" nor the word "conscious".


The conceptual rift between Treisman and Deutsch & Deutsch is clearly demonstrated in two papers published in 1967, in which they themselves appear to be talking different languages. In the first, "Selective attention: perception or response?", Treisman and Geffen seem to demonstrate, by means of a dichotic listening task with shadowing and finger-tapping instructions, that "the main limit is perceptual". In the second paper, "Comments on 'Selective attention: perception or response?'", Deutsch and Deutsch take a completely different reading of this experiment, opening with the statement "We cannot understand why Treisman and Geffen (1967) think their experiment argues against our theory". To add insult to injury, Deutsch and Deutsch go on to criticise Treisman's amendment of Broadbent's theory, arguing that the reduced signal-to-noise ratio of unattended messages, rather than reducing the load on the signal-recognition system, would actually increase it, since signal detection theory says that the noisier the signal, the more analysis is required to extract it.


No doubt Treisman, Deutsch and Deutsch are highly intelligent and knowledgeable people, and it is not my place to try to resolve their differences. Suffice to say (if I am not labouring the point) that it is their different conceptualisations and definitions of 'attention', 'awareness', 'perception' and 'consciousness' that, I believe, are causing the problems.


Problems for Bottleneck Theories

Let me end by briefly looking at two analyses that have a significant bearing on the bottleneck theories of attention. The first is a problem that a number of researchers believe is inherent in the major paradigm used to provide empirical evidence for selective attention theories: dichotic listening with shadowing. The usefulness of this technique, I believe, was twofold: it clearly demonstrated the direction of attention, and it required so much mental effort that none was available to allow even a brief switch of attention elsewhere. However, the technique is seen as being so mentally effortful for the experimental subject that it precludes studying the effect of attention under more real-life circumstances, where attention is rarely overloaded to such an extent and can therefore be switched and interwoven with the many parallel activities that can be performed without conscious attention (i.e. automatic skills). Another objection is that dichotic listening may still allow for switches of attention. For instance, Mowbray (1964) found that memory for 'unattended' events was associated with a temporary shift of attention away from the primary message. A major advance in experimental technique came around 1970 with inferential methodologies (e.g. the seminal work of Lewis, 1970). This led to research showing that, at least under conditions below information overload, a capacity model of attention was probably more applicable than bottleneck theories.


The second idea, which I feel is appropriate to end with, comes from work by William Johnston and his colleagues: e.g. Johnston & Heinz (1978); Johnston & Wilson (1980). This work appeared to demonstrate that there is no single bottleneck in the information processing system, but rather a series of filters, so that incoming stimuli can be processed either 'early' or 'late' depending on the situation: i.e. "the unattended message is processed according to task demands" (Underwood, 1993). This is, therefore, a flexible theory of selective attention.


Some of the strongest empirical evidence which, it is claimed, supports this hypothesis comes from Johnston and Wilson (1980). In this study, subjects were aurally presented with a series of simultaneous word pairs, one of which was occasionally an instance of a predetermined category (e.g. animals) that the subject had been instructed to 'listen out' for. These target words were always homonyms (e.g. bear), so that the simultaneously paired word could be appropriate (e.g. hibernate), inappropriate (e.g. naked) or neutral (e.g. luck). The first part of the experiment was carried out with binaural presentation (i.e. both words to both ears), with the finding that "detection of targets can be facilitated by appropriate nontargets and inhibited by inappropriate ones. Thus, non-targets can influence the way in which targets are semantically represented". In the second part, presentation was dichotic, with the subject precued as to the ear in which the target word would appear. Under these conditions, the effects found in the first part of the experiment did not occur. The overall conclusion was that "precueing appears to curtail the perceptual processing of nontargets. The data run counter to theories that claim that focused attention does not entail the perceptual suppression of nontargets".


One observation, however, makes me suspicious of these findings. A close examination of the actual words used in the experiment reveals that the appropriate nontarget words are almost invariably words that would normally be associated with the targets (e.g. hair-comb, blocks-wooden, duck-quacking), whereas the majority of inappropriate nontargets (23 out of 36 words) are synonyms of the inappropriate categorisation (e.g. hair-rabbit, blocks-barricades, duck-dodge). It seems to me that, rather than there being facilitation by semantic processing of the nontarget word in the binaural case, there was a simple association (a precategorical perceptual grouping) between the target and nontarget. For example, hearing the simultaneous presentation of the words blocks and wooden would trigger a 'dictionary unit' that 'looks out' for "wooden blocks", as if this were a single word. I believe this seriously questions the validity of the results.



To sum up: a comparison of the 'early' selection theories of Broadbent and Treisman is relatively straightforward, since they can be seen as being in the same evolutionary line of descent. Comparing the 'early' selection theory of Treisman with the 'late' selection theory of Deutsch and Deutsch is far more problematical, and, I believe, is severely confused by the simplistic labels of 'early' and 'late'. I believe the best way to conceive of them is as almost the same structures/mechanisms, but viewed from different conceptual frameworks in which key terms such as attention, awareness, perception and consciousness have different uses and meanings. In addition, there is reason to believe that the empirical evidence used to justify bottleneck theories is flawed and of dubious ecological validity. Finally, attempts to resolve the 'early' versus 'late' debate with a flexible filter theory also seem to be built on doubtful empirical evidence.




References

Best, J. B. (1992). Cognitive Psychology (3rd edition). St Paul, MN: West Publishing Co.


Dixon, N. (1981). Preconscious Processing. Chichester: Wiley.


Moray, N. (1969). Attention: Selective Processes in Vision and Hearing. London: Hutchinson.


Underwood, G. (Ed.) (1993). The Psychology of Attention. Vol. 1. Aldershot: Elgar.


Underwood, G. (1976). Attention and Memory. Oxford: Pergamon Press.