Next: 3.5 Summary Up: 3 Psychological and Physiological Previous: 3.3 Eye Movements

3.4 The Connection Between Eye-Gaze Pattern and Interest

Having treated the "low-level processes," we now turn to a higher level of description of the eye-gaze pattern, and describe the possible connections between the eye-gaze pattern and the person's cognitive state; can we somehow infer the person's current object of interest from the gaze pattern?

Kahneman (1973) classifies eye movements into three general types of looking, distinguished by the situations in which they occur:

Spontaneous looking
occurs when the subject views a scene without any specific task in mind, i.e. when she is "just watching" the scene. During spontaneous looking, the gaze pattern is "controlled by collative features of stimuli, such as novelty, complexity, incongruity" (Kahneman 1973, p. 65). Interestingly, it is not in general the physical qualities of the parts of the visual scene (brightness, number of details, etc.) that determine where the person looks. Rather, the eyes tend to be attracted by those parts of the scene that contain the most information for perceiving it; not even physical contours are given much attention, unless they convey important information for the recognition of the scene (Yarbus 1967, p. 190). Spontaneous looking is also guided by stored knowledge; when viewing faces, the eyes, lips and nose are given the most attention, but this is because the observer knows that
[t]he human eyes and lips ... are the most mobile and expressive elements of the face. The eyes and lips can tell an observer the mood of a person and his attitude towards the observer, the steps he may take next moment, and so on. It is therefore absolutely natural and understandable that the eyes and lips attract the attention more than any other part of the human face. (Yarbus 1967, p. 191)
We might note, though, that the high degree of attraction can also be the result of a bottom-up process, simply because the eyes and lips are mobile: movement generally attracts attention, so these facial parts, which are more or less constantly in motion, would automatically attract attention according to a bottom-up explanation. Most probably, though, it is a combination of top-down and bottom-up processes.

An observer might look at elements of the scene that do not convey important information, but this is interpreted as a result of the observer thinking she might find some important information there (ibid.).

Figure 7: Seven records of eye movements by the same subject. Each record lasted 3 minutes. 1) Free examination. Before subsequent recordings, the subject was asked to: 2) estimate the material circumstances of the family; 3) give the ages of the people; 4) surmise what the family had been doing before the arrival of the "unexpected visitor;" 5) remember the clothes worn by the people; 6) remember the position of the people and objects in the room; 7) estimate how long the "unexpected visitor" had been away from the family (from Yarbus 1967).

Task-relevant looking
is performed when the observer views the scene with a particular question or task in mind. Thus, she is seeking information of a special kind, and the eye-gaze pattern is affected accordingly. In his study, Yarbus (1967, pp. 174, 192) instructed a subject to answer seven different questions concerning the situation depicted in Repin's picture "An Unexpected Visitor." This resulted in seven substantially different patterns, each of which can once again easily be construed as a sampling of those picture objects that were most informative for answering the question (see figure 7).
Orientation of thought looking
occurs when the observer is not paying much attention to where she is looking, but is attending to some "inner thought." For example, Stern (1993) reports that some subjects, when asked to spell "MOTHER" backwards, move their eyes from right to left, as if they visualised the word "MOTHER" and simply read off the letters R-E-H-T-O-M.

To this list we can add a new type of looking that we expect will become more prevalent as the use of eye-gaze media becomes more widespread:
Intentional manipulatory looking
is the observer's act of directing the eyes to a specific part of the scene or in a specific way, with the intention of manipulating something in the scene. The observer then becomes more than a "passive" observer: the eyes are now used not only for gathering input from the surroundings, but also for producing output to the surroundings.

Eye-gaze media in which the user knows that looking at specific parts of the display initiates different actions constitute the obvious application of intentional manipulatory looking, and several such systems have already been constructed (Jacob 1991, Jacob 1995, Hansen et al. 1995, Smyth et al. 1994, Can 1992, see also section 4). When users first use such systems, they are conscious of their manipulatory looking: they utilize knowledge-based processing (cf. Rasmussen 1983) for manipulating the displayed objects with their eyes. It is not yet known to what extent this intentional manipulatory looking can be internalised to a rule-based or skill-based level. We expect research and experience on this topic to show that, unless the interfaces of these systems are built in a directly illogical or counter-intuitive way, training will be able to bring it down to at least a rule-based level of processing; people will find that they do not have to think about it when they manipulate displayed objects with their eyes, just as the mouse is thought of as a natural pointing device today.

One must bear in mind that intentional manipulatory looking is not a novel way of looking, although the manipulative power that can be exerted through the aforementioned systems is probably not paralleled by any "natural" instances of looking. Several natural examples of intentional manipulatory looking do exist, however: some parents can simply look at their children to make them do what they are told, and looking demonstratively at one's wristwatch during a meeting is a generally accepted sign that one is eager to get the meeting over with. An example which clearly demonstrates that we can use our eyes as a pointing device arises when one's hands are occupied with lifting, say, heavy furniture; if one wants a friend to move an object out of the way, one can look at said object while saying "could you please move that..." and then point one's eyes at a different location while saying "...over there!" This is certainly a way of intentionally manipulating the surrounding world with the eyes!

Most research literature concludes that humans generally are interested in what they are looking at, at least during spontaneous or task-relevant looking (e.g. Ware & Mikaelian 1987, Barber & Legge 1976, Bolt 1984). One line of evidence for this claim comes from an investigation that Barber & Legge (1976, p. 60) report Mackworth & Morandi (1967) as having carried out. In this study, one group of subjects was asked to assess which parts of a set of pictures were the most informative, while the eye-gaze of another group of subjects viewing the same pictures was tracked. Barber & Legge (1976) conclude that "[t]here was good agreement between what was considered informative and what was looked at most often." It is hardly surprising that what we are looking at is what we are interested in; for centuries merchants have used the technique of observing where the customer's eyes were looking to discover what item she was interested in, and every day the parents of babies follow their babies' line of gaze to try to deduce what the baby could be interested in. This is also the case the other way around: the baby is born with the ability to recognise the two round spots in a face that constitute the eyes of a potential parent, and when it perceives eye-contact with the parent, it feels more secure, since the parent's direction of gaze is a sign of interest. The direction of gaze is also used by skilled road-users when they try to determine another road-user's intent; if the driver is looking the other way while approaching a crossing from a side-road, it might not be safe to pass her.

One must be wary, though, of concluding that whatever an observer is looking at is always what she notices. In findings that Barber & Legge (1976, p. 61) attribute to Snyder (1973), test pilots who were asked to report seeing target objects while viewing a film of a low-level flight would occasionally fail to notice said objects, even though their line of gaze indicated that they were in fact looking at them. In another study, which Barber & Legge (1976, p. 61) attribute to Kaufman & Richards (1969), subjects' indications of what they thought they were looking at were more "spread out" than their actual fixations. From these two studies we can conclude that "what a man looks at is not necessarily an accurate indicator of what he is attending to" (Barber & Legge 1976, p. 61, our italics).

This fact can be better understood if one considers a model of the mental processes in which the main processing is carried out in what Baddeley (1981) calls working memory, which uses a visuo-spatial scratch pad: a kind of "desktop" where visual information can be temporarily stored. Just & Carpenter (1980) use this model as a basis for developing a theory of how the process of text reading works; to make a link between eye fixations and their theory of reading, they first make two assumptions:

An immediacy assumption
which states that the object fixated by the eyes is immediately processed at several levels, perhaps with guesses being made as to how it fits into the overall picture. "The immediacy assumption posits that the interpretation at all levels of processing are not deferred; they occur as soon as possible" (Just & Carpenter 1980, p. 330).
An eye-mind assumption
which states that the eye is coupled with the mind in such a way that it fixates on an object as long as it is processing it. "The eye-mind assumption posits that there is no appreciable lag between what is being fixated and what is being processed" (ibid., p. 331).

The last assumption, the eye-mind assumption, is especially interesting, because it reveals a connection between eye-gaze data and what the mind is processing, i.e., what the person currently is interested in. But it is important to specify under which conditions this assumption can be used; "one of the most important conditions is that the [person's current] task require that information from the visual environment be encoded and processed. If the visual display is not relevant, there are no mapping rules between what is being fixated and what is being internally processed" (Just & Carpenter 1976, p. 475). In other words, if we know that the person is performing processing that requires information from the environment, then we can assume that the eye-gaze data is well correlated with the items currently being processed. They do note, however, that
While the duration of the gaze is closely related to the duration of cognitive processes, the two durations are not necessarily identical... At best, the gaze duration may provide a rough estimate of the absolute duration of a stage of processing, or at least it provides an upper bound on the estimate. In any case, the difference between gaze durations in different conditions may provide a good estimate of the duration of the cognitive processes by which they differ. (Just & Carpenter 1976, p. 474)
Perhaps one would expect that another condition for justifying the use of the eye-mind assumption would be that the information in the visual display that is required actually is present in the display. This is not so. Several studies (e.g. Just & Carpenter 1976) have shown that subjects will fixate where the required information used to be, even though it is no longer there!
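
The subtractive use of gaze durations that Just & Carpenter (1976) describe, where the difference between conditions estimates the duration of the processes by which the conditions differ, can be sketched in a few lines. This is only an illustration under stated assumptions and not their procedure: the function name and the sample durations are hypothetical, and the comparison is meaningful only when the task requires information from the display.

```python
from statistics import mean

def extra_processing_ms(baseline_ms, condition_ms):
    """Estimate the duration of the additional cognitive processing in a
    condition as the difference between mean gaze durations (in ms)."""
    return mean(condition_ms) - mean(baseline_ms)

# Hypothetical gaze durations (ms) on the same display region,
# recorded under a baseline task and under a more demanding task.
baseline = [400, 420, 380]
demanding = [520, 500, 540]

extra = extra_processing_ms(baseline, demanding)
print(f"Estimated extra processing time: {extra} ms")
```

Each mean gaze duration on its own is at best an upper bound on the duration of a processing stage; the difference is the quantity that the quotation above suggests may be a good estimate.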

Figure 8: A schematic diagram of the major processes and structures in reading comprehension (from Just & Carpenter).

A schematic diagram of the theory of reading proposed by Just & Carpenter is shown in figure 8. Rayner & McConkie (1975) conducted a study where they tried to determine what controls the eye movements that are associated with reading. In this study it was found that, among other things, the lengths of saccades are unrelated to the duration of fixations (thus excluding purely bottom-up control processes) and that the location of the fixations is controlled in some non-random way and is related to aspects of the encountered information. They suggest that the connection between the working memory and the fixation selection is governed by a process-monitoring control model, where a unit separate from the visual processing (reading) mechanism monitors aspects of this mechanism, and produces eye movements on the basis of these aspects. This shows that the exact connection between eye-gaze pattern and processing in working memory, and thus a person's current objects of interest, definitely is non-trivial.

Of course the theory does not a priori generalise to viewing of any kind of visual information; as Rayner & McConkie (1975) write,

it should be pointed out that in visual search and in picture perception the useful information may be of a more directly visual type than is the case in reading...It is quite possible that the eye could be guided largely on the basis of the visual pattern in examining pictures, but not in the case of reading. (Rayner & McConkie 1975, p. 830)
We do believe, however, that perception of much computer graphics, especially graphics that depict underlying data structures, is to some extent comparable with the process of reading; the subject must still try to make more sense of the displayed objects than what is immediately perceived.

To conclude, if we can assume that a subject needs the information in the visual display, it is reasonable to assume that her direction of sight is a good indicator of what she is interested in, at least during spontaneous and task-relevant looking, which, apart from intentional manipulatory looking, are the types of looking we are mainly interested in. Yarbus' (1967) experiment, in which a subject looked at "An Unexpected Visitor" after being asked to determine some specific aspect of the depicted scene (see figure 7), resulted in eye movement records that depended on what the subject was trying to determine. It seems reasonable to conclude that the fixation patterns differed because the subject was interested in different parts of the picture, according to what she had to determine. Yarbus himself also concludes that

Eye movements reflect the human thought processes; so the observer's thought may be followed to some extent from records of eye movements (the thought accompanying the examination of the particular object). It is easy to determine from these records which elements attract the observer's eye (and, consequently, his thought), in what order, and how often. (Yarbus 1967, p. 190)
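
One way to read "which elements attract the observer's eye" and "how often" off a fixation record is to tally fixations per region of the display. The sketch below is illustrative only and is not Yarbus' method: the rectangular regions, the (x, y, duration) fixation format, the name region_tallies and the sample data are all assumptions made for the example.

```python
def region_tallies(fixations, regions):
    """Count fixations and sum their durations (ms) per named region.

    fixations: iterable of (x, y, duration_ms) tuples
    regions:   dict mapping a name to a (left, top, right, bottom) rectangle
    Returns a dict mapping each name to (fixation_count, total_duration_ms).
    Fixations that fall outside every region are ignored.
    """
    tallies = {name: [0, 0] for name in regions}
    for x, y, duration_ms in fixations:
        for name, (left, top, right, bottom) in regions.items():
            if left <= x < right and top <= y < bottom:
                tallies[name][0] += 1
                tallies[name][1] += duration_ms
                break  # regions are assumed not to overlap
    return {name: tuple(values) for name, values in tallies.items()}

# Hypothetical regions and fixation record.
regions = {"face": (0, 0, 100, 100), "door": (100, 0, 200, 100)}
fixations = [(10, 10, 300), (150, 50, 200), (50, 50, 250), (500, 500, 400)]

tallies = region_tallies(fixations, regions)
# Candidate object of interest: the region with the longest total dwell time.
most_viewed = max(tallies, key=lambda name: tallies[name][1])
print(tallies, most_viewed)
```

Treating the most-viewed region as the object of interest is justified only under the condition stated above, namely that the subject needs the information in the display.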

Another interesting finding in Yarbus' study was that the eye movements occur in cycles, i.e., the important parts (according to the observer) of the entire scene are first scanned, and then re-scanned, instead of using excess observation time to fixate on the less important parts. Norton and Stark (cited by Barber & Legge 1976, p. 62) did some research on this, and described this repeated pattern as a "scanpath." Their research confirmed Yarbus' findings, and they found that "[t]he scanpath emerges in the course of initially viewing a figure and [occupies] about 30% of the viewing period" (Barber & Legge 1976, p. 62). In other words, the observer quickly "decides" what the important parts are and then spends time on re-scanning these parts; Yarbus (1967, p. 194) reports that "the duration of a cycle during which the observer's eye can cover the whole picture amounts sometimes to several seconds, sometimes to several tens of seconds."

It must be noted that the scanpath is determined by the composition and the individual observer (Barber & Legge 1976, p. 62). The first factor is hardly surprising; several design rules for good user interfaces are based on a deliberate planning of the order in which the user is supposed to view the displayed items. Moreover, for centuries artists have known that they could to some extent control how the viewers would view their paintings by exploiting composition. Hansen & Støvring (1988) have also done an experiment in which the artist Michael Støvring was asked to explain explicitly where he had intended viewers of his art to look. Subsequent eye tracker recordings of ten subjects, who viewed the art for 30 seconds to try to see what it represented, agreed considerably with the artist's intended scanpath.

It might not come as a surprise, either, that the scanpaths are idiosyncratic, since our perception of the scene is also influenced by our stored knowledge of the viewed objects etc. Yet it is interesting to note that in Norton and Stark's experiment, the observers would adopt their own previous scanpath of a specific picture on 65% of the occasions where a picture was shown to them a second time, some time after they had first viewed it.
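
Whether an observer re-adopts an earlier scanpath can be quantified once each fixation is mapped to a labelled region, so that a viewing becomes a string of region labels. The sketch below is not Norton and Stark's analysis, which is not described here; it substitutes the Levenshtein edit distance as a similarity measure, and the function names, region labels and sample strings are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-label insertions,
    deletions and substitutions needed to turn sequence a into sequence b."""
    previous = list(range(len(b) + 1))
    for i, label_a in enumerate(a, start=1):
        current = [i]
        for j, label_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # delete label_a
                current[j - 1] + 1,                     # insert label_b
                previous[j - 1] + (label_a != label_b)  # substitute or match
            ))
        previous = current
    return previous[-1]

def scanpath_similarity(a, b):
    """Similarity in [0, 1]; 1.0 means the two label sequences are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest

# Hypothetical scanpaths over regions A-D from a first and a second viewing.
first_viewing = "ABCA"
second_viewing = "ABDA"
print(scanpath_similarity(first_viewing, second_viewing))
```

A threshold on this similarity could then decide whether a second viewing counts as a repetition of the first; any such threshold would be a further assumption.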

The spatio-temporal layout of the eye-gaze pattern can tell us more, though. Some research conducted by Gould (1967) (cf. Barber & Legge 1976, p. 58), where subjects were to report how many times a standard pattern occurred among a set of comparison patterns, showed that not only was the reaction time influenced by the pattern similarity, but the durations of the fixations were also affected; fixation durations were longer (340 ms as opposed to 280 ms) for highly similar target and comparison patterns. It is thus suggested that the fixation duration indicates the time taken to register and process the stimulus, i.e., that the length of our fixations reveals how difficult it is for the observer to perceive the fixated items. To add to this, Kahneman also states that "the rate of eye movements often corresponds to the rate of mental activity" (Kahneman 1973, p. 65, our italics).


Authors: Arne John Glenstrup and Theo Engell-Nielsen