6.1 Improved Eye Tracking Techniques

Though the eye-gaze tracking techniques available today are far more sophisticated than those of 20 years ago, they are still far from perfect. Users still struggle with severe problems: trackers that are over-sensitive to head movements, and equipment that loses its calibration far too soon. In this section we suggest some areas of eye-gaze tracking that we believe must be improved to make eye-gaze tracking practical for use by the general community.

6.1.1 Tricks of the Trade Today

Today, several techniques and tricks are exploited to improve the overall performance of interfaces that are based on tracking various aspects of the user:

Local user-initiated re-calibration.
Jacob (1993) found that the tracking precision of the corneal/pupil reflection eye-tracker he was using was nonuniform across the screen, i.e. it was less precise at certain locations. He solved this by letting the user re-calibrate locally: the user moves the mouse pointer to the imprecise area, stares at the pointer and clicks the mouse button, which causes all subsequent gazes recorded in the vicinity of that point to be taken as gazes at the actual point.
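The following Python sketch illustrates how such local corrections might be kept. It assumes, as our own simplification, that each user-initiated re-calibration is stored as an offset (pointer position minus measured gaze) that applies to later gazes measured within a fixed radius of where it was recorded. The class name `LocalRecalibrator` and the 50-pixel radius are hypothetical.

```python
import math


class LocalRecalibrator:
    """Hypothetical sketch of user-initiated local re-calibration.

    Each correction is an offset (pointer - measured gaze) that is applied
    to later gazes measured within `radius` pixels of the spot where the
    correction was recorded.
    """

    def __init__(self, radius=50.0):
        self.radius = radius      # assumed size of the "vicinity", in pixels
        self.corrections = []     # list of (measured_x, measured_y, dx, dy)

    def recalibrate(self, measured, pointer):
        # Called when the user stares at the mouse pointer and clicks.
        mx, my = measured
        px, py = pointer
        self.corrections.append((mx, my, px - mx, py - my))

    def correct(self, gaze):
        # Apply the nearest correction whose vicinity contains the gaze.
        gx, gy = gaze
        best = None
        for mx, my, dx, dy in self.corrections:
            d = math.hypot(gx - mx, gy - my)
            if d <= self.radius and (best is None or d < best[0]):
                best = (d, dx, dy)
        if best is None:
            return gaze           # no local correction applies here
        return (gx + best[1], gy + best[2])


cal = LocalRecalibrator(radius=50)
cal.recalibrate(measured=(410, 295), pointer=(400, 300))
print(cal.correct((412, 290)))    # near the calibrated spot: shifted
print(cal.correct((100, 100)))    # far away: unchanged
```

Storing an offset rather than snapping every nearby gaze to the exact pointer position lets a single correction serve a whole region, while gazes elsewhere on the screen are left untouched.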

Local automatic re-calibration.
A slightly different kind of re-calibration is found in the EyeCatcher (Hansen et al. 1995). The EyeCons have a well-defined border, but they can also be activated when an eye-gaze is detected slightly outside it. This is based on the assumption that the eye depicted on an EyeCon, with its small black pupil, is so attractive an object that a user would not normally look at a point slightly outside the border, but rather straight at the pupil. When an eye-gaze is detected, off-centre or not, the system performs an automatic re-calibration based on the current tracking data and the position of the EyeCon, assuming that the user is looking directly at its pupil. As this is completely automatic, the user does not notice it, save perhaps for the feeling that the eye-gaze tracking remains accurate even after prolonged use, when eye-gaze systems not using this technique would have become quite inaccurate.
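A minimal Python sketch of this idea follows. It assumes square EyeCons, a fixed tolerance margin outside the border, and a single global offset that is re-estimated each time an EyeCon is activated. The names `EyeCon`, `AutoCalibratingTracker` and `margin` are hypothetical and not taken from the EyeCatcher itself.

```python
from dataclasses import dataclass


@dataclass
class EyeCon:
    name: str
    cx: float          # centre of the EyeCon, i.e. the drawn pupil
    cy: float
    half_size: float   # half the width of its square border


class AutoCalibratingTracker:
    """Hypothetical sketch of automatic local re-calibration.

    A gaze inside an EyeCon, or up to `margin` pixels outside its border,
    activates it; the offset is then re-estimated on the assumption that
    the user was really looking straight at the EyeCon's pupil.
    """

    def __init__(self, eyecons, margin=20.0):
        self.eyecons = eyecons
        self.margin = margin        # assumed tolerance outside the border
        self.offset = (0.0, 0.0)    # current correction added to raw data

    def on_gaze(self, raw):
        x = raw[0] + self.offset[0]
        y = raw[1] + self.offset[1]
        for icon in self.eyecons:
            reach = icon.half_size + self.margin
            if abs(x - icon.cx) <= reach and abs(y - icon.cy) <= reach:
                # Re-calibrate: map this raw sample onto the pupil.
                self.offset = (icon.cx - raw[0], icon.cy - raw[1])
                return icon.name
        return None                 # gaze not on any EyeCon


tracker = AutoCalibratingTracker([EyeCon("open", 200, 200, 30)], margin=20)
print(tracker.on_gaze((235, 210)))  # just outside the border, still activates
print(tracker.offset)               # correction now applied to later samples
```

Because the correction is refreshed at every activation, drift is absorbed gradually during normal use, which matches the behaviour described above.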

Reassignment of off-target fixations.
Jacob (1993) has also used a similar technique, although without the re-calibration, which seems reasonable given that his objects do not have as well-defined a centre as an EyeCon. Fixations slightly off target are accepted if they are "'reasonably' close to one object and 'reasonably' further from all other such objects (i.e. not halfway between two objects, which would lead to unstable behaviour)" (ibid., p. 172).
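One way to make "reasonably close" and "reasonably further" concrete is sketched below in Python. The distance threshold `near` and the ratio test `ratio` are our own assumptions, not Jacob's published values. A fixation is given to the nearest object only if that object is within `near` pixels and every other object is at least `ratio` times as far away.

```python
import math


def assign_fixation(fix, objects, near=40.0, ratio=2.0):
    """Return the name of the object a fixation is attributed to, or None.

    Hypothetical thresholds: the nearest object must lie within `near`
    pixels, and the runner-up must be at least `ratio` times as far away,
    so a fixation halfway between two objects is rejected.
    """
    dists = sorted(
        (math.hypot(fix[0] - x, fix[1] - y), name)
        for name, (x, y) in objects.items()
    )
    if not dists or dists[0][0] > near:
        return None                      # not close to anything
    if len(dists) > 1 and dists[1][0] < ratio * dists[0][0]:
        return None                      # ambiguous: too close to two objects
    return dists[0][1]


buttons = {"save": (100, 100), "quit": (160, 100)}
print(assign_fixation((110, 105), buttons))  # slightly off target: "save"
print(assign_fixation((130, 100), buttons))  # halfway between: None
```

Rejecting the halfway case, rather than picking whichever object is marginally closer, is what prevents the selection from flickering between two neighbours.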

Tracking data tokenisation.
Raw eye-tracking data does not describe the smooth motion we believe our eyes follow. Partly this is because we are usually unaware of our exact, jittery eye-movements, but the raw data also often contains entirely wrong coordinates, usually because the tracker missed a video frame or the user blinked. Jacob (1993) addressed this problem by expecting a series of fixations separated by fast saccades and trying to fit the raw data into this "mould". Momentary spikes in the raw data are interpreted not as saccades but as faulty measurements; the algorithm waits for 100 ms intervals before it reports the mean gaze position of such an interval as a fixation. The result is a string of tokens rather than raw eye-tracker measurements, and these tokens describe fixations closer to what the user thinks she is fixating than the raw data does.
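The Python sketch below illustrates the "fixations separated by saccades" mould. It is a simplified dispersion-style tokeniser of our own and does not reproduce Jacob's algorithm. It assumes samples arrive as `(t_ms, x, y)` with `x = y = None` when the tracker loses the eye, and it uses hypothetical parameters: a fixation must last at least `min_ms`, stay within `radius` pixels of its running mean, and survive up to `max_spike` consecutive outlying samples, which are discarded as faulty measurements.

```python
import math


def tokenise(samples, min_ms=100, radius=20.0, max_spike=2):
    """Turn raw (t_ms, x, y) samples into fixation tokens (start, end, x, y)."""
    tokens, group, pending = [], [], []

    def centre(pts):
        return (sum(p[1] for p in pts) / len(pts),
                sum(p[2] for p in pts) / len(pts))

    def close_group():
        # Report the group as a fixation only if it lasted long enough.
        if group and group[-1][0] - group[0][0] >= min_ms:
            cx, cy = centre(group)
            tokens.append((group[0][0], group[-1][0], cx, cy))

    for t, x, y in samples:
        if group and x is not None:
            cx, cy = centre(group)
            if math.hypot(x - cx, y - cy) <= radius:
                group.append((t, x, y))
                pending = []            # the spike is over; discard it
                continue
        if not group and x is not None:
            group = [(t, x, y)]         # start a candidate fixation
            continue
        pending.append((t, x, y))       # spike, blink or start of a saccade
        if len(pending) > max_spike:    # too long to be a spike: a saccade
            close_group()
            valid = [p for p in pending if p[1] is not None]
            group = valid[-1:]          # restart from the newest valid sample
            pending = []
    close_group()
    return tokens


# Assumed 50 Hz tracker: one sample every 20 ms.
raw = [(t, 100, 100) for t in range(0, 140, 20)]
raw += [(140, 400, 400)]                             # one-sample spike
raw += [(t, 100, 100) for t in range(160, 220, 20)]
raw += [(t, 300, 300) for t in range(220, 300, 20)]  # saccade to new target
raw += [(300, None, None)]                           # blink
raw += [(t, 300, 300) for t in range(320, 380, 20)]
print(tokenise(raw))
```

The spike at 140 ms and the blink at 300 ms do not split the fixations, whereas the run of samples at the new location does, yielding two tokens instead of twenty raw samples.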

Selection prediction using Markov-chains.
Eye-gaze selection (or zooming in) of objects can be thought of as navigating through a menu structure. Because of the slight jitter of eye-movements, selectable objects must not be too small, or the user will be unable to fixate the right object. Thus, only relatively few objects can be selected (or zoomed into) at each level, making the resulting menu structure fairly deep. Selecting one's way down a deep menu can be a rather monotonous affair, and to alleviate this the Erica user interface for handicapped people (described in section 4.3.2) employs a selection prediction algorithm that predicts the user's most likely next choice from the two preceding choices using a second-order Markov chain (Hutchinson et al. 1989, Frey et al. 1990). This prediction is then used to compose the menu to be displayed, resulting in a dynamic menu layout system. Selection time for eye typewriting is reported to be reduced by 25-30% using dynamic menus (ibid.), but one must carefully weigh the pros and cons of changing the menu structure: Christensen et al. (1993) reported an experiment in which performance was best when the structure of the experimental menu was static, because users tend to remember the selection sequence spatially.
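A second-order Markov predictor of this kind can be sketched in a few lines of Python. The sketch counts how often each choice followed a given pair of preceding choices and ranks the options of the next menu by those counts. The class and method names are hypothetical; Erica's actual implementation is not described here.

```python
from collections import Counter, defaultdict


class SecondOrderPredictor:
    """Sketch of a second-order Markov model over menu selections."""

    def __init__(self):
        # (previous-but-one choice, previous choice) -> counts of next choice
        self.counts = defaultdict(Counter)

    def train(self, sequence):
        # Record every observed triple of consecutive choices.
        for a, b, c in zip(sequence, sequence[1:], sequence[2:]):
            self.counts[(a, b)][c] += 1

    def rank(self, prev2, prev1, options):
        """Order `options` from most to least likely next choice."""
        seen = self.counts[(prev2, prev1)]
        # sorted() is stable, so unseen options keep their original order.
        return sorted(options, key=lambda o: -seen[o])


p = SecondOrderPredictor()
p.train("thethenthat")          # letters typed so far, as a training corpus
print(p.rank("t", "h", list("aeiou")))
```

After "t", "h" the model has seen "e" twice and "a" once, so a dynamic menu would place those letters first. Given the result of Christensen et al., the same ranking could just as well be used without reordering the menu, for example to decide which items to make larger.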

Wide-angle for locating, tele-lens for tracking.
The problem of overly heavy restrictions on head movement during eye-tracking has been addressed by Applied Science Laboratories, which has made an 'extended head tracking unit' (see figure 4). This system operates two cameras simultaneously: one with a tele-lens for the actual eye-tracking, and one with a wide-angle lens that constantly locates the user's eye and adjusts to its position (Bolt 1984). Hunke & Waibel (1994) have developed a face locating and tracking unit that starts by detecting all the faces in the field of view of a wide-angle camera and then selects the closest one. This face is then tracked continuously using face-colour detection (which works under different lighting conditions and for different skin colours) and movement detection, combined with, among other things, an artificial neural network that detects faces by their shape. This concept of combining general tracking (of the face position) with specific tracking (of the eye) seems very promising, because it allows for unobtrusive and fairly movement-insensitive eye-gaze tracking.
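The hand-over from wide-angle locating to tele-lens tracking can be illustrated with a small Python sketch. Face detection itself is out of scope: the sketch assumes a detector has already returned bounding boxes. It also assumes that the largest face in the wide-angle image is the closest one, that both cameras share an optical centre, that the eyes lie about 40% of the way down the face box, and that pan/tilt angles scale linearly with pixel offset. All of these, and the function names, are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in wide-angle pixels


def select_closest_face(faces: List[Box]) -> Optional[Box]:
    # Assumption: the largest face in the wide-angle image is the closest.
    return max(faces, key=lambda b: b[2] * b[3], default=None)


def tele_target(face: Box, image_size: Tuple[int, int],
                fov_deg: Tuple[float, float]) -> Tuple[float, float]:
    """Pan/tilt (degrees) that point the tele-lens camera at the eye region."""
    x, y, w, h = face
    eye_x, eye_y = x + w / 2, y + 0.4 * h       # assumed eye position in box
    width, height = image_size
    pan = (eye_x / width - 0.5) * fov_deg[0]    # positive = to the right
    tilt = (0.5 - eye_y / height) * fov_deg[1]  # positive = upwards
    return pan, tilt


detected = [(100, 80, 60, 80), (300, 100, 120, 160)]
face = select_closest_face(detected)
print(face, tele_target(face, (640, 480), (60.0, 45.0)))
```

In a real system the wide-angle stage would run continuously and re-aim the tele-lens whenever the head moves, which is what frees the user from a fixed head position.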

Combining tracking from several modalities.
Generally, it is quite advantageous to combine tracking data from multiple modalities, as the different modalities often help disambiguate each other's data. Bub et al. (1995) have shown that combining a visual (face) tracking system with a speech recognition system that is able to "listen" in specific directions greatly improves speech recognition in noisy environments. Duchnowski et al. (1995) use a face locator to obtain a stable image of the user's face; this image is fed to a lip-reading unit that is combined with an acoustic speech recognition unit, resulting in a 20-50% reduction in the recognition error rate.
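How two modalities can disambiguate each other is easiest to see in a late-fusion sketch like the Python one below. This is a generic weighted combination of log-probabilities that we chose for illustration, not the method used by Bub et al. or Duchnowski et al. The probability tables, the `weight` parameter and the floor value are all assumptions.

```python
import math


def fuse(acoustic, visual, weight=0.7):
    """Pick the word best supported by both recognisers.

    Hypothetical late fusion: `acoustic` and `visual` map candidate words
    to probabilities; the combined score is a weighted sum of
    log-probabilities, where `weight` is the trust placed in the audio.
    """
    floor = 1e-6  # assumed floor so a word one recogniser missed is not ruled out

    def score(word):
        a = acoustic.get(word, floor)
        v = visual.get(word, floor)
        return weight * math.log(a) + (1 - weight) * math.log(v)

    return max(set(acoustic) | set(visual), key=score)


# In noise the audio barely separates "bin" from "din", but the lips close
# for /b/ and not for /d/, so the visual channel can tip the balance.
audio = {"din": 0.55, "bin": 0.45}
lips = {"bin": 0.8, "din": 0.2}
print(fuse(audio, lips))              # combined decision
print(fuse(audio, lips, weight=1.0))  # audio alone
```

In practice `weight` would be lowered as the acoustic noise level rises, so that the visual channel counts for more exactly when the audio is least reliable.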

Multi-resolution screens for speedy display response.
A slightly different problem is addressed by multi-resolution screens. If the bandwidth between the place where image data is retrieved and the place where it is displayed is limited, one can transmit only the necessary image information by detecting the viewer's point of regard. As the acuity of peripheral vision is low (cf. section 3.1), the image need not be displayed at high resolution in areas where the viewer is not looking. In this technique, the viewer's point of regard is constantly transmitted to the image retrieval store, and the resolution distribution of the transmitted image is dynamically altered accordingly, so that the viewer gets the impression of looking at a uniformly high-resolution image (Bolt 1984).
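The resolution distribution can be pictured as a per-tile downsampling level that grows with distance from the point of regard. The Python sketch below computes such a map. The tile size, the full-resolution radius `fovea`, the distance `step` per halving and the maximum level are hypothetical values, not figures from Bolt (1984).

```python
import math


def tile_levels(gaze, cols, rows, tile=64, fovea=100.0, step=150.0, max_level=3):
    """Downsampling level for each screen tile (0 = full resolution).

    Assumed parameters: tiles whose centre lies within `fovea` pixels of
    the point of regard are sent at full resolution; every further `step`
    pixels halves the resolution once more, up to `max_level` halvings.
    """
    gx, gy = gaze
    levels = []
    for r in range(rows):
        row = []
        for c in range(cols):
            cx, cy = (c + 0.5) * tile, (r + 0.5) * tile  # tile centre
            d = math.hypot(cx - gx, cy - gy)
            if d <= fovea:
                level = 0
            else:
                level = min(max_level, 1 + int((d - fovea) // step))
            row.append(level)
        levels.append(row)
    return levels


print(tile_levels(gaze=(32, 32), cols=6, rows=1))
```

If each level halves both the width and the height of a tile, a tile at level L carries only 1/4^L of the full-resolution pixels, so most of the bandwidth goes to the few tiles around the point of regard. The map is recomputed each time a new point of regard arrives.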

6.1.2 Expected Improvements in the Future

If eye-gaze media are to have any noticeable impact on user interfaces, however, some technical problems have to be addressed. Jacob (1995) notes that "[p]erformance does not appear to be constrained by fundamental limits, but simply by lack of effort in this area, due to its narrow market." An important technical factor is how discreet the equipment is; we find that the remotely operated corneal reflection/pupil centre technique of eye-tracking is already sufficiently discreet, and that the most important problems for future development are:


Authors: Arne John Glenstrup and Theo Engell-Nielsen