Though the eye-gaze tracking techniques available today are far more
sophisticated than those of 20 years ago, they are still far from perfect. Users
still struggle with trackers that are over-sensitive to head movement and
with equipment that loses its calibration far too soon. In this
section we suggest some areas of eye-gaze tracking that we find must be
improved to make eye-gaze tracking practical for use by the
general community.
Today, several techniques and tricks are exploited to improve the overall
performance of interfaces that are based on tracking various aspects of the
user:
- Local user initiated re-calibration.
- Jacob (1993) discovered that the tracking precision
of the corneal/pupil reflection eye-tracker he was using was nonuniform
across the screen, i.e. it was less precise at specific locations on
the screen. He solved this by letting the user perform a local
re-calibration: the user manually moves the mouse pointer to
the area to be re-calibrated and then stares at the pointer
while clicking the mouse button, causing all future eye-gazes recorded in the
vicinity of the point to be taken as gazes at the actual point.
- Local automatic re-calibration.
- A slightly different kind of
re-calibration is found in the EyeCatcher (Hansen et al. 1995). The
EyeCons have a well-defined border, but they can actually be activated
when an eye-gaze is detected slightly outside the EyeCon. This is
based on the assumption that the eye of the EyeCon with its small black
pupil is such an attractive object that a user would not normally look at
a point slightly outside the border, but rather straight at the pupil of
the eye. When an eye-gaze is detected, off-centre or not, the system
performs an automatic re-calibration based on the current tracking
data and the position of the EyeCon, assuming the user is looking
directly at the pupil of the EyeCon. As this is completely automatic, the
user does not notice it, save perhaps for the feeling that the eye-gaze
tracking remains rather accurate even after some use, when eye-gaze systems
not using this technique would have become quite inaccurate.
- Reassignment of off-target fixations.
- Jacob (1993) has also used a similar technique,
although without the re-calibration, which seems reasonable, given that
the objects used do not have as well-defined a centre as an EyeCon.
Fixations slightly off-target are accepted if they are "'reasonably'
close to one object and 'reasonably' further from all other such objects
(i.e. not halfway between two objects, which would lead to unstable
behaviour)" (ibid., p. 172).
- Tracking data tokenisation.
- Raw eye-tracking data does not describe
the smooth motion we think our eyes follow. This is partly because
we are usually unaware of our exact, jittery eye-movements, but the
raw data also often contains entirely wrong coordinates, usually
because the tracker missed a video frame or the user blinked.
Jacob (1993) addressed this problem by
expecting a series of fixations separated by fast saccades and
trying to fit the raw data into this "mould." Momentary spikes in the
raw data are not interpreted as saccades, but as faulty measurements; the
algorithm used waits for a 100 ms interval before it reports the mean gaze
position of this interval as a fixation. The resulting data is a string
of tokens, not eye-tracker measurements, that describes fixations
closer to what the user thinks she is fixating than the raw data does.
- Selection prediction using Markov-chains.
- Eye-gaze selection (or
zooming-in) of objects can be thought of as navigating through a menu
structure. Due to the slight jitter of eye-movements, selectable objects
must not be too small, or the user will be unable to fixate correctly on the
right object. Thus, at each level, relatively few objects can be selected
(or zoomed-in), making the resulting menu structure fairly deep.
Selecting one's way down a deep menu can be a rather monotonous affair,
and to this end the Erica user interface for handicapped people
(described in section 4.3.2) employs a selection
prediction algorithm, which predicts the user's most likely next
choice, based on the two preceding choices using a second order Markov
chain (Hutchinson et al. 1989, Frey et al. 1990). This prediction is
then used to compose the menu that is to be displayed, resulting in a
dynamic menu layout system. Selection time is reported to be reduced by
25-30% using dynamic menus for eye-type-writing (ibid.), but one must
carefully consider the pros and cons of changing the menu
structure: Christensen et al. (1993) reported an experiment where the
performance was best if the structure of the experimental menu was
static, because users tend to remember the selection sequence in a
spatial fashion.
- Wide-angle for locating, tele-lens for tracking.
- The problem of overly
heavy restrictions on head-movement during eye-tracking has been
addressed by Applied Science Laboratories, which has made an 'extended
head tracking unit' (see figure 4). This system
operates two cameras simultaneously, one with a tele-lens for
the actual eye-tracking and one with a wide-angle lens to constantly
locate and adjust to the user's eye position (Bolt 1984).
Hunke & Waibel (1994) have developed a face locating and tracking unit that
starts by detecting all the faces in the field of view of a wide angle
camera, followed by a selection of the closest face. This face is then
continuously tracked, using techniques for face-colour (even in different
lighting situations and skin colours) and movement detection, combined
with among other things an artificial neural network that considers
shapes to detect faces. This concept of combining general tracking (of
the face position) with specific tracking (of the eye) seems very
promising, because it allows for unobtrusive and fairly
movement-insensitive eye-gaze tracking.
- Combining tracking from several modalities.
- Generally, it is quite
advantageous to combine the tracking data from multiple modalities, as
the different modalities often help disambiguate tracking data.
Bub et al. (1995) have shown that the combination of a visual (face)
tracking system and a speech recognition system that is able to
"listen" in specific directions greatly improves speech recognition
in noisy environments. Duchnowski et al. (1995) use a face-locator to
obtain a stable image of the user's face; this image is fed to a
lip-reading unit that is combined with an acoustic speech recognition
unit, resulting in a 20-50% reduction in the recognition error rate.
- Multi-resolution screens for speedy display response.
- A slightly different problem is addressed by
multi-resolution screens. If the bandwidth from the place where
image data is retrieved to the place where it is displayed is limited, one can make
sure to transmit only the necessary image information by detecting the
viewer's point of regard. As the acuity of the peripheral vision is low
(cf. section 3.1), the image need not be displayed
with a high resolution in areas where the viewer is not looking. In this
technique, the viewer's point of regard is constantly transmitted to the
image retrieval store, and the resolution distribution of the transmitted
image is dynamically altered accordingly so that the viewer gets the
impression of looking at a uniformly high-resolution
image (Bolt 1984).
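Two of the techniques listed above, the reassignment of off-target
fixations and the selection prediction using a second order Markov chain,
are simple enough to sketch in a few lines of Python. The sketch is an
illustration under our own assumptions, not the implementation used by
Jacob (1993) or in Erica: the names assign_fixation and
SecondOrderPredictor, and the thresholds max_dist and margin, are
hypothetical and chosen only to make "reasonably close" and "reasonably
further" concrete.

```python
from collections import Counter, defaultdict
from math import hypot


def assign_fixation(fixation, objects, max_dist=40.0, margin=1.5):
    """Attribute a fixation to an object, or return None if it is ambiguous.

    fixation: (x, y) in pixels; objects: dict mapping name -> (x, y) centre.
    max_dist and margin are illustrative thresholds, not published values.
    """
    ranked = sorted(
        (hypot(fixation[0] - cx, fixation[1] - cy), name)
        for name, (cx, cy) in objects.items()
    )
    if not ranked or ranked[0][0] > max_dist:
        return None  # not "reasonably" close to any object
    if len(ranked) > 1 and ranked[1][0] < margin * ranked[0][0]:
        return None  # a second object is nearly as close: too ambiguous
    return ranked[0][1]


class SecondOrderPredictor:
    """Rank likely next choices given the two preceding choices."""

    def __init__(self):
        # (choice before last, last choice) -> counts of the choice that followed
        self.counts = defaultdict(Counter)

    def train(self, history):
        """Count every consecutive triple in a sequence of past choices."""
        for a, b, c in zip(history, history[1:], history[2:]):
            self.counts[(a, b)][c] += 1

    def ranked_next(self, prev2, prev1):
        """Return the choices seen after (prev2, prev1), most frequent first."""
        seen = self.counts.get((prev2, prev1), Counter())
        return [choice for choice, _ in seen.most_common()]


if __name__ == "__main__":
    targets = {"A": (100, 100), "B": (200, 100)}
    print(assign_fixation((110, 105), targets))  # clearly nearest to A
    print(assign_fixation((150, 100), targets))  # halfway between A and B

    predictor = SecondOrderPredictor()
    predictor.train(list("thethetha"))
    print(predictor.ranked_next("t", "h"))
```

A dynamic menu could place the choices returned by ranked_next first and
fill the remaining positions in a fixed order; whether that is worthwhile
depends on the trade-off against static, spatially memorable menus
discussed above.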
If eye-gaze media are to have any noticeable impact on user interfaces,
however, some technical problems have to be addressed. Jacob (1995) notes
that "[p]erformance does not appear to be constrained by fundamental
limits, but simply by lack of effort in this area, due to its narrow
market." An important technical factor is how discreet the equipment is;
we find that the remotely operated corneal reflection/pupil centre technique
of eye-tracking is already sufficiently discreet, and that the most
important problems for future development are:
- The eye-tracking equipment must be much less sensitive to user
movement. Users might put up with having to stay within a 'field of view'
of an angle of, say, ±45° (somewhat like what remote control devices
for TV sets require nowadays), but it is unlikely that they will accept
having to sit quite still all the time.
- The eye-tracking equipment must not constantly require user-attended
re-calibration. If a new type of interface begins to fail, "the user can
no longer rely on the fact that the computer dialogue is influenced by
where his or her eye is pointing and will thus soon be tempted to retreat
permanently to whatever backup input modes are
available" (Jacob 1995).
- The eye-tracking equipment must be able to track several persons
simultaneously; a single-user interface is 'OK,' but people work
together, watch TV together etc., and will find it discouraging not to be
able to do that.
- The eye-tracking equipment should be able to identify the tracked
persons, most likely by iris pattern recognition. There is, to the best
of our knowledge, no eye-tracking equipment today that has attempted to
do this.
- If the eye-tracking equipment supports the two previous features, it
will be possible to treat eye-gaze tracking data on an individual basis;
the eye-tracking equipment should utilize a database for storing and
retrieving the characteristics and preferences of individuals.
Authors: Arne John Glenstrup and Theo Engell-Nielsen