Next: 2 Present-day Eye-Gaze Tracking Up: Eye-Gaze Media Previous: List of Figures and Tables

1 Introduction

Every day of their lives, most people use their eyes intensively for a large variety of purposes: for reading, for watching entertainment, for gathering information to plan their actions, for perceiving and learning new things, for guiding their locomotion and for evaluating their actions, to name just a few. Normally, we do not appreciate how great an effort our eyes put into our perception processes, or what vast amounts of information they process. We are used to regarding our visual system as "transparent," i.e., as something that lets us concentrate our conscious processes on the concepts that surround us, leaving the intake and basic processing of optical information to our eyes and visual system.

This is by no means a trivial task: light structure from the ambient optic array must continuously be sampled and integrated, both over time and with prior, stored knowledge. To this end, our eyes are constantly moving (and normally so are our head and body) to take the next important sample of light structure. We are used to thinking of our eyes mainly as input organs that merely observe the surroundings, and this is indeed their most important role; in fact, however, they also operate as output organs. The output they are capable of producing is, on the face of it (no pun intended!), direction: in the process of sampling the optic array, the eyes are pointed in one direction, thus indicating what is being focused upon. What is important to notice is that, because of physiological constraints (humans can observe their surroundings in detail only with the central part of their retinas), the direction in which these physical organs point can in many cases also give an indication of the direction of a person's thoughts.

As early as 1936, Mowrer succeeded in making automatic recordings of the orientation of the eye in the head, and thus of the direction of gaze (cf. Scott & Findlay 1993). The techniques for making these recordings improved only gradually during the 20th century, but they have recently become sufficiently non-intrusive to be useful outside the laboratory as well. In particular, one technique using a video camera can track eye gaze from a distance, and we believe that this technique will become commercially available to ordinary computer users in the near future.

The ability to track the user's direction of gaze has sparked off a parallel research direction: enhancing the communication between the user and the computer. Since computers were introduced to the general community, advances in communication have mainly been made in the direction from the computer to the user (graphical presentation of data, window systems and the use of sound for presenting data), whereas communication from the user to the computer is still, for the majority of users, confined to keyboards, joysticks or mice, all operated by hand. By tracking the user's direction of gaze, the bandwidth of communication from the user to the computer (that is, the potential amount of information transferred) can be increased by using information about what the user is looking at, and even by designing objects specifically intended to be looked at. This is only the start of a discussion on increasing user-computer bandwidth; by monitoring the entire user, the computer can react to all kinds of gestures too, and we have already seen the first commercial try-outs of voice recognition systems that can react sensibly to the user's utterances.

This in turn leads to a new way of regarding the computer: not as a tool that must be operated explicitly by commands, but as an agent that monitors the user, who in turn is free to concentrate on interacting with the data presented by the computer instead of using computer applications as tools to operate on the data. This does not mean removing the user's control over the machine; rather, the hardware, as a porthole into the user's data (which is what the user is really interested in), becomes more transparent, so that the user needs to be aware of the keyboard or mouse to a much lesser degree than today. In terms of Shneiderman's (1992) Syntactic-Semantic Object and Actions model of the user's mental model of the interaction, this could drastically reduce the syntactic demands on the user, thus freeing cognitive resources for processing the semantic objects.

We envision that a future computer user who would like to, say, use some information from the Information Highway (the Internet) and from corporate or private databases for some piece of work could wander into the room where the so-called 'cyberputer' is located (or open up her portable 'cyberlaptop') and interact with it by looking, speaking and gesturing. We could imagine the cyberputer integrating several of today's home appliances, e.g. radio, television, video, laserdisc, CD-i, computer and telephone. As the user's gaze falls on the 'cyberscreen' (the screen of the cyberputer), it would start to operate from the user's favorite starting point, or perhaps from where she left off last time. She can then look at objects that create a 3D effect by altering their perspective as she moves about. This 3D effect would allow much larger amounts of data to be presented in a graspable way than is possible with present-day windowing systems.

Wherever she looks, the cyberputer will begin to emphasize the appropriate data-carrying object (database information, perhaps graphical, ongoing movies, videophone calls etc.), and an utterance like "let's take a closer look at that," combined with a glance or a pointing gesture, will zoom in on the selected object. If the user's gaze flutters over several things, the cyberputer will assume that the user would like an overview, and an appropriate zooming out or a verbalized summary of the data can take place.

To investigate the present status of the technology needed for this class of futuristic, multimodal systems, we have chosen to focus on one of the techniques described, namely eye-gaze interaction, and to mention other modalities only when this is essential for making a point. Gesture- and voice-based commands would need a similarly in-depth investigation, but limited time and space make this focus necessary. We hope that some of the major points on the usefulness of eye-gaze systems may apply equally well to other modalities.

This thesis is structured in the following way:






Authors: Arne John Glenstrup and Theo Engell-Nielsen