In this thesis we have attempted to give an overview of eye-gaze media: how the techniques work, what the psychological background is, what present-day implementations look like, and what potential the field holds; finally, we have given two examples of possible future eye-gaze applications.
The generally rising interest in eye-gaze media is a result of the development of unobtrusive, video-camera-based eye-tracking techniques that make it possible to use eye-movements for real-time feedback to the eye-tracked user. In this way the eyes can also be used as output-organs, increasing the bandwidth of communication from the user to the computer. Using eye-gaze for interaction in multi-modal, non-command user interfaces, where the interface itself is as transparent as possible and the user simply interacts with her actual data, will result in a more "natural", "direct" way of manipulating data.
The structure of the eye, which provides high-acuity foveal vision in the centre and low-acuity peripheral vision in the remaining part of the retina, makes it necessary for human beings to move their eyes about to observe all of their surroundings. This is accomplished mainly by fixations, during which the eyes are stationary, separated by saccades, during which the eyes move ballistically to a new point in the visual scene. Research has shown that the place for the next fixation is determined during the current fixation, and according to the feature integration theory proposed by Treisman, these processes can be divided into a pre-attentive, parallel phase and a subsequent attentive, serial phase. The pre-attentive phase is unlimited in capacity and operates in parallel across the entire visual field, calculating local mismatch for basic features such as orientation, size, colour and movement direction. These mismatch values are added together, and in the attentive stage mismatching objects are serially inspected in order of their degree of mismatch. The pre-attentive phase is driven by bottom-up processes, whereas the attentive phase is at least partly under strategic control through top-down processes that, according to the zoom lens metaphor, can limit attention to a greater or smaller part of the visual scene. Finally, the different basic features from the selected objects are integrated and the stimulus is identified, before the next fixation position is determined.
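The two-phase process described above can be sketched in code. The following is a minimal, purely illustrative sketch, not an implementation from the thesis or from Treisman's work: every name, feature encoding and mismatch measure is a hypothetical simplification. It computes a per-object mismatch score across the basic features in parallel fashion, then produces the serial inspection order.

```python
# Illustrative sketch of the two-phase model: a pre-attentive pass sums,
# per object, how much each basic feature deviates from the rest of the
# scene; the attentive pass then inspects objects serially, most
# mismatching first. All names and encodings are hypothetical.

FEATURES = ["orientation", "size", "colour", "movement_direction"]

def preattentive_mismatch(obj, scene):
    """Total local mismatch of `obj` against the other objects,
    summed over the basic features (numeric toy encoding)."""
    total = 0.0
    for f in FEATURES:
        others = [o[f] for o in scene if o is not obj]
        mean = sum(others) / len(others)
        total += abs(obj[f] - mean)
    return total

def attentive_order(scene):
    """Serial inspection order: highest mismatch first."""
    return sorted(scene, key=lambda o: preattentive_mismatch(o, scene),
                  reverse=True)

# Toy scene: two similar objects and one 'odd one out'.
scene = [
    {"orientation": 0, "size": 1, "colour": 0.2, "movement_direction": 0},
    {"orientation": 0, "size": 1, "colour": 0.2, "movement_direction": 0},
    {"orientation": 90, "size": 3, "colour": 0.9, "movement_direction": 0},
]
order = attentive_order(scene)  # the odd object comes first
```

Note that this toy version inspects every object; in the theory, attention can stop as soon as the stimulus is identified, and the zoom lens metaphor would further restrict which objects enter the ordering at all.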
Eye movements can be classified according to the situation in which they occur into spontaneous, task-oriented, orientation-of-thought and intentional manipulatory looking. The last class, intentional manipulatory looking, is a new class of looking we propose, which will become more widespread with the introduction of eye-gaze controlled applications. It is a kind of looking where the observer actively uses her eyes to manipulate the surroundings. People have always used their eyes to manipulate the world, as when busy people look at their watches to make others finish quickly, but with the introduction of eye-tracking, intentional manipulatory looking will become much more powerful.
To explain why eye-gaze is such an interesting aspect to track, one can picture the cognitive processes that constitute perception as activities in working memory, a part of short-term memory. Just & Carpenter suggest two assumptions: the immediacy assumption, which states that fixated objects are immediately processed, and the eye-mind assumption, which states that the eye remains on an object as long as it is being processed. Given these, if the subject's current task requires information from the display, the direction of eye-gaze is a good indicator of what is currently being processed.
An interesting finding by Yarbus is that eye-gaze occurs in cycles, creating a recurring pattern, a scanpath. When a subject views a picture, the informative areas are rapidly sought out and scanned, and this scanpath is then repeated. It is important to note that the scanpath is determined both by compositional factors of the picture and by idiosyncratic factors.
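One common way of making such repeated cycles measurable, not taken from the thesis but widely used in scanpath research, is to label each fixated region with a letter, so that a scanpath becomes a string, and then compare strings by edit distance. The sketch below assumes this string-edit approach; the region labels and example sequences are hypothetical.

```python
# Scanpath comparison sketch: each fixated region is a letter, a scanpath
# is a string, and similarity between two viewing cycles is measured by
# Levenshtein edit distance (0 = identical scanpaths).

def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance between two region-label strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]

# Hypothetical picture with regions A..D, scanned twice: the repeat shows
# a small idiosyncratic variation in the order of the last two regions.
cycle1 = "ABCD"
cycle2 = "ABDC"
distance = edit_distance(cycle1, cycle2)
```

A small distance between successive cycles is then evidence of the repetition Yarbus observed, while systematic differences between subjects viewing the same picture reflect the idiosyncratic factors.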
Several eye-gaze controllable applications exist today, mostly communication applications for disabled people, for whom the eyes are the fastest means of operating prosthetic devices. However, eye-gaze based systems intended for use by non-disabled people are gradually being developed. One such application is an experimental eye-gaze based interface called the EyeCatcher, which is part of the exhibition at the Experimentarium. The EyeCatcher utilizes EyeCons: medium-sized, eye-gaze activated, animated icons that allow the user to perceive directly when an eye-gaze based selection is about to be effectuated. EyeCons also serve to separate the gaze-responsive area from the selectable objects, so as to circumvent the Midas Touch Problem, the problem of accidentally activating objects simply by looking at them.
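The selection mechanism that EyeCons afford can be illustrated with a dwell-time sketch. This is not the EyeCatcher's actual code; the class, field names and the one-second dwell threshold are assumptions for illustration. The key points it shows are that a selection is only effectuated after the gaze has rested continuously on the gaze-responsive area, and that the returned progress value is what an animated EyeCon would display, letting the user see, and abort, a pending selection.

```python
# Dwell-time selection sketch (hypothetical names and timings): gaze must
# stay inside the gaze-responsive area for DWELL_TIME seconds before the
# selection fires; glancing away resets the timer, which mitigates the
# Midas Touch Problem.

DWELL_TIME = 1.0  # seconds of continuous gaze needed to select

class EyeCon:
    def __init__(self, x, y, w, h):
        self.area = (x, y, w, h)  # gaze-responsive area, kept separate
        self.dwell = 0.0          # from the selectable object itself
        self.selected = False

    def contains(self, gx, gy):
        x, y, w, h = self.area
        return x <= gx < x + w and y <= gy < y + h

    def update(self, gx, gy, dt):
        """Feed one gaze sample; dt is the time since the last sample.
        Returns animation progress in [0, 1] for the EyeCon to display."""
        if self.contains(gx, gy):
            self.dwell += dt
        else:
            self.dwell = 0.0      # looking away aborts the selection
        if self.dwell >= DWELL_TIME:
            self.selected = True
        return min(self.dwell / DWELL_TIME, 1.0)

icon = EyeCon(100, 100, 50, 50)
for _ in range(9):                  # 0.9 s of gaze: not yet selected
    icon.update(120, 120, 0.1)
progress = icon.update(300, 300, 0.1)  # a glance away resets the timer
```

The visible animation is what distinguishes this from plain dwell activation: the user directly perceives that a selection is building up and can cancel it by looking away.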
We have carried out a minor evaluation of the EyeCatcher, showing that even though present-day eye-tracking techniques are unobtrusive, they still leave a lot to be desired: many people had difficulties calibrating the system, and successful use of the EyeCatcher depended on the visitor keeping her head completely motionless. Despite these at times severe technical problems, people seemed very positively inclined towards the concept of using the eyes as output-organs, and a majority of the guests expressed the belief that eye-gaze operated applications will be an everyday thing in the future. We also timed ourselves to see whether training would yield any speed-up in performance. Unfortunately, the application running at the time was not suited for this purpose, so we cannot conclude anything concerning this aspect; it must be a topic for future research. During our evaluation we found several problems, which we have listed in the thesis; some were due to the sensitive eye-tracking technology, some were interface problems, but many were plain, ordinary practical problems with the way the installation was set up.
In this thesis we have also argued that eye-gaze interfaces will be faster to operate because "the eye is there before the hand", but that since eye-movements are normally subconscious, information about the user should, wherever possible, be obtained from the user's natural eye-movements. In general, several input modes besides eye-gaze should be used, as this increased bandwidth of communication from human to computer can support a more "natural" form of human communication, where the different modalities supplement each other to disambiguate the input.
We argue that the main problems that should be addressed by eye-gaze research are how the eye-tracking data should be used, how to solve the Midas Touch Problem and how to solve the One-Way Zoom Problem. The latter is the problem of how to "zoom out" after having "zoomed in" by gazing at some object.
We also propose some usability criteria for eye-gaze media; according to these criteria, a good eye-gaze medium should offer a high degree of involuntariness, tracking-data utilization, modality integration, customizability and technical transparency.
Finally, we try to make some predictions about the future; we argue that eye-gaze media will experience the ketchup effect, where the number of eye-gaze based applications will explode in the next few decades. An important factor in this process is that the basic technology-eye-tracking equipment-must be improved, especially to allow for greater user mobility and less re-calibration.
An eye-gaze based application, interest and emotion sensitive media (IES), is described in this thesis. IES can in principle use tracking data from several modalities to determine the user's current interest, which is then used for navigating through, for example, a film. Such films are analogous to hypertext, except that the scenes between branching points are temporal instead of spatial. This can potentially cause problems when producing an IES film, especially if one is to cater for several people watching the same film together, each seeing her own version of it. One very serious problem in this connection is that sound is not as directionally controllable as light, so semi-obtrusive techniques like headphones must be used. Problems also arise when considering the tracking of the IES viewers: how should the equipment react to users out of view? How should the final decision about which scene to show next be made?
An important issue is how to display the gaze-responsive areas, the so-called activation areas. Of several possibilities, we suggest using invisible activation areas, as we argue that users should not be conscious of being tracked.
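The core navigation step of IES, accumulating interest over invisible activation areas and choosing the next scene at a branching point, can be sketched as follows. This is an illustrative sketch only: the function, the rectangle representation of activation areas, and the scene identifiers are all hypothetical, and a real IES system would weigh in further modalities besides gaze position.

```python
# IES branching sketch (all names hypothetical): gaze samples falling in
# each invisible activation area accumulate interest during a scene; at
# the branching point, the scene linked to the most attended area wins.

def choose_next_scene(samples, areas, branches, default):
    """samples: (x, y) gaze points; areas: name -> (x, y, w, h) rectangle;
    branches: area name -> next scene id; `default` is shown if no
    activation area attracted any gaze at all."""
    interest = {name: 0 for name in areas}
    for gx, gy in samples:
        for name, (x, y, w, h) in areas.items():
            if x <= gx < x + w and y <= gy < y + h:
                interest[name] += 1
    best = max(interest, key=interest.get)
    if interest[best] == 0:
        return default
    return branches[best]

# Hypothetical scene: the viewer mostly watches the protagonist, so the
# protagonist-centred continuation is chosen at the branch.
areas = {"protagonist": (0, 0, 100, 100), "letter": (200, 0, 50, 50)}
branches = {"protagonist": "scene_7a", "letter": "scene_7b"}
samples = [(20, 30), (25, 35), (210, 10)]
next_scene = choose_next_scene(samples, areas, branches, "scene_7_default")
```

With several viewers, the same accumulation could run per viewer, which makes the open question above concrete: the final decision then requires some policy, such as a vote or a weighted sum, for combining the viewers' interest tallies.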
Several aspects regarding the production of IES films must be considered; the producer will need some form of script planner, and some means of performing semi-automated activation area selection must be conceived as well. Problems concerning the combination of present-day video compression techniques with IES are outlined; the solution seems to be to integrate the activation area selection with the compression process. Finally, one must not forget that IES can alter the artistic expression of the present TV and video media; the effect of taking part in the navigation of the film one is watching must not be underestimated.