Five tracking techniques use light (mainly infrared light) reflected by the eye (at the cornea or deeper within the eye): limbus tracking, pupil tracking, the corneal/pupil reflection relationship, corneal reflection combined with an eye image processed by an artificial neural network, and Purkinje image tracking.
The limbus is the boundary between the white sclera and the dark iris of the eye. Due to the fact that the sclera is (normally) white and the iris is darker, this boundary can easily be optically detected and tracked. This technique is based on the position and shape of the limbus relative to the head, so either the head must be held quite still or the apparatus must be fixed to the user's head.
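As an illustration of why this boundary is optically convenient, a minimal thresholding sketch can find the limbus edges along a single horizontal scanline; the threshold value and the synthetic intensity profile below are our own illustrative assumptions, not taken from the cited work:

```python
import numpy as np

def find_limbus_edges(scanline, threshold=128):
    """Return (left, right) pixel indices of the sclera/iris borders:
    the first and last pixels darker than `threshold` on the scanline."""
    dark = scanline < threshold          # True over the darker iris region
    idx = np.flatnonzero(dark)
    if idx.size == 0:
        return None                      # no iris found on this scanline
    return int(idx[0]), int(idx[-1])

# Synthetic scanline: bright sclera (200) with a dark iris (60) at pixels 40-79
line = np.full(120, 200)
line[40:80] = 60
print(find_limbus_edges(line))           # (40, 79)
```

Because the method keys on a strong intensity step, it works well horizontally; as noted below, eyelid occlusion makes the vertical edges far less reliable.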
Because the eyelids intermittently cover the top and bottom of the limbus, "it is probably fair to regard limbus tracking as suitable for precise horizontal tracking only" (Scott & Findlay 1993). Scott and Findlay also say that limbus tracking does not satisfy requirements d, e and g, but we believe it might be possible to improve on the temporal dynamics (requirement g) by refining the technique.
Tracking the direction of gaze by the pupil tracking technique is similar to limbus tracking, only here the smaller boundary between the pupil and the iris is used instead. Once again, the apparatus must be held completely still in relation to the head. The advantages of this technique over limbus tracking are that the pupil/iris boundary is sharper than the limbus and far less frequently occluded by the eyelids, which also permits vertical tracking; the drawback is that the contrast between pupil and iris is lower than that between iris and sclera, making the boundary harder to detect.
Figure 2: The four Purkinje images are reflections of incoming light on the boundaries of the lens and cornea
When (infrared) light is shone into the user's eye, several reflections occur on the boundaries of the lens and cornea, the so-called Purkinje images (see figure 2). The first Purkinje image is also called the glint; this, together with the reflection of light off the retina (the so-called bright-eye), can be video-recorded using an infrared-sensitive camera as a very bright spot and a less bright disc, respectively. When the eye pans horizontally or vertically, the relative positions of the glint and the centre of the bright-eye change accordingly, and the direction of gaze can be calculated from these relative positions.
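The geometry just described can be sketched in a few lines; the linear gain and the pixel coordinates below are illustrative assumptions, since a real system determines this mapping per user during calibration:

```python
def gaze_offset(glint_xy, pupil_centre_xy, gain_deg_per_px=0.5):
    """Map the vector from the glint to the bright-eye (pupil) centre to a
    (horizontal, vertical) gaze angle via a simple linear gain."""
    dx = pupil_centre_xy[0] - glint_xy[0]
    dy = pupil_centre_xy[1] - glint_xy[1]
    return dx * gain_deg_per_px, dy * gain_deg_per_px

# Glint at (100, 100), bright-eye centre at (110, 96):
print(gaze_offset((100, 100), (110, 96)))   # (5.0, -2.0)
```

The key property is that the glint and the bright-eye centre move together under small head translations but separate under eye rotation, so their difference vector largely factors out head position.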
The problems associated with this technique are primarily those of getting a good view of the eye: lateral head movement can put the video image of the eye out of focus, or even move the eye out of the camera's view entirely. The range over which the direction of gaze can be tracked (cf. requirement f above) by simple software algorithms is ±12°-15°, according to Scott & Findlay (1993), because larger eye movements move the glint off the spherical part of the cornea, so that more complex calculations are needed to perform the necessary geometrical corrections.
Hopefully, these problems can in future be circumvented by better software correction algorithms, better tracking cameras that adapt to the position of the user's head (as indicated by Jacob 1991), and perhaps several cameras that co-operate to expand the range over which the direction of gaze can be registered (cf. also section 6.1).
One of the more recently developed techniques is one where the computations are done by an Artificial Neural Network (Baluja & Pomerleau 1994). The raw material for eye-gaze tracking is still a digitised video image of the user, but this technique is based on a more wide-angled image, so that the user's entire head is in the field of view of the camera. A stationary light is placed in front of the user, and the system starts by finding the user's right eye by searching the video image for the reflection of this light: the glint, distinguished by being a small, very bright point surrounded by a darker region. It then extracts a smaller, rectangular part of the video image (typically only 40 by 15 pixels) centred at the glint (see figure 3) and feeds this to an ANN. The output of the ANN is a set of display coordinates.
Figure 3: An example of a 30 by 15 pixel low-resolution extraction of the user's eye from a wide-angle image (from Baluja & Pomerleau 1994)
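The glint-finding and window-extraction step could be sketched as follows; the frame dimensions and the brightest-pixel heuristic are simplifying assumptions of ours (Baluja and Pomerleau search for a bright point surrounded by a darker region):

```python
import numpy as np

def extract_eye_window(frame, width=40, height=15):
    """Locate the glint as the brightest pixel in the frame and cut out a
    width x height window centred on it, clamped to the frame edges."""
    gy, gx = np.unravel_index(np.argmax(frame), frame.shape)
    y0 = min(max(gy - height // 2, 0), frame.shape[0] - height)
    x0 = min(max(gx - width // 2, 0), frame.shape[1] - width)
    return frame[y0:y0 + height, x0:x0 + width]

frame = np.zeros((120, 160), dtype=np.uint8)
frame[60, 80] = 255                      # a single bright "glint" pixel
window = extract_eye_window(frame)
print(window.shape)                      # (15, 40)
```

The extracted window, not the full frame, is what the ANN sees, which keeps the network input small enough to process at video rates.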
The ANN requires more than the simple calibration that the other techniques need; it must be trained by gathering images of the user's eye and head for at least three minutes while the user visually tracks a moving cursor on the display. This is followed by an automatic training session on the stored images, which takes approximately 30 minutes with current technology; after that, the system should not require re-calibration on the next encounter.
The accuracy of the ANN-based system is not yet as good as that of the other techniques; it can be improved slightly by augmenting the corneal/pupil-based calculations with a calculation based on the position of the glint in the eye socket, but it is still limited to an accuracy of about 1.5-2° (and thus does not fulfill requirement d). The great advantage of this technique is that the wide angle of the base image increases the user's head mobility: the authors report that the user is free to move her head up to 30 cm. This could make it quite a tractable solution where a high degree of accuracy is not essential.
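The calibrate-then-train procedure can be illustrated in miniature; here a linear least-squares fit stands in for the ANN, and all data (window size, number of calibration samples, pixel values) are synthetic assumptions of ours:

```python
import numpy as np

rng = np.random.default_rng(0)

# Calibration: N flattened eye windows (40 x 15 pixels) recorded while the
# user followed a cursor to known screen positions (x, y).
N, D = 200, 40 * 15
pixels = rng.normal(size=(N, D))
true_map = rng.normal(size=(D, 2))
screen_xy = pixels @ true_map            # synthetic ground-truth targets

# "Training": fit a linear map from pixels to screen coordinates.
weights, *_ = np.linalg.lstsq(pixels, screen_xy, rcond=None)

# At run time, a new eye window is mapped straight to display coordinates.
pred = pixels @ weights
print(np.allclose(pred, screen_xy))      # True
```

A real ANN learns a non-linear mapping and generalises to unseen eye positions; the linear fit above only conveys the shape of the pipeline, i.e. images in, display coordinates out.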
The first and fourth Purkinje images can be used for tracking the direction of gaze by the Dual-Purkinje Image technique (Müller et al. 1993), which calculates the direction from the relative positions of these two reflections. The Dual-Purkinje Image technique is generally more accurate than the other techniques, and the sampling frequency is high, up to 4000 Hz. The disadvantage of this technique is that the fourth Purkinje image is rather weak, so the surrounding lighting must be heavily controlled (Cleveland & Cleveland 1992).
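The reason the relative positions suffice is that the first and fourth Purkinje images move together under pure eye translation but separate under rotation, so their difference vector isolates the rotational component. This can be sketched as follows, with an illustrative linear gain of our own choosing:

```python
def gaze_from_purkinje(p1, p4, gain_deg_per_px=0.1):
    """Estimate gaze angle from the separation of the first (p1) and
    fourth (p4) Purkinje images, each given as (x, y) pixel positions."""
    dx, dy = p4[0] - p1[0], p4[1] - p1[1]
    return dx * gain_deg_per_px, dy * gain_deg_per_px

# Translating both reflections by the same amount (head movement) leaves
# the difference vector, and hence the gaze estimate, unchanged:
print(gaze_from_purkinje((10, 10), (30, 10)))   # (2.0, 0.0)
print(gaze_from_purkinje((15, 12), (35, 12)))   # (2.0, 0.0)
```

In practice the mapping from separation to angle is calibrated per user rather than being a fixed gain.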