An important issue when designing interfaces based on eye-gaze control is exactly how to use the gaze direction. The point of regard on a display can be used as is, to position a (perhaps invisible) mouse pointer that selects from some sort of menu-based system (Hansen et al. 1995, Hutchinson et al. 1989, Frey et al. 1990, Ware & Mikaelian 1987, LC Technologies 1993, Chapman 1991, Gips et al. 1993, Smyth et al. 1994), but it can also be processed further, using the knowledge of the connection between eye-gaze and interest described in section 3.4 (Starker & Bolt 1990, Nielsen 1993, Jacob 1993, Jacob 1995).
The main reason for the great success of the mouse as a pointing device combined with direct manipulation interfaces is the fact that it is based on human abilities. Over thousands of years man has developed the skill of grabbing and moving objects with his hands and working on them, and this is basically what happens when one places the mouse pointer over an object, clicks on it and drags it somewhere else. The only initial difficulty for a new user is to interpret the position of the mouse pointer as the position of the hand, and this is quickly done.
Man does not have a natural ability to drag and drop objects with the eyes, which might suggest that it would not be beneficial to use the eyes to point at and move objects. Yet the empirical evidence shows that it can be done, even with greater speed than with a mouse, which shows that adaptation to this new mode of usage is swift. There are some problems with eye-gaze control in relation to the mouse, however:
[...]) solely on the basis of the gaze pattern. Thus, there is a need for some sort of "clutch" to engage and disengage the intentional manipulatory looking (Jacob 1993, p. 164).
These problems can be addressed using two fundamentally different strategies. One can construct methods for solving them one by one, perhaps even solving more than one problem with a single method. This is the strategy used in most eye-gaze controlled applications to date: typically the first and second problems are solved by introducing a latency (dwell) time that must elapse before a selection is effectuated, or by supplying an explicit clutch in the form of a manual button (Hansen et al. 1995, Frey et al. 1990, Hutchinson et al. 1989, Ware & Mikaelian 1987). If a dwell-time solution is chosen, it is almost mandatory to issue some form of warning to the user that the selection is about to be effectuated. This can be done by an auditory signal (Hutchinson et al. 1989), but a rather elegant solution has been developed for the EyeCatcher, where the selection icon is animated in such a way that the user can directly perceive and predict when the selection will be made (cf. section 4.2). This strategy does not automatically solve the third problem, so care must be taken when designing object behaviour to allow for darting eye movements without a plethora of activated objects confusing the user. In the case of the EyeCatcher, the user might be afraid to look freely at the display, as a quick glance across a line of EyeCons causes a "button flashing" effect (cf. section 4.2.6), thus forcing her to concentrate on not looking at the EyeCons.
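The dwell-time idea discussed above can be illustrated with a minimal sketch. It is not the implementation of the EyeCatcher or of any cited system; the class name `DwellSelector`, the default threshold of 0.8 seconds and the string object identifiers are all assumptions made for illustration. The sketch assumes that some earlier stage has already mapped each gaze sample to the object under the point of regard (or to `None`). The returned progress value could drive a visual or auditory warning, so that the user can predict when the selection will be made, and a quick glance across several objects never accumulates enough time to select any of them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DwellSelector:
    """Hypothetical dwell-time "clutch": selects a target after a sustained gaze."""

    dwell_time: float = 0.8  # seconds of continuous gaze required (assumed value)
    _target: Optional[str] = None
    _since: float = 0.0
    _fired: bool = False

    def update(self, target: Optional[str], t: float) -> Tuple[float, Optional[str]]:
        """Feed one gaze sample taken at time t (seconds).

        Returns (progress in [0, 1], selected target or None).
        """
        if target != self._target:
            # Gaze moved to another object (or to empty space): restart the timer.
            self._target, self._since, self._fired = target, t, False
        if self._target is None:
            return 0.0, None
        progress = min((t - self._since) / self.dwell_time, 1.0)
        if progress >= 1.0 and not self._fired:
            self._fired = True  # select only once per continuous dwell
            return 1.0, self._target
        return progress, None
```

A caller would invoke `update` once per tracker sample and use the progress value to animate the object being looked at. An explicit manual clutch would replace the timer with a button state.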
A different strategy for attacking the three problems is simply to remove them by giving up the idea of using the coordinates of eye-gaze as a substitute for mouse coordinates. This requires interpreting the raw eye-gaze tracking data at a higher semantic level than simple coordinates, deducing not the immediate point of looking but the object of interest and the user's general cognitive state. Surely this is a non-trivial task that will require ingenious use of combinations of artificial intelligence, cognitive state heuristics and perhaps artificial neural networks. An example of a first attempt to use this strategy is given by "the Little Prince" storyteller application described in section 4.3.1; the interesting thing to note is that the application bases its responses on aggregate data from an interest module that uses heuristics to determine how specific the user's interests are, and in what.
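To make the notion of aggregate interest data more concrete, the following sketch shows one conceivable heuristic. It is not the interest module of the storyteller application, whose details are not given here. The class `InterestModel`, the exponential decay with a `half_life` parameter and the "specificity" measure (the share of total accumulated interest held by the top object) are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


class InterestModel:
    """Hypothetical aggregate interest estimate; not the heuristic of any cited system."""

    def __init__(self, half_life: float = 5.0) -> None:
        self.half_life = half_life  # seconds for old interest to halve (assumed value)
        self.scores: Dict[str, float] = defaultdict(float)

    def observe(self, obj: Optional[str], dt: float) -> None:
        """Record that the gaze rested on obj (or on nothing) for dt seconds."""
        decay = 0.5 ** (dt / self.half_life)
        for key in self.scores:
            self.scores[key] *= decay  # older interest fades gradually
        if obj is not None:
            self.scores[obj] += dt

    def focus(self) -> Tuple[Optional[str], float]:
        """Return (most interesting object, specificity in [0, 1])."""
        total = sum(self.scores.values())
        if total <= 0.0:
            return None, 0.0
        top = max(self.scores, key=self.scores.get)
        return top, self.scores[top] / total
```

Under these assumptions, a specificity close to 1 suggests that the user's interest is concentrated on a single object, whereas a low value suggests that the gaze is wandering. An application could respond to the former and stay quiet during the latter, without any single fixation being treated as a command.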
This idea of using a different "dimension" of the raw tracking data lends itself nicely to a related concept described by Nielsen (1990), namely noncommand user interfaces. Basically, all computer applications today operate by some sort of command system, be it explicit command-line interfaces or direct manipulation interfaces; they all require the user to view the computer as a collection of tools that must be activated (commanded to operate) in conjunction with the user's data to solve the required task. In noncommand user interfaces,
the unifying concept does seem to be exactly the abandonment of the principle underlying all earlier paradigms: that a dialogue has to be controlled by specific and precise commands issued by the user and processed and replied to by the computer. The new interfaces are often not even dialogues in the traditional meaning of the word, even though they obviously can be analysed as having some dialogue content at some level since they do involve the exchange of information between a user and a computer. (Nielsen 1990, p. 21)
In noncommand user interfaces, the computer does not wait for the user to actuate objects, but rather continuously senses the user and responds quietly to the user's interests. Thus, highly processed and interpreted eye-gaze data revealing the user's interest can be the basis for interaction in a noncommand user interface.
If careful consideration goes into crafting noncommand user interfaces, the user will experience a more "transparent" interface that will enable her to concentrate on perceiving and interacting directly with the data, which is of course the user's main task, instead of operating tools. One can view this in terms of Shneiderman's (1992) syntactic-semantic object-action (SSOA) model, which classifies the user's knowledge of interaction with the system. In the SSOA model, the user's semantic knowledge, i.e. knowledge of concepts, is divided into computer concepts (e.g. that computers can store information) and task concepts (e.g. that when writing up a report, one must subdivide it into smaller units). Syntactic knowledge is the knowledge of which keys to press and how commands can be used and combined. The amount of syntactic knowledge required of the user should be minimised, as this is typically hard to learn and remember, and is inessential for the user's task. What noncommand user interfaces can offer is virtually to eliminate the requirements of syntactic knowledge, thus freeing more of the user's cognitive resources to be used for processing semantic knowledge. Thus noncommand user interfaces should be an improvement over command-based interfaces (e.g. WIMP interfaces, eye-gaze "mouse" controlled interfaces), because they allow the user to process more task-related information.