An important prerequisite for producing noncommand user interfaces that will be accepted by users is that they be multimodal, i.e. that they account for several human 'output-modes.' Input should not stop with eye tracking; body position and movement tracking (specifically hand gesture tracking) should add to the 'user state' input to the computer system. The reason is this: when the user is required only to interact in a "natural," "direct" way with the displayed data, a single input mode (e.g. eye-gaze) would often be ambiguous, since humans generally rely on multimodal signals to disambiguate communication (Jacob 1994).
[Image not available; speech shown in the figure: "Move the triangle..." "...there."]
Figure 11: An example of multimodal communication using eye-gaze and gesture combined with a "speech mouse button" (from Bolt 1984).
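The combination shown in figure 11 can be illustrated with a minimal sketch, assuming that the speech recognizer delivers time-stamped words and that the gaze or gesture tracker delivers time-stamped screen positions. Each deictic word is paired with the pointing sample closest to it in time. The names `PointSample`, `DEICTIC_WORDS`, `point_at` and `resolve_deixis` are hypothetical and are not taken from Bolt's system.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class PointSample:
    """One time-stamped screen position from a gaze or gesture tracker."""
    t: float  # seconds since the start of the utterance
    x: float
    y: float


# Hypothetical set of words that need a pointing referent.
DEICTIC_WORDS = {"this", "that", "here", "there"}


def point_at(samples, t):
    """Return the sample closest in time to t (samples must be sorted by t)."""
    if not samples:
        raise ValueError("no pointing samples")
    times = [s.t for s in samples]
    i = bisect_left(times, t)
    # Only the neighbours on either side of t can be the closest sample.
    candidates = samples[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda s: abs(s.t - t))


def resolve_deixis(words, samples):
    """Map each deictic word to the position pointed at when it was spoken.

    words is a list of (word, time) pairs from the speech recognizer.
    """
    resolved = {}
    for word, t in words:
        if word.lower() in DEICTIC_WORDS:
            s = point_at(samples, t)
            resolved[word.lower()] = (s.x, s.y)
    return resolved


samples = [
    PointSample(0.0, 10.0, 10.0),
    PointSample(1.0, 40.0, 20.0),
    PointSample(1.5, 80.0, 60.0),
]
words = [("Move", 0.1), ("the", 0.3), ("triangle", 0.5), ("there", 1.4)]
target = resolve_deixis(words, samples)
```

Speech alone leaves "there" undefined and pointing alone carries no command; only the time-aligned combination yields a complete instruction.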
Multimodality should not be confined to noncommand user interfaces though;
the present-day interface consisting of a keyboard and mouse constitutes a
bottleneck in the communication from the human to the
computer (Jacob et al. 1993). By utilizing multimodality the bandwidth
of this communication can be increased: the communication can take place in
a more parallel fashion (currently one uses either the keyboard
or the mouse), and the different types of information that are to be
communicated can use the most appropriate communication mode (cf.
table 11). If general spatial positions and areas are
to be communicated, gesture or eye-gaze could be used; commands could be
given by speech, and exact positioning of a cursor could be done with the
keyboard. In the case of an eye-gaze interface, augmenting it with voice
recognition could greatly help solve at least the first and second eye-gaze
control problems: voice commands could function as a mouse button (see
figure 11) and an
eye-gaze clutch (i.e. saying for example "start eye-gaze" or "stop
eye-gaze").
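The voice-operated button and clutch described above can be sketched as a small state holder. This is an illustration only, assuming a recognizer that delivers whole phrases and a tracker that delivers screen coordinates. The class `GazeVoiceController` and the button phrase "select" are hypothetical; only the two clutch phrases come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class GazeVoiceController:
    """Combines eye-gaze pointing with voice commands."""
    gaze_enabled: bool = False           # state of the eye-gaze clutch
    cursor: tuple = (0.0, 0.0)           # last accepted gaze position
    clicks: list = field(default_factory=list)

    def on_gaze(self, x, y):
        """Move the cursor only while the clutch is engaged."""
        if self.gaze_enabled:
            self.cursor = (x, y)

    def on_voice(self, phrase):
        """Interpret a recognized phrase as a clutch or button command."""
        phrase = phrase.strip().lower()
        if phrase == "start eye-gaze":
            self.gaze_enabled = True
        elif phrase == "stop eye-gaze":
            self.gaze_enabled = False
        elif phrase == "select":         # hypothetical "speech mouse button"
            self.clicks.append(self.cursor)
        # Any other phrase is ignored in this sketch.


ctrl = GazeVoiceController()
ctrl.on_gaze(5.0, 5.0)          # ignored: clutch not engaged yet
ctrl.on_voice("start eye-gaze")
ctrl.on_gaze(3.0, 4.0)          # cursor follows the eyes
ctrl.on_voice("select")         # click at the current gaze position
ctrl.on_voice("stop eye-gaze")
ctrl.on_gaze(9.0, 9.0)          # ignored: the user may look around freely
```

In this sketch looking never triggers an action by itself: gaze only moves the cursor while the clutch is engaged, and a selection requires an explicit spoken command.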
Table 11: Communication advantages and disadvantages of the different modes
of communication from human to computer