One of the future spin-offs of eye tracking is "interest and emotion sensitive media" (IES), as suggested in Hansen et al. (1995), an idea that makes audience-directed script navigation possible.
In this section we focus on what IES is and discuss some of the factors that bear on its success. We also try to pin down some problems we expect to arise with the video technology available today.
Since this is a new term, and to avoid any misunderstandings, we start by defining what we mean by IES. IES is of course related to eye tracking, but it is more than that. Many other detectable features could be incorporated, such as gesturing, body language, etc. We regard IES as a medium that determines and utilizes the user's interests and emotional responses when she is subjected to (visual, auditory or other) impressions. Affection can be detected through various parameters, as shown in table 12.
Feature                           Voluntary   Time context
Change in pupil size              -           Short
Altered blinking rate             -           Short
Head orientation                  +           Short
Sudden head movements             -           Short
Gestures                          +           Short
Change in heartbeat volume        -           Short
Uneasiness / nervous movements    +           Short
Voice characteristics             +           Short
Eye contact avoidance             +           Short
Alterations in pulse rate         -           Long
Breathing rate                    -           Long
Body temperature                  -           Long
Sweat rate                        -           Long
Table 12: Affection measures
Under normal conditions, none of these features is fully controllable by the user, and some are not controllable at all. In the second column of table 12, involuntary features are marked with a - and voluntary ones with a +. The features are also divided by time context: those we can interpret from samples taken within a short time interval and those that demand long sampling intervals (cf. table 12).
In a broader perspective we regard IES as an unobtrusive (it must comply with requirement b) registration and interpretation of these features, all of which are then interpreted as a whole, ending up with a description of the spectator's emotional "status" (see figure 12). This description can then be used to find out why the spectator is gazing at a certain area (it could be due to interest, disgust, fear, etc.), and ultimately to determine the future media responses. When more than one user is tracked, the determination could be altered by an "arbiter" module, which could either apply certain heuristics at random, making IES more fun to use, or be programmed in advance by the users. The arbiter module could be "democratic", statistically giving every user the same number of script selections; some heuristics could be "cheating", and some could change from one heuristic to another during the film.
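The "democratic" arbiter heuristic can be illustrated with a minimal sketch. The class name, the method name and the tie-breaking rule below are our own assumptions for illustration, not part of any existing IES implementation: at each branch point the user who has so far decided the fewest branch points gets her preferred branch.

```python
from collections import Counter


class DemocraticArbiter:
    """Hypothetical arbiter: evens out the number of script selections per user."""

    def __init__(self, user_ids):
        # Number of branch points each user has decided so far.
        self.wins = Counter({user: 0 for user in user_ids})

    def choose(self, preferences):
        """Pick the branch for one branch point.

        `preferences` maps user id -> the branch that user's gaze selected.
        The user with the fewest earlier wins decides; ties go to the user
        listed first in `preferences` (an arbitrary assumption).
        """
        winner = min(preferences, key=lambda user: self.wins[user])
        self.wins[winner] += 1
        return winner, preferences[winner]
```

With two users who always disagree, successive calls to `choose` alternate between them, so over a whole film each user decides about half of the branch points. A "cheating" heuristic could be sketched the same way by biasing the key function.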
Figure 12: IES related to affection feature interpretations. The input from the user through the tracking devices is fed into the input devices, which can manipulate the input according to its type. Some input can be passed on unchanged (such as eye-gaze position), some needs to be interpreted according to earlier observations (such as recent eye-gaze movement pattern). These inputs (of different types) are then passed on to the IES device, which makes a selection from the available script database. The navigation directions are then stored in the database to rule out future repetitive path selections. The information is passed on to the navigation module, which controls the IES media unit (typically a CD-i playback unit), thus giving the user feedback. Please notice that when more than one user is tracked, the process inside the interior dotted bounding box should be "duplicated", implying activation of the arbiter module.
IES can be used for several purposes; some possibilities are:
Some problems can occur when designing a script for an interactive film. In figure 13 we have made a representation of a simple script to give some examples of problems that can happen if the paths have not been carefully designed.
Figure 13: An example of a film script. Should it be possible to make "flashbacks" as the dotted and dashed lines imply? This must depend on the director's opinion.
When only one scene can be displayed to the whole audience (singular viewing), things are relatively simple. When, on the other hand, the technology allows us to show a different picture to each individual viewer (parallel viewing), some problems regarding parallel script navigation arise. For instance: Should parallel branches share the same real-time length or in-film length? What will happen with script cycles, such as flashbacks (see the dotted and dashed lines in figure 13)? Should they be allowed? We think this is for the director or script manager to decide, as it is one of this new technology's unique features. One large problem we predict arises when scripts use branches with different film-time lengths. If spectators are to meet, say at the bottleneck in figure 13, it is necessary to maintain the same length of film-time on all paths. Problems concerning different total film lengths can also be foreseen; these could be solved by run-time insertion of small and still unseen film clips.
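The arithmetic behind keeping parallel branches the same length can be sketched as follows. This is only an illustration under our own assumptions: branch and clip lengths are given in seconds, the function names are hypothetical, and fillers are picked with a simple longest-first greedy rule that does not always fill the gap exactly.

```python
def padding_needed(branch_lengths):
    """Return, per branch, the seconds missing relative to the longest branch."""
    longest = max(branch_lengths.values())
    return {name: longest - length for name, length in branch_lengths.items()}


def pick_fillers(gap, unseen_clips):
    """Greedily choose still unseen filler clips that fit into `gap` seconds.

    `unseen_clips` maps clip name -> length in seconds. Returns the chosen
    clip names and the part of the gap that is still left unfilled.
    """
    chosen = []
    # Try the longest clips first so that few insertions are needed.
    for name, length in sorted(unseen_clips.items(), key=lambda item: -item[1]):
        if length <= gap:
            chosen.append(name)
            gap -= length
    return chosen, gap
```

For three branches of 120, 95 and 110 seconds leading to the same bottleneck, `padding_needed` reports gaps of 0, 25 and 10 seconds. Any remainder returned by `pick_fillers` would have to be absorbed in some other way, which is a decision for the script manager.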
We have found one considerable problem concerning parallel viewing: sound. Technology today allows the display of multiple images, as described in Arthur (1995). Whereas light does not tend to spread out and bounce back and forth, sound does, making it virtually impossible to play back several sound tracks at the same time. The only solution to this problem seems to be headphones, which raises the problem of giving the spectator a feeling of solitary viewing. Semi-soundproof headphones could of course be used, but hearing the comments of other spectators at times when they are not viewing the same sequence could prove too large a trade-off. In contradiction to this postulate stand the very popular voting systems that a few cinemas offer, since they have a very positive effect in increasing the public's interest in the information displayed on the media.
Some critical issues arise when talking about tracking: How should the system behave when more than one person is watching the media, and should she or they be made conscious of being tracked all the time, once in a while, or not at all? How should it be done?
Today this is the ideal tracking environment, given the technical limitations of current systems; it is both easy to implement and gives the spectator what she wants. The simplicity makes it very well suited for use in usability labs when testing methods and equipment for future usage.
When tracking more than one person some problems come to mind:
Concerning interference (point 1) we believe the only tolerable solution is iris pattern recognition. When an IES system has been initialised according to all of its users, problems concerning interference disappear. The solution for missing input (point 2) should depend on field studies, as algorithms for average calculations must be evaluated empirically. The problems in point 3 should be solved by the script manager for the IES media; this solution is the easiest to implement and (hopefully) guarantees the users' satisfaction. The problem concerning being ignored (point 4) is difficult to solve. An algorithm taking care of every spectator's preference will be inadequate, since this is a subjective and temporally dependent problem.
The tracking device can be implemented in different ways as already described in section 2. We will now try to describe how we expect IES will be used in the near future.
IES as a concept is intended for use in an interactive multimedia device that lets the spectator select the path through a multiple-path video scenario, using her gaze to point at predefined areas of interest on the visualized media. The predefined areas (called activation areas) are buttons or polygons that enclose an object on the displayed image, making it possible to detect eye gazing at that particular object. A distinction should be made between activation areas and IES links. Activation areas are there to be looked at so that one's eyes can be tracked; links are descriptions of where to continue in a script when a specific activation area has received the largest number of fixations.
The activation areas were originally not thought to be visible to the user, so that the scenario path selection would become more or less unconsciously controlled. This will be more thoroughly discussed in section 6.2.4. To detect which predefined areas are currently being gazed at, a pick-correlation technique (a method for detecting whether a point is inside or outside a polygon) as described in Foley et al. (1990) should be used.
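To make the pick-correlation step concrete, the sketch below tests fixation points against polygonal activation areas and reports the area with the most fixations. It uses the common even-odd (ray casting) rule, which is one standard way of doing a point-in-polygon test and not necessarily the exact formulation given in Foley et al. (1990); the function names and the data layout are our own assumptions.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule: cast a ray to the right of (x, y) and count edge crossings.

    `polygon` is a list of (x, y) vertices in drawing order.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def most_fixated_area(fixations, areas):
    """Return the name of the activation area hit by the most fixations.

    `fixations` is a list of (x, y) gaze points; `areas` maps an area name
    to its polygon. Returns None if no fixation hit any area.
    """
    counts = {name: 0 for name in areas}
    for x, y in fixations:
        for name, polygon in areas.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None
```

The name returned by `most_fixated_area` would then be looked up among the IES links of the current scene to find where the script continues. Points lying exactly on a polygon edge are classified arbitrarily by this rule, which should be harmless given the limited accuracy of gaze data.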
When IES becomes available to the public it will probably be implemented as part of a television combined with a CD-i player. If a system is to track multiple, possibly moving persons, (at least) two solutions come to mind:
The most reasonable solution is to have a mirror in front of the camera instead of moving the camera itself (see figure 4). When tracking multiple persons 50 times a second, the mirror is the least expensive item and much more likely to withstand the strain (the rotating mirror technique is already used today in laser printers and in most "laser show" hardware). This camera is used for tracking the orientation of all persons' eyes. Another camera is therefore needed for an overview scan of the persons in front of the eye tracker, to determine how many people to track and where they are situated in the room. Keeping track of the persons' body movements, which makes it possible to find each individual's area of interest, is not important if the software simply calculates an average of covered activation areas.
Displaying activation areas is not trivial, since the way in which it is done gives the application a certain appeal. We have found four different ways of using activation areas (see figure 14 for examples of these):
This type of button must not be confused with the multi resolution displays described in 6.1.1, in Stelmach et al. (1991) and in Bolt (1984, p. 62).
Figure 14: The four types of IES buttons:
First: The visible buttons.
Second: The outlined activation areas.
Third: The resolution buttons.
Last: The invisible buttons (the original image)
We therefore think that the visible buttons will mostly be used in game-like films and the invisible button will be the button to choose when making serious IES films.
When watching an IES film the user can either be conscious of being tracked or not. The unconsciousness can be treated at two levels:
It is most likely that users will act differently when watching an IES film with visible activation areas. Whether the lack of detail becomes a problem depends on two things: a) How the spectators react to it. If the spectators find it really annoying, visible areas should be avoided at all costs. b) How well the script is prepared. If the script is well prepared, the problem disappears. If activation areas are not displayed, no one will know when a selection is possible or what the choices are, implying that no one will get the impression of being ignored. These considerations point in the direction that activation areas should not be made visible.
In this section we describe the selection of tools needed for making IES films. The section ends with a discussion of present problems with the available video compression techniques.
A script planner is essential to plan and edit hyper links, detect missing hyper links, dead ends or loops in scripts. It could be used also for planning the actual filming. It would be an essential tool during pre- and post-production.
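The checks such a script planner performs can be sketched on a simple graph representation of the script. Everything in the sketch is an assumption made for illustration: scenes are named by strings, `links` maps each scene to the scenes its hyper links lead to, and `endings` names the scenes that are allowed to have no outgoing links.

```python
def check_script(links, start, endings):
    """Report missing hyper links, dead ends and loops in a script graph.

    `start` must be a key of `links`. Loops are searched only among scenes
    reachable from `start`, using a depth-first traversal.
    """
    # Link targets that are not defined as scenes at all.
    missing = sorted({t for targets in links.values() for t in targets
                      if t not in links})
    # Scenes without outgoing links that are not intended endings.
    dead_ends = sorted(scene for scene, targets in links.items()
                       if not targets and scene not in endings)

    unvisited, in_progress, done = 0, 1, 2
    state = {scene: unvisited for scene in links}
    loops = []

    def visit(scene, path):
        state[scene] = in_progress
        path.append(scene)
        for target in links[scene]:
            if target not in links:
                continue  # already reported as a missing link
            if state[target] == in_progress:
                # The target is on the current path, so this link closes a loop.
                loops.append(path[path.index(target):] + [target])
            elif state[target] == unvisited:
                visit(target, path)
        path.pop()
        state[scene] = done

    visit(start, [])
    return {"missing": missing, "dead_ends": dead_ends, "loops": loops}
```

Whether a reported loop is an error or an intended flashback is for the director to decide; the sketch only makes loops visible. A very long script would call for an iterative traversal rather than recursion.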
When all the IES hyper links are ready to be implemented, the active areas in each scene must be specified. Tools for this are already available thanks to the recently developed image processing units used for making colour films from old black and white films, which are capable of tracking outlines of objects and, to some extent, even of detecting them automatically in image sequences.
As written in Hansen et al. (1995), IES is thought to be implemented using the CD-i environment. The CD-i is a good choice of media since it already provides multimedia capacities (sight, sound and interaction). It supports the MPEG compression method for restoring and playing back compressed video and audio. But the compression method used is lossy (see Murray & vanRyper (1994) for further specification). This implies loss of detail, and since the video images are divided into small "compressible" areas, the displayed result might consist of mono-coloured square-shaped chunks. The loss of detail might turn out to be fatal, since areas of interest and their details should be clearly visible. To solve this problem the activation areas must be determined before the video images are compressed into the MPEG format and stored on the CD-i, to ensure a good quality of these areas. The problem with the sometimes quite large mono-coloured squares is made worse by the "nature" of the MPEG compression. Due to the compression method and the choice of compression ratio (typically between 1:16 and 1:200), these squares must be tolerated outside the activation areas. It is hard to tell whether these badly compressed areas can be ignored; the answer must depend on field study experience.
We have thought of three ways of finding activation areas.
One problem arises, though. When links have not been set up, should the audience be shown all the filmed material? It could end up being quite a hard job being a member of the audience. We hope it will be well paid.
Artistic expression is very important in most visual media. When the user is given the possibility of selecting her own path through a carefully planned scenario, the artistic expression could change radically. Therefore