
6.2 Interest and Emotion Sensitive Media

One future spin-off of eye tracking is "interest and emotion sensitive media" (IES), as suggested in Hansen et al. (1995): an idea that makes audience-directed script navigation possible.

In this section we focus on what IES is and discuss some of the factors that will determine its success. We also try to pin down some problems that we expect to arise with the video technology available today.

6.2.1 Definition of IES

Since this is a new term, we start by defining what we mean by IES, to avoid misunderstandings. IES is of course related to eye tracking, but it is more than that: many other detectable features could be incorporated, such as gestures and body language. We regard IES as a medium that determines and utilizes the user's interests and emotional reactions when she is subjected to visual, auditory or other impressions. Such reactions can be detected through various parameters, as shown in table 12.

Feature                           Voluntary   Time context

Change in pupil size                  -       Short
Altered blinking rate                 -       Short
Head orientation                      +       Short
Sudden head movements                 -       Short
Gestures                              +       Short
Change in heartbeat volume            -       Short
Uneasiness / nervous movements        +       Short
Voice characteristics                 +       Short
Eye contact avoidance                 +       Short
Alterations in pulse rate             -       Long
Breathing rate                        -       Long
Body temperature                      -       Long
Sweat rate                            -       Long

Table 12: Affection measures

Under normal conditions, none of these features is fully controllable by the user, and some are not controllable at all. In the second column of table 12, involuntary features are marked with a - and voluntary ones with a +. The third column divides the features into two further categories: those that can be interpreted from samples taken within a short time interval, and those that require sampling over a long interval.

In a broader perspective we regard IES as an unobtrusive registration and interpretation of these features (it must comply with requirement b). The individual interpretations are then interpreted again as a whole, ending up with a description of the spectator's emotional "status" (see figure 12). This description can be used to find out why the spectator is gazing at a certain area (it could be due to interest, disgust, fear, etc.), and ultimately to determine the future media responses. When more than one user is tracked, the determination could be altered by an "arbiter" module, which could either pick among certain heuristics at random, making IES more fun to use, or be programmed in advance by the users. One heuristic could be "democratic", statistically giving every user the same number of script selections; some heuristics could be "cheating"; and the arbiter could change from one heuristic to another during the film.

Figure 12: IES related to affection feature interpretations. The input from the user through the tracking devices is passed to the input devices, which can manipulate the input according to its type. Some input can be bypassed (such as eye-gaze position), while some needs to be interpreted according to earlier observations (such as the recent eye-gaze movement pattern). These inputs (of different types) are then passed on to the IES device, which makes a selection from the available script database. The navigation directions are then stored in the database to rule out future repetitive path selections. The information is passed on to the navigation module, which controls the IES media unit (typically a CD-i playback unit), thus giving the user feedback. Please notice that when more than one user is tracked, the process inside the interior dotted bounding box should be "duplicated", implying activation of the arbiter module.
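
The "democratic" heuristic of the arbiter module can be illustrated with a minimal sketch. The class name `DemocraticArbiter`, the `choose` method and the tie-breaking rule are our own assumptions for illustration; the text only requires that every user ends up with roughly the same number of script selections.

```python
from collections import Counter


class DemocraticArbiter:
    """Hypothetical arbiter: the viewer who has had her way least often decides."""

    def __init__(self, viewers):
        # Number of branch points at which each viewer got the branch she wanted.
        self.wins = Counter({viewer: 0 for viewer in viewers})

    def choose(self, preferences):
        """preferences maps viewer -> preferred branch; returns the chosen branch."""
        # Pick the viewer with the fewest wins so far (ties broken by name,
        # an arbitrary assumption made only to keep the sketch deterministic).
        decider = min(preferences, key=lambda v: (self.wins[v], v))
        branch = preferences[decider]
        # Every viewer who wanted the chosen branch is credited with a win.
        for viewer, wanted in preferences.items():
            if wanted == branch:
                self.wins[viewer] += 1
        return branch


arbiter = DemocraticArbiter(["ann", "bob", "cat"])
prefs = {"ann": "A", "bob": "B", "cat": "B"}
picks = [arbiter.choose(prefs) for _ in range(4)]
```

With these preferences the minority viewer is not permanently overruled: over four branch points each of the three viewers gets her preferred branch twice. A "cheating" heuristic could be sketched the same way by changing only the rule that picks `decider`.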

Where can IES be applied?

IES can be used for several purposes; some possibilities are:

Recreative viewing:
When applied to films, IES adds many new perspectives to the (television) medium. Being able to select, more or less consciously, the information that you are most interested in gives the viewing experience a completely new dimension.

Commercials:
Both at home and in supermarkets as we know them today, with VCRs playing commercials all day long, the commercials could be designed to adapt to the spectators' likings so that they will give the best impression of the product.
Information browsing systems:
When viewing, say, a new tourist information screen at a tourist office or a railway station, equipped with eye-control and IES, a browsing tourist would be able to find what interests her more naturally than today, where she must depend on the knowledge and preferences of the tourist information staff or an inadequately detailed city map. A tourist might want some information on the theatres in Copenhagen and start searching for this topic in a list sorted by name. In the list of all theatres in Copenhagen she finds the one she is looking for and wants to know where it is located. When the map is displayed she catches a glimpse of a cinema. The system zooms in and presents her with the films the cinema offers. Instead of going to the theatre, she decides to try out this new cinema, which shows eye-gaze controlled films.
Television programmes and educational programmes:
A television programme with a broad target group, such as a wild-life programme, covers many different subjects and therefore never spends as much time on any single subject as an individual viewer might want. If a spectator could decide for herself, she would probably delve deeper into the subjects she likes, instead of being cut off just when the programme was starting to get really interesting. To exemplify: when watching a wild-life film on animals from Africa, you might see an animal that you find very interesting. The IES module detects this interest in changing subject and starts downloading video and sound on the given animal.
Video games:
IES could also influence future video games as multimedia computers and similar devices become more and more common. Today most new video games have a predefined storyboard containing numerous predefined sequences, and the user advances through this storyboard depending on the actions she takes. An example is "Wing Commander 3," a newly released computer game for an ordinary IBM-compatible PC, which is stored on four CD-ROMs. The game includes many interactive video clips and takes place in the future on a large space ship from which you carry out missions against an enemy race. You can speak with most of the characters on the mother ship (simply by clicking on the person) to obtain knowledge that you can use against the unsympathetic pilots. At certain times in the game you are offered the choice of a friendly or a suspicious approach when talking to the other characters. Your actions throughout the game determine the future choices you are offered. The game stars professional actors, hence the production cost: approximately $20,000,000.

Script Construction

Some problems can occur when designing a script for an interactive film. In figure 13 we have made a representation of a simple script to give some examples of problems that can happen if the paths have not been carefully designed.

Figure 13: An example of a film script. Should it be possible to make flashbacks, as the dotted and dashed lines imply? This must depend on the director's opinion.

When only one scene can be displayed to the whole audience (singular viewing), things are relatively simple. When, on the other hand, technology allows us to show a different picture to each individual viewer (parallel viewing), some problems regarding parallel script navigation arise. For instance: should parallel branches share the same real-time length or in-film length? What will happen with script cycles, such as flashbacks (see the dotted and dashed lines in figure 13)? Should they be allowed? We think this is for the director or script manager to decide, as it is one of this new technology's unique features. One large problem we predict is the use of scripts whose branches have different film-time lengths. If spectators are to meet, say at the bottleneck in figure 13, it is necessary to maintain the same length of film-time on all paths. Problems concerning different total film lengths can also be foreseen; these could be solved by run-time insertion of small and still unseen film clips.
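
The requirement that all paths reach a meeting point after the same film-time can be checked mechanically. The following is a minimal sketch under our own assumptions: the script is represented as a dictionary mapping each scene to its duration in seconds and its successor scenes, the script contains no cycles (no flashbacks), and the scene names are invented for the example.

```python
def arrival_times(script, start):
    """Return {scene: set of elapsed film-times at which the scene can begin}.

    script maps scene -> (duration_in_seconds, [successor scenes]).
    Assumes an acyclic script; with flashbacks this would not terminate.
    """
    times = {}
    stack = [(start, 0)]
    while stack:
        scene, elapsed = stack.pop()
        seen = times.setdefault(scene, set())
        if elapsed in seen:
            continue  # this (scene, time) pair has already been expanded
        seen.add(elapsed)
        duration, successors = script[scene]
        for nxt in successors:
            stack.append((nxt, elapsed + duration))
    return times


def unsynchronised_scenes(script, start):
    """Scenes that spectators on different paths would reach at different times."""
    return {scene: sorted(t)
            for scene, t in arrival_times(script, start).items()
            if len(t) > 1}


script = {
    "intro": (60, ["chase", "dialogue"]),
    "chase": (120, ["bottleneck"]),
    "dialogue": (90, ["bottleneck"]),   # 30 seconds shorter than "chase"
    "bottleneck": (30, ["end"]),
    "end": (45, []),
}
problems = unsynchronised_scenes(script, "intro")
```

Here `problems` reports that the bottleneck can be reached after either 150 or 180 seconds, so the shorter branch would need a 30-second filler clip of the kind suggested above.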

We have found one considerable problem concerning parallel viewing: sound. Technology today allows multiple images to be displayed, as described in Arthur (1995). Whereas light does not have the tendency to spread out and bounce back and forth, sound does, which makes it virtually impossible to play back several sound tracks at the same time. The only solution to this problem seems to be headphones, which raises the problem of giving the spectator a feeling of solitary viewing. Of course semi-soundproof headphones could be used, but hearing comments from other spectators who are watching a different sequence could be too large a trade-off. Contradicting this assumption are the very popular voting systems that a few cinemas offer, since they have a very positive effect on the public's interest in the information displayed.

6.2.2 Tracking of the Eye(s)

Some critical issues arise when talking about tracking: How should the system behave when handling more than one person watching the media, and should she/they be made conscious of being tracked all the time or once in a while or not at all? How should it be done?

Single person tracking

Given the technical limitations of current systems, this is the ideal tracking environment today: it is easy to implement and gives the spectator what she wants. Its simplicity makes it well suited for usability labs when testing methods and equipment for future use.

Multiple person tracking

When tracking more than one person some problems come to mind:

  1. When eye tracking a group, for instance parents and their children, should some persons' input to the system be ignored? Some parents might be quite aware of what is inappropriate for their children to watch, and thus select what they think is sound for their children to see, while the children might get angry when their parents interfere with their favourite cartoon show. How should this be solved?
  2. What should happen when a user does not look at the display while selectable areas are active? It could be interpreted as "the user is currently not looking at an activation area" or as "the user is not looking at all." Implementing the latter would cause problems when calculating the average coverage of the activation areas.
  3. When an average has been calculated and a number of groups have spent the same time on each of their areas of interest, what should happen? Should there be a predefined preference in the media, defined by, for example, the director? This would work fine if it were possible to watch a film without being tracked and still get a reasonable impression of it.
  4. Should all persons watching be tracked and a simple average be calculated, so that the selectable area of interest with the longest total fixation time is "selected"? What happens to the person whose interest is overruled by the rest of the spectators? Most likely she will get annoyed, and it will probably spoil the entertainment. Will she get so irritated that she will watch the film again another time, alone?

Concerning interference (point 1), we believe the only tolerable solution is iris pattern recognition. Once an IES system has been initialised for all of its users, problems concerning interference disappear. The solution for missing input (point 2) should depend on field studies, as algorithms for average calculations must be evaluated empirically. The problems in point 3 should be solved by the script manager for the IES media; this solution is the easiest to implement and (hopefully) guarantees the users' satisfaction. The problem of being ignored (point 4) is difficult to solve. Any algorithm that tries to take care of every spectator's preference will be inadequate, since this is a subjective and time-dependent problem.
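
Points 2 to 4 can be made concrete with a small sketch of one possible selection rule. Everything in it is an assumption made for illustration: the function name `select_area`, the `min_coverage` threshold and the rule of falling back to a director-defined default on ties are ours, and, as stated above, the real choice of algorithm must rest on field studies.

```python
def select_area(fixations, window, default, min_coverage=0.25, count_absent=True):
    """Pick the activation area with the longest total fixation time.

    fixations    -- {viewer: {area: seconds fixated during the selection window}}
    window       -- length of the selection window in seconds
    default      -- the director's predefined choice (point 3)
    count_absent -- if True, viewers who looked at no activation area still
                    count in the average coverage (the two readings in point 2)
    """
    viewers = [v for v, areas in fixations.items()
               if count_absent or sum(areas.values()) > 0]
    totals = {}
    for viewer in viewers:
        for area, seconds in fixations[viewer].items():
            totals[area] = totals.get(area, 0.0) + seconds
    if not totals:
        return default
    best = max(totals.values())
    leaders = [area for area, total in totals.items() if total == best]
    # Average share of the window that a counted viewer spent on the leading area.
    coverage = best / (window * len(viewers))
    if len(leaders) > 1 or coverage < min_coverage:
        return default
    return leaders[0]


audience = {
    "ann": {"lion": 2.0, "zebra": 0.5},
    "bob": {"lion": 1.0},
    "cat": {},  # never looked at an activation area
}
strict = select_area(audience, window=5.0, default="narrator")
lenient = select_area(audience, window=5.0, default="narrator", count_absent=False)
```

The two calls differ only in how the viewer without input is treated: counting her lowers the average coverage of "lion" to 0.2, below the assumed threshold, so the default is used, whereas ignoring her raises it to 0.3 and "lion" is selected.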

6.2.3 Tracking IES

The tracking device can be implemented in different ways as already described in section 2. We will now try to describe how we expect IES will be used in the near future.

IES as a concept is intended for use in an interactive multimedia device that lets the spectator select the path through a multiple-path video scenario by using her gaze to point at predefined areas of interest on the displayed media. The predefined areas (called activation areas) are buttons or polygons that enclose an object in the displayed image, making it possible to detect eye gazing at that particular object. A distinction should be made between activation areas and IES links. Activation areas are the regions in which gaze is registered; links describe where to continue in a script when a specific activation area has received the largest number of fixations.

The activation areas were originally not thought to be visible to the user, so that the scenario path selection would become more or less unconsciously controlled. This will be more thoroughly discussed in section 6.2.4. To detect which predefined areas are currently being gazed at, a pick-correlation technique (a method for detecting whether a point is inside or outside a polygon) as described in Foley et al. (1990) should be used.
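
To illustrate how gaze samples could be matched against polygonal activation areas, here is a minimal sketch using the even-odd (ray-casting) point-in-polygon test. Whether this is the exact pick-correlation variant intended above is an assumption on our part, and the function names, area names and coordinates are invented for the example.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule: count crossings of a ray going right from (x, y).

    polygon is a list of (x, y) vertices in order; the last vertex is
    implicitly joined to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def most_fixated_area(gaze_points, areas):
    """Return the name of the area hit by the most gaze points, or None."""
    counts = {name: 0 for name in areas}
    for x, y in gaze_points:
        for name, polygon in areas.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break  # assume activation areas do not overlap
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


areas = {
    "door": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "window": [(10, 0), (14, 0), (12, 5)],
}
gaze = [(1, 1), (2, 3), (12, 1), (20, 20)]
winner = most_fixated_area(gaze, areas)
```

In this example two samples fall inside "door", one inside "window" and one outside both, so the link attached to "door" would be followed.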

When IES becomes available to the public it will probably be implemented as a part of a television combined with a CD-i player. If a system should be able to track multiple possibly-moving persons, (at least) two solutions come to mind:

Using a camera and an infrared beam
The tracking devices (the infrared source and the tracking camera) will be almost hidden from the spectator; most probably the infrared source will be placed on one side of the television and the camera on the other.

The most reasonable solution is to place a movable mirror in front of the camera instead of moving the camera itself (see figure 4). When tracking multiple persons 50 times a second, a mirror is the least expensive component and much more likely to withstand the strain (the rotating mirror technique is already used today in laser printers and in most "laser show" hardware). This camera is used for tracking the orientation of all persons' eyes. Another camera is therefore needed for an overview scan of the persons in front of the eye tracker, to determine how many people to track and where they are situated in the room. Keeping track of each person's body movements, which makes it possible to find each individual's area of interest, is not important if the software simply calculates an average of covered activation areas.

Pros:
The tracking devices are almost entirely hidden. This makes it possible to "forget" that you are being monitored and therefore behave as if you were not being tracked. The user does not have to wear any special hardware, so this solution complies with Hallett's requirement b.
Cons:
If the software is to maintain a database with the preferences of each individual, tracking of each individual person must be implemented as well. This is not a trivial task.

Using high-frequency shutter-glasses
Glasses of this kind let the user see only certain frames from the television, and since other users see other frames with their glasses, they can see different clips on the same television at the same time. Combining a pair of high-frequency shutter-glasses with an eye tracker and built-in IES capacities creates a solution that avoids many of the currently unsolved problems concerning hardware and computer vision (tracking individuals, locating facial features, etc.).

Pros:
If a person does not wear the special glasses she will not interact with the other persons watching and not interfere.
These glasses make tracking of many persons relatively easy.
Viewing multiple films at the same time is also possible.
Cons:
A high-frequency television is needed to display the film.
The user must wear special hardware glasses, so this solution does not comply with Hallett's requirement b.
If a person does not wear the special glasses she will not be able to "just" get a glimpse of what is going on in the film.
It will most likely be impossible to get an impression of "what is on the television."

6.2.4 Displaying Activation Areas

Displaying activation areas is not trivial, since the way in which it is done gives the application a certain feel. We have found four different ways of using activation areas (see figure 14 for examples of these):

Visible Button
This is an ordinary graphical user interface button as seen in Microsoft Windows or the X-window system. The EyeCon is such a button. We believe that usage of such buttons will give the IES system a video game feeling, which might not be intended or wanted. These buttons will need a caption telling the user which options she has, which in most cases will imply that the film is paused for a brief moment if the user is too slow to choose.
Visible Outline
The object's or person's outline is drawn. This can be done using normal lines, dotted or dashed lines, or flashing lines. The problems concerning the "visible buttons" also concern the "outline button," but we think the outlines will be less annoying in use.
Resolution Button
This type of button is visualised by lowering the resolution of the background (blurring it) and using normal resolution on the activation areas. This has been used in adventure games to reduce the time a user has to spend examining new locations and objects. A variant of this button could be to start the clip with the background blurred and the activation areas in high detail, and then after, say, half a second, to update the background gradually until the original images are shown. This approach sounds quite reasonable, but it is not problem free. Cuts between different shots follow each other rapidly; if each shot had to start with a blurred image being updated, it would be very annoying to look at. What should happen when an object or person enters a scene? In other words: how should it be shown that this object or person is in fact an activation area? We believe that this would be very tiring to look at for a long period of time. Another approach using these buttons is to use Markov chains to predict the next clip's most probable activation area, updating this area first, then the next most probable, and so on.

This type of button must not be confused with the multi resolution displays described in 6.1.1, in Stelmach et al. (1991) and in Bolt (1984, p. 62).

Invisible Button
This is the type of button we think will be preferred, as it interferes least of all.

   

   

Figure 14: The four types of IES buttons: First: The visible buttons. Second: The outlined activation areas. Third: The resolution buttons. Last: The invisible buttons (the original image)

We therefore think that the visible buttons will mostly be used in game-like films and the invisible button will be the button to choose when making serious IES films.
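
The Markov-chain idea mentioned for resolution buttons can be sketched as a first-order model that counts which activation area was selected after which. The class name `AreaPredictor`, its methods and the example data are hypothetical; the text above does not specify how the chain should be estimated or used.

```python
from collections import Counter, defaultdict


class AreaPredictor:
    """First-order Markov model over successively selected activation areas."""

    def __init__(self):
        # transitions[a][b] = how often area b was selected right after area a
        self.transitions = defaultdict(Counter)

    def observe(self, sequence):
        """Record one viewing session, given as a list of selected areas."""
        for current, following in zip(sequence, sequence[1:]):
            self.transitions[current][following] += 1

    def refinement_order(self, current):
        """Areas to sharpen first in the next clip, most probable first.

        Returns a list of (area, estimated probability) pairs; it is empty
        when nothing has been observed after `current`.
        """
        counts = self.transitions.get(current, Counter())
        total = sum(counts.values())
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        return [(area, n / total) for area, n in ranked]


predictor = AreaPredictor()
predictor.observe(["lion", "zebra", "lion", "zebra", "lion", "giraffe"])
order = predictor.refinement_order("lion")
```

After this session the model ranks "zebra" ahead of "giraffe" as the area to bring into full resolution first following a selection of "lion", since it followed "lion" in two of three observed cases.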

Using IES

When watching an IES film the user can either be conscious of being tracked or not. We distinguish two cases:

Unconscious tracking:
It is hard to think of an authentic situation where it will be possible to maintain the spectators' unawareness of being tracked.
Conscious tracking:
There are two ways of being conscious of being tracked: either the spectator simply knows she is being tracked, or she is frequently reminded of it, for example by visible activation areas.

It is quite possible that users will act differently when watching an IES film with visible activation areas. Whether the lack of detail becomes a problem depends on two things: a) how the spectators react to it; if they find it really annoying, visible areas should be avoided at all costs; and b) how well the script is prepared; with a well-prepared script the problem will not arise. If activation areas are not displayed, no one will know when a selection is possible and what the choices are, which implies that no one will get the impression of being ignored. These considerations point in the direction that activation areas should not be made visible.

6.2.5 IES Film Production Tools

In this section we describe the tools needed for making IES films. The section ends with a discussion of present problems with the available video compression techniques.

Script planner

A script planner is essential for planning and editing hyper links and for detecting missing hyper links, dead ends and loops in scripts. It could also be used for planning the actual filming, which would make it an essential tool during both pre- and post-production.
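
The three checks named above can be sketched as follows. The representation of a script as a dictionary of scene names and link targets, the function name `check_script` and the example scenes are assumptions for illustration only, not a description of an existing tool.

```python
def check_script(links, start, endings):
    """Report missing links, dead ends and loops in a script.

    links   -- {scene: [scenes that its hyper links lead to]}
    start   -- name of the opening scene
    endings -- set of scenes that are allowed to have no outgoing links
    """
    # A link is "missing" when it points to a scene that was never defined.
    missing = sorted({target for targets in links.values()
                      for target in targets if target not in links})
    # A dead end is a scene without links that is not an intended ending.
    dead_ends = sorted(scene for scene, targets in links.items()
                       if not targets and scene not in endings)

    # Depth-first search from the start; a link back to a scene that is
    # still on the current path closes a loop.
    loops = []
    state = {}  # scene -> "open" (on current path) or "done"

    def visit(scene, path):
        state[scene] = "open"
        for nxt in links[scene]:
            if nxt not in links:
                continue  # already reported as missing
            if state.get(nxt) == "open":
                loops.append(path[path.index(nxt):] + [nxt])
            elif nxt not in state:
                visit(nxt, path + [nxt])
        state[scene] = "done"

    visit(start, [start])
    return {"missing": missing, "dead_ends": dead_ends, "loops": loops}


links = {
    "intro": ["chase", "dialogue"],
    "chase": ["finale"],
    "dialogue": ["intro", "epilogue"],  # flashback link and an undefined scene
    "finale": [],
    "cellar": [],                       # not an intended ending
}
report = check_script(links, "intro", endings={"finale"})
```

For this example the report lists "epilogue" as a missing link target, "cellar" as a dead end and the cycle intro, dialogue, intro as a loop; whether such a loop is an error or a deliberate flashback is left to the director.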

Selector for activation areas

When all the IES hyper links are ready to be implemented, the active areas in each scene must be specified. Tools for this already exist: the image processing units recently developed for making colour films from old black and white films can track the outlines of objects and, to some extent, even detect them automatically in image sequences.

Problems with the present available video compression techniques

As written in Hansen et al. (1995), IES is intended to be implemented using the CD-i environment. CD-i is a good choice of medium since it already provides multimedia capacities (sight, sound and interaction). It supports the MPEG compression method for restoring and playing back compressed video and audio. However, the compression method used is lossy (see Murray & vanRyper (1994) for further specification). This implies loss of detail, and since the video images are divided into small "compressible" areas, the displayed result might consist of mono-coloured square-shaped chunks. The loss of detail might turn out to be fatal, since areas of interest and their details should be clearly visible. To solve this problem, the activation areas must be determined before the video images are compressed into the MPEG format and stored on the CD-i, so that a good quality of these areas can be ensured. The problem with the sometimes quite large mono-coloured squares is made worse by the "nature" of the MPEG compression. Due to the compression method and the choice of compression ratio (typically between 1:16 and 1:200), these squares must be tolerated outside the activation areas. It is hard to tell whether viewers will be able to ignore these badly compressed areas; the answer must depend on field study experience.

How to find activation areas

We have thought of three ways of finding activation areas.

Structure Based
With this approach the activation areas are predetermined by the director of the film. They are chosen so that the message of the film (if any) is expressed. All scenes are planned, and the director can concentrate on what he finds important in the film. The director could get an impression of how well planned his predefined activation areas are by showing a draft of the IES film to an audience. If the attention of the majority is drawn towards the predefined activation areas, the director has succeeded. If the audience's attention is instead attracted to a subject that was not thought of or not wanted, it might indicate that some of the IES film should be restructured or that new material should be filmed.
Audience Based
In this case the activation areas are not planned in advance. Instead, the raw material for the IES film is shown to an audience whose gaze points out what should be activation areas. After this preview, the director or script planner must check the outcome of the automatic activation area detection made by a computer. Today the semantics of a film cannot be analysed entirely by a computer, and we therefore think that the director or script planner will have to establish the links in the IES film. Again, the result can be shown to another audience to get indications of the correctness of the activation areas.

One problem arises, though. When links have not been set up, should the audience be shown all the filmed material? Being a member of the audience could end up being quite a hard job; we hope it will be well paid.

At Random
With this approach you simply put as many hyper-links as possible into the script. This allows the user to spend a lot of time picking up all available information. The approach demands a large amount of material, which might lower the overall quality of the film.

Alterations of the artistic expression

The artistic expression is in most visual media very important. When giving the user the possibility of selecting her own path through a carefully planned scenario, the artistic expression could change radically. Therefore


Authors: Arne John Glenstrup and Theo Engell-Nielsen