Human-Computer Interfaces: Concepts and Examples in Java


                                      Analysis of Multimodal Behavior

DEA in Cognitive Science

                                  Jean-Claude MARTIN
                      Associate Professor (Maître de Conférences)
                    LIMSI-CNRS / LINC-IUT de Montreuil
                                       martin @
Development cycle

Generally 4 phases
  Requirements analysis
  Rapid prototyping
  Usability evaluation

Analysis tools

Wizard interface /
Subject interface
SRI (Kehler et al. 98)

 Subject’s Side

Subject: study pen & voice in unconstrained
Non Operational Input Interface
       • Audio input saved, but not processed
       • Pen drawings transmitted to Wizard, but not processed
       • Exception: icon selection (point, circle) enabled

Fully Operational Output (Multimedia)
       • Photos, videos, audio and displayed information are available

Wizard’s Side

Wizard: study pen, voice & GUI using real system
Fully Operational System
        • Can use any available modality or combination of modalities

Shared Subject’s screen PLUS features
        • Advanced GUI (Databases query, Map navigation, etc…)
        • Clock in order to answer within an acceptable amount of time
        • Additional panel to send audio messages to subject

The Wizard is THE EXPERT
        • Captures user intention & decides system response

The problem:
Analyzing Subjects’ Data

[Figure: only the label “Pen Files” is recoverable]

The goal:
building the real system

How to analyze the multimodal behavior ?
How to specify a multimodal system ?
Find a common grid ?
A survey of multimodal
user studies
Study               Application                   Real / simulated      Input                                        Output
Guyomard et al. 95  touristic map                 simulated, then real  speech, tactile screen                       speech, graphics, text
Oviatt et al. 97    real estate, edition task     simulated             speech, pen                                  speech, graphics
Denda et al. 97     touristic map                 real                  speech, tactile screen (pointing)            speech, graphics
Trafton et al. 97   touristic map, hotel booking  simulated             keyboard ("natural language"), mouse, menus  graphics
Fais 97             map                           real                  speech, keyboard, tactile screen             speech, graphics, text

What are the monomodal
features of user’s behavior ?

Categories of words and gestures
    Guyomard et al. 95: pointing, lines, areas, contours,
     closed or open
    Oviatt et al. 97: composed circle-line-circle
    Mignot et al. 96: several fingers (rotation)

Monomodal behavior changed ?
    Oviatt et al. 97: fewer spoken disfluencies with
Does the subject use
complementarity ?

    different chunks of information belonging to the same
     command are transmitted on different modalities
    “is this a Chinese restaurant” + circling gesture
    Different patterns (Guyomard et al. 95)
       • Are there any beaches in this locality ? + <pointing>
       • What are the camping sites at + <pointing>

Does the subject use
redundancy ?

    the same chunk of information is transmitted on
     several modalities

    “is Tiger Lily’s a Chinese restaurant”
     + circling gesture around Tiger Lily’s restaurant

Does the subject use
redundancy ?

       • Oviatt et al. 97: seldom (2% of commands)
       • Mignot et al. 96: often with continuous gestures
       • Petrelli et al. 97: often with short labels

What is missing ?
       • continuum between redundancy and complementarity, saliency
       • impact of graphical output on speech and gesture (2 maps in
         Oviatt et al. 97)
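
One way to approach the continuum between redundancy and complementarity is to record, for each chunk of information in a command, which modalities carried it. The Java sketch below is only an illustration under that assumption. The class `ChunkCooperation`, its method names, and the chunk labels are hypothetical and do not come from the studies cited.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: each chunk of a command is mapped to the modalities that carried it. */
class ChunkCooperation {

    /** Chunks transmitted on more than one modality, i.e. redundant chunks. */
    static Set<String> redundantChunks(Map<String, Set<String>> modalitiesByChunk) {
        Set<String> redundant = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : modalitiesByChunk.entrySet()) {
            if (e.getValue().size() > 1) {
                redundant.add(e.getKey());
            }
        }
        return redundant;
    }

    /** True when at least two different modalities each carry some chunk on their own. */
    static boolean usesComplementarity(Map<String, Set<String>> modalitiesByChunk) {
        Set<String> soloModalities = new HashSet<>();
        for (Set<String> modalities : modalitiesByChunk.values()) {
            if (modalities.size() == 1) {
                soloModalities.addAll(modalities);
            }
        }
        return soloModalities.size() >= 2;
    }

    /** Share of redundant chunks: 0.0 means no chunk is repeated across modalities, 1.0 means all are. */
    static double redundancyRatio(Map<String, Set<String>> modalitiesByChunk) {
        if (modalitiesByChunk.isEmpty()) {
            return 0.0;
        }
        return (double) redundantChunks(modalitiesByChunk).size() / modalitiesByChunk.size();
    }

    public static void main(String[] args) {
        // "is this a Chinese restaurant" + circling gesture: the object is given by gesture only.
        Map<String, Set<String>> command = new LinkedHashMap<>();
        command.put("question: Chinese restaurant?", new TreeSet<>(Arrays.asList("speech")));
        command.put("object: restaurant", new TreeSet<>(Arrays.asList("gesture")));
        System.out.println(usesComplementarity(command)); // true

        // "is Tiger Lily's a Chinese restaurant" + circling gesture: the object is given twice.
        command.put("object: restaurant", new TreeSet<>(Arrays.asList("speech", "gesture")));
        System.out.println(redundantChunks(command)); // [object: restaurant]
        System.out.println(redundancyRatio(command)); // 0.5
    }
}
```

A ratio rather than a yes/no label is used here because it gives one possible scale between the two extremes. Saliency is not modeled in this sketch.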

Complementarity / redundancy
Temporal relation

       • Oviatt et al. 97: pen (writing, drawing) often before speech
       • Mignot et al. 96: no obvious systematic temporal relation
       • Catinis et al. 95: temporal coincidence often observed

Not possible to generalize
What is missing ?
       • Why such differences ? Media, task, users …
       • distinction between complementarity and redundancy
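
The temporal relations reported above (pen before speech, temporal coincidence, no systematic relation) can be measured once each annotated segment carries a start and an end time. The Java sketch below is a minimal illustration, not the procedure used in the cited studies. The class `TemporalRelation`, the `Segment` type, the millisecond unit, and the three-way classification are all assumptions.

```java
/** Hypothetical sketch: classifies how a gesture segment relates in time to a speech segment. */
class TemporalRelation {

    /** Possible outcomes; the three-way split is an assumption made for illustration. */
    enum Relation { GESTURE_BEFORE_SPEECH, OVERLAP, GESTURE_AFTER_SPEECH }

    /** An annotated segment with start and end times in milliseconds (assumed unit). */
    static final class Segment {
        final long start;
        final long end;

        Segment(long start, long end) {
            if (end < start) {
                throw new IllegalArgumentException("end must not precede start");
            }
            this.start = start;
            this.end = end;
        }
    }

    /** Compares the two intervals; segments that merely touch count as sequential. */
    static Relation classify(Segment gesture, Segment speech) {
        if (gesture.end <= speech.start) {
            return Relation.GESTURE_BEFORE_SPEECH;
        }
        if (gesture.start >= speech.end) {
            return Relation.GESTURE_AFTER_SPEECH;
        }
        return Relation.OVERLAP;
    }

    public static void main(String[] args) {
        Segment speech = new Segment(1000, 2500);
        System.out.println(classify(new Segment(200, 900), speech));   // GESTURE_BEFORE_SPEECH
        System.out.println(classify(new Segment(2000, 3000), speech)); // OVERLAP
        System.out.println(classify(new Segment(2600, 3200), speech)); // GESTURE_AFTER_SPEECH
    }
}
```

Counting these outcomes separately for complementary and for redundant commands would be one way to address the missing distinction noted above.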

Does the subject use
equivalence ?

    the user tries several ways of achieving the same
    equivalence does not mean equality !

    speech : scroll the map to the left
    gesture : arrow towards the left
    complementary or redundant combination
Does the subject use
equivalence ?

       • modality as a function of types of commands

What is missing ?
       • Rating the equivalence behavior of the users
         Does she switch between modalities ?
         That is useful for system implementation

Does the subject use
concurrency ?

       • independent chunks of information are transmitted on several
         modalities and overlap in time

       • once in (Mignot et al. 96)
       • once at SRI (moving a window)

What is missing ?
       • Why ?

Does the subject use
specialization ?
    a specific type of information is always transmitted on
     the same modality

    Mignot et al. 96: 2 subjects preferred speech only;
     2 subjects used continuous gestures only for moving actions

What is missing ?
    distinction between sub-types of specialization
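
Specialization as defined above can be checked mechanically over a list of observations. The Java sketch below is a hedged illustration: the class `SpecializationCheck`, the representation of an observation as a two-element array (information type, modality), and the sample labels are assumptions, not part of the cited studies.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: finds information types that are always sent on one modality. */
class SpecializationCheck {

    /**
     * Each observation is {informationType, modality} (assumed layout).
     * Returns the specialized types together with their single modality.
     */
    static Map<String, String> specializedTypes(List<String[]> observations) {
        Map<String, Set<String>> seen = new TreeMap<>();
        for (String[] obs : observations) {
            seen.computeIfAbsent(obs[0], k -> new TreeSet<>()).add(obs[1]);
        }
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : seen.entrySet()) {
            if (e.getValue().size() == 1) {
                result.put(e.getKey(), e.getValue().iterator().next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> observations = Arrays.asList(
                new String[] {"move", "gesture"},
                new String[] {"move", "gesture"},
                new String[] {"query", "speech"},
                new String[] {"query", "gesture"});
        System.out.println(specializedTypes(observations)); // {move=gesture}
    }
}
```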

Example of transcription

  INP. Speech    Is this a Chinese restaurant?
  INP. Gesture   Circles around the selected restaurant (gesture after speech)
  OUT. Graph     Textual description displayed
  ANALYSIS       Information about selected object: partial redundancy
                    speech (this),
                    graphical context,
                    gesture (circle),
                    gesture after speech

Multimodal Metrics

\[
\text{equivalence} = \frac{\sum_{C_j} \mathrm{equivalent}(C_j)}{\lvert C_j \rvert}
\]

\[
\text{compl./redund.} = \frac{\sum_{C_j} \sum_{r_k \in R(C_j)} \mathrm{salience}(C_j, r_k)}{\lvert R(C_j) \rvert}
\]
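
The two ratios above can be sketched in Java as plain averages. This is an illustration under stated assumptions, not the author's implementation: the denominators are read as counts (number of commands, total number of references), `equivalent(C_j)` is treated as a numeric score supplied by the annotator, and the class `MultimodalMetrics` with its `Command` type is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the two ratios; data layout and normalization are assumptions. */
class MultimodalMetrics {

    /** One annotated command C_j. */
    static final class Command {
        final double equivalent;  // assumed annotator score for equivalent(C_j), e.g. 0 or 1
        final double[] saliences; // assumed salience(C_j, r_k), one value per reference r_k in R(C_j)

        Command(double equivalent, double... saliences) {
            this.equivalent = equivalent;
            this.saliences = saliences.clone();
        }
    }

    /** Sum of equivalent(C_j) divided by the number of commands. */
    static double equivalence(List<Command> commands) {
        if (commands.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (Command c : commands) {
            sum += c.equivalent;
        }
        return sum / commands.size();
    }

    /** Sum of all salience values divided by the total number of references (assumed reading). */
    static double complRedund(List<Command> commands) {
        double sum = 0.0;
        int references = 0;
        for (Command c : commands) {
            for (double s : c.saliences) {
                sum += s;
            }
            references += c.saliences.length;
        }
        return references == 0 ? 0.0 : sum / references;
    }

    public static void main(String[] args) {
        List<Command> corpus = Arrays.asList(
                new Command(1.0, 1.0, 0.5),
                new Command(0.0, 0.5, 0.0));
        System.out.println(equivalence(corpus)); // 0.5
        System.out.println(complRedund(corpus)); // 0.5
    }
}
```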

Figure 1: Example of the XML annotation of a sample command observed in the SRI corpus (Cheyer et al. 1998).

Figure 2: The “referenceable objects” section of a multimodal annotation.

Figure 3: A speech segment ("Senator dinner ... ? can I eat a hamburger there ?") which contains two references to the object

Figure 4: A gesture segment including a reference to the object rest1.

   STEP 1: Parse the XML file
   See the article

 Build the document tree out of the XML file
 Build Java representation of referenceable objects (Figure 5) and
  references (Figure 6).
 Build the table associating each (object, reference) pair with a
  salience value (Figure 7). These values are computed according to
  pre-defined salience rules such as “if the reference contains the full
  name of this object, set the salience of this object in this reference to
  1.0”. These rules are expected to depend on the corpus at hand.
 Build the table computing the average salience values for all the
  references in the different modalities within the same multimodal
  segment (Figure 8).
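
The steps above can be sketched with the standard Java DOM API. This is a minimal illustration, not the code from the article. The XML vocabulary (`segment`, `object`, `reference` and the attributes `id`, `name`, `modality`, `object`) is invented for the example, since the actual annotation format appears only in the figures. The case-insensitive match and the fallback value `DEFAULT_SALIENCE` are also assumptions. Only the full-name rule comes from the text.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch of the listed steps; the XML element and attribute names are invented. */
class SalienceSketch {

    /** Assumed fallback when the full-name rule does not fire (not from the source). */
    static final double DEFAULT_SALIENCE = 0.5;

    /** Builds the document tree out of an XML string. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse annotation", e);
        }
    }

    /** Maps each referenceable object id to its full name. */
    static Map<String, String> objectNames(Document doc) {
        Map<String, String> names = new LinkedHashMap<>();
        NodeList objects = doc.getElementsByTagName("object");
        for (int i = 0; i < objects.getLength(); i++) {
            Element o = (Element) objects.item(i);
            names.put(o.getAttribute("id"), o.getAttribute("name"));
        }
        return names;
    }

    /** Rule quoted in the text: full name present gives 1.0; otherwise the assumed default. */
    static double salience(String referenceText, String objectName) {
        return referenceText.toLowerCase().contains(objectName.toLowerCase())
                ? 1.0 : DEFAULT_SALIENCE;
    }

    /** Averages the salience of all references of a segment, per modality. */
    static Map<String, Double> averageByModality(Document doc) {
        Map<String, String> names = objectNames(doc);
        Map<String, double[]> sums = new LinkedHashMap<>(); // modality -> {sum, count}
        NodeList refs = doc.getElementsByTagName("reference");
        for (int i = 0; i < refs.getLength(); i++) {
            Element r = (Element) refs.item(i);
            String name = names.get(r.getAttribute("object"));
            if (name == null) {
                continue; // reference to an undeclared object: skipped in this sketch
            }
            double[] acc = sums.computeIfAbsent(r.getAttribute("modality"), k -> new double[2]);
            acc[0] += salience(r.getTextContent(), name);
            acc[1] += 1;
        }
        Map<String, Double> averages = new LinkedHashMap<>();
        sums.forEach((modality, acc) -> averages.put(modality, acc[0] / acc[1]));
        return averages;
    }

    public static void main(String[] args) {
        String xml = "<segment>"
                + "<object id='rest1' name='Tiger Lily'/>"
                + "<reference modality='speech' object='rest1'>is Tiger Lily's a Chinese restaurant</reference>"
                + "<reference modality='gesture' object='rest1'>circle</reference>"
                + "</segment>";
        System.out.println(averageByModality(parse(xml))); // {speech=1.0, gesture=0.5}
    }
}
```

Keeping the rule in a single method reflects the remark that salience rules depend on the corpus: a different corpus would replace `salience` without touching the parsing or averaging code.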

