Human-Machine Interfaces: Concepts and Examples in Java


                                  Analysis of Multimodal Behavior

DEA in Cognitive Science

                                  Jean-Claude MARTIN
                                  Associate Professor (Maître de Conférences)
                    LIMSI-CNRS / LINC-IUT de Montreuil
                                       martin @
Development cycle

Usually 4 phases
  • Requirements analysis
  • Rapid prototyping
  • Usability evaluation

               Jean-Claude MARTIN - LIMSI / LINC   2
Analysis tools

Wizard interface /
Subject interface
SRI (Kehler et al. 98)

Subject’s Side

Subject: study pen & voice in unconstrained
Non-Operational Input Interface
       • Audio input saved, but not processed
       • Pen drawings transmitted to Wizard, but not processed
       • Exception: icon selection (point, circle) enabled

Fully Operational Output (Multimedia)
       • Photos, videos, audio and displayed information are available

Wizard’s Side

Wizard: study pen, voice & GUI using real system
Fully Operational System
        • Can use any available modality or combination of modalities

Shared Subject’s screen PLUS features
        • Advanced GUI (database queries, map navigation, etc.)
        • Clock in order to answer within an acceptable amount of time
        • Additional panel to send audio messages to subject

The Wizard is THE EXPERT
        • Captures user intention & decides system response

The problem:
Analyzing Subjects’ Data

[Figure: Pen Files]

The goal:
building the real system

How should multimodal behavior be analyzed?
How should a multimodal system be specified?
Is there a common grid?
A survey of multimodal
user studies
Study                Application                  Real / simulated       Input                                          Output
Guyomard et al. 95   tourist map                  simulated, then real   speech, tactile screen                         speech, graphics, text
Oviatt et al. 97     real estate, editing task    simulated              speech, pen                                    speech, graphics
Denda et al. 97      tourist map                  real                   speech, tactile screen (pointing)              speech, graphics
Trafton et al. 97    tourist map, hotel booking   simulated              keyboard ("natural language"), mouse, menus    graphics
Fais 97              map                          real                   speech, keyboard, tactile screen               speech, graphics, text

What are the monomodal
features of the user’s behavior?

Categories of words and gestures
    • Guyomard et al. 95: pointing, lines, areas, contours,
      closed or open
    • Oviatt et al. 97: composed circle-line-circle
    • Mignot et al. 96: several fingers (rotation)

Does monomodal behavior change?
    • Oviatt et al. 97: fewer spoken disfluencies with
Does the subject use
complementarity?

    • different chunks of information belonging to the same
      command are transmitted on different modalities
    • “is this a Chinese restaurant” + circling gesture
    • Different patterns (Guyomard et al. 95)
       • “Are there any beaches in this locality?” + <pointing>
       • “What are the camping sites at” + <pointing>

Does the subject use
redundancy?

    • the same chunk of information is transmitted on
      several modalities

    • “is Tiger Lily’s a Chinese restaurant”
      + circling gesture around Tiger Lily’s restaurant
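
The difference between complementarity and redundancy can be made operational by comparing the chunks of information that each modality carries. The following is a minimal sketch, not the author's tool: the class name, the two-valued enum and the chunk labels (such as "object:rest1") are assumptions made for illustration. A command counts as redundant as soon as one chunk is transmitted on both modalities, and as complementary when no chunk is shared.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper, not part of the original slides. */
final class CooperationClassifier {

    enum Cooperation { COMPLEMENTARITY, REDUNDANCY }

    /**
     * Compares the information chunks carried by speech and by gesture.
     * Chunk labels such as "object:rest1" are illustrative assumptions.
     */
    static Cooperation classify(Set<String> speechChunks, Set<String> gestureChunks) {
        Set<String> shared = new HashSet<>(speechChunks);
        shared.retainAll(gestureChunks);   // chunks transmitted on both modalities
        return shared.isEmpty() ? Cooperation.COMPLEMENTARITY : Cooperation.REDUNDANCY;
    }

    public static void main(String[] args) {
        // "is this a Chinese restaurant" + circling: only the gesture identifies the object.
        System.out.println(classify(Set.of("query:cuisine"), Set.of("object:rest1")));
        // "is Tiger Lily's a Chinese restaurant" + circling it: the object is given twice.
        System.out.println(classify(Set.of("query:cuisine", "object:rest1"),
                                    Set.of("object:rest1")));
    }
}
```

A two-valued result is the simplest choice; a finer scale would be needed to capture partial redundancy.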

Does the subject use
redundancy ?

       • Oviatt et al. 97: seldom (2% of commands)
       • Mignot et al. 96: often with continuous gestures
       • Petrelli et al. 97: often with short labels

What is missing ?
       • continuum between redundancy and complementarity, saliency
       • impact of graphical output on speech and gesture (2 maps in
         Oviatt et al. 97)

Complementarity / redundancy
Temporal relation

       • Oviatt et al. 97: pen (writing, drawing) often before speech
       • Mignot et al. 96: no obvious systematic temporal relation
       • Catinis et al. 95: temporal coincidence often observed

Not possible to generalize
What is missing ?
       • Why such differences ? Media, task, users …
       • distinction between complementarity and redundancy
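
To compare such observations across corpora, the temporal relation between a gesture and a speech segment has to be coded the same way everywhere. The sketch below shows one possible coding, under stated assumptions: timestamps are in milliseconds, and a modality counts as "first" only when it ends before the other one starts. Any other configuration is coded as overlap (temporal coincidence). The class and enum names are hypothetical.

```java
/** Hypothetical coding of the temporal relation between two annotated segments. */
final class TemporalRelation {

    enum Relation { GESTURE_FIRST, OVERLAP, SPEECH_FIRST }

    /** All times are in milliseconds from the start of the recording (assumed unit). */
    static Relation relate(long gestureStart, long gestureEnd,
                           long speechStart, long speechEnd) {
        if (gestureEnd < speechStart) {
            return Relation.GESTURE_FIRST;   // gesture finished before speech began
        }
        if (speechEnd < gestureStart) {
            return Relation.SPEECH_FIRST;    // speech finished before gesture began
        }
        return Relation.OVERLAP;             // the two intervals share some time
    }

    public static void main(String[] args) {
        // A circling gesture drawn while the question is still being spoken.
        System.out.println(relate(300, 900, 500, 1500));
    }
}
```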

Does the subject use
equivalence?

    • the user tries several ways of achieving the same
    • equivalence does not mean equality!

    • speech: scroll the map to the left
    • gesture: arrow towards the left
    • complementary or redundant combination
Does the subject use
equivalence ?

       • modality as a function of types of commands

What is missing ?
       • Rating the equivalence behavior of the users
         Does she switch between modalities?
         That is useful for system implementation

Does the subject use
concurrency ?

       • independent chunks of information are transmitted on several
         modalities and overlap in time

       • once in (Mignot et al. 96)
       • once at SRI (moving a window)

What is missing ?
       • Why ?

Does the subject use
specialization?
    • a specific type of information is always transmitted on
      the same modality

    • Mignot et al. 96: 2 subjects preferred speech only;
      2 subjects used continuous gestures only for moving actions
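
The definition of specialization can be checked mechanically on a list of observations. The sketch below is only an illustration: the `Observation` class and the labels "move", "query", "speech" and "gesture" are assumptions, not taken from the cited studies. It returns the types of information that were always transmitted on one modality, together with that modality.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical check for specialization in a list of observed transmissions. */
final class SpecializationCheck {

    /** One observed transmission: a type of information sent on one modality. */
    static final class Observation {
        final String infoType;
        final String modality;

        Observation(String infoType, String modality) {
            this.infoType = infoType;
            this.modality = modality;
        }
    }

    /** Maps each information type that was always sent on one modality to that modality. */
    static Map<String, String> specializedTypes(List<Observation> observations) {
        Map<String, String> specialized = new TreeMap<>();
        Set<String> mixed = new HashSet<>();   // types already seen on two modalities
        for (Observation o : observations) {
            if (mixed.contains(o.infoType)) {
                continue;
            }
            String previous = specialized.putIfAbsent(o.infoType, o.modality);
            if (previous != null && !previous.equals(o.modality)) {
                specialized.remove(o.infoType);
                mixed.add(o.infoType);
            }
        }
        return specialized;
    }

    public static void main(String[] args) {
        List<Observation> observations = List.of(
                new Observation("move", "gesture"),
                new Observation("move", "gesture"),
                new Observation("query", "speech"),
                new Observation("query", "gesture"));
        System.out.println(specializedTypes(observations));
    }
}
```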

What is missing?
    • distinction between sub-types of specialization

Example of transcription

   INP. Speech   Is this a Chinese restaurant?
  INP. Gesture   Circles around the selected restaurant (gesture after speech)
  OUT. Graph     Textual description displayed
   ANALYSIS      Information about selected object :
                 partial redundancy
                    speech (this),
                    graphical context,
                    gesture (circle),
                    gesture after speech

Multimodal Metrics

\[
\text{equivalence} = \frac{\sum_{C_j} \text{equivalent}(C_j)}{\sum_{C_j} 1}
\]

\[
\text{compl./redund.} = \frac{\sum_{C_j} \sum_{r_k \in R(C_j)} \text{salience}(C_j, r_k)}{\sum_{C_j} |R(C_j)|}
\]

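
The two metrics can be read as simple ratios, and the sketch below follows one such reading. It rests on several assumptions: equivalent(Cj) is treated as a 0/1 flag per command, R(Cj) as the list of references of command Cj, and both denominators as plain counts (number of commands, total number of references). The class and method names are hypothetical.

```java
import java.util.List;

/** Hypothetical computation of the two metrics under the assumptions stated above. */
final class MultimodalMetrics {

    /** Share of commands flagged as showing equivalence; 0 when there are no commands. */
    static double equivalence(List<Boolean> equivalentFlags) {
        if (equivalentFlags.isEmpty()) {
            return 0.0;
        }
        long flagged = equivalentFlags.stream().filter(Boolean::booleanValue).count();
        return (double) flagged / equivalentFlags.size();
    }

    /**
     * Total salience over all references of all commands, divided by the total
     * number of references. Each inner list holds the salience values of one command.
     */
    static double complementarityRedundancy(List<List<Double>> saliencesPerCommand) {
        double total = 0.0;
        int references = 0;
        for (List<Double> saliences : saliencesPerCommand) {
            for (double s : saliences) {
                total += s;
                references++;
            }
        }
        return references == 0 ? 0.0 : total / references;
    }

    public static void main(String[] args) {
        System.out.println(equivalence(List.of(true, false, false, true)));
        System.out.println(complementarityRedundancy(
                List.of(List.of(1.0, 0.5), List.of(0.0, 0.5))));
    }
}
```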

Figure 1: Example of the XML annotation of a sample command observed in
                   the SRI corpus (Cheyer et al. 1998).
Figure 2: The “referenceable objects” section of a multimodal annotation.

Figure 3: A speech segment ("Senator dinner ... ? can I eat a
hamburger there ?") which contains two references to the object

Figure 4: A gesture segment including a reference to the object rest1.

   STEP 1: Parse the XML file
   See the article

 • Build the document tree from the XML file.
 • Build the Java representation of referenceable objects (Figure 5) and
   references (Figure 6).
 • Build the table associating each (object, reference) pair with a
   salience value (Figure 7). These values are computed according to
   pre-defined salience rules such as “if the reference contains the full
   name of this object, set the salience of this object in this reference to
   1.0”. These rules are expected to depend on the corpus at hand.
 • Build the table computing the average salience values for all the
   references in the different modalities within the same multimodal
   segment (Figure 8).
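
The steps above can be sketched with the standard Java DOM parser. This is an illustration under stated assumptions, not the implementation described in the article: the element names `segment`, `object` and `reference` and the attributes `id`, `name` and `modality` are invented here, since the figures showing the real annotation format are not reproduced. Only the "full name gives 1.0" rule comes from the slide; the 0.0 fallback is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch; the element and attribute names are assumptions. */
final class SalienceTable {

    /** Rule from the slide: full object name in the reference gives 1.0 (else 0.0, assumed). */
    static double salience(String objectName, String referenceText) {
        return referenceText.toLowerCase().contains(objectName.toLowerCase()) ? 1.0 : 0.0;
    }

    /** Parses one annotated segment and averages the object's salience per modality. */
    static Map<String, Double> averageSalienceByModality(String xml, String objectId) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("annotation is not well-formed XML", e);
        }

        // Look up the full name of the referenceable object.
        String name = null;
        NodeList objects = doc.getElementsByTagName("object");
        for (int i = 0; i < objects.getLength(); i++) {
            Element object = (Element) objects.item(i);
            if (objectId.equals(object.getAttribute("id"))) {
                name = object.getAttribute("name");
            }
        }
        if (name == null) {
            throw new IllegalArgumentException("unknown object: " + objectId);
        }

        // Accumulate {sum of saliences, number of references} for each modality.
        Map<String, double[]> sums = new LinkedHashMap<>();
        NodeList references = doc.getElementsByTagName("reference");
        for (int i = 0; i < references.getLength(); i++) {
            Element reference = (Element) references.item(i);
            double[] acc = sums.computeIfAbsent(reference.getAttribute("modality"),
                                                k -> new double[2]);
            acc[0] += salience(name, reference.getTextContent());
            acc[1] += 1;
        }

        Map<String, Double> averages = new LinkedHashMap<>();
        sums.forEach((modality, acc) -> averages.put(modality, acc[0] / acc[1]));
        return averages;
    }

    public static void main(String[] args) {
        String xml = "<segment>"
                + "<object id=\"rest1\" name=\"Tiger Lily's\"/>"
                + "<reference modality=\"speech\">is Tiger Lily's a Chinese restaurant</reference>"
                + "<reference modality=\"speech\">can I eat a hamburger there</reference>"
                + "<reference modality=\"gesture\">circle</reference>"
                + "</segment>";
        System.out.println(averageSalienceByModality(xml, "rest1"));
    }
}
```

With a single rule, the gesture reference scores 0.0; a realistic rule set would also assign salience to gestures, for example from the distance between the circle and the object.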
