

an Intelligent MultiMedia storytelling
interpretation & presentation system

               Minhua Eunice Ma
         Supervisor: Prof. Paul Mc Kevitt
       School of Computing and Intelligent Systems
                  Faculty of Informatics
               University of Ulster, Magee
       Objectives of CONFUCIUS

 • To interpret natural language story and movie (drama) script input
   and to extract conceptual semantics from the natural language
 • To generate 3D animation and virtual worlds automatically from
   natural language
 • To integrate 3D animation with speech and non-speech audio, to form
   an intelligent multimedia storytelling system for presenting multimodal
          CONFUCIUS’ context diagram

[Context diagram: Storywriter/playwright -> Movie/drama script -> CONFUCIUS -> 3D animation -> ... /story]
              Previous systems

 • Schank’s CD Theory (1972)
      Primitives & scripts
      SAM & PAM
 • Automatic Text-to-Graphics Systems
      WordsEye (Coyne & Sproat, 2001)
      ‘Micons’ and CD-based language animation
       (Narayanan et al. 1995)
      Spoken Image (Ó Nualláin & Smith, 1994)
       & its successor SONAS (Kelleher et al.
 • MultiModal interactive storytelling
      AesopWorld
      KidsRoom
      Larsen & Petersen’s Interactive Storytelling
      Oz
      Computer games

Virtual humans & embodied agents
    BEAT (Cassell et al., 2000)
    Jack (University of Pennsylvania)

    Improv (Perlin and Goldberg, 1996)

    SimHuman

    Gandalf

    PPP persona
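      Architecture of CONFUCIUS

[Architecture diagram]
  Input:      Natural language stories -> Script writer -> Script parser
  Modules:    Natural Language Processing, Text To Speech, Sound effects,
              Animation generation
  Knowledge base (prefabricated objects):
      Language knowledge (lexicon, grammar, etc.), feeding Natural Language Processing
      Visual knowledge (3D graphic library), feeding Animation generation
      Mapping between language knowledge and visual knowledge
      Sources: 3D authoring tools, existing 3D models & character models
  Final stage: Synchronizing & fusion
  Output:     3D world with audio in VRML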
                    Semantic representations
     Categories         Knowledge representations      Decomposition      Typical applications
                    rule-based representation               
                                                                      expert systems
                    FOPC                                               sentence representation,
                    (First Order Predicate Calculus)                   expert systems
(1) general
knowledge           semantic networks                       
                                                                      lexical semantics
representation &    Schank’s scripts                                   story understanding
                    frame-based representations             
                                                                       multimodal semantics
                    XML-based representations               

                    Conceptual Dependency (CD)              

(2) physical        event-logic truth conditions            
representation &    x-schema and f-structure                
                                                                      dynamic vision (movement)
reasoning (inc.     Jackendoff’s Lexical-Conceptual                    recognition & generation
spatial /temporal   Semantics (LCS)
                    decomposite predicate-argument          
           MultiModal semantic representation

[Diagram: levels of multimodal semantic representation]
  High-level multimodal semantic representation (XML/frame-based):
      Multimodal semantics
      Media-independent representation
  Intermediate level:
      Visual media-dependent representation
      Audio media-dependent representation
  Modalities:
      Visual modality, Language modality, Non-speech audio modality
  Mental imagery & meaning processing
                                           Meanings, communicable ideas,
                                           thoughts, manifestable
                                           messages, proverbs, examples,
                                           parables, etc.

        presentation via language or other modalities
Mental world      Communicati             Mental world

        Simulation:                             Simulation:
        Image recognition                       Language understanding

  Cognition                               Re-cognition
Physical world                            Virtual world
 Knowledge base of CONFUCIUS

 Knowledge base
     Language knowledge
         Semantic knowledge - lexicons (e.g. WordNet)
         Syntactic knowledge - grammars
         Statistical models of language
         Associations between words
     Visual knowledge
         Object model (nouns)
         Event model (event verbs, describes the motion of objects)
         Functional information
         Internal coordinate axes (for spatial reasoning)
         Associations between objects
     World knowledge
     Spatial & qualitative reasoning knowledge
          Graphic library

   objects/props:  simple geometry files
   characters:     geometry & joint hierarchy
   animation library (key frames)
                         Data Flow Diagram
[Data flow diagram. Labelled elements: story; script; Script parser;
 Scene & Actor descriptions; language; Visual semantics; Animation generator;
 Primitives library; VRML without sound nodes; TTS; Non-speech audio;
 Sound effect; Music library; Media coordination; Synthesized animation]
      Animation generator
[Flowchart]
  1. LCS representation
  2. Semantic analysis: use lexical relations (WordNet) to replace
     synonyms, scripts application, etc.
  3. Match basic motions in library? If Y: instantiation, animation controller
  4. VRML format of the virtual story world
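
The flow above (replace synonyms, then match a basic motion in the library, then instantiate it) can be sketched roughly as follows. This is only an illustration in Python, not the CONFUCIUS implementation: the synonym table stands in for WordNet lookups, and `SYNONYMS`, `MOTION_LIBRARY`, `normalise_verb`, `generate_animation` and the key-frame file names are all hypothetical.

```python
# Hypothetical stand-in for WordNet lexical relations: maps a verb to a base verb.
SYNONYMS = {"stroll": "walk", "leap": "jump", "hurl": "throw"}

# Hypothetical stand-in for the animation library of basic motions (key frames).
MOTION_LIBRARY = {
    "walk": {"keyframes": "walk.kf", "loop": True},
    "jump": {"keyframes": "jump.kf", "loop": False},
    "throw": {"keyframes": "throw.kf", "loop": False},
}


def normalise_verb(verb):
    """Replace a verb by its base synonym if the table lists one."""
    return SYNONYMS.get(verb, verb)


def generate_animation(verb, agent):
    """Return a motion instantiated for the agent, or None if no basic motion matches."""
    base = normalise_verb(verb)
    motion = MOTION_LIBRARY.get(base)
    if motion is None:
        return None  # the "match basic motions in library?" test failed
    # Instantiation: bind the library motion to the concrete agent.
    return {"agent": agent, "verb": base, **motion}
```

A verb with no library match returns `None` here; what CONFUCIUS does in that branch is not stated on the slide.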
           examples demo
               Categories of events
Atomic entities
   Change physical location such as position and orientation, e.g. “bounce”, “turn”
   Change intrinsic attributes such as shape, size, color, and texture, e.g. “bend”,
   and even visibility, e.g. “disappear”, “fade” (in/out)
Non-atomic entities
   Non-character events
      Two or more individual objects fuse together, e.g. “melt” (in)
      One object divides into two or more individual parts, e.g. “break” (into
      Change sub-components (their position, size, color), e.g. “blossom”
      Environment events (weather verbs), e.g. “snow”, “rain”
   Character events
      Action verbs
         Intransitive verbs
         Transitive verbs
      Non-action verbs (stative, emotion, possession, mental activities,
      cognition & perception)
      Idioms & metaphor verbs
            Categories of action verbs

 • Intransitive verbs
      Biped kinematics, e.g. “walk”, “swim”, & other motion models
       like “fly”
      Facial expressions, e.g. “laugh”, “anger”
      Lip movement, e.g. “speak”, “say” (involves the speech modality)

 • Transitive verbs
      single object, e.g. “throw”, “push”, “kick”
      multiple objects
           direct and indirect objects, e.g. “give”, “pass”, “show”
           indirect object & the instrument, e.g. “cut”, “hammer”
              Visual definition & word sense

[Diagram: verb (many) -- synonymy -- (many) word sense (one) -- mapping -- visual definition entry]

        Example: “close” (a door)
           1.   a normal door (rotation on y axis)
           2.   a sliding door (moving on x axis)
           3.   a rolling shutter door (a combination of
                rotation on x axis and moving on y axis)

       word sense -- minimal complete unit of meaning in
       the language modality
       visual definition entry -- minimal complete unit of
       meaning in the visual modality
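
As an illustration of how one word sense can map to several visual definition entries, the “close” example can be written as a small lookup table. This is a minimal sketch in Python under assumed names (`VISUAL_DEFINITIONS`, `visual_definition`, and the `("rotate", axis)` / `("translate", axis)` tuples are hypothetical and not taken from CONFUCIUS).

```python
# Hypothetical table: (word sense, kind of object) -> visual definition entry,
# where an entry is a list of (operation, axis) steps.
VISUAL_DEFINITIONS = {
    ("close", "normal door"): [("rotate", "y")],
    ("close", "sliding door"): [("translate", "x")],
    ("close", "rolling shutter door"): [("rotate", "x"), ("translate", "y")],
}


def visual_definition(word_sense, object_kind):
    """Look up the visual definition entry for a word sense applied to an object kind."""
    try:
        return VISUAL_DEFINITIONS[(word_sense, object_kind)]
    except KeyError:
        raise LookupError(
            f"no visual definition for {word_sense!r} on {object_kind!r}"
        ) from None
```

Keying the table on the object kind as well as the word sense is one way to select among the entries; the slide does not say how CONFUCIUS makes that choice.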
             Troponyms &
  verbs derived from adjectives/nouns
 • Troponym
      elaborates the manner of a base verb (Fellbaum 1998)
      examples: “trot”-“walk” (fast), “gulp”-“eat” (quickly)
      base verb + adverb
       present the base verb + modify the manner (speed, the agent’s state,
       duration of the activity, iteration, etc.)
 • Verbs derived from adjectives or nouns
      change objects’ properties (size, color, shape) or the world
      verbs with affixes such as -en, -ify, or -ize, e.g. “lengthen”
      using predicates scale(), squash() or changing the
       corresponding property fields of the object in VRML
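
Both ideas on this slide can be sketched in a few lines of Python. The sketch is illustrative only: the `TROPONYMS` table, the `decompose` and `lengthen` helpers, the choice of the x axis for length, and the signature given to `scale()` are assumptions; only the verb pairs and the predicate name `scale()` come from the slide.

```python
# Hypothetical troponym table: troponym -> (base verb, manner modifier).
TROPONYMS = {
    "trot": ("walk", {"manner": "fast"}),
    "gulp": ("eat", {"manner": "quickly"}),
}


def decompose(verb):
    """Split a troponym into base verb + manner; other verbs are returned unchanged."""
    return TROPONYMS.get(verb, (verb, {}))


def scale(size, factors):
    """Multiply an (x, y, z) size by per-axis factors, like a VRML Transform scale."""
    return tuple(s * f for s, f in zip(size, factors))


def lengthen(size, factor=1.5):
    """Assumed reading of "lengthen": stretch the object along its x axis only."""
    return scale(size, (factor, 1.0, 1.0))
```

With this split, the animation for “trot” would reuse the “walk” motion and only adjust its speed, so a new motion model is not needed for every troponym.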
 Representing active & passive voice

 • active and passive voice
 • converse verb pairs such as “give/take”,
   “buy/sell”, “lend/borrow”
 • same activity from different point of view
 • use of VRML Viewpoint node
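
One way to picture “same activity from different point of view” is to map both verbs of a converse pair to one shared event and record only whose viewpoint is taken. The Python below is a minimal sketch under assumed names (`VERB_FRAMES`, `to_event`, and the role labels `source` / `recipient` are hypothetical); how CONFUCIUS selects the VRML Viewpoint node is not shown on the slide.

```python
# Hypothetical frames: verb -> (canonical action, role played by the grammatical subject).
VERB_FRAMES = {
    "give": ("give", "source"),
    "take": ("give", "recipient"),
    "sell": ("sell", "source"),
    "buy": ("sell", "recipient"),
    "lend": ("lend", "source"),
    "borrow": ("lend", "recipient"),
}


def to_event(verb, subject, other, theme):
    """Normalise a converse verb into one event; keep the subject as the viewpoint."""
    action, subject_role = VERB_FRAMES[verb]
    if subject_role == "source":
        source, recipient = subject, other
    else:
        source, recipient = other, subject
    return {
        "action": action,
        "source": source,
        "recipient": recipient,
        "theme": theme,
        "viewpoint": subject,  # could drive the choice of a Viewpoint node
    }
```

For example, `to_event("give", "John", "Mary", "book")` and `to_event("take", "Mary", "John", "book")` describe the same transfer and differ only in the `viewpoint` field.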
       Implementation: semantics to VRML

  Example: “A ball is bouncing”

(a) visual definition of “bounce”

    [moveTo(ball, [0,0,0]),

(b) VRML code of a static ball

    DEF ball Transform {
      translation 0 0 0
      children [
        Shape {
          appearance Appearance {
            material Material {}
          }
          geometry Sphere {
            radius 5
          }
        }
      ]
    }

(c) Output VRML code of a bouncing ball

    DEF ball Transform {
      translation 0 0 0
      children [
        DEF ball-TIMER TimeSensor {
          loop TRUE
          cycleInterval 0.5
        },
        DEF ball-POS-INTERP PositionInterpolator {
          key [0, 0.5, 1]
          keyValue [0 0 0, 0 20 0, 0 0 0]
        },
        Shape {
          appearance Appearance {
            material Material {}
          }
          geometry Sphere { radius 5 }
        }
      ]
      ROUTE ball-TIMER.fraction_changed TO ball-POS-INTERP.set_fraction
      ROUTE ball-POS-INTERP.value_changed TO ball.set_translation
    }
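
The step from the static ball (b) to the bouncing ball (c) amounts to adding a TimeSensor, a PositionInterpolator and two ROUTEs around the existing Shape. A rough Python sketch of such a generator is given below; the function name `bounce_vrml` and its parameters are hypothetical, and the slides state that the actual system performs this kind of VRML rewriting in Java.

```python
def bounce_vrml(name, height, cycle_interval=0.5, radius=5):
    """Return VRML97 text for a sphere that bounces up to `height` and back.

    Mirrors listing (c): a looping TimeSensor drives a PositionInterpolator,
    whose output is routed to the Transform's translation.
    """
    return f"""DEF {name} Transform {{
  translation 0 0 0
  children [
    DEF {name}-TIMER TimeSensor {{
      loop TRUE
      cycleInterval {cycle_interval}
    }},
    DEF {name}-POS-INTERP PositionInterpolator {{
      key [0, 0.5, 1]
      keyValue [0 0 0, 0 {height} 0, 0 0 0]
    }},
    Shape {{
      appearance Appearance {{ material Material {{}} }}
      geometry Sphere {{ radius {radius} }}
    }}
  ]
  ROUTE {name}-TIMER.fraction_changed TO {name}-POS-INTERP.set_fraction
  ROUTE {name}-POS-INTERP.value_changed TO {name}.set_translation
}}
"""
```

Calling `bounce_vrml("ball", 20)` yields text equivalent to listing (c), with the key values of the bounce taken from the arguments instead of being hard-coded.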
               Categories of adjectives
Visually observable
    Objects’ attributes/states: dark/light, large/small, big/little, white/black
    (color adj.), long/short, new/old, high/low, full/empty, open/closed
    Observable human attributes
        Feelings: happy/sad, angry, excited, surprised, terrified
        Others: old/young, beautiful/ugly, strong/weak, poor/rich, fat/thin
    Relational adj.: nasal (nose), mural (wall), dental (teeth)

Visually unobservable
    Perceivable by other modalities: wet/dry, warm/cold, coarse/smooth,
    hard/soft, heavy/light
    Abstract attributes
        Unobservable human attributes (virtue): good/evil, kind, mean, ambitious
        Others: easy/difficult, real, important, particular, right/wrong, early/late
    Reference-modifying adj.: possible/impossible, former, past/present,
    last, other, different/same
                 Software Analysis
 • Java programming language
      parsing intermediate representation
      changing VRML code to create/modify animation
      integrating modules
 • Natural language processing tools
      GATE (pre-processing)
      PC-PARSE (morphological and syntactic analysis)
      WordNet (lexicon, semantic inference)
 • 3D graphic modelling
      existing 3D models on the Internet
      3D Studio Max (props & stage)
      VRML 97 (Virtual Reality Modelling Language), H-Anim 2001 spec.
 • The actors: using embodied agents
      Microsoft Agent (the narrator and minor actors)
      Character Studio, Internet Character Animator (protagonists)
      Natural Language Processing


[Diagram: natural language processing modules]
  PC-PARSER:    Part-of-speech tagger, Syntactic parser, morphological ...
                (labels: LEXICON & MORPHOLOGICAL RULES, FEATURES)
  WordNet 1.6:  Semantic inference, Coreference resolution, Temporal reasoning
Contribution & prospective applications
Contributions
 • multimodal semantic representation of natural language
 • automatic animation generation
 • multimodal fusion and coordination

Prospective applications
 • Children’s education
 • Multimedia presentation
 • Movie/drama production
 • Script writing
 • Computer games
 • Virtual Reality
The objectives of CONFUCIUS address challenging
problems in language visualisation:
 • formalizing the meaning of action verbs and states
 • mapping language primitives to visual primitives
 • a reusable ‘common sense’ knowledge base for other systems
 • sophisticated spatial and temporal reasoning
 • representing stories by temporal multimedia, which requires
   significant coordination
