Emotional Speech

   Who cares?
   The Idea of Emotion
   Difficulties in approaching
   Describing Emotion
   Computational Models
   Modeling Emotion in Speech
   An example – Ang ’02
Who Cares?

   Practical impact
     Detecting Frustration/Anger
     Stress/Distress
     Prioritizing help calls
     Tutorials – Boredom/Confusion/Frustration
          Pacing/Positive feedback
     User acceptance
          Users preferred a talking head using emotional speech (Stallo, in Schröder)
Who Cares?

   Esoteric Impact
     Is artificial intelligence possible w/o detection of “emotion”,
     or w/o display of “emotion”?
   Do we experience someone/something as
    understanding us if it can’t understand our
    emotional state/experience?
Who Cares? – Izard ’77

   Emotion & Perception
   E & Cognition
   E & Action
   E & Personality Development
   Understanding a speaker’s emotional state
    gives us insight into his/her intention, desire,
    motivation (Zimring)
The Bad News (Picard ’97)

   Maintaining realistic expectations
   User’s confidence in information
   Potential to forge affective channels
   Problem solving vs. empathic/observational
   Symmetry of communication
   Privacy issues
Idea of Emotion (Hergenhahn ’01)

   Descartes
     “Passions”
          Understood emotions as originating from both
           physiological and cognitive sources
          Pineal gland
   Late 1800’s – early 1900’s
     Psychology was the study of consciousness
          William James: “The Science of Mental Life”
          Major method was introspection – mental
            –   Relies on a person reporting his/her experience
Idea of Emotion

   1930’s – 1950’s
     Behaviorist tradition – study of behavior
          “Objective” (at least measurable and observable)
          Emerged from academia – a lot of rats suffered
          Explains everything in terms of stimulus / response
          Fails to explain some crucial issues, e.g., language
                  No one expects to get wet in a pool filled with
                  ping pong ball models of water molecules.
Idea of Emotion (Searle ’90)

   1950’s – Cognitive “Revolution”
     Piaget, Miller, Chomsky, et al.
          Miller  The Science of Mental Life
     John Searle
          Syntax vs. semantics
     Materialism vs. Dualism
     What are reasonable expectations?
Difficulties in approaching (Cowie)

   E is resistant to capture in symbols
   Speech presents special problems
   Modeling of primary E’s not so useful
   Consensus
   Display Rules (Ekman)
   Mixes  “Love/hate relationship”
   Negative response to simulated displays
                        “[Utterances were] said by two actors in the
                       emotions of happiness, sadness, anger […]”
Difficulties in approaching

   Quality of reference data
   Rating believability (Schröder)
     Forced-choice tests often ignore the issue of
     “How appropriate was the utterance to the given E?”
      (Rank ’98)
     (Iida et al.) Rated using scales for preference and for
      the subjective degree of expressed E.
   Subject generosity
   Temporal and contextual relationships
            Everything it is possible to analyze depends
                on a clear method of distinguishing the
                              similar from the dissimilar.
                                           – Carl Linnaeus
Describing Emotion

        [Figure: two rows of image comparisons, the first marked = / =, the second = / ≠]

Describing Emotion (Cowie)

   Primary emotions
     Acceptance, anger, anticipation, disgust, joy,
      fear, sadness, surprise
   Secondary Emotions 
   Arousal
   Attitude
   An aside: Intention may generate all of these
activity decisiveness haughtiness restrained adoration delighted
helplessness restraint alarm dependence hope righteousness alertness
depression humiliation rigor anger desire indifference routine
animosity despair inferiority sadness annoyance dimness initiative
satisfaction anxiety disappointment intensity satisfied appetite disgust
interest skepticism approval disqualification scorn artificiality
disregard involvement serenity astonishment disrespect joy servility
at ease distress leniency shame attraction droopy loneliness
sharpness balanced embarrassment longing shyness belonging
embitterment love simplicity bitterness enjoyment meditative sincerity
bliss envy mirth sleepy restlessness blur exaggeration misery
slumber boldness excitement sorrow boredom fatigue naturalness
stability calmness fear nervousness stubbornness caution firmness
pain suffering clearness frankness panic superiority compassion
fondness passiveness surprise complexity friendly patience
suspiciousness concern frustration pity sympathy conciliated gaiety
pleasure tenderness confidence generosity posing tension constraint
gloom pride tolerance hate confusion grateful quiescence tranquility
contempt greediness regret uneasiness contentment grievance relaxed
unstable courage guilt relief vigilance yearning craving happiness
repulsion weakness criticism haste respect worry curiosity
                  “…emotion is a fact upon which all introspection
                  agrees. [Most emotional states] are states which
                  we have experienced personally.”
                                         (Gellhorn & Loofbourrow ’63)

Data of Emotion (Lang ’87)

   Everyone generally agrees that emotion exists
   Basic datum is a state of feeling
     Completely private
   Include understanding of antecedents and
   Important to determine how E is represented
    in memory
   Suggest a Turing test (but don’t describe…)
Describing Emotion

   One approach:
      continuous dimensional model (Cowie/Lang)
   Activation – evaluation space
   Add a control dimension
   Curse of dimensionality
   Primary E’s differ on at least 2 dimensions of
    this scale (Pereira)
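A minimal sketch of the continuous dimensional idea, assuming three axes scaled to [-1, 1]: activation, evaluation, and the optional control axis. The prototype coordinates are hypothetical placeholders chosen by sign only, not values measured by Cowie, Lang, or Pereira. The sketch shows how two emotions that sit close together in activation-evaluation space (here anger and fear) can be separated by adding control, and how a continuous point can be mapped back to the nearest category.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionPoint:
    """A point in activation-evaluation-control space; each axis runs from -1 to +1."""
    activation: float  # passive (-1) .. active (+1)
    evaluation: float  # negative (-1) .. positive (+1)
    control: float     # weak/submissive (-1) .. strong/dominant (+1)

    def coords(self, dims=3):
        """Return the first `dims` axes: 2 = activation-evaluation only, 3 = add control."""
        return (self.activation, self.evaluation, self.control)[:dims]


# HYPOTHETICAL prototype coordinates: signs are illustrative, magnitudes are placeholders.
PROTOTYPES = {
    "anger":   EmotionPoint(0.8, -0.7, 0.6),
    "fear":    EmotionPoint(0.7, -0.7, -0.6),
    "joy":     EmotionPoint(0.6, 0.8, 0.3),
    "sadness": EmotionPoint(-0.6, -0.6, -0.4),
}


def distance(a, b, dims=3):
    """Euclidean distance between two points using the first `dims` axes."""
    return math.dist(a.coords(dims), b.coords(dims))


def nearest_label(point, dims=3):
    """Map a continuous point back to the closest categorical prototype."""
    return min(PROTOTYPES, key=lambda name: distance(point, PROTOTYPES[name], dims))
```

With these placeholders, anger and fear are only 0.1 apart using two axes but about 1.2 apart once control is included. Each added axis also spreads the same amount of labeled data more thinly, which is the curse-of-dimensionality concern.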
Computational Models (Pfeifer ’87)

   Emotion as process
   Emotion generation
   Influence of emotion
   Goal oriented nature
   Interaction between subsystems
   E. as heuristics
   Representation of emotion
Computational Models (Pfeifer ’87)

   Examines models dimensionally
     A) Symbolic vs non-symbolic (cognitive vs AI)
     B) Augmented by emotion vs focused on emotion
   All approaches deal with E as process
   Unclear whether system state = emotion
   Models must function in complex, uncontrollable,
    unpredictable context
   No model for physiological aspect
   Emotions tightly coupled to commonsense reasoning
Modeling Emotion in Speech

   Synthesis: basic issues (Schröder)
     How is a given emotion expressed?
     Which properties of the E state are to be
     Relationship between this state and another
   Approaches
     Formant synthesis (Burkhardt)
     Diphone concatenation
     Unit selection
Modeling Emotion in Speech

   Formant synthesis (Burkhardt)
     High degree of control (“emoSyn”)
          Mean pitch, pitch range, variation, phrase and word
           contour, flutter, intensity, rate, phonation type, vowel
           precision, lip spread
     Two experiments
          Stimuli systematically varied, then classified
          Prototype generated and varied slightly
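The first experiment's systematic variation can be pictured as a full factorial grid over synthesis parameters. In this sketch the four parameters come from the list above, but the levels, `LEVELS`, and `stimulus_grid` are hypothetical; the deck does not give Burkhardt's actual values.

```python
from itertools import product

# HYPOTHETICAL levels for four of the parameters listed above.
LEVELS = {
    "mean_pitch": ["low", "mid", "high"],
    "pitch_range": ["narrow", "mid", "broad"],
    "rate": ["slow", "mid", "fast"],
    "phonation": ["modal", "breathy", "tense", "falsetto"],
}


def stimulus_grid(levels):
    """Yield one parameter setting (a dict) per combination of levels."""
    names = list(levels)
    for combo in product(*(levels[name] for name in names)):
        yield dict(zip(names, combo))


stimuli = list(stimulus_grid(LEVELS))  # 3 * 3 * 3 * 4 = 108 settings
```

Four parameters with three or four levels each already give 108 stimuli, and every added parameter multiplies the count. That growth is one plausible reason to follow up, as the second experiment did, by varying a single prototype slightly instead of the whole grid.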
Modeling Emotion in Speech

   Formant synthesis (Burkhardt)
     Fear
          High pitch, broad range, falsetto voice, fast rate
     Joy
          Broader pitch range, faster rate, modal or tense
           phonation, precise articulation
          Lowest recognition rates (perhaps due to intonation
     Boredom
          Lowered mean pitch, narrow range, slow rate, imprecise articulation
Modeling Emotion in Speech

   Formant synthesis (Burkhardt)
     Sadness
          Narrow range, slow rate, breathy articulation
          Also raised pitch, falsetto
          Possibly “sadness” was an imprecise term
     Anger
          Faster rate, tense phonation
     General results
          Recognition rates are comparable to natural speech,
           especially when the categories from experiment 2 are
Modeling Emotion in Speech

   Generally: a tradeoff between flexibility of
    modeling and naturalness:
     Rule-based is less natural
     Selection-based is less flexible
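To make the rule-based side of this tradeoff concrete, here is a sketch that encodes each emotion as qualitative adjustments to a neutral prosody baseline, in the spirit of the formant-synthesis settings on the earlier slides. The directions (raise or lower pitch, broaden or narrow range, faster or slower rate, phonation type) follow the deck's summary of Burkhardt; the baseline values, the uniform 20% step, and all identifiers are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Prosody:
    mean_pitch_hz: float
    pitch_range_st: float   # pitch range in semitones
    rate_syl_per_s: float   # speaking rate in syllables per second
    phonation: str = "modal"


# HYPOTHETICAL neutral baseline; real values depend on speaker and synthesizer.
NEUTRAL = Prosody(mean_pitch_hz=120.0, pitch_range_st=6.0, rate_syl_per_s=5.0)

# +1 raises/broadens/speeds up, -1 lowers/narrows/slows down, missing key = unchanged.
# Directions follow the deck's summary of Burkhardt; the step size is a placeholder.
RULES = {
    "fear":    {"pitch": +1, "range": +1, "rate": +1, "phonation": "falsetto"},
    "joy":     {"range": +1, "rate": +1},  # deck lists modal or tense phonation; modal kept
    "boredom": {"pitch": -1, "range": -1, "rate": -1},
    # The deck also reports raised pitch and falsetto for sadness; omitted here.
    "sadness": {"range": -1, "rate": -1, "phonation": "breathy"},
    "anger":   {"rate": +1, "phonation": "tense"},
}


def apply_rules(base, emotion, step=0.2):
    """Scale each prosodic parameter of `base` by (1 + step * direction) for `emotion`."""
    rule = RULES[emotion]

    def scale(key):
        return 1.0 + step * rule.get(key, 0)

    return replace(
        base,
        mean_pitch_hz=base.mean_pitch_hz * scale("pitch"),
        pitch_range_st=base.pitch_range_st * scale("range"),
        rate_syl_per_s=base.rate_syl_per_s * scale("rate"),
        phonation=rule.get("phonation", base.phonation),
    )
```

Every emotion reduces to a few numbers that can be changed freely, which is the flexibility; nothing in the rules guarantees the result sounds natural. A selection-based system instead picks recorded units that already carry the emotion, trading that control for naturalness.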
An Example – Ang ’02

   Prosody-based detection of annoyance/frustration
    in human-computer dialog
   DARPA Communicator Project travel
    planning data (a simulation)
     (NIST, CU Boulder, CMU)
   Considers contributions of prosody, language
    model, and speaking style
   Doesn’t begin with a strong hypothesis
An Example – Ang ’02

   Uses recognizer output (sort of)
   Examines relationship between emotion and speaking style
   Uses hand-coded style data
     Hyperarticulation, pauses, raised voice
   Repeated requests or corrections
   Hand-labeled emotion relative to speaker
     Original and consensus labels
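Because Ang ’02 keeps both the original per-labeler labels and a consensus label, agreement between labelers is worth quantifying. The deck does not say which statistic Ang used; Cohen's kappa is shown here as a standard chance-corrected choice, and the toy label lists are hypothetical. Chance correction matters when one class dominates, since two labelers who mostly answer NEUTRAL will agree often by chance alone.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two labelers over the same utterances."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if each labeler drew labels independently at their own rates.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelers used one identical category throughout
    return (observed - expected) / (1 - expected)


def agreed_indices(labels_a, labels_b):
    """Indices of utterances on which the two original labels already match."""
    return [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a == b]


# HYPOTHETICAL toy labels for four utterances.
a = ["NEUTRAL", "NEUTRAL", "ANNOYED", "NEUTRAL"]
b = ["NEUTRAL", "ANNOYED", "ANNOYED", "NEUTRAL"]
```

Here raw agreement is 0.75 but kappa is only 0.5, and `agreed_indices(a, b)` picks out the subset on which labelers originally agreed.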
An Example – Ang ’02

 Emotion Class    Instances   Percent
 NEUTRAL              41545     83.84%
 ANNOYED               3777      7.62%
 FRUSTRATED             358      0.72%
 TIRED                  328      0.66%
 AMUSED                 326      0.66%
 OTHER                  115      0.23%
 NOT-APPLICABLE        3104      6.26%

 Total                49553     100.0%
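The table's percentages, and the accuracy a trivial classifier would reach, can be recomputed directly from the counts. The counts below are copied from the table; `majority_baseline` is simply the accuracy of always answering NEUTRAL over all 49,553 instances.

```python
# Instance counts per emotion class, copied from the table above.
COUNTS = {
    "NEUTRAL": 41545, "ANNOYED": 3777, "FRUSTRATED": 358, "TIRED": 328,
    "AMUSED": 326, "OTHER": 115, "NOT-APPLICABLE": 3104,
}

total = sum(COUNTS.values())
percent = {label: round(100 * n / total, 2) for label, n in COUNTS.items()}

# Accuracy of a classifier that always predicts the largest class.
majority_baseline = max(COUNTS.values()) / total
```

FRUSTRATED alone is under 1% of the data, while always guessing NEUTRAL is right about 84% of the time, so raw accuracy says little about how well frustration is actually detected.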
An Example – Ang ’02

   Prosodic Features
     Duration and speaking rate
     Pause, pitch, energy, spectral tilt
   Non-prosodic Features
     Repetitions & corrections
     Position in dialog
   Language model features
   Discriminated using decision trees
     “Brute force iterative algorithm” to determine useful features
     With and without LM features
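The deck only calls the feature search a “brute force iterative algorithm”. Exhaustive search over n features means scoring 2^n subsets, so iterative wrappers usually add or remove one feature at a time. The sketch below is greedy forward selection, a stand-in rather than Ang's documented procedure. `score` is assumed to return something like cross-validated decision-tree accuracy for a feature subset; the toy scorer and feature names are hypothetical.

```python
def forward_select(features, score, min_gain=1e-3):
    """Greedy forward selection: repeatedly add the single feature that raises
    `score` the most, stopping when the best improvement is below `min_gain`."""
    selected = []
    best = score(frozenset())
    remaining = list(features)
    while remaining:
        # Score every one-feature extension of the current subset.
        candidates = [(score(frozenset(selected) | {f}), f) for f in remaining]
        new_score, feature = max(candidates)
        if new_score - best < min_gain:
            break
        selected.append(feature)
        remaining.remove(feature)
        best = new_score
    return selected, best


# HYPOTHETICAL scorer: pretend each feature adds a fixed amount of accuracy.
TOY_GAINS = {"duration": 0.05, "pitch": 0.04, "repeat": 0.03, "energy": 0.0}


def toy_score(subset):
    return 0.5 + sum(TOY_GAINS[f] for f in subset)
```

Running `forward_select(["energy", "duration", "pitch", "repeat"], toy_score)` adds duration, then pitch, then repeat, and stops because energy adds nothing. Running it once with and once without language-model features in the candidate list mirrors the “with and without LM features” comparison.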
An Example – Ang ’02
Ang ’02 – Decision Tree Usage

   Temporal features 28%
     Longer duration, slow speaking rate corr.
      w/ frustration
   Pitch features 26%
     Generally, high F0 correlated w/ frustration
   Repeats/corrections (= system error) 26%
     Correlated w/ frustration
   Raised Voice
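The percentages above are feature-usage figures for the decision trees. Assuming the convention used in related prosody work (each split is counted by the number of training samples that reach it, and the totals are normalized per feature class), usage can be computed as below; the toy tree and the function name are hypothetical.

```python
from collections import defaultdict


def feature_usage(nodes):
    """nodes: (feature_class, n_samples_reaching_node) for every split node.
    Returns each feature class's share of sample-weighted queries, in percent."""
    totals = defaultdict(int)
    for feature_class, n_samples in nodes:
        totals[feature_class] += n_samples
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {c: 100.0 * n / grand_total for c, n in totals.items()}


# HYPOTHETICAL tree: the root splits on a temporal feature seen by 100 samples, etc.
toy_nodes = [("temporal", 100), ("pitch", 60), ("temporal", 40), ("repeat", 50)]
```

For the toy tree, temporal features account for 56% of usage, pitch 24%, and repeats 20%. Weighting by samples means a split near the root counts for more than one deep in the tree.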
Ang ’02 – Results

   Performance better by 5-6% for utterances
    on which labelers originally agreed
   Use of the repeat/correction feature improves
    success by 4%
   Frustration vs Else – very little data
   Only slight difference between labeled and
