         Speech Summarization

Julia Hirschberg (thanks to Sameer
      Maskey for some slides)
              CS4706
        Summarization Distillation



• ‘…the process of distilling the most important
  information from a source (or sources) to
  produce an abridged version for a particular user
  (or users) and task (or tasks)’ [Mani and
  Maybury, 1999]
• Why summarize? Too much data!
           Types of Summarization

• Indicative
   – Describes the document and its contents
• Informative
   – ‘Replaces’ the document
• Extractive
   – Concatenate pieces of existing document
• Generative
   – Creates a new document
• Document compression
   Some Summarization Techniques Based on Text (Lexical Features)

• Sentence extraction with similarity measures [Salton, et al., 1995]
• Extraction training with manual summaries [McKeown, et al., 2001]
• Concept level: extract concept units [Hovy & Lin, 1999]
• Generate words/phrases [Witbrock & Mittal, 1999]
• Use of structured data [Maybury, 1995]
     Sentence Extraction/Similarity measures
              [Salton, et al. 1995]
• Extract sentences by their similarity to a topic
  sentence and their dissimilarity to sentences
  already in summary (Maximal Marginal
  Relevance)
• Similarity measures
   – Cosine Measure
   – Vocabulary Overlap
   – Topic word overlap
   – Content Signatures Overlap
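
The extraction idea on this slide can be sketched as a greedy loop. This is a minimal illustration, not the cited system: the cosine measure over raw word counts, the trade-off weight `lam`, and the toy sentences are all assumptions made for the example.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two token lists, using raw term counts."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def mmr_select(sentences, topic, n, lam=0.7):
    """Greedily pick n sentence indices: similar to the topic sentence,
    dissimilar to sentences already selected."""
    tokens = [s.lower().split() for s in sentences]
    topic_tokens = topic.lower().split()
    selected = []
    while len(selected) < min(n, len(sentences)):
        best, best_score = None, -math.inf
        for i, tok in enumerate(tokens):
            if i in selected:
                continue
            relevance = cosine(tok, topic_tokens)
            # Redundancy is the highest similarity to any sentence already chosen.
            redundancy = max((cosine(tok, tokens[j]) for j in selected), default=0.0)
            score = lam * relevance - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected

# Toy input (made up for illustration).
sentences = [
    "jury selection began in the oprah winfrey trial",
    "jury selection began in the winfrey trial today",
    "cattle producers claim beef prices were hurt",
]
picked = mmr_select(sentences, topic="oprah winfrey trial jury", n=2, lam=0.5)
print(picked)
```

With `lam=0.5` the second sentence is skipped because it nearly repeats the first; a larger `lam` would favor relevance over novelty.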
   Concept/content level extraction [Hovy & Lin,
                      1999]
• Present key-words as summary
• Builds concept signatures by finding relevant
  words in 30,000 WSJ documents, each
  categorized into different topics
• Phrase concatenation of relevant
  concepts/content
• Sentence planning for generation
        Feature-based statistical models
             [Kupiec, et al., 1995]
• Create manual summaries
• Extract features
• Train statistical model using various ML techniques
• Use the trained model to score each sentence in the test
  data
• Extract N highest-scoring sentences

$$P(s \in S \mid F_1, F_2, \ldots, F_k) = \frac{\prod_{j=1}^{k} P(F_j \mid s \in S)\; P(s \in S)}{\prod_{j=1}^{k} P(F_j)}$$

      • where S is the summary, F_1 … F_k are the k features, and P(F_j) and
        P(F_j | s ∈ S) can be computed by counting occurrences
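
The training and scoring steps on this slide can be sketched with plain counting. This is a sketch under stated assumptions: the two binary features (lead position, cue phrase), the toy training set, and the add-one smoothing are all hypothetical choices made for the example, and the result is used as a ranking score rather than a calibrated probability.

```python
def train(examples):
    """examples: list of (features, in_summary), features a tuple of 0/1 values.
    Returns P(s in S), P(F_j = v | s in S) and P(F_j = v), all from counts."""
    n = len(examples)
    k = len(examples[0][0])
    n_pos = sum(1 for _, y in examples if y)
    prior = n_pos / n
    # Add-one smoothing (an assumption) avoids zero probabilities.
    p_f_given_s = [
        [(sum(1 for f, y in examples if y and f[j] == v) + 1) / (n_pos + 2)
         for v in (0, 1)]
        for j in range(k)
    ]
    p_f = [
        [(sum(1 for f, _ in examples if f[j] == v) + 1) / (n + 2) for v in (0, 1)]
        for j in range(k)
    ]
    return prior, p_f_given_s, p_f

def score(features, model):
    """Product of P(F_j | s in S) times P(s in S), divided by product of P(F_j)."""
    prior, p_f_given_s, p_f = model
    num, den = prior, 1.0
    for j, v in enumerate(features):
        num *= p_f_given_s[j][v]
        den *= p_f[j][v]
    return num / den

# Hypothetical features: (is_in_lead_position, contains_cue_phrase).
training = [
    ((1, 1), True), ((1, 0), True),
    ((0, 1), False), ((0, 0), False), ((0, 0), False), ((1, 0), False),
]
model = train(training)
test_sentences = [(0, 0), (1, 1), (0, 1)]
# Extract the N highest-scoring sentences (here N = 1).
ranked = sorted(range(len(test_sentences)),
                key=lambda i: score(test_sentences[i], model), reverse=True)
print(ranked[:1])
```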
    Structured Database [Maybury, 1995]

• Summarize text represented in structured form:
  database, templates
   – E.g. generation of a medical history from a
     database of medical ‘events’

$$\text{Relative frequency of } E = \frac{\#\text{ of occurrences of event } E}{\text{Total } \#\text{ of all events}}$$

• Link analysis (semantic relations within the
  structure)
• Domain dependent importance of events
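
As a small worked example of the relative-frequency measure, assuming events are stored as a flat list of labels (the event names below are invented for illustration):

```python
from collections import Counter

def relative_frequencies(events):
    """Map each event type to (# of its occurrences) / (total # of events)."""
    counts = Counter(events)
    total = sum(counts.values())
    return {event: count / total for event, count in counts.items()}

# Hypothetical medical 'events' pulled from a structured record.
events = ["checkup", "x-ray", "checkup", "surgery", "checkup"]
freq = relative_frequencies(events)
print(freq)
```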
   Comparing Speech and Text Summarization

• Alike
   – Identifying important information
   – Some lexical, discourse features
   – Extraction or generation or compression
• Different
   – Speech signal
   – Prosodic features
   – NLP tools?
   – Segments – sentences?
   – Generation?
   – Errors
   – Data size
        Text vs. Speech Summarization (NEWS)

• Text
   – Error-free text
   – Lexical features
   – Segmentation: sentences
   – NLP tools
• Shared / intermediate
   – Transcript: manual
   – Some lexical features
   – Story presentation style
• Speech
   – Speech signal
   – Speech channels: phone, remote satellite, station
   – Transcripts: ASR, closed captioned
   – Many speakers: speaking styles
   – Structure: anchor, reporter interaction
   – Prosodic features: pitch, energy, duration
   – Commercials, weather report
        Speech Summarization Today

• Mostly extractive:
  – Words, sentences, content units
• Some compression methods
• Generation-based summarization difficult
  – Text or synthesized speech?
                 Generation or Extraction?
•   SENT27 a trial that pits the cattle industry against tv talk show host oprah winfrey is under
    way in amarillo , texas.
•   SENT28 jury selection began in the defamation lawsuit began this morning .
•   SENT29 winfrey and a vegetarian activist are being sued over an exchange on her April
    16, 1996 show .
•   SENT30 texas cattle producers claim the activists suggested americans could get mad
    cow disease from eating beef .
•   SENT31 and winfrey quipped , this has stopped me cold from eating another burger
•   SENT32 the plaintiffs say that hurt beef prices and they sued under a law banning false
    and disparaging statements about agricultural products
•   SENT33 what oprah has done is extremely smart and there's nothing wrong with it she
    has moved her show to amarillo texas , for a while
•   SENT34 people are lined up , trying to get tickets to her show so i'm not sure this hurts
    oprah .
•   SENT35 incidentally oprah tried to move it out of amarillo . she's failed and now she has
    brought her show to amarillo .
•   SENT36 the key is , can the jurors be fair
•   SENT37 when they're questioned by both sides, by the judge , they will be asked, can
    you be fair to both sides
•   SENT38 if they say , there's your jury panel
•   SENT39 oprah winfrey's lawyers had tried to move the case from amarillo , saying they
    couldn't get an impartial jury
•   SENT40 however, the judge moved against them in that matter …
                   story                                                        summary

        Speech Summarization Techniques

• Sentence extraction with similarity measures [Christensen et al., 2004]
• Word scoring with dependency structure [Hori C. et al., 1999, 2002], [Hori T. et al., 2003]
• Classification [Koumpis & Renals, 2004]
• User access information [He et al., 1999]
• Removing disfluencies [Zechner, 2001]
• Weighted finite state transducers [Hori T. et al., 2003]
Content/Context sentence level extraction for
 speech summary [Christensen et al., 2004]

• Find sentences similar to the lead topic sentences
• Use position features to find the relevant nearby sentences after
  detecting a topic sentence

$$S_k = \hat{s} = \arg\max_{s_i \in D \setminus E} \{\mathrm{Sim}(s_1, s_i)\}$$

$$S_k = \hat{s} = \arg\max_{s_i \in D \setminus E} \{\mathrm{Sim}(D, s_i)\}$$

   – where Sim is a similarity measure between two sentences or a
     sentence and a document (D), and E is the set of sentences
     already in the summary
• Choose a new sentence which is most like D and most
  different from E
    Weighted finite state transducers for speech
                   summarization
                [Hori T. et al., 2003]
• Summarization includes speech recognition, paraphrasing, and sentence
  compaction integrated into a single Weighted Finite State Transducer (WFST)
• Decoder can use all knowledge sources in a one-pass strategy
• Speech recognition using WFST: $R = H \circ C \circ L \circ G$
    – where H is the state network of triphone HMMs, C is the triphone
      connection rules, L is the pronunciation lexicon and G is the trigram
      language model
• Paraphrasing can be looked at as a kind of machine translation with
  translation probability P(W|T), where W is the source language and T is
  the target language
• If S is the WFST representing translation rules and D is the
  language model of the target language, speech summarization can
  be looked at as the following composition: $Z = H \circ C \circ L \circ G \circ S \circ D$
    – Speech recognizer: $H \circ C \circ L \circ G$
    – Translator: $S \circ D$
    – Together: speech translator
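
To make the composition operation concrete, here is a toy sketch for epsilon-free transducers whose weights add along a path (tropical semiring). The dictionary representation and the two one-arc machines are assumptions invented for this example; real decoders rely on optimized WFST libraries and also handle epsilon transitions, which this sketch does not.

```python
from collections import defaultdict

def compose(a, b):
    """Compose two epsilon-free weighted transducers; weights are added.

    Each transducer is a dict with 'start', 'finals' (a set of states) and
    'arcs', a list of (src, input, output, weight, dst) tuples.
    """
    b_arcs = defaultdict(list)
    for src, i, o, w, dst in b["arcs"]:
        b_arcs[(src, i)].append((o, w, dst))
    start = (a["start"], b["start"])
    arcs, finals, seen, stack = [], set(), {start}, [start]
    while stack:
        state = stack.pop()
        qa, qb = state
        if qa in a["finals"] and qb in b["finals"]:
            finals.add(state)
        for src, i, mid, w1, dst_a in a["arcs"]:
            if src != qa:
                continue
            # An arc survives only if A's output symbol matches B's input symbol.
            for o, w2, dst_b in b_arcs[(qb, mid)]:
                nxt = (dst_a, dst_b)
                arcs.append((state, i, o, w1 + w2, nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return {"start": start, "finals": finals, "arcs": arcs}

# Toy stand-ins: 'recognizer' maps unit "x" to the word "hello";
# 'translator' rewrites "hello" as "hi". Symbols and weights are made up.
recognizer = {"start": 0, "finals": {1}, "arcs": [(0, "x", "hello", 1.0, 1)]}
translator = {"start": 0, "finals": {1}, "arcs": [(0, "hello", "hi", 0.5, 1)]}
z = compose(recognizer, translator)
print(z["arcs"])
```

Because composition is associative, a chain such as H, C, L, G, S, D can be built by applying `compose` pairwise.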
       User Access Identifies What to Include
                 [He et al., 1999]
• Summarize lectures or shows by extracting parts that
  have been viewed the longest
• Needs multiple users of the same show, meeting or
  lecture for training
• E.g. To summarize lectures compute the time spent on
  each slide
• Summarizer based on user access logs did as well as
  summarizers that used linguistic and acoustic features
   – Average score of 4.5 on a scale of 1 to 8 for the
     summarizer (subjective evaluation)
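
The time-per-slide idea can be sketched as follows. The log format `(user_id, slide_id, seconds_viewed)` and the sample records are hypothetical; the cited system's actual logging is not described on the slide.

```python
from collections import defaultdict

def top_slides(log, n):
    """Return the n slides with the most total viewing time, in lecture order.

    log: iterable of (user_id, slide_id, seconds_viewed) records.
    """
    total = defaultdict(float)
    for _user, slide, seconds in log:
        total[slide] += seconds
    # Rank by total time (descending); break ties by slide number.
    ranked = sorted(total, key=lambda s: (-total[s], s))
    return sorted(ranked[:n])

# Made-up access log from three users of the same lecture.
log = [
    ("u1", 1, 30), ("u1", 2, 120),
    ("u2", 2, 90), ("u2", 3, 10),
    ("u3", 1, 45), ("u3", 4, 200),
]
summary_slides = top_slides(log, 2)
print(summary_slides)
```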
Word level extraction by scoring/classifying words
           [Hori C. et al., 1999, 2002]
• Score each word in the sentence and extract a set of words to form
  a sentence whose total score is the product/sum of the scores of
  each word
• Example:
   – Word significance score I (topic words)
   – Linguistic score L (bigram probability)
   – Confidence score C (from ASR)
   – Word concatenation score T_r (dependency structure grammar)

$$S(V) = \sum_{m=1}^{M} \left\{ L(v_m \mid \ldots v_{m-1}) + \lambda_I I(v_m) + \lambda_C C(v_m) + \lambda_T T_r(v_{m-1}, v_m) \right\}$$

   – where M is the number of words to be extracted, and λ_I, λ_C, λ_T
     are weighting factors for balancing among L, I, C, and T_r
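
A minimal sketch of this scoring scheme follows. The four scoring functions are passed in as callables, and the toy scores are invented. The exhaustive search over word subsets is a stand-in chosen for brevity, not the search procedure of the cited work, and it is practical only for short sentences.

```python
from itertools import combinations

def summary_score(words, ling, signif, conf, concat,
                  lam_i=1.0, lam_c=1.0, lam_t=1.0):
    """Sum over extracted words of L + lam_i*I + lam_c*C + lam_t*Tr.

    prev is None for the first word; the callables decide how to treat it.
    """
    total, prev = 0.0, None
    for w in words:
        total += (ling(prev, w) + lam_i * signif(w)
                  + lam_c * conf(w) + lam_t * concat(prev, w))
        prev = w
    return total

def best_extraction(sentence, m, **scorers):
    """Try every order-preserving choice of m words and keep the best one."""
    candidates = (list(c) for c in combinations(sentence, m))
    return max(candidates, key=lambda c: summary_score(c, **scorers))

# Toy scores (assumptions): only the significance score is informative here.
significance = {"the": 0.0, "jury": 2.0, "selection": 1.5, "began": 0.5}
scorers = dict(
    ling=lambda prev, w: 0.0,
    signif=lambda w: significance[w],
    conf=lambda w: 0.0,
    concat=lambda prev, w: 0.0,
)
best = best_extraction(["the", "jury", "selection", "began"], 2, **scorers)
print(best, summary_score(best, **scorers))
```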
        Segmentation Using Discourse Cues
                [Maybury, 1998]
• Discourse cue-based story segmentation
• Discourse cues in CNN
   – Start and end of broadcast
   – Anchor/reporter handoff, reporter/anchor handoff
   – Cataphoric segment (“still ahead …”)
• Time-enhanced finite state machine representing discourse states
  such as anchor segment, reporter segment, advertisement
• Other features: named entities, part of speech, discourse shifts
   – “>>” speaker change, “>>>” subject change

       Source            Precision        Recall
       ABC               90               94
       CNN               95               75
       Jim Lehrer Show   77               52
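
The closed-caption markers mentioned above already support a very simple segmenter. The sketch below uses only “>>>” and “>>” and is far simpler than the time-enhanced finite state machine of the cited work; the caption lines are invented for illustration.

```python
def segment_stories(caption_lines):
    """Split closed-caption lines into stories, each a list of speaker turns."""
    stories = []
    for line in caption_lines:
        text = line.strip()
        if text.startswith(">>>"):      # subject change: start a new story
            stories.append([text[3:].strip()])
        elif text.startswith(">>"):     # speaker change: new turn, same story
            if not stories:
                stories.append([])
            stories[-1].append(text[2:].strip())
        elif text:                      # unmarked line: continue current turn
            if not stories:
                stories.append([""])
            if not stories[-1]:
                stories[-1].append("")
            stories[-1][-1] = (stories[-1][-1] + " " + text).strip()
    return stories

captions = [
    ">>> a trial is under way in amarillo, texas.",
    ">> jury selection began this morning.",
    "the judge will question jurors.",
    ">>> still ahead, the weather.",
]
stories = segment_stories(captions)
print(len(stories), [len(s) for s in stories])
```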
   CU: Summarization without Words: Does the
importance of ‘what’ is said correlate with ‘how’ it
                      is said?
• Hypothesis: “Speakers change their amplitude, pitch,
  speaking rate to signify importance of words, phrases,
  sentences.”
  – If so, then the prediction labels for sentences predicted
    using acoustic features (A) should correlate with labels
    predicted using lexical features (L)
  – In fact, this seems to be true (correlation of .74 between the
    predictions of A and L)
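
One way to measure agreement between two sets of per-sentence predictions is Pearson correlation over 0/1 labels. This is a sketch of that computation only; the label vectors below are made up and do not reproduce the figure reported on the slide, and the slide does not say which correlation statistic was used.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.
    Assumes neither sequence is constant (otherwise the denominator is zero)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical in-summary labels (1 = include) from two classifiers.
acoustic_pred = [1, 0, 1, 1, 0, 0, 1, 0]
lexical_pred = [1, 0, 1, 0, 0, 0, 1, 1]
r = pearson(acoustic_pred, lexical_pred)
print(round(r, 2))
```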
 Is It Possible to Build ‘good’ Automatic Speech
     Summarization Without Any Transcripts?


     Feature Set     F-Measure     ROUGE-avg
     L+S+A+D         0.54          0.80
     L               0.49          0.70
     S+A             0.49          0.68
     A               0.47          0.63
     Baseline        0.43          0.50

• Using just S+A, without any lexical features, we get an F-measure
  0.06 higher (0.49 vs. 0.43) and a ROUGE-avg 0.18 higher (0.68 vs.
  0.50) than the baseline
           Evaluation using ROUGE

• F-measure too strict
   – Predicted summary sentences must match
     summary sentences exactly
   – What if content is similar but not identical?
• ROUGE(s)…
                     ROUGE metric
•   Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
•   ROUGE-N (where N=1,2,3,4 grams)
•   ROUGE-L (longest common subsequence)
•   ROUGE-S (skip bigram)
•   ROUGE-SU (skip bigram counting unigrams as well)




• Does ROUGE solve the problem?
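
The contrast between exact-match F-measure and ROUGE-N can be seen on a tiny example. This is a simplified sketch: ROUGE-N is computed as n-gram recall against a single reference with whitespace tokenization, which omits options of the full metric (multiple references, stemming, stopword removal). The sentences are invented.

```python
from collections import Counter

def sentence_f_measure(predicted, reference):
    """Precision, recall and F over sentences that match exactly."""
    predicted, reference = set(predicted), set(reference)
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0, 0.0, 0.0
    p, r = overlap / len(predicted), overlap / len(reference)
    return p, r, 2 * p * r / (p + r)

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """Fraction of reference n-grams that also appear in the candidate."""
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, cand[gram]) for gram, count in ref.items())
    return matched / total

reference = "jury selection began this morning"
candidate = "jury selection started this morning"

f = sentence_f_measure([candidate], [reference])
r1 = rouge_n(candidate, reference, n=1)
r2 = rouge_n(candidate, reference, n=2)
print(f, r1, r2)
```

The exact-match F-measure gives no credit for the near-identical sentence, while ROUGE-1 and ROUGE-2 give partial credit for the shared words and word pairs.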
                 Next Class

• Emotional speech
• HW 4 assigned

						