
Content Based Multimedia Signal Processing



                Yu Hen Hu
    University of Wisconsin – Madison
               Outline
• Multimedia content description
  Interface (MPEG-7)
• Video content features
• Spoken content features
• Multimedia indexing and retrieval
• Multimedia summary and filtering
• Other applications
              MPEG-7 Overview
• Large amounts of digital content are available
• Easy to create, digitize, and distribute audio-visual content
• Family album syndrome
   – Need to organize, index, and retrieve
• Information overload
   – Need filtering
• MPEG-7 Objective
   – Provide interoperability among systems and applications used in
     generation, management, distribution, and consumption of
     audio-visual content descriptions.
   – Help users identify, retrieve, or filter audio-visual information.
 Potential Applications of MPEG-7
• Summary
    – Generation of multimedia program guide or content summary
    – Generation of content description of A/V archive to allow seamless
      exchange among content creator, aggregator, and consumer.
• Filtering
    – Filter and transform multimedia streams in resource-limited
      environments by matching user preference, available resources,
      and content description.
• Retrieval
    – Recall music using samples of tunes
    – Recall pictures using sketches of shape, color, movement,
      description of scenario
• Recommendation
    – Recommend program materials by matching user preference (profile)
      to program content
• Indexing
    – Create family photo or video library index
            Content descriptions
• Descriptors (Ds)
   – MPEG-7 contains standardized descriptors for audio, visual, and
     generic content.
   – Standardizes how these content features are characterized, but
     not how to extract them.
   – Different levels of syntactic and semantic descriptions are available
• Description Schemes (DSs)
   – Specify the structure and relations among different A/V descriptors
• Description Definition Language (DDL)
   – Standardized language based on XML (eXtensible Markup Language)
     for defining new Ds and DSs, and for extending or modifying
     existing Ds and DSs.
        Visual Color Descriptors
• Color space: HSV (hue-saturation-value)
   – Scalable color descriptor (SCD): color histogram (uniform 256 bins)
     of an image in HSV, encoded by Haar transform.
• Color layout descriptor:
   – spatial distribution of color in an arbitrarily shaped region.
• Dominant color descriptor (DCD):
   – colors are clustered first.
• Color structure descriptor (CSD):
   – scan the image with an 8x8 sliding window, and count the occurrence
     of each particular color in the window.
• Group of Frame/Group of Picture color descriptor
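As a rough illustration of the scalable color descriptor idea, the sketch below bins pixels on a uniform HSV grid and applies a one-dimensional Haar transform to the histogram. The 16 x 4 x 4 split of hue, saturation, and value (256 bins), the function names, and the unnormalized sum/difference form of the Haar step are assumptions made for illustration; the standard's exact quantization and coefficient coding are not reproduced.

```python
import colorsys

def hsv_histogram(rgb_pixels, h_bins=16, s_bins=4, v_bins=4):
    """Count pixels on a uniform HSV grid (16 x 4 x 4 = 256 bins, an assumed split)."""
    hist = [0] * (h_bins * s_bins * v_bins)
    for r, g, b in rgb_pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # min(...) keeps the value 1.0 inside the last bin.
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1
    return hist

def haar_1d(values):
    """Full 1D Haar decomposition using pairwise sums and differences.

    The input length must be a power of two.
    """
    out = list(values)
    n = len(out)
    while n > 1:
        half = n // 2
        sums = [out[2 * i] + out[2 * i + 1] for i in range(half)]
        diffs = [out[2 * i] - out[2 * i + 1] for i in range(half)]
        out[:n] = sums + diffs
        n = half
    return out

# Made-up pixels: two reds, one blue, one green.
pixels = [(255, 0, 0), (250, 5, 5), (0, 0, 255), (0, 255, 0)]
coeffs = haar_1d(hsv_histogram(pixels))
```

Keeping only the first few coefficients gives a coarser description of the same histogram, which is what makes the representation scalable.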
       Visual Texture Descriptor
• Texture Browsing D.
   – Regularity
       • 0: irregular; 3: periodic
   – Directionality
       • Up to 2 directions
       • 1-6 in 30° increments
   – Coarseness
       • 0: fine; 3: coarse
• Edge histogram D.
   – 16 sub-images
   – 5 (edge direction) bins/sub-image
• Homogeneous Texture D. (HTD)
   – Divide frequency space into 30 bins (5 radial, 6 angular)
   – 2D Gabor filter bank applied to each bin
   – Energy and energy deviation in each bin computed to form
     descriptor.
         Visual Shape Descriptor
• 3D Shape D. – Shape spectrum
   – Histogram (100 bins, 12 bits/bin) of a shape index, computed
     over 3D surface.
   – Each shape index measures local convexity.
• Region-based D.: ART
   – Angular radial transform
   – Shape analysis based on moments
   – ART basis:
     $V_{nm}(\rho, \theta) = \exp(jm\theta)\, R_n(\rho)$
     $R_n(\rho) = 2\cos(\pi n \rho)$ for $n \neq 0$
     $R_n(\rho) = 1$ for $n = 0$
• Contour based shape descriptor
   – Curvature scale space (CSS)
   – N points/curve, successively smoothed by [0.25 0.5 0.25]
     until the curve becomes convex.
   – Curvature at each point forms a curvature function at that scale.
   – Peaks at each scale are used as features
• 2D/3D descriptors
   – Use multiple 2D descriptors to describe 3D shape
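The smoothing step of the contour-based descriptor can be sketched as follows: a closed curve is filtered repeatedly with the [0.25 0.5 0.25] kernel until it becomes convex. This is only a sketch of that one step; the convexity test, the function names, and the `max_passes` guard are assumptions, and curvature computation and peak extraction are left out.

```python
import math

def smooth_closed_curve(points):
    """One pass of the [0.25, 0.5, 0.25] kernel over a closed curve (indices wrap)."""
    n = len(points)
    return [
        (
            0.25 * points[i - 1][0] + 0.5 * points[i][0] + 0.25 * points[(i + 1) % n][0],
            0.25 * points[i - 1][1] + 0.5 * points[i][1] + 0.25 * points[(i + 1) % n][1],
        )
        for i in range(n)
    ]

def is_convex(points):
    """True if every turn along the closed polygon has the same orientation."""
    n = len(points)
    signs = set()
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[(i + 1) % n]
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        if cross != 0:
            signs.add(cross > 0)
    return len(signs) <= 1

def smooth_until_convex(points, max_passes=1000):
    """Return (smoothed curve, number of smoothing passes applied)."""
    passes = 0
    while not is_convex(points) and passes < max_passes:
        points = smooth_closed_curve(points)
        passes += 1
    return points, passes

# A five-pointed star: outer radius 1.0, inner radius 0.4 (made-up test shape).
star = [
    ((1.0 if k % 2 == 0 else 0.4) * math.cos(math.pi * k / 5),
     (1.0 if k % 2 == 0 else 0.4) * math.sin(math.pi * k / 5))
    for k in range(10)
]
curve, passes = smooth_until_convex(star)
```

The number of passes needed before the concavities vanish grows with how deep they are, which is the intuition behind using the scale at which features disappear as a shape signature.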
           Visual Motion Descriptor
• Motion activity D.
    – Intensity
    – Direction of activity
    – Spatial distribution of activity
    – Temporal distribution of activity
• Camera motion
    – Panning
    – Booming (lift up)
    – Tracking
    – Tilting
    – Zooming
    – Rolling (around image center)
    – Dollying (backward)
• Warping (w.r.t. mosaic)
• Motion trajectory
[Figure: diagram with labels Video segment, Mosaic, Motion region,
Camera motion, Motion activity, Warping parameter, Trajectory,
Parametric motion]
   MPEG-7 Audio Content Descriptors

• 4 classes of audio signals
   – Pure music
   – Pure speech
   – Pure sound effect
   – Arbitrary sound track
• Audio descriptors
   – Silence Ds: silencetype
   – Sound effect Ds:
        • Audio Spectrum
        • Sound effect features
   – Spoken content Ds:
        • Speaker type
        • Link type
        • Extraction info type
        • Confusion info type
   – Timbre Ds:
        • Instrument
        • Harmonic instrument
        • Percussive instrument
   – Melody contour Ds
        • Contour
        • Meter
        • Beat
               Spoken content description
[Figure: speech waveform → audio processing → ASR → MPEG-7 encoder →
header and lattice]

Goal: support robust retrieval despite potentially erroneous decoding
by an automatic speech recognition (ASR) system.

• Spoken content header
   – Word lexicon (vocabulary)
   – Phone lexicon:
       • IPA (International Phonetic Association alphabet)
       • SAMPA (Speech Assessment Methods Phonetic Alphabet)
   – Phone confusion statistics
   – Speaker
• Spoken content lattice (word or phone)
   – Lattice node
   – Word and phone link

[Figure: example lattice with links IS (P=0.7), BORE (P=0.6), HIS (P=0.3)]
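One way to picture why a lattice helps with erroneous recognition: a query can match a word hypothesis that is not on the single best path. The sketch below is a toy data structure, not the MPEG-7 syntax. It reuses the three word labels and probabilities from the slide, but the node numbers and the way the links connect are invented, and `Link`, `word_hits`, and `best_path_words` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    start: int   # lattice node where the word hypothesis begins
    end: int     # lattice node where it ends
    word: str
    prob: float  # recognizer's probability for this link

# Hypothetical lattice: labels and probabilities from the slide, topology invented.
lattice = [
    Link(0, 1, "BORE", 0.6),
    Link(1, 2, "IS", 0.7),
    Link(1, 2, "HIS", 0.3),
]

def word_hits(links, query, min_prob=0.0):
    """Return all links whose word matches the query, including non-best hypotheses."""
    return [link for link in links
            if link.word == query.upper() and link.prob >= min_prob]

def best_path_words(links, start, goal):
    """Greedy 1-best decoding: follow the most probable outgoing link from each node."""
    words, node = [], start
    while node != goal:
        outgoing = [link for link in links if link.start == node]
        if not outgoing:
            break
        best = max(outgoing, key=lambda link: link.prob)
        words.append(best.word)
        node = best.end
    return words

one_best = best_path_words(lattice, 0, 2)
hits = word_hits(lattice, "his")
```

Here `one_best` does not contain "HIS", yet `word_hits` still finds it with its lower probability, so a retrieval system can rank that match instead of missing it.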
           Use of Content Features
• Multimedia information retrieval
   – Create searchable archive of A/V materials, e.g. album,
     digital library
   – Real world examples:
       • Call routing
       • Technical support
       • On-line manual
       • Shopping
       • Multimedia on demand
• Filtering
   – Automated email sorter
   – Personalized information portal
• Enhance low-level signal processing
   – Coding and trans-coding
   – Post-processing
      Content-based Retrieval

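[Figure: block diagram of a content-based retrieval system]
• Query module: interactive query formation, feature extraction
• Retrieval module: feature comparison, browsing & feedback
• Input module: feature extraction from multimedia data
• Databases: feature database, image database
• Endpoints: user, output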
  Multimedia CBR System Design Issues

• Requirement analysis
   – How will the multimedia materials be used?
   – This determines which set of features is needed.
• Archiving
   – How should individual objects be stored? Granularity?
• Indexing (query) and retrieving
   – With multi-dimensional indices, what is an effective and efficient
     retrieval method?
   – What is a suitable perceptually-consistent similarity measure?
• User interface
   – Modality? Text or spoken language or others?
   – Interactive or batch? Will dialogue be available?
           Multimedia Archiving
• Facts:
  – Content is often in compressed format and needs large
    storage space
  – Content index will also occupy storage space
• Issues
  – Granularity must match underlying file system
  – Logical versus physical segmentation
  – File allocation on file system must support multiple
    stream access and low latency
          Indexing and Retrieving
• Index
  – A very high dimensional binary vector
  – Encoding of content features
  – Text-based content can be represented with term vectors
  – A/V content features can be either Boolean vectors or term vectors
• Retrieval
  – Retrieval is a pattern classification problem
  – Use index vector as the feature vector
  – Classify each object as relevant or irrelevant to a query
    vector (template)
  – A perceptually consistent similarity measure is essential
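Treating retrieval as classification can be sketched as thresholding a similarity score between each index vector and the query vector. Cosine similarity is used here only as a stand-in; it is not claimed to be perceptually consistent. The threshold value and the object names are made up.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def classify_relevant(index, query, threshold=0.5):
    """Label each object relevant (True) or irrelevant (False) to the query vector."""
    return {name: cosine_similarity(vec, query) >= threshold
            for name, vec in index.items()}

# Hypothetical Boolean index vectors for three objects.
index = {
    "clip_a": [1, 0, 1, 1, 0],
    "clip_b": [0, 1, 0, 0, 1],
    "clip_c": [1, 0, 0, 1, 0],
}
query = [1, 0, 0, 1, 0]
labels = classify_relevant(index, query)
```

Raising the threshold returns fewer objects and lowering it returns more, so the threshold is the knob that trades missed relevant objects against retrieved irrelevant ones.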
               Term Vector Query
• Each document is represented by a specific term vector
• A term is a key-word or a phrase
• A term vector is a vector of terms. Each dimension of the vector
  corresponds to a term.
• Dimension of a term vector = total number of distinct terms.
• Example:
      Set of terms = [tree, cake, happy, cry, mother, father, big, small]
      Documents = “Father gives me a big cake. I am so happy”, “mother
      planted a small tree”
      Term vectors: [0, 1, 1, 0, 0, 1, 1, 0], [1, 0, 0, 0, 1, 0, 0, 1]
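The example above can be reproduced with a few lines of code. This is a minimal sketch: the tokenizer (lowercase, letters only) and the function name `term_vector` are assumptions, and no stemming is applied.

```python
import re

TERMS = ["tree", "cake", "happy", "cry", "mother", "father", "big", "small"]

def term_vector(document, terms=TERMS):
    """Binary term vector: 1 if the term occurs in the document, else 0."""
    words = set(re.findall(r"[a-z]+", document.lower()))
    return [1 if term in words else 0 for term in terms]

doc1 = "Father gives me a big cake. I am so happy"
doc2 = "mother planted a small tree"

print(term_vector(doc1))
print(term_vector(doc2))
```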
 Inverse Term Frequency Vector
– A probabilistic term vector representation.
– Relative term frequency (within a document)
   tf(t,d) = count of term t in document d / # of terms in document d
– Inverse document frequency
   df(t) = total # of documents / # of documents containing t
– Weighted term frequency
   d_t = tf(t,d) · log[df(t)]
– Inverse document frequency term vector D = [d_1, d_2, …]
              ITF Vector Example
          Document 1: The weather is great these days.
          Document 2: These are great ideas
          Document 3: You look great
               Eliminate: The, is, these, are, you

Term      tf(t,1)  tf(t,2)  tf(t,3)  df(t)  D1    D2    D3
weather   1/6      0        0        3      0.08  0.00  0.00
great     1/6      1/4      1/3      1      0.00  0.00  0.00
day       1/6      0        0        3      0.08  0.00  0.00
idea      0        1/4      0        3      0.00  0.12  0.00
look      0        0        1/3      3      0.00  0.00  0.16
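The table can be checked numerically. In this sketch the term lists are entered by hand after stop-word removal and plural stripping, the tf denominator is the original word count of each document (6, 4, 3), and the logarithm is taken to base 10. The base is an assumption, chosen because it reproduces the tabulated values.

```python
import math

# Remaining terms per document, plus the original word count used as
# the tf denominator in the table.
docs = {
    1: (["weather", "great", "day"], 6),
    2: (["great", "idea"], 4),
    3: (["look", "great"], 3),
}

def tf(term, doc_id):
    """Relative term frequency of a term within one document."""
    terms, length = docs[doc_id]
    return terms.count(term) / length

def df(term):
    """The slide's df(t): total # of documents / # of documents containing t."""
    containing = sum(1 for terms, _ in docs.values() if term in terms)
    return len(docs) / containing

def weight(term, doc_id):
    # Base-10 logarithm is an assumption; it matches the table.
    return tf(term, doc_id) * math.log10(df(term))

for term in ["weather", "great", "day", "idea", "look"]:
    row = [round(weight(term, d), 2) for d in docs]
    print(f"{term:8s} df={df(term):.0f}  D={row}")
```

The term "great" appears in every document, so df = 1 and its weight is zero everywhere: it carries no information for telling the documents apart.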
Human Computer Interface
• Command: voice, gesture, push button/key, expression, eye
• HCI is a match-maker: matching the needs of humans and computers
• Data: sensation (visual, audio, pressure, smell); virtual environment
Basic HCI Design Principles
• Consistency: The same command means the same thing
• Intuition: Use metaphors that are familiar to the user
• Adaptability: Adapt to user’s skill, style
• Economy: Use minimum effort to achieve a goal
• Non-intrusive: Do not decide for the user without asking
• Structure: Present only relevant information to the user
  in a simple manner.
                  User Models
• User Profiles:
  –   Categorize users using features relevant to tasks
  –   Static features: age, sex, etc.
  –   Dynamic features: activity logs, etc.
  –   Derived features: skill levels, preferences, etc.
• Use of Profiles for HCI
  – Adaptation: Customize HCI for different categories
    of users
  – Better understanding of user’s needs
Principles of Dialogue Design
• Feedback: Always acknowledge user’s input
• Status: Always inform users where they are in the system
• Escape: Provide a graceful way to exit halfway
• Minimal Work: Minimize the amount of input the user must provide
• Default: Provide default values to minimize work
• Help: Context-sensitive help
• Undo: Allow users to correct unintentional mistakes
• Consistency:
             Performance Evaluation
• Document retrieval problem is a hypothesis testing problem:
  H0: di is relevant to q (r=1)
  H1: di is irrelevant to q (r=0)
• Type I error (Pe1 = P{r=0|H0}): Relevant but not retrieved.
• Type II error (Pe2 = P{r=1|H1}): Irrelevant but retrieved.
• Contingency table for evaluating retrieval

               Retrieved   Not retrieved
  Relevant         w             x
  Irrelevant       y             z

• Precision Recall Curve
   – P(recision) = w/(w+y) is a measure of specificity of the result
   – R(ecall) = w/(w+x) is an indicator of completeness of the result.
• Operating curve
   – Pe1 = x/(w+x) = 1 - R
   – Pe2 = y/(y+z) = F(allout)
• Expected search length = average # of documents that need to be
  examined to retrieve a given number of relevant documents.
• Subjective criteria
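The measures above follow directly from the four contingency counts. In the sketch below the counts are made up, the function name is hypothetical, and all denominators are assumed to be nonzero.

```python
def retrieval_metrics(w, x, y, z):
    """Compute retrieval measures from the contingency table.

    w: relevant and retrieved      x: relevant but not retrieved
    y: irrelevant but retrieved    z: irrelevant and not retrieved
    """
    precision = w / (w + y)
    recall = w / (w + x)
    fallout = y / (y + z)
    return {
        "precision": precision,
        "recall": recall,
        "pe1": 1 - recall,   # Type I error: relevant but not retrieved
        "pe2": fallout,      # Type II error: irrelevant but retrieved
    }

# Made-up example: 1000 documents, 40 of them relevant, 50 retrieved.
m = retrieval_metrics(w=30, x=10, y=20, z=940)
print(m)
```

Sweeping the retrieval threshold and recomputing these values at each setting traces out the precision-recall curve and the operating curve.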
             Example: MetaSEEk

• MetaSEEk: a meta-search engine
  – Purpose: retrieving images
  – Method: Select and interface with multiple on-line
    image search engines
  – Search principle: rank search engines and their search
    options by their performance on different classes
    of queries

       A. B. Benitez, M. Beigi, and S.-F. Chang, Using Relevance
       Feedback in Content-Based Image Metasearch, IEEE Internet
       Computing, Vol. 2, No. 4, pp. 59-69, July/August 1998
           Basic idea of MetaSEEk
• Classify the user queries into different clusters by
  their visual content.
• Rank the different search engines according to their
  performance for the different classes of user queries
• Select the search engines and search options
  according to their rank for the specific query cluster
• Display the search results to the user
• Update the performance rankings according to the user
  feedback
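The rank, select, and update loop in the steps above can be sketched with a score table indexed by query cluster and search engine. This is only an illustration of the idea, not MetaSEEk's actual scoring: the +1/-1 update rule, the class and method names, and the engine and cluster names are all assumptions, and assigning a query to a cluster is not shown.

```python
from collections import defaultdict

class EngineRanker:
    """Keeps a performance score per (query cluster, search engine) pair."""

    def __init__(self, engines):
        self.engines = list(engines)
        self.scores = defaultdict(float)  # (cluster, engine) -> score, starts at 0

    def rank(self, cluster):
        """Engines ordered best first for this cluster (ties keep input order)."""
        return sorted(self.engines, key=lambda e: -self.scores[(cluster, e)])

    def select(self, cluster, k=2):
        """Pick the k best-ranked engines to query for this cluster."""
        return self.rank(cluster)[:k]

    def feedback(self, cluster, engine, liked):
        """Raise the score if the user liked the engine's result, lower it otherwise."""
        self.scores[(cluster, engine)] += 1.0 if liked else -1.0

ranker = EngineRanker(["engine_a", "engine_b", "engine_c"])
ranker.feedback("sunsets", "engine_c", liked=True)
ranker.feedback("sunsets", "engine_a", liked=False)
chosen = ranker.select("sunsets")
```

Because scores are kept per cluster, feedback on one kind of query does not change which engines are chosen for a visually different kind.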
Overview: Basic components of a meta-search engine
  Content-Based Visual Query (1)

• Advantage
  – Ease of creating, capturing and collecting digital
    imagery
• Approaches
  – Extract significant features (Color, Texture, Shape,
    Structure)
  – Organize Feature Vectors
  – Compute the closeness of the feature vectors
  – Retrieve matched or most similar images
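The four steps of the approach can be sketched end to end with a very coarse color feature. The RGB histogram with 2 bins per channel, the in-memory dictionary standing in for the image database, and all names and pixel values are assumptions for illustration; `math.dist` requires Python 3.8 or later.

```python
import math

def color_histogram(pixels, bins_per_channel=2):
    """Normalized RGB histogram; pixels are (r, g, b) tuples with values 0..255."""
    step = 256 // bins_per_channel
    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        hist[idx] += 1.0
    total = len(pixels)
    return [count / total for count in hist]

def most_similar(query_pixels, database, top_k=1):
    """Rank database images by Euclidean distance between feature vectors."""
    q = color_histogram(query_pixels)
    ranked = sorted(database,
                    key=lambda name: math.dist(q, color_histogram(database[name])))
    return ranked[:top_k]

# Tiny made-up "images", each given as a short list of pixels.
database = {
    "sunset": [(250, 120, 10), (240, 100, 20), (200, 60, 30)],
    "forest": [(20, 130, 40), (30, 150, 50), (10, 90, 20)],
    "ocean": [(10, 40, 200), (20, 60, 220), (30, 80, 180)],
}
query = [(245, 110, 15), (230, 90, 25)]
best = most_similar(query, database)
```

A real system would compute the database features once and store them, rather than recomputing them for every query as this sketch does.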
  Content-Based Visual Query (2)
             Improve Efficiency

• Keyword-based search
  – Match images with particular subjects and narrow
    down the search scope
• Clustering
  – Classify images into various categories based on
    their contents
• Indexing
  – Applied to the image feature vectors to support
    efficient access to the database
             Cluster the visual data

• K-means algorithm
  – Simplicity
  – Reduced computation
• Tamura algorithm (for texture)
• For color, feature vectors are calculated using
  the color histogram
• Distance measure: Euclidean distance
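A minimal k-means sketch over color-histogram feature vectors, using Euclidean distance as on the slide. Seeding the centroids with the first k vectors, the fixed iteration count, and the sample histograms are assumptions made to keep the example deterministic; `math.dist` requires Python 3.8 or later.

```python
import math

def kmeans(vectors, k, iterations=20):
    """Plain k-means with Euclidean distance; the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        assignment = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Made-up 3-bin color histograms for four images.
histograms = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.8, 0.2, 0.0],
    [0.0, 0.9, 0.1],
]
labels, centers = kmeans(histograms, k=2)
```

Each pass costs one distance computation per vector and centroid, which is the reduced computation the slide refers to.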
Conceptual structure of the meta-search database
Multimedia summary and filtering
• Summary
   – Text: email reading
   – Image: caption generation
   – Video: highlights, storyboard
• Issues:
   – Segmentation
   – Clustering of segments
   – Labeling clusters
   – Associate with syntactic and semantic labels
• Filtering
   – Same as retrieval: filter out irrelevant objects based on a
     given criterion (query)
   – Often needs to be performed based on content features
       • E.g. filtering traffic accidents or law violations from
         traffic monitoring videos
  Content based Coding and Post-processing

• Different coding decisions based on low level content features
   – coding mode (inter/intra selection)
   – motion estimation
• Object based coding
   – Encoding different regions (VOP) separately
   – Using different coders for different types of regions
• Multiple abstraction layer coding
   – An analysis/synthesis approach
   – Synthesize low level contents from higher level abstraction
       • E.g. texture synthesis
• Content based post-processing
   – Identify content types and then synthesize low level content

				