Convenient MIR systems
vision vs. reality check, research & e-commerce
Stephan Baumann

• Personal Profile
• Convenient Music Information Retrieval
  – Multi-modal queries
  – Identification by description
  – Multi-facet music similarity
     • Timbre
     • Lyrics
     • Cultural aspects
• Project MPEER: P2P, semantic web and MIR
              Research Diary (1991-2003)

•   1991/92   optical music recognition
•   1992/93   online handwriting recognition
•   1993/94   optical music recognition
•   1995/97   document analysis and understanding
•   1996      first look at Multimedia IR (S. Pfeiffer)
•   1998/99   spinoff activities with Insiders GmbH
•   2000      freelancing/research for draft MIR system
•   2001      co-founding Sonicson GmbH
•   2001/03   subjective music similarity (Ph.D., Sep 2003)
                      Desiderata MIR [Huron]

•   1. Access to all of the world’s music
•   2. Access via an indexing method
•   3. Fair use (reimbursement to all contributors)
•   4. Open system
•   5. Self-correcting system
•   6. Assurance of privacy and cultural practices
                 MIR Categorization [Futrelle]
Representation       Description                 Research

Symbolic             Notation,                   Matching, Theme/Melody
                     Event-based recordings      Extraction, Voice Separation,
                     (MIDI),                     Musical Analysis
                     Hybrid representations
Audio                Recordings, Streaming       Sound/Song Spotting,
                     Audio, Instr. Libraries     Transcription, Timbre/Genre
                                                 Classification, Musical
                                                 Analysis, Recommendation
Visual               Scores                      Score Reading (OMR)

Metadata             Cataloging, Bibliography,   Library Testbeds, Traditional
                     Descriptions                IR, Interoperability,
                                                 Recommendation Systems
                                                     Related Work
•   Audio:
     – [Blum, Wold], [Pfeiffer], [Foote], [Logan], ...
     – [Scheirer], [Tzanetakis], [Welsh], [Aucouturier], [Peeters], ...
•   Cultural:
     – [Whitman], [Pachet], [Ellis, Berenzweig]
•   Multi-modal MIR
     – [Bainbridge], ...
•   Recommendation
     – [Amazon, Moodlogic, MusicGenome, MuBu, MongoMusic], ...
     – [Uitdenbogerd]
•   User Models
     – [Chai, Vercoe], [Rolland]
•   Music Psychology
     – [Bruhn, Rösing], [Gabrielsson, Västfjäll], ...
•   Usability, Convenience
     – [Shneiderman], [Nielsen], ...

• Using natural language as input for queries of
• Accessing meta data, symbolic and audio layers in
  one interface
• Evaluation of usability (e.g. eye-tracking + user
• Acquisition of audio features, symbolic features, meta
  data and lyrics
• Machine communication by using shared music
  ontologies (MPEG-7, RDF/S, DAML-S)

[Diagram: processing of natural language queries]
• extraction of musical concepts from natural language queries
• matching of ambiguities and ...
• treatment of refinements and negations
• recognition of intention
• generation of SQL queries on demand
• intention-based result presentation
          Software Development Lifecycle
• System Design Philosophy: Google-Style
• 1. Collection of User req. V1
    – Offline
    – 20 Germans, different user segments
• 2. Setup of prototype V1
    – Online Refinement of req. V1 -> Introduction of PhoneticMatch
• 3. Collection of User req. V2
    – Online with prototype V1
    – 100 American native speakers, internet-aware users
• 4. Setup of prototype V2
    – Bilingual phonetic match
    – NLP frontend
    – Audio-based music similarity
• 5. Scaling of phonetic match component for commercial website

                                         ´s no.1 hit                 Convenience

         status que -> Status Quo
         golgen earing -> Golden Earring
         Fisher Set -> Fischer Z
         Novospaski Chor -> Novo Spassky Chor
         four none blondes -> 4 Non Blondes
         Matchbox twenty -> Matchbox 20
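
The corrections above can be illustrated with a small sketch. The slides do not say how the phonetic match works, so this example swaps in plain string similarity from Python's `difflib` as a stand-in; it is not the PhoneticMatch component. The catalogue, the function names and the 0.6 cutoff are assumptions made for illustration.

```python
import difflib

# Hypothetical catalogue: the corrected artist names shown on the slide.
CATALOGUE = ["Status Quo", "Golden Earring", "Fischer Z",
             "Novo Spassky Chor", "4 Non Blondes", "Matchbox 20"]


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so case and spacing are ignored."""
    return " ".join(name.lower().split())


def fuzzy_artist_match(query: str, catalogue=CATALOGUE, cutoff: float = 0.6):
    """Return the closest catalogue entry, or None if nothing is close enough.

    Assumption: character-level similarity (difflib), not a phonetic code.
    """
    by_key = {normalize(name): name for name in catalogue}
    hits = difflib.get_close_matches(normalize(query), list(by_key),
                                     n=1, cutoff=cutoff)
    return by_key[hits[0]] if hits else None


if __name__ == "__main__":
    for q in ["status que", "golgen earing", "four none blondes"]:
        print(q, "->", fuzzy_artist_match(q))
```

A real phonetic matcher would compare sound-alike codes rather than characters, which matters for cases such as "Fisher Set" where the spelling differs more than the pronunciation.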

Statistics:
• 540,000 queries/month
• 400,000 queries for artists/month
• 80,000 fuzzy queries for artists/month
Usability Evaluation: helping text
           Multi-facet Music Similarity

• Audio: MFCCs
• Lyrics: TFIDF
• Cultural:
  – Webcrawling
  – Part-of-speech (POS) tagging
Song Similarity: Audio-based Perception
  •   Feature Extraction
       –   Input Segment [30..60] sec
       –   30ms Hanning-Windows, Log Spectrum, Mel-Scale, Inverse Fourier Transform
       –   1000 vectors using the first 13 MFCCs
  •   Representation
       –   Intra-Song-Clustering -> Song Signature [Logan]
       –   (Gaussian Mixture Models [Aucouturier])
  •   Similarity Measure
       –   Euclidean Distance [Foote]
       –   Kullback-Leibler Distance [Logan, Aucouturier]
       –   (Approximate solutions: Sampling [Ellis, Aucouturier])
       –   DistMinMean [Ellis]
       –   Earth Mover's Distance (EMD) [Logan]
  •   Different Features & Similarity Measures
       –   [Welsh] Tonal histograms, tonal transition, volume, tempo, noise->Euclidean Distance
       –   [Rauber&Frühwirth] Psychoacoustic Features -> Hierarchical SOM
       –   [Pfeiffer] A review of MP3-native features
       –   ...
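
As a rough illustration of the Kullback-Leibler measure listed above, the sketch below compares two songs that are each summarised by a single Gaussian with diagonal covariance over their MFCC vectors. This is a deliberate simplification: the cited approaches use cluster signatures or Gaussian mixture models, for which no closed form exists (hence the sampling and EMD variants). Function and parameter names are assumptions.

```python
import math


def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance (closed form)."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kl(mu_p, var_p, mu_q, var_q):
    """Symmetrised divergence KL(p||q) + KL(q||p); zero for identical models."""
    return (kl_diag_gaussians(mu_p, var_p, mu_q, var_q)
            + kl_diag_gaussians(mu_q, var_q, mu_p, var_p))
```

Symmetrising matters here because plain KL divergence depends on the order of its arguments, while a song-to-song distance should not.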
Perception of similar Timbre in Songs:
 • Audio Database: 700 MP3s of mainstream music at full-length, 40 artists, 70 different genres
 • Evaluation: no ground truth (GT) available! Only anecdotal evidence or genre/artist/volume GT
 Lyrics: Vector Space Model (TFIDF)

• Representation of a Collection of Lyrics
   # of terms k:
   Song j:
   Occurrence of term h in collection d(h):
   Weight of term j in song i:
• Similarity metric
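
The formulas for this slide are not preserved in the text. As a sketch under stated assumptions, the example below uses one common textbook variant: weight = term frequency × log(N / document frequency), compared with cosine similarity. Whether the original slide used exactly this weighting is not known; all function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(songs):
    """Map each song id to a sparse {term: weight} vector.

    Assumed weighting: tf * log(N / df), one common TF-IDF variant.
    """
    n_songs = len(songs)
    doc_freq = Counter()
    for terms in songs.values():
        doc_freq.update(set(terms))  # count each term once per song
    vectors = {}
    for song_id, terms in songs.items():
        counts = Counter(terms)
        vectors[song_id] = {t: tf * math.log(n_songs / doc_freq[t])
                            for t, tf in counts.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all-zero."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Usage: build the vectors once for the whole lyrics collection, then rank all other songs by `cosine` against the reference song's vector.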
                       Song Similarity: Lyrics (1)
Reference Song 112: Lucy pearl - Dance tonight.txt
Most-relevant terms:     toast spend tonight dance money
1. Similar Song : Lucy Pearl - you (feat. snoop dogg and Q-tipp).txt
2. Similar Song: Phil Collins - Please Come Out Tonight.txt
3. Similar Song: Madonna - Into the groove.

Reference Song 56: Das Kind Vor Dem Euch.txt - die fantastischen vier
Most-relevant terms:    wollten euch sehn entsetzt selben
1. Similar Song: Die fantastischen Vier - Auf Der Flucht.txt
2. Similar Song: Freundeskreis - Mit Dir.txt
3. Similar Song: Die fantastischen Vier – Populär

Reference Song 145: madonna - Paradise.txt
Most-relevant terms: remains pas encore fois moi
Zero Hits
               Song Similarity: Lyrics (2)

Reference Song 193: Phil Collins - One More Night.txt
Most-relevant terms: forever wait night cos ooh
1. Similar Lyrics: Phil Collins - YOU CAN'T HURRY LOVE.txt
2. Similar Lyrics: Phil Collins - Inside Out.txt
3. Similar Lyrics: Phil Collins - This must be Love.txt

Reference Song 297: Cat Stevens - Father And Son.txt
Most-relevant terms: fault decision marry son settle
1. Similar Lyrics: Phil Collins - We're Sons Of Our Fathers.txt
2. Similar Lyrics: Sheryl Crow - No One Said It Would Be Easy.txt
3. Similar Lyrics: George Michael - Father Figure.txt
Artist Similarity: Cultural Aspects
 Web Crawling+PartOfSpeech+TFIDF
Adj. terms    TFIDF     Phrases              TFIDF
daft          0.20463   techno music         0.86982
new           0.14242   old school           0.80009
french        0.12907   great techno buzz    0.40004
different     0.09314   overall groove       0.40004
digital       0.08607   electronic artists   0.40004
vocal         0.07558   new wave             0.40004
cool          0.07339   usual drum n bass    0.40004
electronic    0.06887   only band            0.36956
funky         0.06497   big thing prodigy    0.36956
underground   0.06497   good beat            0.34793
   Visual Evaluation: Similarity (Cosine)






Recall/Precision against P2P, AMG data
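
A minimal sketch of how recall and precision could be computed for one artist, assuming the system returns a list of similar artists and the reference data (P2P or AMG) provides a set of artists regarded as similar. The function name and data shapes are assumptions, not taken from the slides.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a retrieved list against a reference set."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall
```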

 Evaluation ?! [Downie, Uitdenbogerd]
 • Layers (bottom-up): Web Sources; Part Of Speech + TermWeighting; VectorSpaceModel
 • Clustering (unsupervised)
 • Similarity (learning?): Cosine vs. Rel. Feedback (Rocchio)
 • Classification (supervised): Personal Classifier
 • Listening mode: subjective, context-dependent, "personal taste"
 • GroundTruth: Experiment? MusicSeer? P2P = collabor., AMG = Experts
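
For reference, Rocchio relevance feedback moves a query vector towards the centroid of items the user marked relevant and away from the centroid of those marked non-relevant. The sketch below uses dense lists and commonly quoted default coefficients (alpha=1.0, beta=0.75, gamma=0.15); these values and the function name are assumptions, not settings from the slides.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return the updated query: alpha*q + beta*mean(rel) - gamma*mean(nonrel)."""
    def centroid(vectors):
        # An empty feedback set contributes a zero vector.
        if not vectors:
            return [0.0] * len(query)
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    rel_c = centroid(relevant)
    non_c = centroid(non_relevant)
    return [alpha * q + beta * r - gamma * n
            for q, r, n in zip(query, rel_c, non_c)]
```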
Psychological Factors >>Musical Taste
  • Personality >> preferred Styles, Genres
     – Stability
     – Introversion / Extraversion
     – Aggressive / Passive
  • Socio-economics >> preferred Styles, Genres
  • Demographics >> similar users in collaborative filtering (CF) approaches >> recommendations
     – Gender
     – Age
  • Situation
     – Mood >> tempo, tonality, beatness, pitch height
     – Listening Mode [Huron]
          User Model [Chai,Vercoe]
<generalbackground>
    <name>John White</name>
    <education>MS</education>
    <animal>dog</animal>
    <sex>male</sex>
</generalbackground>
<context>I’m happy
    <tempo>very fast</tempo>
    <genre>blues</genre>
    <genre>rock/pop</genre>
</context>
<composer>Wolfgang Amadeus Mozart</composer>
<artist>Beatles</artist>
<tempo>very slow</tempo>
<softness>very soft</softness>
           Multi-facet Music Similarity and
                      Adaptive User Model
•   Hard-wired multi-facet similarity [Whitman]
•   Weighting of audio vs. cultural description by slider usage
•   Description Weight Vectors (DWV) [Rolland]
     – Original work for melodic similarity
     – DWV contains weight for each description in the representation
     – Weight is varying with user interaction
     – Explicit user feedback: re-ranking of system's output
     – Implicit adaptation of weights
•   Future Work
     – Apply DWV to multi-facet similarity (audio,lyrics,cultural)
     – Infer initial setting of weights according to psychological factors
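
A minimal sketch of how per-facet similarities might be combined with a weight vector, assuming each facet (audio, lyrics, cultural) already yields a similarity in [0, 1]. The slider mapping and all names are hypothetical, and the adaptation of weights from user feedback is not shown because the slides do not specify an update rule.

```python
def normalize_weights(weights):
    """Scale non-negative facet weights so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {facet: w / total for facet, w in weights.items()}


def slider_weights(position):
    """Map a slider position to weights: 0.0 = audio only, 1.0 = cultural only."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("slider position must lie in [0, 1]")
    return {"audio": 1.0 - position, "cultural": position}


def combined_similarity(facet_sims, weights):
    """Weighted sum of the facet similarities named in `weights`."""
    w = normalize_weights(weights)
    return sum(w[facet] * facet_sims[facet] for facet in w)
```

Usage: `combined_similarity({"audio": 0.8, "cultural": 0.4}, slider_weights(0.5))` averages the two facets; extending the weight dictionary with a "lyrics" entry covers the three-facet case listed under future work.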
                              Project MPEER
"In a world of
spontaneously federating services, there is no point in having a proprietary
service, there is no point in staying out of the directory, there is no point
in using an XML protocol that no one understands, there is no point in basing
it on a proprietary server, and there is no need to justify the obvious error
in following that path."
- Simon Phipps, chief technology evangelist, Sun Microsystems, Inc.
                                                                                 MPEER Objectives
• Relate MIR to the Semantic Web activities (W3C)
• Create (composite) Semantic Web Services for MIR
• Explore the P2P computing paradigm (shared resources)
“Bringing the web to its full potential” [Fensel, Bussler]

                         Web Services           Intelligent Web Services
   Distributed /         UDDI, WSDL, SOAP       WSFL -> WSMF

   Centralized /         WWW                    Semantic Web
   Static                URI, HTML, HTTP        RDF, RDF(S), DAML, OIL

                                                Formal Semantic
                   MPEER Architecture
• User query vocabulary: "Title Artist Volume Genre Bpm Loud Sound Like Dislike SimilarTo ..."
• User -> P2P Client GUI
• P2P Client components: Music Similarity, Loudness, Basic Features, Descriptors
• Semantic Web Wrapper -> WebService
     – Ontologies, Taxonomies
     – CD-Retailers, EMD
     – MIR services: Audio ID, Thumbnails, ...
• P2P Client/Server (Jtella/JXTA)
• Shared per peer: Audio (MP3) plus Meta Data (XML / MPEG-7 / RDF-S)
     – Title, Artist, Volume, Genre, bpm, Loudness, Timbre, Like, Dislike, SimilarTo
                  MPEER: composite Webservice
•   Service Type: "query service"
     –   Sub Type: Semantic web enabled
     –   Domain: Music
     –   Supported ontologies: {ontoson, allmusicguide, ..}
•   Port Types:
     –   Identification by audio, Similarity by audio, Retrieval by partial information
     –   Personalized recommendations, Playlist generation
     –   Music-Question Answering
•   Operations/Messages of Port Type Identification by audio:
     –   IF_NOT_MP3(input)->Convert2MP3(input)->CalculateMetadata-> ...
•   Composite, Distributed Services: (maybe P2P using users' local content & processing power)
     –   (1) MPeer.getEverythingFrom(Prince)
     –   (2)
     –   (3) SpecialArtistService=AllMusicGuide.detailedInfo
     –   (4) NegotiateContract(contract1,MPeer,AllMusicGuide)
     –   (5) contract1.StartTransaction(MPeer,AllMusicGuide)
     –   (5.1) AllMusicGuide.detailedInfo(Prince)
     –   (5.2) ...
Prototypical P2P Client
OpenSource Tools: Ontology Editor
OpenSource Tools: DataMining, ML
• The Web offers potential beyond symbolic or
  audio-based MIR by reflecting cultural aspects
• User-centric MIR systems may benefit from user
  models and situation-driven adaptation
• The field is too large to be handled by individual
• Composite web services offer a way to
  collaborate on the topic and perhaps to provide
  holistic, high-quality MIR systems
