New Meeting Corpus @ IDIAP

Daniel Gatica-Perez, Iain McCowan, Samy Bengio

Corpus Administration – Joanne Schulz
Technical Assistance – Thierry Collado, Olivier Masson

Features of the new corpus

•   Meeting room equipped with more sensors
•   Meetings based on scenarios
•   Most scenarios are for real meetings
•   Participants act naturally
•   3-5 participants per meeting
•   Natural length: 30-80 minutes

Devices in the Smart Meeting Room (SMR)

•   24 microphones (2 arrays, lapels, binaural manikin)
•   3 cameras
•   Projector capture device
•   Whiteboard capture device
•   Personal notes capture devices (Logitech pens)
•   Electronic versions of documents (e.g. papers)
•   Under evaluation
    –   close-view cameras
    –   EPFL camera whenever available

Features of each meeting

•   People entering/leaving get recorded
•   Real-life interruptions OK (e.g. latecomers, cell phones)
•   Varying visual clutter (photographs, bookshelves)
•   Meeting artifacts included (documents, laptops, coffee)
•   Some meetings have agendas
•   Single and multi-session meetings
•   So far, native and non-native speakers are mixed

Current look

Current scenarios

•   Corpus project meeting
      Weekly meeting of the people involved in the recording
      process to discuss its progress.
•   Conference report
      People returning from conferences hold a meeting to present
      what they learned and discuss emerging trends.

Current scenarios (2)

•   Book club
      Several clubs read non-technical books and meet to
      discuss them. Meetings will occur once or twice a month.
•   Technical reading group
      Staff members get together every two weeks to review and
      discuss relevant papers in a field of interest.

Current scenarios (3)

•   Presentation rehearsals
      Students/researchers rehearse a presentation in front of a
      group. Discussion occurs naturally.
•   Creation of a reading area at IDIAP
      Staff and students meet to discuss location, furnishings…
•   Other scenarios
      To be specified and played out as the process goes on.

Current corpus status (1)

•   Recordings started in late November
•   18 meetings, ~14 hours (as of Jan 20th)
    –   Corpus project meeting:    5
    –   Conference report:         8
    –   Book club:                 1
    –   Technical reading group:   2
    –   Presentation rehearsals:   1
    –   Other (Brno camera):       1
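
The per-scenario counts above can be cross-checked against the stated totals. The short Python sketch below sums them and derives the mean meeting length implied by the approximate 14-hour figure. The variable names are illustrative only and are not part of any corpus tooling.

```python
# Per-scenario meeting counts as listed on the slide (status as of Jan 20th).
meetings_per_scenario = {
    "Corpus project meeting": 5,
    "Conference report": 8,
    "Book club": 1,
    "Technical reading group": 2,
    "Presentation rehearsals": 1,
    "Other (Brno camera)": 1,
}

total_meetings = sum(meetings_per_scenario.values())

# "~14 hours" is approximate, so the mean below is only a rough figure.
approx_total_hours = 14
mean_minutes = approx_total_hours * 60 / total_meetings

print(f"Total meetings: {total_meetings}")
print(f"Mean length: about {mean_minutes:.0f} minutes")
```

The counts sum to the 18 meetings stated on the slide, and the implied mean of roughly 47 minutes falls within the 30-80 minute range given for natural meeting length.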
Current corpus status (2)

•   Media conversion procedure ready
    –   One script from DV tape to DivX
•   Under evaluation:
    –   Subcontractor for speech transcriptions
    –   Annotation tool for meeting actions
         •   Noldus’ The Observer
         •   Brno video annotation tool
    –   Recruitment of annotators

The procedure

•   Each meeting coordinated by one person
•   Corpus administrator is present but “hidden”
•   Meetings booked via a web site (IDIAP only)
•   Room is ready: enter, hold meeting, leave
•   All feedback via e-mail

Ethical concerns

•   Steps to protect participants
•   Procedures to track the data and ensure respectful use
•   Creation of an internal ethics committee
•   Meeting participants will have the opportunity to ‘bleep’ data

Timeline and outlook

•   Core meeting set: 20 meetings
    –   Raw data on mmm:            late February
    –   Speech transcriptions:      mid Feb - ???
    –   Group action annotations:   mid Feb - late March
•   What to do with the dataset?
    –   Define common processing tasks
    –   Define protocols for evaluation

Initial group action annotations

•   Group turn-taking
    –   floor, dialogue, discussion
•   Group focus-of-attention
    –   whiteboard, presentation, notes, table, unfocused
•   Group level-of-interest
    –   high, low
•   Feedback via e-mail
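
As an illustration of how the three label sets above could be represented in software, the following Python sketch models each set as an enumeration and combines them in a time-stamped segment record. The slides do not specify a storage format or say whether annotations are made per time interval, so the segment layout, class names, field names, and example values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class TurnTaking(Enum):
    FLOOR = "floor"
    DIALOGUE = "dialogue"
    DISCUSSION = "discussion"


class FocusOfAttention(Enum):
    WHITEBOARD = "whiteboard"
    PRESENTATION = "presentation"
    NOTES = "notes"
    TABLE = "table"
    UNFOCUSED = "unfocused"


class LevelOfInterest(Enum):
    HIGH = "high"
    LOW = "low"


@dataclass(frozen=True)
class GroupActionSegment:
    """One annotated interval of a meeting (hypothetical record layout)."""
    meeting_id: str
    start_s: float  # segment start, seconds from the beginning of the recording
    end_s: float    # segment end, seconds from the beginning of the recording
    turn_taking: TurnTaking
    focus: FocusOfAttention
    interest: LevelOfInterest

    def __post_init__(self):
        # Reject empty or reversed intervals early.
        if self.end_s <= self.start_s:
            raise ValueError("segment must have positive duration")

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


# Example with placeholder values, not real corpus data.
segment = GroupActionSegment(
    meeting_id="example-meeting",
    start_s=120.0,
    end_s=185.5,
    turn_taking=TurnTaking.DISCUSSION,
    focus=FocusOfAttention.WHITEBOARD,
    interest=LevelOfInterest.HIGH,
)
print(segment.duration_s)
```

Using enumerations keeps annotators and tools restricted to the agreed label vocabulary: a misspelled label such as `TurnTaking("debate")` raises an error instead of silently entering the data.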
