Interfaces For Accessing Multimodal Meeting Data: Systems, Problems and Possibilities
Simon Tucker and Steve Whittaker
University of Sheffield

Introduction
- The availability of multimodal meeting data increases the need for sophisticated mechanisms for browsing this data.
- In this talk we will examine a taxonomy of meeting browsers.
- We will highlight some of the problems associated with such systems.
- We will also describe potential directions for future research.

Why Taxonomy?
- One aim of the AMI project is to generate new browser components and technology.
- To begin this task it is useful to get an understanding of the ‘browser space’.
- This allows us to identify where efforts are allocated and where future opportunities lie.
- There are many potential taxonomies.
- For example, a taxonomy could classify meeting browsers by the types of data they make use of.

Browser Taxonomy
- We segregate browsers into four classes according to their focus:
  - Audio Browsers.
  - Video Browsers.
  - Artefact Browsers.
  - Discourse Browsers.
- An example system from each of these classes will now be described in turn.

Audio Browsers
Example: SpeechSkimmer (Arons, 1997)
- Playback at two levels of skimming, e.g.
  - Pause shortening.
  - Segment selection.
- Also allows for playback rate to be changed.
- Listeners can skim forwards and backwards through speech recordings.

Video Browsers
Example: Manga-Style (Girgensohn et al., 2001)
- Displays a sequence of keyframes.
- Size of each frame determined by an importance score.
- Browsing by selection of keyframes or by accessing a time-line.

Artefact Browsers
Example: Distributed Meetings (Cutler et al., 2002)
- Focus of browsing is captured whiteboard image.
- Interface also displays video of current speaker.
- Also able to navigate via a participant-involvement timeline.
- Furthermore, option to increase playback speed.

Discourse Browsers
Example: Lalanne et al. (2003)
- Navigation using both the ASR transcript and modes of discourse.
- Also aligns the document segments with discussion.
- All views are time synchronised.
- Novel kaleidoscope navigation device.

Interim Summary
Perceptual
  Audio
  - Speaker Turns.
  - Pause Detection.
  - Emphasis.
  - User Determined Markings.
  Video
  - Keyframes.
  - Participant Behaviour.
Semantic
  Artefacts
  - Presented Slides.
  - Agenda Items.
  - Whiteboard Annotations.
  - Notes - Both Personal and Private.
  - Documents Discussed During The Meeting.
  Discourse
  - ASR Transcript.
  - Named Entities.
  - Mode Of Discourse.

Possibility (1) - From Indices To Search
- Systems tend to focus on browsing via indices.
- There could be potential in making use of search in meeting browsers.
- There are potential benefits available from the use of text-processing techniques.
- For example, navigation through a summary or by named entities.

Possibility (2) - Filtered Presentation
- The current set of browsers largely assumes that users will want to review the entire meeting.
- There is an argument for browsers which filter the meeting and present a limited set of data.
- This filtering could be topic specific, e.g. “Present anything to do with films said at the meeting”.
- Or it could take a more abstract form, e.g. “Summarise this hour long meeting in 5 minutes”.

Problem (1) - Assessment
- Most browser studies do not consider assessment, and those that do are largely not rigorous enough.
- Assessments are either too component-centric or are not comparative.
- The next generation of browsers should be assessed in relation to the field.
- See e.g. Browser Evaluation Test - Flynn and Wellner (2003).

Problem (2) - Platform Assumption
- Current browsers assume that they will be used on a standard workstation.
- There are also advantages to providing access to meeting data through browsers running on devices with limited resources.
- For example, PDAs may be used to access meeting data between meetings.
- Such browsers would be significantly different from the feature-rich browsers described in the literature.
- For example, there may not be the screen space for a video component.
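
Illustration: Filtered Presentation
The topic-specific filtering suggested under Possibility (2) could be sketched roughly as below. This is only a minimal illustration, not a description of any existing browser: the `Segment` record, the `filter_by_topic` and `total_duration` helpers and the sample transcript are all hypothetical, and plain keyword matching stands in for whatever topic detection a real system would use.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One time-stamped stretch of a (hypothetical) meeting transcript."""
    start: float  # seconds from the start of the meeting
    end: float
    speaker: str
    text: str


def filter_by_topic(segments, keywords):
    """Return the segments whose text mentions any of the keywords."""
    wanted = {k.lower() for k in keywords}
    hits = []
    for seg in segments:
        # Strip surrounding punctuation so "films." still matches "films".
        words = {w.strip(".,;:!?\"'").lower() for w in seg.text.split()}
        if words & wanted:
            hits.append(seg)
    return hits


def total_duration(segments):
    """Total playback time, in seconds, of the given segments."""
    return sum(seg.end - seg.start for seg in segments)


# Invented sample data, for illustration only.
meeting = [
    Segment(0.0, 12.0, "A", "Let us start with the budget."),
    Segment(12.0, 30.0, "B", "I saw two films at the weekend."),
    Segment(30.0, 41.0, "A", "Back to the agenda, please."),
    Segment(41.0, 55.0, "C", "Which film did you like best?"),
]

film_segments = filter_by_topic(meeting, ["film", "films"])
```

A browser built this way would play back or display only `film_segments` rather than the whole recording. Because the result is a short list of time-stamped text, the same idea may also suit the limited-resource platforms raised under Problem (2).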

Conclusion
- We have described a taxonomy of meeting browsers and have highlighted an example of each class.
- We have identified two problems with the current generation of meeting browsers.
- Furthermore, we have examined two areas with potential for future research.

References
B. Arons (1997), "SpeechSkimmer: A System for Interactively Skimming Recorded Speech", ACM Transactions on Computer-Human Interaction, pp 3-38.
R. Cutler, Y. Rui, A. Gupta, J.J. Cadiz, I. Tashev, L. He, A. Colburn, Z. Zhang, Z. Liu, and S. Silverberg (2002), "Distributed Meetings: A Meeting Capture And Broadcasting System", Proceedings of 10th ACM International Conference on Multimedia, pp 503-512, Juan-les-Pins, France, 1-6 December.
M. Flynn and P. Wellner (2003), "In Search of a Good BET: A Proposal For a Browser Evaluation Test", IDIAP-COM 03-11.
A. Girgensohn, J. Boreczky, and L. Wilcox (2001), "Keyframe-Based User Interfaces For Digital Video", IEEE Computer, pp 61-67.
D. Lalanne, S. Sire, R. Ingold, A. Behera, D. Mekhaldi, and D. Rotz (2003), "A Research Agenda For Assessing The Utility Of Document Annotations In Multimedia Databases Of Meeting Recordings", Proceedings of 3rd International Workshop on Multimedia Data And Document Engineering, Berlin, Germany, September 8th.