YEAR 2007

Participation in Interspeech 2007 (Antwerpen, August 27-31)
The first day was dedicated to tutorials. I attended one on voice transformation, which is directly
related to my thesis. The presenter was Yannis Stylianou, from Crete. He introduced the basics of voice
conversion and voice morphing, in the time domain as well as in the frequency domain. Future
prospects for voice transformation were also discussed.

The next four days were devoted to plenary talks (in the morning), oral sessions and poster sessions.
I must first admit that I was impressed by the scale of the event: a complete week devoted only to
speech processing, with 7 sessions running at the same time… All the names we usually cite in our
bibliographies were present.

Most of the time, I preferred to follow the poster sessions, because oral sessions inherently required
a strong background in the field concerned. Even if all these people work in the same domain (speech
processing), it is impossible to know enough to understand everything.

I paid particular attention to the oral session on Multimodal Speech Recognition, because
two presentations were very close to my master's thesis. This session and the discussions which ensued
were very fruitful.

I also followed the sessions dealing with two fields of my PhD:

    -   The Lombard effect and speech in noise
    -   Speech Synthesis using Hidden Markov Models.

For the first subject, three presentations were particularly interesting:

- « Lombard Speech Impact on Perceptual Speaker Recognition », Ikeno, Hansen, University of Texas.
Hansen is known for having written a famous paper about the importance of the Lombard effect in
speech recognition.

- « Two-Stage System for Robust Neutral/Lombard Speech Recognition », Boril, Fousek, Hoge,
University of Prague. The authors belong to one of the rare groups that have built a Lombard database.

- « Speech Synthesis enhancement in noisy environments », Bonardo, Zovato, Loquendo in Torino.
The authors use a dynamic range controller to improve speech intelligibility.

As for the second subject, two presentations caught my attention:

    -   « An HMM-based Speech Synthesis System Applied to German and Its Adaptation to a
        Limited Set of Expressive Football Announcements », Krstulovic, Hunecke, Schroeder.
        The authors managed to reach a good voice quality even with little training data.
    -   « Implementation and Evaluation of an HMM-based Thai Speech Synthesis System »,
        Chomphan, Kobayashi, Tokyo Institute of Technology.
        I had the opportunity to talk at length with the author, which gave me many ideas for
        implementing such a system in French.

Participation in MMSP 2007 (International Workshop on Multimedia Signal
Processing, Chania, Crete, October 1-3)
I was the first author of a paper dealing with feature selection (I wrote it at EPFL, Lausanne,
Switzerland). Unfortunately, the date coincided with the beginning of my FNRS grant. One of my
colleagues in Switzerland had the opportunity to go in my place to give the 20-minute oral
presentation (and to enjoy the Greek beaches and sun). Here are the paper details:

       Paper: D2O1.2
     Session: Image & Video I
        Time: Tuesday, October 2, 10:13 - 10:26
Presentation: Lecture
     Authors: Thomas Drugman; Faculte Polytechnique de Mons
              Mihai Gurban; Ecole Polytechnique Federale de Lausanne (EPFL)
              Jean-Philippe Thiran; Ecole Polytechnique Federale de Lausanne (EPFL)
    Abstract: We present a feature selection method based on information theoretic
              measures, targeted at multimodal signal processing, showing how we can
              quantitatively assess the relevance of features from different modalities. We
              are able to find the features with the highest amount of information relevant for
              the recognition task, and at the same time having minimal redundancy. Our
              application is audio-visual speech recognition, and in particular selecting
              relevant visual features. Experimental results show that our method
              outperforms other feature selection algorithms from the literature by improving
              recognition accuracy even with a significantly reduced number of features.
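
The abstract does not give the exact selection criterion, so the following is only a minimal sketch of the general idea (high relevance to the recognition task, low redundancy among the selected features), not the method of the paper. It assumes discrete feature values and a greedy "relevance minus mean redundancy" score; the feature names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum over observed pairs of p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def select_features(features, labels, k):
    """Greedily pick k features by relevance minus mean redundancy.

    `features` maps a (hypothetical) feature name to its list of discrete values.
    """
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        best_name, best_score = None, float("-inf")
        for name, col in features.items():
            if name in selected:
                continue
            redundancy = 0.0
            if selected:
                # Average information shared with the features already chosen
                redundancy = sum(
                    mutual_information(col, features[s]) for s in selected
                ) / len(selected)
            score = relevance[name] - redundancy
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
    return selected


# Toy data: 8 samples, binary class labels (assumed for illustration only).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "mouth_width": [0, 0, 0, 1, 1, 1, 1, 1],       # strongly relevant
    "mouth_width_copy": [0, 0, 0, 1, 1, 1, 1, 1],  # same information, fully redundant
    "mouth_height": [0, 1, 0, 0, 1, 0, 1, 1],      # less relevant, little redundancy
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],             # independent of the labels
}
chosen = select_features(features, labels, 2)
print(chosen)
```

On this toy data the duplicated feature is rejected in the second round because it shares all of its information with the feature already selected, even though its relevance taken alone is the highest remaining.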

Seminar of the Information Technology research center (FPMs, Mons,
Belgium, October 11th)
The Lombard effect: analysis and applications. The Lombard effect refers to the speech changes due
to the immersion of the speaker in a noisy environment. These modifications are observed from an
acoustic, a phonetic and an articulatory point of view. Through hyper-articulation
(unconscious most of the time), the speaker placed in a communicative context aims at maximizing
the intelligibility of his or her utterances. After an analysis of the different changes produced, the hindrances
induced in automatic speech recognition and future potential applications in speech synthesis will be
discussed.

Lecture at the Computational Intelligence and Learning doctoral school
(Louvain-la-Neuve, November 5th)
Masashi Shimbo, professor at the Computational Linguistics Laboratory in the Nara Institute of
Science and Technology (Japan), presented two courses:

Kernels on graph nodes and their application to link analysis: In this elementary tutorial, he
presented an interpretation of Kandola et al.'s von Neumann kernels in the context of link analysis,
with an emphasis on their relationship to the HITS importance ranking method. He then talked about
the effect of 'topic drift', a problem which was first observed with HITS but affects the von Neumann
kernels as well. The properties of the von Neumann kernels were also compared with those of kernels
based on the Laplacian matrix.
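
As an illustration only, here is a small sketch of a von Neumann kernel on a toy link graph. It assumes the usual series form K = B + γB² + γ²B³ + …, with B = AᵀA the co-citation matrix of the adjacency matrix A, which converges when γ is smaller than the inverse of the largest eigenvalue of B. This definition is my own recollection rather than something taken from the lecture material, and the graph, the function names and the value of γ are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(row) for row in zip(*a)]


def von_neumann_kernel(adjacency, gamma, terms=50):
    """Truncated series B + gamma*B^2 + gamma^2*B^3 + ..., with B = A^T A."""
    b = matmul(transpose(adjacency), adjacency)  # co-citation counts
    kernel = [[float(v) for v in row] for row in b]
    term = [row[:] for row in kernel]
    for _ in range(terms - 1):
        # Next term of the series: gamma * (previous term) * B
        term = [[gamma * v for v in row] for row in matmul(term, b)]
        for i, row in enumerate(term):
            for j, v in enumerate(row):
                kernel[i][j] += v
    return kernel


# Toy graph: adjacency[i][j] = 1 if page i links to page j.
adjacency = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
kernel = von_neumann_kernel(adjacency, gamma=0.1)
for row in kernel:
    print(["%.3f" % v for v in row])
```

With γ = 0 the kernel reduces to the plain co-citation matrix. As far as I understand, letting γ grow towards its upper bound makes the induced ranking closer to the HITS authority ranking, which is where topic drift shows up.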

Introduction to conditional random fields and other discriminative sequence labeling methods: In
recent years, the conditional random field (CRF) has become a popular method in natural language
processing. It has not only served as an effective alternative to the hidden Markov model in sequence
labeling problems, but also provides a generic framework that is applicable to a wide range of
problems. This lecture started with a tutorial on the basics of CRFs and of alternative
algorithms that are more lightweight. Some natural language processing tasks to which these
algorithms have been applied were then described.

Tutorial on Quartz Composer and Isadora (FPMs, November 28th and 29th)
Raphaël Sebbe and Celine Mancas-Thillou, who both hold a PhD in Image Processing, presented tutorials on
two well-known visual programming environments.

Quartz Composer is a node-based visual programming language provided as part of the Xcode
development environment in Mac OS X v10.5 "Leopard" for processing and rendering graphical data.

Isadora is a proprietary graphical programming environment for Mac OS X and Microsoft Windows,
with emphasis on real-time manipulation of digital video. It supports Open Sound Control.

Tutorial on Max-MSP (FPMs, December 6th, PM)
Nicolas D’Alessandro, PhD student in Singing Voice Synthesis, presented a tutorial on Max-MSP, a
real-time sound processing program.

Max is a graphical development environment for music and multimedia developed and maintained
by San Francisco-based software company Cycling '74. It has been used for over fifteen years by
composers, performers, software designers, researchers and artists interested in creating interactive
software.

Tutorial on Blender (FPMs, December 19th)
Sebastien Noël, PhD student in Informatics, presented a tutorial on Blender, a powerful 3D
animation program.

Blender is a free software 3D animation program. It can be used for modeling, UV unwrapping,
texturing, rigging, skinning, animating, rendering, particle and other simulating, non-linear editing,
compositing, and creating interactive 3D applications.
