My research activities mainly deal with the modeling, analysis, and synthesis of sounds. In the near future, I would like to focus on the analysis, transformation, and synthesis of auditory scenes. More precisely, I aim to identify the individual sound entities (sources) that are perceived in a binaural mix. Starting from the auditory scene composed of these entities, I would like to allow the musical manipulation of each individual entity and then resynthesize a different, transformed scene, with transformations closely linked to perception. This is what we will call active listening (see below), which is a generalization of the research area known as Computational Auditory Scene Analysis (CASA).
I'm interested in spectral models, and more precisely in sinusoidal modeling. These models rely on strong mathematical and physical bases, are closely linked to perception, are well suited to musical transformations, and give rise to numerous interesting research topics in computer science. The basic structure of these models is the partial, a pseudo-sinusoidal oscillator whose parameters (mainly frequency and amplitude) evolve slowly with time. We have proposed (together with Myriam Desainte-Catherine) the Structured Additive Synthesis (SAS) model, which places constraints on these parameters in order to allow the individual modification of musical parameters such as pitch, loudness, or duration. This model also served as a starting point for the scientific study of timbre. Modeling the evolution of the spectral parameters using, again, a spectral model leads to hierarchical spectral modeling, which allows enhanced control of the sound. Nowadays, most sound models are hybrid models, where the noisy (stochastic) part is separated from the sinusoidal (deterministic) part. We now wish to propose a new, more flexible spectral model in which these two parts are no longer separated.
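As a rough illustration of the partial described above, the following sketch sums oscillators whose frequency and amplitude are given as functions of time, with the phase obtained by accumulating the instantaneous frequency. The function name `synthesize`, the callable-based representation of a partial, and the example values are assumptions made for illustration; they are not taken from the SAS model itself.

```python
import math

def synthesize(partials, fs, count):
    """Sum a set of partials into `count` samples at sampling rate `fs` (Hz).

    `partials` is a list of (freq_fn, amp_fn) pairs (hypothetical format):
    each function maps a time in seconds to a frequency in Hz or to a
    linear amplitude.
    """
    out = [0.0] * count
    for freq_fn, amp_fn in partials:
        phase = 0.0  # running phase in radians
        for n in range(count):
            t = n / fs
            out[n] += amp_fn(t) * math.sin(phase)
            # The phase is the running integral of the instantaneous frequency.
            phase += 2.0 * math.pi * freq_fn(t) / fs
    return out

# Usage: a 440 Hz partial with a slow glide, plus a decaying second partial.
sound = synthesize(
    [
        (lambda t: 440.0 + 20.0 * t, lambda t: 0.5),
        (lambda t: 880.0, lambda t: 0.25 * math.exp(-3.0 * t)),
    ],
    fs=44100.0,
    count=4410,
)
```

Because each partial keeps its own frequency and amplitude functions, scaling all frequencies changes the pitch and scaling all amplitudes changes the loudness without touching the other parameters.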
Besides the model itself, one also needs an accurate analysis method to obtain the model parameters from real sounds, as well as an efficient synthesis method to compute the digital sound from its model, possibly in real time.
The precision of the analysis method is extremely important, since it is the main factor in the perceived quality of the resulting sounds. Although numerous methods have been proposed, most of them turned out to be unsuitable for practical use, since we typically need a real-time analysis method with sufficient precision under realistic hypotheses (unknown number of sinusoidal components, presence of noise, etc.). These methods were tested using the InSpect software. During my Ph.D., I proposed a new analysis method, which extends classic Fourier analysis by also considering the first derivatives of the sound signal. This method greatly enhances the frequency precision of the classic short-time Fourier transform and allows the use of smaller analysis windows, thus also increasing the temporal resolution. Thanks to this method, we have recently shown that many of the most widely used analysis methods are in fact equivalent in theory, and very close to optimal in practice. These spectral analysis methods were designed for the stationary case. We can now generalize them to the non-stationary case, where the parameters can evolve even within the analysis window.
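The idea of combining the spectrum of the signal with the spectrum of its derivative can be sketched as follows. This is only one possible discrete-time reading of the approach, under stated assumptions: the derivative is approximated by a first difference, whose gain at angular frequency w is 2 sin(w/2), so the ratio of the two spectral magnitudes at a peak is mapped back to a frequency with an arcsine. The names `dft_bin` and `estimate_frequency`, the Hann window, and the single-sinusoid test signal are illustrative choices.

```python
import cmath
import math

def dft_bin(x, k):
    """Return bin k of the discrete Fourier transform of the sequence x."""
    n = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))

def estimate_frequency(signal, fs):
    """Estimate the frequency (Hz) of the strongest sinusoid in `signal`."""
    n = len(signal) - 1  # one sample is consumed by the first difference
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]
    s = [w * signal[i + 1] for i, w in enumerate(window)]
    d = [w * (signal[i + 1] - signal[i]) for i, w in enumerate(window)]
    # Locate the largest magnitude peak, ignoring the DC bin.
    mags = [abs(dft_bin(s, k)) for k in range(1, n // 2)]
    peak = 1 + max(range(len(mags)), key=mags.__getitem__)
    # The first difference scales a sinusoid by 2*sin(w/2); invert that gain.
    ratio = abs(dft_bin(d, peak)) / mags[peak - 1]
    return fs / math.pi * math.asin(min(1.0, ratio / 2.0))

# Usage: a 440.3 Hz sinusoid analysed with a 256-sample window at 8 kHz,
# where the plain DFT bin spacing would be 31.25 Hz.
fs = 8000.0
tone = [math.sin(2.0 * math.pi * 440.3 * i / fs) for i in range(257)]
estimate = estimate_frequency(tone, fs)
```

The estimate is not tied to the bin grid, which is why a short window can still give a fine frequency reading in this sketch.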
The weak point of the analysis chain is now the tracking of the trajectories (in frequency and amplitude) of the partials. We have shown (together with Mathieu Lagrange and Martin Raspaud) that linear prediction can be used to improve partial tracking algorithms. Since these algorithms still need to be improved, we plan to emphasize partial tracking in future work.
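To make the role of linear prediction concrete, here is a minimal sketch in which the past frequencies of a partial are used to predict its next value, and the spectral peak closest to that prediction is chosen as the continuation. It assumes a second-order predictor fitted by ordinary least squares on the mean-removed trajectory; the estimator, the model order, and the names `predict_next` and `pick_peak` are assumptions for illustration and may differ from the method actually used in our work.

```python
import math

def predict_next(history):
    """Predict the next value of a non-empty trajectory with order-2 linear prediction."""
    mean = sum(history) / len(history)
    x = [v - mean for v in history]
    r11 = r12 = r22 = p1 = p2 = 0.0
    # Accumulate the normal equations for x[n] ~ c1*x[n-1] + c2*x[n-2].
    for n in range(2, len(x)):
        r11 += x[n - 1] * x[n - 1]
        r12 += x[n - 1] * x[n - 2]
        r22 += x[n - 2] * x[n - 2]
        p1 += x[n] * x[n - 1]
        p2 += x[n] * x[n - 2]
    det = r11 * r22 - r12 * r12
    if len(x) < 4 or abs(det) < 1e-12:
        return history[-1]  # too little information: repeat the last value
    c1 = (p1 * r22 - p2 * r12) / det
    c2 = (p2 * r11 - p1 * r12) / det
    return mean + c1 * x[-1] + c2 * x[-2]

def pick_peak(history, candidates):
    """Choose the candidate frequency closest to the predicted continuation."""
    target = predict_next(history)
    return min(candidates, key=lambda f: abs(f - target))

# Usage: a partial with vibrato (440 Hz, 5 Hz deviation, period of 16 frames).
track = [440.0 + 5.0 * math.sin(2.0 * math.pi * (n + 3) / 16.0) for n in range(32)]
chosen = pick_peak(track, [438.0, 443.5, 444.6])
```

In this example the candidate nearest to the last tracked value is 443.5 Hz, whereas the predictor follows the vibrato and selects 444.6 Hz, which illustrates why prediction can help when trajectories are modulated.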
In the synthesis stage, spectral modeling requires the computation of a large number of sinusoidal oscillations. The challenge is then to design an algorithm that generates the sequence of samples of each oscillator with as few instructions as possible. We have developed a synthesis algorithm whose complexity is close to optimal. This algorithm is based on the incremental computation of the sine function. In order to further speed up the additive synthesis process, we have successfully studied (during the Master's thesis of Mathieu Lagrange) the possibility of reducing on the fly the number of partials to be synthesized, by taking advantage of psychoacoustic phenomena such as masking and of efficient data structures such as skip lists. We now have at our disposal the fastest linear (1 oscillator for 1 partial) additive synthesis technique. We then investigated (during the Master's thesis of Matthias Robine) the use of non-linear techniques to speed up the synthesis even further. Although these non-linear techniques are extremely efficient, they lack flexibility and proved unsuitable for our purposes. Finally, we proposed (together with Robert Strandh and Matthias Robine) an original fast additive synthesis method based on a unique polynomial generator in association with a priority queue, an algorithm that is particularly efficient for low-frequency partials. The combination of all these methods (incremental computation of the sine function for the high frequencies, polynomial generation for the low frequencies, together with psychoacoustic considerations) gives the most efficient additive synthesis method, currently being implemented at the SCRIME in the ReSpect software library. From now on, we have chosen to emphasize improving the synthesis quality. We have studied (in collaboration with LORIA – Nancy, ICP/INPG – Grenoble, and IRCAM – Paris) several polynomial phase models for the oscillators.
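One classic way to compute a sine incrementally is the second-order digital resonator, which produces each new sample from the two previous ones with a single multiplication and a single subtraction. The sketch below shows that recurrence for one oscillator with fixed parameters; treating it as the incremental scheme mentioned above is an assumption, and the function name `resonator_sine` and its parameters are illustrative.

```python
import math

def resonator_sine(freq, amp, phase, fs, count):
    """Generate `count` samples of amp*sin(phase + 2*pi*freq*n/fs) incrementally."""
    omega = 2.0 * math.pi * freq / fs
    coeff = 2.0 * math.cos(omega)           # computed once per oscillator
    previous = amp * math.sin(phase - omega)  # sample at n = -1
    current = amp * math.sin(phase)           # sample at n = 0
    out = []
    for _ in range(count):
        out.append(current)
        # sin(a + w) = 2*cos(w)*sin(a) - sin(a - w): no sine call in the loop.
        previous, current = current, coeff * current - previous
    return out

# Usage: 1000 samples of a 440 Hz oscillator at 44.1 kHz.
samples = resonator_sine(440.0, 0.8, 0.3, 44100.0, 1000)
```

Rounding errors accumulate over long runs with this recurrence, so in practice the state would be re-initialized whenever the parameters of the partial are updated.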
The applications of spectral modeling are numerous, both artistic and scientific. For example, the SAS model favors the unification of the representation of sound and music at a sub-symbolic level, which is particularly useful for computer music. This model can ease musical composition, audio effects, and musical transformations. We have added (during the Ph.D. of Joan Mouba) support for real-time spatialization, which places the listener in a virtual three-dimensional acoustic space. The SAS model has been used for creative purposes by composers of electro-acoustic music (Jean-Michel Rivet, György Kurtag). It has even been used in the context of early-learning musical activities for young children and in musical pedagogy. We have used (together with Mathieu Lagrange and France Telecom R&D) sinusoidal modeling for the coding and compression of sounds, in order to allow their low-bitrate transmission and storage. Sound restoration, which consists in recovering missing information within the structure of an altered sound, was also studied with success. We are also interested in sound indexing and classification, for example to allow easy and efficient search within large audio databases. More recently, we have shown (together with Laurent Girin) the suitability of sinusoidal modeling for audio watermarking, that is, for hiding inaudible information within sounds, which is extremely useful for audio copyright management, for example.
Sound source separation is a major scientific and technical issue. We aim to isolate each sound source from a complex monophonic or stereophonic sound (such as those stored on standard Compact Discs), by using the spectral structure of this sound (Ph.D. of Mathieu Lagrange, Master's thesis of Grégory Cartier) and the spatial location of the sound sources (Master's thesis of Nicolas Sarramagna, Ph.D. of Joan Mouba), always taking advantage of acoustic and/or psychoacoustic laws. This is one of our major research directions for the future.
Active listening: Nowadays, the listener is considered as a receptor who passively listens to the audio signal stored on various media (CD, DVD audio, etc.). The only modification that is easy to perform is basically a change of volume. By observing the practice of the composers of electroacoustic music within the SCRIME, we realized that it was essential for them to be able to interact with the original recorded sound while it is playing. We thus propose to enable any listener to adopt an active listening behavior, by giving him/her the freedom to interact with the sound in real time during its diffusion, instead of listening to it in the usual, passive way, with only very limited controls such as volume changes. We propose to offer the listener the possibility to also change the spatial locations of the individual sound sources, their respective volumes, pitches, and even timbres, as well as their durations and the rhythm of the music.