INTERSPEECH 2019

INTERSPEECH 2019, the world’s most prominent conference on the science and technology of speech processing, features world-class speakers, tutorials, oral and poster sessions, challenges, exhibitions, and satellite events, gathering thousands of attendees from all over the world. This year’s conference was held in Graz, Austria, with NAVER-LINE running a corporate exhibition booth as a gold-level sponsor.

Four papers by Clova AI researchers were accepted at INTERSPEECH 2019. Please refer to the table below for an overview:

Paper | Authors
Probability density distillation with generative adversarial networks for high-quality parallel waveform generation | Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim
Parameter enhancement for MELP speech codec in noisy communication environment | Min-Jae Hwang, Hong-Goo Kang
Who said that?: Audio-visual speaker diarisation of real-world meetings | Joon Son Chung, Bong-Jin Lee, Icksang Han
My lips are concealed: Audio-visual speech enhancement through obstructions | Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman

The state-of-the-art technologies developed by Clova AI attracted many attendees. The following are photos of our researchers at their poster presentations.

Ryuichi Yamamoto is passionately explaining his research, Probability density distillation with generative adversarial networks for high-quality parallel waveform generation.

Many attendees are closely following our two researchers’ poster presentation.

Two researchers delivered a poster presentation on Probability density distillation with generative adversarial networks for high-quality parallel waveform generation. This research proposes a new GAN-based distillation method for training a high-quality parallel WaveNet vocoder.
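To give a feel for the idea, here is a minimal, hypothetical sketch of training a parallel student vocoder with a distillation term plus an adversarial term. The `TinyDiscriminator`, the simple L1 stand-in for the distillation objective (the actual method distills the teacher WaveNet’s output density, e.g. via a KL divergence), and the weight `lambda_adv` are illustrative assumptions, not the authors’ implementation.

```python
# Hypothetical sketch: student vocoder loss = distillation term + GAN term.
import torch
import torch.nn as nn

class TinyDiscriminator(nn.Module):
    """Toy waveform discriminator (illustrative stand-in for the paper's)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, padding=4), nn.LeakyReLU(0.2),
            nn.Conv1d(16, 1, kernel_size=9, padding=4),
        )
    def forward(self, x):
        return self.net(x)

def student_loss(student_wave, teacher_wave, disc, lambda_adv=4.0):
    """Combine a distillation-style term with a least-squares GAN term.

    The real method matches the student's output distribution to the
    teacher WaveNet's; a waveform L1 term stands in for that here.
    """
    distill = torch.mean(torch.abs(student_wave - teacher_wave))
    # LSGAN generator loss: push discriminator scores on student output to 1.
    adv = torch.mean((disc(student_wave) - 1.0) ** 2)
    return distill + lambda_adv * adv

# Usage with random tensors standing in for real audio:
disc = TinyDiscriminator()
student = torch.randn(2, 1, 1024, requires_grad=True)  # fake student output
teacher = torch.randn(2, 1, 1024)                      # fake teacher output
loss = student_loss(student, teacher, disc)
loss.backward()
```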

Min-Jae Hwang is delivering a poster presentation on his paper, Parameter enhancement for MELP speech codec in noisy communication environment. His presentation has also drawn a large crowd.

The paper, Parameter enhancement for MELP speech codec in noisy communication environment, proposes a parameter enhancement method that directly enhances the noise-corrupted vocoder parameters of a 2.4 kbit/s mixed excitation linear prediction (MELP) coder, without any pre-processing speech enhancement module. As a result, the proposed method is much simpler and faster than conventional time-frequency (T-F) mask-based speech enhancement methods, while its enhancement quality remains high.
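To make the direct-enhancement setup concrete, the sketch below shows a small regression network that maps noisy per-frame vocoder parameters straight to clean ones, with no spectrogram-masking front-end. The parameter dimension `N_PARAMS`, the architecture, and the MSE objective are assumptions for illustration, not the paper’s configuration.

```python
# Hypothetical sketch: enhance noisy vocoder parameters directly,
# instead of masking the noisy spectrogram before coding.
import torch
import torch.nn as nn

N_PARAMS = 22  # assumed per-frame MELP parameter dimension (illustrative)

enhancer = nn.Sequential(
    nn.Linear(N_PARAMS, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, N_PARAMS),  # predicts clean parameters from noisy ones
)

def train_step(noisy_params, clean_params, optimizer):
    """One regression step: minimize error between enhanced and clean params."""
    optimizer.zero_grad()
    enhanced = enhancer(noisy_params)
    loss = nn.functional.mse_loss(enhanced, clean_params)
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with random frames standing in for real MELP analysis output:
opt = torch.optim.Adam(enhancer.parameters(), lr=1e-3)
noisy = torch.randn(32, N_PARAMS)
clean = torch.randn(32, N_PARAMS)
print(train_step(noisy, clean, opt))
```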

Joon Son Chung is presenting his paper to the audience at the poster session.

A large crowd has gathered to listen to Joon Son Chung’s poster presentation on Who said that?: Audio-visual speaker diarisation of real-world meetings. This research addresses audio-visual speaker diarisation: determining who spoke when in real-world meeting recordings by combining the audio with video of the participants.
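One way to picture the audio-visual matching step is the sketch below: given an embedding of a short audio window and one lip-motion embedding per visible face, the face whose embedding is most similar to the audio is taken as the active speaker. The `active_speaker` helper and the pretrained encoders it presumes are hypothetical stand-ins, not the paper’s pipeline.

```python
# Hypothetical sketch: pick the active speaker by audio-visual similarity.
import torch
import torch.nn.functional as F

def active_speaker(audio_emb, face_embs):
    """Return index of the face whose embedding best matches the audio.

    audio_emb: (D,) embedding of a short audio window.
    face_embs: (num_faces, D) lip-motion embeddings over the same window.
    """
    sims = F.cosine_similarity(face_embs, audio_emb.unsqueeze(0), dim=1)
    return int(torch.argmax(sims)), sims

# Usage with random vectors standing in for real encoder outputs:
audio = torch.randn(256)
faces = torch.randn(3, 256)  # three visible faces in the meeting video
idx, scores = active_speaker(audio, faces)
print(f"active speaker: face {idx}, scores: {scores.tolist()}")
```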

Engineers from LINE also presented their accepted papers. Please refer to the table below for further information.

Paper | Authors
Multichannel Loss Function for Supervised Speech Source Separation by Mask-based Beamforming | Yoshiki Masuyama, Masahito Togami, Tatsuya Komatsu
Variational Bayesian Multi-channel Speech Dereverberation under Noisy Environments with Probabilistic Convolutive Transfer Function | Masahito Togami, Tatsuya Komatsu

Professor Andrew Zisserman (University of Oxford) and Dr. Joon Son Chung (University of Oxford, Clova AI) delivered a presentation on VoxSRC: the VoxCeleb Speaker Recognition Challenge. This workshop, one of the satellite events approved by ISCA, was sponsored by NAVER and LINE.