INTERSPEECH 2019, the world’s most prominent conference on the science and technology of speech processing, features world-class speakers, tutorials, oral and poster sessions, challenges, exhibitions, and satellite events, gathering thousands of attendees from all over the world. This year’s Interspeech conference was held in Graz, Austria, with NAVER-LINE running a corporate exhibition booth as a gold-level sponsor.
Four papers by Clova AI researchers have been accepted at INTERSPEECH 2019. Please refer to the table below for an overview:
| Title | Authors |
|---|---|
| Probability density distillation with generative adversarial networks for high-quality parallel waveform generation | Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim |
| Parameter enhancement for MELP speech codec in noisy communication environment | Min-Jae Hwang, Hong-Goo Kang |
| Who said that?: Audio-visual speaker diarisation of real-world meetings | Joon Son Chung, Bong-Jin Lee, Icksang Han |
| My lips are concealed: Audio-visual speech enhancement through obstructions | Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman |
Jeonghun Baek, Geewook Kim, Junyeop Lee, Sungrae Park, Dongyoon Han, Sangdoo Yun, Seong Joon Oh, Hwalsuk Lee
Motivations for this Research
Examples of regular (IIIT5k, SVT, IC03, IC13) and irregular (IC15, SVTP, CUTE) real-world datasets
Reading text in natural scenes, as shown above, is referred to as scene text recognition (STR) and has been an essential task in many industrial applications. In recent years, researchers have proposed an increasing number of new STR models, each claiming to have pushed the boundary of the technology.
While existing methods have pushed the boundary of the technology, the means for holistic and fair comparison have been largely missing in the field because of inconsistent choices of training and evaluation datasets. It is difficult to determine whether, and by how much, a new module improves upon the current art, because differing assessment and testing environments make it hard to compare reported numbers at face value.
Problem: Inconsistent Comparison
The table exhibits the performance of existing STR models along with their inconsistent training and evaluation settings. This inconsistency hinders fair comparison among these methods. We present the results reported by the original papers and also show our reimplemented results in a unified, consistent setting. In the last row, we also show the best model we have found, which is competitive with state-of-the-art methods.
The top accuracy for each benchmark is shown in bold.
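At its core, the unified comparison above amounts to scoring every model on the same test sets with the same label normalization. A minimal sketch of such a word-accuracy computation (all function names here are hypothetical, and the case-insensitive alphanumeric normalization is a common STR convention, not necessarily the paper's exact protocol):

```python
import re

def normalize(text):
    # Case-insensitive, alphanumeric-only comparison -- a common STR
    # benchmark convention (an assumption, not the paper's exact rule).
    return re.sub(r"[^0-9a-z]", "", text.lower())

def word_accuracy(predictions, ground_truths):
    """Fraction of images whose predicted word exactly matches the label."""
    correct = sum(
        normalize(p) == normalize(g)
        for p, g in zip(predictions, ground_truths)
    )
    return correct / len(ground_truths)

def compare_models(models, benchmarks):
    """Evaluate every model on the *same* benchmark splits.

    models:     {model_name: predict_fn(image) -> text}
    benchmarks: {benchmark_name: (images, labels)}
    """
    return {
        model_name: {
            bench_name: word_accuracy([predict(img) for img in images], labels)
            for bench_name, (images, labels) in benchmarks.items()
        }
        for model_name, predict in models.items()
    }
```

Fixing the benchmark splits and the normalization rule in one place, as sketched here, is what makes the accuracy numbers of different models directly comparable.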
We examine the different training and evaluation datasets used by prior works and point out the discrepancies. We aim to highlight how each work differs in constructing and using its datasets, and to scrutinize the bias introduced when comparing performance across different works.