INTERSPEECH 2019, the world’s most prominent conference on the science and technology of speech processing, features world-class speakers, tutorials, oral and poster sessions, challenges, exhibitions, and satellite events, gathering thousands of attendees from all over the world. This year’s Interspeech conference was held in Graz, Austria, with NAVER-LINE running a corporate exhibition booth as a gold-level sponsor.

Four papers by Clova AI researchers were accepted at INTERSPEECH 2019. Please refer to the table below for an overview:

Probability density distillation with generative adversarial networks for high-quality parallel waveform generation (Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim)
Parameter enhancement for MELP speech codec in noisy communication environment (Min-Jae Hwang, Hong-Goo Kang)
Who said that?: Audio-visual speaker diarisation of real-world meetings (Joon Son Chung, Bong-Jin Lee, Icksang Han)
My lips are concealed: Audio-visual speech enhancement through obstructions (Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman)

What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis (ICCV 2019 Oral)

Jeonghun Baek, Geewook Kim, Junyeop Lee, Sungrae Park, Dongyoon Han, Sangdoo Yun, Seong Joon Oh, Hwalsuk Lee

arXiv | GitHub

Motivations for this Research


Examples of regular (IIIT5k, SVT, IC03, IC13) and irregular (IC15, SVTP, CUTE) real-world datasets

Reading text in natural scenes, as shown above, is referred to as scene text recognition (STR) and has become an essential task in many industrial applications. In recent years, researchers have proposed an increasing number of new STR models, each claiming to push the boundary of the technology.

While existing methods have pushed the boundary of the technology, means for holistic and fair comparison have been, by and large, missing in the field because of inconsistent choices of training and evaluation datasets. It is difficult to determine whether, and by how much, a new module improves upon the current art, because the differing training and testing environments make it hard to compare reported numbers at face value.

Problem: Inconsistent Comparison

The table shows the performance of existing STR models along with their inconsistent training and evaluation settings; this inconsistency hinders fair comparison among the methods. We present the results reported in the original papers and also show our reimplemented results under a unified, consistent setting. In the last row, we also show the best model we found, which is competitive with state-of-the-art methods.
The top accuracy for each benchmark is shown in bold.

We examine the different training and evaluation datasets used in prior works and point out the discrepancies. We aim to highlight how each work differs in constructing and using its datasets, and to scrutinize the bias this introduces when comparing performance across works.
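At its core, the unified comparison amounts to scoring every model on the same benchmark splits with the same metric. A minimal sketch of such an evaluation harness is below; the model and benchmark entries are hypothetical placeholders, not the paper's actual code.

```python
# Minimal sketch of a unified evaluation harness: every model is scored
# on identical benchmark splits with the same word-accuracy metric, so
# the resulting numbers are directly comparable. All names are
# illustrative placeholders.

def word_accuracy(predictions, labels):
    """Exact-match word accuracy, in percent."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return 100.0 * correct / len(labels)

def evaluate(models, benchmarks):
    """Run each model on each benchmark and tabulate its accuracy."""
    results = {}
    for name, model in models.items():
        results[name] = {
            bench: word_accuracy([model(img) for img in images], labels)
            for bench, (images, labels) in benchmarks.items()
        }
    return results

# Toy example: one tiny "benchmark" of two images and two mock models.
benchmarks = {"IIIT5k": (["img1", "img2"], ["cat", "dog"])}
models = {
    "model_a": lambda img: {"img1": "cat", "img2": "dog"}[img],
    "model_b": lambda img: {"img1": "cat", "img2": "fox"}[img],
}
print(evaluate(models, benchmarks))
# {'model_a': {'IIIT5k': 100.0}, 'model_b': {'IIIT5k': 50.0}}
```

Because both models see exactly the same images and labels, the 50-point gap is attributable to the models alone, which is the kind of controlled comparison the paper argues for.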
