DEVIEW 2019

Since 2008, DEVIEW has been South Korea’s most prominent tech forum on software engineering, a venue where developers and researchers share ideas and find inspiration. Approximately 3,000 local and foreign software developers and tech industry officials participated in this year’s conference. DEVIEW 2019 was held at the COEX Grand Ballroom, Seoul, from October 28 to 29.

President Moon Jae-in attended DEVIEW 2019 as a keynote speaker. His speech carried significance in that he was the first president to attend the event and deliver the keynote address at such a prestigious software engineering conference. During his keynote, President Moon promised significant support for the nation’s artificial intelligence industry. He said that the government had allocated an increased budget of 1.7 trillion South Korean won (1.4 billion USD) for data, networks, and artificial intelligence. Since its inauguration, the government has also established the Presidential Committee on the Fourth Industrial Revolution and offered support in data, networks, and artificial intelligence, the three innovative industries.

President Moon’s keynote speech not only shared prospects for the artificial intelligence industry at the national level but also confirmed that DEVIEW is the nation’s most prestigious conference in software engineering and that NAVER Corporation is the front runner in the South Korean artificial intelligence market.

Read More

NAVER AI Hackathon 2019 #Speech

The third NAVER hackathon concluded on October 27, 2019.

NAVER selected 100 teams through document screening and invited the shortlisted teams to the preliminary round. The second, online round was held on NSML (Sung et al., 2017) from September 16 to October 4, where participants solved speech recognition problems using the Korean phone network database. From October 7 to 10, the top 30 teams moved on to the final online round. The final offline round was held at the Chuncheon Connect One center from October 26 to 27, where the final mission was to build an optimized speech recognition model that balances recognition rate, speed, and model size.

Read More

Subword Language Model for Query Auto-Completion (EMNLP-IJCNLP 2019)

Gyuwan Kim

arXiv Github

Motivations to Faster Neural Query Auto-Completion

When browsing search engines such as NAVER, users type in the information they want to look for. Query auto-completion (QAC) suggests the most likely completion candidates as a user enters the input, and it is one of the essential features of search engines. In this paper, we propose a method to speed up QAC and ultimately improve the user experience.

The selection of an appropriate granularity level for text segmentation has long been studied across a variety of natural language processing problems: character, subword, word, phrase, sentence, and even paragraph. The best-performing granularity generally depends on the task and dataset. Recent neural QAC models rely on a character-level language model because QAC systems need to respond whenever a user inputs a query character by character.

The generation process is autoregressive, and the size of the search space is exponential in the sequence length. Long character sequences therefore make predictions slow and inaccurate under limited computation, and character-level models are susceptible to errors because of long-range dependencies. Given these disadvantages, a need for a shorter query representation arises.
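As a rough illustration of why a shorter representation helps, the toy sketch below uses a hypothetical greedy longest-match segmenter with a made-up subword vocabulary (not the paper’s actual tokenizer): an autoregressive language model takes one decoding step per unit, so subword units require far fewer prediction steps than characters.

```python
# Made-up subword vocabulary for illustration only.
SUBWORD_VOCAB = {"auto", "completion", "query", "sub", "word"}

def segment(text, vocab, max_len=12):
    """Split text into subword units by greedy longest match,
    falling back to single characters when nothing matches."""
    units, i = [], 0
    while i < len(text):
        for l in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + l]
            if piece in vocab or l == 1:
                units.append(piece)
                i += l
                break
    return units

query = "autocompletion"
print(len(query))                          # 14 character-level decoding steps
print(len(segment(query, SUBWORD_VOCAB)))  # 2 subword-level decoding steps
```

The same query that needs 14 character-level predictions is covered by two subword units, which is the intuition behind moving QAC to a subword language model.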

Read More

INTERSPEECH 2019

INTERSPEECH 2019, the world’s most prominent conference on the science and technology of speech processing, features world-class speakers, tutorials, oral and poster sessions, challenges, exhibitions, and satellite events, gathering thousands of attendees from all over the world. This year’s INTERSPEECH conference was held in Graz, Austria, with NAVER-LINE running a corporate exhibition booth as a gold-level sponsor.

Four papers by Clova AI researchers have been accepted at INTERSPEECH 2019. Please refer to the list below for an overview:

Probability density distillation with generative adversarial networks for high-quality parallel waveform generation (Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim)
Parameter enhancement for MELP speech codec in noisy communication environment (Min-Jae Hwang, Hong-Goo Kang)
Who said that?: Audio-visual speaker diarisation of real-world meetings (Joon Son Chung, Bong-Jin Lee, Icksang Han)
My lips are concealed: Audio-visual speech enhancement through obstructions (Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman)
Read More

Mixture Content Selection for Diverse Sequence Generation (EMNLP-IJCNLP 2019)

Jaemin Cho, Minjoon Seo, Hannaneh Hajishirzi

arXiv Github

Seq2Seq is not for One-to-Many Mapping

Comparison between the standard encoder-decoder model and ours

An RNN encoder-decoder (Seq2Seq) model is widely used for sequence generation, in particular machine translation, where neural models are as competent as human translators for some language pairs. Despite such accuracy, a generic encoder-decoder model often performs poorly at generating diverse outputs.

There are two different kinds of relationships between source and target sequences. First, paraphrasing and machine translation tasks involve a one-to-one relationship, as the generated sequence should convey a meaning identical to the source. On the other hand, summarization and question generation tasks have a one-to-many relationship, in that a source sequence can be mapped to diverse sequences by focusing on different contents of the source sequence.

The semantic variance among the targets is generally low in tasks that have a one-to-one relationship. For these tasks, it is reasonable to train a model with maximum likelihood estimation (MLE) and encourage all generated sentences to have a meaning similar to the target sentence. For example, the sentence “What kind of food do you like?” can be translated as follows: “Quel genre de nourriture aimez-vous?” / “Quel genre de plat aimez-vous?” Even though there are two variations in the translation of the source sentence, the meaning is almost identical.

However, maximum likelihood estimation may show unsatisfactory performance on a one-to-many relationship task, where a model should produce outputs with different semantics. Given the following sentence in a question generation task, “Donald Trump (1946 -) is the 45th president of the USA” (target answer: Donald Trump), one can think of questions with a different focus each: “Who was born in 1946?” and “Who is the 45th president of the USA?” Nevertheless, training a sequence generation model with maximum likelihood estimation often causes a degeneration problem (Holtzman et al., 2019; Welleck et al., 2019). For example, a model produces “Who is the the the the the …” because “who,” “is,” and “the” are the most frequent words in the data. This phenomenon happens when a model fails to capture the diversity of the target distribution. As you can see in the figure below, when the target distribution is diverse (has multiple modes), mode collapse can hurt accuracy as well as diversity.

In multi-modal distribution tasks, a one-to-one mapping learned from maximum likelihood estimation may result in suboptimal mapping.
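The loss of modes can be seen even in a toy setting. The sketch below (hypothetical data, with a simple bigram model standing in for a neural language model) fits maximum likelihood counts on two equally likely target questions; greedy argmax decoding then deterministically returns a single sequence, so the other mode is never produced.

```python
from collections import Counter, defaultdict

# Two equally likely target questions for the same source sentence.
targets = [
    "<s> who was born in 1946 ? </s>".split(),
    "<s> who is the 45th president of the usa ? </s>".split(),
]

# MLE for a bigram model reduces to counting transitions.
bigrams = defaultdict(Counter)
for sent in targets:
    for prev, nxt in zip(sent, sent[1:]):
        bigrams[prev][nxt] += 1

def greedy_decode(start="<s>", max_len=20):
    """Always follow the argmax next token, as greedy decoding does."""
    out, tok = [], start
    for _ in range(max_len):
        tok = bigrams[tok].most_common(1)[0][0]
        if tok == "</s>":
            break
        out.append(tok)
    return out

# Greedy decoding is deterministic: it reproduces exactly one of the two
# target questions on every run, and the other mode is never generated.
print(greedy_decode())
```

A real neural model trained with MLE on a multi-modal target distribution behaves analogously under deterministic decoding: probability mass spread over several modes collapses into a single (sometimes degenerate) output.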

To tackle this issue, we reformulate sequence generation in two stages: 1) diverse content selection and 2) focused generation.
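The two stages can be sketched schematically as follows. This is not the paper’s actual architecture: the hard-coded selectors (standing in for learned mixture experts) and the template-based generator are purely illustrative assumptions, meant only to show how different focus selections yield different outputs from one source.

```python
# Source sentence from the question generation example above.
source = "Donald Trump ( 1946 - ) is the 45th president of the USA".split()

# Stage 1: a "mixture" of selectors, each focusing on different content.
# In the actual model these are learned; here they are hard-coded toys.
selectors = [
    lambda toks: [t for t in toks if t == "1946"],                        # focus: birth year
    lambda toks: [t for t in toks if t in ("45th", "president", "USA")],  # focus: office
]

def generate(focus):
    """Stage 2: a toy template 'generator' conditioned only on the focus."""
    if "1946" in focus:
        return "Who was born in 1946?"
    return "Who is the " + " ".join(focus) + "?"

# One source, two focuses, two semantically different questions.
questions = [generate(sel(source)) for sel in selectors]
print(questions)
```

Separating "what to talk about" from "how to say it" lets each generation commit to a single mode, sidestepping the mode collapse that a one-shot MLE-trained decoder suffers.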

Read More