AI Hub Data Application Competition


Clova AI’s LaRva team was the first runner-up at the AI Hub Data Application Competition and was awarded a cash prize of 1,500,000 South Korean won. The team shared how it applied AI Hub data to enhance the performance of LaRva.

The award ceremony took place at the AI for Society conference on 12 November 2019. Yong-sik Moon, chairman of the National Information Society Agency (NIA), presented the award to Minjeong Kim, who accepted it on behalf of the whole team.

Please see below for details.

Purpose of Development

AI Call is a phone reservation and answering service that communicates with customers on behalf of restaurant staff. By handling such calls automatically, AI Call aims to help small businesses manage their operations more effectively.

Challenges or Difficulties

The AI Call system is structured as shown above. When a customer calls a restaurant, a speech recognition system converts the customer’s voice into text. The natural language understanding (NLU) module and the dialog manager (DM) module then turn the transcribed text into an appropriate system response. The response is converted back into voice and delivered to the customer over the telephone.
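The call flow described above can be sketched as a chain of components. This is a minimal illustrative sketch, not Clova's actual API: every function name here (`recognize_speech`, `nlu`, `dialog_manager`, `synthesize_speech`) is a hypothetical stub standing in for the real module.

```python
# Hedged sketch of the AI Call pipeline: ASR -> NLU -> DM -> TTS.
# All components are stubs; in the real system each is a trained model.

def recognize_speech(audio: bytes) -> str:
    """Speech recognition: convert the caller's voice into text (stubbed)."""
    return "I would like to make a reservation for next Wednesday."

def nlu(text: str) -> dict:
    """NLU: classify the intent and extract slots from the utterance (stubbed)."""
    return {"intent": "make_reservation", "slots": {"date": "next Wednesday"}}

def dialog_manager(parsed: dict) -> str:
    """DM: choose the next system response given the parsed utterance (stubbed)."""
    if parsed["intent"] == "make_reservation":
        return "You can make a reservation for that date."
    return "Could you say that again, please?"

def synthesize_speech(text: str) -> bytes:
    """TTS: convert the response text back into voice (stubbed as raw bytes)."""
    return text.encode("utf-8")

def handle_call_turn(audio: bytes) -> bytes:
    """One turn of the call: voice in, voice out."""
    text = recognize_speech(audio)
    response = dialog_manager(nlu(text))
    return synthesize_speech(response)

print(handle_call_turn(b"...").decode("utf-8"))
# prints: You can make a reservation for that date.
```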

In this process, the NLU module carries out two tasks: intent classification and slot extraction. Intent classification determines the intention behind the customer’s utterance, while slot extraction pulls the specific pieces of information the system needs out of that utterance.
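To make the two tasks concrete, here is a toy illustration using simple keyword and regex rules; the real system uses a trained model, and the intent names and slot keys below are invented for the example.

```python
# Toy NLU: intent classification and slot extraction via keyword rules.
# A real system replaces both functions with a trained classifier/tagger.
import re

def classify_intent(utterance: str) -> str:
    """Classify the speaker's intention behind the utterance."""
    text = utterance.lower()
    if "reservation" in text:
        return "make_reservation"
    if "recommend" in text:
        return "ask_recommendation"
    return "other"

def extract_slots(utterance: str) -> dict:
    """Extract the pieces of information the system needs (time, party size)."""
    slots = {}
    m = re.search(r"\b(\d{1,2}:\d{2})\b", utterance)
    if m:
        slots["time"] = m.group(1)
    m = re.search(r"(\d+) people", utterance)
    if m:
        slots["party_size"] = int(m.group(1))
    return slots

utt = "Please make a reservation for 8 people at 14:00."
print(classify_intent(utt))  # make_reservation
print(extract_slots(utt))    # {'time': '14:00', 'party_size': 8}
```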

The preliminary model used LaRva (Language Representation of Clova AI) [1] to perform these tasks. LaRva is a large-scale pre-trained language model specialized for Korean. However, it was not optimal for handling conversations spanning many turns, for two reasons. First, the model was trained on written rather than spoken Korean. Second, it receives two-sentence segments as input, as BERT [2] does, rather than whole dialogues.

To address this, we developed Dialog-LaRva (D-LaRva), which is trained on conversational Korean data with model inputs and training methods tailored to dialogue structure.

Types of AI Hub Data Used

We first had to secure conversation data, and found that AI Hub provides a variety of conversational datasets. The AI Hub conversation corpus covers dialogues from many domains, including restaurants, cafes, accommodations, and retail stores. Since this made it well suited for pre-training a conversational language model, we used it to train D-LaRva.

Practical Ways to Develop AI Services

Restaurant reservation dialogue:

Customer: I would like to make a reservation for next Wednesday, 26th.
Staff: You can make a reservation for that date.
Customer: Then, please make it 14:00 26th.
Staff: How many people will come?
Customer: 8 people in total.
Staff: Would you like to make a reservation for the meal, as well?
Customer: Any recommendations, please?
Staff: Course A is the most popular.
Customer: Yes, then that one, please.
Staff: Ok, I will send you the reservation confirmation text message shortly.

Retail store dialogue:

Customer: May I take a look at the folded item?
Clerk: Yes, you may.
Customer: Can I machine wash this sweater?
Clerk: No, you should hand wash that item.
Customer: Do you have this black sweater in a small size?
Clerk: No, we don’t have black ones. But we have blue ones.
Customer: Then may I try the blue sweater in a small size?
Clerk: Yes, the changing room is farther inside.
Customer: There is a snag in this sweater. I would like to try a different one.
Clerk: Yes, I will get you the new one right away.

We preprocessed the downloaded data as shown in the photo above, gathered approximately 40,000 dialogue sets, and created model inputs by concatenating other conversational data onto the sets. We then trained D-LaRva with two objectives: N-gram masking and sentence order prediction. N-gram masking masks several consecutive subwords and trains the model to predict them. Sentence order prediction trains the model to tell whether the turns of a conversation appear in forward or backward order.
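The two objectives above can be sketched in a few lines. This is a simplified assumption of how such training examples are constructed, not the exact D-LaRva recipe; the tokenization, masking rate, and sampling details are illustrative.

```python
# Minimal sketch of the two pre-training objectives: N-gram masking
# (mask consecutive subwords for the model to predict) and sentence
# order prediction (is the dialogue in forward or backward order?).
import random

MASK = "[MASK]"

def ngram_mask(subwords, max_n=3, mask_prob=0.15, seed=0):
    """Mask runs of up to max_n consecutive subwords; labels hold the originals."""
    rng = random.Random(seed)
    tokens, labels = list(subwords), [None] * len(subwords)
    i = 0
    while i < len(tokens):
        if rng.random() < mask_prob:
            n = rng.randint(1, max_n)  # length of the masked n-gram
            for j in range(i, min(i + n, len(tokens))):
                labels[j] = tokens[j]
                tokens[j] = MASK
            i += n
        else:
            i += 1
    return tokens, labels

def sentence_order_example(turns, swap):
    """Build a sentence-order-prediction example:
    label 1 = turns in original (forward) order, 0 = reversed."""
    return (list(reversed(turns)), 0) if swap else (list(turns), 1)

turns = ["I would like to make a reservation.", "How many people will come?"]
print(sentence_order_example(turns, swap=True))
print(ngram_mask("res ##er ##va ##tion for next wed ##nes ##day".split()))
```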


Relative increase or decrease in performance
(based on our in-house evaluation of the dialog data)

As the diagram above shows, D-LaRva trained on the AI Hub data outperformed both its predecessor, LaRva, and Google’s multilingual BERT. Interestingly, the gain on intent classification was far larger than that on slot extraction. We assume that intent classification, which depends heavily on the flow of past conversation turns, benefited more from D-LaRva’s learned conversational structure, whereas slot extraction depends mostly on the current utterance.

Plans on Future Improvement

We developed the current version of AI Call targeting one specific small business, so we fine-tune the intent classification task using only internally gathered data. As we expand to more domains, we plan to use the intent data provided by AI Hub. Furthermore, we expect the model to perform even better if we also use the intent data collected for AI Starthon [3].
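Fine-tuning for intent classification typically means training a small classification head on top of a pre-trained encoder. The sketch below is a hypothetical illustration under that assumption: `encode` is a toy stand-in for a D-LaRva-style encoder, and the intents and training data are invented for the example.

```python
# Hedged sketch of fine-tuning an intent classifier on top of a frozen
# encoder: one logistic-regression head per intent, trained on labeled
# utterances. The encoder below is a toy stub, not the real model.
import math

def encode(utterance: str) -> list:
    """Stand-in encoder: a tiny fixed-size feature vector per utterance."""
    return [len(utterance) / 50.0,
            float("reservation" in utterance),
            float("recommend" in utterance)]

def train_intent_head(examples, epochs=200, lr=0.5):
    """Train one logistic-regression head (weights + bias) per intent."""
    intents = sorted({y for _, y in examples})
    w = {y: [0.0, 0.0, 0.0, 0.0] for y in intents}  # 3 features + bias
    for _ in range(epochs):
        for x, y in examples:
            feats = encode(x) + [1.0]  # append bias term
            for intent in intents:
                z = sum(a * b for a, b in zip(w[intent], feats))
                p = 1.0 / (1.0 + math.exp(-z))
                target = 1.0 if intent == y else 0.0
                for k in range(len(feats)):  # gradient step
                    w[intent][k] += lr * (target - p) * feats[k]
    return w, intents

def predict(w, intents, utterance):
    """Return the intent whose head scores the utterance highest."""
    feats = encode(utterance) + [1.0]
    scores = {i: sum(a * b for a, b in zip(w[i], feats)) for i in intents}
    return max(scores, key=scores.get)

data = [("I would like to make a reservation.", "make_reservation"),
        ("Any recommendations, please?", "ask_recommendation"),
        ("Please make it 14:00.", "make_reservation"),
        ("What do you recommend?", "ask_recommendation")]
w, intents = train_intent_head(data)
print(predict(w, intents, "Can I book a reservation for Friday?"))
```

Expanding to a new domain, as described above, amounts to adding that domain's labeled utterances to `data` and retraining the head, which is far cheaper than re-training the encoder.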


  2. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.

NAVER AI Burning Day

NAVER’s AI technologies, including speech recognition, vision, OCR, and face recognition, have been continuously evolving. Now, what kinds of services and value can developers and engineers create by combining these technologies with novel ideas?

Announcing a new hackathon on AI and web development, NAVER would like to offer applicants the opportunity to build web services and applications on its AI API platform. NAVER’s researchers and engineers will also provide inspirational mentoring during the three-day, two-night final round.

Read More


EMNLP-IJCNLP 2019

Organized by the Association for Computational Linguistics’ special interest group on linguistic data (SIGDAT), the Conference on Empirical Methods in Natural Language Processing (EMNLP) is a prominent conference in natural language processing. EMNLP-IJCNLP 2019, held jointly with the 9th International Joint Conference on Natural Language Processing, took place in Hong Kong from November 3 to 7, 2019. NAVER Corporation attended the conference as a gold-level sponsor.

The following papers were accepted at EMNLP-IJCNLP 2019.

Subword Language Model for Query Auto-Completion
Gyuwan Kim
A subword language model for query auto-completion (QAC) that decodes faster while maintaining accuracy close to a character-level language model baseline, together with a new evaluation metric for QAC. (arXiv)

NL2pSQL: Generating Pseudo-SQL Queries from Under-specified Natural Language Questions
Fuxiang Chen, Seung-won Hwang, Jaegul Choo, Jung-Woo Ha, Sung Kim
A new dataset and pseudo-SQL formulation that handles under-specified questions and applies to more general situations than existing NL2SQL approaches. (Paper)

Mixture Content Selection for Diverse Sequence Generation
Jaemin Cho*, Minjoon Seo, Hannaneh Hajishirzi
A new mixture-based model for generating diverse queries and summaries. (arXiv)
Read More

ICCV 2019

Clova AI had a strong research showing at ICCV 2019: four of its papers were accepted, two of which were selected for oral presentations: CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features, and What is Wrong with Scene Text Recognition Model Comparisons? Dataset and Model Analysis.

CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features (Oral)
Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, YoungJoon Yoo
A new data augmentation technique that enhances backbone network performance. (arXiv, GitHub, Blog)

What is Wrong with Scene Text Recognition Model Comparisons? Dataset and Model Analysis (Oral)
Jeonghun Baek, Geewook Kim, Junyeop Lee, Sungrae Park, Dongyoon Han, Sangdoo Yun, Seong Joon Oh, Hwalsuk Lee
An overhaul of OCR experiment protocols and benchmarks, and a proposed framework for measuring performance and analyzing existing STR models. (arXiv, GitHub)

Photorealistic Style Transfer via Wavelet Transforms
JaeJun Yoo, YoungJung Uh, Sanghyuk Chun, Byungkyu Kang, Jung-Woo Ha
An end-to-end wavelet-pooling approach to photorealistic style transfer that requires no post-processing. (arXiv, GitHub, Blog)

A Comprehensive Overhaul of Feature Distillation
Byeongho Heo, Jeesoo Kim, Sangdoo Yun, Hyojin Park, Nojun Kwak, Jin Young Choi
A new knowledge distillation technique that outperforms existing state-of-the-art methods on ImageNet-1k. (arXiv, GitHub, Blog)
Read More