AI Hub Data Application Competition

Congratulations!

Clova AI’s LaRva team was the first runner-up at the AI Hub Data Application Competition and was awarded a cash prize of 1,500,000 South Korean Won. The team presented how it applied the AI Hub data to enhance the performance of LaRva.

The award ceremony took place at the AI for Society conference on 12th November 2019. Yong-sik Moon, chairman of the National Information Society Agency (NIA), presented the award to Minjeong Kim, who accepted it on behalf of the whole team.

Please see below for details.

Purpose of Development

AI Call is a phone reservation and answering service that talks to customers on behalf of waiters. It aims to help small businesses run more efficiently.

Challenges or Difficulties

The structure of the AI Call system is shown above. When a customer calls a restaurant, a voice recognition system converts the customer’s voice into text. The natural language understanding (NLU) module and the dialog manager (DM) module then process this text and produce an appropriate system response. The response is converted into voice and delivered to the customer over the phone.
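The flow described above can be sketched as a chain of four stages. This is only an illustration: the class name `CallPipeline`, its field names, and the stub stages are hypothetical and do not reflect the actual AI Call implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CallPipeline:
    """Hypothetical wiring of the four stages described in the text."""
    speech_to_text: Callable[[bytes], str]   # voice recognition
    understand: Callable[[str], dict]        # NLU: text -> intent and slots
    decide: Callable[[dict], str]            # DM: NLU result -> response text
    text_to_speech: Callable[[str], bytes]   # speech synthesis

    def handle_turn(self, audio_in: bytes) -> bytes:
        text = self.speech_to_text(audio_in)
        nlu_result = self.understand(text)
        response_text = self.decide(nlu_result)
        return self.text_to_speech(response_text)


# Stub stages so the sketch runs end to end; real components would replace them.
pipeline = CallPipeline(
    speech_to_text=lambda audio: audio.decode("utf-8"),
    understand=lambda text: {
        "intent": "reserve" if "reservation" in text else "other"
    },
    decide=lambda nlu: (
        "How many people will come?"
        if nlu["intent"] == "reserve"
        else "Could you say that again?"
    ),
    text_to_speech=lambda text: text.encode("utf-8"),
)

reply = pipeline.handle_turn(b"I would like to make a reservation")
```

Keeping each stage behind a plain callable means the NLU stage can be swapped (for example, from one language model to another) without touching the rest of the chain.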

In this process, the NLU module carries out two tasks: intent classification and slot extraction. Intent classification identifies the intention behind the customer's utterance. Slot extraction pulls the necessary information, such as a date or a number of people, out of the utterance.
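As a rough illustration of what the two tasks produce, the sketch below pairs an intent label with slots collected from per-token tags. The BIO tagging scheme, the label names (`reserve`, `time`, `date`), and the helper `slots_from_bio` are assumptions made for this example; the source does not state how AI Call encodes intents or slots.

```python
from dataclasses import dataclass, field


@dataclass
class NLUResult:
    intent: str                               # output of intent classification
    slots: dict = field(default_factory=dict)  # output of slot extraction


def slots_from_bio(tokens, tags):
    """Collect slot values from BIO tags (B-x starts a slot, I-x continues it)."""
    slots = {}
    current_name, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_name:
                slots[current_name] = " ".join(current_tokens)
            current_name, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_name == tag[2:]:
            current_tokens.append(token)
        else:
            if current_name:
                slots[current_name] = " ".join(current_tokens)
            current_name, current_tokens = None, []
    if current_name:
        slots[current_name] = " ".join(current_tokens)
    return slots


# Hypothetical model output for "please make it 14:00 26th".
tokens = ["please", "make", "it", "14:00", "26th"]
tags = ["O", "O", "O", "B-time", "B-date"]
result = NLUResult(intent="reserve", slots=slots_from_bio(tokens, tags))
```

In a real system the intent label and the tags would both come from the fine-tuned language model; here they are written by hand to show the shape of the result.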

The preliminary model used LaRva (Language Representation of Clova AI)[1] to perform these tasks. LaRva is a large-scale pre-trained language model specialized in processing Korean. However, this model was not optimal for handling conversations with many turns, for two reasons. Firstly, the model uses spoken Korean data. Secondly, it is trained on two-sentence segments as input, as BERT[2] is.

We therefore developed Dialog-LaRva (D-LaRva), which uses conversational Korean data together with model inputs and training methods tailored to dialogue structure.

Types of AI Hub Data Used

We first had to secure conversation data, and found that AI Hub provides a variety of conversational data. The AI Hub conversation corpus covers dialogues that can arise in many domains, including restaurants, cafes, accommodations, and retail stores. We therefore used it to train D-LaRva, as we judged it suitable for pre-training a conversational language model.

Practical Ways to Develop AI Services

I would like to make a reservation for next Wednesday, 26th.

You can make a reservation for that date.

Then, please make it 14:00 26th.

How many people will come?

8 people in total

Would you like to make a reservation for the meal, as well?

Any recommendations, please?

Course A is the most popular.

Yes, then that one, please.

OK, I will send you the reservation confirmation text message shortly.

May I take a look at the folded item?

Yes, you may.

Can I machine wash this sweater?

No, you should hand wash that item.

Do you have this black sweater in a small size?

No, we don’t have black ones. But we have blue ones.

Then may I try the blue sweater in a small size?

Yes, the changing room is farther inside.

There is a snag in this sweater. I would like to try a different one.

Yes, I will get you the new one right away.

We preprocessed the downloaded data as shown in the photo above, gathered approximately 40,000 speech sets, and created the model input by adding other conversational data to these sets. We then trained D-LaRva using N-gram masking and sentence order prediction. N-gram masking masks several consecutive subwords and trains the model to predict them. Sentence order prediction trains the model to tell whether a conversation sequence is in forward or backward order.
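The two training objectives can be sketched as follows. This is a minimal illustration under stated assumptions, not the team's implementation: the function names, the `[MASK]` token, the choice of a single masked span per example, and treating "backward" as a reversed list of turns are all assumptions made here.

```python
import random


def ngram_mask(subwords, n=2, mask_token="[MASK]", rng=None):
    """Mask one run of n consecutive subwords; return (masked input, targets)."""
    rng = rng or random.Random()
    n = min(n, len(subwords))
    start = rng.randrange(len(subwords) - n + 1)
    masked = list(subwords)
    targets = {}  # position -> original subword the model must predict
    for i in range(start, start + n):
        targets[i] = masked[i]
        masked[i] = mask_token
    return masked, targets


def sentence_order_example(turns, rng=None):
    """Return (turns, label): label 1 = forward order, 0 = backward order."""
    rng = rng or random.Random()
    if rng.random() < 0.5:
        return list(turns), 1
    return list(reversed(turns)), 0


# Example with hand-written subwords and two turns from the dialogue above.
subwords = ["make", "a", "reserv", "##ation", "please"]
masked, targets = ngram_mask(subwords, n=2, rng=random.Random(0))

turns = ["How many people will come?", "8 people in total"]
ordered, label = sentence_order_example(turns, rng=random.Random(1))
```

Masking a contiguous span rather than isolated subwords forces the model to recover a whole word or phrase from context, while the order label gives it a signal about how turns follow one another in a conversation.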

Performance

Relative increase or decrease rates of performance
(based on the self-established evaluation of the dialog data)

As the diagram above shows, D-LaRva trained on the AI Hub data performed better than its predecessor, LaRva, and Google Multilingual BERT. Interestingly, the gain in intent classification was far larger than the gain in slot extraction. We assume that intent classification, which unlike slot extraction is generally affected by earlier turns in the conversation, benefited more from D-LaRva, which had learned the conversational flow.

Plans on Future Improvement

We developed the current version of AI Call as a project targeting one specific small business. To fine-tune the intent classification task, we currently use only internally gathered data. When we expand to other domains, we plan to use the intent data provided by AI Hub. Furthermore, we expect the model to perform far better if we also use the intent data collected for AI Starthon[3].

References

  1. https://tv.naver.com/v/11212559
  2. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
  3. https://github.com/ai-starthon/AI_Starthon2019