NAVER AI Hackathon 2019 #Speech

As of October 27, 2019, the third NAVER hackathon has ended.

NAVER selected 100 teams through document screening, and the shortlisted teams were invited to the preliminary round. The second online round was held on NSML (Sung et al., 2017) from September 16 to October 4, during which participants worked on speech recognition problems using the Korean phone network database. From October 7 to 10, the top 30 teams moved on to the final online round. The final offline round was held at the Chuncheon Connect One center from October 26 to 27. The final mission was to build an optimized model, taking into account recognition rate, speed, and model size.

Subword Language Model for Query Auto-Completion (EMNLP-IJCNLP 2019)

Gyuwan Kim

arXiv Github

Motivation for Faster Neural Query Auto-Completion

When using a search engine such as NAVER, users type in what they want to look for. Query auto-completion (QAC) suggests the most likely completion candidates as the user types. It is one of the essential features of a search engine. In this paper, we propose a method to speed up QAC and ultimately improve the user experience.

The choice of an appropriate granularity level for text segmentation has long been studied across a variety of natural language processing problems: character, subword, word, phrase, sentence, and even paragraph. The best-performing granularity generally depends on the task and the dataset. Recent neural QAC models rely on a character-level language model, since a QAC system needs to respond each time the user enters a character.

The generation process is auto-regressive, and the size of the search space is exponential in the sequence length. Long character sequences therefore make predictions slow and inaccurate under a limited computation budget. Character-level models are also prone to errors caused by long-range dependencies. Given these disadvantages, a shorter query representation is needed.
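
As a rough illustration of why a shorter representation helps, the sketch below counts how many auto-regressive decoding steps a query needs at the character level versus a subword level. The vocabulary `toy_vocab`, the greedy longest-match segmenter, and the helper names are all assumptions made for illustration; they are not the segmentation method used in the paper.

```python
def segment_greedy(text, vocab):
    """Split text into the longest pieces found in vocab.

    Falls back to a single character when no longer piece matches,
    so every input can always be segmented.
    """
    max_len = max(len(piece) for piece in vocab)
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in vocab:
                pieces.append(piece)
                i += length
                break
    return pieces


def search_space_size(branching, steps):
    """Number of distinct sequences when each step has `branching` choices."""
    return branching ** steps


query = "weather forecast"

# Hypothetical subword vocabulary, chosen by hand for this example only.
toy_vocab = {"wea", "ther", " fore", "cast"}

char_steps = len(query)                       # one model call per character
subword_pieces = segment_greedy(query, toy_vocab)
subword_steps = len(subword_pieces)           # one model call per subword

print("character steps:", char_steps)
print("subword pieces: ", subword_pieces)
print("subword steps:  ", subword_steps)
```

With this toy vocabulary the same query is generated in far fewer sequential steps. The trade-off is that a subword vocabulary is usually larger than a character set, so each step has more choices; `search_space_size` can be used to compare the two settings for any assumed branching factor.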
