Addressee and Response Selection for Multilingual Conversation

Dialog Dataset: Download

ARS (Addressee and Response Selection)

How to create the corpus

We create a multilingual ARS corpus based on previous work ([Ouchi and Tsuboi, 2016]).
First, we collect raw data from the Ubuntu IRC logs.
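The collection step can be pictured with the minimal Python sketch below. It is an illustration under stated assumptions, not the authors' script. It assumes each chat line looks like `[HH:MM] <nick> message`, which you should verify against the actual logs. It also assumes an addressee is marked by a known speaker's nick at the start of the message, followed by `:` or `,`. The names `parse_line`, `Utterance` and `LINE_RE` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed log line format (hypothetical): "[HH:MM] <nick> message".
LINE_RE = re.compile(r"^\[(\d{2}:\d{2})\] <([^>]+)> (.*)$")
# Assumed addressee marker: "<nick>: ..." or "<nick>, ..." at the start of a message.
HEAD_RE = re.compile(r"^([^\s:,]+)[:,]\s*(.*)$")


class Utterance(NamedTuple):
    time: str
    speaker: str
    addressee: Optional[str]  # None when no known speaker is mentioned up front
    text: str


def parse_line(line: str, known_speakers: set) -> Optional[Utterance]:
    """Parse one chat line; return None for lines that do not match (joins, parts, etc.)."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    time, speaker, text = m.groups()
    addressee = None
    head = HEAD_RE.match(text)
    # Only treat the leading token as an addressee if it is a known speaker's nick.
    if head and head.group(1) in known_speakers:
        addressee, text = head.group(1), head.group(2)
    return Utterance(time, speaker, addressee, text)
```

For example, `parse_line("[12:01] <alice> bob: try sudo apt-get update", {"alice", "bob"})` yields speaker `alice`, addressee `bob` and text `try sudo apt-get update`. Restricting addressees to known speakers keeps tokens such as `http:` from being mistaken for nicks.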
We then apply word segmentation as preprocessing. For language detection, we use [Nakatani, 2010].
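The preprocessing step might be organized as in the sketch below. Each utterance is segmented into words and grouped by its detected language. The segmenter and detector are passed in as functions, because the exact tools are not specified here. The whitespace `str.split` default is only a placeholder assumption, and languages written without spaces would need a real segmenter. The function name `preprocess` is hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def preprocess(
    texts: Iterable[str],
    detect: Callable[[str], str],
    segment: Callable[[str], List[str]] = str.split,  # placeholder segmenter (assumption)
) -> Dict[str, List[List[str]]]:
    """Word-segment each utterance and group the token lists by detected language code."""
    by_lang: Dict[str, List[List[str]]] = defaultdict(list)
    for text in texts:
        tokens = segment(text)
        if not tokens:
            continue  # skip empty utterances
        by_lang[detect(text)].append(tokens)
    return dict(by_lang)
```

One option for `detect` is the `langdetect` Python package, a port of Nakatani's language-detection library. With it you would write `from langdetect import detect` and call `preprocess(texts, detect)`. Setting `langdetect.DetectorFactory.seed = 0` makes its results reproducible across runs.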
Finally, we randomly split the dataset into a training set (90%), a development set (5%), and an evaluation set (5%).
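The random 90/5/5 split can be reproduced roughly as follows. The unit being split (for example whole conversations or individual samples) is not stated above, so `items` is left generic. The function name `split_dataset` and the fixed `seed` are assumptions added for reproducibility.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle items deterministically, then split them into train (90%), dev (5%) and eval (5%)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG: does not touch global random state
    n = len(shuffled)
    n_dev = n * 5 // 100
    n_eval = n * 5 // 100
    n_train = n - n_dev - n_eval  # any rounding remainder goes to the training set
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    evaluation = shuffled[n_train + n_dev:]
    return train, dev, evaluation
```

Splitting whole conversations rather than individual responses would keep turns from one dialog out of multiple sets. Whether the original corpus does this is not stated here.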