Addressee and Response Selection for Multilingual Conversation
Dialog Dataset: Download
ARS (Addressee and Response Selection)
How to create the corpus
We create the multilingual ARS (M-ARS) corpus following previous work ([Ouchi and Tsuboi, 2016]).
First, we collect raw chat data from the Ubuntu IRC Logs.
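The collection step can be sketched as follows. This is a minimal sketch, not the script used to build M-ARS. It assumes each log line has the form `[HH:MM] <nick> message` and, roughly following the convention of [Ouchi and Tsuboi, 2016], treats an utterance whose first token is the name of another participant as explicitly addressed to that participant. The regex and the helpers `parse_line` and `extract_addressee` are hypothetical.

```python
import re
from typing import NamedTuple, Optional, Set

# Assumed log format (hypothetical): "[HH:MM] <nick> message".
LINE_RE = re.compile(r"^\[(\d{2}:\d{2})\] <([^>]+)> (.*)$")


class Utterance(NamedTuple):
    time: str
    speaker: str
    text: str


def parse_line(line: str) -> Optional[Utterance]:
    """Parse one chat line; return None for join/part notices or malformed lines."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Utterance(*m.groups())


def extract_addressee(utt: Utterance, participants: Set[str]) -> Optional[str]:
    """Return the addressee if the first token names another participant, e.g. 'bob: try ...'."""
    tokens = utt.text.split()
    if not tokens:
        return None
    first = tokens[0].rstrip(":,")
    if first in participants and first != utt.speaker:
        return first
    return None


log = [
    "[10:01] <alice> how do I mount a usb drive?",
    "[10:02] <bob> alice: try sudo mount /dev/sdb1 /mnt",
    "=== carol joined the channel",
]
utts = [u for u in (parse_line(line) for line in log) if u is not None]
speakers = {u.speaker for u in utts}
for u in utts:
    print(u.speaker, "->", extract_addressee(u, speakers))
```

Utterances without an explicit addressee still provide conversational context; only the addressed ones yield addressee labels.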
As preprocessing, we apply word segmentation. For language detection, we use [Nakatani, 2010].
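Language detection can be combined with segmentation by routing each utterance to a language-specific segmenter. The sketch below keeps both pluggable. `detect_lang` stands in for a real detector; for example, the Python `langdetect` package, a port of Nakatani's language-detection library, provides `langdetect.detect(text)`, which returns codes such as `'en'` or `'ja'`. The toy detector and the character-level segmenter below are hypothetical placeholders, not the tools used to build the corpus.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def toy_detect_lang(text: str) -> str:
    """Hypothetical stand-in detector: 'ja' if any kana/kanji is present, else 'en'."""
    for ch in text:
        if "\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff":
            return "ja"
    return "en"


def whitespace_segment(text: str) -> List[str]:
    return text.split()


def char_segment(text: str) -> List[str]:
    """Placeholder for a real Japanese segmenter: one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


def preprocess(utterances: List[str],
               detect_lang: Callable[[str], str],
               segmenters: Dict[str, Callable[[str], List[str]]]) -> Dict[str, List[List[str]]]:
    """Detect each utterance's language, segment it, and group the results by language."""
    by_lang: Dict[str, List[List[str]]] = defaultdict(list)
    for text in utterances:
        lang = detect_lang(text)
        segment = segmenters.get(lang, whitespace_segment)  # fall back to whitespace split
        by_lang[lang].append(segment(text))
    return dict(by_lang)


corpus = preprocess(
    ["how do I mount a usb drive?", "ドライブを接続"],
    detect_lang=toy_detect_lang,
    segmenters={"en": whitespace_segment, "ja": char_segment},
)
print(corpus)
```

Grouping by detected language makes it straightforward to build per-language subsets of the corpus.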
We randomly split the dataset into a training set (90%), a development set (5%), and an evaluation set (5%).
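A reproducible version of the 90/5/5 random split might look like the sketch below. The ratios come from the text. The fixed seed is an assumption, as is the choice of unit: the items are assumed to be whole conversations, so that no conversation spans two splits, although the source does not state which unit is split.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(samples: Sequence[T], seed: int = 0,
                  ratios: Tuple[float, float, float] = (0.90, 0.05, 0.05)
                  ) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle samples with a fixed seed and cut them into train/dev/eval by ratio."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    items = list(samples)
    random.Random(seed).shuffle(items)  # local RNG so the split is reproducible
    n_train = int(len(items) * ratios[0])
    n_dev = int(len(items) * ratios[1])
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    evaluation = items[n_train + n_dev:]  # remainder, so no sample is dropped
    return train, dev, evaluation


train, dev, evaluation = split_dataset(range(1000), seed=42)
print(len(train), len(dev), len(evaluation))  # 900 50 50
```

Assigning the rounding remainder to the evaluation set ensures every item lands in exactly one split.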
Statistics of the M-ARS dataset.