The Yahoo Answers dataset is obtained from Zhang et al. (2015). It is a topic classification task with 10 classes: Society & Culture, Science & Mathematics, Health, Education & Reference, Computers & Internet, Sports, Business & Finance, Entertainment & Music, Family & Relationships, and Politics & Government. Each document we use includes the question title, the question context, and the best answer.
https://s3.amazonaws.com/fast-ai-nlp/yahoo_answers_csv.tg
We split the raw data into training, development, and test sets:
- Training set: 1,260,000
- Development set: 140,000
- Test set: 60,000
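
The text above does not say how the development set was carved out. One plausible reading is that 140,000 examples were held out from a 1,400,000-example training file (1,260,000 + 140,000), with the test set left untouched. The following is a minimal sketch of such a hold-out split under that assumption; the function name `split_train_dev`, the seed, and the use of a random shuffle are illustrative choices, not details confirmed by the source. A toy list of 1,400 items stands in for the real rows.

```python
import random


def split_train_dev(rows, dev_size, seed=0):
    """Hold out `dev_size` randomly chosen rows as a development set.

    Hypothetical helper: the shuffle-based strategy and the seed are
    assumptions, not the documented procedure for this dataset.
    """
    if dev_size > len(rows):
        raise ValueError("dev_size cannot exceed the number of rows")
    rng = random.Random(seed)  # local RNG so the split is reproducible
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    dev_idx = set(indices[:dev_size])
    # Preserve the original row order inside each split.
    train = [row for i, row in enumerate(rows) if i not in dev_idx]
    dev = [row for i, row in enumerate(rows) if i in dev_idx]
    return train, dev


# Toy stand-in for the real data, scaled down by a factor of 1,000.
rows = list(range(1400))
train, dev = split_train_dev(rows, dev_size=140)
print(len(train), len(dev))
```

With the real data, `rows` would be the parsed training examples and `dev_size` would be 140,000. Fixing the seed keeps the split identical across runs, which matters when results on the development set are compared between experiments.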