YahooQA splits

Dataset splits for Yahoo Answers used in SIGIR 2017, AAAI 2018 and WSDM 2018 papers. Check the papers below for model comparisons on this dataset.

Original data comes from

Usage of dataset splits

You will find a .pkl file containing a dictionary object. The data is split into train, test and dev, each of which is itself a dictionary of the format train_QA[question] = [[ans1, 0], [ans2, 1], [ans3, 0]], etc.
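A minimal sketch of loading the splits and reading them in Python. This rests on several assumptions that are not confirmed here: the top-level keys are taken to be "train", "dev" and "test", the label 1 is taken to mark a correct answer, and the file name `toy_splits.pkl` plus the helpers `load_splits` and `correct_answers` are hypothetical. Inspect the real dictionary's keys before relying on any of these names. The example writes a tiny stand-in file so that it runs on its own.

```python
import pickle
import tempfile
from pathlib import Path


def load_splits(path):
    """Load the pickled dictionary of splits from `path`."""
    with open(path, "rb") as f:
        return pickle.load(f)


def correct_answers(split):
    """Map each question to the answers whose label is 1.

    Assumption: every candidate is an [answer, label] pair and
    label 1 marks a correct answer.
    """
    return {
        question: [answer for answer, label in candidates if label == 1]
        for question, candidates in split.items()
    }


# Toy stand-in for the real .pkl file; the key names are assumed.
toy = {
    "train": {
        "how do magnets work": [["by magic", 0], ["via magnetic fields", 1]],
    },
    "dev": {},
    "test": {},
}

with tempfile.TemporaryDirectory() as tmp:
    toy_path = Path(tmp) / "toy_splits.pkl"  # hypothetical file name
    with open(toy_path, "wb") as f:
        pickle.dump(toy, f)
    splits = load_splits(toy_path)

train_QA = splits["train"]
train_positives = correct_answers(train_QA)
```

To use it with the released file, pass that file's path to `load_splits` instead of the toy path. Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.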

Please contact me at if there are any issues. If I am not supposed to be publicly releasing my dataset splits, please let me know as well.


If you use our dataset splits, please cite our paper:

Splits for YahooQA dataset used in SIGIR'17 and AAAI'18 paper.



