YahooQA splits

Dataset splits for Yahoo Answers used in SIGIR 2017, AAAI 2018 and WSDM 2018 papers. Check the papers below for model comparisons on this dataset.

Original data comes from

Usage of dataset splits

You will find a .pkl file containing a dictionary object. The data is split into train, test and dev, each of which is itself a dictionary of the format train_QA[question] = [[ans1, 0], [ans2, 1], [ans3, 0]], etc.
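As a rough illustration of the format described above, the following sketch loads a pickled dictionary of splits and flattens one split into (question, answer, label) triples. The file name `toy_splits.pkl`, the top-level keys `"train"`, `"dev"` and `"test"`, the helper names `load_splits` and `iter_pairs`, and the reading of label 1 as "correct answer" are all assumptions made for the example; check them against the actual .pkl file. The toy data is invented so that the snippet runs on its own.

```python
import os
import pickle
import tempfile


def load_splits(path):
    """Load a pickled dictionary of splits from `path` (hypothetical helper)."""
    with open(path, "rb") as f:
        # If the file was pickled under Python 2, passing
        # encoding="latin1" to pickle.load may be necessary.
        return pickle.load(f)


def iter_pairs(split):
    """Yield (question, answer, label) triples from one split dictionary."""
    for question, candidates in split.items():
        for answer, label in candidates:
            yield question, answer, label


# Toy stand-in for the real file; keys and contents are assumptions.
toy = {
    "train": {"q1": [["a1", 0], ["a2", 1], ["a3", 0]]},
    "dev": {"q2": [["b1", 1], ["b2", 0]]},
    "test": {"q3": [["c1", 0], ["c2", 1]]},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_splits.pkl")
    with open(path, "wb") as f:
        pickle.dump(toy, f)
    data = load_splits(path)

train_pairs = list(iter_pairs(data["train"]))
# Assumes label 1 marks the correct answer for a question.
positives = [answer for _, answer, label in train_pairs if label == 1]
print(len(train_pairs), positives)
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.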

Please contact me at if there are any issues. If I am not supposed to be publicly releasing my dataset splits, please let me know as well.


If you use our dataset splits, please cite our paper:

  author    = {Yi Tay and
               Minh C. Phan and
               Anh Tuan Luu and
               Siu Cheung Hui},
  title     = {Learning to Rank Question Answer Pairs with Holographic Dual {LSTM}
               Architecture},
  booktitle = {Proceedings of the 40th International {ACM} {SIGIR} Conference on
               Research and Development in Information Retrieval, Shinjuku, Tokyo,
               Japan, August 7-11, 2017},
  pages     = {695--704},
  year      = {2017},
  crossref  = {DBLP:conf/sigir/2017},
  url       = {},
  doi       = {10.1145/3077136.3080790},
  timestamp = {Sun, 06 Aug 2017 18:21:32 +0200},
  biburl    = {},
  bibsource = {dblp computer science bibliography,}

  author    = {Yi Tay and
               Luu Anh Tuan and
               Siu Cheung Hui},
  title     = {Cross Temporal Recurrent Networks for Ranking Question Answer Pairs},
  journal   = {CoRR},
  volume    = {abs/1711.07656},
  year      = {2017},
  url       = {},
  archivePrefix = {arXiv},
  eprint    = {1711.07656},
  timestamp = {Sun, 03 Dec 2017 12:38:15 +0100},
  biburl    = {},
  bibsource = {dblp computer science bibliography,}

