Dataset for "Does It Make Sense? And Why? A Pilot Study for Sense Making and Explanation", accepted by ACL 2019.


This is the dataset for "Does It Make Sense? And Why? A Pilot Study for Sense Making and Explanation", which has been accepted by ACL 2019.

You can download the paper through .

We call this dataset Sen-Making.

Data Example

Example Picture

Data Format

JSON Sample

```json
{
  "id": "1",
  "sentence0": "he put an elephant into the fridge",
  "sentence1": "he put a turkey into the fridge",
  "false": 0,
  "A": "an elephant is much bigger than a fridge",
  "B": "elephants are usually gray while fridges are usually white",
  "C": "an elephant cannot eat a fridge",
  "reason": "A"
}
```

We will hold a shared task at SemEval-2020 based on this study, which will provide a training dataset for these tasks.

You can check task 4 - Commonsense Validation and Explanation in

Looking forward to your participation!


If you find our work helpful, you can cite:

```bibtex
@inproceedings{wang-etal-2019-make,
    title = "Does it Make Sense? And Why? A Pilot Study for Sense Making and Explanation",
    author = "Wang, Cunxiang  and
      Liang, Shuailong  and
      Zhang, Yue  and
      Li, Xiaonan  and
      Gao, Tian",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "",
    pages = "4020--4026",
    abstract = "Introducing common sense to natural language understanding systems has received increasing research attention. It remains a fundamental question on how to evaluate whether a system has the sense-making capability. Existing benchmarks measure common sense knowledge indirectly or without reasoning. In this paper, we release a benchmark to directly test whether a system can differentiate natural language statements that make sense from those that do not make sense. In addition, a system is asked to identify the most crucial reason why a statement does not make sense. We evaluate models trained over large-scale language modeling tasks as well as human performance, showing that there are different challenges for system sense-making.",
}
```