This repository accompanies the paper "ToMChallenges: A Principle-Guided Dataset and Diverse Evaluation Tasks for Exploring Theory of Mind" by Xiaomeng Ma, Lingyu Gao, and Qihui Xu. The paper was accepted at CoNLL 2023.
In the paper, we generated Theory of Mind tests based on the classic Sally-Anne and Smarties experiments. See the Generate Data notebook for details. The generated data is in the Data folder.
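As a rough illustration of what template-based generation of a Sally-Anne style test can look like, here is a minimal sketch. The template wording, the function name `make_sally_anne_item`, and the field names are all hypothetical and are not taken from the Generate Data notebook, which remains the reference for how the dataset was actually built.

```python
# Hypothetical template for a Sally-Anne style false-belief story.
# The wording used in the actual dataset may differ.
STORY_TEMPLATE = (
    "{a} and {b} are in the {room}. "
    "{a} puts the {obj} in the {c1} and leaves the {room}. "
    "While {a} is away, {b} moves the {obj} to the {c2}. "
    "{a} comes back to the {room}."
)
QUESTION_TEMPLATE = "Where will {a} look for the {obj}?"


def make_sally_anne_item(a, b, room, obj, c1, c2):
    """Fill the templates and return one test item as a dict (illustrative only)."""
    fields = dict(a=a, b=b, room=room, obj=obj, c1=c1, c2=c2)
    return {
        "story": STORY_TEMPLATE.format(**fields),
        "question": QUESTION_TEMPLATE.format(**fields),
        # In the classic false-belief setup, the first agent did not see the move,
        # so the expected answer is the original container.
        "expected_answer": c1,
        # The object's actual location, useful for a reality-check question.
        "actual_location": c2,
    }


item = make_sally_anne_item("Sally", "Anne", "kitchen", "ball", "basket", "box")
print(item["story"])
print(item["question"])
```

Varying the names, objects, and containers passed to such a function is one simple way to produce many surface variants of the same underlying test.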
We used davinci, turbo, and gpt4 to generate answers to these tests. See the Generate Models' Answers notebook for details. The results are in the Results folder.
In the paper, we also proposed an auto-grader for evaluating the models' open-ended generations. See the Auto Grader notebook for details.