
Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?


📃Paper | 💾Dataset

😎: This is the official repository for our study on the co-temporal reasoning capabilities of Large Language Models (LLMs), accepted to the ACL 2024 Main Conference.


We propose CotempQA, the first comprehensive co-temporal question answering (QA) benchmark. It covers four co-temporal scenarios (Equal, Overlap, During, Mix) across 4,748 samples and evaluates the co-temporal comprehension and reasoning abilities of LLMs.

📊 Leaderboard

| Model | Overall | Equal | Overlap | During | Mix |
| --- | --- | --- | --- | --- | --- |
| Human | 92.8 | 97.7 | 92.3 | 84.5 | 92.1 |
| GPT-4 | 54.7 | 92.7 | 59.4 | 50.1 | 45.0 |
| GPT-3.5-Turbo | 38.9 | 62.8 | 44.3 | 37.2 | 23.4 |
| WizardMath-70B | 30.1 | 41.8 | 28.6 | 31.3 | 16.6 |
| LLaMA2-70B | 22.2 | 26.8 | 21.2 | 21.4 | 23.8 |
| CodeLLaMA-34B | 20.0 | 31.3 | 18.4 | 18.3 | 22.4 |
| WizardCoder-34B | 19.2 | 22.9 | 18.8 | 19.9 | 13.4 |
| WizardMath-13B | 14.4 | 26.4 | 13.0 | 14.4 | 6.6 |
| CodeLLaMA-13B | 12.4 | 18.0 | 10.6 | 11.3 | 15.9 |
| LLaMA2-13B | 13.8 | 21.2 | 13.7 | 12.8 | 14.0 |
| WizardCoder-13B | 13.9 | 12.4 | 12.4 | 14.6 | 12.6 |
| WizardMath-7B | 14.8 | 14.4 | 12.2 | 16.0 | 11.2 |
| LLaMA2-7B | 12.0 | 11.5 | 12.1 | 12.0 | 12.0 |
| WizardCoder-7B | 11.2 | 15.1 | 9.8 | 11.1 | 10.5 |
| CodeLLaMA-7B | 10.5 | 17.0 | 8.8 | 9.5 | 13.0 |

⚙️ Installation

To get started, clone this repository and install the required packages:

git clone https://github.com/zhaochen0110/Cotempqa.git
cd Cotempqa
pip install -r requirements.txt

🚧 Data Loading

Download CotempQA from the Hugging Face Hub (the 💾Dataset link above) or load it directly with 🤗 Datasets:

from datasets import load_dataset
dataset = load_dataset("Warrieryes/CotempQA")
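
After loading, a quick sanity check might look like this (a minimal sketch: the split and field names are whatever the Hub version ships with, so inspect them rather than assuming a schema):

from datasets import load_dataset

# Load CotempQA from the Hugging Face Hub (same call as above).
dataset = load_dataset("Warrieryes/CotempQA")

print(dataset)                        # lists the available splits and their sizes
split = next(iter(dataset.values()))  # take the first split, whatever it is named
print(split.column_names)             # the actual field names of each example
print(split[0])                       # inspect one raw example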

💎 Quick Evaluation

Replicate our experimental results by running:

python inference.py --model_name "$model_name" \
    --data_mode "$data_mode" \
    --mode "$mode" \
    --output_path "$output_path" \
    --result_path "$result_path"
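
For example (the values below are illustrative placeholders, not documented options; check inference.py for the choices it actually accepts):

# Hypothetical values for illustration only.
model_name="gpt-3.5-turbo"              # model identifier (assumption)
data_mode="mix"                         # assumption; the paper's scenarios are Equal/Overlap/During/Mix
mode="few_shot"                         # prompting mode (assumption)
output_path="outputs/generations.json"  # where raw model outputs go (assumption)
result_path="results/scores.json"       # where computed metrics go (assumption)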

🏗️ Dataset Construction

Our framework generalizes to other structured temporal databases. To facilitate further research, we provide the full pipeline for generating our dataset, from extracting facts from Wikidata to constructing the final CotempQA examples.

Structuring Temporal Facts

Our extraction pipeline builds on TempLAMA. First, install SLING and download the Wikidata KB, then run the following commands to structure the temporal facts:

bash install.sh <path_to_store_wikipedia_dumps>
bash prepare_data.sh <path_to_store_wikipedia_dumps> <path_to_store_events>

Extracting Co-temporal Facts

We categorize fact pairs into five scenarios according to which elements of the fact triple $(\mathcal{S}, \mathcal{R}, \mathcal{O})$ (subject, relation, object) remain consistent or vary across the pair. Use the following script to extract co-temporal facts:

python construct_comtempqa.py --rawdata_path <rawdata_path>  \
                  --qid_path <qid_path> \
                  --subject_path <subject_path> \
                  --object_path <object_path> \
                  --output_path <output_path>
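
For intuition, the sketch below (ours, not code from this repository) shows how two inclusive time intervals can be bucketed into the Equal, Overlap, and During relations that the benchmark builds on; the Mix scenario combines several relations within one question:

def cotemporal_relation(a_start, a_end, b_start, b_end):
    # Classify the temporal relation between two inclusive intervals.
    if a_end < b_start or b_end < a_start:
        return None  # disjoint spans: the pair is not co-temporal
    if (a_start, a_end) == (b_start, b_end):
        return "Equal"    # both facts hold over exactly the same span
    if (b_start <= a_start and a_end <= b_end) or (a_start <= b_start and b_end <= a_end):
        return "During"   # one span is fully contained in the other
    return "Overlap"      # the spans intersect, but neither contains the other

print(cotemporal_relation(2010, 2015, 2013, 2018))  # Overlap
print(cotemporal_relation(2010, 2015, 2011, 2014))  # During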

QA Pair Construction

Construct QA pairs using the following command:

python add_cotemporal_expression.py
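
As a rough illustration of what this step produces (the facts and template below are hypothetical, not taken from the dataset), two facts whose validity periods intersect are verbalized into one co-temporal question:

# Hypothetical fact triples; the real templates live in add_cotemporal_expression.py.
fact_a = ("Lionel Messi", "played for", "FC Barcelona")
fact_b = ("Pep Guardiola", "head coach of", "FC Barcelona")

# One fact supplies the answer; the other becomes the co-temporal anchor clause.
question = (f"Which team did {fact_a[0]} play for "
            f"while {fact_b[0]} was {fact_b[1]} {fact_b[2]}?")
answer = fact_a[2]
print(question)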

📬 Contact

For any questions or inquiries, feel free to open an issue or contact us at suzhaochen0110@gmail.com.

🤝 Contributing

We welcome contributions to CotempQA! If you have any suggestions or improvements, please open a pull request or contact us directly.

📜 License

This project is licensed under the Apache 2.0 license - see the LICENSE file for details.

Acknowledgements

This project builds on TempLAMA. Special thanks to its authors for their valuable contributions.

Citation

If you find our work useful, please consider citing our paper:

@misc{su2024living,
      title={Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?},
      author={Zhaochen Su and Juntao Li and Jun Zhang and Tong Zhu and Xiaoye Qu and Pan Zhou and Yan Bowen and Yu Cheng and Min Zhang},
      year={2024},
      eprint={2406.09072},
      archivePrefix={arXiv}
}