
Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?


📃Paper | 💾Dataset

😎: This is the official repository for our study on the co-temporal reasoning capabilities of Large Language Models (LLMs), accepted to the ACL 2024 Main Conference.


We propose CotempQA, the first comprehensive co-temporal question answering (QA) benchmark. It covers four co-temporal scenarios (Equal, Overlap, During, Mix) across 4,748 samples and evaluates the co-temporal comprehension and reasoning abilities of LLMs.

📊 Leaderboard

| Model | Overall | Equal | Overlap | During | Mix |
| --- | --- | --- | --- | --- | --- |
| Human | 92.8 | 97.7 | 92.3 | 84.5 | 92.1 |
| GPT-4 | 54.7 | 92.7 | 59.4 | 50.1 | 45.0 |
| GPT-3.5-Turbo | 38.9 | 62.8 | 44.3 | 37.2 | 23.4 |
| WizardMath-70B | 30.1 | 41.8 | 28.6 | 31.3 | 16.6 |
| LLaMA2-70B | 22.2 | 26.8 | 21.2 | 21.4 | 23.8 |
| CodeLLaMA-34B | 20.0 | 31.3 | 18.4 | 18.3 | 22.4 |
| WizardCoder-34B | 19.2 | 22.9 | 18.8 | 19.9 | 13.4 |
| WizardMath-13B | 14.4 | 26.4 | 13.0 | 14.4 | 6.6 |
| CodeLLaMA-13B | 12.4 | 18.0 | 10.6 | 11.3 | 15.9 |
| LLaMA2-13B | 13.8 | 21.2 | 13.7 | 12.8 | 14.0 |
| WizardCoder-13B | 13.9 | 12.4 | 12.4 | 14.6 | 12.6 |
| WizardMath-7B | 14.8 | 14.4 | 12.2 | 16.0 | 11.2 |
| LLaMA2-7B | 12.0 | 11.5 | 12.1 | 12.0 | 12.0 |
| WizardCoder-7B | 11.2 | 15.1 | 9.8 | 11.1 | 10.5 |
| CodeLLaMA-7B | 10.5 | 17.0 | 8.8 | 9.5 | 13.0 |

⚙️ Installation

To get started, clone this repository and install the required packages:

git clone https://github.com/zhaochen0110/Cotempqa.git
cd Cotempqa
pip install -r requirements.txt

🚧 Data Loading

Download CotempQA from the Hugging Face Hub (the 💾Dataset link above) or load it directly with 🤗 Datasets:

from datasets import load_dataset
dataset = load_dataset("Warrieryes/CotempQA")
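
After loading, a quick sanity check might look like this (a minimal sketch: the split and field names are whatever the Hub version ships with, so inspect them rather than assuming a schema):

from datasets import load_dataset

# Load CotempQA from the Hugging Face Hub (same call as above).
dataset = load_dataset("Warrieryes/CotempQA")

print(dataset)                        # lists the available splits and their sizes
split = next(iter(dataset.values()))  # take the first split, whatever it is named
print(split.column_names)             # the actual field names of each example
print(split[0])                       # inspect one raw example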

💎 Quick Evaluation

Replicate our experimental results by running:

python inference.py --model_name "$model_name" \
    --data_mode "$data_mode" \
    --mode "$mode" \
    --output_path "$output_path" \
    --result_path "$result_path"
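
For example (the values below are illustrative placeholders, not documented options; check inference.py for the choices it actually accepts):

# Hypothetical values for illustration only.
model_name="gpt-3.5-turbo"              # model identifier (assumption)
data_mode="mix"                         # assumption; the paper's scenarios are Equal/Overlap/During/Mix
mode="few_shot"                         # prompting mode (assumption)
output_path="outputs/generations.json"  # where raw model outputs go (assumption)
result_path="results/scores.json"       # where computed metrics go (assumption)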

🏗️ Dataset Construction

Our framework generalizes to other structured temporal databases. To facilitate further research, we provide the full pipeline for generating our dataset, from extracting facts from Wikidata to constructing the final CotempQA examples.

Structuring Temporal Facts

Our extraction pipeline builds on TempLAMA. First, install SLING and download the Wikidata KB, then run the following commands to structure the temporal facts:

bash install.sh <path_to_store_wikipedia_dumps>
bash prepare_data.sh <path_to_store_wikipedia_dumps> <path_to_store_events>

Extracting Co-temporal Facts

We categorize fact pairs into five scenarios according to which elements of the fact triple $(\mathcal{S}, \mathcal{R}, \mathcal{O})$ (subject, relation, object) remain consistent or vary across the pair. Use the following script to extract co-temporal facts:

python construct_comtempqa.py --rawdata_path <rawdata_path>  \
                  --qid_path <qid_path> \
                  --subject_path <subject_path> \
                  --object_path <object_path> \
                  --output_path <output_path>
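
For intuition, the sketch below (ours, not code from this repository) shows how two inclusive time intervals can be bucketed into the Equal, Overlap, and During relations that the benchmark builds on; the Mix scenario combines several relations within one question:

def cotemporal_relation(a_start, a_end, b_start, b_end):
    # Classify the temporal relation between two inclusive intervals.
    if a_end < b_start or b_end < a_start:
        return None  # disjoint spans: the pair is not co-temporal
    if (a_start, a_end) == (b_start, b_end):
        return "Equal"    # both facts hold over exactly the same span
    if (b_start <= a_start and a_end <= b_end) or (a_start <= b_start and b_end <= a_end):
        return "During"   # one span is fully contained in the other
    return "Overlap"      # the spans intersect, but neither contains the other

print(cotemporal_relation(2010, 2015, 2013, 2018))  # Overlap
print(cotemporal_relation(2010, 2015, 2011, 2014))  # During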

QA Pair Construction

Construct QA pairs using the following command:

python add_cotemporal_expression.py
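
As a rough illustration of what this step produces (the facts and template below are hypothetical, not taken from the dataset), two facts whose validity periods intersect are verbalized into one co-temporal question:

# Hypothetical fact triples; the real templates live in add_cotemporal_expression.py.
fact_a = ("Lionel Messi", "played for", "FC Barcelona")
fact_b = ("Pep Guardiola", "head coach of", "FC Barcelona")

# One fact supplies the answer; the other becomes the co-temporal anchor clause.
question = (f"Which team did {fact_a[0]} play for "
            f"while {fact_b[0]} was {fact_b[1]} {fact_b[2]}?")
answer = fact_a[2]
print(question)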

📬 Contact

For any questions or inquiries, feel free to open an issue or contact us at suzhaochen0110@gmail.com.

🤝 Contributing

We welcome contributions to CotempQA! If you have any suggestions or improvements, please open a pull request or contact us directly.

📜 License

This project is licensed under the Apache 2.0 license - see the LICENSE file for details.

Acknowledgements

This project builds on TempLAMA. Special thanks to its authors for their valuable contributions.

Citation

If you find our work useful, please consider citing our paper:

@misc{su2024living,
      title={Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?},
      author={Zhaochen Su and Juntao Li and Jun Zhang and Tong Zhu and Xiaoye Qu and Pan Zhou and Yan Bowen and Yu Cheng and Min Zhang},
      year={2024},
      eprint={2406.09072},
      archivePrefix={arXiv}
}