GitHub - yale-nlp/KnowledgeMath: Data and Code for the paper "KnowledgeMath: Knowledge-Intensive Math Word Problem Solving in Finance Domains"

KnowledgeMath

The data and code for the paper KnowledgeMath: Knowledge-Intensive Math Word Problem Solving in Finance Domains. This research investigates the table-to-text capabilities of different LLMs using four datasets within two real-world information seeking scenarios. It demonstrates that high-performing LLMs, such as GPT-4, can effectively serve as table-to-text generators, evaluators, and feedback generators. KnowledgeMath is a knowledge-intensive dataset focused on mathematical reasoning within the domain of finance. It requires the model to comprehend specialized financial terminology and to interpret tabular data presented in the questions. KnowledgeMath includes 1200 QA examples across 7 key areas in finance. These examples were collected from financial experts and feature detailed solution annotations in Python format.

KnowledgeMath Dataset

All the data examples were divided into two subsets: validation and test.

validation: 200 examples used for model development, validation, or for those with limited computing resources.
test: 1000 examples for standard evaluation. We will not publicly release the annotated solution and answer for the test set.

You can download this dataset by the following command:

from datasets import load_dataset

dataset = load_dataset("yale-nlp/KnowledgeMath")

# print the first example on the validation set
print(dataset["validation"][0])

# print the first example on the test set
print(dataset["test"][0])

The dataset is provided in json format and contains the following attributes:

{
    "question_id": [string] The question id,
    "question": [string] The question text,
    "tables": [list] List of Markdown-format tables associated with the question, 
    "python_solution": [string] Python-format and executable solution by financial experts. The code is written in a clear and executable format, with well-named variables and a detailed explanation,
    "ground_truth": [float] Executed result of `python solution`, rounded to three decimal places,
    "topic": [string] The related financial area of the question,
    "knowledge_terms": [list] List of knowledge terms in our constructed knowledge bank that is necessary to answer the given question. We will release this feature upon paper publication
}

Experiments

Environment Setup

The code is tested on the following environment:

python 3.11.5
CUDA 12.1, PyTorch 2.1.1
run pip install -r requirements.txt to install all the required packages

LLM Inference on KnowledgeMath

We provide inference scripts for running various LLMs on KnowledgeMath:

scripts/run_openai.sh for running GPT-* models
scripts/run_gemini.sh for running Gemini Pro
scripts/run_vllm.sh for running all other open-sourced LLMs (e.g., Llama-2, Mistral, Falcon) that are reported in the paper and supported by the vLLM framework

Model Output

The Chain-of-Thought (CoT) and Program-of-Thought (PoT) output from various LLMs on both the validation and test sets of KnowledgeMath can be found at the outputs directory.

Automated Evaluation

We develop a heuristic-based method to automatically evaluate the accuracy of CoT and PoT outputs:

scripts/evaluate_cot_output.sh for evaluating CoT outputs
scripts/evaluate_pot_output.sh for evaluating PoT outputs

To get the results on the test set, please send your result json file to this email (see the leaderboard section below for more details).

KnowledgeMath Leaderboard

The leaderboard is continuously being updated. To submit your results to leaderboard, please send your result json file on the test set to this email. Please follow the format of the sample submission file.

Contact

For any issues or questions, kindly email us at: Yilun Zhao (yilun.zhao@yale.edu).

Citation

If you use the KnowledgeMath dataset in your work, please kindly cite the paper:

@misc{zhao2023knowledgemath,
      title={KnowledgeMath: Knowledge-Intensive Math Word Problem Solving in Finance Domains}, 
      author={Yilun Zhao and Hongjun Liu and Yitao Long and Rui Zhang and Chen Zhao and Arman Cohan},
      year={2023},
      eprint={2311.09797},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commits
figures		figures
outputs		outputs
scripts		scripts
utils		utils
.gitignore		.gitignore
README.md		README.md
evaluation.py		evaluation.py
requirements.txt		requirements.txt
run_llm.py		run_llm.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

figures

figures

outputs

outputs

scripts

scripts

utils

utils

.gitignore

.gitignore

README.md

README.md

evaluation.py

evaluation.py

requirements.txt

requirements.txt

run_llm.py

run_llm.py

Repository files navigation

KnowledgeMath

KnowledgeMath Dataset

Experiments

Environment Setup

LLM Inference on KnowledgeMath

Model Output

Automated Evaluation

KnowledgeMath Leaderboard

Contact

Citation

About

Releases

Packages

Languages

yale-nlp/KnowledgeMath

Folders and files

Latest commit

History

Repository files navigation

KnowledgeMath

KnowledgeMath Dataset

Experiments

Environment Setup

LLM Inference on KnowledgeMath

Model Output

Automated Evaluation

KnowledgeMath Leaderboard

Contact

Citation

About

Resources

Stars

Watchers

Forks

Languages