TROVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks 🛠️

Setup

Install the required packages:

pip install -r requirements.txt
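If you prefer an isolated environment, a minimal setup sketch (the virtual-environment step is an assumption, not part of the repository's instructions):

```shell
# Create and activate a virtual environment (optional, assumed layout)
python3 -m venv .venv
source .venv/bin/activate

# Install the project dependencies
pip install -r requirements.txt
```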

Tasks and datasets are organized as follows:

├── MATH
│   ├── algebra
│   ├── counting_and_probability
│   ├── geometry
│   ├── intermediate_algebra
│   ├── number_theory
│   ├── prealgebra
│   └── precalculus
├── TableQA
│   ├── TabMWP
│   ├── WTQ
│   └── HiTab
└── VQA
    └── GQA

Running Experiments

Our Method: TroVE

python run_trove.py --task_name "math/algebra"
  • For MATH tasks, specify the task name as math/${dataset_name}, e.g., math/algebra.
  • For TableQA and VQA tasks, use the dataset name directly: [tabmwp, wtq, hitab, gqa].

Note that the --task_name argument should be lowercase.
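Putting the naming rules together, the task families can be launched like this (a sketch; commands assume you are at the repository root):

```shell
# MATH tasks: prefix the dataset name with math/
python run_trove.py --task_name "math/number_theory"

# TableQA tasks: use the dataset name directly
python run_trove.py --task_name "tabmwp"
python run_trove.py --task_name "wtq"

# VQA task (gqa additionally requires the FastAPI server described below)
python run_trove.py --task_name "gqa"
```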

Baseline Methods: Primitive & Instance

python baseline.py --task_name "math/algebra" --suffix "primitive"  # or "instance"

Note that for the GQA dataset, the locate_objects and visual_qa functions are implemented as FastAPI endpoints. So you need to launch the server first (as below), then run the TroVE/baseline experiments.

uvicorn server.gqa:app
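For example, one way to sequence this is to start the server in the background before running a GQA experiment (a sketch; the shell job control and cleanup are assumptions):

```shell
# Launch the GQA FastAPI server in the background and remember its PID
uvicorn server.gqa:app &
SERVER_PID=$!

# Run a baseline experiment against the running server
python baseline.py --task_name "gqa" --suffix "primitive"

# Shut the server down afterwards
kill $SERVER_PID
```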

Evaluation

python -m utils.eval --results_path ${RESULTS_PATH}
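For instance, assuming the experiment wrote its outputs to results/math_algebra.json (a hypothetical path; substitute your own results file):

```shell
python -m utils.eval --results_path "results/math_algebra.json"
```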

About

[preprint] TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks
