An extension of this paper: we fine-tuned `gpt-3.5-turbo-0125` to use a deductive approach when solving math word problems.
Install dependencies with `pip install -r requirements.txt`.
Required environment variables:

- `OPENAI_API_KEY`: API key
- `GPT_MODEL`: ID of the fine-tuned model
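For example (the values below are placeholders, not real credentials or model IDs):

```shell
# Placeholder values -- substitute your own key and fine-tuned model ID
export OPENAI_API_KEY="sk-..."
export GPT_MODEL="ft:gpt-3.5-turbo-0125:..."
```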
Datasets:

- `data/mawps-steps-train.json` and `data/ft-mawps-train.json` (MAWPS training)
- `data/mawps-steps-dev.json` and `data/ft-mawps-dev.json` (MAWPS validation)
- `data/mawps-dev.json` (MAWPS validation)
- `data/svamp.json` (SVAMP validation)
The first two datasets (`mawps-steps`) were used during training and had their numbers masked; they were converted into the `ft-mawps` files used for GPT fine-tuning. The last two datasets did not have numbers masked and were used for the final evaluation.
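Number masking can be sketched as below. This is an illustrative example only; the exact preprocessing (placeholder format, handling of decimals, etc.) in this repo may differ.

```python
import re

def mask_numbers(problem: str) -> tuple[str, list[str]]:
    """Replace each number in a word problem with a placeholder
    (number0, number1, ...) and return the original numbers.

    Illustrative sketch -- the placeholder naming is an assumption.
    """
    numbers: list[str] = []

    def repl(match: re.Match) -> str:
        numbers.append(match.group(0))
        return f"number{len(numbers) - 1}"

    # Match integers and simple decimals
    masked = re.sub(r"\d+(?:\.\d+)?", repl, problem)
    return masked, numbers

masked, nums = mask_numbers("Tom has 3 apples and buys 5 more.")
# masked == "Tom has number0 apples and buys number1 more."
# nums == ["3", "5"]
```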
The baseline model is vanilla `gpt-3.5-turbo-0125`. The metric is the percentage of problems answered correctly.
| | MAWPS | SVAMP |
|---|---|---|
| Baseline | 0.818 | 0.673 |
| Fine-tuned | 0.882 | 0.717 |
Accuracy by number of solution steps on the MAWPS dataset:

| # of Steps | Baseline | Fine-tuned |
|---|---|---|
| 1 | 0.92 | 0.93 |
| 2 | 0.669 | 0.93 |
| 3 | 0.2 | 0.5 |
| 4 | 0 | 0.5 |
Accuracy by number of solution steps on the SVAMP dataset:

| # of Steps | Baseline | Fine-tuned |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 0.73 | 0.73 |
| 2 | 0.46 | 0.67 |
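The overall and per-step-count accuracies above can be computed as in the sketch below. The input format (a list of `(num_steps, is_correct)` pairs) is an assumption for illustration, not the repo's actual evaluation schema.

```python
from collections import defaultdict

def accuracy_by_steps(results):
    """results: list of (num_steps, is_correct) pairs.

    Returns (overall accuracy, {num_steps: accuracy}).
    Illustrative sketch; the real evaluation code may differ.
    """
    by_steps = defaultdict(list)
    for steps, correct in results:
        by_steps[steps].append(bool(correct))

    total = sum(len(v) for v in by_steps.values())
    overall = sum(c for v in by_steps.values() for c in v) / total
    breakdown = {k: sum(v) / len(v) for k, v in sorted(by_steps.items())}
    return overall, breakdown

overall, breakdown = accuracy_by_steps(
    [(1, True), (1, False), (2, True), (2, True)]
)
# overall == 0.75, breakdown == {1: 0.5, 2: 1.0}
```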
- Try testing with the RobustMath dataset
- Possibly build a model from scratch (BERT with a custom architecture on top)
- Look into more search approaches with LLM guidance