- Create a conda environment with `python=3.9` and install packages via `pip install -r requirements.txt`.
- Adjust the `batch_sz` for generation here.
- This script launches 16 runs for aime24/25/MATH_hard_test with the proxy-tuned model.
- Run `python -m eval.compute_metrics` to get pass@1 on all finished generations.
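As a rough illustration of what pass@1 over several runs means, the sketch below averages per-question correctness across the numbered run directories. With n runs per question and c of them correct, pass@1 for that question is estimated as c / n, and the reported score is the mean over questions. This is not the code in `eval.compute_metrics`: it assumes, hypothetically, that each line of `predictions.jsonl` is a JSON object with a boolean `correct` field, and the real schema may differ.

```python
import json
from pathlib import Path
from typing import List


def load_correctness(pred_file: Path) -> List[bool]:
    """Read one run's predictions.jsonl.

    Assumes a HYPOTHETICAL boolean 'correct' field on every line.
    """
    with pred_file.open() as f:
        return [bool(json.loads(line)["correct"]) for line in f if line.strip()]


def pass_at_1(runs: List[List[bool]]) -> float:
    """Estimate pass@1 as the mean over questions of (correct runs / total runs)."""
    if not runs:
        raise ValueError("need at least one run")
    # Only score questions that every run has finished.
    n_questions = min(len(r) for r in runs)
    if n_questions == 0:
        raise ValueError("no question is finished in every run")
    n_runs = len(runs)
    per_question = [sum(run[q] for run in runs) / n_runs for q in range(n_questions)]
    return sum(per_question) / n_questions


if __name__ == "__main__":
    result_dir = Path("results/aime2024/dexperts-S1.5B-L32B/constant")
    runs = [load_correctness(p) for p in sorted(result_dir.glob("*/predictions.jsonl"))]
    if runs:
        print(f"pass@1 over {len(runs)} runs: {pass_at_1(runs):.4f}")
```

Restricting to questions finished in every run keeps the average comparable while some runs are still in progress.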
Note on Output and Resuming:
- The model's predictions are saved incrementally to `predictions.jsonl` within the specified `{save_dir}`, after each batch of generations finishes.
- If `predictions.jsonl` already contains N predictions, rerunning the launch command automatically resumes from the (N+1)th question, avoiding redundant computation.
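The save-and-resume behavior described in the note can be sketched as follows. This is a minimal illustration, not the repository's implementation: `run_with_resume` and `generate_batch` are hypothetical names (the latter stands in for the actual model call), and each question is assumed to produce exactly one JSON line in `predictions.jsonl`.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def count_existing(pred_file: Path) -> int:
    """Number of non-empty lines (finished predictions) already on disk."""
    if not pred_file.exists():
        return 0
    with pred_file.open() as f:
        return sum(1 for line in f if line.strip())


def run_with_resume(
    questions: List[str],
    save_dir: Path,
    generate_batch: Callable[[List[str]], List[Dict]],  # HYPOTHETICAL model call
    batch_sz: int = 8,
) -> int:
    """Generate predictions batch by batch, skipping the N already saved.

    Returns N, the number of questions that were skipped.
    """
    save_dir.mkdir(parents=True, exist_ok=True)
    pred_file = save_dir / "predictions.jsonl"
    start = count_existing(pred_file)  # resume at index N, i.e. the (N+1)th question
    for i in range(start, len(questions), batch_sz):
        batch = questions[i : i + batch_sz]
        outputs = generate_batch(batch)
        # Append after every batch so an interruption loses at most one batch.
        with pred_file.open("a") as f:
            for out in outputs:
                f.write(json.dumps(out) + "\n")
    return start
```

Appending one line per question is what makes the line count a reliable resume pointer; a design that rewrote the whole file at the end would lose all progress on a crash.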
The output directory will look like this:
```
results/aime2024/dexperts-S1.5B-L32B/constant/
├── 1
│ ├── example_prompt.txt
│ ├── logits.log
│ ├── metrics.json
│ └── predictions.jsonl
├── 2
│ ├── example_prompt.txt
│ ├── logits.log
│ ├── metrics.json
│ └── predictions.jsonl
...
```
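To check progress across the numbered run directories in this layout, a small helper can report how many predictions each run has saved so far. This is a sketch that relies only on the structure shown (numbered subdirectories, each possibly holding `predictions.jsonl`); the helper name `run_progress` is hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import Dict


def run_progress(result_dir: Path) -> Dict[int, int]:
    """Map each numbered run directory to its count of saved predictions."""
    progress = {}
    for run_dir in result_dir.iterdir():
        if not (run_dir.is_dir() and run_dir.name.isdigit()):
            continue  # skip anything that is not a numbered run folder
        pred_file = run_dir / "predictions.jsonl"
        n = 0
        if pred_file.exists():
            with pred_file.open() as f:
                n = sum(1 for line in f if line.strip())
        progress[int(run_dir.name)] = n
    # Sort numerically so run 10 comes after run 2, not after run 1.
    return dict(sorted(progress.items()))


if __name__ == "__main__":
    root = Path("results/aime2024/dexperts-S1.5B-L32B/constant")
    if root.is_dir():
        for run_id, n in run_progress(root).items():
            print(f"run {run_id}: {n} predictions")
```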