
Ada-Instruct: Adapting Instruction Generators for Complex Reasoning


This is the repository for Ada-Instruct: Adapting Instruction Generators for Complex Reasoning.

Overview

Ada-Instruct is an adaptive instruction generator developed by fine-tuning open-source LLMs. With a mere 10 samples, Ada-Instruct generates long and high-quality instructions that maintain distributional consistency for complex reasoning tasks.

(Figure: overview of the Ada-Instruct pipeline)

Installation

Create conda environment

conda create -n adainst python=3.10 && conda activate adainst

Install dependencies

pip install -r requirements.txt

How do you obtain massive training samples from only 10 initial samples with Ada-Instruct?

1. Fine-tune an open-source LLM on few-shot initial samples

For Humaneval, we use the following fine-tuning format (Code LLAMA as our base model):

[INST] You are an expert Python programmer, complete the function below based on its docstring and the given test cases:\n{Question}\nYour code should start with a [PYTHON] tag and end with a [/PYTHON] tag. [/INST] [PYTHON]\n# pass\n[/PYTHON]

For MBPP, we use the following fine-tuning format (Code LLAMA as our base model):

[INST] You are an expert Python programmer, and here is your task: {Question}\nYour code should pass these tests:\n\n{Test Cases}\nYour code should start with a [PYTHON] tag and end with a [/PYTHON] tag. [/INST] [PYTHON]\n# pass\n[/PYTHON]

For GSM8k and MATH, we use the following fine-tuning format (LLAMA 2 as our base model):

[INST] You are expert at solving math problems that require multi-step reasoning, and here is your task:\n{Question} [/INST] Let’s think step by step.\n

For CommonsenseQA, we use the following fine-tuning format (LLAMA 2 as our base model):

[INST] You are expert at commonsense reasoning, and here is your task: {Question}\nA. {Text of Label A}\nB. {Text of Label B}\nC. {Text of Label C}\nD. {Text of Label D}\nE. {Text of Label E} [/INST] 

If you'd like to fine-tune on your own task (in the typical case of few-shot initial samples), we generally recommend 40 training epochs and a learning rate of 1e-6. A lower learning rate may suit a harder task, while fewer training epochs may suit an easier task; a minimal sketch of these settings follows.
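As a concrete illustration of Step 1, here is a minimal Python sketch that builds training samples in the HumanEval format shown above and applies the recommended hyperparameters (40 epochs, learning rate 1e-6). The model name, dataset handling, and use of the Hugging Face Trainer are assumptions for illustration; they may differ from the repository's actual fine-tuning code.

# Minimal sketch of Step 1 (an illustration only; the repository's actual fine-tuning script may differ).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

# Fine-tuning format for HumanEval, copied from above; only {Question} is filled in,
# while the answer side stays a literal "# pass" because the model learns to generate instructions.
HUMANEVAL_TEMPLATE = (
    "[INST] You are an expert Python programmer, complete the function below based on its "
    "docstring and the given test cases:\n{Question}\nYour code should start with a [PYTHON] "
    "tag and end with a [/PYTHON] tag. [/INST] [PYTHON]\n# pass\n[/PYTHON]"
)

# Placeholder for the ~10 hand-written initial questions (hypothetical content).
initial_samples = [
    'def add(a: int, b: int) -> int:\n    """Return the sum of a and b."""',
    # ... remaining initial questions
]

texts = [HUMANEVAL_TEMPLATE.format(Question=q) for q in initial_samples]

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-13b-Python-hf")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-13b-Python-hf", torch_dtype=torch.bfloat16
)

encodings = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

class FewShotDataset(torch.utils.data.Dataset):
    def __init__(self, enc):
        self.enc = enc
    def __len__(self):
        return self.enc["input_ids"].shape[0]
    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = item["input_ids"].clone()  # standard causal-LM objective
        return item

args = TrainingArguments(
    output_dir="adainst-humaneval",
    num_train_epochs=40,              # recommended default for few-shot initial samples
    learning_rate=1e-6,               # recommended default; consider lower for harder tasks
    per_device_train_batch_size=1,
)
Trainer(model=model, args=args, train_dataset=FewShotDataset(encodings)).train()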

2. Generate instructions with the fine-tuned open-source LLM

Multi-GPU generation:

accelerate launch run_synthesize_instructions.py \
    --base_model <fine-tuned open-source LLM> \
    --task_name <specific task, currently support "humaneval", "mbpp", "gsm8k", "math", "csqa"> \
    --synthesize_num <how many instructions you desire> \
    --batch_size <batch size per gpu> \
    --out_file <path to the output file, will produce a json file>
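For example, a hypothetical invocation for HumanEval might look like the following (the model path, sample count, batch size, and output path are placeholders):

accelerate launch run_synthesize_instructions.py \
    --base_model ./adainst-humaneval \
    --task_name humaneval \
    --synthesize_num 10000 \
    --batch_size 8 \
    --out_file ./data/humaneval_instructions.json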

If you hit "RuntimeError: probability tensor contains either inf, nan or element < 0" during multi-GPU generation, a workaround is to switch to single-GPU generation:

python run_synthesize_instructions.py \
    --base_model <fine-tuned open-source LLM> \
    --task_name <specific task, currently support "humaneval", "mbpp", "gsm8k", "math", "csqa"> \
    --synthesize_num <how many instructions you desire> \
    --batch_size <batch size per gpu> \
    --out_file <path to the output file, will produce a json file>

3. Generate labels for the instructions

Before label generation, put your OpenAI API keys in the "available" field of "./openai_keys.json".
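As a rough sketch, the file might look like the following, assuming the "available" field is simply a list of API keys (the exact schema is an assumption; check openai_keys.json in the repository for the authoritative layout):

{
    "available": [
        "sk-your-first-key",
        "sk-your-second-key"
    ]
}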

Run the following command for asynchronous label generation:

python run_complete_instructions.py \
    --task_name <specific task, currently support "humaneval", "mbpp", "gsm8k", "math", "csqa"> \
    --in_file <input file, produced in "2. Generate instructions with the fine-tuned open-source LLM"> \
    --out_file <path to the output file, will produce a json file>
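For example, a hypothetical invocation for HumanEval (all file paths are placeholders):

python run_complete_instructions.py \
    --task_name humaneval \
    --in_file ./data/humaneval_instructions.json \
    --out_file ./data/humaneval_labeled.json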

4. (Optional) Verify the generated samples (for HumanEval and MBPP)

For HumanEval and MBPP, we regard generated code that passes its test cases as a correct sample. Run the following command to verify the samples and keep the correct ones:

python run_verification.py \
    --task_name <specific task, currently support "humaneval", "mbpp"> \
    --in_file <input file, produced in "3. Generate labels for the instructions"> \
    --out_file <path to the output file, will produce a json file> \
    --do_task_verification

In our paper, we report results from training on samples that did not go through this verification step.
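Conceptually, verification runs each generated solution against its test cases and keeps the sample only if all tests pass. The sketch below illustrates this idea with a subprocess and a timeout; it is an assumption about the general approach, not the actual implementation in run_verification.py.

# Minimal sketch of test-case verification (an illustration, not the repository's actual code).
import subprocess
import sys
import tempfile

def passes_tests(solution_code: str, test_code: str, timeout_s: float = 10.0) -> bool:
    """Return True if the candidate solution passes all of its test cases."""
    program = solution_code + "\n\n" + test_code  # test_code assumed to contain assert statements
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            timeout=timeout_s,
        )
        return result.returncode == 0  # non-zero exit means a failed assert or an exception
    except subprocess.TimeoutExpired:
        return False

# Keep only the samples whose generated code passes its tests
# (the "code" and "tests" keys are hypothetical field names):
# verified = [s for s in samples if passes_tests(s["code"], s["tests"])]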

Data

We release all the data files (in the data/ada-instruct directory) that we generated through Step 3 and used to train LLMs.

We also release the data files (in the data/alpaca directory) for these tasks, except CommonsenseQA, generated with Self-Instruct (a refined Alpaca version).

Results

Here are the results of Ada-Instruct on standard benchmark datasets. All the results are obtained with 8 A800 GPUs (80GB). For more training and evaluation details, please refer to our paper.

Code Completion (pass@1)

Model                      Params  HumanEval  MBPP
Code LLAMA-Python (base)   13B     43.3       49.0
Ada-Instruct-HumanEval     13B     65.2       -
Ada-Instruct-MBPP          13B     -          55.6

Math (pass@1)

Model                Params  GSM8k  MATH
LLAMA 2 (base)       13B     28.7   3.9
Ada-Instruct-GSM8k   13B     48.7   -
Ada-Instruct-MATH    13B     -      8.8

CommonsenseQA

Model               Params  CommonsenseQA
LLAMA 2 (base)      13B     59.0
Ada-Instruct-CSQA   13B     75.5

Citation

If you find this codebase useful in your research, please cite the following paper.

@article{cui2023ada,
  title={Ada-Instruct: Adapting Instruction Generators for Complex Reasoning},
  author={Cui, Wanyun and Wang, Qianle},
  journal={arXiv preprint arXiv:2310.04484},
  year={2023}
}
