wolfecameron/lora_instruction_tune


LoRA Instruction Tuning

This repo contains simple Python code (built on the HuggingFace ecosystem) for instruction tuning common LLMs with LoRA/QLoRA. It includes training code as well as several scripts for evaluating model generations.

Setup | Details | Usage | Future Work

Setup

Install the necessary dependencies as follows:

> conda create -n lora_tuning python=3.11 anaconda
> conda activate lora_tuning
> pip install -r requirements.txt

Details

The repo supports instruction tuning with LoRA and QLoRA, built upon HuggingFace's PEFT library and bitsandbytes. Currently, the example scripts instruction tune the Mistral-7B model, though other models can be specified via the --model_name_or_path argument.
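To make the technique concrete, here is a minimal sketch of the LoRA update itself: rather than fine-tuning a full weight matrix, LoRA learns a low-rank correction B @ A and scales it by alpha / r. All shapes and hyperparameter values below are illustrative, not the ones used by this repo's training code.

```python
import numpy as np

# LoRA replaces a full fine-tune of W with a low-rank update:
#   h = (W + (alpha / r) * B @ A) @ x,  where rank r << d.
d_out, d_in, r, alpha = 64, 64, 8, 16
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))               # trainable, zero init

x = rng.normal(size=(d_in,))
h_base = W @ x
h_lora = (W + (alpha / r) * B @ A) @ x

# Because B starts at zero, the adapted model initially matches the base
# model exactly; training then moves only A and B, not W.
assert np.allclose(h_base, h_lora)
```

QLoRA follows the same recipe but keeps the frozen base weights quantized (via bitsandbytes), so only the small A and B matrices are stored and updated in full precision.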

A breakdown of the main files within the repository is as follows...

File         Description
train.py     Main training code
generate.py  Script for examining model outputs
setup.py     Functions for downloading and configuring models/tokenizers
data.py      Code for configuring datasets
./scripts    Scripts for training/evaluation
./data       Supplemental data files

The training process supports either the Alpaca or Assistant Chatbot dataset. Evaluation is performed using the set of questions proposed for evaluating Vicuna (see here), though model outputs can be generated over arbitrary datasets with the generate.py script. When --report_to wandb is specified in the arguments, the training process logs all metrics to wandb and, at the end of training, also logs model outputs for the Vicuna evaluation set.
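Alpaca-style datasets pair an instruction with a target response, which must be stitched into a single training prompt. The sketch below uses the standard Alpaca template; the exact formatting applied by this repo's data.py may differ.

```python
# Standard Alpaca prompt template (no-input variant). The repo's data.py
# may format examples differently; this is the widely used default.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def format_example(instruction: str, response: str) -> str:
    """Concatenate the templated instruction with the target response."""
    return ALPACA_TEMPLATE.format(instruction=instruction) + response

prompt = format_example("Name the capital of France.", "Paris.")
assert prompt.endswith("### Response:\nParis.")
```

During training, the loss is typically computed only on the response tokens, so the model learns to complete the "### Response:" section rather than to reproduce the instruction.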

Usage

Example scripts are located in the ./scripts folder and can be run as follows:

> bash ./scripts/train.sh
> bash ./scripts/generate.sh

These scripts can also be customized by tweaking their arguments. See args.py for a full list of arguments controlling the model, training, data, and generation.
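As a sketch of how such overrides behave, the snippet below parses two arguments that appear in this README (--model_name_or_path and --report_to) with the standard library's argparse; the actual definitions live in args.py and may use different defaults, and the model name passed here is purely illustrative.

```python
import argparse

# Illustrative parser for two arguments mentioned in this README; the real
# argument definitions (and their defaults) are in the repo's args.py.
parser = argparse.ArgumentParser()
parser.add_argument("--model_name_or_path", default="mistralai/Mistral-7B-v0.1")
parser.add_argument("--report_to", default=None)

# Overriding the default model, as a training script might do.
args = parser.parse_args(["--model_name_or_path", "meta-llama/Llama-2-7b-hf"])
assert args.model_name_or_path == "meta-llama/Llama-2-7b-hf"
assert args.report_to is None  # wandb logging stays off unless requested
```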

Future Work

This repository is intentionally minimal for now. Future efforts will likely include:

  • Expansion to more datasets (for training and evaluation)
  • Implementing an LLM-as-a-judge style evaluation pipeline
  • Adding evaluation on MMLU
  • Trying out LoRA+, which uses different learning rates for the A and B matrices
