# Finetune LLMs

In this notebook, we show how to fine-tune your own large language model on Google Colab.

We go through three main sections:
1. Install the library
2. Fine-tune the model (using our `TextClassificationTransformersFinetuneEngine`)
3. Evaluate the fine-tuned model

## Installation

1. Clone the repository:
    ```
    git clone https://github.com/tigerrag/tiger.git
    ``` 
    
2. Upload it to Google Drive. Make sure the `tiger` repository is in the root directory of your Google Drive.

3. Select a GPU runtime in Colab. A "T4 GPU" with 15 GB of GPU RAM should be sufficient for the sample datasets.
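
To confirm that the runtime actually has a GPU attached before starting, a quick check along these lines can help. This is a minimal sketch that assumes the `nvidia-smi` tool is on the PATH whenever a GPU runtime is active; the helper name `describe_gpu` is ours and is not part of TigerTune.

```python
import shutil
import subprocess


def describe_gpu():
    """Return the GPU name and total memory reported by nvidia-smi, or None."""
    if shutil.which("nvidia-smi") is None:
        # No NVIDIA tools on PATH: most likely a CPU-only runtime.
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


gpu = describe_gpu()
if gpu:
    print("GPU detected:", gpu)
else:
    print("No GPU detected. In Colab, pick a GPU under Runtime > Change runtime type.")
```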

## Run Finetuning

The fine-tuned model is saved to Google Drive automatically, in a directory called `exp_finetune_classification`. You can change the location by setting the `model_output_path` parameter.

In [None]:
# Mount Google Drive.
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Point default path to TigerTune in Google Drive
import sys
sys.path.append('/content/drive/MyDrive/tiger/TigerTune')
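
If the TigerTune import fails later, the usual cause is that the repository is not where the path above expects it. The following sketch checks the repository directory and the dataset paths used in this notebook before any training starts. The helper `missing_paths` is a name introduced here for illustration.

```python
import os

TIGERTUNE_DIR = "/content/drive/MyDrive/tiger/TigerTune"


def missing_paths(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]


# Paths this notebook relies on (taken from the cells in this notebook).
expected = [
    TIGERTUNE_DIR,
    os.path.join(TIGERTUNE_DIR, "datasets/classification/training"),
    os.path.join(TIGERTUNE_DIR, "datasets/classification/validation"),
    os.path.join(TIGERTUNE_DIR, "datasets/classification/test_dataset.csv"),
]

missing = missing_paths(expected)
if missing:
    print("Missing paths (check that Drive is mounted and the repo is in its root):")
    for path in missing:
        print("  ", path)
else:
    print("All expected paths found.")
```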

In [None]:
# Install required packages
!pip install -q  accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7 pyyaml h5py scikit-plot
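
Colab images ship with their own versions of some of these libraries, so it can be worth confirming that the pinned versions are the ones actually installed. A minimal sketch using the standard library's `importlib.metadata` follows; the helper `version_mismatches` is a hypothetical name, and the pins are copied from the install command above.

```python
from importlib import metadata

# Version pins copied from the pip command in this notebook.
PINNED = {
    "accelerate": "0.21.0",
    "peft": "0.4.0",
    "bitsandbytes": "0.40.2",
    "transformers": "4.31.0",
    "trl": "0.4.7",
}


def version_mismatches(pins):
    """Map each package whose installed version differs from its pin to the
    version found (None if the package is not installed)."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            problems[name] = found
    return problems


for name, found in version_mismatches(PINNED).items():
    print(f"{name}: expected {PINNED[name]}, found {found}")
```

If anything is printed, restarting the runtime after the install usually makes the newly installed versions take effect.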

In [None]:
# Import TigerTune
from tigertune.finetuning import TextClassificationTransformersFinetuneEngine

In [10]:
# Create an instance of TextClassificationTransformersFinetuneEngine. 
finetune_engine = TextClassificationTransformersFinetuneEngine(
    base_model_id="distilbert-base-uncased",
    notebook_mode=True
)

In [None]:
# Start fine tuning. 
finetune_engine.finetune(
    training_dataset="/content/drive/MyDrive/tiger/TigerTune/datasets/classification/training",
    validation_dataset="/content/drive/MyDrive/tiger/TigerTune/datasets/classification/validation",
    model_output_path="/content/drive/MyDrive/exp_finetune_classification/model")

## Evaluate Finetuned Model

In this section, we evaluate the fine-tuned model on a held-out test dataset.

We show that fine-tuning on a classification dataset significantly improves on the open-source base model.

In [None]:
finetune_engine.evaluate(
    eval_dataset="/content/drive/MyDrive/tiger/TigerTune/datasets/classification/test_dataset.csv",
    eval_output_path="/content/drive/MyDrive/exp_finetune_classification/eval_result")
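
Once evaluation finishes, you may want to see what was written to Drive. The sketch below assumes the engine writes its results as one or more files at or under `eval_output_path`; the exact file names and formats are not specified here, so the code only lists whatever it finds. The helper `list_outputs` is introduced for illustration.

```python
import os

EVAL_OUTPUT_PATH = "/content/drive/MyDrive/exp_finetune_classification/eval_result"


def list_outputs(path):
    """Return sorted (relative_path, size_in_bytes) pairs for every file at or under path."""
    if os.path.isfile(path):
        return [(os.path.basename(path), os.path.getsize(path))]
    found = []
    # os.walk yields nothing if the path does not exist, so the result is then empty.
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            found.append((os.path.relpath(full, path), os.path.getsize(full)))
    return sorted(found)


outputs = list_outputs(EVAL_OUTPUT_PATH)
if not outputs:
    print("No evaluation output found at", EVAL_OUTPUT_PATH)
for rel_path, size in outputs:
    print(f"{size:>10} bytes  {rel_path}")
```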