<a href="https://colab.research.google.com/github/wassimchouchen/Alpaca-style-Dataset-Generator/blob/main/Finetune_llama3_1_unsloth_6k_dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**This notebook fine-tunes Llama-3.1-8B with Unsloth and LoRA on a medical psychology dataset built from a corpus of psychology books and articles.**

---



#**1- Install the dependencies**

In [1]:
!pip install torchvision torch=="2.2.1"

Collecting torch==2.2.1
  Downloading torch-2.2.1-cp310-cp310-manylinux1_x86_64.whl.metadata (26 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch==2.2.1)
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch==2.2.1)
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch==2.2.1)
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch==2.2.1)
  Downloading nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.1.3.1 (from torch==2.2.1)
  Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.0.2.54 (from torch==2.2.1)
  Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylin

In [2]:
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps trl peft accelerate bitsandbytes
!pip install "xformers==0.0.25"

Collecting unsloth@ git+https://github.com/unslothai/unsloth.git (from unsloth[colab-new]@ git+https://github.com/unslothai/unsloth.git)
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-55bbno9q/unsloth_669610a3a59d44f6a53a138fbea9bb9c
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-55bbno9q/unsloth_669610a3a59d44f6a53a138fbea9bb9c
  Resolved https://github.com/unslothai/unsloth.git to commit 228b3cf46ec4401b81194267ed0091eb62a56c6b
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting unsloth_zoo>=2024.11.1 (from unsloth@ git+https://github.com/unslothai/unsloth.git->unsloth[colab-new]@ git+https://github.com/unslothai/unsloth.git)
  Downloading unsloth_zoo-2024.11.7-py3-none-any.whl.metadata (16 kB)
Collecting tyro (from unsloth@ git+https://github.com/unslothai/unsloth.gi

Collecting xformers==0.0.25
  Downloading xformers-0.0.25-cp310-cp310-manylinux2014_x86_64.whl.metadata (1.0 kB)
Downloading xformers-0.0.25-cp310-cp310-manylinux2014_x86_64.whl (222.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 222.5/222.5 MB 5.2 MB/s eta 0:00:00
Installing collected packages: xformers
Successfully installed xformers-0.0.25


In [None]:
from unsloth import FastLanguageModel
import json
import torch
from datasets import load_dataset
from huggingface_hub import notebook_login
from trl import SFTTrainer

from transformers import TrainingArguments


In [None]:
# Authenticate interactively; never hard-code an access token in a notebook
notebook_login()
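One way to keep credentials out of the notebook is to read the token from an environment variable and fall back to the interactive prompt when it is missing. The sketch below assumes the token is stored in `HF_TOKEN`; the helper name `read_hf_token` is hypothetical and only illustrates the lookup.

```python
import os

def read_hf_token(env=None, var_name="HF_TOKEN"):
    """Return the token stored in the environment, or None if it is unset or blank."""
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    return token or None

token = read_hf_token()
if token is None:
    # No token in the environment: use the interactive notebook_login() instead
    print("HF_TOKEN is not set; fall back to notebook_login()")
else:
    # A token was found: it could be passed to huggingface_hub.login(token=token)
    print("Found a token in the environment")
```

Keeping the lookup in a small function makes it easy to check the fallback path without touching the real environment.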

#**2- Creating the config**

In [None]:
# Defining the configuration for the base model, LoRA and training
config = {
    "hugging_face_username":"wassimm",
    "model_config": {
        "base_model":"unsloth/Meta-Llama-3.1-8B", # The base model
        "finetuned_model":"Llama-3.1-8B-psycology-w-6k_data", # The fine-tuned model
        "max_seq_length": 100000, # The maximum sequence length
        "dtype":torch.float16, # The data type
        "load_in_4bit": False, # Load the model in 4-bit
    },
    "lora_config": {
      "r": 16, # The LoRA rank (common values: 8, 16, 32, 64)
      "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"], # The target modules
      "lora_alpha":16, # The alpha value for LoRA
      "lora_dropout":0, # The dropout value for LoRA
      "bias":"none", # The bias for LoRA
      "use_gradient_checkpointing":True, # Use gradient checkpointing
      "use_rslora":False, # Use RSLora
      "use_dora":False, # Use DoRa
      "loftq_config":None # The LoFTQ configuration
    },
    "training_dataset":{
        "name":"wassimm/psychology_dataset_10k_with-prompts", # The dataset name(huggingface/datasets)
        "split":"train", # The dataset split
        "input_field":"prompt", # The input field
    },
     "training_config": {
        "per_device_train_batch_size": 2, # The batch size
        "gradient_accumulation_steps": 4, # The gradient accumulation steps
        "warmup_steps": 5, # The warmup steps
        "max_steps":0, # The maximum steps (0 if the epochs are defined)
        "num_train_epochs": 1, # The number of training epochs(0 if the maximum steps are defined)
        "learning_rate": 2e-4, # The learning rate
        "fp16": not torch.cuda.is_bf16_supported(), # The fp16
        "bf16": torch.cuda.is_bf16_supported(), # The bf16
        "logging_steps": 1, # The logging steps
        "optim" :"adamw_8bit", # The optimizer
        "weight_decay" : 0.01,  # The weight decay
        "lr_scheduler_type": "linear", # The learning rate scheduler
        "seed" : 42, # The seed
        "output_dir" : "outputs", # The output directory
    }
}
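As a rough guide to what the batch settings in `training_config` imply, the sketch below computes the effective batch size and an approximate number of optimizer steps per epoch. The dataset size of 6,000 rows is an assumption for illustration (suggested by the notebook's file name, not measured), the helper names are hypothetical, and the exact rounding used by the trainer may differ between `transformers` versions.

```python
import math

def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Number of examples that contribute to one optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices

def approx_steps_per_epoch(num_examples, per_device_batch, grad_accum_steps, num_devices=1):
    """Approximate optimizer steps needed to see every example once."""
    batch = effective_batch_size(per_device_batch, grad_accum_steps, num_devices)
    return math.ceil(num_examples / batch)

# per_device_train_batch_size = 2 and gradient_accumulation_steps = 4 come from the config
batch = effective_batch_size(2, 4)
# 6000 is an assumed dataset size, used only for illustration
steps = approx_steps_per_epoch(6000, 2, 4)
print(f"effective batch size: {batch}, approx. steps per epoch: {steps}")
```

With `logging_steps` set to 1, the trainer would log once per optimizer step, so the step count also approximates the number of log lines per epoch.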

#**3- Loading the model, tokenizer, LoRA (QLoRA) configuration, dataset, and trainer**

Load the base model and attach the LoRA adapters

In [None]:
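def load_base_model():
  """Load the base model and tokenizer, then attach the LoRA adapters."""
  def get_device_map() -> str:
    # Place the model on the GPU when one is available
    return 'cuda' if torch.cuda.is_available() else 'cpu'
  device = get_device_map()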

  model, tokenizer = FastLanguageModel.from_pretrained(
      model_name = config.get("model_config").get("base_model"),
      max_seq_length = config.get("model_config").get("max_seq_length"),
      dtype = config.get("model_config").get("dtype"),
      load_in_4bit = config.get("model_config").get("load_in_4bit"),
      device_map=device,
  )
  model = FastLanguageModel.get_peft_model(
      model,
      r = config.get("lora_config").get("r"),
      target_modules = config.get("lora_config").get("target_modules"),
      lora_alpha = config.get("lora_config").get("lora_alpha"),
      lora_dropout = config.get("lora_config").get("lora_dropout"),
      bias = config.get("lora_config").get("bias"),
      use_gradient_checkpointing = config.get("lora_config").get("use_gradient_checkpointing"),
      random_state = config.get("training_config").get("seed"),
      use_rslora = config.get("lora_config").get("use_rslora"),
      use_dora = config.get("lora_config").get("use_dora"),
      loftq_config = config.get("lora_config").get("loftq_config"),
  )
  return model, tokenizer
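To get a feel for how small the LoRA adapters are, the sketch below counts the parameters that a rank-`r` adapter adds to each targeted projection: an `in x r` matrix plus an `r x out` matrix, so `r * (in + out)` per module. The layer shapes are assumptions taken from the published Llama 3.1 8B architecture (hidden size 4096, MLP size 14336, key/value projection width 1024, 32 layers), not from this notebook, and the helper name is hypothetical.

```python
# Assumed Llama 3.1 8B shapes (from the published model config, not from this notebook)
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024      # 8 key/value heads x 128 head dimension
NUM_LAYERS = 32

# (input features, output features) of each projection targeted by LoRA
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_param_count(r, target_modules, num_layers=NUM_LAYERS):
    """Count trainable LoRA parameters: r * (in + out) per module, per layer."""
    per_layer = sum(
        r * (MODULE_SHAPES[name][0] + MODULE_SHAPES[name][1])
        for name in target_modules
    )
    return per_layer * num_layers

trainable = lora_param_count(16, list(MODULE_SHAPES))
print(f"trainable LoRA parameters at r=16: {trainable:,}")
```

Under these assumptions the adapters amount to roughly half a percent of the 8B base weights, which is why only the adapters need gradients and optimizer state.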

Prepare the trainer, run training, and save the model

In [8]:
def train_model(model,tokenizer):
    #load dataset
    dataset_train = load_dataset(config.get("training_dataset").get("name"), split = config.get("training_dataset").get("split"))
    # Setting up the trainer for the model
    trainer = SFTTrainer(
        model = model,
        tokenizer = tokenizer,
        train_dataset = dataset_train,
        dataset_text_field = config.get("training_dataset").get("input_field"),
        max_seq_length = config.get("model_config").get("max_seq_length"),
        dataset_num_proc = 2,
        packing = False,
        args = TrainingArguments(
            per_device_train_batch_size = config.get("training_config").get("per_device_train_batch_size"),
            gradient_accumulation_steps = config.get("training_config").get("gradient_accumulation_steps"),
            warmup_steps = config.get("training_config").get("warmup_steps"),
            max_steps = config.get("training_config").get("max_steps"),
            num_train_epochs= config.get("training_config").get("num_train_epochs"),
            learning_rate = config.get("training_config").get("learning_rate"),
            fp16 = config.get("training_config").get("fp16"),
            bf16 = config.get("training_config").get("bf16"),
            logging_steps = config.get("training_config").get("logging_steps"),
            optim = config.get("training_config").get("optim"),
            weight_decay = config.get("training_config").get("weight_decay"),
            lr_scheduler_type = config.get("training_config").get("lr_scheduler_type"),
            seed = config.get("training_config").get("seed"),
            output_dir = config.get("training_config").get("output_dir"),
        ),
    )

    # Train the model
    trainer_stats = trainer.train()

    # Save the training stats; TrainOutput is a NamedTuple, so convert it to a dict to keep the field names
    with open("trainer_stats.json", "w") as f:
        json.dump(trainer_stats._asdict(), f, indent=4)

    # Save the LoRA adapters locally (uncomment the next line to also push them to the Hugging Face Hub)
    model.save_pretrained(config.get("model_config").get("finetuned_model"))
    # model.push_to_hub(config.get("model_config").get("finetuned_model"), tokenizer = tokenizer)

    # Merge the adapters into the base model, then save and push the merged 16-bit weights
    model.save_pretrained_merged(config.get("model_config").get("finetuned_model"), tokenizer, save_method = "merged_16bit",)
    model.push_to_hub_merged(config.get("model_config").get("finetuned_model"), tokenizer, save_method = "merged_16bit")

    return model


In [9]:
def main():
  model, tokenizer = load_base_model()
  model = train_model(model, tokenizer)
  return model

In [10]:
main()

==((====))==  Unsloth 2024.11.9: Fast Llama patching. Transformers = 4.46.2.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.2.1+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.25. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors.index.json:   0%|          | 0.00/23.9k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/4 [00:00<?, ?it/s]

model-00001-of-00004.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00002-of-00004.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00003-of-00004.safetensors:   0%|          | 0.00/4.92G [00:00<?, ?B/s]

model-00004-of-00004.safetensors:   0%|          | 0.00/1.17G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

OutOfMemoryError: CUDA out of memory. Tried to allocate 1008.00 MiB. GPU 0 has a total capacity of 14.75 GiB of which 757.06 MiB is free. Process 21868 has 14.01 GiB memory in use. Of the allocated memory 13.89 GiB is allocated by PyTorch, and 17.52 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
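The out-of-memory error is consistent with a simple size check: the 16-bit checkpoint alone is larger than the T4's memory. The sketch below adds up the four shard sizes shown in the download log and compares the total with the 14.75 GiB reported in the error message. It assumes the progress bars report decimal gigabytes, and it treats 4-bit weights as roughly a quarter of the 16-bit size, ignoring quantization constants, activations, and any layers left unquantized. The helper `fits` is hypothetical.

```python
GIB = 1024 ** 3

# Shard sizes from the download log, assumed to be decimal GB
shard_sizes_gb = [4.98, 5.00, 4.92, 1.17]
# Total GPU capacity reported in the error message
gpu_capacity_gib = 14.75

# Size of the 16-bit weights in GiB
fp16_gib = sum(shard_sizes_gb) * 1e9 / GIB
# Rough 4-bit estimate: a quarter of the 16-bit size (overheads ignored)
four_bit_gib = fp16_gib / 4

def fits(required_gib, capacity_gib=gpu_capacity_gib):
    """Return True if the weights alone would fit in GPU memory."""
    return required_gib < capacity_gib

print(f"16-bit weights: {fp16_gib:.2f} GiB, fits: {fits(fp16_gib)}")
print(f"4-bit estimate: {four_bit_gib:.2f} GiB, fits: {fits(four_bit_gib)}")
```

Under these assumptions, setting `load_in_4bit` to `True` and lowering `max_seq_length` from 100000 are the first things to try; whether that is enough for training on a T4 is not verified here.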