<a href="https://colab.research.google.com/github/thibaud-perrin/paramete-efficient-finetuning/blob/main/notebooks/load_lora_adapter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-Tuning and Using LoRA Adapters for OPT Models

This notebook demonstrates how to load and use LoRA (Low-Rank Adaptation) adapters with OPT language models. Using a fine-tuned adapter published on the Hugging Face Hub, it shows how to:

1. Load a pre-trained OPT model and attach a LoRA adapter to it.
2. Load the base model in 8-bit with `bitsandbytes` to reduce memory use.
3. Tokenize input text and generate outputs with the adapted model.

It is a starting point for exploring parameter-efficient fine-tuning of large language models: the adapter is a small set of extra weights, so the large base model can be shared across tasks and loaded in reduced precision.
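To make the efficiency of LoRA concrete, the sketch below counts the weights LoRA adds for a single linear layer. LoRA freezes a weight matrix of shape `d_out x d_in` and learns two small factors, `B` (`d_out x r`) and `A` (`r x d_in`), so the update adds `r * (d_in + d_out)` parameters. The hidden size 4096, the rank 16, and the helper name `lora_param_count` are illustrative assumptions, not values read from the adapter used in this notebook.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Number of trainable weights in the LoRA factors A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


# Assumed, illustrative sizes: a square 4096 x 4096 projection and rank 16.
d_in, d_out, r = 4096, 4096, 16

full = d_in * d_out                       # weights in the frozen dense matrix
lora = lora_param_count(d_in, d_out, r)   # weights added by the adapter

print(f"dense: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

Under these assumptions the adapter adds less than one percent of the layer's weights, which is why adapter checkpoints are small compared to the base model.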


## Libraries

In [1]:
!pip install -q bitsandbytes datasets accelerate loralib
!pip install -q git+https://github.com/huggingface/transformers.git@main git+https://github.com/huggingface/peft.git

In [2]:
import os

# Restrict the notebook to the first GPU; set this before importing torch.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig


In [3]:
print(torch.cuda.is_available())

True


## Load adapters from the Hub

In [4]:
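peft_model_id = "ybelkada/opt-6.7b-lora"

# The adapter config records which base model the adapter was trained on.
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model weights in 8-bit via bitsandbytes.
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    quantization_config=quantization_config,
    device_map="auto",  # let accelerate place the layers on the available device(s)
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Wrap the base model with the LoRA adapter weights from the Hub.
model = PeftModel.from_pretrained(model, peft_model_id)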
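
As a rough back-of-the-envelope check on why 8-bit loading helps, the sketch below estimates the memory needed for the weights alone at different precisions. The parameter count of 6.7 billion is assumed from the "6.7b" in the model name, and the helper `weight_memory_gib` is hypothetical. The estimate ignores activations, the key/value cache, and any layers that are kept in higher precision.

```python
def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights only, in GiB."""
    return n_params * bytes_per_param / 1024**3


n_params = 6.7e9  # assumed from the "6.7b" in the model name

for label, nbytes in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{label:>8}: {weight_memory_gib(n_params, nbytes):5.1f} GiB")
```

Halving the bytes per weight halves this estimate, so an 8-bit load needs roughly half the weight memory of a float16 load.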


## Test Model

In [16]:
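# Tokenize the prompt and move the tensors to the model's device.
batch = tokenizer("Machine learning is:\n", return_tensors="pt").to(model.device)

# Sample up to 100 new tokens under automatic mixed precision.
with torch.amp.autocast(model.device.type):
    output_tokens = model.generate(
        **batch,
        max_new_tokens=100,
        do_sample=True,  # sample instead of greedy decoding
        temperature=0.7,
        top_p=0.9,
    )

print('\n\n', tokenizer.decode(output_tokens[0], skip_special_tokens=True))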



 Machine learning is:

A subset of statistics and machine learning that uses computers to learn and improve.

The process of training a machine to perform a task or function by providing it with examples of that task or function and having it find patterns or relationships among those examples.

The act of teaching a machine to perform a task or function.

The process of teaching a machine to perform a task or function.

The process of teaching a machine to perform a task or function.

The process
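
The `temperature` and `top_p` arguments passed to `generate` control how each next token is sampled. The following is a simplified, pure-Python sketch of that idea for a single step over a toy vocabulary: logits are divided by the temperature, converted to probabilities, and then only the smallest set of top tokens whose cumulative probability reaches `top_p` is kept. The function name `nucleus_filter` and the toy logits are made up for illustration; the implementation in `transformers` operates on tensors and may differ in detail.

```python
import math


def nucleus_filter(logits, temperature=0.7, top_p=0.9):
    """Return {token_index: probability} for the tokens kept by top-p sampling."""
    # Temperature scaling: values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize over the kept tokens so their probabilities sum to 1.
    return {i: probs[i] / mass for i in kept}


toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for a 4-token vocabulary
kept_tokens = nucleus_filter(toy_logits)
print(kept_tokens)
```

Sampling then draws one token from the kept set, for example with `random.choices(list(kept_tokens), weights=list(kept_tokens.values()))`. Lowering `temperature` or `top_p` shrinks the set of candidate tokens, which tends to make the output more repetitive but less erratic.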


In [19]:
# Tokenize the prompt and move the tensors to the model's device.
batch = tokenizer("who was the french president in 2007?\n The french president was", return_tensors="pt").to(model.device)

# Sample up to 100 new tokens under automatic mixed precision.
with torch.amp.autocast(model.device.type):
    output_tokens = model.generate(
        **batch,
        max_new_tokens=100,
        do_sample=True,  # sample instead of greedy decoding
        temperature=0.7,
        top_p=0.9,
    )

print('\n\n', tokenizer.decode(output_tokens[0], skip_special_tokens=True))



 who was the french president in 2007?
 The french president was Nicolas Sarkozy.
I was thinking of the guy who lost to Sarkozy in 2007.
Ah, I see. I think that was François Mitterrand.
