# Exercise 1: Load a quantized model and prepare it for fine-tuning

## Objective
Load a pretrained language model, apply 4-bit quantization, and prepare the model for fine-tuning with LoRA.

## Steps
1. Choose a pretrained model (for example `gpt2`, `LLaMA`, or `falcon`).
2. Quantize the model to 4 bits with the appropriate function.
3. Prepare the model for LoRA fine-tuning by targeting the `q_proj` and `v_proj` layers (GPT-2 has no layers with these names; its attention projections are `c_attn` and `c_proj`, which the code below targets instead).

## Question
- What are the advantages of 4-bit quantization over conventional 8-bit quantization?
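
As a starting point for the question, the sketch below estimates the storage needed for the weights alone at different precisions. The 7-billion-parameter count is a hypothetical value chosen for illustration, and the estimate deliberately ignores quantization constants (scales), activations, and optimizer state, so real footprints will be somewhat larger. It also says nothing about the accuracy cost of going from 8 to 4 bits.

```python
def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate storage for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30

# Hypothetical 7-billion-parameter model, chosen only for illustration
num_params = 7_000_000_000

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gib(num_params, bits):.2f} GiB")
```

Under these assumptions the weights take roughly 13.04 GiB at 16 bits, 6.52 GiB at 8 bits, and 3.26 GiB at 4 bits: halving the bit width halves the weight memory.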


In [5]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import prepare_model_for_kbit_training, LoraConfig, get_peft_model

In [None]:
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)

In [7]:
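# Prepare the model for k-bit training. Note: this call does not quantize
# anything by itself; 4-bit quantization has to be requested when the model
# is loaded. The model here was loaded in full precision, so in practice the
# call mainly freezes the base weights.
model_quantified = prepare_model_for_kbit_training(model)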

tokenizer = AutoTokenizer.from_pretrained(model_name)

print(model_quantified)

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2SdpaAttention(
          (c_attn): Conv1D(nf=2304, nx=768)
          (c_proj): Conv1D(nf=768, nx=768)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=3072, nx=768)
          (c_proj): Conv1D(nf=768, nx=3072)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)
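
The printout above still shows plain `Conv1D` and `Linear` layers, which suggests that no weight was actually quantized. A minimal sketch of how 4-bit loading is usually requested with `transformers` and `bitsandbytes` follows. The helper name `load_4bit_model` and the specific settings (NF4, double quantization, bfloat16 compute) are illustrative assumptions, not part of the original notebook, and running it requires the `bitsandbytes` package and typically a CUDA GPU.

```python
# Settings commonly used for QLoRA-style 4-bit loading (illustrative choices)
QUANT_KWARGS = {
    "load_in_4bit": True,               # store the weights of linear layers in 4 bits
    "bnb_4bit_quant_type": "nf4",       # NormalFloat4 data type
    "bnb_4bit_use_double_quant": True,  # also quantize the quantization constants
}

def load_4bit_model(model_name: str):
    """Hypothetical helper: load `model_name` with 4-bit weights via bitsandbytes."""
    # Imported here so the sketch can be read without the heavy dependencies installed
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(bnb_4bit_compute_dtype=torch.bfloat16, **QUANT_KWARGS)
    return AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
        device_map="auto",
    )
```

In the notebook this would be used as `model = load_4bit_model("gpt2")`, followed by `prepare_model_for_kbit_training(model)` as before.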


In [8]:
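# Configure LoRA. PEFT matches target names by suffix, so "c_proj" selects
# both attn.c_proj and mlp.c_proj in every block.

lora_config = LoraConfig(r=8,
                         lora_alpha=32,
                         target_modules=["c_attn", "c_proj"],
                         lora_dropout=0.1)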

model_with_lora = get_peft_model(model_quantified, peft_config=lora_config)

print(model_with_lora)

PeftModel(
  (base_model): LoraModel(
    (model): GPT2LMHeadModel(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2SdpaAttention(
              (c_attn): lora.Linear(
                (base_layer): Conv1D(nf=2304, nx=768)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2304, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (l
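
The printed `lora_A` and `lora_B` shapes make it possible to count the parameters LoRA adds. The sketch below does this arithmetic by hand. It assumes that `c_proj` matches both the attention and the MLP projection in each block (the printout is cut off before this can be confirmed) and takes the layer sizes from the GPT-2 printout earlier in the notebook.

```python
def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x in), B is (out x r)."""
    return r * in_features + out_features * r

r = 8
n_blocks = 12
# (in_features, out_features) of the GPT-2 layers assumed to be matched
adapted_layers = {
    "attn.c_attn": (768, 2304),
    "attn.c_proj": (768, 768),
    "mlp.c_proj": (3072, 768),
}

per_block = sum(lora_params(i, o, r) for i, o in adapted_layers.values())
total = per_block * n_blocks
print(per_block, total)  # 67584 811008
```

The result can be compared against `model_with_lora.print_trainable_parameters()`, which reports the count PEFT actually created.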



### Fine-tuning example with this configuration

In [10]:
from datasets import load_dataset
from transformers import Trainer, TrainingArguments

In [11]:
dataset_name = "imdb"
data = load_dataset(dataset_name)
data

Generating train split: 100%|██████████| 25000/25000 [00:00<00:00, 285688.29 examples/s]
Generating test split: 100%|██████████| 25000/25000 [00:00<00:00, 339977.63 examples/s]
Generating unsupervised split: 100%|██████████| 50000/50000 [00:00<00:00, 347856.78 examples/s]


DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})

In [21]:
# GPT-2 has no padding token, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token

def encode_batch(batch):
    # Tokenize a batch of reviews, padding or truncating each to 512 tokens
    return tokenizer(batch["text"], padding="max_length", truncation=True, max_length=512)

# Apply the tokenization to every split of the dataset
encoded_dataset = data.map(encode_batch, batched=True)

Map: 100%|██████████| 25000/25000 [00:07<00:00, 3334.76 examples/s]
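
To make the effect of `padding="max_length"` and `truncation=True` concrete, here is a small pure-Python sketch of the same idea. The function `pad_or_truncate` is a hypothetical stand-in, not the tokenizer's implementation, and the token ids are made up; only the padding id 50256 (GPT-2's end-of-sequence id) comes from the real vocabulary.

```python
def pad_or_truncate(ids, max_length, pad_id):
    """Return (input_ids, attention_mask) of exactly max_length, padding on the right."""
    ids = ids[:max_length]
    n_pad = max_length - len(ids)
    attention_mask = [1] * len(ids) + [0] * n_pad
    return ids + [pad_id] * n_pad, attention_mask

PAD_ID = 50256  # GPT-2's end-of-sequence id, reused as the padding id

print(pad_or_truncate([10, 11, 12], max_length=5, pad_id=PAD_ID))
# ([10, 11, 12, 50256, 50256], [1, 1, 1, 0, 0])
print(pad_or_truncate([1, 2, 3, 4, 5, 6, 7], max_length=5, pad_id=PAD_ID))
# ([1, 2, 3, 4, 5], [1, 1, 1, 1, 1])
```

The attention mask marks padded positions with 0 so that the model ignores them.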


In [27]:
encoded_dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'label', 'input_ids', 'attention_mask'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label', 'input_ids', 'attention_mask'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label', 'input_ids', 'attention_mask'],
        num_rows: 50000
    })
})

In [33]:
encoded_dataset["train"][0]["label"]

0

In [None]:
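# The base model is a causal LM, so the labels are the input tokens themselves.
# Drop the raw text and the sentiment label, and let the collator build the labels.
from transformers import DataCollatorForLanguageModeling

lm_dataset = encoded_dataset.remove_columns(["text", "label"])
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Training parameters
training_args = TrainingArguments(
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    output_dir="./results",
    num_train_epochs=3,
    eval_strategy="epoch",
    learning_rate=5e-5,
)

# Create a Trainer to fine-tune the model
trainer = Trainer(
    model=model_with_lora,
    args=training_args,
    train_dataset=lm_dataset["train"],
    eval_dataset=lm_dataset["test"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)

trainer.train()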
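
Before launching the run, it can help to estimate how many optimizer steps it will take. The sketch below does so from the dataset size, batch size, and epoch count used above. It assumes a single device and no gradient accumulation (the `TrainingArguments` default); the helper name `optimizer_steps` is hypothetical.

```python
import math

def optimizer_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Total optimizer steps on one device without gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs

print(optimizer_steps(25_000, 8, 3))  # 9375
```

With 25,000 training reviews padded to 512 tokens each, 9,375 steps is a substantial run on CPU, so a smaller subset may be a more practical first test.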