### Fine-tuning an LLM with LoRA

#### Theoretical background:

TODO: LoRA




#### PEFT - HuggingFace:

PEFT (Parameter-Efficient Fine-Tuning) is a Hugging Face library for adapting large pretrained models to downstream applications without fine-tuning all of their parameters, which would be prohibitively costly. PEFT methods fine-tune only a small number of (extra) model parameters, which significantly decreases computational and storage costs while yielding performance comparable to a fully fine-tuned model. This makes it feasible to train and store large language models (LLMs) on consumer hardware.
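
To make "a small number of (extra) parameters" concrete, here is a minimal pure-Python sketch of the low-rank update that LoRA, one of the PEFT methods, adds to a frozen linear layer: the output becomes `W x + (alpha / r) * B (A x)`, where only `A` and `B` are trained. The helper names (`matvec`, `lora_forward`) and the toy shapes are illustrative assumptions, not part of the PEFT API.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """Frozen base output W @ x plus the scaled low-rank update (alpha / r) * B @ (A @ x)."""
    base = matvec(W, x)               # frozen pretrained weights, shape (out, in)
    update = matvec(B, matvec(A, x))  # A: (r, in) and B: (out, r) are the only trained matrices
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Toy layer: in_features=3, out_features=2, rank r=1 (illustrative values only).
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
A = [[0.5, -0.5, 1.0]]      # (r, in), randomly initialized in practice
B_init = [[0.0], [0.0]]     # (out, r), zero-initialized so training starts from the base model
x = [1.0, 2.0, 3.0]

out_at_init = lora_forward(W, A, B_init, x, alpha=4, r=1)
B_trained = [[1.0], [0.0]]  # pretend a training step changed B
out_after = lora_forward(W, A, B_trained, x, alpha=4, r=1)
print(out_at_init, out_after)  # [7.0, 2.0] [17.0, 2.0]
```

Because `B` starts at zero, the adapted layer initially reproduces the pretrained layer exactly; training then moves the output only through the two small matrices.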

#### Workflow:

Load the model and dataset -> Define the LoRA config -> Create the PEFT model -> Train -> Save the model (push to the HF Hub) -> Quantize (optional) -> Run inference


In [3]:
# Wandb login

import wandb
from dotenv import load_dotenv, find_dotenv
import os
import getpass

load_dotenv(find_dotenv())

def get_api_key(env_var, prompt):
    if not os.getenv(env_var):
        os.environ[env_var] = getpass.getpass(prompt)

get_api_key("WANDB_API_KEY", "Enter your Weights & Biases API key: ")
wandb.login(key=os.getenv("WANDB_API_KEY"))

[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /home/brikiyou/.netrc


True

In [None]:
import os

# Point the Hugging Face caches at scratch storage *before* importing
# transformers/datasets: these libraries read the variables at import time.
os.environ["HF_HOME"] = os.path.join(os.environ["SCRATCH"], "huggingface_cache")
os.environ["HF_HUB_CACHE"] = os.path.join(os.environ["HF_HOME"], "hub")
os.environ["TRANSFORMERS_CACHE"] = os.path.join(os.environ["HF_HOME"], "models")
os.environ["HF_DATASETS_CACHE"] = os.path.join(os.environ["HF_HOME"], "datasets")

import pandas as pd
import torch
from datasets import Dataset, DatasetDict
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (
    AutoTokenizer,
    EarlyStoppingCallback,
    Qwen3ForSequenceClassification,
    Trainer,
    TrainingArguments,
    default_data_collator,
)

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
NUM_GPUS = torch.cuda.device_count()
# Only query bf16 support when a GPU is actually present.
BF16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()


cache_dir = os.environ["HF_HOME"]

# TODO: To change the following stuff 
data_dir = os.path.join(os.getcwd(), ) 
out_dir = os.path.join(os.getcwd(), "") 

os.makedirs(out_dir, exist_ok=True)  # os.mkdir() does not accept exist_ok




Loading the model and tokenizer:

In [None]:
model_name = "Qwen/Qwen3-1.7B"
model = Qwen3ForSequenceClassification.from_pretrained(model_name, 
                                                       dtype=torch.float32,
                                                       device_map="auto",
                                                       num_labels=3,
                                                       trust_remote_code=True,
                                                       cache_dir=cache_dir)
tokenizer = AutoTokenizer.from_pretrained(model_name,
                                         cache_dir=cache_dir)



Some weights of Qwen3ForSequenceClassification were not initialized from the model checkpoint at Qwen/Qwen3-1.7B and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
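
The warning above is expected: the classification head (`score.weight`) does not exist in the pretrained checkpoint and is trained from scratch. A related detail to check before batching: decoder-only sequence-classification models in transformers use `config.pad_token_id` to find the last non-padding token of each sequence, and refuse batches larger than 1 when it is unset. The helper below is a minimal sketch, assuming the checkpoint's config may not define a padding id; `ensure_pad_token` is a hypothetical name, and falling back to the EOS token is an assumption about what is acceptable for this task.

```python
def ensure_pad_token(tokenizer, model_config):
    """Make sure the tokenizer and the model config agree on a padding token id."""
    if tokenizer.pad_token is None:
        # Assumption: reusing EOS as the padding token is acceptable here.
        tokenizer.pad_token = tokenizer.eos_token
    if model_config.pad_token_id is None:
        # The classification head needs this id to locate the last real token.
        model_config.pad_token_id = tokenizer.pad_token_id
    return model_config.pad_token_id
```

In this notebook the call would be `ensure_pad_token(tokenizer, model.config)`, placed right after the model and tokenizer are loaded.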


In [4]:
print(model)

Qwen3ForSequenceClassification(
  (model): Qwen3Model(
    (embed_tokens): Embedding(151936, 2048)
    (layers): ModuleList(
      (0-27): 28 x Qwen3DecoderLayer(
        (self_attn): Qwen3Attention(
          (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear(in_features=2048, out_features=1024, bias=False)
          (v_proj): Linear(in_features=2048, out_features=1024, bias=False)
          (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (q_norm): Qwen3RMSNorm((128,), eps=1e-06)
          (k_norm): Qwen3RMSNorm((128,), eps=1e-06)
        )
        (mlp): Qwen3MLP(
          (gate_proj): Linear(in_features=2048, out_features=6144, bias=False)
          (up_proj): Linear(in_features=2048, out_features=6144, bias=False)
          (down_proj): Linear(in_features=6144, out_features=2048, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): Qwen3RMSNorm((2048,), eps=1e-06)
        (post_a

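#### What to adapt here:
All the linear projections of the self-attention mechanism (q_proj, k_proj, v_proj, o_proj) and of the MLP (gate_proj, up_proj, down_proj). The normalization layers and act_fn have no LoRA adapter.

Next, we define these targets with LoraConfig: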

In [5]:
peft_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=[
        "q_proj","v_proj","k_proj","o_proj",
        "gate_proj","up_proj","down_proj",
        #"lora_magnitude_vector"  # if using DoRa
    ],
    lora_dropout=0.01,
    bias="none",
    task_type=TaskType.SEQ_CLS,
    #use_dora=True,
)

lora_model = get_peft_model(model, peft_config)
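
As a sanity check on this configuration, the number of adapter weights can be worked out by hand: a LoRA adapter of rank `r` on a `Linear(in_features, out_features)` layer adds `r * (in_features + out_features)` weights. The sketch below applies that formula to the layer shapes from the `print(model)` output above; the constant and function names are illustrative.

```python
# (in_features, out_features) of each targeted Linear layer, copied from print(model).
TARGET_SHAPES = {
    "q_proj": (2048, 2048),
    "k_proj": (2048, 1024),
    "v_proj": (2048, 1024),
    "o_proj": (2048, 2048),
    "gate_proj": (2048, 6144),
    "up_proj": (2048, 6144),
    "down_proj": (6144, 2048),
}
NUM_LAYERS = 28  # "(0-27): 28 x Qwen3DecoderLayer"
RANK = 8         # r in LoraConfig


def lora_params_per_module(in_features, out_features, r):
    """A is (r, in_features) and B is (out_features, r): r * (in + out) weights."""
    return r * (in_features + out_features)


per_layer = sum(lora_params_per_module(i, o, RANK) for i, o in TARGET_SHAPES.values())
adapter_params = per_layer * NUM_LAYERS
print(f"{per_layer:,} per decoder layer, {adapter_params:,} in total")
# 311,296 per decoder layer, 8,716,288 in total
```

On the real model, `lora_model.print_trainable_parameters()` reports the actual figure. Expect it to be slightly higher than this estimate, since with `TaskType.SEQ_CLS` the newly initialized `score` head is normally kept trainable as well.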

#### Loading the data:

In [None]:

def get_training_files(*extra_files: str, data_aug: bool = False) -> list[str]:
    """
    Return the names of the files used for training.

    Args:
        *extra_files (str): Additional file names appended to the default list.
        data_aug (bool, optional): If True, also include the augmented sentences. Defaults to False.

    Returns:
        list[str]: Training file names, in loading order.
    """
    files = [
        "Sentences_75Agree_utf8.txt",
        "Sentences_AllAgree_utf8.txt",
    ]
    if data_aug:
        files.append("Augmented_Sentences_utf8.txt")
    # Unpack the extra names instead of nesting the args tuple inside the list.
    return files + list(extra_files)


training_files = get_training_files(data_aug=False)
test_file = "Sentences_50Agree_utf8.txt"
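
The file names suggest the Financial PhraseBank corpus. Assuming each line has the form `sentence@label` with the labels `negative`, `neutral` and `positive`, the files could be parsed roughly as sketched below. The label-to-id mapping, the helper names and the sample lines are all assumptions for illustration. If, as in that corpus, the agreement-level files are nested (AllAgree within 75Agree within 50Agree), the two training files overlap and the test file contains the training sentences, so the sketch also deduplicates and removes that overlap.

```python
LABEL2ID = {"negative": 0, "neutral": 1, "positive": 2}  # assumed mapping, consistent with num_labels=3


def parse_line(line):
    """Split 'sentence@label' on the last '@' and map the label to an integer id."""
    sentence, label = line.rstrip("\n").rsplit("@", 1)
    return sentence.strip(), LABEL2ID[label.strip()]


def parse_lines(lines):
    """Parse non-empty lines, dropping exact duplicate sentences (first occurrence wins)."""
    seen, rows = set(), []
    for line in lines:
        if not line.strip():
            continue
        sentence, label = parse_line(line)
        if sentence not in seen:
            seen.add(sentence)
            rows.append({"text": sentence, "label": label})
    return rows


def drop_overlap(test_rows, train_rows):
    """Remove test sentences that also occur in the training rows."""
    train_sentences = {row["text"] for row in train_rows}
    return [row for row in test_rows if row["text"] not in train_sentences]


# Made-up sample lines standing in for the real files.
train_lines = [
    "Profit rose 10% .@positive\n",
    "Profit rose 10% .@positive\n",
    "Sales fell .@negative\n",
]
test_lines = [
    "Sales fell .@negative\n",
    "The company is based in Helsinki .@neutral\n",
]

train_rows = parse_lines(train_lines)
test_rows = drop_overlap(parse_lines(test_lines), train_rows)
print(len(train_rows), len(test_rows))  # 2 1
```

With the real files, the lines would come from `open(os.path.join(data_dir, name), encoding="utf-8")` for each name in `training_files`, and the resulting rows can be turned into a dataset with `Dataset.from_list(train_rows)`.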