# Fine-Tuning Qwen 2.5 7B

This fine-tuning run adapts the Qwen 2.5 7B LLM to the task of generating a product description from a product title.

The dataset is a random sample of 1000 products drawn from the Amazon product dataset provided by FIAP.
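
The notebook does not show how the sample was drawn. A minimal sketch of one way to do it is given below, assuming each product is a dict with the `title` and `content` fields seen later in this notebook; the helper name `sample_products`, the seed, and the demo rows are hypothetical.

```python
import random

def sample_products(rows, n=1000, seed=3407):
    """Return up to n randomly chosen rows with a non-empty title and content."""
    # Keep only rows where both fields are present and not blank.
    usable = [
        row for row in rows
        if (row.get("title") or "").strip() and (row.get("content") or "").strip()
    ]
    # A fixed seed makes the sample reproducible across runs.
    rng = random.Random(seed)
    return rng.sample(usable, min(n, len(usable)))

# Tiny in-memory stand-in for the full product file (hypothetical data).
demo_rows = [
    {"title": "A", "content": "desc a"},
    {"title": "B", "content": "desc b"},
    {"title": "", "content": "desc c"},
    {"title": "D", "content": None},
]
sample = sample_products(demo_rows, n=2)
print(len(sample))
```

Filtering before sampling means that rows without a usable title or description never end up in the training set.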

Mount Google Drive so that the trained model can be saved there.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Install the libraries required for fine-tuning with Unsloth.

In [None]:
%%capture
!pip install unsloth
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

Load the dataset used for fine-tuning and display the first row.

In [None]:
from datasets import load_dataset

DRIVE_PATH = '/content/drive/MyDrive/FIAP/Qwen2.5'

dataset = load_dataset('csv', data_files=f"{DRIVE_PATH}/../data-1000.csv", split = "train")

dataset[0]

{'title': "Diver: A Royal Navy and Commercial Diver's Journey Through Life, and Around the World",
 'content': "Anyone who has ever been affiliated with a military underwater demolition team or had the desire to enlist in one will not be able to put downDiverby Tony Groom. . . . Much of his first-person narrative is a candid look at adventures, relationships, personal triumphs and failures of a man whose job is to install or defuse explosives in cold dark waters. It demands nerves of steel, since one slight error would be the last error.(Northeast Dive News)Diveris the story of Tony Groom, a man who enlisted at the age of seventeen to become a diver for the Royal Navy. Serving countless years under extremely dangerous conditions, he speaks on his past career and his current career as a commercial diver.A tale of a man who truly loves what he does when no one else would think of doing it,Diveris an enthusiastically recommended tale.(Bookwatch2009-02-01)Wide-ranging, illuminating and sym

Configure the model: load the base model in 4-bit precision and attach LoRA adapters.

In [None]:
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048  # maximum number of tokens per training example
dtype = None           # None lets Unsloth pick the dtype for the GPU
load_in_4bit = True    # 4-bit quantization to reduce memory usage

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen2.5-7B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

# Attach LoRA adapters; only these adapter weights are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,  # LoRA rank
    # Attention and MLP projections that receive adapters
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.3.9: Fast Qwen2 patching. Transformers: 4.48.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
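
As a rough cross-check of the LoRA configuration in the cell above, the number of trainable adapter parameters can be estimated by hand. The sketch below assumes the layer dimensions from the publicly available Qwen2.5-7B configuration (hidden size 3584, MLP size 18944, 28 layers, 4 key/value heads of size 128); these values are not read from the notebook itself.

```python
# Qwen2.5-7B dimensions (assumed from the published model configuration).
HIDDEN = 3584
INTERMEDIATE = 18944
NUM_LAYERS = 28
KV_DIM = 4 * 128  # 4 key/value heads, each of size 128

R = 16  # LoRA rank passed to get_peft_model

# (input features, output features) of each adapted projection in one layer
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(in_features, out_features, r):
    # LoRA adds two matrices per projection: A (r x in) and B (out x r).
    return r * (in_features + out_features)

per_layer = sum(lora_params(i, o, R) for i, o in TARGETS.values())
total = per_layer * NUM_LAYERS
print(f"{total:,}")
```

Under those assumptions the estimate is 40,370,176, which agrees with the trainable-parameter count that Unsloth prints when training starts.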



Prepare the prompt for fine-tuning.

In [None]:
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Given a product title, generate a detailed and persuasive description highlighting its key features and benefits.

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token

def formatting_prompts_func(examples):
    titles   = examples["title"]
    contents = examples["content"]
    texts = []
    for title, content in zip(titles, contents):
        # Append EOS so the model learns where a description ends.
        text = alpaca_prompt.format(title, content) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }

# Adds a "text" column holding the fully formatted training prompt.
dataset = dataset.map(formatting_prompts_func, batched = True,)

Map:   0%|          | 0/1232 [00:00<?, ? examples/s]
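
To make the resulting `text` column concrete, the sketch below renders the same template for a single row without needing the tokenizer. The product title, the description, and the `<|endoftext|>` placeholder are hypothetical; the notebook takes the real value from `tokenizer.eos_token`.

```python
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Given a product title, generate a detailed and persuasive description highlighting its key features and benefits.

### Input:
{}

### Response:
{}"""

# Placeholder for illustration only; the notebook uses tokenizer.eos_token.
EOS_TOKEN = "<|endoftext|>"

def build_example(title, content):
    # The first {} receives the title, the second {} receives the description.
    return alpaca_prompt.format(title, content) + EOS_TOKEN

example = build_example("Wireless Mouse", "A compact mouse with a USB receiver.")
print(example)
```

At inference time the same template is used with an empty second slot, so the model continues the text after `### Response:`.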

Display the first row to verify the result.

In [None]:
dataset[0]

{'title': "Diver: A Royal Navy and Commercial Diver's Journey Through Life, and Around the World",
 'content': "Anyone who has ever been affiliated with a military underwater demolition team or had the desire to enlist in one will not be able to put downDiverby Tony Groom. . . . Much of his first-person narrative is a candid look at adventures, relationships, personal triumphs and failures of a man whose job is to install or defuse explosives in cold dark waters. It demands nerves of steel, since one slight error would be the last error.(Northeast Dive News)Diveris the story of Tony Groom, a man who enlisted at the age of seventeen to become a diver for the Royal Navy. Serving countless years under extremely dangerous conditions, he speaks on his past career and his current career as a commercial diver.A tale of a man who truly loves what he does when no one else would think of doing it,Diveris an enthusiastically recommended tale.(Bookwatch2009-02-01)Wide-ranging, illuminating and sym

Configure model training.

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",
    ),
)

Tokenizing to ["text"] (num_proc=2):   0%|          | 0/1232 [00:00<?, ? examples/s]
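
The interaction between batch size, gradient accumulation, and `max_steps` determines how much of the dataset the run actually sees. The arithmetic below uses the values from the configuration above and the 1,232 examples reported by the tokenization step; the single GPU is an assumption that matches the Colab environment shown in the logs.

```python
import math

num_examples = 1232      # rows reported by the tokenization progress line
per_device_batch = 2     # per_device_train_batch_size
grad_accum = 4           # gradient_accumulation_steps
num_gpus = 1             # assumed: a single GPU
max_steps = 60

# One optimizer step consumes this many examples.
effective_batch = per_device_batch * grad_accum * num_gpus
# Examples processed over the whole run.
examples_seen = effective_batch * max_steps
# Optimizer steps that a full pass over the data would need.
steps_per_epoch = math.ceil(num_examples / effective_batch)
epoch_fraction = examples_seen / num_examples

print(effective_batch, examples_seen, steps_per_epoch, round(epoch_fraction, 2))
```

With these settings the run processes 480 of the 1,232 examples, which is roughly 39% of one epoch; a full epoch would take 154 steps.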

Start training.

In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 1,232 | Num Epochs = 1 | Total steps = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 40,370,176/5,063,120,384 (0.80% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
1,2.2941
2,2.1465
3,2.1046
4,2.0288
5,1.9052
6,1.9235
7,1.9443
8,1.6091
9,1.7784
10,1.5982


Run inference to generate a description from a title.

In [None]:
FastLanguageModel.for_inference(model)

question = "CISCO C100"

inputs = tokenizer(
[
    alpaca_prompt.format(
        question,
        "",
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)

['Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nGiven a product title, generate a detailed and persuasive description highlighting its key features and benefits.\n\n### Input:\nCISCO C100\n\n### Response:\nCISCO C100 is a 100% genuine Cisco C100 replacement. It is a high quality, low cost alternative to the original Cisco C100. It is a 100% compatible replacement for the original Cisco C100. It is a 10']
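
The decoded output repeats the whole prompt before the generated description. A minimal sketch for keeping only the generated part is shown below; the helper name `extract_response` is hypothetical, and the end-of-sequence string is passed in by the caller rather than assumed.

```python
RESPONSE_MARKER = "### Response:"

def extract_response(decoded, eos_token=None):
    """Return the text after the response marker, without the EOS token."""
    # partition() yields an empty third element when the marker is missing,
    # so a malformed output produces an empty string instead of an error.
    _, _, response = decoded.partition(RESPONSE_MARKER)
    if eos_token:
        # Discard the EOS token and anything generated after it.
        response = response.split(eos_token, 1)[0]
    return response.strip()

decoded = (
    "### Input:\nCISCO C100\n\n### Response:\n"
    "CISCO C100 is a 100% genuine Cisco C100 replacement.<|endoftext|>"
)
print(extract_response(decoded, eos_token="<|endoftext|>"))
```

In the notebook this could be applied to each string returned by `tokenizer.batch_decode(outputs)`, passing `tokenizer.eos_token` as the second argument.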

Save the trained LoRA adapters and the tokenizer to Google Drive.

In [None]:
model.save_pretrained(f"{DRIVE_PATH}/lora_model-describe")
tokenizer.save_pretrained(f"{DRIVE_PATH}/lora_model-describe")

('/content/drive/MyDrive/FIAP/FT-Qwen2.5/lora_model/tokenizer_config.json',
 '/content/drive/MyDrive/FIAP/FT-Qwen2.5/lora_model/special_tokens_map.json',
 '/content/drive/MyDrive/FIAP/FT-Qwen2.5/lora_model/vocab.json',
 '/content/drive/MyDrive/FIAP/FT-Qwen2.5/lora_model/merges.txt',
 '/content/drive/MyDrive/FIAP/FT-Qwen2.5/lora_model/added_tokens.json',
 '/content/drive/MyDrive/FIAP/FT-Qwen2.5/lora_model/tokenizer.json')

Save the model to Google Drive as a GGUF file with 4-bit quantization (q4_k_m).

The GGUF file can then be loaded locally with Ollama or LM Studio.
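
For Ollama, the exported file is usually referenced from a Modelfile. The sketch below only builds the text of such a file, to be used once the export cell that follows has written the GGUF file. The file name `unsloth.Q4_K_M.gguf`, the helper name `build_modelfile`, and the `<|endoftext|>` stop string are assumptions; the template mirrors the Alpaca prompt used for training, with Ollama's `{{ .Prompt }}` placeholder in the input slot.

```python
def build_modelfile(gguf_path):
    """Return Modelfile text that wraps user input in the training prompt."""
    template = (
        "Below is an instruction that describes a task, paired with an input "
        "that provides further context. Write a response that appropriately "
        "completes the request.\n\n"
        "### Instruction:\n"
        "Given a product title, generate a detailed and persuasive description "
        "highlighting its key features and benefits.\n\n"
        "### Input:\n"
        "{{ .Prompt }}\n\n"  # Ollama substitutes the user's text here
        "### Response:\n"
    )
    return (
        f"FROM {gguf_path}\n"
        f'TEMPLATE """{template}"""\n'
        'PARAMETER stop "<|endoftext|>"\n'  # assumed end-of-sequence string
    )

modelfile = build_modelfile("./unsloth.Q4_K_M.gguf")  # hypothetical file name
print(modelfile)
```

After saving the printed text as `Modelfile`, a command such as `ollama create product-describer -f Modelfile` (the model name is hypothetical) should register the model locally.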

In [None]:
model.save_pretrained_gguf(f"{DRIVE_PATH}/model", tokenizer, quantization_method = "q4_k_m")

Unsloth: You have 1 CPUs. Using `safe_serialization` is 10x slower.
We shall switch to Pytorch saving, which might take 3 minutes and not 30 minutes.
To force `safe_serialization`, set it to `None` instead.
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 7.5G


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 3.92 out of 12.67 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


 36%|███▌      | 10/28 [00:00<00:01, 12.34it/s]
We will save to Disk and not RAM now.
100%|██████████| 28/28 [02:25<00:00,  5.21s/it]


Unsloth: Saving tokenizer... Done.
Unsloth: Saving /content/drive/MyDrive/FIAP/FT-Qwen2.5/model/pytorch_model-00001-of-00004.bin...
Unsloth: Saving /content/drive/MyDrive/FIAP/FT-Qwen2.5/model/pytorch_model-00002-of-00004.bin...
Unsloth: Saving /content/drive/MyDrive/FIAP/FT-Qwen2.5/model/pytorch_model-00003-of-00004.bin...
Unsloth: Saving /content/drive/MyDrive/FIAP/FT-Qwen2.5/model/pytorch_model-00004-of-00004.bin...
Done.


Unsloth: Converting qwen2 model. Can use fast conversion = False.


==((====))==  Unsloth: Conversion from QLoRA to GGUF information
   \\   /|    [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF 16bits might take 3 minutes.
\        /    [2] Converting GGUF 16bits to ['q8_0'] might take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.

Unsloth: Installing llama.cpp. This might take 3 minutes...
Unsloth: CMAKE detected. Finalizing some steps for installation.
Unsloth: [1] Converting model at /content/drive/MyDrive/FIAP/FT-Qwen2.5/model into q8_0 GGUF format.
The output location will be /content/drive/MyDrive/FIAP/FT-Qwen2.5/model/unsloth.Q8_0.gguf
This might take 3 minutes...
INFO:hf-to-gguf:Loading model: model
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'pytorch_model.bin.index.json'
INFO:hf-to-gguf:gguf: loading model part 'pytorch_model-00001-of-00004.bin'
INFO:hf-to