# Llama 3.2 fine-tuning with the size-color-text "bare" dataset

2025-02-22 10:10

After the failed attempts to fine-tune on the size-color-text dataset, I created a new dataset that strips all the repeating, matching parts of the SVG and markup and reduces the differences to a set of parameters organized in a JSON string. I ran 3 days of fine-tuning but unfortunately got only repeating garbage. It might have to do with the learning rate, although I'm not sure. I'm running another batch with a different learning rate and batch size. The batch size here was kept low because I reused the notebook from the full size-color-text dataset, which has longer sequences, but it could have been increased.
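To make the "bare" format concrete, here is a minimal sketch of how a JSON parameter string could be expanded back into markup. It mirrors the `{{name}}` placeholder convention used by `apply_to_templates` further down in this notebook. The template and the parameter names (`color`, `size`, `text`) are hypothetical and not taken from the actual dataset.

```python
import json

def render(params_json, template):
    """Fill {{name}} placeholders in template from a JSON object string.

    Returns None when the string is not valid JSON or is not a JSON object.
    """
    try:
        params = json.loads(params_json)
    except json.JSONDecodeError:
        return None
    if not isinstance(params, dict):
        return None
    for name, value in params.items():
        template = template.replace("{{" + name + "}}", str(value))
    return template

# Hypothetical template and parameters, for illustration only
TEMPLATE = '<p style="color:{{color}};font-size:{{size}}px">{{text}}</p>'
PARAMS = '{"color": "#ff0000", "size": 16, "text": "Hello"}'

print(render(PARAMS, TEMPLATE))
print(render("not json at all", TEMPLATE))
```

With this setup the model only has to emit the short parameter string. Output that is not a JSON object cannot be rendered, so it cannot be scored either.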

In [1]:
!apt-get install build-essential -y

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  binutils binutils-common binutils-x86-64-linux-gnu bzip2 cpp cpp-11 dirmngr
  dpkg-dev fakeroot g++ g++-11 gcc gcc-11 gcc-11-base gnupg gnupg-l10n
  gnupg-utils gpg-agent gpg-wks-client gpg-wks-server gpgsm
  libalgorithm-diff-perl libalgorithm-diff-xs-perl libalgorithm-merge-perl
  libasan6 libatomic1 libbinutils libcc1-0 libctf-nobfd0 libctf0 libdpkg-perl
  libfakeroot libfile-fcntllock-perl libgcc-11-dev libgomp1 libisl23 libitm1
  libksba8 liblocale-gettext-perl liblsan0 libmpc3 libmpfr6 libnpth0
  libquadmath0 libstdc++-11-dev libtsan0 libubsan1 lto-disabled-list make
  patch pinentry-curses xz-utils
Suggested packages:
  binutils-doc bzip2-doc cpp-doc gcc-11-locales dbus-user-session
  pinentry-gnome3 tor debian-keyring g++-multilib g++-11-multilib gcc-11-doc
  gcc-multilib manpages-dev autoconf automake libtool flex bison gdb gcc

In [1]:
!pip uninstall torch torchvision torchaudio -y && pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126

!pip install unsloth
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

!pip install sacrebleu
!pip install pytest-playwright
!playwright install
!pip install matplotlib
!pip install pillow
!pip install torchvision
!pip install lpips

!playwright install-deps  

!pip install -U numpy
!pip install tensorboard

Found existing installation: torch 2.6.0+cu126
Uninstalling torch-2.6.0+cu126:
  Successfully uninstalled torch-2.6.0+cu126
Found existing installation: torchvision 0.21.0+cu126
Uninstalling torchvision-0.21.0+cu126:
  Successfully uninstalled torchvision-0.21.0+cu126
Found existing installation: torchaudio 2.6.0+cu126
Uninstalling torchaudio-2.6.0+cu126:
  Successfully uninstalled torchaudio-2.6.0+cu126
Looking in indexes: https://download.pytorch.org/whl/cu126
Collecting torch
  Using cached https://download.pytorch.org/whl/cu126/torch-2.6.0%2Bcu126-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (28 kB)
Collecting torchvision
  Using cached https://download.pytorch.org/whl/cu126/torchvision-0.21.0%2Bcu126-cp311-cp311-linux_x86_64.whl.metadata (6.1 kB)
Collecting torchaudio
  Using cached https://download.pytorch.org/whl/cu126/torchaudio-2.6.0%2Bcu126-cp311-cp311-linux_x86_64.whl.metadata (6.6 kB)
Using cached https://download.pytorch.org/whl/cu126/torch-2.6.0%2Bcu126-cp311-cp311-

In [1]:
import os
import numpy as np
import pandas as pd

import torch
from trl import SFTTrainer, SFTConfig
from transformers import TrainingArguments, TextStreamer
from unsloth.chat_templates import get_chat_template
from unsloth import FastLanguageModel
from datasets import Dataset
from unsloth import is_bfloat16_supported

# Saving model
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Warnings
import warnings
warnings.filterwarnings("ignore")

    PyTorch 2.6.0+cu124 with CUDA 1204 (you have 2.6.0+cu126)
    Python  3.11.11 (you have 3.11.9)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!




In [2]:
max_seq_length = 5020

def load_model():
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/Llama-3.2-1B-bnb-4bit",
        max_seq_length=max_seq_length,
        load_in_4bit=True,
        dtype=None,
    )
    
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,
        lora_alpha=16,
        lora_dropout=0,
        target_modules=["q_proj", "k_proj", "v_proj", "up_proj", "down_proj", "o_proj", "gate_proj"],
        use_rslora=True,
        use_gradient_checkpointing="unsloth",
        random_state = 32,
        loftq_config = None,
    )
    return model, tokenizer

In [3]:
def create_trainer(model, tokenizer, training_data, max_steps):
    training_arguments = SFTConfig(
        learning_rate=5e-5,
        lr_scheduler_type="linear",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=64,
        num_train_epochs=40,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        # max_steps=max_steps,
        optim="adamw_8bit",
        weight_decay=0.01,
        warmup_steps=10,
        output_dir="output",
        seed=0,
        save_total_limit=3,
        dataset_text_field="text",
        max_seq_length=max_seq_length,
        dataset_num_proc=10,
        packing=True,
    )

    if max_steps is not None:
        training_arguments.max_steps = max_steps
    
    return SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=training_data,
        args=training_arguments,
    )
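As a sanity check on these settings, the sketch below works through the batch arithmetic using the packed dataset size that the training log in this notebook reports (14,180 examples). Because `max_steps` is set, it takes precedence over `num_train_epochs=40`, which is why the log shows 3 epochs. The rounding rule used here (partial accumulation windows dropped, single GPU) is an assumption about how the trainer counts updates, so treat the result as approximate.

```python
import math

# Values from the SFTConfig above and from the training log in this notebook
per_device_train_batch_size = 2
gradient_accumulation_steps = 64
num_packed_examples = 14_180   # "Num examples" reported after packing
max_steps = 270                # first value used by the training loop

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps

# Assumption: one optimizer update per full accumulation window, single GPU
batches_per_epoch = num_packed_examples // per_device_train_batch_size
updates_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
epochs_needed = math.ceil(max_steps / updates_per_epoch)

print(effective_batch_size, updates_per_epoch, epochs_needed)
```

Under these assumptions the effective batch is 128 sequences, one epoch is about 110 optimizer updates, and 270 steps span 3 epochs, which matches the banner printed by the trainer.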

In [4]:
from json import JSONDecodeError
import numpy as np
from utils.similarity import calculate_metrics
from torch.utils.tensorboard import SummaryWriter
from PIL import Image
import torch
import json

log_dir = 'output/runs'
with open('size-color-text-page-compressed.html', 'r') as f:
    html_template = f.read()

def add_image_to_tensorboard(name, step, img_path):
    # Load the screenshot and convert it to a CHW float tensor in [0, 1]
    image = Image.open(img_path)
    image = image.convert('RGB')
    image_array = np.array(image)
    image_tensor = torch.from_numpy(image_array)
    image_tensor = image_tensor.permute(2, 0, 1)
    image_tensor = image_tensor.float() / 255.0
    
    writer = SummaryWriter(log_dir=log_dir)
    writer.add_image(name, image_tensor, step)
    writer.close()  # flush the event file; a new writer is created on every call
    
def add_text_to_tensorboard(name, step, text):
    writer = SummaryWriter(log_dir=log_dir)
    writer.add_text(name, text, step)
    writer.close()  # flush the event file

def postprocess_text(preds, labels):
    preds = [pred.strip().replace('<unk>', '') for pred in preds]
    labels = [[label.strip().replace('<unk>', '')] for label in labels]

    return preds, labels

def apply_to_templates(text, template):
    try:
        variables = json.loads(text)
    except JSONDecodeError:
        return None

    if not isinstance(variables, dict):
        return None
    
    for variable_name, variable_value in variables.items():
        template = template.replace('{{' + variable_name + '}}', str(variable_value))

    return template

def compute_metrics(decoded_predictions, decoded_labels, steps):
    similarity_scores = []
    perceptual_losses = []
    index = 1
    
    for prediction, label in zip(decoded_predictions, decoded_labels):
        prediction = prediction.replace(tokenizer.eos_token, '')
        
        add_text_to_tensorboard(f'valid_{index}_label_text', steps, label)
        add_text_to_tensorboard(f'valid_{index}_prediction_text', steps, prediction)
        
        applied_label = apply_to_templates(label, html_template)
        applied_prediction = apply_to_templates(prediction, html_template)

        if applied_label is None or applied_prediction is None:
            metrics = None
        else:
            add_text_to_tensorboard(f'valid_{index}_label_text_applied', steps, applied_label)
            add_text_to_tensorboard(f'valid_{index}_prediction_text_applied', steps, applied_prediction)
            metrics = calculate_metrics(
                applied_label, 
                applied_prediction
            )
        
        if metrics is not None:
            similarity_scores.append(metrics['similarity'])
            perceptual_losses.append(metrics['perceptual_loss'])
            
            add_image_to_tensorboard(f'valid_{index}_expectation', steps, metrics['expected_screenshot_path'])
            add_image_to_tensorboard(f'valid_{index}_prediction', steps, metrics['predicted_screenshot_path'])
        
        index += 1

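    # np.mean of an empty list is nan, so both values are nan when no
    # sample produced metrics (e.g. no prediction parsed as a JSON object)
    results = {
        "similarity": float(np.mean(similarity_scores)),
        "perceptual_loss": float(np.mean(perceptual_losses)),
    }
    
    writer = SummaryWriter(log_dir=log_dir)
    writer.add_scalar('similarity', results['similarity'], steps)
    writer.add_scalar('perceptual_loss', results['perceptual_loss'], steps)
    writer.close()  # flush the event file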
    
    print("Similarity:", results['similarity'])
    print("Perceptual loss:", results['perceptual_loss'])

    return results

def test_prediction(model, data, steps):
    answers = []
    labels = []
    print("Generating predictions...")
    for row in data:
        inputs = tokenizer(
        [
            data_prompt.format(
                #instructions
                row['svg'],
                #answer
                "",
            )
        ], return_tensors = "pt").to("cuda")
        
        outputs = model.generate(**inputs, max_new_tokens = 5020, use_cache = True)
        answer = tokenizer.batch_decode(outputs)
        answers.append(answer[0].split("### Response:")[-1])
        labels.append(row['html'])

    print("Computing metrics...")
    # Return the metrics so the training loop's early-stop check can see them
    return compute_metrics(answers, labels, steps)
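The `nan` values in the training output of this notebook are what `np.mean` returns for an empty list, so they are consistent with no validation sample yielding metrics, for example because no prediction parsed as a JSON object. A minimal sketch of a helper that makes this case explicit by also reporting how many samples were scored; `summarize` and its return shape are hypothetical, not part of the notebook:

```python
import math

def summarize(scores):
    """Return (mean, count) for a list of floats; mean is NaN when the list is empty."""
    if not scores:
        return math.nan, 0
    return sum(scores) / len(scores), len(scores)

mean, count = summarize([])
print(f"similarity={mean} over {count} valid samples")

mean, count = summarize([0.5, 1.0])
print(f"similarity={mean} over {count} valid samples")
```

Logging the count next to the mean separates "every prediction was unparseable" from "the metric itself is broken", which the bare `nan` does not.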

In [6]:
!rm -rf output

In [6]:
!apt install zip -y
!rm -rf data-rb-size-color-text-bare
!mkdir -p data-rb-size-color-text-bare
!wget "https://www.dropbox.com/scl/fi/or7eexwsl7s9ud8otg4y4/data-rb-size-color-text-bare.zip?rlkey=35kkqe2k0a4xorh8q6ow7c1in&dl=1" -O model.zip
!unzip model.zip -d data-rb-size-color-text-bare

!rm -rf data-rb-validate
!mkdir -p data-rb-validate

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  unzip
The following NEW packages will be installed:
  unzip zip
0 upgraded, 2 newly installed, 0 to remove and 40 not upgraded.
Need to get 350 kB of archives.
After this operation, 930 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 unzip amd64 6.0-26ubuntu3.2 [175 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/main amd64 zip amd64 3.0-12build2 [176 kB]
Fetched 350 kB in 1s (337 kB/s)
debconf: delaying package configuration, since apt-utils is not installed

Selecting previously unselected package unzip.
(Reading database ... 36713 files and directories currently installed.)
Preparing to unpack .../unzip_6.0-26ubuntu3.2_amd64.deb ...
Progress: [  0%] [..........................................................]

In [6]:
from datasets import load_from_disk
dataset = load_from_disk('data-rb-size-color-text-bare')

dataset = dataset.train_test_split(test_size=4/len(dataset))

dataset

DatasetDict({
    train: Dataset({
        features: ['svg', 'html'],
        num_rows: 99849
    })
    test: Dataset({
        features: ['svg', 'html'],
        num_rows: 4
    })
})

In [5]:
model, tokenizer = load_model()

data_prompt = """Your job is to take variable parameters extracted from an SVG file of a web design and convert it into a variable set of parameters of HTML and CSS markup and stylesheet that represents the design in pixel-perfect accuracy.

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token
def formatting_prompt(examples):
    inputs       = examples["svg"]
    outputs      = examples["html"]
    texts = []
    for input_, output in zip(inputs, outputs):
        text = data_prompt.format(input_, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }



==((====))==  Unsloth 2025.2.12: Fast Llama patching. Transformers: 4.49.0.
   \\   /|    GPU: NVIDIA H100 NVL. Max memory: 93.111 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu126. CUDA: 9.0. CUDA Toolkit: 12.6. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = None. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Unsloth 2025.2.12 patched 16 layers with 16 QKV layers, 16 O layers and 16 MLP layers.


In [8]:
training_data = dataset.map(formatting_prompt, batched=True)

Map:   0%|          | 0/99849 [00:00<?, ? examples/s]

Map:   0%|          | 0/4 [00:00<?, ? examples/s]

In [9]:
training_data

DatasetDict({
    train: Dataset({
        features: ['svg', 'html', 'text'],
        num_rows: 99849
    })
    test: Dataset({
        features: ['svg', 'html', 'text'],
        num_rows: 4
    })
})

In [10]:
def get_token_lengths(examples):
    inputs = tokenizer(
        examples['text'],
        truncation=False,  # Don't truncate yet
        padding=False,     # Don't pad yet
        return_length=True,
    )

    return inputs

tokenized_data = training_data.map(get_token_lengths, batched=True)

def filter_function(example):
    return example['length'] <= max_seq_length

filtered_data = tokenized_data.filter(filter_function)

print(filtered_data)

Map:   0%|          | 0/99849 [00:00<?, ? examples/s]

Map:   0%|          | 0/4 [00:00<?, ? examples/s]

Filter:   0%|          | 0/99849 [00:00<?, ? examples/s]

Filter:   0%|          | 0/4 [00:00<?, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['svg', 'html', 'text', 'input_ids', 'attention_mask', 'length'],
        num_rows: 99849
    })
    test: Dataset({
        features: ['svg', 'html', 'text', 'input_ids', 'attention_mask', 'length'],
        num_rows: 4
    })
})


In [11]:
filtered_data = filtered_data.remove_columns(["input_ids", "attention_mask", "length"])
filtered_data.save_to_disk('data-rb-size-color-text-bare-filtered-' + str(max_seq_length))

Saving the dataset (0/1 shards):   0%|          | 0/99849 [00:00<?, ? examples/s]

Saving the dataset (0/1 shards):   0%|          | 0/4 [00:00<?, ? examples/s]

In [6]:
from datasets import load_from_disk

filtered_data = load_from_disk('data-rb-size-color-text-bare-filtered-' + str(max_seq_length))

filtered_data

DatasetDict({
    train: Dataset({
        features: ['svg', 'html', 'text'],
        num_rows: 99849
    })
    test: Dataset({
        features: ['svg', 'html', 'text'],
        num_rows: 4
    })
})

In [7]:
import torch
from tqdm import tqdm
import os

#resume = False
resume = True

for steps in tqdm(range(270, 360, 1)):
    print(f"Steps: {steps}")

    if steps > 0:
        os.environ['UNSLOTH_RETURN_LOGITS'] = '1'
        trainer = create_trainer(model, tokenizer, filtered_data['train'], steps)
        if resume:
            trainer.train(resume_from_checkpoint=True)
        else:
            trainer.train()
            resume = True
        
    model = FastLanguageModel.for_inference(model)

    results = test_prediction(model, filtered_data['test'], steps)

    if results is not None and results['perceptual_loss'] == 0.0:
        break

    model = FastLanguageModel.for_training(model)
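One thing worth checking for the learning-rate hypothesis: each iteration rebuilds the trainer with `max_steps` set to the current step count while using `lr_scheduler_type="linear"`. A linear schedule decays to zero at its final step. If the schedule is recomputed against the new `max_steps` on every resume (an assumption that has not been verified here), the single new step per iteration would run at a very small learning rate. The sketch below evaluates the standard linear warmup/decay formula in plain Python; `linear_lr` is a hypothetical helper, not a library function.

```python
def linear_lr(step, total_steps, warmup_steps, base_lr):
    """Learning rate at a 0-indexed step under linear warmup then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

base_lr, warmup = 5e-5, 10   # values from the SFTConfig in this notebook

# Last step of a 270-step schedule versus the same step in a long fixed schedule
print(linear_lr(269, 270, warmup, base_lr))
print(linear_lr(269, 10_000, warmup, base_lr))
```

Under that assumption the last step of a 270-step schedule runs at about 1.9e-7, against about 4.9e-5 for the same step in a 10,000-step schedule. Inspecting the logged learning rate in TensorBoard would confirm or rule this out.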

    

  0%|          | 0/90 [00:00<?, ?it/s]

Steps: 270


Converting train dataset to ChatML (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Applying chat template to train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 270
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
272,0.0198


Generating predictions...


  1%|          | 1/90 [06:38<9:51:16, 398.61s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 271


Applying chat template to train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 271
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
273,0.0122


Generating predictions...


  2%|▏         | 2/90 [13:15<9:43:28, 397.82s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 272


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 272
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
274,0.0256


Generating predictions...


  3%|▎         | 3/90 [19:49<9:33:57, 395.83s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 273


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 273
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
275,0.0123


Generating predictions...


  4%|▍         | 4/90 [26:22<9:25:35, 394.60s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 274


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 274
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
276,0.0166


Generating predictions...


  6%|▌         | 5/90 [32:56<9:18:47, 394.44s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 275


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 275
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
277,0.0122


Generating predictions...


  7%|▋         | 6/90 [39:31<9:12:35, 394.70s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 276


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 276
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
278,0.0265


Generating predictions...


  8%|▊         | 7/90 [46:02<9:04:29, 393.61s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 277


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 277
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
279,0.0271


Generating predictions...


  9%|▉         | 8/90 [52:38<8:58:39, 394.14s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 278


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 278
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
280,0.0343


Generating predictions...


 10%|█         | 9/90 [59:13<8:52:27, 394.41s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 279


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 279
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
281,0.0195


Generating predictions...


 11%|█         | 10/90 [1:05:46<8:45:40, 394.26s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 280


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 280
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
282,0.0197


Generating predictions...


 12%|█▏        | 11/90 [1:12:51<8:51:21, 403.57s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 281


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 281
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
283,0.0197


Generating predictions...


 13%|█▎        | 12/90 [1:19:51<8:51:03, 408.51s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 282


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 282
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
284,0.0197


Generating predictions...


 14%|█▍        | 13/90 [1:26:51<8:48:34, 411.88s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 283


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 283
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
285,0.0122


Generating predictions...


 16%|█▌        | 14/90 [1:33:52<8:45:28, 414.85s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 284


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 284
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
286,0.0123


Generating predictions...


 17%|█▋        | 15/90 [1:40:52<8:40:34, 416.46s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 285


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 285
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
287,0.0122


Generating predictions...


 18%|█▊        | 16/90 [1:47:52<8:34:44, 417.36s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 286


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 286
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
288,0.0263


Generating predictions...


 19%|█▉        | 17/90 [1:54:54<8:29:20, 418.64s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 287


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 287
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
289,0.0122


Generating predictions...


 20%|██        | 18/90 [2:01:56<8:23:46, 419.82s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 288


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 288
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
290,0.0195


Generating predictions...


 21%|██        | 19/90 [2:09:00<8:18:14, 421.05s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 289


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 289
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
291,0.0123


Generating predictions...


 22%|██▏       | 20/90 [2:15:37<8:02:52, 413.89s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 290


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 290
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
292,0.0123


Generating predictions...


 23%|██▎       | 21/90 [2:22:39<7:58:49, 416.37s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 291


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 291
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
293,0.0122


Generating predictions...


 24%|██▍       | 22/90 [2:29:40<7:53:17, 417.61s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 292


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 292
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
294,0.0202


Generating predictions...


 26%|██▌       | 23/90 [2:36:40<7:47:07, 418.32s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 293


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 293
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
295,0.034


Generating predictions...


 27%|██▋       | 24/90 [2:43:43<7:41:50, 419.85s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 294


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 294
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
296,0.0194


Generating predictions...


 28%|██▊       | 25/90 [2:50:48<7:36:17, 421.20s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 295


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 295
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
297,0.0196


Generating predictions...


 29%|██▉       | 26/90 [2:57:50<7:29:32, 421.45s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 296


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 296
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
298,0.0195


Generating predictions...


 30%|███       | 27/90 [3:04:53<7:23:12, 422.10s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 297


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 297
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
299,0.0123


Generating predictions...


 31%|███       | 28/90 [3:11:56<7:16:12, 422.13s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 298


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 298
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
300,0.0198


Generating predictions...


 32%|███▏      | 29/90 [3:19:00<7:09:50, 422.80s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 299


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 299
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
301,0.0273


Generating predictions...


 33%|███▎      | 30/90 [3:26:00<7:01:57, 421.95s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 300


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 300
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
302,0.0273


Generating predictions...


 34%|███▍      | 31/90 [3:33:00<6:54:23, 421.42s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 301


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 301
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
303,0.0264


Generating predictions...


 36%|███▌      | 32/90 [3:40:01<6:47:20, 421.38s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 302


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 302
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
304,0.027


Generating predictions...


 37%|███▋      | 33/90 [3:47:01<6:39:56, 420.98s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 303


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 303
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
305,0.0275


Generating predictions...


 38%|███▊      | 34/90 [3:54:01<6:32:37, 420.67s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 304


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 304
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
306,0.0197


Generating predictions...


 39%|███▉      | 35/90 [4:01:03<6:25:53, 420.98s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 305


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 305
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
307,0.0122


Generating predictions...


 40%|████      | 36/90 [4:08:03<6:18:43, 420.80s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 306


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 306
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
308,0.0198


Generating predictions...


 41%|████      | 37/90 [4:15:04<6:11:39, 420.75s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 307


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 307
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
309,0.027


Generating predictions...


 42%|████▏     | 38/90 [4:22:05<6:04:37, 420.71s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 308


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 308
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
310,0.0197


Generating predictions...


 43%|████▎     | 39/90 [4:29:07<5:58:01, 421.21s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 309


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 309
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
311,0.0194


Generating predictions...


 44%|████▍     | 40/90 [4:36:10<5:51:24, 421.68s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 310


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 310
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
312,0.0268


Generating predictions...


 46%|████▌     | 41/90 [4:43:12<5:44:26, 421.76s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 311


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 311
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
313,0.0122


Generating predictions...


 47%|████▋     | 42/90 [4:49:48<5:31:11, 413.98s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 312


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 312
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
314,0.0199


Generating predictions...


 48%|████▊     | 43/90 [4:56:25<5:20:23, 409.01s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 313


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 313
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
315,0.0122


Generating predictions...


 49%|████▉     | 44/90 [5:03:02<5:10:48, 405.39s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 314


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 314
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
316,0.0196


Generating predictions...


 50%|█████     | 45/90 [5:09:36<5:01:29, 402.00s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 315


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 315
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
317,0.0193


Generating predictions...


 51%|█████     | 46/90 [5:16:12<4:53:21, 400.04s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 316


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 316
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
318,0.0268


Generating predictions...


 52%|█████▏    | 47/90 [5:22:50<4:46:19, 399.53s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 317


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 317
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
319,0.0198


Generating predictions...


 53%|█████▎    | 48/90 [5:29:28<4:39:25, 399.17s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 318


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 318
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
320,0.0123


Generating predictions...


 54%|█████▍    | 49/90 [5:36:05<4:32:19, 398.53s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 319


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 319
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
321,0.0198


Generating predictions...


 56%|█████▌    | 50/90 [5:42:40<4:24:54, 397.37s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 320


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 320
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
322,0.0194


Generating predictions...


 57%|█████▋    | 51/90 [5:49:15<4:17:48, 396.64s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 321


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 321
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
323,0.0123


Generating predictions...


 58%|█████▊    | 52/90 [5:55:46<4:10:06, 394.91s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 322


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 322
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
324,0.0199


Generating predictions...


 59%|█████▉    | 53/90 [6:02:15<4:02:28, 393.19s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 323


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 323
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
325,0.0122


Generating predictions...


 60%|██████    | 54/90 [6:08:49<3:56:08, 393.56s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 324


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 324
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
326,0.0123


Generating predictions...


 61%|██████    | 55/90 [6:15:24<3:49:45, 393.87s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 325


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 325
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
327,0.0194


Generating predictions...


 62%|██████▏   | 56/90 [6:21:56<3:42:52, 393.31s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 326


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 326
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
328,0.0272


Generating predictions...


 63%|██████▎   | 57/90 [6:28:30<3:36:27, 393.56s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 327


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 327
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
329,0.0199


Generating predictions...


 64%|██████▍   | 58/90 [6:35:07<3:30:24, 394.52s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 328


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 328
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
330,0.0197


Generating predictions...


 66%|██████▌   | 59/90 [6:41:42<3:23:56, 394.73s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 329


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 329
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss


Generating predictions...


 67%|██████▋   | 60/90 [6:47:51<3:13:29, 386.99s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 330


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 330
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss


Generating predictions...


 68%|██████▊   | 61/90 [6:53:56<3:03:50, 380.38s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 331


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 331
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
331,0.0122


Generating predictions...


 69%|██████▉   | 62/90 [7:00:39<3:00:45, 387.32s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 332


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 332
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
332,0.0346


Generating predictions...


 70%|███████   | 63/90 [7:07:13<2:55:10, 389.28s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 333


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 333
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
333,0.0198


Generating predictions...


 71%|███████   | 64/90 [7:13:48<2:49:23, 390.89s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 334


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 334
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
334,0.0195


Generating predictions...


 72%|███████▏  | 65/90 [7:20:32<2:44:33, 394.93s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 335


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 335
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
335,0.0267


Generating predictions...


 73%|███████▎  | 66/90 [7:27:08<2:38:01, 395.06s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 336


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 336
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
336,0.0199


Generating predictions...


 74%|███████▍  | 67/90 [7:33:44<2:31:37, 395.56s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 337


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 337
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
337,0.0348


Generating predictions...


 76%|███████▌  | 68/90 [7:40:22<2:25:17, 396.26s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 338


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 338
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
338,0.0123


Generating predictions...


 77%|███████▋  | 69/90 [7:46:57<2:18:34, 395.93s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 339


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 339
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
339,0.0122


Generating predictions...


 78%|███████▊  | 70/90 [7:53:29<2:11:33, 394.70s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 340


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 340
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
340,0.0199


Generating predictions...


 79%|███████▉  | 71/90 [8:00:03<2:04:51, 394.30s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 341


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 341
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
341,0.0121


Generating predictions...


 80%|████████  | 72/90 [8:06:47<1:59:12, 397.39s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 342


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 342
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
342,0.0193


Generating predictions...


 81%|████████  | 73/90 [8:13:27<1:52:49, 398.22s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 343


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 343
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
343,0.0277


Generating predictions...


 82%|████████▏ | 74/90 [8:20:08<1:46:22, 398.88s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 344


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 344
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
344,0.0123


Generating predictions...


 83%|████████▎ | 75/90 [8:26:43<1:39:28, 397.90s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 345


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 345
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
345,0.0198


Generating predictions...


 84%|████████▍ | 76/90 [8:33:19<1:32:42, 397.35s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 346


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 346
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
346,0.0123


Generating predictions...


 86%|████████▌ | 77/90 [8:39:58<1:26:10, 397.72s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 347


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 347
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
347,0.0165


Generating predictions...


 87%|████████▋ | 78/90 [8:46:35<1:19:30, 397.55s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 348


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 348
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
348,0.0122


Generating predictions...


 88%|████████▊ | 79/90 [8:53:11<1:12:48, 397.10s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 349


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 349
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
349,0.0266


Generating predictions...


 89%|████████▉ | 80/90 [8:59:50<1:06:14, 397.46s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 350


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 350
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
350,0.0123


Generating predictions...


 90%|█████████ | 81/90 [9:06:29<59:43, 398.16s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 351


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 351
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
351,0.0199


Generating predictions...


 91%|█████████ | 82/90 [9:13:04<52:57, 397.19s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 352


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 352
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
352,0.0122


Generating predictions...


 92%|█████████▏| 83/90 [9:19:50<46:38, 399.85s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 353


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 353
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
353,0.0122


Generating predictions...


 93%|█████████▎| 84/90 [9:26:34<40:05, 400.91s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 354


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 354
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
354,0.0343


Generating predictions...


 94%|█████████▍| 85/90 [9:33:16<33:27, 401.42s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 355


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 355
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
355,0.0196


Generating predictions...


 96%|█████████▌| 86/90 [9:40:00<26:48, 402.25s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 356


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 356
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
356,0.0197


Generating predictions...


 97%|█████████▋| 87/90 [9:46:45<20:08, 402.88s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 357


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 357
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
357,0.0122


Generating predictions...


 98%|█████████▊| 88/90 [9:53:27<13:25, 402.58s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 358


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 358
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
358,0.0189


Generating predictions...


 99%|█████████▉| 89/90 [10:00:12<06:43, 403.39s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan
Steps: 359


Tokenizing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

Packing train dataset (num_proc=10):   0%|          | 0/99849 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 14,180 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 64
\        /    Total batch size = 128 | Total steps = 359
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
359,0.0278


Generating predictions...


100%|██████████| 90/90 [10:06:54<00:00, 404.61s/it]

Computing metrics...
Similarity: nan
Perceptual loss: nan





In [8]:
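# Pick one held-out example and use its SVG as the prompt input.
test_index = 0
text = filtered_data['test'][test_index]['svg']

# Switch the model to Unsloth's inference mode.
model = FastLanguageModel.for_inference(model)

# Fill the prompt template with the SVG and an empty response slot.
inputs = tokenizer(
[
    data_prompt.format(
        # instruction
        text,
        # answer (left empty so the model generates it)
        "",
    )
], return_tensors = "pt").to("cuda")

# Generate, then keep only the text after the response marker.
outputs = model.generate(**inputs, max_new_tokens = 5020, use_cache = True)
answer = tokenizer.batch_decode(outputs)
answer = answer[0].split("### Response:")[-1]

# Print the expected parameters (stored in the 'html' field), then the model's answer.
print(filtered_data['test'][test_index]['html'])
print("Answer of the question is:", answer)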

{"FONT_SIZE4": "154%", "COLOR8": "#c9a38e", "COLOR7": "#24d3e9", "SIZE3": "210px", "FONT_SIZE3": "4em", "COLOR6": "#05fd00", "COLOR5": "#44d3d6", "SIZE2": "35vw", "FONT_SIZE2": "19pt", "COLOR4": "#f06b18", "COLOR3": "#6bc83a", "FONT_SIZE1": "11px", "COLOR2": "#aa5044", "SIZE1": "74vh", "COLOR1": "#fcb18a", "WORD4": "OCCUR", "WORD3": "GLASS", "WORD2": "REACH", "WORD1": "POUND"}
Answer of the question is: 
{"WORD4": "OCCUR", "LENGTH4": "92.875", "LENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGTHLENGT
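
The answer above starts as JSON and then collapses into a repeated `LENGTH` token until generation is cut off. A quick way to count how often this happens across the test set might look like the sketch below. It is not part of the original notebook: `parse_params` and `has_degenerate_repeat` are hypothetical helper names, and the 16-character chunk size and 8-repeat threshold are arbitrary choices.

```python
import json
import re

def parse_params(answer: str):
    """Return the decoded parameter dict, or None if the text is not valid JSON."""
    # Keep only the text between the first '{' and the last '}' (assumed cleanup step).
    start, end = answer.find("{"), answer.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(answer[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None

def has_degenerate_repeat(answer: str, min_repeats: int = 8) -> bool:
    """True if a short chunk (1-16 chars) occurs min_repeats times in a row."""
    pattern = r"(.{1,16}?)\1{%d,}" % (min_repeats - 1)
    return re.search(pattern, answer, flags=re.DOTALL) is not None

# A well-formed answer and a shortened version of the failure shown above.
good = '{"WORD1": "POUND", "COLOR1": "#fcb18a"}'
bad = '{"WORD4": "OCCUR", "LENGTH4": "92.875", "' + "LENGTH" * 50

print(parse_params(good))
print(parse_params(bad), has_degenerate_repeat(bad))
```

Running both checks on every decoded answer would separate "invalid JSON" from "valid JSON with wrong values", which the single `nan` metric cannot do.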

In [9]:
test_prediction(model, filtered_data['test'], steps)

Generating predictions...
Computing metrics...
Similarity: nan
Perceptual loss: nan
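
Both metrics stay at `nan` for every evaluation, so they give no signal about whether training is helping. The log does not show why. One possible cause, and it is only an assumption, is that no prediction could be turned into a comparable output, leaving nothing to average. The sketch below uses a hypothetical `summarize` helper that reports how many predictions were usable alongside the mean, so an all-failed evaluation shows up as a count rather than a bare `nan`.

```python
import math

def summarize(scores):
    """Average the usable scores and report how many predictions were usable."""
    # A score of None (or NaN) stands for a prediction that could not be scored.
    valid = [s for s in scores if s is not None and not math.isnan(s)]
    mean = sum(valid) / len(valid) if valid else math.nan
    return {"mean": mean, "valid": len(valid), "total": len(scores)}

# One unusable prediction out of three, then an evaluation where all failed.
print(summarize([0.5, None, 1.0]))
print(summarize([None, None]))
```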
