# Llama 3.2 fine-tuning with color

2024-12-10 11:24

Fine-tuning Llama 3.2 1B (the 4-bit `unsloth/Llama-3.2-1B-bnb-4bit` checkpoint) on the color dataset with Unsloth. The results look good, but the dataset was not split, so there is no held-out data and the model may simply have memorized the training examples. The next run needs a test set and mid-training evaluation. Still, it looks promising.

In [13]:
!apt-get install build-essential -y

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  binutils binutils-common binutils-x86-64-linux-gnu bzip2 cpp cpp-11 dirmngr
  dpkg-dev fakeroot g++ g++-11 gcc gcc-11 gcc-11-base gnupg gnupg-l10n
  gnupg-utils gpg-agent gpg-wks-client gpg-wks-server gpgsm
  libalgorithm-diff-perl libalgorithm-diff-xs-perl libalgorithm-merge-perl
  libasan6 libatomic1 libbinutils libcc1-0 libctf-nobfd0 libctf0 libdpkg-perl
  libfakeroot libfile-fcntllock-perl libgcc-11-dev libgomp1 libisl23 libitm1
  libksba8 liblocale-gettext-perl liblsan0 libmpc3 libmpfr6 libnpth0
  libquadmath0 libstdc++-11-dev libtsan0 libubsan1 lto-disabled-list make
  patch pinentry-curses xz-utils
Suggested packages:
  binutils-doc bzip2-doc cpp-doc gcc-11-locales dbus-user-session
  pinentry-gnome3 tor debian-keyring g++-multilib g++-11-multilib gcc-11-doc
  gcc-multilib manpages-dev autoconf automake libtool flex bison gdb gcc

In [1]:
!pip install unsloth
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

Collecting unsloth
  Downloading unsloth-2024.12.4-py3-none-any.whl.metadata (59 kB)
Collecting unsloth_zoo>=2024.11.8 (from unsloth)
  Downloading unsloth_zoo-2024.12.1-py3-none-any.whl.metadata (16 kB)
Collecting xformers>=0.0.27.post2 (from unsloth)
  Downloading xformers-0.0.28.post3-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (1.0 kB)
Collecting bitsandbytes (from unsloth)
  Downloading bitsandbytes-0.45.0-py3-none-manylinux_2_24_x86_64.whl.metadata (2.9 kB)
Collecting tyro (from unsloth)
  Downloading tyro-0.9.2-py3-none-any.whl.metadata (9.4 kB)
Collecting transformers>=4.46.1 (from unsloth)
  Downloading transformers-4.47.0-py3-none-any.whl.metadata (43 kB)
Collecting datasets>=2.16.0 (from unsloth)
  Downloading datasets-3.1.

In [2]:
import os
import warnings

import numpy as np
import pandas as pd
import torch

# Unsloth recommends importing it before trl/transformers so its patches apply first.
from unsloth import FastLanguageModel, is_bfloat16_supported
from unsloth.chat_templates import get_chat_template
from datasets import Dataset
from transformers import TrainingArguments, TextStreamer
from trl import SFTTrainer

# Saving model
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Warnings
warnings.filterwarnings("ignore")

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [3]:
max_seq_length = 5020
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-bnb-4bit",
    max_seq_length=max_seq_length,
    load_in_4bit=True,
    dtype=None,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "up_proj", "down_proj", "o_proj", "gate_proj"],
    use_rslora=True,
    use_gradient_checkpointing="unsloth",
    random_state = 32,
    loftq_config = None,
)
# print_trainable_parameters() prints its own summary and returns None,
# so it is not wrapped in print().
model.print_trainable_parameters()

==((====))==  Unsloth 2024.12.4: Fast Llama patching. Transformers:4.46.3.
   \\   /|    GPU: NVIDIA H100 NVL. Max memory: 93.003 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 9.0. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!

Unsloth 2024.12.4 patched 16 layers with 16 QKV layers, 16 O layers and 16 MLP layers.

trainable params: 11,272,192 || all params: 1,247,086,592 || trainable%: 0.9039
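The trainable-parameter count reported above (11,272,192) can be reproduced by hand. A minimal sketch, assuming the published Llama 3.2 1B dimensions (hidden size 2048, MLP size 8192, 16 layers, 8 key/value heads of width 64), none of which are stated in this notebook. Each LoRA adapter on a linear layer adds `r * (in + out)` weights:

```python
# Llama 3.2 1B dimensions (assumed from the published model config,
# not read from the notebook).
HIDDEN = 2048
INTERMEDIATE = 8192
KV_DIM = 8 * 64          # 8 key/value heads, 64 dims each
NUM_LAYERS = 16
RANK = 16
LORA_ALPHA = 16

# (in_features, out_features) for each module listed in target_modules.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    # Matrix A is (rank x in_features), matrix B is (out_features x rank).
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
trainable = per_layer * NUM_LAYERS
print(f"{trainable:,}")  # 11,272,192

# With use_rslora=True the adapter output is scaled by alpha / sqrt(rank)
# instead of the classic alpha / rank.
rslora_scale = LORA_ALPHA / RANK ** 0.5
print(rslora_scale)  # 4.0
```

With `r=16` and `lora_alpha=16`, classic LoRA scaling would be 1.0, so enabling rsLoRA makes the adapter contribution four times larger at the same learning rate.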


In [4]:
!apt install zip -y
!rm -rf data-rb-color
!mkdir -p data-rb-color
!wget "https://www.dropbox.com/scl/fi/vd0ypt9mo9oh0p9tf90h3/dataset-rb-color-fixed.zip?rlkey=bieseudpp5pzko5j4u1n67phq&dl=1" -O dataset.zip
!unzip dataset.zip -d data-rb-color

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  unzip
The following NEW packages will be installed:
  unzip zip
0 upgraded, 2 newly installed, 0 to remove and 36 not upgraded.
Need to get 350 kB of archives.
After this operation, 930 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 unzip amd64 6.0-26ubuntu3.2 [175 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/main amd64 zip amd64 3.0-12build2 [176 kB]
Fetched 350 kB in 1s (331 kB/s)
debconf: delaying package configuration, since apt-utils is not installed

Selecting previously unselected package unzip.
(Reading database ... 16754 files and directories currently installed.)
Preparing to unpack .../unzip_6.0-26ubuntu3.2_amd64.deb ...

In [9]:
from datasets import load_from_disk
dataset = load_from_disk('data-rb-color')
# dataset = dataset.train_test_split(test_size=4/len(dataset))

dataset

Dataset({
    features: ['svg', 'html'],
    num_rows: 100000
})
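To get a held-out set, the commented-out `train_test_split` call is the direct route in `datasets`. As a library-free illustration of the same idea, the sketch below picks a seeded, reproducible set of held-out row indices. The test size of 1,000 rows, the seed, and the helper name `holdout_indices` are assumptions for illustration, not values from this run.

```python
import random


def holdout_indices(num_rows, num_test, seed=0):
    """Return (train_idx, test_idx) as disjoint, sorted lists of row indices."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    test_idx = sorted(rng.sample(range(num_rows), num_test))
    test_set = set(test_idx)
    train_idx = [i for i in range(num_rows) if i not in test_set]
    return train_idx, test_idx


train_idx, test_idx = holdout_indices(num_rows=100_000, num_test=1_000)
print(len(train_idx), len(test_idx))  # 99000 1000
```

The two index lists can then be passed to `dataset.select(...)` to build the training and test subsets.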

In [10]:
data_prompt = """Your job is to take an SVG file of a web design and convert it into a pixel-perfect HTML and CSS markup and stylesheet.

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token

def formatting_prompt(examples):
    # Batched map: build one prompt string per (svg, html) pair and append EOS
    # so the model learns where a response ends.
    texts = [
        data_prompt.format(svg, html) + EOS_TOKEN
        for svg, html in zip(examples["svg"], examples["html"])
    ]
    return {"text": texts}

training_data = dataset.map(formatting_prompt, batched=True)

Map:   0%|          | 0/100000 [00:00<?, ? examples/s]

In [11]:
training_data

Dataset({
    features: ['svg', 'html', 'text'],
    num_rows: 100000
})
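For reference, this is what a single formatted training example looks like. A small sketch using a made-up SVG/HTML pair, a hypothetical helper `format_example`, and the end-of-text marker hard-coded as an assumption (the notebook reads it from `tokenizer.eos_token`):

```python
data_prompt = """Your job is to take an SVG file of a web design and convert it into a pixel-perfect HTML and CSS markup and stylesheet.

### Input:
{}

### Response:
{}"""

EOS_TOKEN = "<|end_of_text|>"  # assumed; the notebook uses tokenizer.eos_token


def format_example(svg, html):
    # Fill the two template slots, then mark the end of the response.
    return data_prompt.format(svg, html) + EOS_TOKEN


# Made-up pair, for illustration only.
svg = '<svg><rect fill="rgb(64, 2, 112)"/></svg>'
html = "<body></body><style>body { background-color: #400270; }</style>"
text = format_example(svg, html)

# Curly braces inside the HTML/CSS are safe: str.format only parses the
# template string, never the values substituted into it.
prompt_part, response_part = text.split("### Response:\n")
print(response_part)
```

The same `### Response:` marker is what the inference cell later splits on to recover the generated markup.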

In [14]:
trainer=SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=training_data,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=True,
    args=TrainingArguments(
        learning_rate=3e-4,
        lr_scheduler_type="linear",
        per_device_train_batch_size=16,
        gradient_accumulation_steps=8,
        num_train_epochs=40,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        warmup_steps=10,
        output_dir="output",
        seed=0,
    ),
)

trainer.train()

Generating train split: 0 examples [00:00, ? examples/s]

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 5,589 | Num Epochs = 40
O^O/ \_/ \    Batch size per device = 16 | Gradient Accumulation steps = 8
\        /    Total batch size = 128 | Total steps = 1,720
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
1,0.3042
2,0.305
3,0.2977
4,0.2542
5,0.2793
6,0.267
7,0.2451
8,0.2139
9,0.199
10,0.1843


TrainOutput(global_step=1720, training_loss=0.06335395953163159, metrics={'train_runtime': 40636.3589, 'train_samples_per_second': 5.501, 'train_steps_per_second': 0.042, 'total_flos': 6.527893558316237e+18, 'train_loss': 0.06335395953163159, 'epoch': 39.425714285714285})
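The step count and throughput in the logs are consistent with each other. A quick check, assuming the Trainer counts optimizer steps per epoch as the number of batches divided by the accumulation steps, rounded down (an assumption about its internals, not something the log states). The 5,589 "examples" are presumably the packed sequences produced by `packing=True` from the 100,000 rows.

```python
import math

num_sequences = 5_589        # "Num examples" reported by the trainer
per_device_batch = 16
grad_accum = 8
epochs = 40
runtime_s = 40636.3589       # train_runtime from TrainOutput

batches_per_epoch = math.ceil(num_sequences / per_device_batch)
steps_per_epoch = batches_per_epoch // grad_accum  # assumed rounding rule
total_steps = steps_per_epoch * epochs
print(total_steps)  # 1720

steps_per_second = round(total_steps / runtime_s, 3)
samples_per_second = round(num_sequences * epochs / runtime_s, 3)
hours = round(runtime_s / 3600, 1)
print(steps_per_second, samples_per_second, hours)  # 0.042 5.501 11.3
```

So the run took a little over 11 hours on the single H100, at 43 optimizer steps per epoch.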

In [18]:
text = """<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="393" height="852" viewBox="0 0 393 852"><g id="html1"><g data-tag="head" id="head1" data-z-index="auto" data-stacking-context="true" aria-owns="script1"><g data-tag="script" id="script1" data-z-index="auto" data-stacking-context="true"/></g><g data-tag="body" id="body1" data-z-index="auto" data-stacking-context="true" role="document" aria-owns="style1"><g data-stacking-layer="rootBackgroundAndBorders"><rect width="377" height="836" x="8" y="8" fill="rgb(64, 2, 112)"/></g><g data-tag="style" id="style1" data-z-index="auto" data-stacking-context="true"/></g></g></svg>"""
model = FastLanguageModel.for_inference(model)
inputs = tokenizer(
    [
        data_prompt.format(
            text,  # input: the SVG to convert
            "",    # response: left empty so the model generates it
        )
    ],
    return_tensors="pt",
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=5020, use_cache=True)
answer = tokenizer.batch_decode(outputs)
# Keep only the text generated after the response marker.
answer = answer[0].split("### Response:")[-1]

print("Answer of the question is:", answer)

Answer of the question is: 
<body></body>

<style>

        body {
            background-color: #400270;
        }
    
</style><|end_of_text|>
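The generated color matches the input: `rgb(64, 2, 112)` is `#400270` in hex. Once a held-out set exists, a check like this could be automated. The sketch below is one possible approach, with hypothetical helper names and regular expressions that only cover the simple single-background case shown here:

```python
import re


def svg_fill_to_hex(svg):
    """Return the first rgb(...) fill in the SVG as '#rrggbb', or None."""
    m = re.search(r'fill="rgb\((\d+),\s*(\d+),\s*(\d+)\)"', svg)
    if m is None:
        return None
    r, g, b = (int(v) for v in m.groups())
    return f"#{r:02x}{g:02x}{b:02x}"


def css_background_hex(html):
    """Return the first 6-digit hex background-color in the markup, or None."""
    m = re.search(r"background-color:\s*(#[0-9a-fA-F]{6})", html)
    return m.group(1).lower() if m else None


svg = '<rect width="377" height="836" x="8" y="8" fill="rgb(64, 2, 112)"/>'
generated = (
    "<body></body>\n<style>\nbody {\n    background-color: #400270;\n}\n"
    "</style><|end_of_text|>"
)

expected = svg_fill_to_hex(svg)
actual = css_background_hex(generated)
print(expected, actual, expected == actual)  # #400270 #400270 True
```

Counting how often `expected == actual` over unseen examples would give a first accuracy number for separating generalization from memorization.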
