# Codestral fine-tuning with color dataset

2024-12-05 20:51

This is an attempt to replicate the color fine-tuning experiment on the Codestral model. Unlike the previous runs, this one does not use dev models. It still does not use predict-with-generate. I capped training at 10,000 steps because the browser kept crashing on longer runs. As in the previous iterations, the results do not match what we got with T5.

In [1]:
!pip install \
    "torch==2.3.0" \
    tensorboard

!pip install --upgrade \
    "transformers==4.41.2" \
    "accelerate==0.30.1" \
    "datasets==2.19.1" \
    "peft==0.11.1" \
    "bitsandbytes==0.43.1" \
    "trl==0.8.6" \
    "evaluate==0.4.2" \
    huggingface_hub huggingface

!pip install -U sentencepiece


In [2]:
import torch
assert torch.cuda.get_device_capability()[0] >= 8, 'Hardware not supported for Flash Attention'

# on a multi-gpu machine
!FLASH_ATTENTION_SKIP_CUDA_BUILD=TRUE pip install flash-attn --no-build-isolation

# NOTE: use when 'Hardware not supported for Flash Attention'
# on a single gpu or only cpu machine 
! pip install ninja packaging
! MAX_JOBS=4 pip install flash-attn --no-build-isolation


In [3]:
!git config --global credential.helper store

In [4]:
from huggingface_hub import login
 
login(
  token="<HF_API_KEY_REMOVED>", 
  add_to_git_credential=True
)

In [5]:
!apt install zip -y
!rm -rf data-rb-color
!mkdir -p data-rb-color
!wget "https://www.dropbox.com/scl/fi/vd0ypt9mo9oh0p9tf90h3/dataset-rb-color-fixed.zip?rlkey=bieseudpp5pzko5j4u1n67phq&dl=1" -O model.zip
!unzip model.zip -d data-rb-color

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
zip is already the newest version (3.0-12build2).
0 upgraded, 0 newly installed, 0 to remove and 28 not upgraded.
--2024-12-04 09:48:48--  https://www.dropbox.com/scl/fi/vd0ypt9mo9oh0p9tf90h3/dataset-rb-color-fixed.zip?rlkey=bieseudpp5pzko5j4u1n67phq&dl=1
Resolving www.dropbox.com (www.dropbox.com)... 162.125.13.18, 2620:100:6057:18::a27d:d12
Connecting to www.dropbox.com (www.dropbox.com)|162.125.13.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://uc30b3ced646a71f9ae373f2859d.dl.dropboxusercontent.com/cd/0/inline/CfmtGDQiNYtudMpvcPWNRq4fOKQqkXi0L_V7qoaxPn4hB91R2CyON_6K2g13HhEeIjbFaZ8f_musyXNNkFlF95kAGh6Uy8BXe9eSrfSKPyZvVs3n9M06H5GW8KMoHttMtI0/file?dl=1# [following]
--2024-12-04 09:48:50--  https://uc30b3ced646a71f9ae373f2859d.dl.dropboxusercontent.com/cd/0/inline/CfmtGDQiNYtudMpvcPWNRq4fOKQqkXi0L_V7qoaxPn4hB91R2CyON_6K2g13HhEeIjbFaZ8f_musyXNNkFlF95k

In [48]:
from datasets import load_from_disk
dataset = load_from_disk('data-rb-color')
dataset = dataset.train_test_split(test_size=4/len(dataset))

dataset

DatasetDict({
    train: Dataset({
        features: ['svg', 'html'],
        num_rows: 99996
    })
    test: Dataset({
        features: ['svg', 'html'],
        num_rows: 4
    })
})

In [49]:
mistral_instruct_template = "[INST]{instruction}[/INST]"

system_prompt = """Your job is to turn an input SVG file to HTML and CSS code.
You must generate only HTML and CSS code, no additional text."""

def format_dataset(sample):
    instruction = f"{system_prompt}\n\n" + sample["svg"] + "\n\n"
    sample["prompt"]  = mistral_instruct_template.format(instruction=instruction)
    sample["completion"] = sample["html"]
    return sample

# convert dataset to instruct prompt template
columns_to_remove = list(dataset["train"].column_names)
dataset = dataset.map(format_dataset, remove_columns=columns_to_remove, batched=False)


In [50]:
print(dataset['train'][0])

{'prompt': '[INST]Your job is to turn an input SVG file to HTML and CSS code.\nYou must generate only HTML and CSS code, no additional text.\n\n<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="393" height="852" viewBox="0 0 393 852"><g id="html1"><g data-tag="head" id="head1" data-z-index="auto" data-stacking-context="true" aria-owns="script1"><g data-tag="script" id="script1" data-z-index="auto" data-stacking-context="true"/></g><g data-tag="body" id="body1" data-z-index="auto" data-stacking-context="true" role="document" aria-owns="style1"><g data-stacking-layer="rootBackgroundAndBorders"><rect width="377" height="836" x="8" y="8" fill="rgb(208, 67, 166)"/></g><g data-tag="style" id="style1" data-z-index="auto" data-stacking-context="true"/></g></g></svg>\n\n[/INST]', 'completion': '<body></body>\n\n<style>\n\n        body {\n            background-color: #d043a6;\n        }\n    \n</style>'}
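
The sample above pairs an SVG `fill="rgb(208, 67, 166)"` with the CSS value `#d043a6`, so for this dataset the task comes down to an RGB-to-hex conversion. A minimal sketch of that mapping, handy for spot-checking samples. The names `FILL_RE` and `svg_fill_to_hex` are made up for illustration and are not part of the dataset tooling.

```python
import re

# Hypothetical helper (not from the original notebook): find the first
# rgb(...) fill in an SVG string and format it as a CSS hex color.
FILL_RE = re.compile(r'fill="rgb\((\d+),\s*(\d+),\s*(\d+)\)"')


def svg_fill_to_hex(svg: str) -> str:
    match = FILL_RE.search(svg)
    if match is None:
        raise ValueError("no rgb(...) fill found")
    r, g, b = (int(v) for v in match.groups())
    # two lowercase hex digits per channel
    return f"#{r:02x}{g:02x}{b:02x}"


print(svg_fill_to_hex('<rect width="377" height="836" fill="rgb(208, 67, 166)"/>'))
# prints "#d043a6"
```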


In [51]:
!mkdir -p sft_cache
!mkdir -p sft_cache/checkpoints
!mkdir -p sft_cache/model
!mkdir -p sft_cache/offload

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

In [52]:
### model
model_id = "mistral-community/Codestral-22B-v0.1"

### qlora related
r = 64
lora_alpha = 16
lora_dropout = 0.1
task_type = "CAUSAL_LM"

### bitsandbytes related
load_in_4bit=True
bnb_4bit_use_double_quant=True
bnb_4bit_quant_type="nf4"
bnb_4bit_compute_dtype="bfloat16"

### training related
output_dir = "sft_cache/checkpoints"
save_model_dir = "sft_cache/model/"
offload_folder = "sft_cache/offload"
log_dir=f"{output_dir}/logs"

num_train_epochs = 1

per_device_train_batch_size = 1
per_device_eval_batch_size = 1
gradient_accumulation_steps = 1
gradient_checkpointing = True

bf16 = True
fp16 = False

max_grad_norm = 0.3
weight_decay = 0.001
optim = "adamw_torch"

learning_rate = 2e-4
warmup_ratio = 0.03
lr_scheduler_type = "constant"

save_strategy = "no"
logging_steps = 25
logging_strategy = "steps"
group_by_length = True

max_seq_length = 4096
packing = False

In [53]:
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)

In [54]:
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'right'

In [55]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=load_in_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_use_double_quant=bnb_4bit_use_double_quant,
    bnb_4bit_compute_dtype=getattr(torch, bnb_4bit_compute_dtype),
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    use_cache=False if gradient_checkpointing else True,
    quantization_config=bnb_config,
    device_map="auto"
)
model.config.use_cache = False if gradient_checkpointing else True
model.config.pretraining_tp = 1 # num_of_gpus
model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": False})

Loading checkpoint shards:   0%|          | 0/9 [00:00<?, ?it/s]

In [56]:
import bitsandbytes as bnb
from peft import LoraConfig

In [57]:
def find_all_linear_names(model):
    lora_module_names = set()
    for name, module in model.named_modules():
        if isinstance(module, bnb.nn.Linear4bit):
            names = name.split(".")
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])

    if "lm_head" in lora_module_names:  # needed for 16-bit
        lora_module_names.remove("lm_head")
    return list(lora_module_names)


# get lora target modules
target_modules = find_all_linear_names(model)
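
To see what `find_all_linear_names` is expected to return, here is a sketch of the same name handling applied to plain strings instead of a loaded model. The module names in `example` are assumptions that follow the layer layout printed later in this notebook; `leaf_module_names` is a hypothetical stand-in, not the function used for training.

```python
def leaf_module_names(qualified_names):
    """Keep the last path component of each name and drop lm_head."""
    leaves = {name.split(".")[-1] for name in qualified_names}
    leaves.discard("lm_head")
    return sorted(leaves)


# Assumed example names for one decoder layer plus the output head.
example = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.mlp.gate_proj",
    "model.layers.0.mlp.up_proj",
    "model.layers.0.mlp.down_proj",
    "lm_head",
]

print(leaf_module_names(example))
```

If the real function behaves the same way on the quantized model, `target_modules` would hold the seven projection names, so LoRA is attached to every attention and MLP projection.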

In [58]:
lora_config = LoraConfig(
    r=r,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    target_modules=target_modules,
    bias="none",
    task_type=task_type,
)

In [59]:
from transformers import TrainingArguments
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

In [60]:
# for more info on training on completions only, see https://huggingface.co/docs/trl/en/sft_trainer

def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example['prompt'])):
        text = f"{example['prompt'][i]}\n\n ### Answer: {example['completion'][i]}"
        output_texts.append(text)
    return output_texts

collator = DataCollatorForCompletionOnlyLM(
    response_template="### Answer:", 
    tokenizer=tokenizer
)
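
The idea behind a completion-only collator is that tokens up to and including the response template get the label `-100`, so only the completion contributes to the loss. The following is a toy sketch of that masking on plain lists of integers. It is not the TRL implementation, and the token ids are invented for illustration.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def mask_before_response(input_ids, template_ids):
    """Return labels with everything up to and including the template masked."""
    n = len(template_ids)
    for start in range(len(input_ids) - n + 1):
        if input_ids[start:start + n] == template_ids:
            cut = start + n
            return [IGNORE_INDEX] * cut + input_ids[cut:]
    # Template not found: in this toy version the whole sample is masked.
    return [IGNORE_INDEX] * len(input_ids)


# Invented ids: [7, 8] stands in for the tokens of "### Answer:".
ids = [1, 5, 6, 7, 8, 20, 21, 2]
print(mask_before_response(ids, [7, 8]))
# prints [-100, -100, -100, -100, -100, 20, 21, 2]
```

Because the match happens on token ids, it may be worth checking that `"### Answer:"` tokenizes the same way inside a full training text as it does on its own. Whether it does depends on the tokenizer.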

In [65]:
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    max_steps=10000,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    gradient_checkpointing=gradient_checkpointing,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    weight_decay=weight_decay,    
    optim=optim,
    learning_rate=learning_rate,
    warmup_ratio=warmup_ratio,
    lr_scheduler_type=lr_scheduler_type,
    save_strategy=save_strategy,
    logging_steps=logging_steps,
    logging_strategy=logging_strategy,
    group_by_length=group_by_length,
)

In [66]:
# initialize sft trainer
trainer = SFTTrainer(
    args=training_arguments,
    model=model,
    peft_config=lora_config,
    tokenizer=tokenizer,
    train_dataset=dataset['train'],
    eval_dataset=dataset["test"],
    formatting_func=formatting_prompts_func,
    data_collator=collator,
    max_seq_length=max_seq_length,
    packing=packing,
    compute_metrics=None,
)



max_steps is given, it will override any value given in num_train_epochs


In [67]:
trainer.train()



Step,Training Loss
25,0.1718
50,0.0396
75,0.0546
100,0.0562
125,0.0595
150,0.0761
175,0.0505
200,0.0607
225,0.0501
250,0.094


TrainOutput(global_step=10000, training_loss=0.0147688237731345, metrics={'train_runtime': 10220.6577, 'train_samples_per_second': 0.978, 'train_steps_per_second': 0.978, 'total_flos': 4.8871132453085184e+17, 'train_loss': 0.0147688237731345, 'epoch': 0.1000040001600064})
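
The `epoch` and throughput figures in `TrainOutput` follow directly from the step cap and the dataset size. A small worked calculation, assuming a single training process (`num_devices = 1` is an assumption, not something the notebook states):

```python
max_steps = 10_000
per_device_train_batch_size = 1
gradient_accumulation_steps = 1
num_devices = 1  # assumption: one training process
train_rows = 99_996
train_runtime_s = 10220.6577

# samples consumed during the capped run
samples_seen = (
    max_steps * per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
epoch_fraction = samples_seen / train_rows
steps_per_second = max_steps / train_runtime_s
# extrapolate the observed throughput to the full training split
full_epoch_hours = train_rows / (samples_seen / train_runtime_s) / 3600

print(f"epoch fraction: {epoch_fraction:.6f}")
print(f"steps per second: {steps_per_second:.3f}")
print(f"estimated hours for one full epoch: {full_epoch_hours:.1f}")
```

So the run saw roughly a tenth of the training split, and a full epoch at the same throughput would take a bit over a day.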

In [64]:
!ls -l sft_cache/checkpoints


total 745764
-rw-r--r-- 1 root root      5110 Dec  5 14:07 README.md
-rw-r--r-- 1 root root       738 Dec  5 14:07 adapter_config.json
-rw-r--r-- 1 root root 763646922 Dec  5 14:07 adapter_model.bin
drwxr-xr-x 4 root root        88 Dec  4 09:51 runs


In [68]:
trainer.model.save_pretrained(output_dir, safe_serialization=False)



In [69]:
text = dataset['test'][0]
text

{'prompt': '[INST]Your job is to turn an input SVG file to HTML and CSS code.\nYou must generate only HTML and CSS code, no additional text.\n\n<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="393" height="852" viewBox="0 0 393 852"><g id="html1"><g data-tag="head" id="head1" data-z-index="auto" data-stacking-context="true" aria-owns="script1"><g data-tag="script" id="script1" data-z-index="auto" data-stacking-context="true"/></g><g data-tag="body" id="body1" data-z-index="auto" data-stacking-context="true" role="document" aria-owns="style1"><g data-stacking-layer="rootBackgroundAndBorders"><rect width="377" height="836" x="8" y="8" fill="rgb(137, 189, 249)"/></g><g data-tag="style" id="style1" data-z-index="auto" data-stacking-context="true"/></g></g></svg>\n\n[/INST]',
 'completion': '<body></body>\n\n<style>\n\n        body {\n            background-color: #89bdf9;\n        }\n    \n</style>'}

In [70]:
# clear memory
del model
del trainer
torch.cuda.empty_cache()

#### step 6: merge adapter weights and base model

In [71]:
from peft import AutoPeftModelForCausalLM

In [72]:
# load PEFT model in fp16
model = AutoPeftModelForCausalLM.from_pretrained(
    output_dir,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,  # ATTENTION: This allows remote code execution
)  

Loading checkpoint shards:   0%|          | 0/9 [00:00<?, ?it/s]

In [73]:
print(model)

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): MistralForCausalLM(
      (model): MistralModel(
        (embed_tokens): Embedding(32768, 6144)
        (layers): ModuleList(
          (0-55): 56 x MistralDecoderLayer(
            (self_attn): MistralSdpaAttention(
              (q_proj): lora.Linear(
                (base_layer): Linear(in_features=6144, out_features=6144, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=6144, out_features=64, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=64, out_features=6144, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (k_proj): lora.Linear(
                (base_layer): Linear(in_fe

In [74]:
# merge
merged_model = model.merge_and_unload()

In [75]:
print(merged_model)

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32768, 6144)
    (layers): ModuleList(
      (0-55): 56 x MistralDecoderLayer(
        (self_attn): MistralSdpaAttention(
          (q_proj): Linear(in_features=6144, out_features=6144, bias=False)
          (k_proj): Linear(in_features=6144, out_features=1024, bias=False)
          (v_proj): Linear(in_features=6144, out_features=1024, bias=False)
          (o_proj): Linear(in_features=6144, out_features=6144, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear(in_features=6144, out_features=16384, bias=False)
          (up_proj): Linear(in_features=6144, out_features=16384, bias=False)
          (down_proj): Linear(in_features=16384, out_features=6144, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
      )
    )
    (norm): MistralRMSNorm(
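
The layer shapes printed above are enough to estimate how many parameters the LoRA adapter adds. This is a back-of-the-envelope sketch, assuming all seven projections in each of the 56 layers were adapted with `r = 64` and that the adapter is stored at 2 bytes per parameter. Both are assumptions, not facts confirmed by the notebook.

```python
r = 64
num_layers = 56

# (in_features, out_features) per projection, copied from the printed model
shapes = {
    "q_proj": (6144, 6144),
    "k_proj": (6144, 1024),
    "v_proj": (6144, 1024),
    "o_proj": (6144, 6144),
    "gate_proj": (6144, 16384),
    "up_proj": (6144, 16384),
    "down_proj": (16384, 6144),
}

# LoRA adds A (r x in) and B (out x r) for every adapted linear layer.
lora_params = num_layers * sum(r * (i + o) for i, o in shapes.values())
# Weights of the adapted base layers, for comparison.
base_params = num_layers * sum(i * o for i, o in shapes.values())

print(f"LoRA parameters: {lora_params:,}")
print(f"share of adapted base weights: {lora_params / base_params:.2%}")
print(f"size at 2 bytes per parameter: {lora_params * 2:,} bytes")
```

Under these assumptions the adapter has 381,681,664 parameters, or 763,363,328 bytes at 2 bytes each. That is close to the 763,646,922 bytes listed for `adapter_model.bin` earlier, which is consistent with the assumptions but does not prove them.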

In [76]:
# save merged model
merged_model.save_pretrained(save_model_dir, safe_serialization=True,  max_shard_size="2GB")

In [77]:
# save tokenizer for easy inference
tokenizer.save_pretrained(save_model_dir)

('sft_cache/model/tokenizer_config.json',
 'sft_cache/model/special_tokens_map.json',
 'sft_cache/model/tokenizer.model',
 'sft_cache/model/added_tokens.json',
 'sft_cache/model/tokenizer.json')

In [78]:
del model
del merged_model
del tokenizer

torch.cuda.empty_cache()

### 5. Test and evaluate

In [79]:
# NOTE: restart the kernel and run from this section

#### prepare test dataset

In [80]:
# uncomment the test dataset and run all the cells within section 3: Create and prepare dataset

#### inference: finetuned model

In [81]:
import gc, torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.cuda.empty_cache()
gc.collect()

540

In [82]:
model_local_path = "sft_cache/model/"
print(f"model_local_path: {model_local_path}")

model_local_path: sft_cache/model/


In [83]:
tokenizer = AutoTokenizer.from_pretrained(
    model_local_path, trust_remote_code=True
)
tokenizer.pad_token = tokenizer.eos_token

sft_model = AutoModelForCausalLM.from_pretrained(
    model_local_path,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

Loading checkpoint shards:   0%|          | 0/23 [00:00<?, ?it/s]



In [84]:
eval_sample = dataset['test'][0]
eval_prompt, eval_completion = eval_sample["prompt"], eval_sample["completion"]

print(f"prompt: {eval_prompt}")
print("\n", f"*"*25, "\n")
print(f"completion: {eval_completion}")

prompt: [INST]Your job is to turn an input SVG file to HTML and CSS code.
You must generate only HTML and CSS code, no additional text.

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="393" height="852" viewBox="0 0 393 852"><g id="html1"><g data-tag="head" id="head1" data-z-index="auto" data-stacking-context="true" aria-owns="script1"><g data-tag="script" id="script1" data-z-index="auto" data-stacking-context="true"/></g><g data-tag="body" id="body1" data-z-index="auto" data-stacking-context="true" role="document" aria-owns="style1"><g data-stacking-layer="rootBackgroundAndBorders"><rect width="377" height="836" x="8" y="8" fill="rgb(137, 189, 249)"/></g><g data-tag="style" id="style1" data-z-index="auto" data-stacking-context="true"/></g></g></svg>

[/INST]

 ************************* 

completion: <body></body>

<style>

        body {
            background-color: #89bdf9;
        }
    
</style>


In [86]:
model_inputs = tokenizer([eval_prompt], return_tensors="pt").to("cuda")
sft_model.eval()
with torch.no_grad():
    # NOTE: a negative length_penalty makes beam search favor shorter sequences
    generated_ids = sft_model.generate(
        **model_inputs, max_new_tokens=1000, do_sample=True,
        length_penalty=-5.0, repetition_penalty=2.0, num_beams=10,
    )
    results = tokenizer.batch_decode(generated_ids)[0]
    # to decode only the newly generated tokens, slice along the sequence axis:
    # prompt_length = model_inputs['input_ids'].shape[1]
    # results = tokenizer.batch_decode(generated_ids[:, prompt_length:])[0]
    print(results)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> [INST]Your job is to turn an input SVG file to HTML and CSS code.
You must generate only HTML and CSS code, no additional text.

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="393" height="852" viewBox="0 0 393 852"><g id="html1"><g data-tag="head" id="head1" data-z-index="auto" data-stacking-context="true" aria-owns="script1"><g data-tag="script" id="script1" data-z-index="auto" data-stacking-context="true"/></g><g data-tag="body" id="body1" data-z-index="auto" data-stacking-context="true" role="document" aria-owns="style1"><g data-stacking-layer="rootBackgroundAndBorders"><rect width="377" height="836" x="8" y="8" fill="rgb(137, 189, 249)"/></g><g data-tag="style" id="style1" data-z-index="auto" data-stacking-context="true"/></g></g></svg>

[/INST]</s>
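
One difference worth noting: `formatting_prompts_func` built the training texts as prompt, then `"\n\n ### Answer: "`, then completion, while the generation call above passes only the prompt. Whether that mismatch contributes to the empty output is untested here. The sketch below only shows how an inference prompt could be built to match the training format, and how to drop the prompt tokens from each generated sequence. Both helper names are hypothetical.

```python
RESPONSE_TEMPLATE = "### Answer:"  # same marker the collator was configured with


def build_inference_prompt(prompt: str) -> str:
    """Append the separator and marker used in the training texts."""
    return f"{prompt}\n\n {RESPONSE_TEMPLATE}"


def strip_prompt_tokens(generated_ids, prompt_length):
    """Keep only the tokens after the prompt, for each sequence in the batch."""
    return [seq[prompt_length:] for seq in generated_ids]


print(repr(build_inference_prompt("[INST]...[/INST]")))
# toy batch of one sequence with a 3-token prompt
print(strip_prompt_tokens([[1, 10, 11, 42, 43, 2]], prompt_length=3))
```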


#### inference: original model

In [87]:
del sft_model
del tokenizer

In [88]:
import gc, torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.cuda.empty_cache()
gc.collect()

19529

In [89]:
model_id = "mistral-community/Codestral-22B-v0.1"
print(f"model_id: {model_id}")

model_id: mistral-community/Codestral-22B-v0.1


In [90]:
tokenizer = AutoTokenizer.from_pretrained(
    model_id, trust_remote_code=True
)
tokenizer.pad_token = tokenizer.eos_token

base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation="flash_attention_2",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)



Loading checkpoint shards:   0%|          | 0/9 [00:00<?, ?it/s]



In [91]:
eval_sample = dataset['test'][0]
eval_prompt, eval_completion = eval_sample["prompt"], eval_sample["completion"]

print(f"prompt: {eval_prompt}")
print("\n", f"*"*25, "\n")
print(f"completion: {eval_completion}")

prompt: [INST]Your job is to turn an input SVG file to HTML and CSS code.
You must generate only HTML and CSS code, no additional text.

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="393" height="852" viewBox="0 0 393 852"><g id="html1"><g data-tag="head" id="head1" data-z-index="auto" data-stacking-context="true" aria-owns="script1"><g data-tag="script" id="script1" data-z-index="auto" data-stacking-context="true"/></g><g data-tag="body" id="body1" data-z-index="auto" data-stacking-context="true" role="document" aria-owns="style1"><g data-stacking-layer="rootBackgroundAndBorders"><rect width="377" height="836" x="8" y="8" fill="rgb(137, 189, 249)"/></g><g data-tag="style" id="style1" data-z-index="auto" data-stacking-context="true"/></g></g></svg>

[/INST]

 ************************* 

completion: <body></body>

<style>

        body {
            background-color: #89bdf9;
        }
    
</style>


In [93]:
model_inputs = tokenizer([eval_prompt], return_tensors="pt").to("cuda")
base_model.eval()
with torch.no_grad():
    generated_ids = base_model.generate(
        **model_inputs, max_new_tokens=32_000, do_sample=True
    )
    results = tokenizer.batch_decode(generated_ids)[0]
    print(results)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> [INST]Your job is to turn an input SVG file to HTML and CSS code.
You must generate only HTML and CSS code, no additional text.

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="393" height="852" viewBox="0 0 393 852"><g id="html1"><g data-tag="head" id="head1" data-z-index="auto" data-stacking-context="true" aria-owns="script1"><g data-tag="script" id="script1" data-z-index="auto" data-stacking-context="true"/></g><g data-tag="body" id="body1" data-z-index="auto" data-stacking-context="true" role="document" aria-owns="style1"><g data-stacking-layer="rootBackgroundAndBorders"><rect width="377" height="836" x="8" y="8" fill="rgb(137, 189, 249)"/></g><g data-tag="style" id="style1" data-z-index="auto" data-stacking-context="true"/></g></g></svg>

[/INST]
<!DOCTYPE html>
<html>
<head>
  <style>
    body {
      background-color: rgb(137, 189, 249);
      padding: 8px;
    }
    #head1, #body1 {
      position: relative;
    }
  </style>
</head>
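
The base model answers with `rgb(137, 189, 249)` while the reference completion uses `#89bdf9`, so a plain string comparison would count a correct color as a miss. A minimal sketch of a color-aware check, assuming only six-digit hex and integer `rgb(...)` notations need to be handled. `parse_css_color` is a hypothetical helper, not part of the notebook.

```python
import re

COLOR_RE = re.compile(
    r"background-color:\s*(?:#([0-9a-fA-F]{6})|rgb\((\d+),\s*(\d+),\s*(\d+)\))"
)


def parse_css_color(css: str):
    """Return the first background-color in css as an (r, g, b) tuple, or None."""
    m = COLOR_RE.search(css)
    if m is None:
        return None
    if m.group(1):
        h = m.group(1)
        return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))
    return tuple(int(v) for v in m.group(2, 3, 4))


reference = "body { background-color: #89bdf9; }"
generated = "body { background-color: rgb(137, 189, 249); padding: 8px; }"
print(parse_css_color(reference) == parse_css_color(generated))
# prints True
```

This only compares the color. It says nothing about the extra markup the base model produced, such as the `padding` rule or the `<!DOCTYPE html>` wrapper.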

In [None]:
!ls -lh $output_dir