<a href="https://colab.research.google.com/github/terahidro2003/ID2223_finetuning/blob/main/ID2223_fine_tuning_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setup
## Install Unsloth

In [1]:
%%capture

!pip install unsloth
# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git@nightly git+https://github.com/unslothai/unsloth-zoo.git

## Constants

In [49]:
import os 
import sys

if "google.colab" in str(get_ipython()):
  from google.colab import userdata
  TOKEN = userdata.get('HF_TOKEN')
else:
  !pip install python-dotenv
  from dotenv import load_dotenv
  load_dotenv()
  TOKEN = os.environ["HF_TOKEN"]

MODEL_NAME = "unsloth/Llama-3.2-3B-Instruct-bnb-4bit"
hub_repo = "hellstone1918/test-checkpoint-model"

Defaulting to user installation because normal site-packages is not writeable
Collecting python-dotenv
  Downloading python_dotenv-1.2.1-py3-none-any.whl (21 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.2.1


## Specify Quantized Model

In [50]:
!pip install unsloth
from unsloth import FastLanguageModel
from transformers import AutoTokenizer
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 2x faster
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # 4bit for 405b!
    "unsloth/Mistral-Small-Instruct-2409",     # Mistral 22b 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!

    "unsloth/Llama-3.2-1B-bnb-4bit",           # NEW! Llama 3.2 models
    "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    "unsloth/Llama-3.2-3B-bnb-4bit",
    "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",

    "unsloth/Llama-3.3-70B-Instruct-bnb-4bit" # NEW! Llama 3.3 70B!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = MODEL_NAME,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

Defaulting to user installation because normal site-packages is not writeable
==((====))==  Unsloth 2025.11.4: Fast Llama patching. Transformers: 4.57.2.
   \\   /|    NVIDIA GeForce RTX 4070. Num GPUs = 1. Max memory: 11.994 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu128. CUDA: 8.9. CUDA Toolkit: 12.8. Triton: 3.5.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.33.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


## Adding LoRA Adapter

In [51]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
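
As a sanity check on this configuration, the number of trainable LoRA parameters can be worked out by hand: each adapted linear layer of shape `in x out` gains two rank-`r` matrices, which adds `r * (in + out)` parameters. The sketch below assumes the Llama 3.2 3B layer dimensions (hidden size 3072, MLP size 8192, 28 layers, 8 key/value heads of dimension 128). These come from the model's published configuration rather than from this notebook, so treat them as assumptions.

```python
# Assumed Llama-3.2-3B dimensions (not stated anywhere in this notebook).
HIDDEN = 3072          # model hidden size
INTERMEDIATE = 8192    # MLP intermediate size
KV_DIM = 8 * 128       # 8 key/value heads x head dimension 128
NUM_LAYERS = 28
RANK = 16              # matches r = 16 in get_peft_model

# (input features, output features) of every projection listed in target_modules.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_param_count(rank, projections, num_layers):
    """LoRA adds A (rank x in) and B (out x rank) to each adapted layer."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in projections.values())
    return per_layer * num_layers

trainable = lora_param_count(RANK, PROJECTIONS, NUM_LAYERS)
print(f"{trainable:,}")  # 24,313,856
```

Under these assumptions the result agrees with the `Trainable parameters = 24,313,856` line that Unsloth prints when training starts.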

# Data Preparation

## Download Dataset

In [52]:
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)

def formatting_prompts_func(examples):
    # Render each conversation into one training string using the chat template.
    convos = examples["conversations"]
    texts = [
        tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False)
        for convo in convos
    ]
    return { "text" : texts, }

from datasets import load_dataset
dataset = load_dataset("mlabonne/FineTome-100k", split = "train")

## Convert Dataset Format to HuggingFace's Generic Format

In [53]:
from unsloth.chat_templates import standardize_sharegpt
dataset = standardize_sharegpt(dataset)
dataset = dataset.map(formatting_prompts_func, batched = True, )
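
To make the effect of `formatting_prompts_func` more concrete, the sketch below hand-renders a conversation in the Llama 3 layout. It is a simplified stand-in, not the tokenizer's real template: `render_llama3` and the sample conversation are hypothetical, and the real template also injects a default system block, which is omitted here.

```python
def render_llama3(messages, add_generation_prompt=False):
    """Simplified, hand-rolled rendering of the Llama 3 chat layout."""
    text = "<|begin_of_text|>"
    for message in messages:
        # Each turn is a role header, a blank line, the content, and an end-of-turn token.
        text += f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
        text += f"{message['content']}<|eot_id|>"
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here at inference time.
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text

# Hypothetical two-turn conversation in the standardized role/content format.
convo = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "4"},
]
rendered = render_llama3(convo)
print(rendered)
```

For training the generation prompt stays off, because the assistant turn is already part of the text. For inference it is switched on so that the string ends with an open assistant header.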

## Comparison of Formats

In [54]:
dataset[5]["conversations"]

[{'content': 'How do astronomers determine the original wavelength of light emitted by a celestial body at rest, which is necessary for measuring its speed using the Doppler effect?',
  'role': 'user'},
 {'content': 'Astronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.',
  'role': 'assistant'}]

In [55]:
dataset[5]["text"]

'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHow do astronomers determine the original wavelength of light emitted by a celestial body at rest, which is necessary for measuring its speed using the Doppler effect?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAstronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.<|

# Training

In [59]:
from trl import SFTTrainer, SFTConfig
from transformers import DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported
import multiprocessing

sft_config = SFTConfig(
    output_dir       = "outputs",
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,
    warmup_steps     = 5,
    max_steps        = 100,
    learning_rate    = 2e-4,
    fp16             = not is_bfloat16_supported(),
    bf16             = is_bfloat16_supported(),
    logging_steps    = 1,
    optim            = "adamw_8bit",
    weight_decay     = 0.01,
    lr_scheduler_type= "linear",
    seed             = 3407,

    # Checkpoint saving and pushing config
    push_to_hub = True,
    hub_model_id = hub_repo,
    hub_token=TOKEN,
    save_strategy="steps",
    save_steps       = 10,
    save_total_limit = 5,
    hub_strategy="all_checkpoints",

    report_to        = "none",

    dataset_num_proc = multiprocessing.cpu_count(),
    packing          = False,
)

trainer = SFTTrainer(
    model              = model,
    tokenizer          = tokenizer,
    train_dataset      = dataset,
    args               = sft_config,
    dataset_text_field = "text",
    max_seq_length     = max_seq_length,
    data_collator      = DataCollatorForSeq2Seq(tokenizer = tokenizer),
)
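
The numbers in this configuration interact in a few ways that are easy to check by hand. The sketch below recomputes the effective batch size, the number of examples seen in 100 steps, the shape of a linear schedule with warmup, and which checkpoints remain in `outputs` once `save_total_limit` has pruned older ones. It is plain arithmetic under the assumption of a single GPU. The trainer's internal step indexing may differ by one, and `linear_lr` is a hypothetical helper, not a library function.

```python
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 4
NUM_GPUS = 1            # assumption: single-GPU run
MAX_STEPS = 100
WARMUP_STEPS = 5
PEAK_LR = 2e-4
SAVE_STEPS = 10
SAVE_TOTAL_LIMIT = 5

# One optimizer step consumes this many examples.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS
examples_seen = effective_batch * MAX_STEPS

def linear_lr(step):
    """Linear warmup to PEAK_LR, then linear decay to zero at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * max(0.0, (MAX_STEPS - step) / (MAX_STEPS - WARMUP_STEPS))

# Checkpoints are written every SAVE_STEPS; only the newest few stay on local disk.
checkpoint_steps = list(range(SAVE_STEPS, MAX_STEPS + 1, SAVE_STEPS))
kept_locally = checkpoint_steps[-SAVE_TOTAL_LIMIT:]

print(effective_batch)   # 8
print(examples_seen)     # 800
print(kept_locally)      # [60, 70, 80, 90, 100]
```

With 800 examples consumed out of the 100,000 in FineTome-100k, this run covers under 1% of one epoch, so it is a smoke test of the checkpointing pipeline rather than a full fine-tune.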

In [60]:
import os
import re
from huggingface_hub import HfApi, snapshot_download

def get_latest_hf_checkpoint(repo_id: str, cache_dir: str | None = None) -> str | None:
    """
    Returns local path to the latest checkpoint-* directory from a Hub repo,
    or None if the repo doesn't exist or has no checkpoints.
    """
    api = HfApi()
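    try:
        api.repo_info(repo_id, repo_type="model")
    except Exception:
        # Repo lookup failed (most likely it does not exist yet) -> treat as first training run.
        # `except Exception` avoids swallowing KeyboardInterrupt, unlike a bare `except`.
        return None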

    # Repo exists – download files locally (or reuse cached snapshot)
    local_repo_path = snapshot_download(
        repo_id,
        repo_type="model",
        local_dir=cache_dir,                # e.g. "hf_repo_cache" or None
        local_dir_use_symlinks=False,
    )

    # Find checkpoint-* subdirectories
    ckpts: list[tuple[int, str]] = []
    for name in os.listdir(local_repo_path):
        full = os.path.join(local_repo_path, name)
        # fullmatch rejects names that merely start with "checkpoint-<n>" (e.g. a ".tmp" suffix)
        m = re.fullmatch(r"checkpoint-(\d+)", name)
        if m and os.path.isdir(full):
            step = int(m.group(1))
            ckpts.append((step, full))

    if not ckpts:
        # Repo exists but no checkpoints yet (maybe only final model)
        return None

    # Return path of checkpoint with highest step number
    _, latest_ckpt_path = max(ckpts, key=lambda t: t[0])
    return latest_ckpt_path
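
The selection logic in `get_latest_hf_checkpoint` depends on comparing step numbers as integers. The sketch below isolates that step on a list of names, without touching the Hub or the filesystem, and contrasts it with a plain string comparison. `pick_latest` and the sample names are hypothetical.

```python
import re

CHECKPOINT_RE = re.compile(r"checkpoint-(\d+)")

def pick_latest(names):
    """Return the checkpoint-* name with the highest step number, or None."""
    steps = {}
    for name in names:
        m = CHECKPOINT_RE.fullmatch(name)
        if m:
            steps[int(m.group(1))] = name
    return steps[max(steps)] if steps else None

names = ["README.md", "checkpoint-9", "checkpoint-60", "checkpoint-100"]
numeric_latest = pick_latest(names)
lexicographic_latest = max(n for n in names if n.startswith("checkpoint-"))

print(numeric_latest)        # checkpoint-100
print(lexicographic_latest)  # checkpoint-9
```

Sorting the names as strings would resume from step 9 here, which is why the step is parsed to an `int` before taking the maximum.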


In [61]:
from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<|start_header_id|>user<|end_header_id|>\n\n",
    response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n",
)
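
# Record the GPU memory baseline before training; the "Show Stats" cell needs these values.
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)

latest_ckpt = get_latest_hf_checkpoint(hub_repo)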
if latest_ckpt is None:
    print("No checkpoints on Hub – starting training from scratch.")
    trainer_stats = trainer.train()
else:
    print(f"Found checkpoint on Hub: {latest_ckpt}")
    trainer_stats = trainer.train(resume_from_checkpoint=latest_ckpt)

The model is already on multiple devices. Skipping the move to device specified in `args`.


Found checkpoint on Hub: /home/hellstone/.cache/huggingface/hub/models--hellstone1918--test-checkpoint-model/snapshots/db1a2e8da5b43ee48962b8421a8466ba74cc18bd/checkpoint-60


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 100,000 | Num Epochs = 1 | Total steps = 100
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 24,313,856 of 3,237,063,680 (0.75% trained)


Step,Training Loss
61,0.6192
62,0.766
63,0.6993
64,0.9052
65,0.6868
66,0.7979
67,0.7177
68,0.6385
69,0.7852
70,0.522
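
The loss values above are computed on assistant turns only, because `train_on_responses_only` masks everything else out of the labels. The sketch below illustrates the idea on plain integer lists. It is not Unsloth's implementation: `mask_non_response` and the toy token ids are hypothetical, and `-100` is used because it is the default index that PyTorch's cross-entropy loss ignores.

```python
IGNORE_INDEX = -100  # label value skipped by the loss

def mask_non_response(token_ids, instruction_ids, response_ids):
    """Keep labels only for tokens that follow a response marker."""
    labels = [IGNORE_INDEX] * len(token_ids)
    in_response = False
    i = 0
    while i < len(token_ids):
        if token_ids[i:i + len(response_ids)] == response_ids:
            in_response = True          # assistant turn starts after this marker
            i += len(response_ids)
            continue
        if token_ids[i:i + len(instruction_ids)] == instruction_ids:
            in_response = False         # user turn starts after this marker
            i += len(instruction_ids)
            continue
        if in_response:
            labels[i] = token_ids[i]    # this token contributes to the loss
        i += 1
    return labels

# Toy ids: [10, 11] stands in for the user header, [20, 21] for the assistant header.
tokens = [1, 10, 11, 50, 51, 99, 20, 21, 60, 61, 99]
labels = mask_non_response(tokens, instruction_ids=[10, 11], response_ids=[20, 21])
print(labels)  # [-100, -100, -100, -100, -100, -100, -100, -100, 60, 61, 99]
```

Only the last three tokens, which belong to the assistant turn, keep their labels, so gradients come from the response and not from the prompt.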


## Show Stats

In [62]:
#@title Show final memory and time stats
# `start_gpu_memory` should be recorded before trainer.train(); if it was not, fall back
# to 0.0 so that this cell reports total peak usage instead of raising a NameError.
gpu_stats = torch.cuda.get_device_properties(0)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
start_gpu_memory = globals().get("start_gpu_memory", 0.0)
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

In [63]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "Continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8,"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 128,
                   use_cache = True, temperature = 1.5, min_p = 0.1)

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


The Fibonacci sequence is a series of numbers in which each number is the sum of the two preceding numbers. The sequence is: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, and so on.
####
2, 3, 5, 8<|eot_id|>
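
The `generate` call above combines a high temperature with `min_p = 0.1`. As a rough illustration of what min-p filtering does, the sketch below drops every token whose probability is below `min_p` times the probability of the most likely token and renormalizes the rest. It is a plain-Python illustration on a made-up distribution, not the transformers implementation, which operates on logits.

```python
def min_p_filter(probs, min_p):
    """Zero out tokens below min_p * max(probs), then renormalize the survivors."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

# Hypothetical next-token distribution over five tokens.
probs = [0.50, 0.30, 0.15, 0.04, 0.01]
filtered = min_p_filter(probs, min_p=0.1)
print([round(p, 3) for p in filtered])
```

Here the threshold is 0.05, so the two least likely tokens are removed. Because the cutoff scales with the top probability, it removes more of the low-probability tail when the model is confident and less when the distribution is flat.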


# Save the Model

In [65]:
model.push_to_hub_gguf(hub_repo, tokenizer, quantization_method="f16")

Unsloth: Converting model to GGUF format...
Unsloth: Merging model weights to 16-bit format...
Found HuggingFace hub cache directory: /home/hellstone/.cache/huggingface/hub
Checking cache directory for required files...
Cache check failed: model-00001-of-00002.safetensors not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.


Unsloth: Preparing safetensor model files: 100%|████████████████████████████████| 2/2 [01:04<00:00, 32.34s/it]


Note: tokenizer.model not found (this is OK for non-SentencePiece models)


Unsloth: Merging weights into 16bit: 100%|██████████████████████████████████████| 2/2 [00:24<00:00, 12.22s/it]


Unsloth: Merge process complete. Saved to `/tmp/unsloth_gguf_eotv3lg4`
Unsloth: Converting to GGUF format...
==((====))==  Unsloth: Conversion from HF to GGUF information
   \\   /|    [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF bf16 might take 3 minutes.
\        /    [2] Converting GGUF bf16 to ['f16'] might take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.

Unsloth: Installing llama.cpp. This might take 3 minutes...
Unsloth: llama.cpp folder exists but binaries not found - will rebuild
Unsloth: Updating system package directories
Unsloth: Missing packages: libcurl4-openssl-dev
Unsloth: Will attempt to install missing system packages.
Unsloth: Installing packages: libcurl4-openssl-dev
Unsloth: Install llama.cpp and building - please wait 1 to 3 minutes
Unsloth: Install GGUF and other packages


RuntimeError: Failed to convert model to GGUF: Unsloth: GGUF conversion failed: === Unsloth: FAILED building llama.cpp ===
Make failed: [FAIL] Command `make clean` failed with exit code 2
stdout: Makefile:6: *** Build system changed:
 The Makefile build has been replaced by CMake.

 For build instructions see:
 https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md

.  Stop.


CMake failed: [FAIL] Command `cmake . -B build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=OFF -DLLAMA_CURL=ON` failed with exit code 1
stdout: -- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMAKE_BUILD_TYPE=Release
-- Found Git: /usr/bin/git (found version "2.34.1") 
-- The ASM compiler identification is GNU
-- Found assembler: /usr/bin/cc
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- GGML_SYSTEM_ARCH: x86
-- Including CPU backend
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5")  
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native 
-- ggml version: 0.9.4
-- ggml commit:  583cb8341
-- Could NOT find CURL (missing: CURL_LIBRARY CURL_INCLUDE_DIR)
CMake Error at common/CMakeLists.txt:91 (message):
  Could NOT find CURL.  Hint: to disable this feature, set -DLLAMA_CURL=OFF

-- Configuring incomplete, errors occurred!
See also "/mnt/f/University/KTH/ml_finetuning/ID2223_finetuning/llama.cpp/build/CMakeFiles/CMakeOutput.log".


=== Full output log: ===
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: gguf in /home/hellstone/.local/lib/python3.10/site-packages (0.17.1)
Requirement already satisfied: protobuf in /home/hellstone/.local/lib/python3.10/site-packages (6.33.1)
Requirement already satisfied: sentencepiece in /home/hellstone/.local/lib/python3.10/site-packages (0.2.1)
Requirement already satisfied: mistral_common in /home/hellstone/.local/lib/python3.10/site-packages (1.8.5)
Requirement already satisfied: pyyaml>=5.1 in /usr/lib/python3/dist-packages (from gguf) (5.4.1)
Requirement already satisfied: tqdm>=4.27 in /home/hellstone/.local/lib/python3.10/site-packages (from gguf) (4.67.1)
Requirement already satisfied: numpy>=1.17 in /home/hellstone/.local/lib/python3.10/site-packages (from gguf) (2.2.6)
Requirement already satisfied: pillow>=10.3.0 in /home/hellstone/.local/lib/python3.10/site-packages (from mistral_common) (12.0.0)
Requirement already satisfied: tiktoken>=0.7.0 in /home/hellstone/.local/lib/python3.10/site-packages (from mistral_common) (0.12.0)
Requirement already satisfied: typing-extensions>=4.11.0 in /home/hellstone/.local/lib/python3.10/site-packages (from mistral_common) (4.15.0)
Requirement already satisfied: pydantic<3.0,>=2.7 in /home/hellstone/.local/lib/python3.10/site-packages (from mistral_common) (2.12.4)
Requirement already satisfied: requests>=2.0.0 in /home/hellstone/.local/lib/python3.10/site-packages (from mistral_common) (2.32.5)
Requirement already satisfied: jsonschema>=4.21.1 in /home/hellstone/.local/lib/python3.10/site-packages (from mistral_common) (4.25.1)
Requirement already satisfied: pydantic-extra-types[pycountry]>=2.10.5 in /home/hellstone/.local/lib/python3.10/site-packages (from mistral_common) (2.10.6)
Requirement already satisfied: referencing>=0.28.4 in /home/hellstone/.local/lib/python3.10/site-packages (from jsonschema>=4.21.1->mistral_common) (0.37.0)
Requirement already satisfied: attrs>=22.2.0 in /home/hellstone/.local/lib/python3.10/site-packages (from jsonschema>=4.21.1->mistral_common) (25.4.0)
Requirement already satisfied: rpds-py>=0.7.1 in /home/hellstone/.local/lib/python3.10/site-packages (from jsonschema>=4.21.1->mistral_common) (0.29.0)
Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /home/hellstone/.local/lib/python3.10/site-packages (from jsonschema>=4.21.1->mistral_common) (2025.9.1)
Requirement already satisfied: annotated-types>=0.6.0 in /home/hellstone/.local/lib/python3.10/site-packages (from pydantic<3.0,>=2.7->mistral_common) (0.7.0)
Requirement already satisfied: pydantic-core==2.41.5 in /home/hellstone/.local/lib/python3.10/site-packages (from pydantic<3.0,>=2.7->mistral_common) (2.41.5)
Requirement already satisfied: typing-inspection>=0.4.2 in /home/hellstone/.local/lib/python3.10/site-packages (from pydantic<3.0,>=2.7->mistral_common) (0.4.2)
Requirement already satisfied: pycountry>=23 in /home/hellstone/.local/lib/python3.10/site-packages (from pydantic-extra-types[pycountry]>=2.10.5->mistral_common) (24.6.1)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/lib/python3/dist-packages (from requests>=2.0.0->mistral_common) (1.26.5)
Requirement already satisfied: charset_normalizer<4,>=2 in /home/hellstone/.local/lib/python3.10/site-packages (from requests>=2.0.0->mistral_common) (3.4.4)
Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3/dist-packages (from requests>=2.0.0->mistral_common) (3.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests>=2.0.0->mistral_common) (2020.6.20)
Requirement already satisfied: regex>=2022.1.18 in /home/hellstone/.local/lib/python3.10/site-packages (from tiktoken>=0.7.0->mistral_common) (2025.11.3)
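
The log above shows the llama.cpp configure step stopping at `Could NOT find CURL`, after Unsloth reported `libcurl4-openssl-dev` as missing. A small preflight check run before `push_to_hub_gguf` can surface this earlier. The sketch below assumes a Debian or Ubuntu system with `dpkg-query` available. `apt_package_installed` and `gguf_build_preflight` are hypothetical helpers, not part of Unsloth, and the list of tools is an assumption drawn from this log rather than a complete list of build requirements.

```python
import shutil
import subprocess

def apt_package_installed(package: str) -> bool:
    """Return True if dpkg reports `package` as fully installed (Debian/Ubuntu only)."""
    if shutil.which("dpkg-query") is None:
        return False  # not a dpkg-based system, so this check cannot tell
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${Status}", package],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and "install ok installed" in result.stdout

def gguf_build_preflight() -> dict[str, bool]:
    """Check the tools and headers that the failed build in this log depended on."""
    return {
        "cmake": shutil.which("cmake") is not None,
        "git": shutil.which("git") is not None,
        "libcurl4-openssl-dev": apt_package_installed("libcurl4-openssl-dev"),
    }

preflight = gguf_build_preflight()
for name, ok in preflight.items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```

If `libcurl4-openssl-dev` shows as missing, installing it with the system package manager before retrying the export is the likely fix. The CMake hint about `-DLLAMA_CURL=OFF` is an alternative only for a manual llama.cpp build.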
