<a href="https://colab.research.google.com/github/terahidro2003/ID2223_finetuning/blob/main/ID2223_fine_tuning_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setup
## Install Unsloth

In [1]:
%%capture

!pip install unsloth
# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git@nightly git+https://github.com/unslothai/unsloth-zoo.git

## Constants

In [2]:
import os 

if "google.colab" in str(get_ipython()):
  from google.colab import userdata
  TOKEN = userdata.get('HF_TOKEN')
else:
  !pip install python-dotenv
  from dotenv import load_dotenv
  load_dotenv()
  TOKEN = os.environ["HF_TOKEN"]

MODEL_NAME = "unsloth/Llama-3.2-3B-Instruct-bnb-4bit"
hub_repo = "hellstone1918/Llama-3.2-3B-finance-lora-model"

Defaulting to user installation because normal site-packages is not writeable


## Specify Quantized Model

In [3]:
!pip install unsloth
from unsloth import FastLanguageModel
from transformers import AutoTokenizer
import torch
max_seq_length = 5012 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = MODEL_NAME,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

Defaulting to user installation because normal site-packages is not writeable
Collecting unsloth
  Using cached unsloth-2025.11.6-py3-none-any.whl (359 kB)
Installing collected packages: unsloth
Successfully installed unsloth-2025.11.6
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.11.6: Fast Llama patching. Transformers: 4.57.1.
   \\   /|    NVIDIA GeForce RTX 4070. Num GPUs = 1. Max memory: 11.994 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu128. CUDA: 8.9. CUDA Toolkit: 12.8. Triton: 3.5.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.33.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


In [4]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "Explain how a country with persistent current-account deficits can maintain a strong currency for years, and under what conditions the adjustment becomes sudden (balance-of-payments crisis)."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 256,
                   use_cache = True, temperature = 1.5, min_p = 0.1)

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


A country with persistent current-account deficits can maintain a strong currency for years by utilizing various strategies, including:

1. **Capital inflows**: Investors may be attracted to the country's attractive investment opportunities, high yields, or growing economies, leading to significant capital inflows that offset the current-account deficit.
2. **Dependence on foreign exchange reserves**: Some countries accumulate large foreign exchange reserves to back their currency. These reserves can be used to defend the currency and prevent its devaluation in times of exchange rate pressure.
3. **Central bank interventions**: Central banks can intervene in the foreign exchange market to influence the exchange rate. By buying or selling their currency, they can support the exchange rate and prevent its devaluation.
4. **Fiscal discipline**: Implementing sound fiscal policies can help maintain the country's creditworthiness and attract foreign investment. This, in turn, can support the

## Adding LoRA Adapter

In [5]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 32, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 32,
    lora_dropout = 0, # 0 is optimized by unsloth (fallback, if needed: 0.05)
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ (None leaves it disabled)
)

Unsloth 2025.11.6 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
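
The number of trainable parameters follows directly from the rank `r` and the shapes of the adapted projections. The sketch below is a back-of-the-envelope count, not Unsloth's own accounting. The layer dimensions are assumptions about Llama-3.2-3B that this notebook does not state; only the layer count (28) comes from the log line above.

```python
# Assumed Llama-3.2-3B dimensions (not stated in this notebook).
HIDDEN = 3072          # model width
INTERMEDIATE = 8192    # MLP width
KV_DIM = 1024          # assumed 8 key/value heads x 128 dims per head
NUM_LAYERS = 28        # matches the "patched 28 layers" log line

# (in_features, out_features) of each targeted projection in one layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_trainable_params(rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) for every adapted matrix."""
    per_layer = sum(
        rank * (fan_in + fan_out) for fan_in, fan_out in TARGET_SHAPES.values()
    )
    return per_layer * NUM_LAYERS


for rank in (16, 32):
    print(f"r = {rank}: {lora_trainable_params(rank):,} trainable parameters")
```

Under these assumptions, `r = 16` reproduces the 24,313,856 trainable parameters shown in the trainer banner later in this notebook, while `r = 32` would give twice that. This suggests the logged run may have used a different rank than the cell above.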


# Data Preparation

## Download Dataset

In [6]:
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)

def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return { "text" : texts, }
pass

from datasets import load_dataset
dataset = load_dataset("Josephgflowers/Finance-Instruct-500k", split = 'train[:25%]')

def to_finetome(example):
    conv = []
    if example.get("system"):
        conv.append({"from": "system", "value": example["system"]})
    if example.get("user"):
        conv.append({"from": "human", "value": example["user"]})
    if example.get("assistant"):
        conv.append({"from": "gpt", "value": example["assistant"]})
    return {"conversations": conv}

converted = dataset.map(to_finetome)
shuffled = converted.shuffle(seed=42)
dataset = shuffled.filter(lambda x: len(x["conversations"]) > 1)
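
To make the conversion concrete, here is a minimal sketch that applies the same logic to two hand-written rows. The helper `to_sharegpt` and the sample rows are hypothetical and exist only for illustration; the column names (`system`, `user`, `assistant`) are the ones used by `to_finetome` above.

```python
def to_sharegpt(example: dict) -> dict:
    """Same logic as to_finetome, repeated so this cell runs on its own."""
    conv = []
    for column, speaker in (("system", "system"), ("user", "human"), ("assistant", "gpt")):
        # Empty or missing columns are skipped, exactly as in to_finetome.
        if example.get(column):
            conv.append({"from": speaker, "value": example[column]})
    return {"conversations": conv}


# Hypothetical rows, invented for illustration.
rows = [
    {"system": "", "user": "What is a bond?", "assistant": "A bond is a debt security."},
    {"system": "", "user": "", "assistant": "Orphan answer."},
]

converted = [to_sharegpt(row) for row in rows]
# Mirror the notebook's filter: keep only conversations with more than one turn.
kept = [c for c in converted if len(c["conversations"]) > 1]
print(kept)
```

The second row has no user turn, so it ends up with a single message and is dropped by the filter.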

## Convert Dataset Format to HuggingFace's Generic Format

In [7]:
from unsloth.chat_templates import standardize_sharegpt
dataset = standardize_sharegpt(dataset)
dataset = dataset.map(formatting_prompts_func, batched = True, )

## Comparison of Formats

In [8]:
dataset[5]["conversations"]

[{'content': '\n', 'role': 'system'},
 {'content': 'Explain the difference between microeconomics and macroeconomics with examples of how fiscal and monetary policies impact the economy at the micro and macro levels.',
  'role': 'user'},
 {'content': 'Microeconomics:\n\n- Focuses on the economic behavior and decision making of individuals and firms  \n- Examples: Supply and demand analysis, consumer choice theory, cost-benefit analysis, pricing strategies of businesses\n- Fiscal policy at the micro level would include things like taxes and subsidies targeted at specific industries or groups of consumers to change their behavior. \n- Monetary policy at the micro level would include interest rate changes that specifically impact certain loans or industries.\n\nMacroeconomics:\n\n- Focuses on the economy as a whole, analyzing aggregates like national income, unemployment rate, GDP, and inflation  \n- Examples: Business cycles, economic growth, fiscal and monetary policy\n- Fiscal policy a

In [9]:
dataset[5]["text"]

'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nExplain the difference between microeconomics and macroeconomics with examples of how fiscal and monetary policies impact the economy at the micro and macro levels.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nMicroeconomics:\n\n- Focuses on the economic behavior and decision making of individuals and firms  \n- Examples: Supply and demand analysis, consumer choice theory, cost-benefit analysis, pricing strategies of businesses\n- Fiscal policy at the micro level would include things like taxes and subsidies targeted at specific industries or groups of consumers to change their behavior. \n- Monetary policy at the micro level would include interest rate changes that specifically impact certain loans or industries.\n\nMacroeconomics:\n\n- Focuses on the economy as a whole, analyzing aggrega

# Training

In [10]:
from trl import SFTTrainer, SFTConfig
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported
import multiprocessing

sft_config = SFTConfig(
    output_dir       = "outputs",
    # batch size - number of training examples processed in one forward/backward pass
    # epoch - one full pass over all training samples
    # samples per epoch = optimizer steps per epoch * effective batch size
    # effective batch size = per_device_train_batch_size * gradient_accumulation_steps
    per_device_train_batch_size = 2, # samples the GPU processes at once (larger values use more VRAM but raise throughput)
    gradient_accumulation_steps = 16,

    # warmup is needed at the beginning of the training as weights are unadapted to the new dataset. Therefore, we need to slowly increase the learning rate as we fine-tune the model.
    # - prevents large initial updates that could destabilize the model
    # - allows the model to explore the parameter space more effectively and transition smoothly from initial parameter space to regions of higher gradients
    # for large datasets or many features, a warmup of 5-10% of total steps is recommended. Since we use a relatively small dataset (100k), 3-5% of total steps should be safe; 50 of 1000 steps is 5%.
    warmup_steps     = 50, # how many steps you use to slowly increase the learning rate from 0 up to the target learning rate

    # too many steps -> overfitting or catastrophic forgetting, too few steps -> underfitting
    max_steps        = 1000,

    learning_rate    = 2e-5,

    fp16             = not is_bfloat16_supported(),
    bf16             = is_bfloat16_supported(),
    logging_steps    = 1,
    optim            = "adamw_8bit",
    weight_decay     = 0.01, # adds a small penalty to large weights during training.
    lr_scheduler_type= "linear",
    seed             = 3407,

    # Checkpoint saving and pushing config
    push_to_hub = True,
    hub_model_id = hub_repo,
    hub_token=TOKEN,
    save_strategy="steps",
    save_steps       = 100,
    save_total_limit = 5,
    hub_strategy="all_checkpoints",

    report_to        = "none",

    dataset_num_proc = multiprocessing.cpu_count(),
    packing          = False,
)

trainer = SFTTrainer(
    model              = model,
    tokenizer          = tokenizer,
    train_dataset      = dataset,
    args               = sft_config,
    dataset_text_field = "text",
    max_seq_length     = max_seq_length,
    data_collator      = DataCollatorForSeq2Seq(tokenizer = tokenizer),
)
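
The comments in the configuration describe the effective batch size and the warmup phase. The following sketch works through the arithmetic for the values set above. The single-GPU count is an assumption, and `linear_lr` is a hand-written approximation of a linear warmup and decay schedule, not the trainer's own scheduler object.

```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 16
num_gpus = 1            # assumption: single-GPU run
warmup_steps = 50
max_steps = 1000
peak_lr = 2e-5

# Samples that contribute to one optimizer update.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
# Samples consumed over the whole run.
samples_seen = effective_batch * max_steps
warmup_fraction = warmup_steps / max_steps


def linear_lr(step: int) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay to 0 at max_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0, max_steps - step) / (max_steps - warmup_steps)


print(f"effective batch size = {effective_batch}")
print(f"samples seen         = {samples_seen}")
print(f"warmup fraction      = {warmup_fraction:.0%}")
for step in (0, 25, 50, 525, 1000):
    print(f"step {step:4d}: lr = {linear_lr(step):.2e}")
```

With these settings the run consumes 32,000 samples, which is roughly a quarter of the 129,546 examples reported in the trainer banner, so the model sees well under one epoch of data.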

In [11]:
import os
import re
from huggingface_hub import HfApi, snapshot_download
from huggingface_hub.utils import RepositoryNotFoundError

def get_latest_hf_checkpoint(repo_id: str, cache_dir: str | None = None) -> str | None:
    """
    Returns local path to the latest checkpoint-* directory from a Hub repo,
    or None if the repo doesn't exist or has no checkpoints.
    """
    api = HfApi()
    try:
        api.repo_info(repo_id, repo_type="model")
    except RepositoryNotFoundError:
        # Repo does not exist on the Hub (or is not visible) -> first training ever
        return None

    # Repo exists – download files locally (or reuse cached snapshot)
    local_repo_path = snapshot_download(
        repo_id,
        repo_type="model",
        local_dir=cache_dir,                # e.g. "hf_repo_cache" or None
        local_dir_use_symlinks=False,
    )

    # Find checkpoint-* subdirectories
    ckpts: list[tuple[int, str]] = []
    for name in os.listdir(local_repo_path):
        full = os.path.join(local_repo_path, name)
        m = re.match(r"checkpoint-(\d+)", name)
        if m and os.path.isdir(full):
            step = int(m.group(1))
            ckpts.append((step, full))

    if not ckpts:
        # Repo exists but no checkpoints yet (maybe only final model)
        return None

    # Return path of checkpoint with highest step number
    _, latest_ckpt_path = max(ckpts, key=lambda t: t[0])
    return latest_ckpt_path


In [12]:
from unsloth.chat_templates import train_on_responses_only

#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

trainer = train_on_responses_only(
    trainer,
    instruction_part = "<|start_header_id|>user<|end_header_id|>\n\n",
    response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n",
)
latest_ckpt = get_latest_hf_checkpoint(hub_repo)
if latest_ckpt is None:
    print("No checkpoints on Hub – starting training from scratch.")
    trainer_stats = trainer.train()
else:
    print(f"Found checkpoint on Hub: {latest_ckpt}")
    trainer_stats = trainer.train(resume_from_checkpoint=latest_ckpt)

GPU = NVIDIA GeForce RTX 4070. Max memory = 11.994 GB.
3.07 GB of memory reserved.


The model is already on multiple devices. Skipping the move to device specified in `args`.


No checkpoints on Hub – starting training from scratch.


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 129,546 | Num Epochs = 1 | Total steps = 500
O^O/ \_/ \    Batch size per device = 3 | Gradient accumulation steps = 16
\        /    Data Parallel GPUs = 1 | Total batch size (3 x 16 x 1) = 48
 "-____-"     Trainable parameters = 24,313,856 of 3,237,063,680 (0.75% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
1,0.9754
2,0.9753
3,0.7115
4,0.759
5,0.8329
6,0.816
7,0.742
8,0.8748
9,0.9177
10,0.7389
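
The `train_on_responses_only` call above restricts the loss to assistant turns. The sketch below illustrates the general idea of label masking on a toy token list; it is not Unsloth's implementation. The single-token markers `<user>` and `<assistant>` are hypothetical stand-ins for the multi-token header strings passed as `instruction_part` and `response_part`.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def mask_non_response(
    tokens: list[str],
    token_ids: list[int],
    instruction_marker: str,
    response_marker: str,
) -> list[int]:
    """Keep labels only for tokens that follow a response marker."""
    labels = []
    in_response = False
    for tok, tok_id in zip(tokens, token_ids):
        if tok == response_marker:
            in_response = True
            labels.append(IGNORE_INDEX)  # the marker itself is not a target
        elif tok == instruction_marker:
            in_response = False
            labels.append(IGNORE_INDEX)
        else:
            labels.append(tok_id if in_response else IGNORE_INDEX)
    return labels


# Toy conversation; token ids are just positions, for illustration.
tokens = ["<user>", "What", "is", "GDP", "?", "<assistant>", "Total", "output", "."]
token_ids = list(range(len(tokens)))
labels = mask_non_response(tokens, token_ids, "<user>", "<assistant>")
print(labels)
```

Only the three tokens of the assistant reply keep their ids as labels, so the prompt contributes nothing to the training loss.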


## Show Stats

In [13]:
#@title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

9718.4194 seconds used for training.
161.97 minutes used for training.
Peak reserved memory = 10.383 GB.
Peak reserved memory for training = 7.313 GB.
Peak reserved memory % of max memory = 86.568 %.
Peak reserved memory for training % of max memory = 60.972 %.


In [14]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "Explain how a country with persistent current-account deficits can maintain a strong currency for years, and under what conditions the adjustment becomes sudden (balance-of-payments crisis)."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 256,
                   use_cache = True, temperature = 1.5, min_p = 0.1)

Persistent current-account deficits indicate a persistent trade deficit that is financed by foreign capital flows, specifically foreign direct investment or borrowing. While the deficits themselves are negative for the currency, foreign capital inflows into the country can keep supporting a strong currency, especially if:

1. The capital inflows are sustained and significant enough to finance the ongoing trade deficit. This can be because domestic industries are unable to compete effectively internationally.

2. Foreign investors remain confident that the country has stable economic growth prospects and a robust business environment, despite the persistent deficit. They therefore maintain foreign direct investment or purchase debt securities from domestic companies. 

3. There are sufficient foreign reserves to cover short-term balance-of-payments issues. This provides temporary funding to help the country avoid an immediate currency collapse.

4. There is limited foreign currency exch

# Save the Model

In [17]:
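# `--clear` is a `uv venv` flag; it replaces an existing environment without prompting.
!uv venv --clear .venv
# Each `!` command runs in its own subshell, so `source .venv/bin/activate` would not persist.
# `uv pip` picks up the `.venv` in the working directory on its own.
!uv pip install "unsloth" "gguf" "protobuf" "sentencepiece" "mistral_common"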

model.save_pretrained(MODEL_NAME + "_regular_lora")
tokenizer.save_pretrained(MODEL_NAME + "_regular_lora")

# model.push_to_hub_gguf(hub_repo, tokenizer, quantization_method = "q4_k_m")

model.push_to_hub(hub_repo)
tokenizer.push_to_hub(hub_repo)

Using CPython 3.11.14
Creating virtual environment at: .venv
? A virtual environment already exists at `.venv`. Do you want to replace it? [y/n] › yes

hint: Use the `--clear` flag or set `UV_VENV_CLEAR=1` to skip this prompt
Audited 5 packages in 1.75s


Processing Files (1 / 1): 100% | 97.3MB / 97.3MB, 64.3MB/s
New Data Upload: 0.00B / 0.00B, 0.00B/s
No files have been modified since last commit. Skipping to prevent empty commit.


Saved model to https://huggingface.co/hellstone1918/Llama-3.2-3B-finance-lora-model


Processing Files (1 / 1): 100% | 17.2MB / 17.2MB, 0.00B/s
New Data Upload: 0.00B / 0.00B, 0.00B/s
No files have been modified since last commit. Skipping to prevent empty commit.
