# LoRA Fine-tuning (4-bit): Spelling Correction (Kaggle 2×T4)

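This notebook was run on **Kaggle** with the **2× NVIDIA T4** GPU accelerator to fine-tune a small instruction-tuned LLM (`Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct`) with **LoRA** + **4-bit quantization**. Two GPUs were available, but the Unsloth training log reports a single GPU in use.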

Pipeline: **data → prompt formatting (loss masking) → training → eval (EM/CER) → inference → save adapter**


In [1]:
%%capture
import os

!pip install pip3-autoremove
!pip install torch torchvision torchaudio xformers --index-url https://download.pytorch.org/whl/cu128
!pip install unsloth
!pip install transformers==4.56.2
!pip install --no-deps trl==0.22.2

## Model loading (4-bit) + LoRA adapters

Load the base model in 4-bit and attach LoRA adapters for efficient fine-tuning.

In [2]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 4096  # Maximum sequence length; Unsloth handles RoPE scaling internally
dtype = None  # None = auto-detect (float16 on Tesla T4/V100, bfloat16 on Ampere and newer)
load_in_4bit = True  # 4-bit quantization to reduce memory usage; can be False

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    "unsloth/llama-2-7b-bnb-4bit",
    "unsloth/llama-2-13b-bnb-4bit",
    "unsloth/codellama-34b-bnb-4bit",
    "unsloth/tinyllama-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit", # New Google 6 trillion tokens model 2.5x faster!
    "unsloth/gemma-2b-bnb-4bit",
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct", # "unsloth/tinyllama" for 16bit loading
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


🦥 Unsloth Zoo will now patch everything to make training faster!
Unsloth: Could not import trl.trainer.alignprop_trainer: Failed to import trl.trainer.alignprop_trainer because of the following error (look up to see its traceback):
Failed to import trl.models.modeling_sd_base because of the following error (look up to see its traceback):
Failed to import diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion because of the following error (look up to see its traceback):
Failed to import diffusers.loaders.ip_adapter because of the following error (look up to see its traceback):
/usr/local/lib/python3.12/dist-packages/xformers/flash_attn_3/_C.so: undefined symbol: _ZNK3c106SymInt22maybe_as_int_slow_pathEv
Unsloth: Could not import trl.trainer.ddpo_trainer: Failed to import trl.trainer.ddpo_trainer because of the following error (look up to see its traceback):
Failed to import trl.models.modeling_sd_base because of the following error (look up to see its traceback):
Failed to i

Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct does not have a padding token! Will use pad_token = <unk>.


In [3]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 32, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 32,
    lora_dropout = 0, # Currently only supports dropout = 0
    bias = "none",    # Currently only supports bias = "none"
    use_gradient_checkpointing = False, # Set to True if you run out of memory
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2026.1.4 patched 22 layers with 22 QKV layers, 22 O layers and 22 MLP layers.
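
As a sanity check on the adapter size, the number of trainable LoRA parameters can be estimated by hand: each adapted linear layer with input width `d_in` and output width `d_out` gains `r * (d_in + d_out)` weights. The sketch below assumes the usual TinyLlama-1.1B dimensions (hidden size 2048, MLP size 5632, key/value projection width 256, 22 layers). These values are not printed anywhere in this notebook, so treat them as assumptions; the helper name `lora_params` is also just illustrative.

```python
# Assumed TinyLlama-1.1B dimensions (not shown in this notebook).
HIDDEN, MLP, KV, LAYERS = 2048, 5632, 256, 22
RANK = 32  # same r as in get_peft_model above

# (d_in, d_out) for every module listed in target_modules
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV),
    "v_proj": (HIDDEN, KV),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(d_in, d_out, r):
    """LoRA adds two matrices: A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in shapes.values())
total = per_layer * LAYERS
print(f"{total:,}")  # 25,231,360
```

Under these assumptions the total agrees with the `Trainable parameters = 25,231,360` line in the training log.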


## Prompt formatting

We convert each example into an instruction-style prompt:
Instruction + Input → Response (target).
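
One property worth checking is that the prompt used at inference time is an exact prefix of the training text, so the model continues from the same position it was trained on. The following is a minimal sketch with a shortened instruction and hypothetical helper names (`training_text`, `inference_prompt`); the `</s>` EOS token is assumed from the example printed by the next cell.

```python
EOS = "</s>"  # assumed EOS token of the tokenizer

TEMPLATE = """### Instruction:
{instruction}

### Input:
{input}

### Response:
"""


def training_text(instruction, inp, out):
    """Full training example: prompt + target + EOS."""
    return TEMPLATE.format(instruction=instruction, input=inp) + out + EOS


def inference_prompt(instruction, inp):
    """Prompt only; the model is expected to generate the target."""
    return TEMPLATE.format(instruction=instruction, input=inp)


text = training_text("Fix spelling.", "ofcre agaistn", "force against")
prompt = inference_prompt("Fix spelling.", "ofcre agaistn")

# The part after the prompt is exactly what the model should learn to produce.
completion = text[len(prompt):]
print(text.startswith(prompt), repr(completion))
```

Any mismatch in whitespace or headings between the two templates would show up here as a failed prefix check.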


In [4]:
import pandas as pd
from datasets import Dataset

df_train = pd.read_csv("/kaggle/input/spelling-mistake-data-1mn/train.csv")
df_val   = pd.read_csv("/kaggle/input/spelling-mistake-data-1mn/val.csv")


INSTRUCTION = (
    "Correct only spelling mistakes in the sentence. "
    "Do not change the meaning, do not add/remove words unless required to fix spelling. "
    "Return ONLY the corrected sentence, nothing else."
)


def prepare_df(df):
    df = df.rename(columns={"augmented_text": "input", "text": "output"})
    df["instruction"] = INSTRUCTION
    return df[["instruction", "input", "output"]]

train_df = prepare_df(df_train)
val_df   = prepare_df(df_val)

train_dataset = Dataset.from_pandas(train_df).select(range(15000))
val_dataset   = Dataset.from_pandas(val_df).select(range(2000))


alpaca_prompt = """### Instruction:
{instruction}

### Input:
{input}

### Response:
{output}""" + tokenizer.eos_token


def build_text(examples):
    texts = []
    for instr, inp, out in zip(examples["instruction"], examples["input"], examples["output"]):
        texts.append(alpaca_prompt.format(instruction=instr, input=inp, output=out))
    return {"text": texts}

train_dataset = train_dataset.map(build_text, batched=True)
val_dataset   = val_dataset.map(build_text, batched=True)

print("EXAMPLE TEXT:\n", train_dataset[0]["text"][:500])


Map:   0%|          | 0/15000 [00:00<?, ? examples/s]

Map:   0%|          | 0/2000 [00:00<?, ? examples/s]

EXAMPLE TEXT:
 ### Instruction:
Correct only spelling mistakes in the sentence. Do not change the meaning, do not add/remove words unless required to fix spelling. Return ONLY the corrected sentence, nothing else.

### Input:
barbaric and shocking use of ofcre agaistn

### Response:
barbaric and shocking use of force against</s>


## Loss masking (train only on the answer)

We mask labels before `### Response:` so the loss is computed only on the corrected sentence.
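
The idea can be illustrated on a toy sequence. In the sketch below the token ids are made up and `mask_prompt` is a hypothetical helper, not the function used in this notebook; the only fixed convention is the label value `-100`, which PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch cross-entropy


def mask_prompt(input_ids, prompt_len):
    """Copy input_ids into labels and hide the first prompt_len positions."""
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels


# Made-up ids: 4 prompt tokens followed by 4 answer tokens.
input_ids = [1, 835, 2799, 4080, 9, 8, 7, 2]
labels = mask_prompt(input_ids, prompt_len=4)
print(labels)  # [-100, -100, -100, -100, 9, 8, 7, 2]
```

Only the last four positions contribute to the loss, which is what the `LEARN` / `IGN` column in the table below is meant to show.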


In [5]:
response_marker = "\n\n### Response:\n"   

marker_ids = tokenizer(response_marker, add_special_tokens=False)["input_ids"]

def find_subseq(seq, sub):
    for i in range(len(seq) - len(sub) + 1):
        if seq[i:i+len(sub)] == sub:
            return i
    return -1

def show_example(ds, n=160):
    ex = ds[0]
    input_ids = ex["input_ids"]
    labels    = ex["labels"]
    toks = tokenizer.convert_ids_to_tokens(input_ids)

    print("idx | token                  | id     | label   | learn?")
    print("-"*70)
    for i, (t, tid, lab) in enumerate(zip(toks, input_ids, labels)):
        if i >= n: break
        learn = "LEARN" if lab != -100 else "IGN"
        print(f"{i:3d} | {t[:22]:22s} | {tid:6d} | {lab:6d} | {learn}")


MARKER = "### Response:\n"

def tokenize_and_mask(example):
    text = example["text"]
    idx = text.find(MARKER)

    enc = tokenizer(text, truncation=True, max_length=max_seq_length)
    labels = enc["input_ids"].copy()

    if idx == -1:
        enc["labels"] = [-100] * len(labels)
        return enc

    prefix_text = text[: idx + len(MARKER)]
    prefix_ids = tokenizer(prefix_text, truncation=True, max_length=max_seq_length)["input_ids"]
    start = len(prefix_ids)

    if start >= len(labels):
        enc["labels"] = [-100] * len(labels)
        return enc

    for i in range(start):
        labels[i] = -100

    enc["labels"] = labels
    return enc


train_tok = train_dataset.map(tokenize_and_mask, remove_columns=train_dataset.column_names, num_proc=8)
val_tok   = val_dataset.map(tokenize_and_mask, remove_columns=val_dataset.column_names, num_proc=8)
show_example(train_tok, n=200)


Map (num_proc=8):   0%|          | 0/15000 [00:00<?, ? examples/s]

Map (num_proc=8):   0%|          | 0/2000 [00:00<?, ? examples/s]

idx | token                  | id     | label   | learn?
----------------------------------------------------------------------
  0 | <s>                    |      1 |   -100 | IGN
  1 | ▁###                   |    835 |   -100 | IGN
  2 | ▁Inst                  |   2799 |   -100 | IGN
  3 | ruction                |   4080 |   -100 | IGN
  4 | :                      |  29901 |   -100 | IGN
  5 | <0x0A>                 |     13 |   -100 | IGN
  6 | Cor                    |  12521 |   -100 | IGN
  7 | rect                   |   1621 |   -100 | IGN
  8 | ▁only                  |    871 |   -100 | IGN
  9 | ▁sp                    |    805 |   -100 | IGN
 10 | elling                 |   7807 |   -100 | IGN
 11 | ▁mistakes              |  28947 |   -100 | IGN
 12 | ▁in                    |    297 |   -100 | IGN
 13 | ▁the                   |    278 |   -100 | IGN
 14 | ▁sentence              |  10541 |   -100 | IGN
 15 | .                      |  29889 |   -100 | IGN
[output truncated]

In [6]:
def learns(ex): 
    return any(l != -100 for l in ex["labels"])

print("ex0 learns?", learns(train_tok[0]))
print("ratio learns =", sum(learns(ex) for ex in train_tok) / len(train_tok))


ex0 learns? True
ratio learns = 1.0


## Training

Define padding/collator + training config, then fine-tune with HF Trainer.


In [7]:
from transformers import TrainingArguments, Trainer, DataCollatorForSeq2Seq

# Important: many Llama tokenizers have no pad_token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Avoid a warning / performance issue during training
model.config.use_cache = False

collator = DataCollatorForSeq2Seq(
    tokenizer=tokenizer,
    padding=True,
    label_pad_token_id=-100,
    return_tensors="pt",
)


In [8]:
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,   # nominal effective batch = 2 x 8 = 16; the Unsloth log below reports 4 x 8 = 32
    learning_rate=1e-4,
    num_train_epochs=5,
    fp16=True,                       # T4
    logging_steps=20,
    eval_strategy="steps",
    eval_steps=200,
    save_strategy="steps",
    save_steps=200,
    save_total_limit=2,
    report_to="none",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_tok,
    eval_dataset=val_tok,
    data_collator=collator,
)

trainer.train()


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 15,000 | Num Epochs = 5 | Total steps = 2,345
O^O/ \_/ \    Batch size per device = 4 | Gradient accumulation steps = 8
\        /    Data Parallel GPUs = 1 | Total batch size (4 x 8 x 1) = 32
 "-____-"     Trainable parameters = 25,231,360 of 1,125,279,744 (2.24% trained)


Step,Training Loss,Validation Loss
200,0.7353,0.711645
400,0.6499,0.624885
600,0.5528,0.581439
800,0.5425,0.553835
1000,0.4748,0.537374
1200,0.4764,0.521951
1400,0.4715,0.507718
1600,0.4288,0.502575
1800,0.4407,0.497645
2000,0.3727,0.497331


Unsloth: Not an error, but LlamaForCausalLM does not accept `num_items_in_batch`.
Using gradient accumulation will be very slightly less accurate.
Read more on gradient accumulation issues here: https://unsloth.ai/blog/gradient


TrainOutput(global_step=2345, training_loss=0.5195235861389876, metrics={'train_runtime': 4172.6677, 'train_samples_per_second': 17.974, 'train_steps_per_second': 0.562, 'total_flos': 4.421501847286579e+16, 'train_loss': 0.5195235861389876, 'epoch': 5.0})
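
The step count and throughput in this log can be cross-checked with a little arithmetic. This is a sketch that takes the total batch size of 32 from the Unsloth banner and assumes the last, partial batch of each epoch still counts as one optimizer step.

```python
import math

num_examples = 15_000
epochs = 5
total_batch = 4 * 8 * 1  # per-device batch x accumulation steps x GPUs, as printed in the banner
runtime_s = 4172.6677    # train_runtime from TrainOutput

steps_per_epoch = math.ceil(num_examples / total_batch)
total_steps = steps_per_epoch * epochs
samples_per_s = num_examples * epochs / runtime_s
steps_per_s = total_steps / runtime_s

print(total_steps, round(samples_per_s, 3), round(steps_per_s, 3))
# 2345 17.974 0.562
```

These match `global_step`, `train_samples_per_second` and `train_steps_per_second` above, which suggests the run really used a total batch of 32 rather than the 16 implied by `per_device_train_batch_size=2`.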

## Save LoRA adapter

Save the adapter weights (and tokenizer) for later inference without retraining.


In [9]:
model.save_pretrained("lora_adapter")
tokenizer.save_pretrained("lora_adapter")


('lora_adapter/tokenizer_config.json',
 'lora_adapter/special_tokens_map.json',
 'lora_adapter/chat_template.jinja',
 'lora_adapter/tokenizer.model',
 'lora_adapter/added_tokens.json',
 'lora_adapter/tokenizer.json')
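
To reuse the adapter in a fresh session, Unsloth's example notebooks pass the saved directory as `model_name`. The sketch below follows that pattern; it was not executed as part of this run, the wrapper name `load_adapter` is hypothetical, and it needs a CUDA GPU with `unsloth` installed.

```python
def load_adapter(adapter_dir="lora_adapter", max_seq_length=4096):
    """Reload the base model plus the saved LoRA adapter for inference."""
    # Imported lazily so that defining this function does not require unsloth.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=adapter_dir,          # directory written by save_pretrained above
        max_seq_length=max_seq_length,
        dtype=None,                      # auto-detect, as in the loading cell
        load_in_4bit=True,
    )
    FastLanguageModel.for_inference(model)  # switch to inference mode
    return model, tokenizer
```

Calling `model, tokenizer = load_adapter()` would then replace the training cells when only inference is needed.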

In [10]:
from unsloth import FastLanguageModel
FastLanguageModel.for_inference(model)

def make_prompt(sentence):
    return f"""### Instruction:
{INSTRUCTION}

### Input:
{sentence}

### Response:
"""

test_sentence = "bar bar ic and shocking use of creag aistn"
inputs = tokenizer([make_prompt(test_sentence)], return_tensors="pt").to("cuda")

out = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))


### Instruction:
Correct only spelling mistakes in the sentence. Do not change the meaning, do not add/remove words unless required to fix spelling. Return ONLY the corrected sentence, nothing else.

### Input:
bar bar ic and shocking use of creag aistn

### Response:
bar bar ic and shocking use of crapistan


## Evaluation (Base vs Fine-tuned)

Compute Exact Match (EM) and Character Error Rate (CER), then inspect a few predictions.
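
For reference, EM is 1 when prediction and reference are identical after trimming, and CER is the Levenshtein (edit) distance divided by the reference length. Here is a small self-contained sketch using a full DP table rather than the rolling array used in the cells below, applied to the first fine-tuned prediction shown in the comparison output further down.

```python
def levenshtein(a, b):
    """Edit distance (insert, delete, substitute) with a full DP table."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i  # delete all of a[:i]
    for j in range(len(b) + 1):
        dp[0][j] = j  # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(a)][len(b)]


def exact_match(pred, ref):
    return int(pred.strip() == ref.strip())


def cer(pred, ref):
    pred, ref = pred.strip(), ref.strip()
    if not ref:
        return 0.0 if not pred else 1.0
    return levenshtein(pred, ref) / len(ref)


ref = "project looks to mulesing genetic alternative"
pred = "project looks to mulligan genetic alternative"
print(exact_match(pred, ref), levenshtein(pred, ref), round(cer(pred, ref), 3))
```

A single wrong word is enough to set EM to 0 while CER stays small, which is why the two metrics are reported together.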


In [11]:
df_test = pd.read_csv("/kaggle/input/spelling-mistake-data-1mn/test.csv")
test_df = prepare_df(df_test)

test_dataset = Dataset.from_pandas(test_df).select(range(1000))  # optional: subset for a quick test
test_dataset = test_dataset.map(build_text, batched=True)


Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

In [12]:
def build_prompt(instr, inp):
    return f"""### Instruction:
{instr}

### Input:
{inp}

### Response:
"""


In [13]:
from unsloth import FastLanguageModel
FastLanguageModel.for_inference(model)

@torch.no_grad()
def generate_batch(model, tokenizer, batch_prompts, max_new_tokens=64):
    inputs = tokenizer(batch_prompts, return_tensors="pt", padding=True, truncation=True).to("cuda")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    decoded = tokenizer.batch_decode(out, skip_special_tokens=True)

    # Keep only what comes after "### Response:"
    preds = []
    for full in decoded:
        if "### Response:" in full:
            preds.append(full.split("### Response:")[-1].strip())
        else:
            preds.append(full.strip())
    return preds


In [14]:
def exact_match(pred, ref):
    return int(pred.strip() == ref.strip())

def levenshtein(a, b):
    a, b = a.strip(), b.strip()
    n, m = len(a), len(b)
    dp = list(range(m+1))
    for i in range(1, n+1):
        prev = dp[0]
        dp[0] = i
        for j in range(1, m+1):
            cur = dp[j]
            cost = 0 if a[i-1] == b[j-1] else 1
            dp[j] = min(dp[j] + 1, dp[j-1] + 1, prev + cost)
            prev = cur
    return dp[m]

def cer(pred, ref):
    ref = ref.strip()
    if len(ref) == 0:
        return 0.0 if pred.strip() == "" else 1.0
    return levenshtein(pred, ref) / len(ref)


In [15]:
# build prompts + refs
N = 500  # adjust as needed
instrs = test_dataset["instruction"][:N]
inps   = test_dataset["input"][:N]
refs   = test_dataset["output"][:N]

prompts = [build_prompt(i, x) for i, x in zip(instrs, inps)]

preds = generate_batch(model, tokenizer, prompts, max_new_tokens=64)

em  = sum(exact_match(p, r) for p, r in zip(preds, refs)) / N
cer_mean = sum(cer(p, r) for p, r in zip(preds, refs)) / N

print("Exact match:", em)
print("CER:", cer_mean)


Exact match: 0.256
CER: 0.11135630819087414


In [16]:
import re
import torch
from unsloth import FastLanguageModel

# --- Helpers: extraction + normalization ---
def extract_response(full_text: str) -> str:
    # keep what follows the last "### Response:"
    if "### Response:" in full_text:
        ans = full_text.split("### Response:")[-1]
    else:
        ans = full_text
    # cut off if the model starts printing a new section
    if "\n### " in ans:
        ans = ans.split("\n### ")[0]
    return ans.strip()

_ws = re.compile(r"\s+")
def normalize(s: str) -> str:
    return _ws.sub(" ", s).strip()

@torch.no_grad()
def generate_all(model, tokenizer, prompts, batch_size=4, max_new_tokens=64):
    FastLanguageModel.for_inference(model)
    preds = []
    for i in range(0, len(prompts), batch_size):
        batch = prompts[i:i+batch_size]
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True).to("cuda")
        out = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )
        decoded = tokenizer.batch_decode(out, skip_special_tokens=True)
        preds.extend([extract_response(t) for t in decoded])
    return preds

def exact_match(pred, ref):
    return int(normalize(pred) == normalize(ref))

def levenshtein(a, b):
    a, b = a.strip(), b.strip()
    n, m = len(a), len(b)
    dp = list(range(m+1))
    for i in range(1, n+1):
        prev = dp[0]
        dp[0] = i
        for j in range(1, m+1):
            cur = dp[j]
            cost = 0 if a[i-1] == b[j-1] else 1
            dp[j] = min(dp[j] + 1, dp[j-1] + 1, prev + cost)
            prev = cur
    return dp[m]

def cer(pred, ref):
    pred, ref = normalize(pred), normalize(ref)
    if len(ref) == 0:
        return 0.0 if pred == "" else 1.0
    return levenshtein(pred, ref) / len(ref)

def score(preds, refs):
    N = len(refs)
    em = sum(exact_match(p, r) for p, r in zip(preds, refs)) / N
    cer_mean = sum(cer(p, r) for p, r in zip(preds, refs)) / N
    return em, cer_mean

# --- 1) Evaluate the RAW model (base, without LoRA) ---
raw_model, _ = FastLanguageModel.from_pretrained(
    model_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
    max_seq_length=max_seq_length,
    dtype=None,
    load_in_4bit=True,
)

raw_preds = generate_all(raw_model, tokenizer, prompts, batch_size=4, max_new_tokens=64)
raw_em, raw_cer = score(raw_preds, refs)

# free the VRAM used by the raw model
del raw_model
torch.cuda.empty_cache()

# --- 2) Evaluate the fine-tuned model (the current 'model' after trainer.train()) ---
ft_preds = generate_all(model, tokenizer, prompts, batch_size=4, max_new_tokens=64)
ft_em, ft_cer = score(ft_preds, refs)

print(f"RAW -> Exact match: {raw_em:.3f} | CER: {raw_cer:.3f}")
print(f"FT  -> Exact match: {ft_em:.3f} | CER: {ft_cer:.3f}")
print(f"Δ   -> EM: {ft_em-raw_em:+.3f} | CER: {ft_cer-raw_cer:+.3f} ")

# --- 3) Print a few examples as a sanity check ---
for i in range(5):
    print("\nIN :", inps[i])
    print("REF:", refs[i])
    print("RAW:", raw_preds[i] if i < len(raw_preds) else "")
    print("FT :", ft_preds[i])


==((====))==  Unsloth 2026.1.4: Fast Llama patching. Transformers: 4.56.2.
   \\   /|    Tesla T4. Num GPUs = 2. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct does not have a padding token! Will use pad_token = <unk>.
RAW -> Exact match: 0.002 | CER: 0.334
FT  -> Exact match: 0.260 | CER: 0.109
Δ   -> EM: +0.258 | CER: -0.226 

IN : project looks to muelsnig ngeetic alternative
REF: project looks to mulesing genetic alternative
RAW: project looks to muelsnig ngeetic alternative
FT : project looks to mulligan genetic alternative

IN : chemical agents used during LrotWst at port ahgusra prixoj
REF: chemical agents used during protest at port augusta prison
RAW:

## Conclusion & next steps

- Fine-tuned a small LLM with LoRA (4-bit) for spelling correction
- Improved both metrics over the base model on 500 test sentences: **EM 0.002 → 0.260 (+0.258) | CER 0.334 → 0.109 (-0.226)**

**Potential improvements**
- Larger or cleaner dataset
- Better decoding (beam search, temperature tuning)
- More robust evaluation (multiple prompts, more test samples)
