To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
<a href="https://unsloth.ai/"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
<a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
<a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">GitHub</a> </i> ⭐
</div>

To install Unsloth on your local device, follow [our guide](https://docs.unsloth.ai/get-started/install-and-update). This notebook is licensed under [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), and [how to save it](#Save).


### News


Unsloth's [Docker image](https://hub.docker.com/r/unsloth/unsloth) is here! Start training with no setup & environment issues. [Read our Guide](https://docs.unsloth.ai/new/how-to-train-llms-with-unsloth-and-docker).

[gpt-oss RL](https://docs.unsloth.ai/new/gpt-oss-reinforcement-learning) is now supported with the fastest inference & lowest VRAM. Try our [new notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/gpt-oss-(20B)-GRPO.ipynb) which creates kernels!

Introducing [Vision](https://docs.unsloth.ai/new/vision-reinforcement-learning-vlm-rl) and [Standby](https://docs.unsloth.ai/basics/memory-efficient-rl) for RL! Train Qwen, Gemma etc. VLMs with GSPO - even faster with less VRAM.

Unsloth now supports Text-to-Speech (TTS) models. Read our [guide here](https://docs.unsloth.ai/basics/text-to-speech-tts-fine-tuning).

Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).


### Installation

In [None]:
%%capture
import os, re
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    import torch; v = re.match(r"[0-9\.]{3,}", str(torch.__version__)).group(0)
    xformers = "xformers==" + ("0.0.32.post2" if v == "2.8.0" else "0.0.29.post3")
    !pip install --no-deps bitsandbytes accelerate {xformers} peft trl triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets>=3.4.1,<4.0.0" "huggingface_hub>=0.34.0" hf_transfer
    !pip install --no-deps unsloth
!pip install transformers==4.56.2
!pip install --no-deps trl==0.22.2
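
The Colab branch of the install cell picks an xformers build from the installed PyTorch version. As a small illustration of that selection logic, here is a sketch that can be run without installing anything. The helper name `pick_xformers` is hypothetical and not part of Unsloth; the two version pins are the ones used in the cell above.

```python
import re

def pick_xformers(torch_version: str) -> str:
    """Mirror the install cell: strip the local build suffix, then pick a pin."""
    # "2.8.0+cu126" -> "2.8.0": keep the leading run of digits and dots.
    match = re.match(r"[0-9.]{3,}", torch_version)
    if match is None:
        raise ValueError(f"unrecognized torch version: {torch_version!r}")
    base = match.group(0)
    return "xformers==" + ("0.0.32.post2" if base == "2.8.0" else "0.0.29.post3")

print(pick_xformers("2.8.0+cu126"))
print(pick_xformers("2.6.0+cu124"))
```

Any version other than `2.8.0` falls through to the older pin, so a future PyTorch release would likely need this condition updated.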

### Unsloth

In [None]:
import torch  # the install cell only imports torch on Colab, so import it here too

device = 'cuda' if torch.cuda.is_available() else 'cpu'

#### Text Completion / Raw Text Training



In [None]:
# Disable CCE (cut cross entropy), since it is not supported for CPT (continued pretraining).
# Keep the comment off the %env line: IPython would store it as part of the value.
%env UNSLOTH_RETURN_LOGITS=1

env: UNSLOTH_RETURN_LOGITS=1


In [None]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-v0.3-bnb-4bit",      # New Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/llama-3-8b-bnb-4bit",           # Llama-3 15 trillion tokens model 2x faster!
    "unsloth/llama-3-8b-Instruct-bnb-4bit",
    "unsloth/llama-3-70b-bnb-4bit",
    "unsloth/Phi-3-mini-4k-instruct",        # Phi-3 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit",             # Gemma 2.2x faster!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Phi-3-mini-4k-instruct", # "unsloth/mistral-7b" for 16bit loading
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.11.2: Fast Mistral patching. Transformers: 4.56.2.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

We also add `embed_tokens` and `lm_head` to allow the model to learn out-of-distribution data.

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 128, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",

                      "embed_tokens", "lm_head",], # Add for continual pretraining
    lora_alpha = 32,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = True,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth: Offloading input_embeddings to disk to save VRAM
Unsloth: Offloading output_embeddings to disk to save VRAM


Unsloth 2025.11.2 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


Unsloth: Training embed_tokens in mixed precision to save VRAM
Unsloth: Training lm_head in mixed precision to save VRAM
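
To make the `r`, `lora_alpha`, and `use_rslora` settings above more concrete, here is a minimal sketch of two standard LoRA quantities: the number of parameters a rank-`r` adapter adds to one linear layer, and the factor by which the low-rank update is scaled. The helper names are hypothetical, and the hidden size of 3072 is an assumption used only for illustration. The sketch does not reproduce Unsloth's internals, and it leaves out `embed_tokens` and `lm_head`, which the log above suggests are trained separately from the low-rank adapters.

```python
import math

def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """A LoRA adapter adds two matrices: A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)

def lora_scale(alpha: float, r: int, use_rslora: bool) -> float:
    """Standard LoRA scales the update by alpha / r; rank-stabilized LoRA by alpha / sqrt(r)."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r

r, alpha = 128, 32   # values from the cell above
hidden = 3072        # assumed hidden size, for illustration only

print(lora_param_count(hidden, hidden, r))               # adapter parameters for one square projection
print(lora_scale(alpha, r, use_rslora=False))            # standard LoRA scale
print(round(lora_scale(alpha, r, use_rslora=True), 3))   # rank-stabilized scale
```

With a large rank such as 128, the rank-stabilized scale is much larger than `alpha / r`, which is one reason to revisit `lora_alpha` when toggling `use_rslora`.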


<a name="Data"></a>
### Data Prep

In [None]:
import re

# Downloading dataset
!wget -nc https://raw.githubusercontent.com/valkova-k/cactus-repo/refs/heads/main/assignment05/combined_books.txt

pattern = r'(\.\s*\.\s*\.\s*\d+)|([–-]\s*\d+\s*[–-])'

with open("combined_books.txt", "r", encoding="utf-8") as f:
    raw_text = "".join(line for line in f if not re.search(pattern, line))

File ‘combined_books.txt’ already there; not retrieving.
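
To see what the line filter in the cell above removes, here is a small sketch that applies a pattern of the same shape to a few made-up lines. It assumes the dash character class is meant to contain an en dash and a hyphen; the sample lines are hypothetical and not taken from `combined_books.txt`.

```python
import re

# Dotted leaders followed by a number (table-of-contents lines),
# or a number wrapped in dashes (page markers). Dash class: en dash or hyphen.
pattern = r'(\.\s*\.\s*\.\s*\d+)|([–-]\s*\d+\s*[–-])'

# Hypothetical sample lines for illustration.
sample_lines = [
    "Chapter one . . . 12\n",
    "– 45 –\n",
    "An ordinary sentence from the book.\n",
    "He was born in 1891.\n",
    "Wait... 3 men came in.\n",
]

kept = [line for line in sample_lines if not re.search(pattern, line)]
print(kept)
```

Note that the last sample line is dropped as well: an ellipsis followed by a digit matches the first alternative, so the filter can remove some ordinary prose along with the page furniture.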



In [None]:
print("length of dataset in characters: ", len(raw_text))
print(raw_text[:5000])

length of dataset in characters:  2304298
Produced by Miloslav Izar RUSKÁ KNIHOVNA IX. SPISY FEDORA MICHAJLOVIČE DOSTOJEVSKÉHO. Překlad rediguje JAROMÍR HRUBÝ Svazek I. ZÁPISKY Z MRTVÉHO DOMU. Přeložil H. JAROŠ. V PRAZE 1891. Tiskem a nákladem J. Otty. ČÁST PRVÁ. ÚVOD. V dalekých krajích Sibiře, uprostřed stepí, hor a neproniknutelných lesů vyskytují se zřídka malinká města s jedním nebo nanejvýš se dvěma tisíci obyvatelů, dřevěná to, neúhledná města se dvěma chrámy, jedním ve městě, druhým na hřbitově, a podobná více k slušné vesnici pod Moskvou než k městu. Bývají obyčejně hojně opatřena policejními hejtmany, komisary a ostatními podřízenými policejními dozorci. V Sibiři vůbec přes to, že je tam zima. jsou úřady neobyčejně teploučké. Lid tamní je prostý, nenačichlý liberálností; pořádky staré, pevné, staletími posvěcené. Úředníky, kteří právem hrají úlohu sibiřské šlechty, jsou buď tu

In [None]:
raw_text = re.sub(r"\s+", " ", raw_text).strip()

print("Length of the whole text in characters:", len(raw_text))

from datasets import Dataset

# --- NEW: split the text into fixed-length chunks ---
chunk_size = 2000  # feel free to change this to 1500 / 2500, etc.

chunks = [
    raw_text[i:i + chunk_size]
    for i in range(0, len(raw_text), chunk_size)
    if len(raw_text[i:i + chunk_size].strip()) > 0
]

print("Number of chunks:", len(chunks))
print("Sample of the first chunk:\n", chunks[0][:500])

# --- create the dataset for Unsloth ---
EOS_TOKEN = tokenizer.eos_token
texts_for_training = [c + EOS_TOKEN for c in chunks]

dataset = Dataset.from_dict({"text": texts_for_training})
print(dataset)
print("First record in the dataset:\n", dataset[0]["text"][:300])

Length of the whole text in characters: 2302653
Number of chunks: 1152
Sample of the first chunk:
 Produced by Miloslav Izar RUSKÁ KNIHOVNA IX. SPISY FEDORA MICHAJLOVIČE DOSTOJEVSKÉHO. Překlad rediguje JAROMÍR HRUBÝ Svazek I. ZÁPISKY Z MRTVÉHO DOMU. Přeložil H. JAROŠ. V PRAZE 1891. Tiskem a nákladem J. Otty. ČÁST PRVÁ. ÚVOD. V dalekých krajích Sibiře, uprostřed stepí, hor a neproniknutelných lesů vyskytují se zřídka malinká města s jedním nebo nanejvýš se dvěma tisíci obyvatelů, dřevěná to, neúhledná města se dvěma chrámy, jedním ve městě, druhým na hřbitově, a podobná více k slušné vesnici p
Dataset({
    features: ['text'],
    num_rows: 1152
})
First record in the dataset:
 Produced by Miloslav Izar RUSKÁ KNIHOVNA IX. SPISY FEDORA MICHAJLOVIČE DOSTOJEVSKÉHO. Překlad rediguje JAROMÍR HRUBÝ Svazek I. ZÁPISKY Z MRTVÉHO DOMU. Přeložil H. JAROŠ. V PRAZE 1891. Tiskem a nákladem J. Otty. ČÁST PRVÁ. ÚVOD. V dalekých krajích Sibiře, 
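
The chunk count reported above follows directly from the text length and the chunk size. Here is a minimal sketch of the same fixed-length splitting on a toy string, followed by the arithmetic for the notebook's numbers. The helper `chunk_text` and the toy input are hypothetical; they are only meant to make the slicing easy to check by eye.

```python
import math

def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive fixed-length character chunks, skipping blank ones."""
    return [
        text[i:i + chunk_size]
        for i in range(0, len(text), chunk_size)
        if text[i:i + chunk_size].strip()
    ]

# Toy input, small enough to inspect: 53 characters.
toy = "abcdefghij" * 5 + "xyz"
toy_chunks = chunk_text(toy, chunk_size=10)
print(len(toy_chunks), [len(c) for c in toy_chunks])

# Same arithmetic for the notebook: 2,302,653 characters at 2,000 per chunk.
print(math.ceil(2302653 / 2000))
```

Because the split is by character count, a chunk boundary can land in the middle of a word or sentence. That is usually acceptable for continued pretraining on raw text, but splitting on whitespace or sentence boundaries is a possible refinement.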

<a name="Train"></a>
### Continued Pretraining
Now let's use Unsloth's `UnslothTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). This run trains for 3 epochs (`num_train_epochs = 3`). For a quick test, set `max_steps` to a small value such as 20 instead; for a single full pass over the data, set `num_train_epochs = 1`.

Also set `embedding_learning_rate` to a value 2x to 10x smaller than `learning_rate` to make continual pretraining work! Here it is 10x smaller (`1e-6` versus `1e-5`).

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import UnslothTrainer, UnslothTrainingArguments

trainer = UnslothTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 1,   # 1 is enough and avoids multiprocessing conflicts

    args = UnslothTrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 8,

        warmup_ratio = 0.02,
        num_train_epochs = 3,      # instead of 100!

        learning_rate = 1e-5,      # a gentler learning rate
        embedding_learning_rate = 1e-6,

        logging_steps = 10,
        optim = "adamw_8bit",
        weight_decay = 0.00,
        lr_scheduler_type = "cosine",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",
    ),
)

Unsloth: Tokenizing ["text"] (num_proc=6):   0%|          | 0/1152 [00:00<?, ? examples/s]

In [None]:
# @title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.741 GB.
8.201 GB of memory reserved.


In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 1,152 | Num Epochs = 3 | Total steps = 216
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 8
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 8 x 1) = 16
 "-____-"     Trainable parameters = 436,076,544 of 4,257,156,096 (10.24% trained)


Step,Training Loss
10,2.8192
20,2.8227
30,2.8304
40,2.8444
50,2.8314
60,2.8226
70,2.8023
80,2.7476
90,2.7536
100,2.7434
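
The step count in the training banner can be reproduced from the trainer settings. The following is a sketch of that arithmetic, not the trainer's actual code. In particular, the warmup calculation assumes that warmup steps are the warmup ratio times the total steps, rounded up.

```python
import math

num_examples = 1152      # rows in the dataset
per_device_batch = 2     # per_device_train_batch_size
grad_accum = 8           # gradient_accumulation_steps
num_gpus = 1
epochs = 3               # num_train_epochs
warmup_ratio = 0.02

# One optimizer step consumes this many examples.
effective_batch = per_device_batch * grad_accum * num_gpus
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs

# Assumption: warmup steps = ceil(total_steps * warmup_ratio).
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
```

The effective batch size of 16 and the total of 216 steps agree with the banner printed by `trainer.train()`.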


In [None]:
# @title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

5243.3402 seconds used for training.
87.39 minutes used for training.
Peak reserved memory = 8.201 GB.
Peak reserved memory for training = 0.0 GB.
Peak reserved memory % of max memory = 55.634 %.
Peak reserved memory for training % of max memory = 0.0 %.
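
The derived figures above come from simple arithmetic on three readings. Here is a minimal sketch that recomputes them from the printed values; the helper `memory_report` is hypothetical. Since `torch.cuda.max_memory_reserved()` reports a peak (high-water mark), the training delta comes out as 0.0 GB whenever the start reading already equals the peak, as it does in this run.

```python
def memory_report(peak_gb: float, start_gb: float, total_gb: float) -> dict:
    """Recompute the derived memory statistics printed by the cell above."""
    training_gb = round(peak_gb - start_gb, 3)
    return {
        "training_gb": training_gb,
        "peak_pct": round(peak_gb / total_gb * 100, 3),
        "training_pct": round(training_gb / total_gb * 100, 3),
    }

# Values taken from the outputs above.
report = memory_report(peak_gb=8.201, start_gb=8.201, total_gb=14.741)
print(report)
```

To get a meaningful training delta, the start reading should be taken before `trainer.train()` is called and not refreshed afterwards.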


<a name="Inference"></a>
### Inference
Let's run the model!

In [None]:
from transformers import TextIteratorStreamer
from threading import Thread
text_streamer = TextIteratorStreamer(tokenizer)
import textwrap
max_print_width = 100

# Before running inference, call `FastLanguageModel.for_inference` first

FastLanguageModel.for_inference(model)

# Czech prompt: "Man is a strange creature, thought Raskolnikov."
inputs = tokenizer(
[
    "Člověk je zvláštní tvor, pomyslel si Raskolnikov. "
]*1, return_tensors = "pt").to("cuda")

generation_kwargs = dict(
    inputs,
    streamer = text_streamer,
    max_new_tokens = 256,
    use_cache = True,
    do_sample = True,
    temperature = 0.8,
    top_p = 0.9,
    repetition_penalty = 1.1,
)
thread = Thread(target = model.generate, kwargs = generation_kwargs)
thread.start()

length = 0
for j, new_text in enumerate(text_streamer):
    wrapped_text = textwrap.wrap(new_text, width = max_print_width)
    if wrapped_text: # Skip empty or whitespace-only pieces
        if j == 0:
            # First piece (normally the prompt): print it wrapped and track the last line's length
            length = len(wrapped_text[-1])
            wrapped_text = "\n".join(wrapped_text)
            print(wrapped_text, end = "")
        else:
            length += len(new_text)
            if length >= max_print_width:
                length = 0
                print()
            print(new_text, end = "")
thread.join() # Wait for the generation thread to finish

Člověk je zvláštní tvor, pomyslel si Raskolnikov.2. Vzpomínám na svou srdci se mne všecko 
poznávalo zkrátka dokonce už dlouho… ať jen tak; ani jsem nemohl ho otevřít od té doby (a opět to všechno), 
protože bylo nepochybné, že za příčinou jeho smrti jsou přesto všechny jeho postavy: znamená, že byla 
velice nesmyslná společností, která při tom šel ji zas ode města! Ale já také jsem rozptýlen, když vyšla 
na věc vejitavého křehala. Neobyčejně jsem byl spokojen. A oni byli teď rád, abychom hned odcházeli? 
On i ona málo může mi říct, jak více se mu podívaly. Rozumí se však představením, že chystáme číst 
povídky. Když začal, pustil se slyšeti z řady o prsích úplných lid
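
The line-wrapping logic in the print loop can be tried without a GPU by replaying it on a plain list of strings. Here is a minimal sketch under that assumption. The function `render_stream` and the fake pieces are hypothetical stand-ins for the streamer, and the function returns the text instead of printing it so the result is easy to inspect.

```python
import textwrap

def render_stream(pieces, width=100):
    """Replay the print loop on any iterable of text pieces; return what it would print."""
    out = []
    length = 0
    for j, new_text in enumerate(pieces):
        wrapped = textwrap.wrap(new_text, width=width)
        if not wrapped:           # skip empty or whitespace-only pieces
            continue
        if j == 0:
            # First piece: wrap it and remember the length of its last line.
            length = len(wrapped[-1])
            out.append("\n".join(wrapped))
        else:
            length += len(new_text)
            if length >= width:   # start a new line once the running width is reached
                length = 0
                out.append("\n")
            out.append(new_text)
    return "".join(out)

# Hypothetical pieces standing in for the streamer's output.
fake_pieces = ["The prompt. "] + ["word "] * 10
rendered = render_stream(fake_pieces, width=30)
print(rendered)
```

One visible effect of this logic is that `textwrap.wrap` strips trailing whitespace from the first piece, so the generated text follows the prompt without a separating space.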