<a href="https://colab.research.google.com/github/tahsinsoyak/fertilizer-api-tr/blob/main/Agriculture_Fine_Tune_V3_Turkish_Gemma3_4B_with_Lora_%26_Unsloth.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Finetuning Gemma-3 4B for Agriculture Q&A with Unsloth

This notebook finetunes the `unsloth/gemma-3-4b-it` model on `tahsinsoyak/agriculture-qa-turkish-translated`, a Turkish-translated agriculture Q&A dataset.

To run this, open the *Runtime* menu and choose *Run all* on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
<a href="https://unsloth.ai/"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
<a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
<a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">GitHub</a> </i> ⭐
</div>

### Installation
First, we install the necessary libraries. Unsloth will handle the heavy lifting to make finetuning fast and memory-efficient.

In [7]:
%%capture
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets>=3.4.1,<4.0.0" huggingface_hub hf_transfer
    !pip install --no-deps unsloth

### Load Model & Tokenizer

We'll load the `gemma-3-4b-it` model. We're using 4-bit quantization (`load_in_4bit = True`) to drastically reduce memory usage, allowing us to finetune this powerful model on a free Colab GPU.

In [8]:
from unsloth import FastModel
import torch

model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/gemma-3-4b-it",
    max_seq_length = 2048,
    load_in_4bit = True,
)

==((====))==  Unsloth 2025.7.3: Fast Gemma3 patching. Transformers: 4.53.1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Using float16 precision for gemma3 won't work! Using float32.
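
As a rough illustration of why 4-bit loading matters on a T4, the sketch below estimates the storage needed for the weights alone. The round figure of 4.3 billion parameters and the helper `weight_gb` are assumptions made for illustration; real usage also includes activations, quantization metadata, and any layers kept in higher precision.

```python
def weight_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate storage for the weights only, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 4.3e9  # assumed round figure for a 4B-class model

for bits in (32, 16, 4):
    print(f"{bits:>2}-bit weights: ~{weight_gb(N_PARAMS, bits):.2f} GB")
```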


### Add LoRA Adapters
We now add LoRA adapters to the model. This is a parameter-efficient finetuning (PEFT) technique where we only train a small number of new weights, rather than the entire model. This is what makes finetuning feasible on consumer hardware.

In [9]:
model = FastModel.get_peft_model(
    model,
    finetune_vision_layers     = False,
    finetune_language_layers   = True,
    finetune_attention_modules = True,
    finetune_mlp_modules       = True,

    r = 16,           # ✅ CHANGED: raised from 8 to 16 to increase the model's learning capacity.
    lora_alpha = 32,  # ✅ CHANGED: as a rule of thumb, set to twice the value of 'r'.
    lora_dropout = 0,
    bias = "none",
    random_state = 3407,
)

Unsloth: Making `model.base_model.model.model.language_model` require gradients
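
To make the "small number of new weights" concrete, here is a minimal sketch of how many parameters one LoRA pair adds to a single linear layer. The 2560 × 2560 layer shape and the helper `lora_params` are hypothetical and chosen only for illustration; `r` and `lora_alpha` are the values from the cell above, and the `lora_alpha / r` scaling is the one used by standard LoRA.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

d_in = d_out = 2560      # hypothetical layer shape, for illustration only
r, lora_alpha = 16, 32   # values from the cell above

full = d_in * d_out
added = lora_params(d_in, d_out, r)
print(f"full matrix: {full:,} weights")
print(f"LoRA pair:   {added:,} weights ({added / full:.2%} of the full matrix)")
print(f"scaling applied to the update (lora_alpha / r): {lora_alpha / r}")
```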


<a name="Data"></a>
### Data Prep
We now load our agriculture dataset and format it for the Gemma-3 chat template.

The Gemma-3 format for a single turn conversation looks like this:
```
<bos><start_of_turn>user
What is crop rotation?<end_of_turn>
<start_of_turn>model
Crop rotation is the practice of growing a series of different crops...<end_of_turn>
```
Our dataset has `question_tr` and `answers_tr` columns. We will write a function to map these to the `user` and `assistant` roles; the chat template renders the assistant role as `model`.
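
For intuition, the single-turn layout above can be reproduced with plain string formatting. This is only a hand-rolled sketch: `render_single_turn` is a hypothetical helper, and the notebook itself relies on `tokenizer.apply_chat_template` for the real rendering.

```python
def render_single_turn(question: str, answer: str) -> str:
    """Build one user/model exchange in the Gemma-3 layout shown above."""
    return (
        "<bos><start_of_turn>user\n"
        f"{question}<end_of_turn>\n"
        "<start_of_turn>model\n"
        f"{answer}<end_of_turn>\n"
    )

print(render_single_turn("What is crop rotation?", "Crop rotation is ..."))
```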

In [10]:
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "gemma-3",
)

In [12]:
from datasets import load_dataset

dataset = load_dataset("tahsinsoyak/agriculture-qa-turkish-translated", split = "train")

README.md: 0.00B [00:00, ?B/s]

train.parquet:   0%|          | 0.00/2.66M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/22616 [00:00<?, ? examples/s]

Now, we define our custom formatting function. This will take each row, with its `question` and `answers` fields, and create a `conversations` list that follows the required structure.

In [13]:
def format_kisanvaani_dataset(example):
    return {
        "conversations": [
            {"role": "user", "content": example["question_tr"]}, # <- changed: uses the Turkish column
            {"role": "assistant", "content": example["answers_tr"]},   # <- changed: uses the Turkish column
        ]
    }

dataset = dataset.map(format_kisanvaani_dataset, remove_columns=list(dataset.features))

Map:   0%|          | 0/22616 [00:00<?, ? examples/s]

Let's see an example of our newly formatted data.

In [14]:
print(dataset[0])

{'conversations': [{'content': 'Tarımda ekim nöbeti (münavebe) neden önemlidir?', 'role': 'user'}, {'content': 'Bu, toprak erozyonunu ve tükenmesini önlemeye yardımcı olur ve ayrıca zararlıları ve hastalıkları kontrol etmeye de yardımcı olabilir.', 'role': 'assistant'}]}


Next, we apply the chat template to the `conversations` column to create a single string (`text`) for each row. We remove the `<bos>` token since the trainer's processor will add it automatically.

In [15]:
def formatting_prompts_func(examples):
   convos = examples["conversations"]
   texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False).removeprefix('<bos>') for convo in convos]
   return { "text" : texts, }

dataset = dataset.map(formatting_prompts_func, batched = True)

Map:   0%|          | 0/22616 [00:00<?, ? examples/s]

Let's see how the final text looks after applying the template!

In [16]:
print(dataset[0]["text"])

<start_of_turn>user
Tarımda ekim nöbeti (münavebe) neden önemlidir?<end_of_turn>
<start_of_turn>model
Bu, toprak erozyonunu ve tükenmesini önlemeye yardımcı olur ve ayrıca zararlıları ve hastalıkları kontrol etmeye de yardımcı olabilir.<end_of_turn>



In [17]:
# ✅ NEW: split the dataset into training and validation sets.
# This helps us tell whether the model is merely memorizing the training data.
split_dataset = dataset.train_test_split(test_size=0.05, seed=42)
train_data = split_dataset["train"]
eval_data = split_dataset["test"]
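
The split sizes can be checked with a little arithmetic. The sketch below assumes the test side is rounded up and the training side receives the remainder; under that assumption it reproduces the 21,485 and 1,131 rows that the tokenization progress bars report later in the notebook.

```python
import math

n_rows = 22616      # rows in the loaded train split
test_size = 0.05    # fraction passed to train_test_split

n_eval = math.ceil(n_rows * test_size)  # assumed rounding: test side rounds up
n_train = n_rows - n_eval               # remainder goes to training
print(n_train, n_eval)
```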

<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `SFTTrainer`. We train for one full epoch (`num_train_epochs = 1`, `max_steps = -1`). For a quick demo, set `max_steps` to a small value such as `200` instead.

With an effective batch size of 16 (2 per device × 8 gradient accumulation steps), one epoch over the 21,485 training rows takes 1,343 optimizer steps.

In [18]:
from trl import SFTTrainer, SFTConfig

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_data,   # Updated
    eval_dataset = eval_data,     # Updated
    args = SFTConfig(
        dataset_text_field = "text",
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 8,       # ✅ Updated -> effective batch size = 16 (more stable training)
        warmup_steps = 100,                    # ✅ Updated -> 5-10% of the total number of steps
        num_train_epochs = 1,                  # ✅ Updated -> train for 1 full epoch
        max_steps = -1,                        # ✅ Updated -> run according to the number of epochs
        learning_rate = 2e-4,
        logging_steps = 10,                    # ✅ Updated -> log less frequently
        eval_steps = 100,                      # ✅ Added -> evaluate every 100 steps (only takes effect if eval_strategy = "steps" is also set)
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "cosine",          # ✅ Updated -> usually gives better results
        seed = 3407,
        report_to = "none",
    ),
)


Unsloth: Switching to float32 training since model cannot work with float16


Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/21485 [00:00<?, ? examples/s]

Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/1131 [00:00<?, ? examples/s]
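
To visualize what `warmup_steps = 100` combined with `lr_scheduler_type = "cosine"` does to the learning rate, here is a minimal sketch of a linear warmup followed by a half-cosine decay. The function `lr_at` is a hypothetical helper, not the trainer's own code, and `total_steps = 1343` is the step count the trainer reports for this run.

```python
import math

def lr_at(step, base_lr=2e-4, warmup_steps=100, total_steps=1343):
    """Linear warmup to base_lr, then a half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, 50, 100, 700, 1343):
    print(step, f"{lr_at(step):.2e}")
```

The rate climbs linearly for the first 100 steps, peaks at `2e-4`, and then decays smoothly toward zero at the final step.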

We also use Unsloth's `train_on_responses_only` function to compute the loss only on the assistant's responses and ignore the user's questions. This focuses training on generating answers rather than on reproducing prompts.

In [19]:
from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<start_of_turn>user\n",
    response_part = "<start_of_turn>model\n",
)

Map (num_proc=2):   0%|          | 0/21485 [00:00<?, ? examples/s]

Map (num_proc=2):   0%|          | 0/1131 [00:00<?, ? examples/s]
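
Conceptually, training on responses only means that every label before the response marker is replaced with an ignore value so that it contributes nothing to the loss. The sketch below illustrates the idea for a single-turn example with toy token ids; `find_subsequence` and `mask_prompt` are hypothetical helpers and not Unsloth's implementation.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default

def find_subsequence(seq, pattern):
    """Return the start index of the first occurrence of pattern in seq, or -1."""
    for i in range(len(seq) - len(pattern) + 1):
        if seq[i:i + len(pattern)] == pattern:
            return i
    return -1

def mask_prompt(token_ids, response_marker):
    """Keep labels only for tokens that follow the response marker."""
    start = find_subsequence(token_ids, response_marker)
    if start == -1:
        return [IGNORE_INDEX] * len(token_ids)  # no response found: ignore everything
    first = start + len(response_marker)
    return [IGNORE_INDEX] * first + token_ids[first:]

# Toy ids standing in for "<start_of_turn>user\n" and "<start_of_turn>model\n"
USER, MODEL = [10, 11], [10, 12]
tokens = USER + [101, 102, 20] + MODEL + [201, 202, 20]
labels = mask_prompt(tokens, MODEL)
print(labels)
```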

Let's train the model!

In [20]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 21,485 | Num Epochs = 1 | Total steps = 1,343
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 8
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 8 x 1) = 16
 "-____-"     Trainable parameters = 29,802,496 of 4,329,881,968 (0.69% trained)
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


Step,Training Loss
10,3.1221
20,2.3587
30,1.8599
40,1.7071
50,1.6531
60,1.6276
70,1.6207
80,1.5509
90,1.5486
100,1.532


Unsloth: Will smartly offload gradients to save VRAM!
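
The numbers in the training banner above follow from the configuration. A small sketch of the arithmetic, assuming the last partial batch is rounded up to a full optimizer step:

```python
import math

num_examples = 21485      # training rows after the split
per_device_batch = 2
grad_accum = 8
num_gpus = 1

effective_batch = per_device_batch * grad_accum * num_gpus
steps_per_epoch = math.ceil(num_examples / effective_batch)  # assumed rounding
warmup_fraction = 100 / steps_per_epoch                      # warmup_steps = 100

trainable, total = 29_802_496, 4_329_881_968  # counts from the banner
print(effective_batch, steps_per_epoch)
print(f"warmup: {warmup_fraction:.1%} of steps")
print(f"trainable: {trainable / total:.2%}")
```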


In [21]:
# @title Show final memory and time stats
gpu_stats = torch.cuda.get_device_properties(0)
# Note: this baseline is read after training, so it equals the peak measured below
# and the "additional memory" figure comes out as 0.0. Read it before `trainer.train()` to get a real difference.
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)

used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"Time used for training: {trainer_stats.metrics['train_runtime']} seconds.")
print(f"Time used for training: {round(trainer_stats.metrics['train_runtime']/60, 2)} minutes.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Additional memory used for training = {used_memory_for_lora} GB.")

Time used for training: 9715.0369 seconds.
Time used for training: 161.92 minutes.
Peak reserved memory = 10.002 GB.
Additional memory used for training = 0.0 GB.
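
The reported 0.0 GB of additional memory arises because the baseline is read after training has finished. A minimal sketch of the intended bookkeeping follows, assuming the baseline is read from `torch.cuda.max_memory_reserved()` before `trainer.train()` and the peak from the same call afterwards. The helpers `to_gb` and `memory_report` are hypothetical, and the byte counts in the usage line are made-up values for illustration.

```python
def to_gb(n_bytes: int) -> float:
    """Convert bytes to gigabytes (1024-based), rounded as in the cell above."""
    return round(n_bytes / 1024 / 1024 / 1024, 3)

def memory_report(baseline_bytes: int, peak_bytes: int, total_bytes: int) -> dict:
    """Summarize peak usage and the part attributable to training."""
    peak_gb = to_gb(peak_bytes)
    training_gb = round(peak_gb - to_gb(baseline_bytes), 3)
    total_gb = to_gb(total_bytes)
    return {
        "peak_gb": peak_gb,
        "training_gb": training_gb,
        "peak_pct": round(peak_gb / total_gb * 100, 3),
    }

GIB = 1024 ** 3
# Made-up values: 4 GiB reserved before training, 10 GiB at peak, 16 GiB card
print(memory_report(4 * GIB, 10 * GIB, 16 * GIB))
```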


<a name="Inference"></a>
### Inference
Let's test our newly finetuned model! We'll ask it a question from the dataset to see how it responds.

In [22]:
from unsloth.chat_templates import get_chat_template
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "gemma-3",
)

messages = [{
    "role": "user",
    "content": [{
        "type" : "text",
        "text" : "Tarımda ekim nöbeti (münavebe) neden önemlidir?", # <- Turkish question
    }]
}]

text = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True, # Must add for generation
)
print("Input prompt:\n", text)

outputs = model.generate(
    **tokenizer([text], return_tensors = "pt").to("cuda"),
    max_new_tokens = 128, # Increase for longer outputs!
    # Recommended Gemma-3 settings!
    temperature = 1.0, top_p = 0.95, top_k = 64,
    use_cache = True,
)
decoded_output = tokenizer.batch_decode(outputs)
print("\nModel response:\n", decoded_output[0])

Input prompt:
 <bos><start_of_turn>user
Tarımda ekim nöbeti (münavebe) neden önemlidir?<end_of_turn>
<start_of_turn>model


Model response:
 <bos><start_of_turn>user
Tarımda ekim nöbeti (münavebe) neden önemlidir?<end_of_turn>
<start_of_turn>model
tarım nöbeti (münavebe), toprak erozyonunu, tükenmesini ve asitlenmesini önlemeye yardımcı olur. Ayrıca, zararlıları ve hastalıkları yönetmeye de yardımcı olur ve bu da genel toprak sağlığını geliştirir.<end_of_turn>
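
The `temperature`, `top_k`, and `top_p` settings shape which tokens can be sampled at each step. The sketch below shows one common way to combine them (temperature scaling, then top-k, then nucleus filtering) on a toy distribution. `filter_top_k_top_p` is a hypothetical helper, not the Transformers implementation, and details such as tie-breaking may differ.

```python
import math

def filter_top_k_top_p(logits, temperature=1.0, top_k=64, top_p=0.95):
    """Return {token_index: probability} for the tokens that survive filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # numerically stable softmax

    # Top-k: keep the k most likely tokens and renormalize over them
    ranked = sorted(range(len(exps)), key=lambda i: exps[i], reverse=True)[:top_k]
    mass_k = sum(exps[i] for i in ranked)
    probs = {i: exps[i] / mass_k for i in ranked}

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    mass_p = sum(probs[i] for i in kept)
    return {i: probs[i] / mass_p for i in kept}

toy_logits = [2.0, 1.0, 0.0, -1.0]
survivors = filter_top_k_top_p(toy_logits, temperature=1.0, top_k=3, top_p=0.9)
print(survivors)
```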


You can also use a `TextStreamer` to watch the generation token by token.

In [23]:
from transformers import TextStreamer

messages = [{
    "role": "user",
    "content": [
        {"type" : "text", "text" : "Toprak bozulmasının nedenleri nelerdir?"}, # <- Turkish question
    ]
}]

text = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True, # Must add for generation
)

print("Running inference...")
_ = model.generate(
    **tokenizer([text], return_tensors = "pt").to("cuda"),
    max_new_tokens = 128,
    streamer = TextStreamer(tokenizer, skip_prompt = True),
    temperature = 1.0, top_p = 0.95, top_k = 64,
    use_cache = True,
)


Running inference...
Erozyon, sıkışma ve besin maddesi kaybı.<end_of_turn>


<a name="Save"></a>
### Saving, loading finetuned models
To save the final model, we'll save the LoRA adapters. This is a very small file that contains only the weights we trained.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To load it later, you need to load the original base model (`unsloth/gemma-3-4b-it`) and then apply these adapters to it.

In [24]:
model.save_pretrained("gemma-3-4b-agriculture-turkish-lora") # <- folder name changed
tokenizer.save_pretrained("gemma-3-4b-agriculture-turkish-lora") # <- folder name changed

['gemma-3-4b-agriculture-turkish-lora/processor_config.json']
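
Because only the adapters are saved, the folder has to record which base model they belong to. PEFT writes this into `adapter_config.json` under the `base_model_name_or_path` key, and the sketch below reads it back. `read_base_model` is a hypothetical helper; it returns `None` if the folder or file is missing.

```python
import json
from pathlib import Path

def read_base_model(adapter_dir: str):
    """Return the base model recorded in a PEFT adapter folder, or None if absent."""
    config_path = Path(adapter_dir) / "adapter_config.json"
    if not config_path.exists():
        return None
    with config_path.open(encoding="utf-8") as f:
        return json.load(f).get("base_model_name_or_path")

print(read_base_model("gemma-3-4b-agriculture-turkish-lora"))
```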

### Saving to float16 for vLLM

You can also merge the LoRA adapters back into the base model and save it as a regular `float16` model. This creates a larger, standalone model that can be easily deployed with services like vLLM.

**This will use more disk space.**

In [25]:
if False: # Change to True to save a full 16-bit model
    model.save_pretrained_merged("gemma-3-4b-agriculture-turkish-full", tokenizer, save_method = "merged_16bit") # <- folder name changed
    # To push to Hub:
    # model.push_to_hub_merged("YOUR_HF_USERNAME/gemma-3-4b-agriculturev2-turkish-full", tokenizer, save_method = "merged_16bit", token = True) # <- repo name changed too

### GGUF / llama.cpp Conversion
Finally, you can save the model in GGUF format, which is widely used by tools like `llama.cpp`, Jan, and Open WebUI.

In [26]:
if False: # Change to True to save to GGUF
    model.save_pretrained_gguf("gemma-3-4b-agriculturev2-turkish-gguf", tokenizer, quantization_method = "q8_0") # <- folder name changed
    # To push to Hub:
    # model.push_to_hub_gguf("YOUR_HF_USERNAME/gemma-3-4b-agriculturev2-turkish-gguf", tokenizer, quantization_method = "q8_0", token = True) # <- repo name changed too

And we're done! You have successfully finetuned a Gemma-3 model on your custom agriculture dataset.

If you have any questions about Unsloth, join our [Discord](https://discord.gg/unsloth) server! It is also the place to report bugs, keep up with the latest LLM news, get help, or join projects.

In [28]:
# @title Upload Your Finetuned Model to Hugging Face

# To upload, you need to be logged in. You'll need a Hugging Face account
# and a User Access Token with "write" permissions.
# You can get a token here: https://huggingface.co/settings/tokens

import_success = False
try:
    # Import inside the try block so a missing package is actually caught here
    from huggingface_hub import notebook_login
    notebook_login()
    import_success = True
except ImportError:
    print("Could not import notebook_login. Please run `pip install huggingface_hub`")

if import_success:
    # ------------------------------------------------------------------
    # CHANGE THIS TO YOUR HUGGING FACE USERNAME AND A NEW REPO NAME
    # ------------------------------------------------------------------
    # The name of the repository on the Hugging Face Hub
    # FORMAT: "Your_HF_Username/the-model-name"
    hf_repo_name = "tahsinsoyak/gemma-3-4b-agriculture-turkish-lora" # <- repo name changed
    # ------------------------------------------------------------------

    # --- Option 1: Push LoRA Adapters (Recommended & Default) ---
    # This is the most efficient way to save and share your model.
    # It only saves the new weights you trained (about 119 MB for this run).
    print(f"Uploading LoRA adapters to: https://huggingface.co/{hf_repo_name}")
    model.push_to_hub(hf_repo_name, token=True)
    tokenizer.push_to_hub(hf_repo_name, token=True)
    print("✅ Successfully uploaded LoRA adapters!")


    # --- Option 2: Push a Fully Merged Model ---
    # This will merge the LoRA adapters into the base model and upload
    # the complete, standalone model. This will be a much larger file (e.g., 8GB+).
    # To use this, change 'if False:' to 'if True:'.
    if False:
        # We'll add "-merged" to the repo name to keep it separate from the LoRA adapters
        merged_repo_name = f"{hf_repo_name}-merged"
        print(f"Uploading a fully merged model to: https://huggingface.co/{merged_repo_name}")

        # Merge and push. `save_method` can be "merged_16bit" or "merged_4bit".
        model.push_to_hub_merged(merged_repo_name, tokenizer, save_method="merged_16bit", token=True)
        print("✅ Successfully uploaded the fully merged model!")


Uploading LoRA adapters to: https://huggingface.co/tahsinsoyak/gemma-3-4b-agriculture-turkish-lora


README.md:   0%|          | 0.00/602 [00:00<?, ?B/s]

  0%|          | 0/1 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/119M [00:00<?, ?B/s]

Saved model to https://huggingface.co/tahsinsoyak/gemma-3-4b-agriculture-turkish-lora


  0%|          | 0/2 [00:00<?, ?it/s]

tokenizer.json:   0%|          | 0.00/33.4M [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/4.69M [00:00<?, ?B/s]

✅ Successfully uploaded LoRA adapters!
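
The 119M size of `adapter_model.safetensors` in the upload log is consistent with the trainable parameter count reported during training. The check below assumes the file holds only the LoRA weights, that they are stored as float32 (4 bytes each), and that the progress bar uses decimal megabytes.

```python
trainable_params = 29_802_496   # from the training banner
bytes_per_param = 4             # assumption: float32 storage

size_mb = trainable_params * bytes_per_param / 1e6
print(f"~{size_mb:.1f} MB")
```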


In [30]:
# @title Load Finetuned Model from Hugging Face Hub for Inference
#
# Make sure you have pushed your model to the Hub first.
# Replace the `model_repo` value below with your own "YOUR_HF_USERNAME/your-repo-name".

if True: # Change to False if you don't want to run this
    from unsloth import FastModel
    from transformers import TextStreamer
    import torch

    # The repo where you saved your LoRA adapters
    model_repo = "tahsinsoyak/gemma-3-4b-agriculture-turkish-lora"

    # Load the model and tokenizer from the Hub
    # Unsloth will automatically download the base model and apply your adapters.
    model, tokenizer = FastModel.from_pretrained(
        model_name = model_repo,
        max_seq_length = 2048,
        load_in_4bit = True,
    )

    # --- Test with a TextStreamer ---
    messages = [{
        "role": "user",
        "content": [{"type" : "text", "text" : "Evde bitkim sarardı ne yapabilirim ?",}]
    }]

    # We must use the chat template again
    text = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt = True, # Must add for generation
    )

    print("Running inference...")
    _ = model.generate(
        **tokenizer([text], return_tensors = "pt").to("cuda"),
        max_new_tokens = 128,
        streamer = TextStreamer(tokenizer, skip_prompt = True),
        # Recommended Gemma-3 settings!
        temperature = 1.0, top_p = 0.95, top_k = 64,
    )

==((====))==  Unsloth 2025.7.3: Fast Gemma3 patching. Transformers: 4.53.1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Using float16 precision for gemma3 won't work! Using float32.
Running inference...
Bitkiler sarardıysa, su basmış olup değil, yeterince gübre, ışık ve havalandırma alıp alamıyacağınızı kontrol edin. Bir toprak testi, besin eksikliklerini veya fazlalıklarını belirlemeye yardımcı olabilir. Sararmadan yola çıkmak ve bitki döngüsüne bakmak da önemli olabilir.<end_of_turn>


In [31]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive
