<a href="https://colab.research.google.com/github/tech-azim/fine-tuning-llama-wiki/blob/main/fine-tuning-llama-wiki.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling
)
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
import gc

# Check GPU
print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")


GPU: Tesla T4
Memory: 15.83 GB


In [None]:
# Choose a model (e.g., GPT-2 or TinyLlama)
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
# Alternatives: "gpt2", "microsoft/phi-2"
# ("google/flan-t5-base" is an encoder-decoder model; it needs AutoModelForSeq2SeqLM, not AutoModelForCausalLM)

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token  # Reuse EOS as the padding token

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,  # 8-bit quantization to save memory
    device_map="auto",
    trust_remote_code=True
)

The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
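
To put the 8-bit load in perspective, here is a weight-only estimate. It assumes the base parameter count implied by the `print_trainable_parameters()` output further down: 1,101,174,784 total minus 1,126,400 LoRA parameters. On that basis the weights take about 2.2 GB in fp16 and 1.1 GB in int8, both well under the T4's 15.83 GB. Treat these as rough lower bounds. The estimate ignores activations, the CUDA context and optimizer state, and bitsandbytes leaves some modules (such as embeddings and layer norms) unquantized.

```python
# Rough, weight-only memory estimate for TinyLlama-1.1B (ignores activations,
# optimizer state, CUDA overhead and any modules kept in 16-bit).
# Assumption: base parameter count = total params printed after get_peft_model()
# minus the LoRA adapter params.
BASE_PARAMS = 1_101_174_784 - 1_126_400

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(n_params: int, dtype: str) -> float:
    """Approximate size of the weights alone, in GB (1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("fp32", "fp16", "int8"):
    print(f"{dtype}: ~{weight_memory_gb(BASE_PARAMS, dtype):.2f} GB")
```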


In [None]:
from datasets import load_dataset

# Load the full dataset first
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")

# Then select a subset
dataset['train'] = dataset['train'].select(range(1000))  # ← the fix is here

print(f"✅ Dataset limited to {len(dataset['train'])} samples")

# Then tokenize
def tokenize_function(examples):
    # Truncate/pad every row to exactly 512 tokens
    return tokenizer(
        examples["text"],
        truncation=True,
        max_length=512,
        padding="max_length"
    )

# Note: this maps over every split (train, validation, test), not only train
tokenized_dataset = dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=dataset["train"].column_names
)

✅ Dataset limited to 1000 samples


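
Many WikiText rows are empty strings or short section headings, and `padding="max_length"` pads every row to 512 tokens, so much of each batch can be padding. A common alternative, used in Hugging Face's causal-LM example script `run_clm.py`, is to tokenize without padding, concatenate the token streams, and cut them into fixed-size blocks. Below is a minimal plain-Python sketch of that grouping step, modeled on the `group_texts` helper in that script. The block size of 512 is an assumption chosen to match `max_length` above.

```python
from itertools import chain

# Sketch of "concatenate and chunk" preprocessing, modeled on the group_texts
# helper in Hugging Face's run_clm.py example (not code from this notebook).
BLOCK_SIZE = 512  # assumed to match max_length above


def group_texts(examples: dict, block_size: int = BLOCK_SIZE) -> dict:
    """Concatenate a batch of tokenized rows and split it into equal-size blocks.

    `examples` maps column names (e.g. "input_ids", "attention_mask") to lists
    of token lists, which is the shape a batched `datasets` map function receives.
    """
    # Flatten each column into one long token stream.
    concatenated = {k: list(chain.from_iterable(examples[k])) for k in examples}
    # Drop the tail so every block is exactly block_size tokens long.
    total_length = (len(concatenated["input_ids"]) // block_size) * block_size
    # Labels are left to DataCollatorForLanguageModeling(mlm=False), which copies input_ids.
    return {
        k: [tokens[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, tokens in concatenated.items()
    }
```

To use it, you would tokenize without `padding="max_length"` and then call `tokenized_dataset.map(group_texts, batched=True)`. Empty rows then contribute only their special tokens instead of a full padded sequence.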

In [None]:
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# LoRA configuration (parameter-efficient fine-tuning)
lora_config = LoraConfig(
    r=8,                              # Rank; lower to 4 if you run out of memory (OOM)
    lora_alpha=32,                    # Scaling: LoRA updates are multiplied by lora_alpha / r
    target_modules=["q_proj", "v_proj"],  # Modules to train (Llama-style names; GPT-2 uses "c_attn")
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Prepare the quantized model for LoRA training
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)

# Show how many parameters are trainable
model.print_trainable_parameters()
# Expected: ~1.1M trainable params (~0.1%) out of ~1.1B total

trainable params: 1,126,400 || all params: 1,101,174,784 || trainable%: 0.1023
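
The printed count can be checked by hand. LoRA adds two small matrices to each targeted linear layer: A (r × in_features) and B (out_features × r). By default PEFT scales their product by `lora_alpha / r`, which is 32 / 8 = 4 here, and with `bias="none"` no bias terms are added. The sketch below assumes TinyLlama's published config: hidden size 2048, 22 layers, 32 query heads and 4 key/value heads. These values are not shown in the notebook. Because of grouped-query attention, `v_proj` maps 2048 dimensions to only 256.

```python
# Reproduce the LoRA trainable-parameter count for r=8 on q_proj and v_proj.
# Assumed TinyLlama-1.1B config values (not shown in this notebook):
HIDDEN = 2048          # hidden_size
N_LAYERS = 22          # num_hidden_layers
N_HEADS = 32           # num_attention_heads (query heads)
N_KV_HEADS = 4         # num_key_value_heads (grouped-query attention)
HEAD_DIM = HIDDEN // N_HEADS  # 64

r, lora_alpha = 8, 32


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in one LoRA adapter: A is (rank x in), B is (out x rank)."""
    return rank * in_features + out_features * rank


q_proj = lora_params(HIDDEN, N_HEADS * HEAD_DIM, r)     # 2048 -> 2048
v_proj = lora_params(HIDDEN, N_KV_HEADS * HEAD_DIM, r)  # 2048 -> 256
trainable = N_LAYERS * (q_proj + v_proj)

print(f"trainable params: {trainable:,}")
print(f"LoRA scaling lora_alpha / r = {lora_alpha / r}")
```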




In [None]:
from transformers import TrainingArguments, DataCollatorForLanguageModeling

# Training configuration
training_args = TrainingArguments(
    output_dir="./results",                 # Output folder for checkpoints
    num_train_epochs=3,                     # Number of epochs
    per_device_train_batch_size=4,          # Batch size (lower to 2 if OOM)
    gradient_accumulation_steps=4,          # Effective batch = 4 * 4 = 16
    learning_rate=2e-4,                     # Learning rate
    fp16=True,                              # Mixed precision to save memory
    logging_steps=10,                       # Log every 10 steps
    save_strategy="epoch",                  # Save a checkpoint every epoch
    save_total_limit=2,                     # Keep at most 2 checkpoints
    optim="paged_adamw_8bit",               # Memory-efficient 8-bit optimizer
    warmup_steps=100,                       # Learning-rate warmup steps
    gradient_checkpointing=True,            # Trade extra compute for lower memory
    report_to="none",                       # Disable wandb/tensorboard logging
)

# Data collator for language modeling
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False  # False because this is causal LM, not masked LM
)

print("✅ Training config ready!")

✅ Training config ready!
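
With `mlm=False`, the collator does not shift tokens. It copies `input_ids` into `labels` and replaces every pad-token position with -100, which the loss ignores; the causal-LM model then shifts the labels by one position internally when computing the loss. Below is a hypothetical, list-based illustration of that masking step using toy token IDs. Because this notebook sets `pad_token = eos_token`, real EOS tokens get masked the same way.

```python
# Hypothetical, list-based illustration of the labels that
# DataCollatorForLanguageModeling(mlm=False) builds: a copy of input_ids with
# every pad-token position replaced by -100, which the loss ignores.
IGNORE_INDEX = -100


def causal_lm_labels(input_ids: list[list[int]], pad_token_id: int) -> list[list[int]]:
    return [[IGNORE_INDEX if tok == pad_token_id else tok for tok in row] for row in input_ids]


# Toy IDs: assume 2 is the pad ID (here pad_token = eos_token, so EOS is masked too).
batch = [[1, 450, 3681, 2, 2, 2], [1, 17, 2, 2, 2, 2]]
print(causal_lm_labels(batch, pad_token_id=2))
```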


In [None]:
import gc

# Free GPU memory before training
gc.collect()
torch.cuda.empty_cache()

# Setup Trainer
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    data_collator=data_collator,
)

print("✅ Trainer ready!")

✅ Trainer ready!
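
It helps to know roughly how many optimizer updates these settings produce. The sketch assumes a single GPU and one update per `gradient_accumulation_steps` micro-batches, with a trailing partial accumulation window dropped. The exact rounding can differ between transformers versions. Under those assumptions, 1000 rows give about 62 updates per epoch and about 186 in total. `warmup_steps=100` therefore covers a little over half of training, so a smaller value (or a ratio-based setting such as `warmup_ratio`) may suit a run this short.

```python
import math

# Rough optimizer-step arithmetic for the TrainingArguments above, on one GPU.
# Assumption: one optimizer update per GRAD_ACCUM micro-batches, with a trailing
# partial accumulation window dropped; exact rounding can differ between
# transformers versions.
NUM_EXAMPLES = 1000       # rows in tokenized_dataset["train"]
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4
EPOCHS = 3
WARMUP_STEPS = 100

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
micro_batches_per_epoch = math.ceil(NUM_EXAMPLES / PER_DEVICE_BATCH)
updates_per_epoch = micro_batches_per_epoch // GRAD_ACCUM
total_updates = updates_per_epoch * EPOCHS

print(f"effective batch: {effective_batch}")
print(f"updates per epoch: ~{updates_per_epoch}, total: ~{total_updates}")
print(f"warmup share: ~{WARMUP_STEPS / total_updates:.0%}")
```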


In [None]:
print("üöÄ Starting training...")
print("=" * 50)

trainer.train()

print("=" * 50)
print("‚úÖ Training completed!")

üöÄ Starting training...




Step  Training Loss
  10  2.6945
  20  2.7582
  30  2.7627
  40  2.5463
  50  2.6029
  60  2.393
  70  2.3572
  80  2.2812
  90  2.2621
 100  2.2811




✅ Training completed!
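
The logged training loss is a mean per-token cross-entropy in natural-log units, so exponentiating it gives a perplexity-style number that is easier to interpret. The sketch below applies this to the first and last losses in the table above. These are running training losses averaged over each logging window, not a held-out evaluation, so they only roughly indicate how well the model predicts WikiText.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy (natural log)."""
    return math.exp(mean_cross_entropy)


# First and last logged training losses from the training log above.
for step, loss in [(10, 2.6945), (100, 2.2811)]:
    print(f"step {step}: loss {loss:.4f} -> perplexity ~{perplexity(loss):.1f}")
```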


In [None]:
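# Save the LoRA adapter (a PeftModel saves only the adapter weights and config)
model.save_pretrained("./finetuned-model")
# Save the tokenizer alongside it; the tuple below is the list of tokenizer files written
tokenizer.save_pretrained("./finetuned-model")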

('./finetuned-model/tokenizer_config.json',
 './finetuned-model/special_tokens_map.json',
 './finetuned-model/chat_template.jinja',
 './finetuned-model/tokenizer.model',
 './finetuned-model/added_tokens.json',
 './finetuned-model/tokenizer.json')

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# 1. Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("./finetuned-model")

# 2. Load base model
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto",
    trust_remote_code=True
)

# 3. Load LoRA weights
model = PeftModel.from_pretrained(base_model, "./finetuned-model")

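print("✅ Model loaded!")

# 4. Generate text
def generate_text(prompt, max_length=150):
    # Move the inputs to the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_length=max_length,   # Total length including the prompt; max_new_tokens would cap only new tokens
        temperature=0.7,         # Values below 1.0 make sampling less random
        top_p=0.9,               # Nucleus sampling: sample from the top 90% of probability mass
        do_sample=True,          # Sample instead of greedy decoding
        pad_token_id=tokenizer.eos_token_id
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)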



The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.


✅ Model loaded!
Artificial intelligence is transforming the world of marketing. It's creating new opportunities and enabling businesses to innovate at a faster rate than ever before. This is why it's crucial for businesses to stay ahead of the curve and adopt the latest technologies to stay competitive. Here are some of the ways AI is transforming the marketing industry:

1. Personalized Marketing: AI is helping businesses create personalized marketing campaigns. It's using data to analyze consumer behavior and preferences to create tailored messaging and offers. This personalization improves the effectiveness of marketing campaigns, resulting in increased engagement and conversion rates.


The history of the American Revolution and its impact on the world.
In the future, we will create a mobile application that will enable users to create and share their own customized meditation tracks. This app will have features like personalized meditation tracks based on users' preferences and 
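
The settings in `generate_text` (`do_sample=True`, `temperature=0.7`, `top_p=0.9`) are why repeated runs give different continuations. Below is a simplified, pure-Python sketch of a single decoding step: temperature scaling followed by nucleus (top-p) filtering, over hypothetical toy logits. It is not the transformers implementation, which applies these steps as logits warpers on tensors and differs in details.

```python
import math
import random
from typing import List, Optional


def sample_top_p(logits: List[float], temperature: float = 0.7, top_p: float = 0.9,
                 rng: Optional[random.Random] = None) -> int:
    """Pick one token index using temperature scaling plus nucleus (top-p) filtering."""
    rng = rng or random.Random()
    # Temperature: divide logits before the softmax; values < 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-p: keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Sample among the kept tokens; choices() accepts relative (unnormalized) weights.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


# Toy logits for a 4-token vocabulary (hypothetical values).
logits = [4.0, 3.5, 1.0, -2.0]
rng = random.Random(0)
print([sample_top_p(logits, rng=rng) for _ in range(10)])
```

With these toy logits, the two most likely tokens already hold more than 90% of the probability mass at temperature 0.7. The nucleus therefore excludes the other two tokens entirely.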

In [None]:
# 5. Test!
print(generate_text("My name is azim, so what's my favourite game"))


My name is azim, so what's my favourite game?

Azim: My favourite game is basketball, and I also like to play soccer.

Host: Alright, great. Well, let's talk about your experience playing sports.

Azim: Playing sports has been my passion since I was a kid. I remember playing football in the streets with my friends, and I was always the one who scored the goals.

Host: That's really cool. So, what are your biggest strengths as a basketball player?

Azim: Well, my strengths are my quick reflexes, my dribbling skills, and my ability to shoot the
