<a href="https://colab.research.google.com/github/traptisinghh/Projects/blob/main/Finetuning1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
# STEP 1: Install dependencies
!pip install -q transformers datasets peft accelerate bitsandbytes
# peft = parameter-efficient fine-tuning
# LoRA is one type of PEFT method

In [3]:
# STEP 2: Load model and tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
model_name = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [12]:
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})
    model.resize_token_embeddings(len(tokenizer))

The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`


In [9]:
# STEP 3: Initial generation
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
prompt = "The future of AI is"
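# Note: the pipeline's default max_new_tokens overrides max_length (see the warning in the output);
# pass max_new_tokens=50 instead to actually cap the continuation at 50 tokens.
initial_output = generator(prompt, max_length=50)[0]['generated_text']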
print("🔹 Initial Output:\n", initial_output)

Device set to use cuda:0
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=50) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


🔹 Initial Output:
 The future of AI is a great, wonderful, very exciting, and very exciting one, because it is a very exciting, very exciting, very exciting, and very exciting, because it is a very exciting, very exciting, very exciting, and very exciting, because it is a very exciting, very exciting, very exciting, and very exciting, and very exciting, and very exciting, and very exciting, and very exciting, and very exciting, and very exciting, and very interesting, and very exciting, and very exciting, and very exciting, and very exciting, and very exciting, and very exciting, and very exciting, and very exciting, and very exciting, and very interesting, and very exciting, and very exciting, and very exciting, and very exciting, and very exciting, and very exciting, and very exciting, and very exciting, and very exciting, and very exciting, and very exciting, and very exciting, and very exciting, and very exciting, and very exciting, and very exciting, and very exciting, and very ex

In [10]:
# STEP 4: Manual perplexity evaluation with device fix
import math, torch
from datasets import load_dataset
from tqdm import tqdm
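def compute_perplexity(model, tokenizer, texts):
    """Token-weighted perplexity of `model` over an iterable of strings."""
    model.eval()
    device = next(model.parameters()).device  # send inputs to the model's device
    total_loss, total_tokens = 0, 0
    for text in tqdm(texts):
        if not text.strip():
            continue  # skip blank lines
        # One text per call, so padding=True adds no pad tokens here.
        inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512, padding=True)
        input_ids = inputs.input_ids.to(device)
        attention_mask = inputs.attention_mask.to(device)
        if input_ids.size(1) < 2:
            continue  # a single token has no next-token target to score
        with torch.no_grad():
            outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=input_ids)
        # outputs.loss is the mean over size(1) - 1 predicted tokens; weighting by
        # size(1) is a close approximation of the summed loss.
        loss = outputs.loss
        total_loss += loss.item() * input_ids.size(1)
        total_tokens += input_ids.size(1)
    # Perplexity is the exponential of the average per-token loss.
    return math.exp(total_loss / total_tokens) if total_tokens > 0 else float("inf")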

In [13]:
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="test[:1%]")
model.to("cuda" if torch.cuda.is_available() else "cpu")  # fall back to CPU when no GPU is present
initial_perplexity = compute_perplexity(model, tokenizer, dataset["text"])
print("📊 Initial Perplexity:", initial_perplexity)

`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
100%|██████████| 44/44 [00:00<00:00, 141.10it/s]

📊 Initial Perplexity: 84.7082007074749





In [None]:
# Perplexity measures how well a language model predicts text: it reflects how uncertain the model is about the next token. Lower perplexity is better.
# CUDA is NVIDIA's parallel-computing platform; moving the model to "cuda" runs it on an NVIDIA GPU for faster deep learning operations.
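
To make the perplexity definition concrete, here is a minimal sketch that computes it from per-token log-probabilities. The helper name `perplexity_from_log_probs` and the toy probabilities are assumptions for illustration; they are not part of the notebook or of any library.

```python
import math

def perplexity_from_log_probs(token_log_probs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    if not token_log_probs:
        return float("inf")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# A model that gives every token probability 1/4 is as uncertain as a
# uniform guess among 4 options.
uniform_ppl = perplexity_from_log_probs([math.log(0.25)] * 10)

# A more confident model (probability 0.9 per token) scores lower.
confident_ppl = perplexity_from_log_probs([math.log(0.9)] * 10)

print(uniform_ppl, confident_ppl)
```

Read this way, a perplexity near 85 roughly corresponds to the model being as unsure as if it were choosing uniformly among about 85 tokens at each step.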

In [16]:
# STEP 5: Apply LoRA fine-tuning
from peft import get_peft_model, LoraConfig, TaskType
from transformers import Trainer, TrainingArguments
from datasets import Dataset
lora_config = LoraConfig(
    r=8,
    # r = rank of the low-rank update matrices; a higher rank means more trainable parameters
    lora_alpha=32,
    target_modules=["c_attn"],
    lora_dropout=0.1,
    bias="none",
    task_type=TaskType.CAUSAL_LM
)
model = get_peft_model(model, lora_config)
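
As a rough check on what `r=8` means in practice, the sketch below counts the adapter weights LoRA would add for this configuration. It assumes distilgpt2's hidden size is 768 with 6 transformer blocks, and that `c_attn` maps 768 inputs to 3 × 768 outputs; the helper `lora_param_count` is a hypothetical name used only here.

```python
def lora_param_count(in_features, out_features, r):
    """Weights in the two adapter matrices: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)

# Assumed distilgpt2 shapes (not read from the model in this sketch).
hidden_size, num_layers = 768, 6
r, lora_alpha = 8, 32

per_layer = lora_param_count(hidden_size, 3 * hidden_size, r)
total_trainable = per_layer * num_layers

# The adapter update is scaled by lora_alpha / r before it is added to the frozen weight.
scaling = lora_alpha / r

print(total_trainable, scaling)
```

If those shape assumptions hold, the total should agree with what `model.print_trainable_parameters()` reports after `get_peft_model`.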



In [17]:
# Tiny dataset for demo
train_texts = ["AI will transform the world.", "Machine learning is a subset of AI."]
train_dataset = Dataset.from_dict({"text": train_texts})

In [18]:
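# Tokenize and add labels
def tokenize_function(examples):
    # With no max_length given, padding="max_length" pads each example to the
    # tokenizer's model_max_length.
    tokens = tokenizer(examples["text"], padding="max_length", truncation=True)
    # Labels are a straight copy of input_ids, so padding positions also count toward the loss.
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens
tokenized_train = train_dataset.map(tokenize_function)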
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=2,
    num_train_epochs=10,  # adjust as needed
    logging_steps=1,
    save_steps=5,
    save_total_limit=1,
    report_to="none"
)
trainer = Trainer(model=model, args=training_args, train_dataset=tokenized_train)
trainer.train()


Step,Training Loss
1,14.6097
2,14.9465
3,14.4199
4,14.4419
5,14.8336
6,14.4321
7,14.4452
8,14.3051
9,14.5463
10,14.3996


TrainOutput(global_step=10, training_loss=14.537985038757324, metrics={'train_runtime': 3.6511, 'train_samples_per_second': 5.478, 'train_steps_per_second': 2.739, 'total_flos': 5244054405120.0, 'train_loss': 14.537985038757324, 'epoch': 10.0})
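
The training loss stays near 14.5 across all ten steps. One plausible contributor is that the labels are copied from the padded `input_ids`, so the loss is also computed on `[PAD]` positions. A common remedy is to set those labels to -100, the value PyTorch's cross-entropy loss ignores by default. The sketch below shows the masking step on plain lists; `mask_padding_labels` and the example token ids are hypothetical and not taken from the notebook's run.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss

def mask_padding_labels(input_ids, attention_mask):
    """Copy input_ids into labels, replacing padded positions with IGNORE_INDEX."""
    return [
        token_id if keep == 1 else IGNORE_INDEX
        for token_id, keep in zip(input_ids, attention_mask)
    ]

# Hypothetical padded example: three real tokens followed by two pad tokens.
example_ids = [15836, 481, 6121, 50257, 50257]
example_mask = [1, 1, 1, 0, 0]

masked_labels = mask_padding_labels(example_ids, example_mask)
print(masked_labels)
```

Inside `tokenize_function`, the same idea would read `tokens["labels"] = mask_padding_labels(tokens["input_ids"], tokens["attention_mask"])`. Whether this alone brings the loss down on a two-sentence dataset is not something the recorded run can confirm.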

In [19]:
# STEP 6: Generate with fine-tuned model
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
# Note: as in the initial generation, the default max_new_tokens overrides max_length here.
fine_tuned_output = generator(prompt, max_length=50)[0]['generated_text']
print("🔹 Fine-Tuned Output:\n", fine_tuned_output)

Device set to use cuda:0
Both `max_new_tokens` (=256) and `max_length`(=50) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


🔹 Fine-Tuned Output:
 The future of AI is the hope that the future of AI is not a mere illusion. Even if we can imagine how a human being would be able to control more than a few small bits of the machinery (such as the computer, the phone, the calculator, the computer, etc.) it is unlikely to be the same. This is also true of the AI world.


The future of AI is not a mere illusion.
We can imagine that the future of AI is not a mere illusion. Even if we can imagine how a human being would be able to control more than a few small bits of the machinery (such as the computer, the computer, the calculator, the calculator, etc.) it is unlikely to be the same. This is also true of the second generation of AI.
The next generation of AI is the future of AI.
A group of the present and future of AI is the future of AI.
The future of AI is the future of AI.
The future of AI is the future of AI.
The future of AI is the future of AI.
The future of AI is the future of AI.
The future of AI is the fut

In [20]:
# STEP 7: Re-evaluate perplexity
model.to("cuda" if torch.cuda.is_available() else "cpu")  # fall back to CPU when no GPU is present
fine_tuned_perplexity = compute_perplexity(model, tokenizer, dataset["text"])
print("📊 Fine-Tuned Perplexity:", fine_tuned_perplexity)

100%|██████████| 44/44 [00:00<00:00, 142.93it/s]

📊 Fine-Tuned Perplexity: 84.67827005905939
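
To put the two scores side by side, this small sketch computes the absolute and relative change from the values printed above. The variable names are chosen only for this illustration.

```python
# Perplexity values copied from the notebook's printed output.
initial_ppl = 84.7082007074749
fine_tuned_ppl = 84.67827005905939

absolute_change = initial_ppl - fine_tuned_ppl
relative_change_pct = 100 * absolute_change / initial_ppl

print(f"Absolute change: {absolute_change:.4f}")
print(f"Relative change: {relative_change_pct:.4f}%")
```

The change is a small fraction of one percent, so fine-tuning on two sentences left the model's WikiText perplexity essentially where it started.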



