# Falcon-RW-1B Fine-Tuning Notebook
This notebook demonstrates how to fine-tune the Falcon-RW-1B model using LoRA.

In [1]:
!pip install transformers datasets peft torch accelerate bitsandbytes

Collecting bitsandbytes
  Downloading bitsandbytes-0.45.3-py3-none-manylinux_2_24_x86_64.whl.metadata (5.0 kB)
Downloading bitsandbytes-0.45.3-py3-none-manylinux_2_24_x86_64.whl (76.1 MB)
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.45.3


## 1. Install Dependencies
We first install the necessary Python libraries.

In [2]:
!pip install evaluate rouge_score nltk

Collecting evaluate
  Downloading evaluate-0.4.3-py3-none-any.whl.metadata (9.2 kB)
Collecting rouge_score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... done
Downloading evaluate-0.4.3-py3-none-any.whl (84 kB)
Building wheels for collected packages: rouge_score
  Building wheel for rouge_score (setup.py) ... done
  Created wheel for rouge_score: filename=rouge_score-0.1.2-py3-none-any.whl size=24935 sha256=3b6a0aec6a44fc69eb2525798eb149503c0aadd9379135d8f136699f449473b9
  Stored in directory: /root/.cache/pip/wheels/5f/dd/89/461065a73be61a532ff8599a28e9beef17985c9e9c31e541b4
Successfully built rouge_score
Installing collected packages: rouge_score, evaluate
Successfully installed evaluate-0.4.3 rouge_score-0.1.2


In [4]:
!pip install -U bitsandbytes



## 2. Load and Preprocess Dataset
We load a JSON file of prompt/ad pairs and convert it into a Hugging Face `Dataset`.

In [3]:
import json
from datasets import Dataset

# Load your dataset
with open('/kaggle/input/ads-list/fixed_ads_list.json', 'r') as f:
    data = json.load(f)

# Convert to Hugging Face Dataset
dataset = Dataset.from_list(data)

# Display a sample
print(dataset[0])

{'prompt': 'Create an ad for my new organic juice brand, FreshPress, that emphasizes health benefits and taste, targeting health-conscious consumers.', 'ad_text': '🍏 **Quench Your Thirst, Boost Your Health!**\\n\\nIntroducing **FreshPress**: The Organic Juice that Delivers Taste & Nutrients!\\n\\n✨ *"Tastes amazing and I feel fantastic!"* - Jamie, Health Enthusiast\\n\\n👉 Join the **20,000+ Happy Customers** who’ve transformed their health!\\n\\n✅ **Organic Ingredients**: No additives, just real fruit!\\n✅ **Packed with Nutrition**: Each bottle delivers vitamins & minerals that support your immune system.\\n✅ **Guilt-Free Indulgence**: Enjoy refreshing flavors without the sugar crash!\\n\\n**Hurry, Limited Time Offer!**\\n🌟 Get **20% OFF** your first order! 🌟\\n\\n🛡️ **Risk-Free**: Enjoy our **30-Day Money-Back Guarantee!**\\n\\n**Ready to Revitalize Your Health?**\\n👉 *Click to Order Now!*\\n[Order Your FreshPress Juice Today] \\n\\n✨ *"Best juice ever, I’m hooked!"* - Alex, Repeat Cu
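
`Dataset.from_list` expects a list of dictionaries with consistent keys, and the later cells read the `prompt` and `ad_text` fields of each record. Before converting, it can help to confirm that every record has both. The sketch below is an illustration only: `find_invalid_records` is a hypothetical helper, and the two sample records are made up rather than taken from the dataset.

```python
REQUIRED_FIELDS = ("prompt", "ad_text")


def find_invalid_records(records):
    """Return (index, reason) pairs for records that lack a usable required field."""
    problems = []
    for index, record in enumerate(records):
        for field in REQUIRED_FIELDS:
            value = record.get(field) if isinstance(record, dict) else None
            # A field counts as usable only if it is a non-empty string
            if not isinstance(value, str) or not value.strip():
                problems.append((index, f"missing or empty '{field}'"))
    return problems


# Made-up records for illustration (not from the real dataset)
sample_records = [
    {"prompt": "Create an ad for a juice brand.", "ad_text": "Fresh taste, every day!"},
    {"prompt": "Create an ad for a fitness app."},  # no ad_text
]
print(find_invalid_records(sample_records))
```

An empty result means every record can be used by the tokenization step as written.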

## 3. Tokenization
We load the Falcon-RW-1B tokenizer and prepare our dataset for training.
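
Each training example is a single string that joins the prompt and the ad with `### Prompt:` and `### Response:` markers. At generation time the model is therefore most likely to behave as trained when the input ends with the same `### Response:` marker. A minimal sketch of the two formats, using hypothetical helper names (`build_training_text`, `build_inference_prompt`) that are not part of the notebook:

```python
def build_training_text(prompt, ad_text):
    """Full string seen during fine-tuning: prompt, marker, then the target ad."""
    return f"### Prompt: {prompt} ### Response: {ad_text}"


def build_inference_prompt(prompt):
    """String passed to generate(): identical prefix, with the response left open."""
    return f"### Prompt: {prompt} ### Response:"


training_text = build_training_text("Write an ad for tea.", "Sip calm.")
inference_prompt = build_inference_prompt("Write an ad for tea.")

print(training_text)
# The inference prompt is an exact prefix of the training text
print(training_text.startswith(inference_prompt))
```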

In [5]:
from transformers import AutoTokenizer

# Load Falcon-RW-1B tokenizer
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-rw-1b")

# The tokenizer may not define a pad token; fall back to the EOS token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

def tokenize_function(examples):
    # Join each prompt and ad into one training string using the prompt/response template
    combined_texts = [f"### Prompt: {p} ### Response: {a}" for p, a in zip(examples["prompt"], examples["ad_text"])]
    # Truncate or pad every example to exactly 512 tokens
    model_inputs = tokenizer(combined_texts, truncation=True, padding="max_length", max_length=512)
    # For causal LM training, the labels are a copy of the input IDs
    model_inputs["labels"] = [ids.copy() for ids in model_inputs["input_ids"]]
    return model_inputs

# Apply tokenization to the dataset
tokenized_dataset = dataset.map(tokenize_function, batched=True)


Map:   0%|          | 0/485 [00:00<?, ? examples/s]
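
Because the pad token falls back to the EOS token and every example is padded to 512 tokens, a short example consists mostly of padding. For causal LM training, padded positions are normally excluded from the loss by setting their label to -100, the index that PyTorch's cross-entropy loss ignores by default. The sketch below shows the idea on toy token IDs using the attention mask; `mask_padding_labels` is a hypothetical helper, and the IDs are invented for illustration.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_padding_labels(input_ids, attention_mask):
    """Copy input IDs into labels, replacing padded positions with IGNORE_INDEX."""
    return [
        token if keep == 1 else IGNORE_INDEX
        for token, keep in zip(input_ids, attention_mask)
    ]


# Toy example: pad and EOS share the (hypothetical) ID 2
input_ids = [101, 57, 903, 2, 2, 2]   # three real tokens, one real EOS, two pads
attention_mask = [1, 1, 1, 1, 0, 0]   # 1 = real token, 0 = padding
labels = mask_padding_labels(input_ids, attention_mask)
print(labels)
```

Masking by attention mask rather than by token ID keeps a genuine end-of-sequence token in the loss, which matters when pad and EOS share an ID.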

## 4. Load Pretrained Model with Quantization
We use `BitsAndBytesConfig` to load the Falcon model with 4-bit NF4 quantization, which reduces the memory needed to hold the weights.

In [6]:
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Configure 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.float16
)

# Load the Falcon-RW-1B model with quantization
model = AutoModelForCausalLM.from_pretrained(
    'tiiuae/falcon-rw-1b',
    quantization_config=bnb_config,
    device_map='auto',
    trust_remote_code=True
)

A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-rw-1b:
- configuration_falcon.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-rw-1b:
- modeling_falcon.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


In [7]:
# Print model structure to find correct module names
print(model)

FalconForCausalLM(
  (transformer): FalconModel(
    (word_embeddings): Embedding(50304, 2048)
    (h): ModuleList(
      (0-23): 24 x FalconDecoderLayer(
        (self_attention): FalconAttention(
          (query_key_value): Linear4bit(in_features=2048, out_features=6144, bias=True)
          (dense): Linear4bit(in_features=2048, out_features=2048, bias=True)
          (attention_dropout): Dropout(p=0.0, inplace=False)
        )
        (mlp): FalconMLP(
          (dense_h_to_4h): Linear4bit(in_features=2048, out_features=8192, bias=True)
          (act): GELU(approximate='none')
          (dense_4h_to_h): Linear4bit(in_features=8192, out_features=2048, bias=True)
        )
        (input_layernorm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
        (post_attention_layernorm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
      )
    )
    (ln_f): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=2048, out_features=50304,
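
The layer shapes in the printout are enough to estimate the model's size and what 4-bit loading saves. The sketch below counts weights and biases from those shapes. It assumes that the output head shares its weights with the word embeddings (so it is not counted twice), that parameters outside the `Linear4bit` layers are held in 16-bit precision, and that the overhead of quantization constants is negligible, so the results are rough estimates only.

```python
HIDDEN, VOCAB, LAYERS = 2048, 50304, 24

# (in_features, out_features) of the four Linear4bit layers in each decoder block
linear_shapes = {
    "query_key_value": (2048, 6144),
    "dense": (2048, 2048),
    "dense_h_to_4h": (2048, 8192),
    "dense_4h_to_h": (8192, 2048),
}

linear_weights_per_block = sum(i * o for i, o in linear_shapes.values())
linear_biases_per_block = sum(o for _, o in linear_shapes.values())
layernorm_params_per_block = 2 * (2 * HIDDEN)  # two LayerNorms, weight + bias each

block_params = (
    linear_weights_per_block + linear_biases_per_block + layernorm_params_per_block
)
embedding_params = VOCAB * HIDDEN
final_layernorm_params = 2 * HIDDEN
total_params = LAYERS * block_params + embedding_params + final_layernorm_params

# Only the decoder-block linear weights are stored in 4 bits (0.5 bytes each)
quantized_weights = LAYERS * linear_weights_per_block
fp16_bytes = total_params * 2
mixed_bytes = quantized_weights * 0.5 + (total_params - quantized_weights) * 2

print(f"total parameters: {total_params:,}")
print(f"all 16-bit: {fp16_bytes / 1e9:.2f} GB")
print(f"4-bit linear layers, rest 16-bit: {mixed_bytes / 1e9:.2f} GB")
```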

## 5. Configure LoRA for Parameter Efficient Fine-Tuning
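LoRA freezes the base model's weights and trains small low-rank adapter matrices attached to selected linear layers. This greatly reduces the number of trainable parameters, often with little loss in quality.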
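
For a linear layer with `in_features` inputs and `out_features` outputs, a rank-`r` adapter adds two matrices with `r * in_features` and `out_features * r` entries. The sketch below applies that formula to the layer shapes printed in the previous section and to the `r=32`, `lora_alpha=32` values used in this section's configuration. It is back-of-the-envelope arithmetic rather than output from PEFT; calling `model.print_trainable_parameters()` on the wrapped model reports the actual count.

```python
R, LORA_ALPHA, LAYERS = 32, 32, 24

# (in_features, out_features) of the targeted layers in each decoder block
per_block_targets = {
    "query_key_value": (2048, 6144),
    "dense": (2048, 2048),
    "dense_h_to_4h": (2048, 8192),
    "dense_4h_to_h": (8192, 2048),
}
lm_head_shape = (2048, 50304)  # targeted once, outside the decoder blocks


def lora_params(in_features, out_features, rank):
    """Parameters in one adapter: A is (rank x in), B is (out x rank)."""
    return rank * in_features + out_features * rank


per_block = sum(lora_params(i, o, R) for i, o in per_block_targets.values())
total_adapter_params = LAYERS * per_block + lora_params(*lm_head_shape, R)
scaling = LORA_ALPHA / R  # factor applied to the adapter output

print(f"adapter parameters per block: {per_block:,}")
print(f"total adapter parameters: {total_adapter_params:,}")
print(f"LoRA scaling (lora_alpha / r): {scaling}")
```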

In [8]:
from peft import get_peft_model, LoraConfig, TaskType

# ✅ LoRA Configuration
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=32,  
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h", "lm_head"]
)

model = get_peft_model(model, lora_config)

print(model)

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): FalconForCausalLM(
      (transformer): FalconModel(
        (word_embeddings): Embedding(50304, 2048)
        (h): ModuleList(
          (0-23): 24 x FalconDecoderLayer(
            (self_attention): FalconAttention(
              (query_key_value): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=2048, out_features=6144, bias=True)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=2048, out_features=32, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=32, out_features=6144, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
        



## 6. Training Setup and Execution
We define training arguments and train the model using the Hugging Face `Trainer` API.
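
A validation loss is only informative for early stopping and best-checkpoint selection when it is computed on examples the model does not train on. One simple way to obtain such a split is to shuffle the example indices with a fixed seed and set a fraction aside. The sketch below does this in plain Python for the 485 examples reported by the tokenization step; `split_indices`, the 10% fraction, and the seed are illustrative choices, not part of the original run.

```python
import random


def split_indices(num_examples, eval_fraction=0.1, seed=42):
    """Return (train_indices, eval_indices) as disjoint, sorted lists."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)  # deterministic shuffle for a fixed seed
    eval_size = max(1, int(num_examples * eval_fraction))
    return sorted(indices[eval_size:]), sorted(indices[:eval_size])


train_idx, eval_idx = split_indices(485)
print(len(train_idx), len(eval_idx))
```

The resulting index lists could then be passed to `Dataset.select` to build separate training and evaluation datasets. The `datasets` library also offers `Dataset.train_test_split` for the same purpose.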

In [9]:
from transformers import (
    DataCollatorForLanguageModeling,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy="epoch",
    save_strategy="epoch",
    logging_strategy="steps",
    logging_steps=10,
    save_total_limit=2,
    learning_rate=1e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,  # Effective batch size: 2 * 8 = 16
    num_train_epochs=6,
    weight_decay=0.01,
    fp16=False,
    bf16=True,
    report_to="none",
    push_to_hub=False,
    load_best_model_at_end=True,  # Requires matching evaluation and save strategies
    metric_for_best_model="loss",
    greater_is_better=False,  # Lower loss is better
)

# Data collator for causal language modeling (mlm=False disables masked LM)
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    # Note: the evaluation set is the training set, so the reported
    # validation loss is not a held-out metric
    eval_dataset=tokenized_dataset,
    data_collator=data_collator,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)]  # Stop after 2 evaluations without improvement
)


# Start training
trainer.train()



| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 1.4817 | 1.391018 |
| 2 | 1.2628 | 1.214674 |
| 3 | 1.1827 | 1.123979 |
| 4 | 1.0966 | 1.062722 |
| 5 | 1.0366 | 1.013454 |
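
Early stopping with a patience of 2 ends training once the monitored metric has failed to improve for two consecutive evaluations. The sketch below replays that rule on the validation losses reported above; `first_stop_epoch` is a simplified, hypothetical stand-in for the callback's logic, not the library's code.

```python
def first_stop_epoch(eval_losses, patience=2):
    """Return the 1-based evaluation at which training would stop, or None."""
    best = float("inf")
    evals_without_improvement = 0
    for epoch, loss in enumerate(eval_losses, start=1):
        if loss < best:
            best = loss
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= patience:
                return epoch
    return None


reported_losses = [1.391018, 1.214674, 1.123979, 1.062722, 1.013454]
print(first_stop_epoch(reported_losses))               # None: the loss improved every time
print(first_stop_epoch([1.0, 0.9, 0.95, 0.92, 0.8]))   # 4: two evaluations without a new best
```

Since the reported loss fell at every evaluation, the early-stopping rule never triggered in this run.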




TrainOutput(global_step=180, training_loss=1.2240387068854437, metrics={'train_runtime': 3293.1567, 'train_samples_per_second': 0.884, 'train_steps_per_second': 0.055, 'total_flos': 1.07216747102208e+16, 'train_loss': 1.2240387068854437, 'epoch': 5.823045267489712})
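
The reported `global_step=180` can be reproduced from the batch settings. With 485 examples and a per-device batch size of 2, one epoch has 243 batches, and accumulating gradients over 8 batches gives one optimizer step per 16 examples. The sketch assumes a single device and that a trailing partial accumulation group does not count as an extra step; this matches the reported figure but may differ between library versions.

```python
import math

num_examples = 485
per_device_batch_size = 2
gradient_accumulation_steps = 8
num_epochs = 6

batches_per_epoch = math.ceil(num_examples / per_device_batch_size)
# Assumption: only complete accumulation groups count as optimizer steps
steps_per_epoch = batches_per_epoch // gradient_accumulation_steps
total_steps = steps_per_epoch * num_epochs
effective_batch_size = per_device_batch_size * gradient_accumulation_steps

print(f"batches per epoch: {batches_per_epoch}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps: {total_steps}")
print(f"effective batch size: {effective_batch_size}")
```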

## 7. Text Generation
Finally, we generate text based on a sample input prompt.

In [10]:
# Generate text based on a sample prompt
sample_prompt = "Introducing our latest product: "
inputs = tokenizer(sample_prompt, return_tensors='pt').to('cuda')
output = model.generate(**inputs, max_length=100)

# Decode and print the generated text
print(tokenizer.decode(output[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Introducing our latest product: 
🚀 **Introducing the Ultimate Fitness Tracker!**
✅ **Track Your Progress with Precision!**
✅ **Customizable Workouts!**
✅ **Connected to Your Smartphone!**
⚡ **Join 50,000+ Happy Users!**
⚡ **95% of Users Report Increased Fitness!**
⚡ **Trusted by Top Fitness Influencers!**
👉 **[


In [11]:
sample_prompt = "Create an ad for my new organic juice brand, FreshPress, that emphasizes health benefits and taste, targeting health-conscious consumers."

# Generate text using beam search
inputs = tokenizer(sample_prompt, return_tensors='pt').to('cuda')
output = model.generate(
    **inputs, 
    max_length=256,  
    num_beams=5,  # Beam search for better quality
    temperature=0.7,  # Has no effect here: temperature only applies when do_sample=True
    repetition_penalty=1.2  # Reduces word repetition
)

# Decode and print results
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print("🔹 **Prompt:**", sample_prompt)
print("🔹 **Generated Ad:**", generated_text)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


🔹 **Prompt:** Create an ad for my new organic juice brand, FreshPress, that emphasizes health benefits and taste, targeting health-conscious consumers.
🔹 **Generated Ad:** Create an ad for my new organic juice brand, FreshPress, that emphasizes health benefits and taste, targeting health-conscious consumers.
— FreshPress Juice (@FreshPressJuice)
🍊 **Join the FreshPress Movement!** 🍊
→ 95% of our customers say they feel better after drinking our juices!
— FreshPress Juice (@FreshPressJuice)
🔥 **Limited Time Offer: Get 20% off your first order!** 🔥
— FreshPress Juice (@FreshPressJuice)
👉 [Order Now] 👈
— FreshPress Juice (@FreshPressJuice)
🌟 "Best organic juice I’ve ever tasted!" - Sarah, Health Enthusiast 🌟
— FreshPress Juice (@FreshPressJuice)
👉 **Join the FreshPress Movement!** 👈
— FreshPress Juice (@FreshPressJuice)
Join the FreshPress Revolution! ⬅️
— FreshPress Juice (@FreshPressJuice)
👉 [Try FreshPress] 👈
— FreshPress Juice (@FreshPressJuice)
🔥 **Join the FreshPress Community!** 🔥


In [None]:
sample_prompt = "Create an ad for my new organic juice brand, FreshPress, that emphasizes health benefits and taste, targeting health-conscious consumers."

# Use the same prompt template as in training so the model sees a familiar format
formatted_prompt = f"### Prompt: {sample_prompt} ### Response:"

inputs = tokenizer(formatted_prompt, return_tensors='pt').to('cuda')
output = model.generate(
    **inputs, 
    max_length=512,  # Upper bound on total length (prompt + generated tokens)
    num_beams=3,  # Fewer beams than the previous cell
    temperature=0.9,  # Higher temperature flattens the distribution for more variation
    repetition_penalty=1.1,  # Milder penalty than the previous cell
    top_p=0.9,  # Nucleus sampling: sample only from the top 90% of probability mass
    do_sample=True,  # Enable sampling so that temperature and top_p take effect
    pad_token_id=tokenizer.pad_token_id
)

# Decode only the newly generated tokens (everything after the prompt)
prompt_length = inputs["input_ids"].shape[1]
generated_text = tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True).strip()
print("🔹 **Prompt:**", sample_prompt)
print("🔹 **Generated Ad:**", generated_text)
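
`temperature` and `top_p` only affect generation when `do_sample=True`. Temperature rescales the logits before the softmax, and nucleus (top-p) sampling then restricts sampling to the smallest set of tokens whose cumulative probability reaches `top_p`. The sketch below illustrates both on a toy five-token distribution; the logits are invented, and the functions are a simplified illustration rather than the `transformers` implementation.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    highest = max(scaled)
    exps = [math.exp(x - highest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


toy_logits = [4.0, 3.0, 1.0, 0.0, -2.0]  # invented values
probs = softmax_with_temperature(toy_logits, temperature=0.9)
nucleus = top_p_filter(probs, top_p=0.9)

print([round(p, 4) for p in probs])
print([round(p, 4) for p in nucleus])
```

In this toy case only the two most likely tokens survive the filter, so low-probability tokens can never be sampled. A lower temperature concentrates probability on the top token, while a higher one spreads it out.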


In [12]:
# Generate text based on a sample prompt
sample_prompt = "Introducing our latest product: "
inputs = tokenizer(sample_prompt, return_tensors='pt').to('cuda')
output = model.generate(
    **inputs, 
    max_length=256,  
    num_beams=5,  # Beam search for better quality
    temperature=0.7,  # Has no effect here: temperature only applies when do_sample=True
    repetition_penalty=1.2  # Reduces word repetition
)

# Decode and print the generated text
print(tokenizer.decode(output[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Introducing our latest product: 
Introducing our latest product: 
🚀 **Join 50,000+ Satisfied Users!**
🚀 **95% of Users Recommend Us!**
🚀 **4.8⭐ Rating on Trustpilot!**
🚀 **Exclusive Launch Offer: Buy One, Get One 50% OFF!**
👉 **What’s Inside?**
🌟 **Instant Access to All Features:**
✅ **Track Your Progress:** See how far you’ve come!
✅ **Customized Plans:** Tailored to your needs!
✅ **User-Friendly Interface:** Easy to use for everyone!
👉 **Limited Time Offer:**
✅ **30-Day Money-Back Guarantee!**
✅ **Risk-Free 30-Day Trial!**
👉 **Don’t Miss Out!**
⚡ **Hurry! Limited Time Offer!** ⚡
👉 **[Buy Now]**
👉 **Transform Your Health Today!**
#


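## 8. Evaluation
We compare a generated ad with the reference ad for the same prompt using BLEU and ROUGE.

In [13]: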
import evaluate

# Load BLEU and ROUGE metrics
bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")

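# Compare one generated ad with its ground-truth ad.
# `generated_text` holds the output of the most recently executed generation cell.
# Note: dataset[0] is part of the training data, so this is a sanity check,
# not a held-out evaluation.
references = [[dataset[0]["ad_text"]]]  # One list of reference texts per prediction
predictions = [generated_text]

# Compute BLEU and ROUGE
bleu_score = bleu.compute(predictions=predictions, references=references)
rouge_score = rouge.compute(predictions=predictions, references=references)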

print("🔹 BLEU Score:", bleu_score)
print("🔹 ROUGE Score:", rouge_score)

🔹 BLEU Score: {'bleu': 0.03673709674942967, 'precisions': [0.358695652173913, 0.14207650273224043, 0.038461538461538464, 0.0055248618784530384], 'brevity_penalty': 0.6404064834081792, 'length_ratio': 0.6917293233082706, 'translation_length': 184, 'reference_length': 266}
🔹 ROUGE Score: {'rouge1': 0.3231441048034934, 'rouge2': 0.10572687224669604, 'rougeL': 0.21834061135371177, 'rougeLsum': 0.22707423580786026}
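
BLEU is the geometric mean of the 1- to 4-gram precisions multiplied by a brevity penalty that penalizes candidates shorter than the reference. As a check on how the fields in the output relate to each other, the sketch below recomputes the score from the reported precisions and lengths, assuming the standard unsmoothed formula.

```python
import math

# Values copied from the BLEU output above
precisions = [
    0.358695652173913,
    0.14207650273224043,
    0.038461538461538464,
    0.0055248618784530384,
]
translation_length = 184
reference_length = 266

# The brevity penalty only applies when the candidate is shorter than the reference
if translation_length > reference_length:
    brevity_penalty = 1.0
else:
    brevity_penalty = math.exp(1 - reference_length / translation_length)

geometric_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))
bleu_value = brevity_penalty * geometric_mean

print(f"brevity penalty: {brevity_penalty:.4f}")
print(f"BLEU: {bleu_value:.4f}")
```

A brevity penalty of about 0.64 means the candidate's shorter length alone removes roughly a third of the score. Because the comparison uses a single generated ad and a single reference, these numbers are best read as a sanity check rather than a measure of overall quality.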
