# 📝 Project: Parameter-Efficient Fine-Tuning (PEFT) for Dialogue Summarization
## 💡 The Idea
Large Language Models (LLMs) like T5 are powerful but computationally expensive to fine-tune fully. This project demonstrates how to **efficiently fine-tune Google's FLAN-T5 model** to summarize dialogues using **LoRA (Low-Rank Adaptation)**.

## 🎯 The Proposal (Objective)
Instead of retraining all 248M parameters of the model, we inject small trainable low-rank matrices into it and keep the pretrained weights frozen.
* **Goal:** Achieve high-quality dialogue summarization.
* **Dataset:** `knkarthick/dialogsum`.
* **Technique:** PEFT with LoRA configuration.
* **Efficiency:** Train fewer than 1% of the parameters to save GPU memory and time.
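
To make the low-rank idea concrete, here is a minimal pure-Python sketch of a LoRA-style layer. It is an illustration under stated assumptions, not the `peft` implementation: the helper names (`matvec`, `lora_forward`) and the toy sizes are hypothetical. The frozen weight `W` is left untouched, and the trainable update is the product of two thin matrices `B` and `A`, scaled by `alpha / r`.

```python
import random

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x). Only A and B would be trained."""
    base = matvec(W, x)                 # frozen path
    update = matvec(B, matvec(A, x))    # low-rank path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

random.seed(0)
d, r, alpha = 6, 2, 8                   # toy sizes, chosen only for illustration
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]     # d x d, frozen
A = [[random.gauss(0, 0.01) for _ in range(d)] for _ in range(r)]  # r x d, trainable
B = [[0.0] * r for _ in range(d)]                                  # d x r, starts at zero
x = [random.gauss(0, 1) for _ in range(d)]

y_adapted = lora_forward(W, A, B, x, alpha, r)
y_base = matvec(W, x)

full_params = d * d          # parameters a full update of W would train
lora_params = r * (d + d)    # parameters in A and B
print("same output before training:", y_adapted == y_base)
print("full:", full_params, "lora:", lora_params)
```

Because `B` starts at zero in this sketch, the adapted layer initially reproduces the frozen layer exactly, so training begins from the pretrained behaviour. The saving is small at these toy sizes and grows with the layer width.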

## 🗺️ Project Outline
1.  **Setup Environment:** Install `transformers`, `peft`, `datasets`.
2.  **Data Preparation:** Load and tokenize the `dialogsum` dataset.
3.  **Model Initialization:** Load pre-trained `flan-t5-base`.
4.  **LoRA Configuration:** Apply Low-Rank Adaptation to freeze main weights and add trainable adapters.
5.  **Training:** Fine-tune the model for 3 epochs.
6.  **Evaluation:** Compare the PEFT model's generated summaries against human summaries.

## 📊 Dataset: DialogSum
We are using the **DialogSum** dataset, which consists of real-life scenarios like doctor-patient conversations, taxi bookings, etc.
* **Input:** A dialogue text.
* **Target:** A human-written summary.

In [None]:
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    GenerationConfig,
    DataCollatorForSeq2Seq
)
from peft import LoraConfig, get_peft_model, TaskType, PeftModel


In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Device:", device)

In [None]:
data = load_dataset("knkarthick/dialogsum")


In [None]:
model_name = "google/flan-t5-base"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

In [None]:
def tokenizer_fun(ex):
    prompts = [
        "Summarize the following conversation.\n\n" + d + "\n\nSummary:"
        for d in ex["dialogue"]
    ]

    model_inputs = tokenizer(
        prompts,
        truncation=True,
        padding="max_length",
        max_length=512
    )

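    # Tokenize the reference summaries as targets. `text_target` replaces the
    # deprecated `tokenizer.as_target_tokenizer()` context manager.
    labels = tokenizer(
        text_target=ex["summary"],
        truncation=True,
        padding="max_length",
        max_length=128
    )

    # Replace padding token ids with -100 so the loss ignores padded positions.
    labels_ids = [
        [(t if t != tokenizer.pad_token_id else -100) for t in seq]
        for seq in labels["input_ids"]
    ]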

    model_inputs["labels"] = labels_ids
    return model_inputs

# =====================
# Tokenize dataset
# =====================
tokenized_datasets = data.map(
    tokenizer_fun,
    batched=True,
    remove_columns=["id", "topic", "dialogue", "summary"]
)

print("Sample labels:", tokenized_datasets["train"][0]["labels"][:20])
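
The tokenization step swaps padding ids in the labels for `-100`, the default index that PyTorch's cross-entropy loss ignores. The following stand-alone sketch shows the effect on a single sequence. The token ids, the pad id of `0`, and the helper names are illustrative assumptions, not values taken from the notebook's tokenizer.

```python
PAD_ID = 0            # assumed pad id for this toy example
IGNORE_INDEX = -100   # positions with this label are skipped by the loss

def mask_padding(label_ids, pad_id=PAD_ID):
    """Replace padding ids with IGNORE_INDEX."""
    return [tok if tok != pad_id else IGNORE_INDEX for tok in label_ids]

def masked_mean(per_token_loss, labels):
    """Average the loss over positions whose label is not IGNORE_INDEX."""
    kept = [loss for loss, y in zip(per_token_loss, labels) if y != IGNORE_INDEX]
    return sum(kept) / len(kept)

toy_labels = mask_padding([71, 1712, 1, 0, 0, 0])   # made-up ids, padded to length 6
toy_losses = [2.0, 1.0, 3.0, 9.0, 9.0, 9.0]         # made-up per-token losses

print(toy_labels)
print(masked_mean(toy_losses, toy_labels))
```

Without the mask, the three padded positions would dominate the average; with it, only the real summary tokens contribute.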

## ⚙️ Methodology: Applying LoRA
Here we apply **LoRA (Low-Rank Adaptation)**.
* **Rank (r):** 8
* **Alpha:** 32
* **Trainable Parameters:** Only **0.35%** (approx 880k params) of the model will be trained, keeping the original 248M parameters frozen. This drastically reduces memory usage.
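
The figure of roughly 880k trainable parameters can be reproduced with a short calculation. The sketch below relies on assumptions that are not stated in the text: that `flan-t5-base` has a hidden size of 768 with 12 encoder and 12 decoder blocks, and that the adapters attach to the query and value projections of every attention layer when no `target_modules` are given. The base parameter count is the rounded 248M quoted above.

```python
d_model = 768             # assumed hidden size of flan-t5-base
r = 8                     # LoRA rank from the configuration
encoder_blocks = 12       # assumed; one self-attention layer per block
decoder_blocks = 12       # assumed; self-attention plus cross-attention per block
targets_per_attention = 2 # assumed adapter targets: query and value projections

attention_layers = encoder_blocks * 1 + decoder_blocks * 2
adapted_modules = attention_layers * targets_per_attention

# Each adapted projection gains A (r x d_model) and B (d_model x r).
params_per_module = r * (d_model + d_model)
trainable = adapted_modules * params_per_module

base_params = 248_000_000  # rounded figure quoted in the text
share = 100 * trainable / (base_params + trainable)

print(f"adapted modules: {adapted_modules}")
print(f"trainable parameters: {trainable:,}")
print(f"share of all parameters: {share:.3f}%")
```

Under these assumptions the count lands at 884,736, in line with the approximate figure above. The exact numbers for a given setup are what `print_trainable_parameters()` reports.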

In [None]:
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()


In [None]:
data_collator = DataCollatorForSeq2Seq(
    tokenizer=tokenizer,
    model=peft_model
)

In [None]:
training_args = TrainingArguments(
    output_dir="./peft-dialogue-summary-training",
    auto_find_batch_size=True,
    learning_rate=1e-3,
    num_train_epochs=3,
    fp16=(device == "cuda"),  # mixed precision needs a GPU; disabled on the CPU fallback
    logging_steps=10,
    save_strategy="epoch",
    eval_strategy="epoch",
    report_to="none"
)

In [None]:
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
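    # Per-epoch evaluation runs on the "test" split here; DialogSum also
    # provides a "validation" split that could be used instead.
    eval_dataset=tokenized_datasets["test"],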
    data_collator=data_collator,
)

print("Start training")
trainer.train()


In [None]:
peft_path = "./peft-dialogue-summary-checkpoint"
trainer.model.save_pretrained(peft_path)
tokenizer.save_pretrained(peft_path)

In [None]:
base_model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)
peft_model = PeftModel.from_pretrained(base_model, peft_path).to(device)

idx = 20
prompt = f"Summarize the following conversation.\n\n{data['test'][idx]['dialogue']}\n\nSummary:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

out = peft_model.generate(
    input_ids=input_ids,
    max_new_tokens=200
)
print("Human summary:")
print(data["test"][idx]["summary"])
print("\nPEFT output:")
print(tokenizer.decode(out[0], skip_special_tokens=True))

In [None]:
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

peft_model_outputs = peft_model.generate(
    input_ids=input_ids,
    generation_config=GenerationConfig(max_new_tokens=200, num_beams=1)
)

peft_model_text_output = tokenizer.decode(peft_model_outputs[0], skip_special_tokens=True)

print(f"Human Summary: {data['test'][idx]['summary']}")
print(f"PEFT Model Summary: {peft_model_text_output}")


## 📈 Results & Analysis

### Training Performance
The model was trained for 3 epochs. The loss metrics show convergence:
* **Training Loss:** Decreased from ~1.14 to **1.07**.
* **Validation Loss:** Stabilized around **1.24** (measured on the `test` split, which is what the `Trainer` receives as `eval_dataset`).
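
One way to read these numbers is to convert them to perplexity. This assumes the reported losses are mean per-token cross-entropy in nats, which the text does not state explicitly, so treat the sketch below as an interpretation aid rather than a reported result.

```python
import math

train_loss = 1.07   # final training loss quoted above
eval_loss = 1.24    # evaluation loss quoted above

def perplexity(mean_cross_entropy):
    """Perplexity is the exponential of the mean per-token cross-entropy."""
    return math.exp(mean_cross_entropy)

train_ppl = perplexity(train_loss)
eval_ppl = perplexity(eval_loss)
print(f"train perplexity: {train_ppl:.2f}, eval perplexity: {eval_ppl:.2f}")
```

A perplexity near 3 means that, on average, the model is about as uncertain as if it were choosing among three equally likely next tokens.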

### Qualitative Result
Comparing the output on a test sample (Medical Context):
* **Input:** A conversation about symptoms (itchy, lightheaded).
* **Human Summary:** Mentions chicken pox and hazards.
* **PEFT Model Summary:** Successfully captures the key points: *"#Person1# thinks #Person2# has chicken pox and #Person2# is a biohazard."*
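
A single sample is only anecdotal, so an overlap score over many samples would give a firmer comparison. The sketch below is a simplified unigram-overlap F1 in the spirit of ROUGE-1. It is not the official ROUGE implementation (no stemming, whitespace tokenization only), the function name is hypothetical, and the example strings are made up rather than taken from the dataset.

```python
from collections import Counter

def unigram_f1(reference, candidate):
    """F1 over overlapping lowercase, whitespace-separated tokens."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())   # clipped count of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Made-up strings, used only to exercise the function.
score_same = unigram_f1("the cat sat", "the cat sat")
score_partial = unigram_f1("the cat sat", "the cat")
score_none = unigram_f1("the cat", "a dog")
print(score_same, round(score_partial, 3), score_none)
```

Averaging such a score over the test split, for both the base model and the PEFT model, would turn the qualitative comparison into a number that can be tracked across runs.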

### Conclusion
The implementation suggests that **PEFT/LoRA is effective for this task**. Training only a tiny fraction of the model's parameters produced coherent summaries, which makes LLM customization feasible on modest hardware such as a single T4 GPU.