# NLU + Retrieval + NLG Mini Pipeline (Colab-Ready)

This notebook shows:
1. **NLU**: Intent classification with a small DistilBERT fine-tune
2. **Retrieval**: Use predicted intent to fetch a policy snippet
3. **NLG**: Generate a grounded reply with FLAN-T5
4. **Evaluation**: Accuracy and macro-F1 for NLU; ROUGE-L for NLG

Run top-to-bottom on Google Colab (GPU optional).

In [None]:
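# Install dependencies (Colab)
!pip -q install transformers datasets evaluate accelerate sentencepiece rouge-score
import random, numpy as np, torch

# Seed every RNG so reruns give comparable results
random.seed(42); np.random.seed(42); torch.manual_seed(42)
print('Torch:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())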

## 1) Build a small, realistic intent dataset
- Four intents: `refund`, `order_status`, `complaint`, `change_address`
- 24 hand-written utterances, six per intent, split into train/validation/test for a quick demo.

In [None]:
from datasets import Dataset, DatasetDict
import random
random.seed(42)

intents = {0:'refund',1:'order_status',2:'complaint',3:'change_address'}
samples = [
  ('I want a refund for my purchase', 0),
  ('Please process my refund', 0),
  ('How do I return this and get my money back?', 0),
  ('Can I get a refund? The item arrived damaged', 0),
  ('Initiate refund request', 0),
  ('I returned the item; when will I receive a refund?', 0),
  ('Where is my order?', 1),
  ('Has my package shipped yet?', 1),
  ('Track my delivery please', 1),
  ('What is the current status of my order?', 1),
  ('Provide my tracking number', 1),
  ('Expected delivery date for order 123?', 1),
  ('This product is terrible', 2),
  ('I want to file a complaint about the service', 2),
  ('Your support was unhelpful', 2),
  ('I am unhappy with the quality', 2),
  ('The item is faulty and I am frustrated', 2),
  ('I need to escalate a complaint', 2),
  ('I need to change my shipping address', 3),
  ('Update my delivery address please', 3),
  ('Can I modify the address for my order?', 3),
  ('Change the destination address', 3),
  ('Change address before it ships', 3),
  ('Please correct my street address', 3),
]
random.shuffle(samples)
texts = [s[0] for s in samples]
labels = [s[1] for s in samples]
dataset = Dataset.from_dict({'text': texts, 'label': labels})
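# Hold out 25% of the examples for the final test split
dataset = dataset.train_test_split(test_size=0.25, seed=42)
# Carve 20% of the remaining examples off as a validation split
temp = dataset['train'].train_test_split(test_size=0.2, seed=42)
datasets = DatasetDict(train=temp['train'], validation=temp['test'], test=dataset['test'])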
datasets
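
With only six examples per intent, a purely random split can leave an intent with very few (or no) examples in the validation or test split. The sketch below shows one way to split per label in plain Python. It is an illustration under stated assumptions, not part of the pipeline: `stratified_split` and the `demo` data are hypothetical names introduced here.

```python
import random
from collections import defaultdict

def stratified_split(pairs, test_fraction, seed=42):
    """Split (text, label) pairs so each label contributes to both parts."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in pairs:
        by_label[label].append((text, label))

    train, held_out = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        # Hold out at least one example per label so every class gets evaluated.
        n_held = max(1, round(len(group) * test_fraction))
        held_out.extend(group[:n_held])
        train.extend(group[n_held:])
    rng.shuffle(train)
    rng.shuffle(held_out)
    return train, held_out

# Hypothetical stand-in data: 6 examples for each of 4 labels, shaped like `samples`.
demo = [(f"utterance {i} for label {label}", label)
        for label in range(4) for i in range(6)]
train_part, test_part = stratified_split(demo, test_fraction=0.25)
print(len(train_part), len(test_part))
```

The resulting lists can be passed to `Dataset.from_dict` the same way `texts` and `labels` are above.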

## 2) NLU: DistilBERT fine-tuning for intent classification

In [None]:
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
from transformers import TrainingArguments, Trainer
import evaluate

tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')

def tok(batch):
    return tokenizer(batch['text'], truncation=True, padding='max_length', max_length=64)

tokenized = datasets.map(tok, batched=True)
tokenized = tokenized.remove_columns(['text'])
tokenized.set_format('torch')

model_nlu = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=4)

accuracy = evaluate.load('accuracy')
f1 = evaluate.load('f1')

def compute_metrics(eval_pred):
    """Turn Trainer logits into accuracy and macro-averaged F1."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)  # np is imported in the setup cell
    return {
        'accuracy': accuracy.compute(predictions=preds, references=labels)['accuracy'],
        'f1_macro': f1.compute(predictions=preds, references=labels, average='macro')['f1']
    }

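args = TrainingArguments(
    output_dir='nlu_out',
    eval_strategy='epoch',  # named `evaluation_strategy` in older transformers releases
    save_strategy='no',
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=2,
    weight_decay=0.01,
    seed=42,
    logging_steps=5,
    report_to='none',  # avoid experiment-tracker login prompts on Colab
)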

trainer = Trainer(
    model=model_nlu,
    args=args,
    train_dataset=tokenized['train'],
    eval_dataset=tokenized['validation'],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)
trainer.train()
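# Final check on the held-out test split; metric keys are prefixed with `test_`
eval_metrics = trainer.evaluate(tokenized['test'], metric_key_prefix='test')
eval_metrics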
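
To make the `f1_macro` number easier to interpret, here is a minimal sketch of macro-F1 computed by hand. The labels and predictions are hypothetical, not results from the model above, and scoring an empty class as 0 is an assumption of this sketch.

```python
def macro_f1(references, predictions, num_labels):
    """Average the per-class F1 scores, giving every class equal weight."""
    f1_scores = []
    for label in range(num_labels):
        pairs = list(zip(references, predictions))
        tp = sum(1 for r, p in pairs if r == label and p == label)
        fp = sum(1 for r, p in pairs if r != label and p == label)
        fn = sum(1 for r, p in pairs if r == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted examples scores 0 here (an assumption).
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_labels

# Hypothetical labels and predictions for illustration only.
refs = [0, 0, 1, 1, 2, 3]
preds = [0, 1, 1, 1, 2, 2]
accuracy_value = sum(r == p for r, p in zip(refs, preds)) / len(refs)
macro_f1_value = macro_f1(refs, preds, num_labels=4)
print(round(accuracy_value, 3), round(macro_f1_value, 3))
```

In this example accuracy is about 0.67 while macro-F1 is about 0.53, because class 3 is never predicted and contributes an F1 of 0 to the average.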

## 3) Retrieval: use predicted intent to fetch a policy snippet
The knowledge base here is a plain dict keyed by intent name. Swap it for your own retriever in production.

In [None]:
import torch
label_map = {0:'refund',1:'order_status',2:'complaint',3:'change_address'}
kb = {
  'refund': 'You can request a refund within 30 days of delivery. Refunds are issued within 5 business days after inspection.',
  'order_status': 'Track your order with the tracking link emailed after shipment. Typical delivery is 3–5 business days.',
  'complaint': 'We are sorry for the trouble. Provide your order ID and details so we can investigate and make it right.',
  'change_address': 'You can change the delivery address before the order ships. Contact support with the correct address.'
}
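def predict_intent(text):
    """Classify one utterance and return its intent name."""
    model_nlu.eval()  # disable dropout for inference
    # Trainer may have moved the model to the GPU, so send the inputs to the same device
    enc = tokenizer(text, return_tensors='pt', truncation=True, padding=True).to(model_nlu.device)
    with torch.no_grad():
        logits = model_nlu(**enc).logits
    pred = int(torch.argmax(logits, dim=-1).item())
    return label_map[pred]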
user_msg = "I'd like to return my item; the size is wrong."
intent = predict_intent(user_msg)
context = kb[intent]
intent, context
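
As one possible direction for swapping the dict lookup for a retriever, the sketch below scores each policy text against the query by shared word tokens. Token overlap is only a stand-in for a real ranking method such as BM25 or embedding similarity, and `tokenize`, `retrieve`, and `demo_kb` are hypothetical names introduced for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents):
    """Return the (key, text) pair whose text shares the most tokens with the query."""
    query_tokens = tokenize(query)

    def overlap(item):
        return len(query_tokens & tokenize(item[1]))

    return max(documents.items(), key=overlap)

# Hypothetical mini knowledge base with the same shape as `kb`.
demo_kb = {
    "refund": "You can request a refund within 30 days of delivery.",
    "order_status": "Track your order with the tracking link emailed after shipment.",
    "change_address": "You can change the delivery address before the order ships.",
}
best_key, best_text = retrieve("Where is the tracking link for my order?", demo_kb)
print(best_key)
```

Unlike the intent-keyed lookup, this approach does not depend on the classifier being right, at the cost of matching on surface words only.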

## 4) NLG: Generate a grounded response with FLAN-T5

In [None]:
from transformers import pipeline
generator = pipeline('text2text-generation', model='google/flan-t5-base')
prompt = (
  f"You are a support agent. Using the policy below, write a concise, friendly response to the customer.\n"
  f"Customer: {user_msg}\n"
  f"Policy: {context}\n"
  f"Response:" )
reply = generator(prompt, max_length=120, do_sample=False)[0]['generated_text']
print(reply)
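
The same prompt text is needed again in the end-to-end helper later in the notebook, so it can help to build it in one place. A minimal sketch, assuming a hypothetical helper named `build_prompt` (not part of the original cells):

```python
def build_prompt(user_text, policy):
    """Assemble the support-agent prompt from a customer message and a policy snippet."""
    return (
        "You are a support agent. Using the policy below, write a concise, "
        "friendly response to the customer.\n"
        f"Customer: {user_text}\n"
        f"Policy: {policy}\n"
        "Response:"
    )

# Hypothetical inputs for illustration.
demo_prompt = build_prompt("Where is my order?", "Typical delivery is 3-5 business days.")
print(demo_prompt)
```

Keeping the template in a single function means any wording change applies to every call site.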

## 5) NLG Evaluation: ROUGE-L vs. a simple reference

In [None]:
import evaluate
rouge = evaluate.load('rouge')
reference = "You can request a refund within 30 days. Refunds are processed within 5 business days after we receive the item."
scores = rouge.compute(predictions=[reply], references=[reference])
# `scores` also holds rouge1, rouge2, and rougeLsum; report the ROUGE-L F-measure here
scores['rougeL']
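
ROUGE-L is based on the longest common subsequence (LCS) between the generated text and the reference. The sketch below computes an LCS-based F1 by hand to show the idea. Its simple regex tokenization is an assumption and may not match the `rouge-score` package exactly, and the two strings are hypothetical rather than output from the generator.

```python
import re

def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for token_a in a:
        curr = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(prediction, reference):
    """F1 of LCS-based precision and recall over lowercase word tokens."""
    pred_tokens = re.findall(r"[a-z0-9]+", prediction.lower())
    ref_tokens = re.findall(r"[a-z0-9]+", reference.lower())
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Hypothetical strings for illustration only.
score = rouge_l_f1("refunds are issued within 5 days",
                   "refunds are processed within 5 business days")
print(round(score, 3))
```

Here the LCS has five tokens (`refunds are within 5 days`), giving precision 5/6, recall 5/7, and an F1 of about 0.77.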

## 6) Inference helper: end-to-end function

In [None]:
def respond(user_text):
    intent = predict_intent(user_text)
    context = kb[intent]
    prompt = (
      f"You are a support agent. Using the policy below, write a concise, friendly response to the customer.\n"
      f"Customer: {user_text}\n"
      f"Policy: {context}\n"
      f"Response:" )
    out = generator(prompt, max_length=120, do_sample=False)[0]['generated_text']
    return {'intent': intent, 'policy': context, 'reply': out}
respond("My package hasn't arrived. Can you tell me where it is?")
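
The helper above always trusts the top predicted intent, which is risky for a classifier trained on so few examples. One option is to fall back to a generic path when the top probability is low. The sketch below works on raw logits in plain Python. The threshold of 0.5, the `"unknown"` label, and the function names are all assumptions for illustration, and the logits are hypothetical rather than model outputs.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities, subtracting the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pick_intent(logits, labels, threshold=0.5, fallback="unknown"):
    """Return (label, confidence); use the fallback when the top probability is below threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = labels[best] if probs[best] >= threshold else fallback
    return label, probs[best]

intent_labels = ["refund", "order_status", "complaint", "change_address"]
# Hypothetical logits, not outputs from the model trained above.
confident = pick_intent([4.0, 0.5, 0.2, 0.1], intent_labels)
uncertain = pick_intent([0.3, 0.2, 0.25, 0.1], intent_labels)
print(confident[0], uncertain[0])
```

In a pipeline like this one, an `"unknown"` result could route the message to a clarifying question or a human agent instead of a policy lookup.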

### Notes
- This is a **toy** dataset; for robust NLU, train on a larger labeled set such as Banking77 or CLINC150.
- For production NLG, add **grounding** (RAG), **templates**, and **post-generation checks**.
- Consider parameter-efficient fine-tuning (for example, LoRA via the PEFT library) for faster, cheaper training.
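
As an example of the post-generation checks mentioned in the notes, the sketch below flags numbers in a reply that do not appear in the policy text, a cheap guard against invented deadlines or amounts. It is a minimal illustration, not a full faithfulness check, and `unsupported_numbers` plus the sample strings are hypothetical.

```python
import re

def unsupported_numbers(reply, policy):
    """Return numbers that appear in the reply but not in the policy text."""
    policy_numbers = set(re.findall(r"\d+", policy))
    return [n for n in re.findall(r"\d+", reply) if n not in policy_numbers]

# Hypothetical policy and replies for illustration.
policy = "You can request a refund within 30 days of delivery."
ok_reply = "You have 30 days from delivery to request a refund."
bad_reply = "You have 60 days from delivery to request a refund."

print(unsupported_numbers(ok_reply, policy))
print(unsupported_numbers(bad_reply, policy))
```

A reply with a non-empty result could be regenerated or replaced with a template built directly from the policy.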
