# 02 - Baseline with FLAN-T5-base (no training)

Here we evaluate the foundation model `google/flan-t5-base` **without any fine-tuning**.  
We use a 200-example subset of the validation set to measure **ROUGE-L** and **BLEU** and establish a starting point.  
The goal is to understand how much the model can already do on its own before any adjustment.

**Expected output**
- `outputs/baseline_val200.jsonl` (inputs, references, and baseline predictions)
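
To make ROUGE-L concrete, here is a minimal sketch that scores one prediction against one reference using the longest common subsequence (LCS). It assumes lowercase whitespace tokenization, and `lcs_length` and `rouge_l_f1` are hypothetical helper names used only for illustration. The `evaluate` implementation used later in this notebook applies its own tokenization, so its values will not match this sketch exactly.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction, reference):
    """LCS-based F1 between two strings (assumption: whitespace tokens)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)  # share of predicted tokens in the LCS
    recall = lcs / len(ref_tokens)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)
```

A short prediction that only repeats part of a long reference gets high precision but low recall, which keeps the F1 score low.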


In [1]:
%pip install -q transformers datasets evaluate rouge-score sacrebleu accelerate torch
%pip install -q ipywidgets



Note: you may need to restart the kernel to use updated packages.


In [2]:
import json
import os

import torch
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

DATA_DIR = "../data"
VAL_PATH = os.path.join(DATA_DIR, "val.jsonl")
MODEL_NAME = "google/flan-t5-base"
# Maximum input tokens (longer inputs are truncated) and maximum new tokens to generate.
MAX_INPUT, MAX_GEN = 128, 256

# Prefer CUDA, then Apple MPS, and fall back to CPU.
device = "cuda" if torch.cuda.is_available() else ("mps" if torch.backends.mps.is_available() else "cpu")
print("Device:", device)

Device: mps


In [3]:
ds = load_dataset("json", data_files={"val": VAL_PATH})
# Shuffle with a fixed seed so the same subset is selected on every run.
ds["val"] = ds["val"].shuffle(seed=42)
N_SAMPLE = min(200, len(ds["val"]))
sample = ds["val"].select(range(N_SAMPLE))
len(sample)

200

In [4]:
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME).to(device)
model.config.use_cache = True

In [5]:
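def generate_batch(texts, batch_size=8):
    """Generate one prediction per input text, batch_size texts at a time."""
    preds = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        # Pad to the longest input in the batch; truncate inputs beyond MAX_INPUT tokens.
        enc = tokenizer(batch, return_tensors="pt", padding=True, truncation=True, max_length=MAX_INPUT).to(device)
        with torch.no_grad():  # inference only, no gradients needed
            out = model.generate(**enc, max_new_tokens=MAX_GEN)
        preds.extend(tokenizer.batch_decode(out, skip_special_tokens=True))
    return preds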
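
The slicing pattern used in `generate_batch` can be checked in isolation, without loading the model. The sketch below is only an illustration of that pattern; `chunk` is a hypothetical helper that is not part of the notebook.

```python
def chunk(items, batch_size):
    """Split a sequence into consecutive batches; the last one may be shorter."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# 200 inputs with batch_size=8 give 25 full batches.
batches = chunk(list(range(200)), 8)
print(len(batches), len(batches[-1]))
```

When the number of inputs is not a multiple of `batch_size`, the final slice is simply shorter, so no input is dropped.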

In [6]:
preds = generate_batch(sample["input_text"])
refs  = sample["target_text"]

print("Example:")
print("Input:", sample[0]["input_text"][:120], "...")
print("Pred :", preds[0][:200], "...")
print("Ref  :", refs[0][:200], "...")

Example:
Input: Given a product title, generate its product description.
Title: Islandoffer Color Pearl Nail Art Stone Small Wheel Rhine ...
Pred : Islandoffer Color Pearl Nail Art Stone Small Wheel Rhinestones Beads ...
Ref  : Make your nails look elegance and special,Suitable to use on top of nail polish, UV builder gel, acrylic, etc,Also can be used to decorate your home and furniture, cell phone and mp3 cases, glasses, c ...


In [7]:
import evaluate

rouge = evaluate.load("rouge")
bleu = evaluate.load("sacrebleu")

rouge_res = rouge.compute(predictions=preds, references=refs, use_aggregator=True)
# sacrebleu expects a list of reference lists, one inner list per prediction.
bleu_res = bleu.compute(predictions=preds, references=[[r] for r in refs])

# ROUGE-L is reported on a 0-1 scale; sacrebleu reports BLEU on a 0-100 scale.
print("ROUGE-L:", round(rouge_res["rougeL"], 4))
print("BLEU:", round(bleu_res["score"], 2))


ROUGE-L: 0.1228
BLEU: 0.0
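
A BLEU of 0.0 next to a non-zero ROUGE-L is plausibly driven, at least in part, by BLEU's brevity penalty. In the example shown earlier, the prediction is about as long as the product title, while the reference is a full description. The sketch below implements the standard brevity-penalty formula. The token counts are made-up illustration values, not measurements from this run, and `brevity_penalty` is a hypothetical helper name. The dictionary returned by the sacrebleu metric also includes `bp`, `sys_len`, and `ref_len`, which can be inspected to confirm or rule out this explanation.

```python
import math


def brevity_penalty(sys_len, ref_len):
    """BLEU brevity penalty: 1.0 unless the system output is shorter than the reference."""
    if sys_len == 0:
        return 0.0
    if sys_len >= ref_len:
        return 1.0
    return math.exp(1 - ref_len / sys_len)


# Hypothetical corpus-level token counts, for illustration only.
REF_LEN = 1000
for sys_len in (1000, 500, 100):
    print(sys_len, brevity_penalty(sys_len, REF_LEN))
```

The penalty multiplies the n-gram precision term, so outputs that are ten times shorter than the references are pushed close to zero regardless of how many n-grams match.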


In [8]:
os.makedirs("../outputs", exist_ok=True)
with open("../outputs/baseline_val200.jsonl", "w", encoding="utf-8") as f:
    for x, p, r in zip(sample["input_text"], preds, refs):
        f.write(json.dumps({"input": x, "pred": p, "ref": r}, ensure_ascii=False) + "\n")
print("Baseline saved to outputs/baseline_val200.jsonl")


Baseline saved to outputs/baseline_val200.jsonl
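
To inspect the saved predictions later, or to compare them against a fine-tuned model, the file can be read back one JSON object per line. This is a minimal sketch; `load_jsonl` is a hypothetical helper that is not part of the notebook.

```python
import json


def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

Calling `load_jsonl("../outputs/baseline_val200.jsonl")` from the notebook's working directory should return one dict per example, with the keys `input`, `pred`, and `ref` written by the cell above.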
