# Chapter 07: Instruction Tuning

This notebook accompanies Chapter 07 of the **Fazendo um LLM do Zero** ("Building an LLM from Scratch") series.

In this notebook we will teach the model to follow human instructions.

🎯 **Goals of this notebook:**
- Interpret instructions
- Answer questions
- Produce targeted responses

This is the step that turns a base model, which only continues text, into a conversational assistant.


## 1. Setup and Configuration

In [None]:
# ============================================================
# Repository setup
# ============================================================
import os

REPO_URL = "https://github.com/vongrossi/fazendo-um-llm-do-zero.git"
REPO_DIR = "fazendo-um-llm-do-zero"

# Skip cloning and chdir if this cell already ran (we are already inside the repo);
# otherwise a re-run would clone a nested copy
if os.path.basename(os.getcwd()) != REPO_DIR:
    if not os.path.exists(REPO_DIR):
        !git clone {REPO_URL}
    os.chdir(REPO_DIR)

print("Current directory:", os.getcwd())


### 1.1 Dependencies and Imports

In [None]:
!pip -q install -r 07-instruction-tuning/requirements.txt

import torch
import torch.nn.functional as F
import torch.nn as nn
import random
import numpy as np
import sys

# Add the repository root to the path so local modules (lib/...) can be imported
sys.path.append(os.getcwd())

device = "cuda" if torch.cuda.is_available() else "cpu"
print("Device:", device)

def seed_everything(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

seed_everything(42)


## 2. Loading the Base Model

Import the GPTMini model built in the previous chapters:

In [None]:
from lib.gptmini import GPTConfig, GPTMini

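Load the weights of the model pre-trained in Chapter 05. Keep in mind that section 3.3 builds a new character-level vocabulary for this notebook. Unless that vocabulary matches the tokenizer used in Chapter 05, the token IDs will differ, and the pre-trained embeddings will not line up with the new characters.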

In [None]:
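checkpoint_path = "05-pre-treinamento/gpt_checkpoint.pt"

# Must match the configuration used to create the Chapter 05 checkpoint
config = GPTConfig(
    vocab_size=2000,
    context_size=64,
    d_model=128,
    n_heads=4,
    n_layers=2
)

backbone = GPTMini(config).to(device)

try:
    ckpt = torch.load(checkpoint_path, map_location=device)
    # strict=False tolerates missing/extra keys, so report them instead of hiding them
    result = backbone.load_state_dict(ckpt, strict=False)
    print("Chapter 05 weights loaded ✅")
    print("Missing keys:", result.missing_keys)
    print("Unexpected keys:", result.unexpected_keys)
except (FileNotFoundError, RuntimeError) as err:
    # FileNotFoundError: no checkpoint; RuntimeError: shape mismatch with the config above
    print(f"Could not load checkpoint ({err}); using the randomly initialized model")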


## 3. Instruction Dataset

### 3.1 Sample Dataset

In [None]:
dataset = [
    {
        "instruction": "Explique o que é Machine Learning",
        "response": "Machine Learning é uma área da inteligência artificial que permite que sistemas aprendam padrões a partir de dados."
    },
    {
        "instruction": "Traduza para inglês: bom dia",
        "response": "Good morning"
    },
    {
        "instruction": "Explique o que é um token em LLMs",
        "response": "Token é uma unidade de texto convertida em representação numérica usada pelo modelo."
    },
    {
        "instruction": "Resuma: Modelos GPT usam Transformers",
        "response": "Modelos GPT utilizam arquitetura Transformer para processar linguagem."
    }
]


### 3.2 Structured Prompt Function

In [None]:
def format_prompt(item):
    return f"""
### Instrução:
{item['instruction']}

### Resposta:
{item['response']}
""".strip()


### 3.3 Simple Tokenization (Character-Level)

In [None]:
all_text = "".join([format_prompt(d) for d in dataset])

chars = sorted(set(all_text))
stoi = {c:i for i,c in enumerate(chars)}
itos = {i:c for c,i in stoi.items()}

vocab_size = len(chars)
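
The vocabulary is the sorted set of characters that appear in the formatted dataset, so its size depends on the data. It does not depend on the `vocab_size=2000` configured for GPTMini. The standalone sketch below applies the same construction to a toy string. The helper name `build_char_vocab` and the input string are illustrative only.

```python
def build_char_vocab(text: str):
    """Map each distinct character to an integer id, ordered by code point."""
    chars = sorted(set(text))
    stoi = {c: i for i, c in enumerate(chars)}
    itos = {i: c for c, i in stoi.items()}
    return stoi, itos


stoi_demo, itos_demo = build_char_vocab("abacate")
print(stoi_demo)       # {'a': 0, 'b': 1, 'c': 2, 'e': 3, 't': 4}
print(len(stoi_demo))  # 5
```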


### 3.4 Encoder/Decoder

In [None]:
def encode(text):
    return [stoi[c] for c in text if c in stoi]

def decode(tokens):
    return "".join([itos[t] for t in tokens])


### 3.5 Building the Dataset with Masking

In [None]:
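def build_instruction_dataset(data, context_size=64):
    """Slice each example into sliding windows of context_size tokens.

    mask[j] = 1 when the target y[j] belongs to the response and 0 when it
    belongs to the prompt, so the loss only covers the response.
    Examples shorter than context_size + 1 tokens yield no windows.
    """
    X, Y, mask = [], [], []

    for item in data:
        prompt = f"### Instrução:\n{item['instruction']}\n\n### Resposta:\n"
        full = prompt + item["response"]

        prompt_len = len(encode(prompt))
        full_tokens = encode(full)

        # Window i predicts full_tokens[i+1 : i+context_size+1]
        for i in range(len(full_tokens) - context_size):
            x = full_tokens[i:i + context_size]
            y = full_tokens[i + 1:i + context_size + 1]

            # Target y[j] is token i+j+1 of the full sequence
            m = [1 if i + j + 1 >= prompt_len else 0 for j in range(context_size)]

            if sum(m) == 0:
                continue  # window has no response targets, nothing to learn

            X.append(x)
            Y.append(y)
            mask.append(m)

    return (
        torch.tensor(X, dtype=torch.long),
        torch.tensor(Y, dtype=torch.long),
        torch.tensor(mask, dtype=torch.float),
    )

context_size = config.context_size
X, Y, MASK = build_instruction_dataset(dataset, context_size)
print("Training windows:", X.size(0))

X, Y, MASK = X.to(device), Y.to(device), MASK.to(device)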
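
The mask has to follow the shifted targets. Position `j` of window `i` is a response target exactly when `i + j + 1` is at least the prompt length. The pure-Python sketch below applies that rule to a toy sequence in which each character is its own token. The helper name `window_masks` is hypothetical.

```python
def window_masks(prompt: str, response: str, context_size: int):
    """Return (x, y, mask) for every sliding window over prompt + response."""
    full = prompt + response
    prompt_len = len(prompt)
    out = []
    for i in range(len(full) - context_size):
        x = full[i:i + context_size]
        y = full[i + 1:i + context_size + 1]
        # y[j] is character i + j + 1 of the full text
        m = [1 if i + j + 1 >= prompt_len else 0 for j in range(context_size)]
        out.append((x, y, m))
    return out


for x_win, y_win, m_win in window_masks("Q:", "ok", context_size=2):
    print(repr(x_win), "->", repr(y_win), m_win)
# 'Q:' -> ':o' [0, 1]
# ':o' -> 'ok' [1, 1]
```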


## 4. Instruction-Tuned Model

Wrapper with a masked loss. Prompt tokens are excluded, so only the response tokens contribute to the loss.

In [None]:
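class InstructionGPT(nn.Module):
    """Wraps the backbone and computes a loss restricted to response tokens."""

    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone

    def forward(self, x, y=None, mask=None):

        logits, _ = self.backbone(x)

        loss = None

        if y is not None:

            # Per-token loss, shape (batch * seq_len,)
            loss = F.cross_entropy(
                logits.reshape(-1, logits.size(-1)),
                y.reshape(-1),
                reduction="none"
            )

            if mask is None:
                loss = loss.mean()
            else:
                # Average over response tokens only; a plain mean() would
                # dilute the loss with the zeroed prompt positions
                mask = mask.reshape(-1).float()
                loss = (loss * mask).sum() / mask.sum().clamp(min=1.0)

        return logits, loss

model = InstructionGPT(backbone).to(device)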
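
Why normalize by the number of response tokens instead of taking a plain mean? A few made-up per-token losses show the difference. If the prompt positions are zeroed and the result is averaged over all positions, the loss (and its gradient) shrinks in proportion to the prompt length. The numbers below are illustrative only, and both helper names are hypothetical.

```python
def masked_mean(losses, mask):
    """Average per-token losses over positions where mask == 1."""
    total = sum(loss_i * m_i for loss_i, m_i in zip(losses, mask))
    count = sum(mask)
    return total / max(count, 1)


def diluted_mean(losses, mask):
    """Zero out masked positions, then divide by *all* positions."""
    return sum(loss_i * m_i for loss_i, m_i in zip(losses, mask)) / len(losses)


token_losses = [2.0, 1.0, 3.0, 4.0]  # illustrative per-token cross-entropy values
response_mask = [0, 0, 1, 1]         # first two targets belong to the prompt

print(masked_mean(token_losses, response_mask))   # 3.5
print(diluted_mean(token_losses, response_mask))  # 1.75
```

In PyTorch, a common equivalent is to set the prompt targets to `-100` and call `F.cross_entropy` with its default `ignore_index=-100` and `reduction="mean"`. This averages over the non-ignored targets only.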


## 5. SFT (Supervised Fine-Tuning) Training

Optimizer

In [None]:
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)


### 5.1 Training Loop

In [None]:
loss_history = []
batch_size = 8

model.train()
for step in range(300):

    # Sample a random mini-batch of windows
    idx = torch.randint(0, X.size(0), (batch_size,), device=X.device)
    xb, yb, mb = X[idx], Y[idx], MASK[idx]

    logits, loss = model(xb, yb, mb)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    loss_history.append(loss.item())

    if step % 50 == 0:
        print(f"step {step:3d} | loss {loss.item():.4f}")


### 5.2 Visualizing the Loss

In [None]:
import matplotlib.pyplot as plt

plt.plot(loss_history)
plt.title("Loss Instruction Tuning")
plt.show()
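
Each point in `loss_history` comes from a random batch of 8 windows, so the curve is noisy. A simple moving average makes the trend easier to read. The helper `moving_average` below is a hypothetical addition, not part of the original notebook, and its window size is an arbitrary choice.

```python
def moving_average(values, window=20):
    """Mean of each run of `window` consecutive values (len(values) - window + 1 points)."""
    if len(values) < window:
        return []
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the new value, drop the oldest one
        running += values[i] - values[i - window]
        out.append(running / window)
    return out


print(moving_average([4.0, 3.0, 2.0, 1.0], window=2))  # [3.5, 2.5, 1.5]
```

In the notebook, the smoothed curve could be drawn with `plt.plot(moving_average(loss_history))`.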


## 6. Comparison and Testing

Generation function (greedy decoding):

In [None]:
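@torch.no_grad()
def generate(model, start, max_tokens=80, context_size=64):
    """Greedy decoding: always append the most likely next character."""
    model.eval()

    tokens = encode(start)
    tokens = torch.tensor(tokens, dtype=torch.long).unsqueeze(0).to(device)

    for _ in range(max_tokens):
        # GPTMini was configured for context_size positions, so crop the input
        logits, _ = model(tokens[:, -context_size:])
        # The backbone has config.vocab_size outputs, but only the first
        # vocab_size ids (section 3.3) map to characters that decode() knows
        next_token = torch.argmax(logits[:, -1, :vocab_size], dim=-1)
        tokens = torch.cat([tokens, next_token.unsqueeze(1)], dim=1)

    return decode(tokens.squeeze(0).tolist())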
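
Greedy decoding with a fixed context window can be sketched independently of the network. The code below reproduces the loop structure with a stand-in `next_token_fn`, so two details can be checked in isolation. First, only the last `context_size` tokens are fed back. Second, generation can optionally stop at a designated token. `greedy_generate`, `toy_model`, and the `stop_token` option are hypothetical. In particular, the notebook's `generate` has no stop token.

```python
from typing import Callable, List, Optional


def greedy_generate(
    next_token_fn: Callable[[List[int]], int],
    tokens: List[int],
    max_new: int,
    context_size: int,
    stop_token: Optional[int] = None,
) -> List[int]:
    """Append one token at a time, feeding at most context_size tokens back in."""
    tokens = list(tokens)
    for _ in range(max_new):
        window = tokens[-context_size:]  # crop to the model's context length
        nxt = next_token_fn(window)      # stands in for argmax over logits
        tokens.append(nxt)
        if stop_token is not None and nxt == stop_token:
            break
    return tokens


def toy_model(window: List[int]) -> int:
    # Hypothetical deterministic rule standing in for a trained model
    return sum(window) % 5


print(greedy_generate(toy_model, [1, 2], max_new=3, context_size=2))                # [1, 2, 3, 0, 3]
print(greedy_generate(toy_model, [1, 2], max_new=3, context_size=2, stop_token=0))  # [1, 2, 3, 0]
```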


### 6.1 Testing the Model

In [None]:
prompt = "### Instrução:\nExplique o que é Machine Learning\n\n### Resposta:\n"

print(generate(model, prompt))
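
The generation loop runs for a fixed `max_tokens`, so the output contains the prompt, the answer, and whatever the model generates afterwards. A small post-processing step can isolate the answer. `extract_response` is a hypothetical helper. Cutting at the next `###` is an assumption about where an answer ends, not a guarantee.

```python
def extract_response(generated: str, marker: str = "### Resposta:\n") -> str:
    """Return the text after the first response marker, cut at the next '###'."""
    if marker not in generated:
        return ""
    answer = generated.split(marker, 1)[1]
    return answer.split("###", 1)[0].strip()


sample = (
    "### Instrução:\nTraduza para inglês: bom dia\n\n"
    "### Resposta:\nGood morning\n\n### Instrução:\nExpl"
)
print(extract_response(sample))  # Good morning
```

In the notebook, this would be used as `print(extract_response(generate(model, prompt)))`.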


## 7. Saving the Final Checkpoint

In [None]:
torch.save(model.state_dict(), "07_instruction_gpt.pt")
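
The checkpoint stores only the weights. Because the token IDs come from the character vocabulary built in section 3.3, reusing the model later also requires that exact `stoi` mapping. One way to keep the two together is to write the vocabulary to a JSON file next to the checkpoint. The file name `07_vocab.json` and the helpers `save_vocab`/`load_vocab` are hypothetical. JSON object keys must be strings, so the sketch saves `stoi` (character → id) and rebuilds `itos` when loading.

```python
import json
import os
import tempfile


def save_vocab(stoi: dict, path: str) -> None:
    # ensure_ascii=False keeps accented characters readable in the file
    with open(path, "w", encoding="utf-8") as f:
        json.dump(stoi, f, ensure_ascii=False)


def load_vocab(path: str):
    with open(path, encoding="utf-8") as f:
        stoi = json.load(f)
    itos = {i: c for c, i in stoi.items()}  # rebuild with integer keys
    return stoi, itos


demo_stoi = {"\n": 0, " ": 1, "ç": 2, "ã": 3}
with tempfile.TemporaryDirectory() as tmp:
    vocab_path = os.path.join(tmp, "07_vocab.json")
    save_vocab(demo_stoi, vocab_path)
    loaded_stoi, loaded_itos = load_vocab(vocab_path)

print(loaded_stoi == demo_stoi)  # True
print(loaded_itos[2])            # ç
```

In the notebook itself, the call would be `save_vocab(stoi, "07_vocab.json")`, placed right after `torch.save`.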


## 8. Wrapping Up the Series

In this chapter you taught a model to:

• Interpret instructions  
• Produce targeted responses  
• Learn conversational behavior  

You have now covered the whole journey:

Text → Tokens → Attention → GPT → Training → Fine-Tuning → Instruction Tuning

This is the fundamental pipeline behind modern LLM-based assistants.
