<a href="https://colab.research.google.com/github/vongrossi/fazendo-um-llm-do-zero/blob/main/00-passo-zero/notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Step Zero: Colab, PyTorch, and Fundamentals

Goals:
- understand what Colab is and how to use it
- validate the environment (Python + libraries)
- understand tensors (the foundation of deep learning)
- learn to control reproducibility (seed)

In [15]:
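# === Repository setup ===
import os

REPO_URL = "https://github.com/vongrossi/fazendo-um-llm-do-zero.git"
REPO_DIR = "fazendo-um-llm-do-zero"

# Skip clone/chdir when the cell is re-run from inside the repository,
# otherwise a second copy would be cloned into the first one.
if os.path.basename(os.getcwd()) != REPO_DIR:
    if not os.path.exists(REPO_DIR):
        !git clone {REPO_URL}
    os.chdir(REPO_DIR)

print("Current directory:", os.getcwd())
print("Contents:", os.listdir("."))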


Cloning into 'fazendo-um-llm-do-zero'...
remote: Enumerating objects: 25, done.

### Checking versions and environment (Code)

In [16]:
import sys, platform

print("Python:", sys.version)
print("Platform:", platform.platform())


Python: 3.12.12 (main, Oct 10 2025, 08:52:57) [GCC 11.4.0]
Platform: Linux-6.6.105+-x86_64-with-glibc2.35


### Installing dependencies (optional) (Code)

On Colab this step is usually unnecessary.
Run it only if you want to make sure every package in `00-passo-zero/requirements.txt` is installed.

In [18]:

!pip -q install -r 00-passo-zero/requirements.txt



### Importing libraries + device (Code)

In [19]:
import torch
import numpy as np

device = "cuda" if torch.cuda.is_available() else "cpu"
print("Torch:", torch.__version__)
print("Device:", device)


Torch: 2.9.0+cpu
Device: cpu


### First tensor (Code)

In [20]:
x = torch.tensor([1.0, 2.0, 3.0], device=device)
y = x * 2

print("x:", x)
print("y:", y)


x: tensor([1., 2., 3.])
y: tensor([2., 4., 6.])
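
Besides its values, every tensor carries a dtype, a shape, and a device. The sketch below inspects them and converts to and from NumPy. It runs on the CPU for simplicity, and the names `arr`, `t`, and `back` are illustrative only.

```python
import numpy as np
import torch

x = torch.tensor([1.0, 2.0, 3.0])
print("dtype:", x.dtype)    # Python floats become torch.float32 by default
print("shape:", x.shape)
print("device:", x.device)

# NumPy arrays default to float64; from_numpy keeps that dtype
# and shares memory with the original array.
arr = np.array([1.0, 2.0, 3.0])
t = torch.from_numpy(arr)
print("from NumPy dtype:", t.dtype)

# Back to NumPy after an elementwise operation (CPU tensors only).
back = (t * 2).numpy()
print("back in NumPy:", back)
```

A mismatch between `float32` and `float64` is a common source of errors when mixing NumPy data with model weights, so it is worth checking `dtype` early.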


### Shape and dimensions (Code)

In [21]:
a = torch.randn(2, 3, device=device)
b = torch.randn(3, 4, device=device)

print("a.shape:", a.shape)
print("b.shape:", b.shape)
print("a @ b shape:", (a @ b).shape)  # matrix multiplication


a.shape: torch.Size([2, 3])
b.shape: torch.Size([3, 4])
a @ b shape: torch.Size([2, 4])
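
Matrix multiplication only works when the inner dimensions agree: `(2, 3) @ (3, 4)` is valid because both inner sizes are 3. A minimal sketch of what happens when they do not match, and one way to fix it; the tensor `c` is an illustrative addition, not part of the notebook.

```python
import torch

a = torch.randn(2, 3)
c = torch.randn(2, 3)  # (2, 3) @ (2, 3): inner dimensions 3 and 2 differ

try:
    a @ c
    matmul_failed = False
except RuntimeError as err:
    matmul_failed = True
    print("shape error:", err)

# Transposing c gives shape (3, 2), so (2, 3) @ (3, 2) -> (2, 2).
result = a @ c.T
print("a @ c.T shape:", result.shape)
```

Printing `.shape` before a failing operation is usually the quickest way to debug this kind of error.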


### Seed (reproducibility) (Code)

In [25]:
from importlib import reload
import sys

sys.path.append("00-passo-zero")

import colab_setup
reload(colab_setup)

from colab_setup import seed_everything


seed_everything(42)
t1 = torch.randn(3)

seed_everything(42)
t2 = torch.randn(3)

print("t1:", t1)
print("t2:", t2)
print("equal?", torch.allclose(t1, t2))


t1: tensor([0.3367, 0.1288, 0.2345])
t2: tensor([0.3367, 0.1288, 0.2345])
equal? True
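
The `seed_everything` helper comes from `colab_setup.py`, which is not shown in this notebook. Below is a minimal sketch of what a helper like this typically does, assuming it seeds Python's `random`, NumPy, and PyTorch. The name `seed_everything_sketch` is hypothetical and the real implementation may differ.

```python
import random

import numpy as np
import torch


def seed_everything_sketch(seed: int) -> None:
    """Hypothetical stand-in for colab_setup.seed_everything (an assumption)."""
    random.seed(seed)        # Python's built-in RNG
    np.random.seed(seed)     # NumPy's global RNG
    torch.manual_seed(seed)  # PyTorch RNG on all devices


seed_everything_sketch(42)
first = (random.random(), np.random.rand(), torch.randn(3))

seed_everything_sketch(42)
second = (random.random(), np.random.rand(), torch.randn(3))

print("python equal:", first[0] == second[0])
print("numpy equal: ", first[1] == second[1])
print("torch equal: ", torch.equal(first[2], second[2]))
```

Each library keeps its own random state, which is why one call per library is needed. Seeding only PyTorch would leave NumPy-based shuffling unreproducible.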


## Why does this matter?

LLMs are essentially enormous mathematical functions.
Before tokenization, attention, and training, everything has to become:

- number → tensor → operations → gradients → weight updates

Once you have a firm grasp of this, the rest stops being magic.
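
The chain above can be traced end to end in a few lines. The sketch below is an illustration only: the toy data, the single weight `w`, and the learning rate `lr` are assumptions made for this example, not part of the course code.

```python
import torch

# number -> tensor: toy data for the relationship y = 2 * x
x = torch.tensor([1.0, 2.0, 3.0])
y_true = torch.tensor([2.0, 4.0, 6.0])

# A single trainable weight, starting from a wrong guess.
w = torch.tensor(0.0, requires_grad=True)
lr = 0.05  # learning rate, chosen arbitrarily for this sketch

# operations: prediction and mean squared error
loss_before = ((w * x - y_true) ** 2).mean()

# gradients: autograd computes d(loss)/dw
loss_before.backward()

# weight update: one step of plain gradient descent
with torch.no_grad():
    w -= lr * w.grad

loss_after = ((w * x - y_true) ** 2).mean()
print("loss before:", loss_before.item())
print("loss after: ", loss_after.item())
print("w after one step:", w.item())
```

A single step moves `w` from 0 toward 2 and lowers the loss. Training an LLM repeats this same loop with billions of weights and an optimizer more elaborate than plain gradient descent.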
