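# learn_llm 4-Week Hands-On Notebook
This notebook is organized by week:
- Week1: Theory and attention (NumPy implementation)
- Week2: Engineering and inference (Hugging Face pipeline example)
- Week3: Fine-tuning example (small dataset, Trainer skeleton)
- Week4: Deployment (minimal FastAPI example)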

## Environment Setup
Before the first run, install the dependencies as described in the README:
```bash
pip install -r requirements.txt
```
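To confirm the installation worked before running the cells, a quick import check can help. The sketch below is only an assumption about what `requirements.txt` contains: the package list is inferred from the imports used in this notebook, and `missing_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Assumed package list, inferred from the imports used in this notebook;
# the actual requirements.txt may differ.
ASSUMED_PACKAGES = ["numpy", "transformers", "datasets", "fastapi", "pydantic", "uvicorn"]

def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]

missing = missing_packages(ASSUMED_PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All assumed packages are importable.")
```

`find_spec` only looks a package up without importing it, so the check stays fast even for heavy libraries.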

In [5]:
# Week1: Scaled Dot-Product Attention implemented in NumPy
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """
    Q: (seq_q, d_k) or (batch, seq_q, d_k)
    K: (seq_k, d_k) or (batch, seq_k, d_k)
    V: (seq_k, d_v) or (batch, seq_k, d_v)
    mask: optional boolean array broadcastable to the scores; True keeps a position
    Returns: the attention output and the attention weights
    """
    # Simple shape consistency check to make debugging easier
    assert K.shape[-2] == V.shape[-2], (
        f'K and V have different sequence lengths: {K.shape[-2]} vs {V.shape[-2]}'
    )
    # Compute QK^T; swapping the last two axes handles both 2-D and batched 3-D inputs
    scores = np.matmul(Q, np.swapaxes(K, -1, -2))
    # Scale by sqrt(d_k)
    d_k = K.shape[-1]
    scores = scores / np.sqrt(d_k)
    # Optional mask: positions where mask is False get a large negative score
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    # Numerically stable softmax over the key axis
    exp = np.exp(scores - np.max(scores, axis=-1, keepdims=True))
    weights = exp / np.sum(exp, axis=-1, keepdims=True)
    output = np.matmul(weights, V)
    return output, weights

# Small example
Q = np.array([[1.,0.,0.],[0.,1.,0.]])[None,:,:]  # (1,2,3)
K = Q.copy()
V = np.array([[1.,0.],[0.,1.]])[None,:,:]  # (1,2,2); sequence length matches K
out, attn = scaled_dot_product_attention(Q, K, V)
print('out.shape=', out.shape)

out.shape= (1, 2, 2)
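The small example above never exercises the `mask` argument. The sketch below retraces the same computation in plain Python (standard library only, so each step stays visible) and applies a causal mask. It assumes the same convention as `np.where(mask, scores, -1e9)`: `True` keeps a position. The helper names `softmax` and `attention_weights` are illustrative and not part of the notebook.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(Q, K, mask=None):
    """Row-wise softmax of Q K^T / sqrt(d_k) for 2-D lists; mask[i][j] True keeps key j."""
    d_k = len(K[0])
    weights = []
    for i, q in enumerate(Q):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        if mask is not None:
            # Masked-out keys get a large negative score, so their weight underflows to 0
            scores = [s if keep else -1e9 for s, keep in zip(scores, mask[i])]
        weights.append(softmax(scores))
    return weights

Q = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
K = [row[:] for row in Q]
# Causal (lower-triangular) mask: query i may attend only to keys j <= i
causal = [[j <= i for j in range(len(K))] for i in range(len(Q))]

plain = attention_weights(Q, K)
masked = attention_weights(Q, K, causal)
print("unmasked:", plain)
print("causal:  ", masked)
```

With the causal mask, the first query can only see the first key, so its whole weight lands there; the second query sees both keys and its row is unchanged.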


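## Week2: Hugging Face Inference Example
Use pipeline for quick text generation (this example uses distilgpt2). The model is downloaded on first run. distilgpt2 was trained mainly on English text, so the Chinese prompt in the cell below yields largely incoherent output.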

In [7]:
try:
    from transformers import pipeline
    gen = pipeline('text-generation', model='distilgpt2')
    res = gen('在未来的 AI 研究中，', max_length=40, num_return_sequences=1)
    print(res[0]['generated_text'])
except Exception as e:
    print('Could not load transformers or download the model; make sure the dependencies are installed and the network is reachable before running.', e)

Device set to use mps:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=40) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


在未来的 AI 研究中，廝中，成的 AI 淯本的远与样品的其本定言着是这是不税地是对下良的莫的成的的国容容容的多以的详练良的地地的么罰廝良的是不安莫的成的一个成的发是了对了的不税地的多以的是汲貌化的汲是成的成的详练良以多的详练良的莫的成的地德练良的多以的来的详练良罰廝良的地的情以的发是汲貌化的是为�
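The log above reports that both `max_new_tokens` (=256) and `max_length` (=40) were set and that `max_new_tokens` takes precedence, which is why the output runs well past 40 tokens. A minimal sketch of one way to bound the length explicitly follows. It assumes the installed transformers version forwards `max_new_tokens` and `pad_token_id` from the pipeline call to generation; `generate_text` is a hypothetical helper name.

```python
def generate_text(prompt, max_new_tokens=40, model_name="distilgpt2"):
    """Generate a continuation whose length is bounded by max_new_tokens (prompt excluded)."""
    # Imported lazily so that defining this helper does not require transformers
    from transformers import pipeline

    gen = pipeline("text-generation", model=model_name)
    result = gen(
        prompt,
        max_new_tokens=max_new_tokens,  # counts only newly generated tokens
        num_return_sequences=1,
        pad_token_id=gen.tokenizer.eos_token_id,  # GPT-2 style models define no pad token
    )
    return result[0]["generated_text"]

# Example call with an English prompt (downloads the model on first use):
# print(generate_text("In future AI research,"))
```

Passing only `max_new_tokens` avoids the conflict reported in the log, and setting `pad_token_id` explicitly should silence the `pad_token_id` notice.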


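## Week3: Fine-Tuning Example (Data Preparation and Trainer Skeleton)
Shows how to build a small dataset and tokenize it, and gives a minimal Trainer skeleton (there is no need to actually train a large model).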

In [10]:
try:
    from datasets import Dataset
    from transformers import AutoTokenizer
    # Prepare a small dataset (the Chinese sample sentences are kept as data)
    raw = {'text': ['今天天气很好', '机器学习很有意思', '深度学习是强大的工具']}
    ds = Dataset.from_dict(raw)
    tokenizer = AutoTokenizer.from_pretrained('distilgpt2')
    # If the tokenizer has no pad_token, prefer reusing eos_token as pad_token; otherwise add [PAD]
    if getattr(tokenizer, 'pad_token', None) is None:
        if getattr(tokenizer, 'eos_token', None) is not None:
            tokenizer.pad_token = tokenizer.eos_token
        else:
            tokenizer.add_special_tokens({'pad_token': '[PAD]'})
    def tokenize_fn(ex):
        # Pad or truncate every example to exactly 32 tokens
        return tokenizer(ex['text'], truncation=True, padding='max_length', max_length=32)
    tok_ds = ds.map(tokenize_fn, batched=True)
    print('Tokenized example:', tok_ds[0])
except Exception as e:
    print('The datasets/tokenizer example could not run:', e)

Tokenized example: {'text': '今天天气很好', 'input_ids': [20015, 232, 25465, 25465, 36365, 242, 36181, 230, 25001, 121, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}
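The cell above stops at tokenization. One possible shape for the Trainer skeleton is sketched below. It is an assumption about how the pieces could fit together, not the notebook author's code: `build_trainer`, the output directory, and the step counts are all hypothetical, and the sketch assumes the tokenizer reused `eos_token` for padding (with a newly added `[PAD]` token the model embeddings would also need resizing).

```python
def build_trainer(tok_ds, tokenizer, model_name="distilgpt2", output_dir="out_distilgpt2"):
    """Assemble a minimal causal-LM Trainer; call .train() on the result to start training."""
    # Imported lazily so that defining this helper does not load any model
    from transformers import (
        AutoModelForCausalLM,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    model = AutoModelForCausalLM.from_pretrained(model_name)
    # mlm=False selects causal language modeling: labels are copied from input_ids
    # and padding positions are excluded from the loss
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
    args = TrainingArguments(
        output_dir=output_dir,          # hypothetical output directory
        per_device_train_batch_size=2,
        max_steps=3,                    # a few steps only, to smoke-test the loop
        logging_steps=1,
        report_to="none",               # disable external experiment loggers
    )
    return Trainer(model=model, args=args, train_dataset=tok_ds, data_collator=collator)

# Example call, reusing tok_ds and tokenizer from the cell above:
# trainer = build_trainer(tok_ds, tokenizer)
# trainer.train()
```

Capping `max_steps` keeps the run short enough to verify that data, collator, and model fit together before committing to a real training budget.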


## Week4: Deployment Example (FastAPI)
Below is sample code for a minimal FastAPI app that wraps the model as a /generate endpoint. It can be saved separately as app.py and run with uvicorn.

In [11]:
fastapi_app_code = '''
from fastapi import FastAPI
from pydantic import BaseModel
try:
    from transformers import pipeline
    gen = pipeline('text-generation', model='distilgpt2')
except Exception:
    gen = None

app = FastAPI()

class Req(BaseModel):
    prompt: str

@app.post('/generate')
def generate(req: Req):
    if gen is None:
        return {'error': 'Model not loaded; check the dependencies or the network'}
    out = gen(req.prompt, max_length=50, num_return_sequences=1)
    return {'text': out[0]['generated_text']}
'''

with open('app.py', 'w', encoding='utf-8') as f:
    f.write(fastapi_app_code)
print('Wrote app.py; run: uvicorn app:app --reload')

Wrote app.py; run: uvicorn app:app --reload
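Once the server is running, the `/generate` endpoint can be exercised with a small client. The sketch below uses only the standard library and assumes the server listens on uvicorn's default address, `http://127.0.0.1:8000`. The helper names `build_generate_request` and `call_generate` are hypothetical.

```python
import json
import urllib.request

def build_generate_request(prompt, base_url="http://127.0.0.1:8000"):
    """Build a POST request for /generate with a JSON body matching the Req model."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def call_generate(prompt, base_url="http://127.0.0.1:8000", timeout=60):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_generate_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example call, once `uvicorn app:app --reload` is running:
# print(call_generate("In future AI research,"))
```

The response is expected to carry either a `text` key or, when the model failed to load, an `error` key, mirroring the two return paths in `generate`.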


---
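Note: the examples in this notebook are primarily for teaching; for real training or deployment, adjust the model, batch size, and number of training steps to fit your GPU memory and compute resources.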