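# Tiny-ImageNet Experiment

This notebook transfers the best-performing CIFAR-10 configuration to Tiny-ImageNet (200 classes, 64x64 images).

## Dataset Information
- **Training set**: 100,000 images (500 per class)
- **Validation set**: 10,000 images (50 per class)
- **Test set**: 10,000 images (unlabeled)
- **Image size**: 64x64x3
- **Number of classes**: 200

## Main Adjustments
1. Input size changed from 32x32 to 64x64
2. Number of output classes changed from 10 to 200
3. A deeper and wider network to handle the harder task
4. Data augmentation parameters adjusted for this dataset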

In [None]:
# Import the required libraries
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
from torch.utils.data import Dataset, DataLoader

import os
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from tqdm import tqdm
import time

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

In [None]:
# Device configuration
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
if device.type == "cuda":
    print(f"GPU name: {torch.cuda.get_device_name(0)}")

## Downloading and Preparing the Tiny-ImageNet Dataset

In [None]:
# Tiny-ImageNet dataset class
class TinyImageNet(Dataset):
    """
    Tiny-ImageNet dataset loader.
    Expected directory layout:
    tiny-imagenet-200/
        train/
            n01443537/
                images/
                    n01443537_0.JPEG
                    ...
        val/
            images/
                val_0.JPEG
                ...
            val_annotations.txt
    """
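    def __init__(self, root, split='train', transform=None, download=False):
        """
        root: dataset root directory
        split: 'train' or 'val'
        transform: transforms applied to each image
        download: whether to download automatically (a manual download is more reliable)
        """
        # Fail early instead of silently building an empty dataset
        if split not in ('train', 'val'):
            raise ValueError(f"split must be 'train' or 'val', got {split!r}")

        self.root = root
        self.split = split
        self.transform = transform

        # Download the dataset if requested
        if download:
            self._download()

        # Load the class-to-index mapping
        self.class_to_idx = self._load_classes()

        # Load image paths and labels
        self.samples = self._load_samples()

        print(f"Loaded {len(self.samples)} images for {split} split")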
    
    def _download(self):
        """Download the dataset if it is not already present."""
        import urllib.request
        import zipfile
        
        url = "http://cs231n.stanford.edu/tiny-imagenet-200.zip"
        zip_path = os.path.join(self.root, "tiny-imagenet-200.zip")
        
        if not os.path.exists(os.path.join(self.root, "tiny-imagenet-200")):
            print(f"Downloading Tiny-ImageNet from {url}...")
            os.makedirs(self.root, exist_ok=True)
            urllib.request.urlretrieve(url, zip_path)
            
            print("Extracting...")
            with zipfile.ZipFile(zip_path, 'r') as zip_ref:
                zip_ref.extractall(self.root)
            
            os.remove(zip_path)
            print("Download complete!")
    
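    def _load_classes(self):
        """Load the mapping from class ID (WordNet ID) to class index."""
        wnids_path = os.path.join(self.root, 'tiny-imagenet-200', 'wnids.txt')
        with open(wnids_path, 'r') as f:
            # Skip blank lines so a trailing newline cannot create an empty class ID
            class_ids = [line.strip() for line in f if line.strip()]
        return {class_id: idx for idx, class_id in enumerate(class_ids)}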
    
    def _load_samples(self):
        """Load the paths and labels of all samples."""
        samples = []
        
        if self.split == 'train':
            # Training set: one folder per class
            train_dir = os.path.join(self.root, 'tiny-imagenet-200', 'train')
            for class_id in self.class_to_idx.keys():
                class_dir = os.path.join(train_dir, class_id, 'images')
                # Sort file names so the sample order is deterministic across runs
                for img_name in sorted(os.listdir(class_dir)):
                    if img_name.endswith('.JPEG'):
                        img_path = os.path.join(class_dir, img_name)
                        samples.append((img_path, self.class_to_idx[class_id]))
        
        elif self.split == 'val':
            # Validation set: all images share one folder; labels come from a text file
            val_dir = os.path.join(self.root, 'tiny-imagenet-200', 'val')
            val_annotations = os.path.join(val_dir, 'val_annotations.txt')
            
            with open(val_annotations, 'r') as f:
                for line in f:
                    parts = line.strip().split('\t')
                    img_name = parts[0]
                    class_id = parts[1]
                    img_path = os.path.join(val_dir, 'images', img_name)
                    samples.append((img_path, self.class_to_idx[class_id]))
        
        return samples
    
    def __len__(self):
        return len(self.samples)
    
    def __getitem__(self, idx):
        img_path, label = self.samples[idx]
        image = Image.open(img_path).convert('RGB')
        
        if self.transform:
            image = self.transform(image)
        
        return image, label
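
The validation labels come from `val_annotations.txt`, which `_load_samples` reads as tab-separated records. As a minimal sketch of that parsing step, the helper below (a hypothetical `parse_val_annotations`, not part of the notebook) keeps the first two columns and skips blank or malformed lines. It assumes that any further columns can be ignored for classification, and the file contents are illustrative values written to a temporary directory.

```python
import os
import tempfile


def parse_val_annotations(path):
    """Return a list of (image_name, class_id) pairs from a tab-separated file."""
    pairs = []
    with open(path, "r") as f:
        for line in f:
            parts = line.strip().split("\t")
            # Need at least the file name and the class ID; skip anything shorter
            if len(parts) < 2 or not parts[0]:
                continue
            pairs.append((parts[0], parts[1]))
    return pairs


# Build a tiny stand-in annotations file to exercise the parser (illustrative values)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "val_annotations.txt")
    with open(demo_path, "w") as f:
        f.write("val_0.JPEG\tn01443537\t0\t32\t44\t62\n")
        f.write("\n")  # blank line, should be ignored
        f.write("val_1.JPEG\tn01629819\t52\t55\t57\t59\n")
    demo_pairs = parse_val_annotations(demo_path)

print(demo_pairs)
```

Skipping short lines avoids an `IndexError` on `parts[1]` if the file ends with an empty line.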

## Data Augmentation Configuration (for 64x64)

In [None]:
# Cutout implementation (same as for CIFAR-10)
class Cutout:
    def __init__(self, n_holes=1, length=16):
        self.n_holes = n_holes
        self.length = length

    def __call__(self, img):
        h, w = img.size(1), img.size(2)
        mask = torch.ones((h, w), dtype=torch.float32)

        for _ in range(self.n_holes):
            y = torch.randint(h, (1,)).item()
            x = torch.randint(w, (1,)).item()
            y1 = max(0, y - self.length // 2)
            y2 = min(h, y + self.length // 2)
            x1 = max(0, x - self.length // 2)
            x2 = min(w, x + self.length // 2)
            mask[y1:y2, x1:x2] = 0.

        mask = mask.expand_as(img)
        return img * mask


# Data augmentation pipeline
transform_train = transforms.Compose([
    transforms.RandomCrop(64, padding=8),              # larger padding to suit 64x64
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(
        brightness=0.2,
        contrast=0.2,
        saturation=0.2,
        hue=0.1
    ),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],  # ImageNet statistics
        std=[0.229, 0.224, 0.225]
    ),
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.33)),
    Cutout(n_holes=1, length=20),  # slightly larger cutout size
])

transform_val = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])

print("Data augmentation pipeline configured")
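
The `Cutout` class above picks a hole centre anywhere in the image and clips the square to the image bounds, so holes near the border cover less area than `length x length`. The sketch below mirrors only that index arithmetic in plain Python; `cutout_box` and `masked_fraction` are hypothetical helper names introduced for illustration.

```python
def cutout_box(h, w, cy, cx, length):
    """Return (y1, y2, x1, x2) of a square hole centred at (cy, cx), clipped to the image."""
    half = length // 2
    y1 = max(0, cy - half)
    y2 = min(h, cy + half)
    x1 = max(0, cx - half)
    x2 = min(w, cx + half)
    return y1, y2, x1, x2


def masked_fraction(h, w, box):
    """Fraction of the image area that the hole covers."""
    y1, y2, x1, x2 = box
    return (y2 - y1) * (x2 - x1) / (h * w)


# A hole centred in a 64x64 image keeps its full 20x20 size
center_box = cutout_box(64, 64, 32, 32, 20)
# A hole centred on the top-left corner is clipped to 10x10
corner_box = cutout_box(64, 64, 0, 0, 20)

print(center_box, masked_fraction(64, 64, center_box))
print(corner_box, masked_fraction(64, 64, corner_box))
```

With `length=20`, a full hole masks just under 10% of a 64x64 image, while a corner hole masks about 2.4%.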

In [None]:
# Load the datasets
data_root = './data'  # the dataset is downloaded and extracted into this directory

# Training set
train_dataset = TinyImageNet(
    root=data_root,
    split='train',
    transform=transform_train,
    download=True  # set to True on the first run to download automatically
)

# Validation set
val_dataset = TinyImageNet(
    root=data_root,
    split='val',
    transform=transform_val,
    download=False
)

# DataLoader
batch_size = 128
num_workers = 4

train_loader = DataLoader(
    train_dataset,
    batch_size=batch_size,
    shuffle=True,
    num_workers=num_workers,
    pin_memory=True
)

val_loader = DataLoader(
    val_dataset,
    batch_size=batch_size,
    shuffle=False,
    num_workers=num_workers,
    pin_memory=True
)

print(f"Training set: {len(train_dataset)} images")
print(f"Validation set: {len(val_dataset)} images")
print(f"Batch size: {batch_size}")
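
For a sense of how many iterations each epoch involves, the following sketch computes the batch counts implied by the dataset sizes and `batch_size = 128`. It assumes the `DataLoader` default of keeping the final partial batch (`drop_last=False`); `batches_per_epoch` is a hypothetical helper, not a PyTorch function.

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches a loader yields for one pass over the data."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


train_batches = batches_per_epoch(100_000, 128)
val_batches = batches_per_epoch(10_000, 128)
# Size of the final, partial training batch
last_train_batch = 100_000 - (train_batches - 1) * 128

print(train_batches, val_batches, last_train_batch)
```

The last training batch holds only 32 images, which is why the training loop weights each batch loss by `img.size(0)` before averaging.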

## Network Architecture (Optimized for Tiny-ImageNet)
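
Before reading the model code, it can help to trace how the spatial resolution shrinks through the network defined in this section. The sketch below applies the standard convolution output-size formula to a stride-1 stem followed by four stages whose first blocks use strides 1, 2, 2, 2. The 32x32 trace is a hypothetical comparison using the same stride pattern, not the actual CIFAR-10 network, which is not shown here.

```python
def conv_out_size(size, kernel=3, stride=1, padding=1):
    """Output height/width of a square convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_sizes(input_size, stage_strides):
    """Feature-map size after the stem conv and after each stage."""
    sizes = [conv_out_size(input_size)]  # stem: 3x3 conv, stride 1, padding 1
    for stride in stage_strides:
        # Only the first block of a stage changes the resolution
        sizes.append(conv_out_size(sizes[-1], stride=stride))
    return sizes


tiny_sizes = trace_sizes(64, [1, 2, 2, 2])
small_input_sizes = trace_sizes(32, [1, 2, 2, 2])

print(tiny_sizes)
print(small_input_sizes)
```

With a 64x64 input the last stage still sees an 8x8 map before global average pooling, whereas a 32x32 input would already be down to 4x4 under the same strides.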

In [None]:
# Reuse the CIFAR-10 modules
class SEBlock(nn.Module):
    """Squeeze-and-Excitation (SE) attention module"""
    def __init__(self, channels, reduction=16):
        super(SEBlock, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid()
        )
    
    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y.expand_as(x)


class SEResidualBlock(nn.Module):
    """Residual block with SE attention"""
    def __init__(self, in_channels, out_channels, stride=1, reduction=16):
        super(SEResidualBlock, self).__init__()
        
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.se = SEBlock(out_channels, reduction)
        
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )
    
    def forward(self, x):
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.se(out)
        out += self.shortcut(x)
        out = self.relu(out)
        return out


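class TinyImageNetNet(nn.Module):
    """
    Network for Tiny-ImageNet (200 classes, 64x64 input).
    Compared with the CIFAR-10 network:
    - Larger input size (64x64 vs 32x32)
    - More classes (200 vs 10)
    - The network needs to be deeper/wider
    Note: `use_se` is accepted but not used; every block is an SEResidualBlock.
    """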
    def __init__(self, num_classes=200, use_se=True):
        super(TinyImageNetNet, self).__init__()
        
        # Initial convolution
        self.conv1 = nn.Conv2d(3, 64, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        
        # Residual stages (a deeper network)
        block_type = SEResidualBlock
        self.layer1 = self._make_layer(block_type, 64, 128, num_blocks=3, stride=1)
        self.layer2 = self._make_layer(block_type, 128, 256, num_blocks=4, stride=2)
        self.layer3 = self._make_layer(block_type, 256, 512, num_blocks=6, stride=2)  # deeper
        self.layer4 = self._make_layer(block_type, 512, 512, num_blocks=3, stride=2)  # extra stage

        # Classification head
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(512, num_classes)
        
        self._initialize_weights()
    
    def _make_layer(self, block_type, in_channels, out_channels, num_blocks, stride):
        layers = []
        layers.append(block_type(in_channels, out_channels, stride))
        for _ in range(1, num_blocks):
            layers.append(block_type(out_channels, out_channels, stride=1))
        return nn.Sequential(*layers)
    
    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)
    
    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        
        x = self.avg_pool(x)
        x = torch.flatten(x, 1)
        x = self.dropout(x)
        x = self.fc(x)
        
        return x


# Instantiate the model
model = TinyImageNetNet(num_classes=200, use_se=True).to(device)
total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Number of model parameters: {total_params / 1_000_000:.2f}M")
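
The printed parameter count can be cross-checked by hand. The sketch below tallies trainable parameters from the layer configuration above in plain Python: bias-free 3x3 convolutions, BatchNorm weight and bias, two bias-free SE linear layers per block, and a 1x1 shortcut wherever the stride or channel count changes. The helper names are hypothetical, and the result is an estimate to compare against the value the cell above prints.

```python
def block_params(c_in, c_out, stride, reduction=16):
    """Trainable parameters of one SEResidualBlock as configured above."""
    n = c_in * c_out * 9 + 2 * c_out        # conv1 (3x3, no bias) + bn1
    n += c_out * c_out * 9 + 2 * c_out      # conv2 (3x3, no bias) + bn2
    n += 2 * c_out * (c_out // reduction)   # two bias-free SE linear layers
    if stride != 1 or c_in != c_out:
        n += c_in * c_out + 2 * c_out       # 1x1 shortcut conv + its BatchNorm
    return n


def stage_params(c_in, c_out, num_blocks, stride):
    """First block changes channels/stride; the rest keep them."""
    n = block_params(c_in, c_out, stride)
    n += (num_blocks - 1) * block_params(c_out, c_out, 1)
    return n


def estimate_params(num_classes=200):
    n = 3 * 64 * 9 + 2 * 64                 # stem conv + BatchNorm
    n += stage_params(64, 128, 3, 1)
    n += stage_params(128, 256, 4, 2)
    n += stage_params(256, 512, 6, 2)
    n += stage_params(512, 512, 3, 2)
    n += 512 * num_classes + num_classes    # final linear layer with bias
    return n


estimated = estimate_params()
print(f"{estimated / 1e6:.2f}M")
```

Most of the total comes from the 512-channel stages, where each plain block alone contributes roughly 4.75M parameters.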

## Training Configuration and Helper Functions

In [None]:
# Label Smoothing
class LabelSmoothingCrossEntropy(nn.Module):
    def __init__(self, epsilon=0.1):
        super().__init__()
        self.epsilon = epsilon
    
    def forward(self, pred, target):
        n_classes = pred.size(-1)
        log_preds = torch.nn.functional.log_softmax(pred, dim=-1)
        loss = -log_preds.sum(dim=-1).mean() * self.epsilon / n_classes
        nll = torch.nn.functional.nll_loss(log_preds, target, reduction='mean')
        return (1 - self.epsilon) * nll + loss


# MixUp
def mixup_data(x, y, alpha=1.0, device='cuda'):
    if alpha > 0:
        lam = torch.distributions.Beta(alpha, alpha).sample().item()
    else:
        lam = 1
    batch_size = x.size(0)
    index = torch.randperm(batch_size).to(device)
    mixed_x = lam * x + (1 - lam) * x[index, :]
    y_a, y_b = y, y[index]
    return mixed_x, y_a, y_b, lam


def mixup_criterion(criterion, pred, y_a, y_b, lam):
    return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)


# Learning rate schedule
class WarmupCosineSchedule:
    def __init__(self, optimizer, warmup_epochs, total_epochs, lr_min=1e-6):
        self.optimizer = optimizer
        self.warmup_epochs = warmup_epochs
        self.total_epochs = total_epochs
        self.lr_min = lr_min
        self.base_lr = optimizer.param_groups[0]['lr']
    
    def step(self, epoch):
        if epoch < self.warmup_epochs:
            lr = self.base_lr * (epoch + 1) / self.warmup_epochs
        else:
            progress = (epoch - self.warmup_epochs) / (self.total_epochs - self.warmup_epochs)
            lr = self.lr_min + (self.base_lr - self.lr_min) * 0.5 * (1 + np.cos(np.pi * progress))
        
        for param_group in self.optimizer.param_groups:
            param_group['lr'] = lr
        return lr


# Training function
def train_epoch(model, loader, criterion, optimizer, device, use_mixup=False, mixup_alpha=0.0):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    
    pbar = tqdm(loader, desc='Training')
    for img, target in pbar:
        img, target = img.to(device), target.to(device)
        
        if use_mixup and mixup_alpha > 0:
            img, target_a, target_b, lam = mixup_data(img, target, mixup_alpha, device)
            pred = model(img)
            loss = mixup_criterion(criterion, pred, target_a, target_b, lam)
        else:
            pred = model(img)
            loss = criterion(pred, target)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
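        running_loss += loss.item() * img.size(0)
        # With MixUp, predictions are compared against the unmixed targets,
        # so the training accuracy is only a rough indicator
        _, predicted = pred.max(1)
        total += target.size(0)
        correct += predicted.eq(target).sum().item()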
        
        pbar.set_postfix({'loss': running_loss / total, 'acc': 100. * correct / total})
    
    return running_loss / total, 100. * correct / total


def evaluate(model, loader, criterion, device):
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0
    
    with torch.no_grad():
        for img, target in tqdm(loader, desc='Evaluating'):
            img, target = img.to(device), target.to(device)
            pred = model(img)
            loss = criterion(pred, target)
            
            running_loss += loss.item() * img.size(0)
            _, predicted = pred.max(1)
            total += target.size(0)
            correct += predicted.eq(target).sum().item()
    
    return running_loss / total, 100. * correct / total


print("Training helper functions defined")
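
The `LabelSmoothingCrossEntropy` formula above is written as a weighted sum of the usual negative log-likelihood and a uniform term. The single-sample sketch below, in plain Python rather than PyTorch, checks that this form equals cross-entropy against an explicit smoothed target distribution. The function names and the example logits are illustrative assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax for a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def smoothed_ce(logits, target, epsilon=0.1):
    """Same formula as LabelSmoothingCrossEntropy.forward, for one sample."""
    log_p = log_softmax(logits)
    uniform_term = -sum(log_p) / len(logits)
    nll = -log_p[target]
    return (1 - epsilon) * nll + epsilon * uniform_term


def soft_target_ce(logits, target, epsilon=0.1):
    """Cross-entropy against the explicit smoothed target distribution."""
    log_p = log_softmax(logits)
    n_classes = len(logits)
    q = [epsilon / n_classes] * n_classes   # epsilon spread over all classes
    q[target] += 1 - epsilon                # remaining mass on the true class
    return -sum(qi * lp for qi, lp in zip(q, log_p))


logits = [2.0, 0.5, -1.0, 0.0]
loss_a = smoothed_ce(logits, target=0)
loss_b = soft_target_ce(logits, target=0)
print(loss_a, loss_b)
```

Both functions return the same value up to floating-point rounding, so the true class effectively receives a target probability of `1 - epsilon + epsilon / n_classes`.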

## Training

In [None]:
# Training configuration
config = {
    'num_epochs': 150,
    'lr': 1e-3,
    'weight_decay': 5e-4,
    'warmup_epochs': 5,
    'label_smoothing': 0.1,
    'mixup_alpha': 0.2,
}

# Optimizer, loss, and scheduler
optimizer = optim.AdamW(
    model.parameters(),
    lr=config['lr'],
    weight_decay=config['weight_decay']
)

criterion = LabelSmoothingCrossEntropy(epsilon=config['label_smoothing'])

scheduler = WarmupCosineSchedule(
    optimizer,
    warmup_epochs=config['warmup_epochs'],
    total_epochs=config['num_epochs']
)

# Training history
history = {
    'train_loss': [],
    'train_acc': [],
    'val_loss': [],
    'val_acc': [],
    'lr': []
}

best_acc = 0
best_epoch = 0

print("="*80)
print("Starting Tiny-ImageNet training")
print("="*80)

start_time = time.time()

for epoch in range(config['num_epochs']):
    print(f"\nEpoch [{epoch+1}/{config['num_epochs']}]")
    
    # Set the learning rate for this epoch
    current_lr = scheduler.step(epoch)
    history['lr'].append(current_lr)

    # Train
    train_loss, train_acc = train_epoch(
        model, train_loader, criterion, optimizer, device,
        use_mixup=(config['mixup_alpha'] > 0),
        mixup_alpha=config['mixup_alpha']
    )
    
    # Validate
    val_loss, val_acc = evaluate(model, val_loader, criterion, device)

    # Record metrics
    history['train_loss'].append(train_loss)
    history['train_acc'].append(train_acc)
    history['val_loss'].append(val_loss)
    history['val_acc'].append(val_acc)
    
    # Save the best model
    if val_acc > best_acc:
        best_acc = val_acc
        best_epoch = epoch
        torch.save(model.state_dict(), './best_model_tiny_imagenet.pth')
        print(f"✓ New best model! Validation accuracy: {val_acc:.2f}%")
    
    elapsed = time.time() - start_time
    print(f"LR: {current_lr:.6f} | Train Acc: {train_acc:.2f}% | Val Acc: {val_acc:.2f}% | "
          f"Best: {best_acc:.2f}% @Epoch {best_epoch+1} | Time: {elapsed/60:.1f}min")

total_time = time.time() - start_time
print("\n" + "="*80)
print(f"Training complete! Total time: {total_time/3600:.2f} hours")
print(f"Best validation accuracy: {best_acc:.2f}% (Epoch {best_epoch+1})")
print("="*80)
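
To see what learning rates the loop above actually applies, the sketch below reimplements the `WarmupCosineSchedule.step` formula as a standalone function using the values from `config` (`lr=1e-3`, 5 warmup epochs, 150 total epochs) and the scheduler's default `lr_min=1e-6`. The function name `warmup_cosine_lr` is a hypothetical stand-in for the class.

```python
import math


def warmup_cosine_lr(epoch, base_lr=1e-3, warmup_epochs=5, total_epochs=150, lr_min=1e-6):
    """Learning rate for a zero-based epoch index: linear warmup, then cosine decay."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return lr_min + (base_lr - lr_min) * 0.5 * (1 + math.cos(math.pi * progress))


schedule = [warmup_cosine_lr(e) for e in range(150)]

for e in (0, 4, 5, 77, 149):
    print(f"epoch {e + 1:3d}: lr = {schedule[e]:.2e}")
```

Two details follow from the formula: the last warmup epoch and the first cosine epoch both run at the base rate, and because `progress` never reaches 1 inside the loop, the final epoch ends slightly above `lr_min`.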

## Visualizing the Results

In [None]:
# Plot the training curves
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Loss
axes[0, 0].plot(history['train_loss'], label='Train Loss', linewidth=2)
axes[0, 0].plot(history['val_loss'], label='Val Loss', linewidth=2)
axes[0, 0].set_xlabel('Epoch')
axes[0, 0].set_ylabel('Loss')
axes[0, 0].set_title('Loss Curve - Tiny-ImageNet')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Accuracy
axes[0, 1].plot(history['train_acc'], label='Train Acc', linewidth=2)
axes[0, 1].plot(history['val_acc'], label='Val Acc', linewidth=2)
# Draw the best-accuracy line before calling legend() so its label is included
axes[0, 1].axhline(y=best_acc, color='r', linestyle='--', label=f'Best: {best_acc:.2f}%')
axes[0, 1].set_xlabel('Epoch')
axes[0, 1].set_ylabel('Accuracy (%)')
axes[0, 1].set_title('Accuracy Curve - Tiny-ImageNet')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Learning rate
axes[1, 0].plot(history['lr'], linewidth=2, color='green')
axes[1, 0].set_xlabel('Epoch')
axes[1, 0].set_ylabel('Learning Rate')
axes[1, 0].set_title('Learning Rate Schedule')
axes[1, 0].grid(True, alpha=0.3)
axes[1, 0].set_yscale('log')

# Train-Val Gap
gap = np.array(history['train_acc']) - np.array(history['val_acc'])
axes[1, 1].plot(gap, linewidth=2, color='orange')
axes[1, 1].set_xlabel('Epoch')
axes[1, 1].set_ylabel('Accuracy Gap (%)')
axes[1, 1].set_title('Train-Val Accuracy Gap')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('./tiny_imagenet_training_curves.png', dpi=300, bbox_inches='tight')
plt.show()

print("Training curves saved")

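## Experiment Summary

### Tiny-ImageNet Challenges
1. **More classes**: going from 10 to 200 classes makes classification much harder
2. **Image resolution**: 64x64 vs 32x32, which calls for a deeper network to extract features
3. **Data scale**: relatively few training samples per class (500, versus 5,000 for CIFAR-10), so the model overfits more easily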
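
One way to quantify the jump in difficulty from 10 to 200 classes is to compare the chance-level accuracy and the cross-entropy of a uniform prediction, which is `ln(K)` for `K` classes. This is a small illustrative calculation; `chance_stats` is a hypothetical helper.

```python
import math


def chance_stats(num_classes):
    """Accuracy (%) and cross-entropy of a classifier that predicts uniformly."""
    return 100.0 / num_classes, math.log(num_classes)


cifar_acc, cifar_loss = chance_stats(10)
tiny_acc, tiny_loss = chance_stats(200)

print(f"10 classes:  chance accuracy {cifar_acc:.1f}%, uniform loss {cifar_loss:.3f}")
print(f"200 classes: chance accuracy {tiny_acc:.1f}%, uniform loss {tiny_loss:.3f}")
```

A training loss near 5.3 in the first iterations is therefore expected on Tiny-ImageNet and does not by itself indicate a problem.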

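### Targeted Improvements
- **Deeper network**: 16 residual blocks vs 7 for CIFAR-10
- **Stronger data augmentation**: larger Cutout, more aggressive ColorJitter
- **Training length**: 150 epochs, within the 100-200 epoch range used for CIFAR-10
- **Same optimization strategy**: AdamW + Warmup + Cosine + Label Smoothing + MixUp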
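
As a framework-free illustration of the MixUp step in this strategy, the sketch below blends two inputs with a coefficient `lam` and combines the two per-label losses with the same weights, mirroring `mixup_data` and `mixup_criterion`. It then samples `lam` from Beta(0.2, 0.2) using the standard library to show what `mixup_alpha = 0.2` implies. The helper names, inputs, and loss values are illustrative assumptions.

```python
import random


def mixup_pair(x_a, x_b, lam):
    """Convex combination of two equally sized inputs."""
    return [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]


def mixup_loss(loss_a, loss_b, lam):
    """Weight the two per-label losses by the same coefficient."""
    return lam * loss_a + (1 - lam) * loss_b


mixed = mixup_pair([1.0, 0.0], [0.0, 1.0], lam=0.25)
combined = mixup_loss(2.0, 4.0, lam=0.25)
print(mixed, combined)

# Beta(0.2, 0.2) is U-shaped: sampled coefficients cluster near 0 and 1
rng = random.Random(0)
samples = [rng.betavariate(0.2, 0.2) for _ in range(10_000)]
extreme_share = sum(s < 0.1 or s > 0.9 for s in samples) / len(samples)
print(f"share of lam outside [0.1, 0.9]: {extreme_share:.2f}")
```

With such a small alpha, most mixed images stay close to one of the two originals, which makes this a fairly mild form of MixUp.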

### Expected Performance
- **Top-1 accuracy**: target >60% (baseline ~55%)
- **Top-5 accuracy**: target >82% (not measured by `evaluate` above, which reports Top-1 only)
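
Since the Top-5 target needs a metric that the notebook does not compute, here is a minimal, framework-free sketch of top-k accuracy. `topk_accuracy` is a hypothetical helper operating on plain lists, and the scores are made-up values. In the notebook itself the same idea would more likely be expressed with `torch.topk` on the logits inside `evaluate`.

```python
def topk_accuracy(scores, targets, k=5):
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, target in zip(scores, targets):
        # Class indices ordered from highest to lowest score
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(targets)


scores = [
    [0.1, 0.7, 0.2, 0.0],     # true class 1 is ranked first
    [0.4, 0.1, 0.3, 0.2],     # true class 2 is ranked second
    [0.5, 0.3, 0.15, 0.05],   # true class 3 is ranked last
]
targets = [1, 2, 3]

top1 = topk_accuracy(scores, targets, k=1)
top2 = topk_accuracy(scores, targets, k=2)
print(f"top-1: {top1:.1f}%, top-2: {top2:.1f}%")
```

Top-k accuracy can only rise as k grows, so Top-5 is always at least as high as Top-1 on the same predictions.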

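### CIFAR-10 vs Tiny-ImageNet Comparison

| Metric | CIFAR-10 | Tiny-ImageNet |
|--------|----------|---------------|
| Image size | 32x32 | 64x64 |
| Number of classes | 10 | 200 |
| Training samples | 50,000 | 100,000 |
| Test samples | 10,000 | 10,000 (unlabeled; the 10,000-image validation set is used for evaluation) |
| Network depth | 7 residual blocks | 16 residual blocks |
| Parameters | ~3.8M | ~47M (estimated from the layer configuration above) |
| Expected accuracy | ~92% | ~62% |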

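### Key Takeaways
1. **Configuration transfer**: the best CIFAR-10 configuration carries over, but it needs targeted adjustments
2. **Network depth and task complexity**: more complex tasks call for deeper networks
3. **The key role of data augmentation**: augmentation matters most when samples per class are relatively few
4. **Overfitting risk**: Tiny-ImageNet overfits more easily and needs stronger regularization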