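# 1. Project Description
## 1.1 Overview
This project uses deep learning to classify images of playing cards. It fine-tunes a ResNet18 model to recognize 53 card classes automatically. ResNet18 is a deep residual network whose strong feature-extraction ability and relatively small number of parameters make it well suited to image classification.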

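## 1.2 Dataset
The dataset contains images of different playing cards (e.g., ace of hearts, king of spades).  
The images must be preprocessed (resized, normalized, etc.) to meet ResNet18's input requirements.

I used the **Python 3.8.5 PyTorch 1.6.0** image, with **8 CPU cores and 32 GB of RAM** as compute resources.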

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import models
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import os
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.metrics import confusion_matrix
```


# 2. Implementation

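## 2.1 Data Preprocessing
First, resize each input image to 224x224 pixels.  
Next, convert the image to a PyTorch tensor (`ToTensor` also rescales pixel values from [0, 255] to [0, 1]).  
Finally, normalize the image: each channel (red, green, blue) is standardized with the given mean and standard deviation, i.e. `(x - mean) / std`. These values are the per-channel statistics of ImageNet, the dataset the pretrained weights were trained on. Normalizing our images the same way gives them the input distribution the pretrained model expects, which helps it perform well.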
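As a minimal sketch of what `transforms.Normalize` computes, the NumPy snippet below assumes the input is already a (C, H, W) array in [0, 1], as `ToTensor` produces. `normalize_chw` is a hypothetical helper for illustration, not part of torchvision.

```python
import numpy as np

# Per-channel ImageNet statistics, the same values passed to transforms.Normalize
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_chw(img_chw):
    """Apply (x - mean) / std to each channel of a (C, H, W) image already scaled to [0, 1]."""
    # Reshaping the statistics to (3, 1, 1) broadcasts them over height and width
    return (img_chw - MEAN[:, None, None]) / STD[:, None, None]

# Pure white and pure black 2x2 images, as ToTensor would produce them
white = np.ones((3, 2, 2), dtype=np.float32)
black = np.zeros((3, 2, 2), dtype=np.float32)
print(normalize_chw(white)[:, 0, 0])   # roughly [2.249, 2.429, 2.640]
print(normalize_chw(black)[:, 0, 0])   # roughly [-2.118, -2.036, -1.804]
```

After normalization, pixel values span roughly [-2.1, 2.6] instead of [0, 1]. This matters later when the images are displayed.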

```python
# Data preprocessing
transform = transforms.Compose([
    transforms.Resize((224, 224)),   # resize to the 224x224 input size
    transforms.ToTensor(),           # PIL image -> float tensor in [0, 1], shape (C, H, W)
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # ImageNet statistics
])

# Dataset root directory
data_dir = '/home/mw/input/cards6224/cards/'

# Load the datasets (one sub-folder per class)
train_dataset = datasets.ImageFolder(root=os.path.join(data_dir, 'train'), transform=transform)
valid_dataset = datasets.ImageFolder(root=os.path.join(data_dir, 'valid'), transform=transform)
test_dataset = datasets.ImageFolder(root=os.path.join(data_dir, 'test'), transform=transform)

# Create the data loaders
batch_size = 32
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
valid_loader = DataLoader(dataset=valid_dataset, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)
```


## 2.2 Inspecting the Dataset  
Check the number of samples, the class names, and the number of classes in each split.

```python
# Print basic information about a dataset
def print_dataset_info(dataset, name):
    print(f'{name} Dataset:')
    print(f'Number of samples: {len(dataset)}')
    print(f'Classes: {dataset.classes}')
    print(f'Number of classes: {len(dataset.classes)}')
    print('------------------------------------------------------------------')

print_dataset_info(train_dataset, 'Training')
print_dataset_info(valid_dataset, 'Validation')
print_dataset_info(test_dataset, 'Test')
```

```text
Training Dataset:
Number of samples: 7624
Classes: ['ace of clubs', 'ace of diamonds', 'ace of hearts', 'ace of spades', 'eight of clubs', 'eight of diamonds', 'eight of hearts', 'eight of spades', 'five of clubs', 'five of diamonds', 'five of hearts', 'five of spades', 'four of clubs', 'four of diamonds', 'four of hearts', 'four of spades', 'jack of clubs', 'jack of diamonds', 'jack of hearts', 'jack of spades', 'joker', 'king of clubs', 'king of diamonds', 'king of hearts', 'king of spades', 'nine of clubs', 'nine of diamonds', 'nine of hearts', 'nine of spades', 'queen of clubs', 'queen of diamonds', 'queen of hearts', 'queen of spades', 'seven of clubs', 'seven of diamonds', 'seven of hearts', 'seven of spades', 'six of clubs', 'six of diamonds', 'six of hearts', 'six of spades', 'ten of clubs', 'ten of diamonds', 'ten of hearts', 'ten of spades', 'three of clubs', 'three of diamonds', 'three of hearts', 'three of spades', 'two of clubs', 'two of diamonds', 'two of hearts', 'two of spades']
Number of classes: 53
```
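The class list is in alphabetical order because `ImageFolder` sorts the class sub-folder names and assigns label indices in that order. The pure-Python sketch below reproduces that mapping without touching the filesystem. It assumes the folders are named exactly like the printed classes and rebuilds those names from ranks and suits, which also shows where the 53 comes from.

```python
# Rebuild the expected folder names: 13 ranks x 4 suits, plus one joker folder
RANKS = ["ace", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "jack", "queen", "king"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
folder_names = [f"{rank} of {suit}" for rank in RANKS for suit in SUITS] + ["joker"]

# ImageFolder sorts the folder names and numbers them in that order
sorted_classes = sorted(folder_names)
class_to_idx = {name: i for i, name in enumerate(sorted_classes)}

print(len(sorted_classes))             # 53
print(class_to_idx["ace of clubs"])    # 0
print(class_to_idx["joker"])           # 20
```

Because of this sorting, label 0 is "ace of clubs" rather than any notion of card order. The `classes` list is therefore the only reliable way to turn a predicted index back into a card name.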

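## 2.3 Model Selection  
The model used here is ResNet18.  
ResNet18 is a classic deep convolutional neural network (CNN) from the residual network (ResNet) family. Its key idea is the residual block, which eases the vanishing-gradient problem when training deep networks.  
Its main characteristics are:  
Residual learning: each residual block learns a residual F(x) and outputs F(x) + x. The convolutional layers therefore only have to learn the difference between the block's output and its input, which makes deep models much easier to train.  
Architecture: the "18" counts the layers that have weights. These are one initial 7x7 convolution, 16 3x3 convolutions in 8 basic residual blocks (4 stages of 2 blocks, with 2 convolutions per block), and the final fully connected layer, giving 1 + 16 + 1 = 18. Batch normalization and ReLU follow the convolutions but are not counted.  
Performance: compared with plain networks of similar depth, ResNets train more stably and reach better accuracy, because the identity shortcuts let gradients flow more easily through the network.  
Wide adoption: as a relatively lightweight model, ResNet18 performs well on many vision tasks (e.g., image classification, object detection) and is widely used as a pretrained backbone.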
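The residual idea can be illustrated with a simplified NumPy sketch. It is not ResNet18's actual block: matrix products stand in for the 3x3 convolutions and batch normalization is omitted. It keeps the same shape, out = relu(F(x) + x), and shows why a residual block can easily learn the identity mapping.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """Simplified basic block: out = relu(F(x) + x) with F(x) = W2 @ relu(W1 @ x)."""
    residual = W2 @ relu(W1 @ x)   # the learned residual F(x)
    return relu(residual + x)      # identity shortcut added before the final ReLU

rng = np.random.default_rng(0)
x = relu(rng.normal(size=8))       # block inputs come out of a ReLU, so they are >= 0
zeros = np.zeros((8, 8))

# With all-zero weights F(x) = 0, so the block passes its input through unchanged
print(np.allclose(residual_block(x, zeros, zeros), x))   # True
```

A plain (non-residual) layer would have to learn weights that reproduce its input exactly to achieve the same effect. A residual block only has to drive F(x) toward zero.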

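```python
# Load ResNet18 with ImageNet-pretrained weights
# (torchvision >= 0.13 deprecates pretrained=True in favour of weights=models.ResNet18_Weights.DEFAULT)
model = models.resnet18(pretrained=True)

# Replace the final fully connected layer: 512 input features -> 53 classes
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 53)  # 53 is the number of classes

# Move the model to the GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
```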


```text
Downloading: "https://download.pytorch.org/models/resnet18-5c106cde.pth" to /home/mw/.cache/torch/hub/checkpoints/resnet18-5c106cde.pth

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  ...
```

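```python
# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()                     # expects raw logits and integer class labels
optimizer = optim.Adam(model.parameters(), lr=1e-4)   # Adam with a small learning rate for fine-tuning
```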
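The following NumPy sketch shows how the new head and the loss fit together. Random, hypothetical weights and features stand in for the real `model.fc` and the backbone output. The sketch shows that `nn.Linear(512, 53)` produces one logit per class with 512 x 53 + 53 = 27,189 parameters. It also shows that `nn.CrossEntropyLoss` applies log-softmax to the raw logits and then takes the negative log-likelihood of the true class. A handy sanity check is that all-equal logits give a loss of ln(53), about 3.97.

```python
import numpy as np

rng = np.random.default_rng(0)
num_features, num_classes = 512, 53

# Hypothetical stand-ins for the new nn.Linear(512, 53) head and one image's pooled features
W = rng.normal(scale=0.01, size=(num_classes, num_features))
b = np.zeros(num_classes)
features = rng.normal(size=num_features)

logits = W @ features + b          # what model.fc computes: one score per class
print(logits.shape)                # (53,)
print(W.size + b.size)             # 27189

def cross_entropy(logits, label):
    """Log-softmax followed by negative log-likelihood, as nn.CrossEntropyLoss computes per sample."""
    shifted = logits - logits.max()                       # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())   # log-softmax
    return -log_probs[label]

# With all-equal logits every class gets probability 1/53, so the loss is ln(53)
print(cross_entropy(np.zeros(num_classes), label=3))      # about 3.97
```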

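## 2.4 Defining the Training Function  
At the end of each epoch, the training function prints the average training loss and then evaluates the model on the validation set. The main steps are:  
Set training mode (`model.train()`): switch the model to training mode at the start of each epoch.  
Move data to the device (`images.to(device), labels.to(device)`): move each batch to the compute device (e.g., a GPU).  
Zero the gradients (`optimizer.zero_grad()`): clear the gradients from the previous iteration, because PyTorch accumulates gradients by default.  
Forward pass (`outputs = model(images)`): compute the model outputs.  
Compute the loss (`loss = criterion(outputs, labels)`): measure the gap between the predictions and the true labels.  
Backward pass (`loss.backward()`): compute the gradients.  
Update the parameters (`optimizer.step()`): update the model parameters using the gradients.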
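The same steps can be traced without PyTorch. The NumPy sketch below trains a tiny softmax-regression model on hypothetical random data with full-batch gradient descent. Several details differ from `train_model`: the gradient is derived by hand instead of with `loss.backward()`, there is no device transfer, and plain gradient descent stands in for Adam. The sketch therefore illustrates the structure of the loop rather than reproducing `train_model`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 60 samples, 4 features, 3 classes defined by a random linear rule
X = rng.normal(size=(60, 4))
y = np.argmax(X @ rng.normal(size=(4, 3)), axis=1)

W = np.zeros((4, 3))   # weights of a softmax-regression "model"
b = np.zeros(3)
lr = 0.5
losses = []

for epoch in range(50):
    # Forward pass: class scores (logits) for every sample
    logits = X @ W + b

    # Loss: mean cross-entropy computed via a numerically stable log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(len(y)), y].mean()
    losses.append(loss)

    # Backward pass: the gradient of the mean cross-entropy w.r.t. the logits is (softmax - one_hot) / N.
    # Gradients are recomputed from scratch each step, which plays the role of optimizer.zero_grad().
    grad_logits = np.exp(log_probs)
    grad_logits[np.arange(len(y)), y] -= 1.0
    grad_logits /= len(y)
    grad_W = X.T @ grad_logits
    grad_b = grad_logits.sum(axis=0)

    # Parameter update: plain gradient descent (the notebook uses Adam instead)
    W -= lr * grad_W
    b -= lr * grad_b

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

With all-zero initial weights, every class starts at probability 1/3, so the first loss is ln(3), about 1.0986. It then decreases as training proceeds.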

```python
def train_model(model, train_loader, valid_loader, criterion, optimizer, num_epochs=10):
    for epoch in range(num_epochs):
        model.train()  # set the model to training mode
        running_loss = 0.0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)  # move the batch to the device (CPU or GPU)

            optimizer.zero_grad()  # clear gradients from the previous step

            outputs = model(images)            # forward pass
            loss = criterion(outputs, labels)  # compute the loss
            loss.backward()                    # backward pass
            optimizer.step()                   # update the parameters

            running_loss += loss.item() * images.size(0)  # loss is a batch mean, so weight it by batch size

        epoch_loss = running_loss / len(train_loader.dataset)  # average loss per sample
        print(f'Epoch {epoch+1}/{num_epochs}, Loss: {epoch_loss:.4f}')

        validate_model(model, valid_loader, criterion)  # validate after every epoch
```
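`running_loss` accumulates `loss.item() * images.size(0)` because `CrossEntropyLoss` returns the mean over the batch by default. Weighting each batch by its size and dividing by the dataset size gives the exact per-sample average, even when the last batch is smaller. With 7624 training images and `batch_size=32`, there are 238 full batches and a final batch of 8. The pure-Python sketch below compares this with naively averaging the batch means. The per-batch loss values are made up for illustration, and `epoch_mean_loss` is a hypothetical helper.

```python
def epoch_mean_loss(batch_means, batch_sizes):
    """Per-sample average loss from per-batch mean losses, as train_model computes it."""
    running_loss = sum(m * n for m, n in zip(batch_means, batch_sizes))
    return running_loss / sum(batch_sizes)

# 7624 training images with batch_size=32: 238 full batches plus a final batch of 8
batch_sizes = [32] * 238 + [8]

# Hypothetical per-batch mean losses: 1.0 for every full batch, 3.0 for the small last batch
batch_means = [1.0] * 238 + [3.0]

weighted = epoch_mean_loss(batch_means, batch_sizes)
naive = sum(batch_means) / len(batch_means)   # treats the 8-image batch like a 32-image one
print(round(weighted, 4))   # 1.0021
print(round(naive, 4))      # 1.0084
```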


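## 2.5 Defining the Validation Function  
After each training epoch, evaluate the model and compute the loss and accuracy on the validation set.  
Set evaluation mode (`model.eval()`): switch the model to evaluation mode. Dropout is disabled, and BatchNorm uses its running statistics instead of the statistics of the current batch.  
Disable gradient tracking (`with torch.no_grad()`): gradients are not needed for validation, so skipping them saves memory and computation.  
Forward pass and loss (`outputs = model(images)`, `loss = criterion(outputs, labels)`): same as in training.  
Get the predictions (`_, predicted = torch.max(outputs, 1)`): take the highest-scoring class for each sample.  
Count samples and correct predictions (`total`, `correct`): used to compute the accuracy.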
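The BatchNorm part of `model.eval()` is easiest to see in a small example. The NumPy sketch below uses a simplified batch norm over an (N, F) array without the learnable scale and shift, so it is not PyTorch's implementation. It mirrors the documented behaviour: training mode normalizes with the current batch's statistics and updates running estimates with momentum 0.1, while evaluation mode uses the fixed running estimates. `SimpleBatchNorm` is a hypothetical class.

```python
import numpy as np

class SimpleBatchNorm:
    """Batch norm over the feature axis of an (N, F) array, without the learnable scale and shift."""

    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum = momentum
        self.eps = eps
        self.training = True   # toggled like model.train() / model.eval()

    def __call__(self, x):
        if self.training:
            # Training: normalize with the current batch's statistics (biased variance) ...
            mean, var = x.mean(axis=0), x.var(axis=0)
            # ... and update the running estimates (unbiased variance)
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * x.var(axis=0, ddof=1)
        else:
            # Evaluation: use the fixed running statistics collected during training
            mean, var = self.running_mean, self.running_var
        return (x - mean) / np.sqrt(var + self.eps)

rng = np.random.default_rng(0)
bn = SimpleBatchNorm(4)
for _ in range(200):   # "training" batches drawn from a distribution with mean 5 and std 2
    bn(rng.normal(loc=5.0, scale=2.0, size=(32, 4)))

bn.training = False
sample = np.full((1, 4), 5.0)
alone = bn(sample)
in_batch = bn(np.vstack([sample, rng.normal(size=(7, 4))]))[:1]
print(bn.running_mean.round(1))       # close to 5 for every feature
print(np.allclose(alone, in_batch))   # True: in eval mode the other samples in the batch have no effect
```

In evaluation mode, a prediction therefore depends only on the image itself and not on which other images share its batch.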

```python
def validate_model(model, valid_loader, criterion):
    model.eval()  # set the model to evaluation mode
    running_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():  # disable gradient tracking
        for images, labels in valid_loader:
            images, labels = images.to(device), labels.to(device)

            outputs = model(images)
            loss = criterion(outputs, labels)

            _, predicted = torch.max(outputs, 1)           # predicted class = index of the highest score
            total += labels.size(0)                        # number of samples seen
            correct += (predicted == labels).sum().item()  # number of correct predictions
            running_loss += loss.item() * images.size(0)

    epoch_loss = running_loss / len(valid_loader.dataset)
    accuracy = correct / total * 100  # accuracy in percent
    print(f'Validation Loss: {epoch_loss:.4f}, Accuracy: {accuracy:.2f}%')
```
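`torch.max(outputs, 1)` returns both the maximum scores and their indices, and the indices are the predicted classes. The NumPy sketch below applies the same counting logic to made-up scores for four samples and three classes. `batch_accuracy_counts` is a hypothetical helper.

```python
import numpy as np

def batch_accuracy_counts(outputs, labels):
    """Mirror of `_, predicted = torch.max(outputs, 1)` plus the correct/total counts."""
    predicted = outputs.argmax(axis=1)          # index of the highest score in each row
    correct = int((predicted == labels).sum())
    return correct, len(labels)

# Hypothetical scores for 4 samples over 3 classes
outputs = np.array([[2.0, 0.1, -1.0],
                    [0.0, 3.0,  0.5],
                    [1.0, 1.5,  2.5],
                    [0.2, 0.1,  0.0]])
labels = np.array([0, 1, 2, 1])

correct, total = batch_accuracy_counts(outputs, labels)
print(correct, total, f"{correct / total * 100:.2f}%")   # 3 4 75.00%
```

`validate_model` sums `correct` and `total` over all batches before dividing. The reported accuracy is therefore over the whole validation set, not an average of per-batch accuracies.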


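## 2.6 Training and Evaluating the Model  
Training on the 8-core / 32 GB resources takes a while, so I recommend loading the already-trained model directly. Its path is given in the last section of this article.  

After every epoch, the model is evaluated on the validation set to check how well it generalizes to data it was not trained on.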

```python
# Train the model
train_model(model, train_loader, valid_loader, criterion, optimizer, num_epochs=10)
```

```text
Epoch 1/10, Loss: 1.8703
Validation Loss: 0.4942, Accuracy: 89.06%
Epoch 2/10, Loss: 0.5120
Validation Loss: 0.1928, Accuracy: 96.60%
Epoch 3/10, Loss: 0.1796
Validation Loss: 0.1454, Accuracy: 96.23%
Epoch 4/10, Loss: 0.0749
Validation Loss: 0.0962, Accuracy: 98.49%
Epoch 5/10, Loss: 0.0399
Validation Loss: 0.0959, Accuracy: 98.11%
Epoch 6/10, Loss: 0.0247
Validation Loss: 0.0598, Accuracy: 99.25%
Epoch 7/10, Loss: 0.0194
Validation Loss: 0.0939, Accuracy: 98.49%
Epoch 8/10, Loss: 0.0180
Validation Loss: 0.0556, Accuracy: 98.87%
Epoch 9/10, Loss: 0.0240
Validation Loss: 0.0797, Accuracy: 98.11%
Epoch 10/10, Loss: 0.0338
Validation Loss: 0.1552, Accuracy: 95.09%
```
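The log shows validation accuracy peaking at 99.25% in epoch 6. By epoch 10 it has dropped to 95.09%, with the validation loss rising to 0.1552. The model saved in the next step is therefore not the best one seen during training. A common remedy is to keep a copy of the weights from the best validation epoch. The sketch below is a minimal, framework-agnostic version of that idea:

- The hypothetical `BestCheckpoint` class deep-copies whatever state it is given. In the notebook, that state would be `model.state_dict()`.
- Using it there assumes `validate_model` is changed to return its accuracy, since it currently only prints it.
- Here it is exercised with the accuracies from the log above and dummy states.

```python
import copy

class BestCheckpoint:
    """Remember the state from the epoch with the highest validation accuracy."""

    def __init__(self):
        self.best_acc = float("-inf")
        self.best_epoch = None
        self.best_state = None

    def update(self, epoch, val_acc, state):
        if val_acc > self.best_acc:
            self.best_acc = val_acc
            self.best_epoch = epoch
            # Deep copy so that later training steps cannot modify the saved snapshot
            self.best_state = copy.deepcopy(state)

# Validation accuracies (%) from the training log above
val_accs = [89.06, 96.60, 96.23, 98.49, 98.11, 99.25, 98.49, 98.87, 98.11, 95.09]

ckpt = BestCheckpoint()
state = {"step": 0}                 # dummy stand-in for model.state_dict()
for epoch, acc in enumerate(val_accs, start=1):
    state["step"] = epoch           # pretend training changed the weights
    ckpt.update(epoch, acc, state)

print(ckpt.best_epoch, ckpt.best_acc, ckpt.best_state)   # 6 99.25 {'step': 6}
```

After training, `model.load_state_dict(ckpt.best_state)` would restore the best weights before saving. The deep copy matters because `state_dict()` returns references to the live parameter tensors.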


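## 2.7 Saving the Model

```python
# Save the entire model object (architecture + weights) with pickle.
# Saving model.state_dict() instead is the more portable option recommended in the PyTorch docs.
torch.save(model, '/home/mw/project/model.pth')
```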

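## 2.8 Prediction and Visualization  
Run the model on test images, then plot the images alongside their predictions to show how accurate the predictions are.

```python
# Class names of the test set
classes = test_dataset.classes
```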

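### **Test-Set Predictions and Visualization**  
Define a prediction function for the test set, run the trained model on it, and display some of the results.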

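```python
def visualize_predictions(model, test_loader, class_names, num_images=5):
    model.eval()
    images_so_far = 0
    num_rows = (num_images + 4) // 5  # 5 images per row, rounded up
    plt.figure(figsize=(12, 3 * num_rows))

    # ImageNet statistics used in Normalize, needed to undo the normalization for display
    mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

    with torch.no_grad():
        for inputs, labels in test_loader:
            outputs = model(inputs.to(device))  # run on the same device as the model
            _, preds = torch.max(outputs, 1)
            preds = preds.cpu()

            for i in range(inputs.size(0)):
                if images_so_far == num_images:
                    break
                images_so_far += 1

                img = (inputs[i] * std + mean).clamp(0, 1)  # undo Normalize so imshow gets values in [0, 1]
                ax = plt.subplot(num_rows, 5, images_so_far)
                ax.imshow(img.permute(1, 2, 0).numpy())
                ax.set_title(f'Pred: {class_names[preds[i].item()]}\nTrue: {class_names[labels[i].item()]}')
                ax.axis('off')

            if images_so_far == num_images:
                break

    plt.tight_layout()
    plt.show()

# Visualize predictions of the trained model
visualize_predictions(model, test_loader, classes, num_images=10)
```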


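When a normalized tensor is passed to `imshow` directly, matplotlib clips its values to [0, 1] and warns "Clipping input data to the valid range for imshow with RGB data", and the colors look wrong. Undoing the normalization first avoids this. The NumPy sketch below shows the inverse transform, x = x_norm * std + mean per channel, followed by the CHW to HWC transpose that `imshow` expects. `to_displayable` is a hypothetical helper.

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def to_displayable(img_chw):
    """Undo transforms.Normalize on a (C, H, W) array and return an (H, W, C) array in [0, 1]."""
    img = img_chw * STD[:, None, None] + MEAN[:, None, None]   # x = x_norm * std + mean, per channel
    return np.clip(img.transpose(1, 2, 0), 0.0, 1.0)           # CHW -> HWC, as imshow expects

# Round trip: normalize a random image the same way as the pipeline, then undo it
rng = np.random.default_rng(0)
original = rng.random((3, 4, 5))                                # values in [0, 1), shape (C, H, W)
normalized = (original - MEAN[:, None, None]) / STD[:, None, None]

restored = to_displayable(normalized)
print(restored.shape)                                           # (4, 5, 3)
print(np.allclose(restored, original.transpose(1, 2, 0)))       # True
```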

### **Predicting a Randomly Chosen Test Image and Showing Its True and Predicted Class**

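```python
def show_single_prediction(model, test_loader, class_names=None):
    # Fall back to the dataset's own class names if none are given
    if class_names is None:
        class_names = test_loader.dataset.classes

    # Set the model to evaluation mode
    model.eval()

    # Pick a random image from the whole test set
    # (sampling only from the first batch of a non-shuffled loader would always give the same 32 images)
    dataset = test_loader.dataset
    index = np.random.randint(0, len(dataset))
    img, true_label = dataset[index]
    true_class = class_names[true_label]

    # Run the prediction
    with torch.no_grad():
        output = model(img.unsqueeze(0).to(device))  # add a batch dimension and move to the model's device
        _, predicted = torch.max(output, 1)
        pre_class = class_names[predicted.item()]

    # Undo Normalize and convert to an (H, W, C) numpy array for display
    mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
    img = (img * std + mean).clamp(0, 1).permute(1, 2, 0).numpy()

    # Plot the image with its true and predicted labels
    plt.figure(figsize=(6, 6))
    plt.imshow(img)
    plt.title(f'True Label: {true_class}, Predicted Label: {pre_class}')
    plt.axis('off')
    plt.show()
```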


```python
show_single_prediction(model, test_loader)
```


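### **Evaluating on the Test Set with a Confusion Matrix**  
The confusion matrix shows at a glance how many predictions are correct and how many are wrong.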

```python
def plot_confusion_matrix(model, test_loader, num_classes):
    model.eval()
    all_preds = []
    all_labels = []

    with torch.no_grad():
        for inputs, labels in test_loader:
            outputs = model(inputs.to(device))  # run on the same device as the model
            _, preds = torch.max(outputs, 1)
            all_preds.extend(preds.cpu().numpy())
            all_labels.extend(labels.numpy())

    # Rows are true classes, columns are predicted classes
    cm = confusion_matrix(all_labels, all_preds, labels=list(range(num_classes)))

    # A 53x53 matrix needs a large figure for the cell annotations to stay readable
    plt.figure(figsize=(20, 16))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=range(num_classes), yticklabels=range(num_classes))
    plt.xlabel('Predicted Label')
    plt.ylabel('True Label')
    plt.title('Confusion Matrix')
    plt.show()

# Plot the confusion matrix of the trained model
plot_confusion_matrix(model, test_loader, num_classes=53)
```
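The heatmap can also be summarized numerically. Correct predictions lie on the diagonal, so overall accuracy is the trace divided by the total count. Per-class recall is each diagonal entry divided by its row sum, since rows are true labels in scikit-learn's `confusion_matrix`. The sketch below uses a small hypothetical 3-class matrix. Applying it to the real 53x53 matrix would require `plot_confusion_matrix` to return `cm`, which the current version does not. `summarize_confusion_matrix` is a hypothetical helper.

```python
import numpy as np

def summarize_confusion_matrix(cm):
    """Overall accuracy and per-class recall from a confusion matrix (rows = true, columns = predicted)."""
    cm = np.asarray(cm)
    accuracy = np.trace(cm) / cm.sum()   # correct predictions lie on the diagonal
    row_totals = cm.sum(axis=1)
    # Recall of class i = cm[i, i] / number of samples whose true class is i (0 if the class is absent)
    recall = np.divide(np.diag(cm), row_totals,
                       out=np.zeros(len(cm)), where=row_totals > 0)
    return accuracy, recall

# Hypothetical 3-class matrix: 5 images per class, one class-2 image predicted as class 1
cm = [[5, 0, 0],
      [0, 5, 0],
      [0, 1, 4]]
accuracy, recall = summarize_confusion_matrix(cm)
print(accuracy)   # 14 / 15, about 0.933
print(recall)     # per-class recall: 1.0, 1.0, 0.8
```

Low-recall rows point to the cards the model confuses most. The column holding the misplaced count tells you which card they are mistaken for, and the `classes` list maps both indices back to card names.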


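## 2.9 Loading the Saved Model  
The model was just trained above, so there is no need to load it here, which is why this step comes last. If you would rather skip training, you can load the saved model directly instead (the import, data-loading and `device` cells still need to be run first).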

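```python
# map_location lets a model saved on a GPU be loaded on a CPU-only machine.
# In PyTorch >= 2.6, torch.load defaults to weights_only=True, so loading a whole pickled model there also needs weights_only=False.
model = torch.load('/home/mw/project/model.pth', map_location=device)
model.eval()  # switch to evaluation mode
```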


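# 3. Summary  
This project showed how to classify playing-card images with ResNet18 by fine-tuning ImageNet-pretrained weights. Validation accuracy peaked at 99.25% (epoch 6) and was 95.09% after the final epoch. Future work could explore more complex models or data augmentation to further improve classification accuracy and robustness.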