# Real or Fake Image Recognition Task

> **Introduction:** This is Question 2 of the APOAI 2025 Mock Competition. It is also the second question of NOAI 2024 (China).

## I. Question Overview

CIFAR10 is a commonly used image classification data set. Each image is a 3x32x32 color image, where 3 is the number of color channels and 32x32 is the image size. We selected 5,000 training images and 1,000 test images from CIFAR10. These 6,000 images are photographs and are defined as "real" images.

The Diffusion Model is a very powerful image generation model that can be used to generate "fake" images. We used a Diffusion Model on CIFAR10 to generate 6,000 "fake" images, among which 5,000 "fake" images are for the training set and 1,000 are for the test set. The samples in the test set cannot be accessed during the competition.

## II. Data Set

1. Address of the training set: [Training Set](https://bohrium.dp.tech/competitions/2623226705?tab=datasets).
    - The real pictures of the training set are stored in the `train/cifar` folder.
    - The fake pictures of the training set are stored in the `train/uvit` folder.
2. Test set (cannot be directly downloaded during the competition):
    - The real pictures of the test set are stored in the `test/cifar` folder.
    - The fake pictures of the test set are stored in the `test/uvit` folder.

## III. Task

Use PyTorch to design, train, and test a **Convolutional Neural Network** that distinguishes "real" images from "fake" images.

The specific requirements are as follows:

1. During training, label "real" images as 0 and "fake" images as 1.
2. Name the class of the model as `MyModel()`. Using other names may lead to failure in submission.
3. Include at least 2 convolutional layers (`nn.Conv2d`) and 2 max pooling layers (`nn.MaxPool2d`). Do not define convolutional or pooling layers in any other way. Build the network directly, without nesting layers inside `nn.Sequential()`: the scoring system cannot detect the network structure inside `nn.Sequential()` and will give a score of 0.
4. Include at most 2 linear layers (`nn.Linear`). The same restriction on `nn.Sequential()` applies: layers nested inside it cannot be detected, and the score will be 0.
5. The activation function of each layer can only be selected from `nn.ReLU`, `nn.Sigmoid`, `nn.Tanh`, `nn.ELU`, `nn.LeakyReLU`, `nn.PReLU`.
6. The loss function, optimizer (solver), and learning rate can be freely selected.
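
Before submitting, it can help to count the layer constructors in the model source. The sketch below is a hypothetical static check written for illustration. It is not the grader's method, and the names `count_layers` and `SAMPLE` are assumptions. It counts constructor calls, so a single `nn.MaxPool2d` that is reused twice in `forward` counts once here; the grader may count differently.

```python
import ast

LAYER_NAMES = ("Conv2d", "MaxPool2d", "Linear", "Sequential")

def count_layers(source: str, class_name: str = "MyModel") -> dict:
    """Count calls such as nn.Conv2d(...) inside the named class definition."""
    counts = {name: 0 for name in LAYER_NAMES}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for sub in ast.walk(node):
                if (isinstance(sub, ast.Call)
                        and isinstance(sub.func, ast.Attribute)
                        and sub.func.attr in counts):
                    counts[sub.func.attr] += 1
    return counts

# Hypothetical model source used only to exercise the counter
SAMPLE = '''
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, 5)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(8, 10, 5)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(250, 70)
        self.fc2 = nn.Linear(70, 1)
'''

counts = count_layers(SAMPLE)
print(counts)
```

The source is only parsed, never imported, so the check runs without PyTorch installed.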

## IV. Submission

Please submit a compressed file named `submission.zip`. After decompression, it should contain the model file `submission_model.py` and the model parameter file `submission_dic.pth`. The specific requirements are as follows:

1. Save the class definition of the model, together with the imports it requires, in `submission_model.py`.
2. Save the trained model parameters in `submission_dic.pth`. The model parameters will be loaded during scoring.
3. You can refer to the method in `baseline.ipynb` to generate the `submission.zip` file on the platform for submission. You can also download the data set to the local machine, train the model, and then package it into a `submission.zip` file for submission.

> **Address of `baseline.ipynb`:** [Question 2 of APOAI2025 Mock Competition_baseline](https://bohrium.dp.tech/notebooks/53453886135/).

## V. Scoring

1. When the number of linear layers and the number of neurons in the neural network meet the requirements, the score is calculated as follows:
    1. Network simplicity score: `Network_Simplicity_Score = 1 / (Num_Linear + Num_Conv + 1)`. \
       Here, `Num_Linear` is the number of linear layers and `Num_Conv` is the number of convolutional layers.
    2. Accuracy of the model on the test set: `Accuracy`. (Note: the metric currently has a bug that doubles the accuracy.)
    3. Final score: `Score = (Network_Simplicity_Score + Accuracy) * 3/4`.
2. When the number of linear layers and the number of neurons in the neural network do not meet the requirements, the score is 0.
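
The scoring formula above can be worked through with a short function. This is a minimal sketch of the stated formula only. The function name `final_score` and the accuracy value of 0.8 are assumptions chosen for illustration, and the doubling bug mentioned above is not modeled.

```python
def final_score(num_linear: int, num_conv: int, accuracy: float) -> float:
    """Score = (1 / (Num_Linear + Num_Conv + 1) + Accuracy) * 3/4."""
    simplicity = 1 / (num_linear + num_conv + 1)
    return (simplicity + accuracy) * 3 / 4

# Illustrative inputs only: 2 linear + 2 conv layers and a made-up accuracy of 0.8
example = final_score(num_linear=2, num_conv=2, accuracy=0.8)
print(round(example, 4))  # 0.75
```

Under this formula, going from 2 linear layers to 1 (with 2 convolutional layers) raises the simplicity term from 1/5 to 1/4, which is worth 0.0375 points after the 3/4 scaling.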

> **Remarks:** Leaderboard A uses 50% of the test set and is displayed in real time during the competition to help contestants debug their models. Leaderboard B uses the remaining 50% of the test set and is calculated after the competition ends. The Leaderboard B score is the final score.

In [1]:
import os
import zipfile
import numpy as np
import random
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, Dataset
from PIL import Image

seed = 42
os.environ["PYTHONHASHSEED"] = str(seed)
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

## Load data

In [2]:
class CustomDataset(Dataset):
    def __init__(self, root_dir, transform=None):
        self.root_dir = root_dir
        self.transform = transform
        self.image_paths = []
        self.labels = []
        # os.listdir returns entries in arbitrary order, so sort to guarantee cifar -> 0, uvit -> 1
        for label, sub_dir in enumerate(sorted(os.listdir(root_dir))):
            full_dir = os.path.join(root_dir, sub_dir)
            for img_name in os.listdir(full_dir):
                img_path = os.path.join(full_dir, img_name)
                self.image_paths.append(img_path)
                self.labels.append(label)

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img_path = self.image_paths[idx]
        image = Image.open(img_path).convert("RGB")
        label = self.labels[idx]
        if self.transform:
            image = self.transform(image)
        return image, label
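
Because the task fixes the labels ("real" is 0, "fake" is 1), an explicit folder-to-label mapping is one way to avoid depending on directory listing order. The sketch below is an illustrative alternative, not the baseline's method. `CLASS_TO_LABEL` and `list_samples` are hypothetical names, and the folder names are taken from the data set description.

```python
import os

# Assumed mapping from the task statement: real (cifar) = 0, fake (uvit) = 1
CLASS_TO_LABEL = {"cifar": 0, "uvit": 1}

def list_samples(root_dir: str) -> list:
    """Return (path, label) pairs using the fixed mapping, not listing order."""
    samples = []
    for sub_dir, label in CLASS_TO_LABEL.items():
        full_dir = os.path.join(root_dir, sub_dir)
        # Sort file names so the sample order is reproducible across machines
        for img_name in sorted(os.listdir(full_dir)):
            samples.append((os.path.join(full_dir, img_name), label))
    return samples
```

A data set class could build its `image_paths` and `labels` lists from these pairs.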

In [20]:
from torchvision.models import resnet18
transform = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))  # These values are used when grading
])
ds_train = CustomDataset(r"../../noai/malaysia/ioai-tsp-2025-main/noai-china-2024/real-or-fake-image/train_v1", transform=transform)
dl_train = DataLoader(ds_train, batch_size=64, shuffle=True)
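
As a quick check on the transform above: `transforms.ToTensor()` scales pixel values to the range [0, 1], and `transforms.Normalize` then computes `(value - mean) / std` per channel. The sketch below reproduces only that arithmetic in plain Python. The helper name `normalize` is an assumption, and the single-value tuples in the cell above are assumed to be broadcast to all three channels.

```python
def normalize(value: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Per-pixel arithmetic of Normalize: (value - mean) / std."""
    return (value - mean) / std

# Three illustrative pixel values after ToTensor: darkest, middle, brightest
scaled = [normalize(v) for v in (0.0, 0.5, 1.0)]
print(scaled)  # [-1.0, 0.0, 1.0]
```

With a mean and standard deviation of 0.5, the inputs to the network therefore lie in [-1, 1].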

## Define model and train

In [21]:
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(8, 10, 5)
        self.fc1 = nn.Linear(10 * 5 * 5, 70)
        self.fc2 = nn.Linear(70, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = x.view(-1, 10 * 5 * 5)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        x = self.sigmoid(x)
        return x
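
The `10 * 5 * 5` input size of `fc1` follows from the layer shapes. The sketch below traces the spatial size of a 32x32 input through the two convolutions and two pooling steps of `MyModel`, using the standard output-size formula for layers without dilation. The helper name `conv_out` is an assumption.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

side = 32
side = conv_out(side, kernel=5)            # conv1: 32 -> 28
side = conv_out(side, kernel=2, stride=2)  # pool:  28 -> 14
side = conv_out(side, kernel=5)            # conv2: 14 -> 10
side = conv_out(side, kernel=2, stride=2)  # pool:  10 -> 5
flat_features = 10 * side * side           # conv2 has 10 output channels
print(side, flat_features)  # 5 250
```

Changing a kernel size or the channel count of `conv2` changes this number, so `fc1` and the `view` call would need to be updated together.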

In [22]:
from tqdm.auto import tqdm
def train(model, device, optimizer, criterion, dataloader, num_epochs):
    train_losses = []
    for epoch in range(num_epochs):
        model.train()
        running_loss = 0
        
        for images, labels in tqdm(dataloader):
            images, labels = images.to(device), labels.float().to(device)
            
            outputs = model(images).squeeze()
            loss = criterion(outputs, labels)
            
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item() * images.size(0)

        running_loss /= len(dataloader.dataset)
        train_losses.append(running_loss)
        accuracy = evaluate(model, device, dataloader)
        print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {running_loss}, Accuracy: {accuracy}")
    return train_losses

In [23]:
@torch.no_grad()
def evaluate(model, device, dataloader):  # named evaluate to avoid shadowing the built-in eval
    model.eval()
    correct = 0
    total = 0
    for images, labels in dataloader:
        images, labels = images.to(device), labels.to(device).float()
        
        outputs = model(images).squeeze()
        preds = outputs >= 0.5
        
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total

In [26]:
device = "cuda" if torch.cuda.is_available() else "cpu"
# model = MyModel().to(device)
model = resnet18(weights=None)
model.fc = nn.Linear(512, 1, True)
model.to(device)
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)
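
`nn.BCEWithLogitsLoss` combines a sigmoid and binary cross-entropy in one numerically stable step, so it expects raw logits. A model that already ends in `nn.Sigmoid` would normally be paired with `nn.BCELoss` instead. The plain-Python sketch below illustrates the equivalence and the matching decision thresholds. The helper names are assumptions and the logit values are made up.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def bce(p: float, y: float) -> float:
    """Binary cross-entropy on a probability p."""
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def bce_with_logits(z: float, y: float) -> float:
    """Stable form on a raw logit z: max(z, 0) - z*y + log(1 + exp(-|z|))."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

for z in (-2.0, -0.3, 0.0, 1.2):
    for y in (0.0, 1.0):
        # Both routes give the same loss value
        assert abs(bce_with_logits(z, y) - bce(sigmoid(z), y)) < 1e-9
    # A probability threshold of 0.5 corresponds to a logit threshold of 0
    assert (sigmoid(z) >= 0.5) == (z >= 0.0)
print("ok")
```

When a model outputs raw logits, predictions are therefore usually taken as `outputs >= 0`. The `0.5` threshold applies to probabilities.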

In [27]:
print(model)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

In [28]:
# from pytorch_lightning import Trainer
# trainer = Trainer()
# trainer.fit(model, dl_train)
train(model, device, optimizer, criterion, dl_train, 10)

Epoch [1/10], Loss: 0.5968762973308563, Accuracy: 0.7092
Epoch [2/10], Loss: 0.5578853813171387, Accuracy: 0.7222
Epoch [3/10], Loss: 0.5376831862449646, Accuracy: 0.7341
Epoch [4/10], Loss: 0.529094282913208, Accuracy: 0.7543
Epoch [5/10], Loss: 0.5089563994884491, Accuracy: 0.6922
Epoch [6/10], Loss: 0.48770226249694826, Accuracy: 0.7899
Epoch [7/10], Loss: 0.4604527988433838, Accuracy: 0.8018
Epoch [8/10], Loss: 0.4443097314834595, Accuracy: 0.7973
Epoch [9/10], Loss: 0.41340775609016417, Accuracy: 0.8351
Epoch [10/10], Loss: 0.38329280331134796, Accuracy: 0.8221


[0.5968762973308563,
 0.5578853813171387,
 0.5376831862449646,
 0.529094282913208,
 0.5089563994884491,
 0.48770226249694826,
 0.4604527988433838,
 0.4443097314834595,
 0.41340775609016417,
 0.38329280331134796]

## Save for submission

In [29]:
py_filename = "submission_model.py"
pth_filename = "submission_dic.pth"
zip_filename = "submission.zip"  # Will submit this zip to the grader

In [30]:
torch.save(model.state_dict(), pth_filename)

In [31]:
model_code = """  
import torch
import torch.nn as nn


class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(8, 10, 5)
        self.fc1 = nn.Linear(10 * 5 * 5, 70)
        self.fc2 = nn.Linear(70, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = x.view(-1, 10 * 5 * 5)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        x = self.sigmoid(x)
        return x
""".lstrip()

with open(py_filename, "w") as f:
    f.write(model_code)

In [None]:
with zipfile.ZipFile(zip_filename, "w") as zipf:
    for file in [py_filename, pth_filename]:
        zipf.write(file, os.path.basename(file))
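
Since the submission must unpack to exactly `submission_model.py` and `submission_dic.pth`, a quick look inside the archive before uploading can catch packaging mistakes. The sketch below is a hypothetical local check, not part of the grader. The names `EXPECTED` and `missing_from_zip` are assumptions.

```python
import zipfile

# File names required by the submission rules
EXPECTED = {"submission_model.py", "submission_dic.pth"}

def missing_from_zip(zip_path: str) -> set:
    """Return the expected file names that are not at the top level of the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return EXPECTED - set(zf.namelist())
```

Calling `missing_from_zip("submission.zip")` returns an empty set when both files are present at the top level of the archive.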

## Score

Leaderboard A accuracy: 1.0000

Leaderboard B accuracy: 1.0000