## PyTorch Custom Dataset

### Importing PyTorch and setting up device-agnostic code

In [30]:
import torch
from torch import nn

In [31]:
torch.__version__

'2.6.0+cu124'

In [32]:
# Set up device-agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"

In [33]:
device

'cuda'
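
As a minimal sketch of what "device-agnostic" means in practice, the example below creates a tensor and a small layer on whichever device was selected, so the same code runs with or without a GPU. The tensor `x` and the `layer` are hypothetical and only serve as an illustration; they are not part of the original notebook.

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical example tensor, created directly on the chosen device
x = torch.rand(2, 3, device=device)

# A small layer moved to the same device with .to()
layer = torch.nn.Linear(in_features=3, out_features=1).to(device)

# Inputs and model live on the same device, so the forward pass works
y = layer(x)
print(y.shape, y.device.type)
```

Keeping the data and the model on the same device avoids the runtime error PyTorch raises when tensors from different devices are combined.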

In [34]:
!nvidia-smi

Mon Apr 14 15:44:29 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   40C    P8              9W /   70W |       2MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [35]:
# Configure the Git identity used for pushing to and pulling from GitHub
!git config --global user.name "Asliddin"
!git config --global user.email "asliddinmalikov999@gmail.com"


In [36]:
!git clone https://github.com/tayfunai/PytorchPractise.git

fatal: destination path 'PytorchPractise' already exists and is not an empty directory.


### 1. Get data
* The dataset I am going to work with is a subset of the Food101 dataset.
* I am using just 3 of the 101 food categories in the full dataset.
* My dataset contains 10 percent of the Food101 images for those 3 classes.
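
The subset size can be sanity-checked with a little arithmetic. The sketch below assumes the standard Food101 split of 750 training and 250 test images per class (an assumption, not stated in this notebook) and compares the expected totals with the per-class counts printed by the directory walk later in this notebook.

```python
# Assumption: standard Food101 split of 750 train / 250 test images per class
NUM_CLASSES_USED = 3
TRAIN_PER_CLASS = 750
TEST_PER_CLASS = 250
PERCENT_KEPT = 10

# Integer arithmetic avoids floating-point rounding surprises
expected_train = NUM_CLASSES_USED * TRAIN_PER_CLASS * PERCENT_KEPT // 100
expected_test = NUM_CLASSES_USED * TEST_PER_CLASS * PERCENT_KEPT // 100

# Per-class counts copied from the directory walk output in this notebook
observed_train = {"pizza": 78, "sushi": 72, "steak": 75}
observed_test = {"pizza": 25, "sushi": 31, "steak": 19}

print(expected_train, sum(observed_train.values()))  # 225 225
print(expected_test, sum(observed_test.values()))    # 75 75
```

The totals match even though the individual classes are not exactly 75 and 25 images each, which suggests the 10 percent was sampled across the three classes together rather than per class.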

In [37]:
import requests
import zipfile
from pathlib import Path

# Set up path to a data folder
data_path = Path("data/")
image_path = data_path / "pizza_steak_sushi"

# Skip the download and extraction if the dataset directory already exists
if image_path.is_dir():
  print(f"{image_path} directory already exists... skipping download")
else:
  print(f"{image_path} does not exist, so creating a new one...")
  image_path.mkdir(parents=True, exist_ok=True)

  # Download pizza, steak and sushi data
  with open(data_path / "pizza_steak_sushi.zip", "wb") as f:
    print("Downloading dataset from github...")
    request = requests.get("https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip")
    f.write(request.content)

  # Unzip dataset
  with zipfile.ZipFile(data_path / "pizza_steak_sushi.zip", "r") as zip_file:
    print("Unzipping dataset...")
    zip_file.extractall(image_path)

data/pizza_steak_sushi does not exist, so creating a new one...
Downloading dataset from github...
Unzipping dataset...
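
One weakness of checking only whether the directory exists is that an empty directory left behind by a failed run would also cause the download to be skipped. A minimal sketch of a stricter check is shown below. The helper `extract_once` and the throwaway archive are hypothetical and use only the standard library, so the example runs without network access.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_once(zip_path: Path, target_dir: Path) -> bool:
    """Extract zip_path into target_dir unless target_dir already has content.

    Returns True if extraction ran, False if it was skipped.
    """
    if target_dir.is_dir() and any(target_dir.iterdir()):
        return False
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as zip_file:
        zip_file.extractall(target_dir)
    return True


# Demo on a throwaway archive (hypothetical file names, no network needed)
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    zip_path = tmp_path / "sample.zip"
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("train/pizza/image01.jpeg", b"fake image bytes")

    target = tmp_path / "pizza_steak_sushi"
    first_run = extract_once(zip_path, target)   # extracts
    second_run = extract_once(zip_path, target)  # skipped, content exists
    extracted_exists = (target / "train" / "pizza" / "image01.jpeg").is_file()
    print(first_run, second_run, extracted_exists)
```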


In [38]:
image_path

PosixPath('data/pizza_steak_sushi')

In [39]:
!cp -r /content/data /content/PytorchPractise


### 2. Becoming one with the data (data preparation and data exploration)

#### The following folder structure is a standard dataset format for computer vision projects
```
pizza_steak_sushi/ <- overall dataset folder
    train/ <- training images
        pizza/ <- class name as folder name
            image01.jpeg
            image02.jpeg
            ...
        steak/
            image24.jpeg
            image25.jpeg
            ...
        sushi/
            image37.jpeg
            ...
    test/ <- testing images
        pizza/
            image101.jpeg
            image102.jpeg
            ...
        steak/
            image154.jpeg
            image155.jpeg
            ...
        sushi/
            image167.jpeg
            ...
```
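
Because each class has its own folder, the class names and integer labels can be derived from the directory names alone. This is the convention that `torchvision.datasets.ImageFolder` relies on. The sketch below illustrates the idea with a hypothetical helper `find_classes` on a temporary directory tree, so it runs without the real dataset.

```python
import os
import tempfile
from pathlib import Path


def find_classes(directory):
    """Return (sorted class names, class-to-index mapping) from subfolder names."""
    classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
    if not classes:
        raise FileNotFoundError(f"No class folders found in {directory}")
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    return classes, class_to_idx


# Build a tiny stand-in for pizza_steak_sushi/train/ (empty class folders)
with tempfile.TemporaryDirectory() as tmp:
    train_dir = Path(tmp) / "pizza_steak_sushi" / "train"
    for name in ("pizza", "steak", "sushi"):
        (train_dir / name).mkdir(parents=True)

    classes, class_to_idx = find_classes(train_dir)
    print(classes)       # ['pizza', 'steak', 'sushi']
    print(class_to_idx)  # {'pizza': 0, 'steak': 1, 'sushi': 2}
```

Sorting the folder names keeps the label assignment stable across runs and machines, since the order returned by the filesystem is not guaranteed.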





In [42]:
# After downloading the dataset, explore it by walking through each subfolder with os.walk()
import os
def walk_through_dir(dir_path):
  """
  Walks through dir_path, printing its contents.

  Args:
    dir_path (str or pathlib.Path): target directory

  Prints:
    number of subdirectories in dir_path
    number of images (files) in each subdirectory
    name of each subdirectory
  """
  print(dir_path)
  for dirpath, dirnames, filenames in os.walk(dir_path):
    print(dir_path)  # note: this prints the root argument on every iteration, not the current dirpath
    print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'.")

In [43]:
walk_through_dir(image_path)

data/pizza_steak_sushi
data/pizza_steak_sushi
There are 2 directories and 0 images in 'data/pizza_steak_sushi'.
data/pizza_steak_sushi
There are 3 directories and 0 images in 'data/pizza_steak_sushi/train'.
data/pizza_steak_sushi
There are 0 directories and 78 images in 'data/pizza_steak_sushi/train/pizza'.
data/pizza_steak_sushi
There are 0 directories and 72 images in 'data/pizza_steak_sushi/train/sushi'.
data/pizza_steak_sushi
There are 0 directories and 75 images in 'data/pizza_steak_sushi/train/steak'.
data/pizza_steak_sushi
There are 3 directories and 0 images in 'data/pizza_steak_sushi/test'.
data/pizza_steak_sushi
There are 0 directories and 25 images in 'data/pizza_steak_sushi/test/pizza'.
data/pizza_steak_sushi
There are 0 directories and 31 images in 'data/pizza_steak_sushi/test/sushi'.
data/pizza_steak_sushi
There are 0 directories and 19 images in 'data/pizza_steak_sushi/test/steak'.


In [45]:
# Redefine the helper without the extra debug prints of dir_path
import os
def walk_through_dir(dir_path):
  """
  Walks through dir_path, printing its contents.

  Args:
    dir_path (str or pathlib.Path): target directory

  Prints:
    number of subdirectories in dir_path
    number of images (files) in each subdirectory
    name of each subdirectory
  """
  for dirpath, dirnames, filenames in os.walk(dir_path):
    print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'.")


In [49]:
walk_through_dir(image_path)


There are 2 directories and 0 images in 'data/pizza_steak_sushi'.
There are 3 directories and 0 images in 'data/pizza_steak_sushi/train'.
There are 0 directories and 78 images in 'data/pizza_steak_sushi/train/pizza'.
There are 0 directories and 72 images in 'data/pizza_steak_sushi/train/sushi'.
There are 0 directories and 75 images in 'data/pizza_steak_sushi/train/steak'.
There are 3 directories and 0 images in 'data/pizza_steak_sushi/test'.
There are 0 directories and 25 images in 'data/pizza_steak_sushi/test/pizza'.
There are 0 directories and 31 images in 'data/pizza_steak_sushi/test/sushi'.
There are 0 directories and 19 images in 'data/pizza_steak_sushi/test/steak'.
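
The printed walk is handy for a quick look, but returning the counts as data makes them easier to check or plot. The sketch below is one possible variant using `pathlib`. The helper `count_images` and the default `*.jpg` pattern are assumptions for illustration, and the demo runs on a small temporary tree rather than the real dataset.

```python
import tempfile
from pathlib import Path


def count_images(root, pattern="*.jpg"):
    """Return {(split, class_name): number of files matching pattern}."""
    counts = {}
    for split_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            counts[(split_dir.name, class_dir.name)] = len(list(class_dir.glob(pattern)))
    return counts


# Demo on a tiny stand-in tree with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "pizza_steak_sushi"
    layout = {("train", "pizza"): 2, ("train", "sushi"): 1, ("test", "pizza"): 1}
    for (split, class_name), n_files in layout.items():
        class_dir = root / split / class_name
        class_dir.mkdir(parents=True)
        for i in range(n_files):
            (class_dir / f"image{i:02d}.jpg").touch()

    counts = count_images(root)
    total_train = sum(n for (split, _), n in counts.items() if split == "train")
    print(counts)
    print(total_train)
```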
