# Going Modular

Going modular means turning notebook code (from a Jupyter Notebook or Google Colab notebook) into a set of
Python scripts that offer similar functionality.

For example, we could turn our notebook code from a series of cells into the following Python files:
- `data_setup.py` - a file to download and prepare data if needed.
- `engine.py` - a file containing various training functions.
- `model_builder.py` or `model.py` - a file to create a PyTorch model.
- `train.py` - a file to leverage all other files and train a target PyTorch model.
- `utils.py` - a file dedicated to helpful utility functions.

Once the code lives in scripts, a model can be trained from a terminal/command line with a single command such as:
``` python train.py --model MODEL_NAME --batch_size BATCH_SIZE --lr LEARNING_RATE --num_epochs NUM_EPOCHS ```
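
One way `train.py` could read these flags is with `argparse` from the standard library. The sketch below is an assumption about how the flags might be wired up, not the project's actual `train.py`; the function name `parse_args` and every default value are placeholders chosen for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the training flags (hypothetical helper; defaults are illustrative)."""
    parser = argparse.ArgumentParser(description="Train a PyTorch model.")
    parser.add_argument("--model", type=str, default="tinyvgg",
                        help="Name of the model to build.")
    parser.add_argument("--batch_size", type=int, default=32,
                        help="Number of samples per batch.")
    parser.add_argument("--lr", type=float, default=0.001,
                        help="Learning rate for the optimizer.")
    parser.add_argument("--num_epochs", type=int, default=5,
                        help="Number of epochs to train for.")
    # argv=None makes argparse read sys.argv, as it would when run from a terminal.
    return parser.parse_args(argv)


# Passing a list simulates: python train.py --model tinyvgg --batch_size 16 ...
args = parse_args(["--model", "tinyvgg", "--batch_size", "16",
                   "--lr", "0.01", "--num_epochs", "3"])
print(args.model, args.batch_size, args.lr, args.num_epochs)
```

Typed arguments mean the rest of the script receives an `int` batch size and a `float` learning rate rather than raw strings.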

**Directory Structure:**
```
going_modular/
├── going_modular/
│   ├── data_setup.py
│   ├── engine.py
│   ├── model_builder.py
│   ├── train.py
│   └── utils.py
├── models/
│   ├── 05_going_modular_cell_mode_tinyvgg_model.pth
│   └── 05_going_modular_script_mode_tinyvgg_model.pth
└── data/
    └── pizza_steak_sushi/
        ├── train/
        │   ├── pizza/
        │   │   ├── image01.jpeg
        │   │   └── ...
        │   ├── steak/
        │   └── sushi/
        └── test/
            ├── pizza/
            ├── steak/
            └── sushi/
```
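
The `data/` layout above uses one subfolder per class, which is the format `torchvision.datasets.ImageFolder` expects: class names are taken from the subfolder names in sorted order. The sketch below mimics that behaviour with the standard library only, so it runs without PyTorch; `find_classes` here is a hypothetical stand-in written for illustration, and the folders are created in a temporary directory rather than read from the real dataset.

```python
import tempfile
from pathlib import Path


def find_classes(split_dir):
    """Return the sorted names of the class subfolders inside split_dir."""
    return sorted(entry.name for entry in Path(split_dir).iterdir() if entry.is_dir())


# Build a throwaway copy of the train/ layout to demonstrate the helper.
with tempfile.TemporaryDirectory() as tmp:
    train_dir = Path(tmp) / "data" / "pizza_steak_sushi" / "train"
    for class_name in ("sushi", "pizza", "steak"):
        (train_dir / class_name).mkdir(parents=True)
    class_names = find_classes(train_dir)

print(class_names)  # ['pizza', 'steak', 'sushi']
```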

## 1. Get Data

```python
import os
import requests
import zipfile
from pathlib import Path
```

```python
data_path = Path("data/")
image_path = data_path / "pizza_steak_sushi"

# If the image folder doesn't exist, create it
if image_path.is_dir():
    print(f"{image_path} directory exists...")
else:
    print(f"Didn't find {image_path} directory, creating one now...")
    # parents=True creates any missing parent directories;
    # exist_ok=True makes the call do nothing if the directory already exists
    image_path.mkdir(parents=True, exist_ok=True)

# Download the zip archive
with open(data_path / "pizza_steak_sushi.zip", "wb") as f:
    request = requests.get("https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip")
    print("Downloading...")
    f.write(request.content)

# Unzip the archive into the image folder
with zipfile.ZipFile(data_path / "pizza_steak_sushi.zip", "r") as zip_ref:
    print("Unzipping...")
    zip_ref.extractall(image_path)

# Remove the zip file
os.remove(data_path / "pizza_steak_sushi.zip")
```

```
Didn't find data\pizza_steak_sushi directory, creating one now...
Downloading...
Unzipping...
```
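
After unzipping, it can help to confirm that the folders and files landed where expected. The sketch below is one possible check using `os.walk`; the helper name `summarize_dir` is hypothetical, and the demo runs on a small temporary tree rather than the real dataset, so no image counts for `pizza_steak_sushi` are implied.

```python
import os
import tempfile
from pathlib import Path


def summarize_dir(root):
    """Return (directory, num_subdirs, num_files) for every directory under root."""
    summary = []
    for dirpath, dirnames, filenames in os.walk(root):
        summary.append((dirpath, len(dirnames), len(filenames)))
    return summary


# Demonstrate on a tiny temporary tree: <tmp>/train/pizza/image01.jpeg
with tempfile.TemporaryDirectory() as tmp:
    pizza_dir = Path(tmp) / "train" / "pizza"
    pizza_dir.mkdir(parents=True)
    (pizza_dir / "image01.jpeg").touch()
    summary = summarize_dir(tmp)

for dirpath, num_dirs, num_files in summary:
    print(f"{num_dirs} directories and {num_files} files in '{dirpath}'")
```

Calling `summarize_dir(image_path)` on the extracted data would give the same kind of per-folder report.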


## 2. Create Datasets and Dataloaders (`data_setup.py`)

```python
%%writefile going_modular/data_setup.py

"""
Contains functionality for creating PyTorch DataLoaders for image classification data.
"""

import os
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

NUM_WORKERS = os.cpu_count()


def create_dataloaders(train_dir: str,
                       test_dir: str,
                       transform: transforms.Compose,
                       batch_size: int,
                       num_workers: int = NUM_WORKERS):
    """
    Creates training and testing DataLoaders.
    
    Args:
        train_dir: Path to training directory.
        test_dir: Path to testing directory.
        transform: torchvision transforms to perform on training and testing data.
        batch_size: Number of samples per batch in each of the DataLoaders.
        num_workers: An integer for number of workers per DataLoader.
        
    Returns:
        A tuple of (train_dataloader, test_dataloader, class_names),
        where class_names is a list of the target classes.
        
    Example usage:
        train_dataloader, test_dataloader, class_names = create_dataloaders(train_dir="path/to/train_dir",
                                                                            test_dir="path/to/test_dir",
                                                                            transform=some_transform,
                                                                            batch_size=32,
                                                                            num_workers=4)
    """
    # Use ImageFolder to create datasets
    train_data = datasets.ImageFolder(train_dir, transform=transform)
    test_data = datasets.ImageFolder(test_dir, transform=transform)
    
    # Get class names
    class_names = train_data.classes
    
    # Turn images into data loaders
    train_dataloader = DataLoader(train_data, 
                                  batch_size=batch_size,
                                  shuffle=True,
                                  num_workers=num_workers,
                                  pin_memory=True)
    test_dataloader = DataLoader(test_data,
                                 batch_size=batch_size,
                                 shuffle=False,
                                 num_workers=num_workers,
                                 pin_memory=True)
    
    return train_dataloader, test_dataloader, class_names
```
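
One detail worth noting about `NUM_WORKERS = os.cpu_count()`: `os.cpu_count()` returns `None` when the number of CPUs cannot be determined, and `None` is not a valid worker count. The sketch below shows one possible guard, falling back to `0`, which tells a `DataLoader` to load data in the main process. The helper `resolve_num_workers` is hypothetical and not part of `data_setup.py`.

```python
import os


def resolve_num_workers(requested=None):
    """Pick a worker count, falling back to 0 (load data in the main process)."""
    if requested is not None:
        if requested < 0:
            raise ValueError("num_workers must be non-negative")
        return requested
    # os.cpu_count() returns None when the CPU count cannot be determined.
    return os.cpu_count() or 0


default_workers = resolve_num_workers()
explicit_workers = resolve_num_workers(4)
print(default_workers, explicit_workers)
```

The result could then be passed as `num_workers=resolve_num_workers()` when calling `create_dataloaders`.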