# **Name** - Manu Mathew
# **StudentID** - 8990691

**Objective** - The primary objective of this lab is to build and compare image classification models that accurately distinguish images of cats from images of dogs. In this notebook, we:
- Prepare a manageable dataset of 5,000 images: 2,500 of cats and 2,500 of dogs.
- Perform exploratory data analysis (EDA) to understand the dataset's characteristics and identify data quality issues.
- Train and evaluate two deep learning models:
    - A custom Convolutional Neural Network (CNN).
    - A VGG16 model pre-trained on ImageNet and fine-tuned on this dataset.
- Compare the relative performance of the two models.

**Approach**:
- Data:
    Use 5,000 images from the Kaggle Dogs vs. Cats dataset and organize them into train/validation/test folders.

- Preprocessing:
    Resize images, normalize pixel values, apply data augmentation, and split into train/validation/test sets.

- EDA:
    Visualize sample images, check class distribution, and analyze pixel intensity.

- Model 1 - Custom CNN:
    Build and train a basic CNN with convolutional, pooling, and dense layers for binary classification.

- Model 2 - Fine-Tune VGG16:
    Load VGG16 pre-trained on ImageNet without its top classifier layers, freeze the convolutional base, add custom top layers, and train on the same data.

- Evaluation:
    Compare both models using accuracy, the confusion matrix, precision, recall, F1-score, and the precision-recall (PR) curve.

- Error Analysis:
    Show and review misclassified images to find patterns in errors.

- Conclusion:
    Summarize which model performed better and suggest possible improvements.
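The normalization and augmentation named under Preprocessing, and the pixel-intensity check under EDA, can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the notebook's own pipeline: the helper names `normalize`, `random_horizontal_flip`, and `mean_intensity_per_channel` are hypothetical, the batch layout `(batch, height, width, channels)` is assumed, and resizing is omitted because it needs an image library.

```python
import numpy as np


def normalize(images: np.ndarray) -> np.ndarray:
    """Scale uint8 pixels from [0, 255] to float32 values in [0, 1]."""
    return images.astype(np.float32) / 255.0


def random_horizontal_flip(images: np.ndarray, rng: np.random.Generator,
                           p: float = 0.5) -> np.ndarray:
    """Mirror each image left-to-right with probability p (a simple augmentation).

    Assumes a batch shaped (batch, height, width, channels); returns a new array.
    """
    flip = rng.random(images.shape[0]) < p  # one coin toss per image
    out = images.copy()
    out[flip] = out[flip, :, ::-1, :]       # reverse the width axis of the chosen images
    return out


def mean_intensity_per_channel(images: np.ndarray) -> np.ndarray:
    """Average pixel value of each colour channel over the whole batch (an EDA statistic)."""
    return images.reshape(-1, images.shape[-1]).mean(axis=0)


# A random stand-in batch of four 64x64 RGB images (not real data).
rng = np.random.default_rng(seed=0)
batch = rng.integers(0, 256, size=(4, 64, 64, 3), dtype=np.uint8)
print(mean_intensity_per_channel(batch))
augmented = random_horizontal_flip(normalize(batch), rng)
print(augmented.shape, augmented.dtype)  # -> (4, 64, 64, 3) float32
```

In Keras the same steps are usually expressed as layers such as `Rescaling(1./255)` and `RandomFlip("horizontal")`; which form the notebook uses is not shown here.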
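The scalar metrics listed under Evaluation all derive from the four confusion-matrix counts. The plain-Python sketch below assumes labels 0 = cat and 1 = dog (the order produced when labels are inferred from alphabetically sorted folder names, as Keras's `image_dataset_from_directory` does by default) and a sigmoid output thresholded at 0.5; `confusion_counts` and `binary_metrics` are hypothetical helpers, and the example predictions are made up.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives; 1 is the positive class (assumed 'dog')."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if p == 1:
            counts["tp" if t == 1 else "fp"] += 1
        else:
            counts["fn" if t == 1 else "tn"] += 1
    return counts


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(y_true: Sequence[int], probs: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Threshold sigmoid probabilities, then compute accuracy, precision, recall, and F1."""
    y_pred = [1 if p >= threshold else 0 for p in probs]
    c = confusion_counts(y_true, y_pred)
    precision = safe_div(c["tp"], c["tp"] + c["fp"])
    recall = safe_div(c["tp"], c["tp"] + c["fn"])
    return {
        "accuracy": safe_div(c["tp"] + c["tn"], sum(c.values())),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Illustrative inputs only, not results from this lab.
y_true = [1, 1, 0, 0, 1]
probs = [0.9, 0.4, 0.2, 0.7, 0.8]
print(confusion_counts(y_true, [1 if p >= 0.5 else 0 for p in probs]))
# -> {'tp': 2, 'fp': 1, 'fn': 1, 'tn': 1}
print(binary_metrics(y_true, probs))
```

A PR curve sweeps the threshold instead of fixing it at 0.5; `sklearn.metrics.precision_recall_curve(y_true, probs)` does this, and `confusion_matrix` and `classification_report` from the same module compute the other quantities.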

## Import the dependencies

```python
import os, shutil, pathlib
```

## Obtain the dataset

```python
# original_dir is the directory where the original Kaggle training images are stored.
original_dir = pathlib.Path("./data/train")
# new_base_dir is the directory where the new dataset structure will be created.
# It will contain subdirectories for the train, validation, and test sets.
new_base_dir = pathlib.Path("./data/kaggle_dogs_vs_cats_small")


def make_subset(subset_name, start_index, end_index):
    """Copy cat.{i}.jpg and dog.{i}.jpg for i in [start_index, end_index)
    into new_base_dir/subset_name/cat and new_base_dir/subset_name/dog."""
    for category in ("cat", "dog"):
        subset_dir = new_base_dir / subset_name / category
        # Create the target folder; exist_ok=True lets the cell be re-run safely.
        os.makedirs(subset_dir, exist_ok=True)
        fnames = [f"{category}.{i}.jpg" for i in range(start_index, end_index)]
        for fname in fnames:
            shutil.copyfile(src=original_dir / fname,
                            dst=subset_dir / fname)


# Create train (1,000 per class), validation (500 per class), and test (1,000 per class) subsets
make_subset("train", start_index=0, end_index=1000)
make_subset("validation", start_index=1000, end_index=1500)
make_subset("test", start_index=1500, end_index=2500)
```
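Because the index ranges passed to `make_subset` fix the split sizes (1,000 + 500 + 1,000 images per class, 5,000 in total), a quick count of the copied files can confirm the folders match the Objective. A minimal sketch; `EXPECTED_PER_CLASS` and `find_split_mismatches` are hypothetical names, and only the `.jpg` extension used above is counted.

```python
import pathlib
from typing import Dict, List, Tuple

# Images per class in each subset, derived from the make_subset index ranges.
EXPECTED_PER_CLASS: Dict[str, int] = {"train": 1000, "validation": 500, "test": 1000}


def find_split_mismatches(base_dir: pathlib.Path,
                          expected: Dict[str, int] = EXPECTED_PER_CLASS
                          ) -> List[Tuple[str, str, int, int]]:
    """Return (subset, category, found, expected) for every folder whose .jpg count is off."""
    mismatches = []
    for subset, n_expected in expected.items():
        for category in ("cat", "dog"):
            folder = base_dir / subset / category
            # A missing folder counts as zero images rather than raising an error.
            found = sum(1 for _ in folder.glob("*.jpg")) if folder.is_dir() else 0
            if found != n_expected:
                mismatches.append((subset, category, found, n_expected))
    return mismatches


mismatches = find_split_mismatches(pathlib.Path("./data/kaggle_dogs_vs_cats_small"))
print("Split OK" if not mismatches else mismatches)
```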