## Purpose

This code is a modified version of the original source. It extracts only a subset of the original ImageNet images for use in custom image classification.

The classes we will be looking at are Laptop, Monitor, Keyboard, Mouse, Soccer Ball (football), Bee, and Electric locomotive.

In [1]:
import os
import shutil
from tqdm import tqdm
import random
import numpy as np


# Extract random images from the training folder. The original dataset contains all 1000 classes,
# but we only want to classify a subset of them:
# ["Laptop", "Monitor", "Keyboard", "Mouse", "Soccer Ball", "Bee", "Electric locomotive"]
train_path = "/kaggle/input/imagenet-object-localization-challenge/ILSVRC/Data/CLS-LOC/train/"

# The full list of ImageNet index and class name pairs can be found online; search for "imagenet_class_index.json".
# `subfolders` holds the WordNet IDs (the training folder names) for our subset of classes, in the same order as the list above:
subfolders = ["n03642806", "n03782006", "n03085013", "n03793489", "n04254680", "n02206856", "n03272562"]
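The folder names above are hard to read on their own, so a small lookup table can help when inspecting the subset later. The following is a minimal sketch: `WNID_TO_LABEL` and `label_for` are hypothetical names that are not part of the original notebook, and the labels are simply the class names used in this notebook, not the official strings from "imagenet_class_index.json".

```python
# Hypothetical helper: map each WordNet ID used in this notebook to the
# readable class name from the list above (labels are this notebook's
# wording, not the official ImageNet label strings).
WNID_TO_LABEL = {
    "n03642806": "Laptop",
    "n03782006": "Monitor",
    "n03085013": "Keyboard",
    "n03793489": "Mouse",
    "n04254680": "Soccer Ball",
    "n02206856": "Bee",
    "n03272562": "Electric locomotive",
}


def label_for(wnid: str) -> str:
    """Return the readable label for a WordNet ID, or the ID itself if unknown."""
    return WNID_TO_LABEL.get(wnid, wnid)
```

For example, `label_for("n02206856")` gives the name used here for the bee class, and an ID outside the subset is returned unchanged.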

In [2]:
## Copy 500 randomly selected images from each of the chosen classes
ex_per_class = 500
output_train_folder = "/kaggle/working/imagenet_subset/train/"

for folder in subfolders:
    # Create a subfolder in output_train_folder (exist_ok avoids an error if the cell is re-run)
    os.makedirs(os.path.join(output_train_folder, folder), exist_ok=True)
    folder_path = os.path.join(train_path, folder)
    folder_files = os.listdir(folder_path)
    
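    # Select random images without repetition; cap the sample size at the number of
    # available files so that np.random.choice does not raise for a small class
    n_select = min(ex_per_class, len(folder_files))
    selected_files = np.random.choice(folder_files, n_select, replace=False)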
    
    # Copy selected files to output subfolder
    for filename in selected_files:
        shutil.copy2(
            os.path.join(folder_path, filename),
            os.path.join(output_train_folder, folder),
        )

# Compress the output folder; the zip file is written to the current working directory
shutil.make_archive("imagenet_subset_train", 'zip', output_train_folder)

'/kaggle/working/imagenet_subset_train.zip'
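As a quick sanity check, the copied files can be counted per class folder before downloading the archive. This is a minimal sketch rather than part of the original notebook: `count_images_per_class` is a hypothetical helper, and it assumes every entry inside a class folder is an image.

```python
import os


def count_images_per_class(root: str) -> dict:
    """Count the entries in each class subfolder of `root`.

    Assumption: each subfolder is one class and contains only image files.
    """
    counts = {}
    for name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, name)
        # Skip stray files that sit next to the class folders
        if os.path.isdir(class_dir):
            counts[name] = len(os.listdir(class_dir))
    return counts


# Example usage on Kaggle, with the output path from the cell above:
# print(count_images_per_class("/kaggle/working/imagenet_subset/train/"))
```

If the copy step ran once on an empty output folder, each of the seven class folders should report `ex_per_class` files.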

## Download

To download a copy of the subset from Kaggle, run this notebook, then open the right sidebar and look under "Data" > "Output" > "/kaggle/working" > imagenet_subset_train.zip.
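Once the zip file is on a local machine, it can be unpacked with the standard library. The sketch below is one possible way to do it: `unpack_subset` is a hypothetical helper, and the paths in the usage comment are placeholders to adjust. Because the archive was built from the train folder itself, the class folders are expected at the top level of the zip.

```python
import shutil
from pathlib import Path


def unpack_subset(zip_path, dest_dir):
    """Unpack the downloaded archive and return the class folder names found."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # shutil.unpack_archive is the counterpart of shutil.make_archive
    shutil.unpack_archive(str(zip_path), str(dest), "zip")
    return sorted(p.name for p in dest.iterdir() if p.is_dir())


# Example usage (placeholder paths):
# print(unpack_subset("imagenet_subset_train.zip", "imagenet_subset/train"))
```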