# iFood 2019 Kaggle Competition: An Ensemble-Based Approach

## Project Overview

This notebook presents a comprehensive, end-to-end solution for the **iFood 2019 FGVC6 Kaggle competition**. The objective was to address the challenging task of fine-grained image classification: identifying the specific food category from one of **251 classes** given a single image.

The competition was held as part of the 6th Fine-Grained Visual Categorization workshop (FGVC6) at CVPR 2019.

### Core Challenges
The competition presented two primary difficulties, as outlined on the Kaggle page:

1.  **Fine-Grained & Visually Similar Classes:** The dataset includes highly specific categories with low inter-class variance. For example, it contains 15 different types of cake and 10 different types of pasta, requiring a model capable of discerning very subtle visual features.
2.  **Noisy Training Data:** The training images were scraped from the web, leading to significant cross-domain noise. This included images of raw ingredients, packaged food items, or multiple food items in a single frame, all of which could dilute the features learned by the model.

### Evaluation Metric
Submissions were evaluated based on **mean top-3 accuracy**: a prediction for an image counts as correct if the ground-truth label appears among the model's three most confident predictions, and the score is the fraction of test images predicted correctly in this sense.
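As a concrete reference, here is a minimal NumPy sketch of the metric as described above. The function name `mean_top3_accuracy` and the toy scores are illustrative assumptions, not part of the competition kit.

```python
import numpy as np

def mean_top3_accuracy(probs, true_labels):
    """Fraction of images whose true label is among the 3 highest-scoring classes.

    probs:       array of shape (n_images, n_classes) with per-class scores
    true_labels: integer class indices of shape (n_images,)
    """
    probs = np.asarray(probs)
    true_labels = np.asarray(true_labels)
    # Column indices of the 3 largest scores per row (order within the 3 is irrelevant here)
    top3 = np.argsort(probs, axis=1)[:, -3:]
    hits = (top3 == true_labels[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example: 4 images, 5 classes (synthetic scores)
probs = np.array([
    [0.50, 0.20, 0.15, 0.10, 0.05],  # true class 0 is ranked 1st -> hit
    [0.05, 0.10, 0.15, 0.20, 0.50],  # true class 2 is ranked 3rd -> hit
    [0.40, 0.30, 0.20, 0.06, 0.04],  # true class 4 is ranked 5th -> miss
    [0.10, 0.60, 0.10, 0.15, 0.05],  # true class 3 is ranked 2nd -> hit
])
labels = np.array([0, 2, 4, 3])
print(mean_top3_accuracy(probs, labels))  # 0.75
```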

### Solution Strategy

To tackle these challenges, this notebook implements a three-part pipeline:

1.  **Transfer Learning with Diverse Architectures:** We use two convolutional neural networks (CNNs) pre-trained on ImageNet: **DenseNet201** and **InceptionResNetV2**. Because the two architectures differ, their errors are likely to be partly uncorrelated, which is what makes them good candidates for ensembling.
2.  **Data Augmentation:** To reduce overfitting, we apply random transformations (rotation, shifting, shearing, zooming, and horizontal flipping) to the training images on the fly, so the model rarely sees exactly the same image twice. This encourages the models to become invariant to these visual changes.
3.  **Model Ensembling:** The final predictions are generated by averaging the probability outputs of the two independently trained models. Averaging tends to smooth out individual model errors, which can raise the chance that the correct label appears among the top three predictions.

### Dataset and Files
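The competition provided the following files (this notebook downloads the equivalent `.tar` archives from the dataset's S3 bucket, where the label files are named `train_info.csv` and `val_info.csv`):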
- **`{train/val/test}.zip`**: The image datasets for training, validation, and testing.
- **`class_list.txt`**: A lookup table mapping the numeric class labels to their human-readable names (e.g., '10' -> 'knish').
- **`{train/val}_labels.csv`**: CSV files containing the image names and their corresponding ground-truth labels for the training and validation sets.
- **`test_info.csv`**: A CSV file containing the names of the test images.
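The lookup in `class_list.txt` can be loaded into a dictionary. The sketch below assumes each line holds a numeric label, a single space, and the class name (as in the `'10' -> 'knish'` example); the helper `parse_class_list` is hypothetical, so check the actual file layout before relying on it. Labels are kept as strings to match the string labels the data generators use later.

```python
def parse_class_list(lines):
    """Build a {label_string: class_name} dict from 'label name' lines.

    Assumes the format "<label> <name>", e.g. "10 knish"; adjust the split
    if the real file is laid out differently.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        label, name = line.split(" ", 1)  # split only on the first space
        mapping[label] = name
    return mapping

# In the notebook (assuming annot.tar extracts class_list.txt to the working directory):
# with open("class_list.txt") as f:
#     class_names = parse_class_list(f)
print(parse_class_list(["10 knish\n", "\n"]))  # {'10': 'knish'}
```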

---
## Part 1: Setup and Data Preparation

### 1.1. Mount Google Drive and Import Libraries
We begin by mounting Google Drive to access saved model files and importing all the necessary Python libraries for data manipulation, machine learning, and visualization.

In [None]:
# Connect to Google Drive to save/load models
from google.colab import drive
drive.mount('/content/gdrive')

# General and data manipulation libraries
import os
import numpy as np
import pandas as pd
import json

# TensorFlow and Keras for deep learning
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.applications import DenseNet201, InceptionResNetV2

# Plotting and visualization
import matplotlib.pyplot as plt
%matplotlib inline

### 1.2. Download and Extract Competition Data
The dataset is downloaded from its source on AWS S3. Each tar archive is extracted and then deleted to save disk space in the Colab environment.

In [None]:
# Download and extract dataset from the iFood 2019 competition
!wget https://food-x.s3.amazonaws.com/annot.tar
!tar -xf annot.tar
!rm annot.tar

!wget https://food-x.s3.amazonaws.com/train.tar
!tar -xf train.tar
!rm train.tar

!wget https://food-x.s3.amazonaws.com/val.tar
!tar -xf val.tar
!rm val.tar

!wget https://food-x.s3.amazonaws.com/test.tar
!tar -xf test.tar
!rm test.tar

### 1.3. Load Labels and Define Directories
We load the image file names and their corresponding labels from the provided CSV files into pandas DataFrames. These DataFrames will be used by Keras data generators to efficiently feed images to the models. The dataset consists of 118,475 training images, 11,994 validation images, and 28,377 test images.

In [None]:
# Define base directories for the data
base_dir = '.'
train_dir = os.path.join(base_dir, 'train_set')
validation_dir = os.path.join(base_dir, 'val_set')
test_dir = os.path.join(base_dir, 'test_set')

# Load labels from CSV files
train_labels = pd.read_csv(os.path.join(base_dir, 'train_info.csv'), header=None)
train_labels.columns = ["img_name", "label"]

val_labels = pd.read_csv(os.path.join(base_dir, 'val_info.csv'), header=None)
val_labels.columns = ["img_name", "label"]

test_labels = pd.read_csv(os.path.join(base_dir, 'test_info.csv'), header=None)
test_labels.columns = ["img_name"]

# Convert label columns to string type, as expected by flow_from_dataframe
train_labels['label'] = train_labels['label'].astype(str)
val_labels['label'] = val_labels['label'].astype(str)

print(f"Training samples: {len(train_labels)}")
print(f"Validation samples: {len(val_labels)}")
print(f"Test samples: {len(test_labels)}")
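Converting the labels to strings has a side effect worth knowing about. Assuming `flow_from_dataframe` builds its class list by sorting the unique label strings (its behavior when no `classes` argument is passed), the column order of the model's output is lexicographic rather than numeric. This plain-Python sketch reproduces that ordering for the 251 labels; it is why Part 4 inverts `class_indices` instead of treating a column index as the label itself.

```python
# Reproduce the class ordering for string labels "0".."250" (sorted as strings)
labels_as_strings = [str(i) for i in range(251)]
class_indices = {label: idx for idx, label in enumerate(sorted(labels_as_strings))}

print(sorted(labels_as_strings)[:5])  # ['0', '1', '10', '100', '101']
print(class_indices["10"])            # 2, not 10
print(class_indices["2"])             # 112, not 2
```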

---
## Part 2: Training Model A (DenseNet201)

### 2.1. Hyperparameters and Data Generators for DenseNet201

We define the specific parameters for the DenseNet201 model. A key parameter here is `image_size`, which is set to **224x224**, the standard input size for this architecture.

We then create the `ImageDataGenerator`. For the training set, we apply a range of augmentations to make the model more robust. The validation data is only rescaled, as it should remain unaltered to provide a true measure of performance.
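To make the augmentation options more concrete, here is a simplified NumPy stand-in for three of them: `rescale`, `horizontal_flip`, and `width_shift_range`, with exposed columns filled by repeating the edge column as in Keras' default `fill_mode='nearest'`. It is an illustration under those assumptions, not Keras' implementation; rotation, shear, zoom, and vertical shifts are omitted, and the function name `augment_sketch` is hypothetical.

```python
import numpy as np

def augment_sketch(image, rng, max_shift_frac=0.2):
    """Rescale, randomly flip, and randomly shift an (H, W, C) uint8 image."""
    # rescale=1./255: map pixel values from [0, 255] to [0.0, 1.0]
    x = image.astype(np.float32) / 255.0
    # horizontal_flip=True: mirror left-right with probability 0.5
    if rng.random() < 0.5:
        x = x[:, ::-1, :]
    # width_shift_range=0.2: shift by up to 20% of the width in either direction
    width = x.shape[1]
    shift = int(round(rng.uniform(-max_shift_frac, max_shift_frac) * width))
    if shift > 0:
        # Move content right; fill the exposed left columns with the edge column
        x = np.concatenate([np.repeat(x[:, :1, :], shift, axis=1), x[:, :-shift, :]], axis=1)
    elif shift < 0:
        s = -shift
        # Move content left; fill the exposed right columns with the edge column
        x = np.concatenate([x[:, s:, :], np.repeat(x[:, -1:, :], s, axis=1)], axis=1)
    return x

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
out = augment_sketch(img, rng)
print(out.shape, out.dtype)  # (224, 224, 3) float32
```

Because a new random transform is drawn every time an image is loaded, each epoch sees a different variant of each training image, while the number of images per epoch stays the same.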

In [None]:
# --- Model-specific Hyperparameters ---
NUM_CLASSES = 251
BATCH_SIZE_DENSENET = 128
LEARNING_RATE_DENSENET = 0.001
NUM_EPOCHS_DENSENET = 50 # The original model was trained for 50 epochs
IMAGE_SIZE_DENSENET = 224

# --- Data Augmentation and Generators ---

# Create an ImageDataGenerator for the training set with augmentation
train_datagen_densenet = ImageDataGenerator(
    rescale=1./255.,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True
)

# The validation data should not be augmented, only rescaled
validation_datagen_densenet = ImageDataGenerator(rescale=1.0/255.)

# Flow images in batches using the generators
train_generator_densenet = train_datagen_densenet.flow_from_dataframe(
    dataframe=train_labels,
    directory=train_dir,
    x_col='img_name',
    y_col='label',
    class_mode='categorical',
    batch_size=BATCH_SIZE_DENSENET,
    target_size=(IMAGE_SIZE_DENSENET, IMAGE_SIZE_DENSENET)
)

validation_generator_densenet = validation_datagen_densenet.flow_from_dataframe(
    dataframe=val_labels,
    directory=validation_dir,
    x_col='img_name',
    y_col='label',
    class_mode='categorical',
    batch_size=BATCH_SIZE_DENSENET,
    target_size=(IMAGE_SIZE_DENSENET, IMAGE_SIZE_DENSENET)
)

### 2.2. Build DenseNet201 Model with Transfer Learning

We load the `DenseNet201` model pre-trained on ImageNet (`weights='imagenet'`), without its original 1000-class classifier (`include_top=False`). The weights of the convolutional base are frozen (`layer.trainable = False`), so its learned features are reused as-is and only the new head is trained.

We then add a new classification "head" on top. This consists of a `GlobalAveragePooling2D` layer to reduce the feature dimensions, followed by a final `Dense` layer with 251 outputs and a `softmax` activation function. This new head will be trained from scratch on our food dataset.
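The head is small enough to write out directly. This NumPy sketch shows what `GlobalAveragePooling2D` followed by a softmax `Dense` layer computes, assuming DenseNet201's final feature map for a 224x224 input has shape 7x7x1920. The random features and weights are placeholders, not trained values, and `head_forward` is a hypothetical name.

```python
import numpy as np

def head_forward(feature_map, weights, bias):
    """GlobalAveragePooling2D + Dense(softmax), written in NumPy.

    feature_map: (batch, height, width, channels) output of the frozen backbone
    weights:     (channels, num_classes) Dense kernel
    bias:        (num_classes,) Dense bias
    """
    pooled = feature_map.mean(axis=(1, 2))       # average over spatial positions -> (batch, channels)
    logits = pooled @ weights + bias             # (batch, num_classes)
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)  # softmax: each row sums to 1

rng = np.random.default_rng(0)
features = rng.random((2, 7, 7, 1920)).astype(np.float32)  # placeholder backbone output
W = rng.normal(0.0, 0.01, size=(1920, 251))
b = np.zeros(251)
probs = head_forward(features, W, b)
print(probs.shape)  # (2, 251)
```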

In [None]:
# Load the DenseNet201 model with weights pre-trained on ImageNet
# We exclude the final classification layer (include_top=False)
pre_trained_model_densenet = DenseNet201(
    input_shape=(IMAGE_SIZE_DENSENET, IMAGE_SIZE_DENSENET, 3),
    include_top=False,
    weights='imagenet'
)

# Freeze the convolutional base to prevent its weights from being updated
for layer in pre_trained_model_densenet.layers:
    layer.trainable = False

# Add a new classification head
x = layers.GlobalAveragePooling2D()(pre_trained_model_densenet.output)
x = layers.Dense(NUM_CLASSES, activation='softmax')(x)

# Create the final model
model_densenet = Model(pre_trained_model_densenet.input, x)

model_densenet.summary()

### 2.3. Compile and Train the DenseNet201 Model

The model is compiled with the SGD optimizer and `categorical_crossentropy` loss, suitable for multi-class classification. We use a `ModelCheckpoint` callback to automatically save the model with the best validation accuracy (`val_acc`) during training.

**Note:** The following cell starts training. For demonstration purposes, it runs for only 2 epochs; the original model was trained for 50 epochs.
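The optimizer settings can be made concrete with a little arithmetic. In Keras' legacy SGD, `decay` shrinks the learning rate per update step as `lr / (1 + decay * step)`, which is the same formula `tf.keras.optimizers.schedules.InverseTimeDecay` implements with `decay_steps=1`, and momentum keeps a running velocity. This plain-Python sketch assumes 926 steps per epoch (118,475 images in batches of 128); the helper names are illustrative.

```python
import math

def legacy_decayed_lr(initial_lr, decay, step):
    """Learning rate after `step` updates under Keras' legacy time-based decay."""
    return initial_lr / (1.0 + decay * step)

def sgd_momentum_step(w, v, grad, lr, momentum=0.9):
    """One (non-Nesterov) SGD-with-momentum update: v <- m*v - lr*g, w <- w + v."""
    v = momentum * v - lr * grad
    return w + v, v

steps_per_epoch = math.ceil(118_475 / 128)  # 926 batches per epoch
total_steps = 50 * steps_per_epoch          # 46,300 updates over 50 epochs
final_lr = legacy_decayed_lr(0.001, 1e-6, total_steps)
print(steps_per_epoch, total_steps, f"{final_lr:.6f}")  # 926 46300 0.000956

# Two momentum steps on a single scalar weight with a constant gradient
w, v = 1.0, 0.0
for _ in range(2):
    w, v = sgd_momentum_step(w, v, grad=2.0, lr=0.1)
print(round(w, 4), round(v, 4))  # 0.42 -0.38
```

With `decay=1e-6`, the learning rate falls by only about 4.4% over the full 50-epoch run, so the decay term has a minor effect at this scale.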

In [None]:
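# Compile the model.
# `learning_rate` replaces the deprecated `lr` argument, and the legacy per-step
# `decay=1e-6` is expressed as an equivalent InverseTimeDecay schedule
# (lr / (1 + 1e-6 * step)), since recent TensorFlow releases reject `decay`.
lr_schedule_densenet = tf.keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=LEARNING_RATE_DENSENET,
    decay_steps=1,
    decay_rate=1e-6
)
model_densenet.compile(
    optimizer=SGD(learning_rate=lr_schedule_densenet, momentum=0.9),
    loss='categorical_crossentropy',
    metrics=['acc']
)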

# Define the checkpoint to save the best model
checkpoint_densenet = ModelCheckpoint(
    filepath="/content/gdrive/My Drive/iFood2019_DenseNet201_best.hdf5",
    monitor='val_acc',
    verbose=1,
    save_best_only=True,
    mode='max'
)

# --- TRAINING --- 
# Note: Training is computationally intensive. We will run it for just 2 epochs for this demo.
history_densenet = model_densenet.fit(
    train_generator_densenet,
    validation_data=validation_generator_densenet,
    epochs=2, # Set to 2 for demo; original was 50
    verbose=1,
    callbacks=[checkpoint_densenet]
)

### 2.4. Save Model Architecture
After training, we save the model's architecture as a JSON file. This allows us to recreate the model structure later without retraining, loading only the saved weights.

In [None]:
# Serialize model architecture to JSON
model_json_densenet = model_densenet.to_json()
with open("/content/gdrive/My Drive/iFood2019_DenseNet201_model.json", "w") as json_file:
    json_file.write(model_json_densenet)
    
print("DenseNet201 model architecture saved to Google Drive.")

---
## Part 3: Training Model B (InceptionResNetV2)

### 3.1. Hyperparameters and Data Generators for InceptionResNetV2

Now we set up the second model. The key difference is `IMAGE_SIZE_INCEPTION`, which is set to **299x299**, the default input size for this architecture (with `include_top=False`, Keras also accepts other sizes). We create a new set of data generators tailored to this image size.
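A quick back-of-the-envelope calculation shows why the batch size is halved for this model. Pixel count is only a rough proxy for memory use, since activation sizes also depend on the architecture, but it indicates the scale of the change.

```python
# Pixels per image at each model's input size
densenet_pixels = 224 * 224   # 50,176
inception_pixels = 299 * 299  # 89,401

# Each 299x299 image carries roughly 1.78x as many pixels as a 224x224 one
ratio = inception_pixels / densenet_pixels
print(f"{ratio:.2f}")  # 1.78

# Pixels per batch with the batch sizes chosen in this notebook
print(128 * densenet_pixels, 64 * inception_pixels)  # 6422528 5721664
```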

In [None]:
# --- Model-specific Hyperparameters ---
BATCH_SIZE_INCEPTION = 64 # Smaller batch size: larger model and larger input images
LEARNING_RATE_INCEPTION = 0.001
NUM_EPOCHS_INCEPTION = 5 # The original model was trained for 5 epochs
IMAGE_SIZE_INCEPTION = 299 # Default input size for InceptionResNetV2

# --- Data Augmentation and Generators ---

# The training data generator uses the same augmentation strategy
train_datagen_inception = ImageDataGenerator(
    rescale=1./255.,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True
)

# Validation data is only rescaled
validation_datagen_inception = ImageDataGenerator(rescale=1.0/255.)

# Flow images in batches using the new generators
train_generator_inception = train_datagen_inception.flow_from_dataframe(
    dataframe=train_labels,
    directory=train_dir,
    x_col='img_name',
    y_col='label',
    class_mode='categorical',
    batch_size=BATCH_SIZE_INCEPTION,
    target_size=(IMAGE_SIZE_INCEPTION, IMAGE_SIZE_INCEPTION)
)

validation_generator_inception = validation_datagen_inception.flow_from_dataframe(
    dataframe=val_labels,
    directory=validation_dir,
    x_col='img_name',
    y_col='label',
    class_mode='categorical',
    batch_size=BATCH_SIZE_INCEPTION,
    target_size=(IMAGE_SIZE_INCEPTION, IMAGE_SIZE_INCEPTION)
)

### 3.2. Build InceptionResNetV2 Model with Transfer Learning

The process mirrors the one for DenseNet201. We load the pre-trained `InceptionResNetV2` base, freeze its layers, and attach a new classification head suitable for our 251 food categories.
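Because both backbones are frozen, only the new `Dense` layer is trained. Its parameter count follows directly from the number of pooled feature channels, assumed here to be 1920 for DenseNet201 and 1536 for InceptionResNetV2 (their final feature-map depths with `include_top=False`). These figures should match the "Trainable params" line of each `model.summary()`.

```python
def dense_head_params(feature_channels, num_classes):
    """Trainable parameters of Dense(num_classes) after GlobalAveragePooling2D:
    one weight per (channel, class) pair plus one bias per class."""
    return feature_channels * num_classes + num_classes

print(dense_head_params(1920, 251))  # DenseNet201 head: 482171
print(dense_head_params(1536, 251))  # InceptionResNetV2 head: 385787
```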

In [None]:
# Load the InceptionResNetV2 model with weights pre-trained on ImageNet
pre_trained_model_inception = InceptionResNetV2(
    input_shape=(IMAGE_SIZE_INCEPTION, IMAGE_SIZE_INCEPTION, 3),
    include_top=False,
    weights='imagenet'
)

# Freeze the convolutional base
for layer in pre_trained_model_inception.layers:
    layer.trainable = False

# Add the new classification head
y = layers.GlobalAveragePooling2D()(pre_trained_model_inception.output)
y = layers.Dense(NUM_CLASSES, activation='softmax')(y)

# Create the final model
model_inception = Model(pre_trained_model_inception.input, y)

model_inception.summary()

### 3.3. Compile and Train the InceptionResNetV2 Model

We compile and train the second model. The `ModelCheckpoint` will save the best-performing version of this model to a separate file on Google Drive.

**Note:** Training is again limited to 2 epochs for this demonstration. The original model was trained for 5 epochs.
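The checkpoint rule (`save_best_only=True`, `mode='max'`) can be simulated in a few lines: a file is written only when `val_acc` strictly exceeds the best value seen so far. This sketch uses hypothetical `val_acc` values, not results from this notebook, and the function name is illustrative.

```python
def checkpoint_epochs(val_acc_history):
    """Return the 1-based epochs at which a save_best_only / mode='max'
    checkpoint would write the model, given per-epoch val_acc values."""
    best = float("-inf")
    saved = []
    for epoch, acc in enumerate(val_acc_history, start=1):
        if acc > best:  # strict improvement over the best value so far
            best = acc
            saved.append(epoch)
    return saved

# Hypothetical validation accuracies (illustrative only)
print(checkpoint_epochs([0.41, 0.47, 0.45, 0.52, 0.52]))  # [1, 2, 4]
```

Since each save overwrites the same `filepath`, the file left on Google Drive after training holds the model from the last epoch in that list.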

In [None]:
# Compile the model
model_inception.compile(
    optimizer=SGD(lr=LEARNING_RATE_INCEPTION, momentum=0.9, decay=1e-6),
    loss='categorical_crossentropy',
    metrics=['acc']
)

# Define the checkpoint to save the best model
checkpoint_inception = ModelCheckpoint(
    filepath="/content/gdrive/My Drive/iFood2019_InceptionResNetV2_best.hdf5",
    monitor='val_acc',
    verbose=1,
    save_best_only=True,
    mode='max'
)

# --- TRAINING --- 
# Note: Set to 2 epochs for demo purposes.
history_inception = model_inception.fit(
    train_generator_inception,
    validation_data=validation_generator_inception,
    epochs=2, # Set to 2 for demo; original was 5
    verbose=1,
    callbacks=[checkpoint_inception]
)

### 3.4. Save Model Architecture
We save the architecture of the trained InceptionResNetV2 model.

In [None]:
# Serialize model architecture to JSON
model_json_inception = model_inception.to_json()
with open("/content/gdrive/My Drive/iFood2019_InceptionResNetV2_model.json", "w") as json_file:
    json_file.write(model_json_inception)
    
print("InceptionResNetV2 model architecture saved to Google Drive.")

---
## Part 4: Model Ensembling and Final Submission

In this final part, we combine the predictions of our two trained models. Averaging the outputs of diverse models tends to reduce variance, which often yields a more accurate and robust final result than either model alone.

### 4.1. Load Trained Models

We first recreate the model architectures from the saved JSON files and then load the best-performing weights that were saved by the `ModelCheckpoint` during training.

In [None]:
from tensorflow.keras.models import model_from_json

# Define paths to the saved models on Google Drive
gdrive_path = "/content/gdrive/My Drive/"
densenet_model_path = os.path.join(gdrive_path, "iFood2019_DenseNet201_model.json")
densenet_weights_path = os.path.join(gdrive_path, "iFood2019_DenseNet201_best.hdf5")

inception_model_path = os.path.join(gdrive_path, "iFood2019_InceptionResNetV2_model.json")
inception_weights_path = os.path.join(gdrive_path, "iFood2019_InceptionResNetV2_best.hdf5")

# --- Load DenseNet201 ---
print("Loading DenseNet201 from disk...")
with open(densenet_model_path, 'r') as json_file:
    loaded_model_json = json_file.read()
densenet_model = model_from_json(loaded_model_json)
densenet_model.load_weights(densenet_weights_path)

# --- Load InceptionResNetV2 ---
print("Loading InceptionResNetV2 from disk...")
with open(inception_model_path, 'r') as json_file:
    loaded_model_json = json_file.read()
inception_model = model_from_json(loaded_model_json)
inception_model.load_weights(inception_weights_path)

print("\nModels loaded successfully.")

### 4.2. Create Test Data Generators

Because our two models require different input image sizes, we must create two separate data generators for the test set. Importantly, we set `shuffle=False` to ensure that the predictions maintain the original order of the test images, which is critical for a correct submission.
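Since the two prediction arrays are later averaged row by row, it is worth checking that both generators really yield images in the same order as `test_labels`. A minimal sketch of such a check follows; `check_alignment` is a hypothetical helper, and in the notebook it would be called with `test_generator_densenet.filenames`, `test_generator_inception.filenames`, and `test_labels['img_name']`.

```python
def check_alignment(filenames_a, filenames_b, expected_order):
    """Raise ValueError if two generators disagree on image order, or if their
    order differs from the test DataFrame. Returns the number of images."""
    a, b, expected = list(filenames_a), list(filenames_b), list(expected_order)
    if a != b:
        raise ValueError("The two generators yield images in different orders")
    if a != expected:
        raise ValueError("Generator order differs from the test DataFrame order")
    return len(a)

print(check_alignment(["a.jpg", "b.jpg"], ["a.jpg", "b.jpg"], ["a.jpg", "b.jpg"]))  # 2
```

The second check also catches the case where a generator silently skipped unreadable files, which would shift every later row.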

In [None]:
# Create a data generator for the test set (no augmentation, only rescaling)
test_datagen = ImageDataGenerator(rescale=1.0/255.)

# Generator for DenseNet201 (224x224)
test_generator_densenet = test_datagen.flow_from_dataframe(
    dataframe=test_labels,
    directory=test_dir,
    x_col='img_name',
    class_mode=None, # No labels for the test set
    batch_size=BATCH_SIZE_DENSENET,
    shuffle=False, # Crucial for submission
    target_size=(IMAGE_SIZE_DENSENET, IMAGE_SIZE_DENSENET)
)

# Generator for InceptionResNetV2 (299x299)
test_generator_inception = test_datagen.flow_from_dataframe(
    dataframe=test_labels,
    directory=test_dir,
    x_col='img_name',
    class_mode=None,
    batch_size=BATCH_SIZE_INCEPTION,
    shuffle=False,
    target_size=(IMAGE_SIZE_INCEPTION, IMAGE_SIZE_INCEPTION)
)

### 4.3. Generate and Ensemble Predictions

We use each model to predict the class probabilities for the entire test set. Then, we perform the key ensembling step: we calculate the element-wise average of the two prediction arrays. This resulting array represents the combined confidence of both models for each class on each image.
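The averaging step on a toy scale: two synthetic softmax outputs (illustrative values only) are combined element-wise, exactly as `np.mean(..., axis=0)` does below. Averaging probability vectors keeps each row a valid distribution, and it can change the ranking: for the second image, model A ranks class 2 first, but the ensemble ranks class 1 first.

```python
import numpy as np

# Synthetic softmax outputs for 2 images over 4 classes (illustrative only)
pred_a = np.array([[0.70, 0.10, 0.15, 0.05],
                   [0.20, 0.25, 0.30, 0.25]])
pred_b = np.array([[0.40, 0.35, 0.05, 0.20],
                   [0.10, 0.60, 0.20, 0.10]])

# Element-wise average; identical to (pred_a + pred_b) / 2
ensemble = np.mean([pred_a, pred_b], axis=0)
print(ensemble)
print(ensemble.sum(axis=1))  # each row still sums to 1
```

A weighted variant, `np.average([pred_a, pred_b], axis=0, weights=[w_a, w_b])`, is a possible extension if one model is clearly stronger; the weights would need to be tuned on the validation set, and this notebook does not do so.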

In [None]:
# Generate predictions from DenseNet201
print("Generating predictions with DenseNet201...")
pred_densenet = densenet_model.predict(test_generator_densenet, verbose=1)

# Generate predictions from InceptionResNetV2
print("\nGenerating predictions with InceptionResNetV2...")
pred_inception = inception_model.predict(test_generator_inception, verbose=1)

# Ensemble by averaging the predictions
print("\nAveraging predictions...")
final_predictions = np.mean([pred_densenet, pred_inception], axis=0)

print(f"\nFinal predictions array shape: {final_predictions.shape}")

### 4.4. Format and Save Final Submission File

The final step is to format the predictions according to the competition's requirements. For each image, we need to find the top 3 most likely classes. 

1. We use `np.argsort` to get the indices of the classes, sorted from least to most likely.
2. We take the last three indices (`[-3:]`) and reverse their order (`[::-1]`) to get the top 3.
3. We map these indices back to their original class labels.
4. Finally, we create a `submission.csv` file with the image name and the top 3 labels, separated by spaces.
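The steps above can be sketched on a toy problem with 12 string labels, indexed the way Keras orders them (sorted as strings). The example also shows why the index-to-label map matters: the top column here is index 2, whose label is `'10'`, not `'2'`. The function name `top3_strings` is illustrative.

```python
import numpy as np

def top3_strings(probs, label_map):
    """Convert an (n_images, n_classes) probability array into 'a b c' strings,
    most likely label first. label_map maps column index -> label string."""
    top3 = np.argsort(probs, axis=1)[:, -3:][:, ::-1]  # 3 largest, descending
    return [" ".join(label_map[int(i)] for i in row) for row in top3]

# Toy label set "0".."11", ordered as sorted strings: ['0', '1', '10', '11', '2', ...]
classes = sorted(str(i) for i in range(12))
label_map = {idx: label for idx, label in enumerate(classes)}

probs = np.zeros((1, 12))
probs[0, [2, 3, 4]] = [0.5, 0.3, 0.2]   # columns 2, 3, 4 hold the top three scores
print(top3_strings(probs, label_map))   # ['10 11 2']
```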

In [None]:
# Get the top 3 predicted class indices for each image
top_3_indices = np.argsort(final_predictions, axis=1)[:, -3:][:, ::-1]

# flow_from_dataframe builds a `class_indices` dictionary (label string -> column index).
# Invert it to map each predicted column index back to the original class label.
label_map = {v: k for k, v in train_generator_densenet.class_indices.items()}

# Map the indices to labels and format them as space-separated strings
top_3_labels = [
    " ".join(label_map[idx] for idx in row)
    for row in top_3_indices
]

# Create the submission DataFrame
submission_df = pd.DataFrame()
submission_df['img_name'] = test_generator_densenet.filenames
submission_df['label'] = top_3_labels

# Save the submission file in the required format
submission_df.to_csv("submission.csv", index=False)

print("Submission file 'submission.csv' created successfully.")
submission_df.head()

---
## Final Conclusion

This notebook implemented an end-to-end pipeline for a fine-grained image classification task with 251 visually similar food classes and noisy web-scraped training data. It combines **transfer learning** with two different ImageNet-pretrained architectures, **data augmentation** during training, and **model ensembling** by averaging the two models' predicted probabilities. The resulting `submission.csv` lists the top-3 predicted labels for each test image in the format the competition requires.