Reproducing the paper "PADAM: Closing The Generalization Gap of Adaptive Gradient Methods In Training Deep Neural Networks" for the ICLR 2019 Reproducibility Challenge

You can find our report for the reproducibility challenge in `Report.pdf` at the repository root.

## Setup Dependencies

Python 3 is recommended for running the experiments.

The experiments are written using TensorFlow's eager execution mode, so the following dependencies must be installed before running the code:

1. Follow the installation guide on the TensorFlow homepage to install TensorFlow-GPU or TensorFlow-CPU.
2. Follow the instructions on the Keras homepage to install Keras.
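For a typical setup, the steps above reduce to something like the following (package names are assumptions for the TensorFlow 1.x era this repo targets; consult the linked homepages for the exact versions you need):

```shell
# GPU build (use plain "tensorflow" for a CPU-only machine)
pip install tensorflow-gpu keras
```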

Run a vanilla experiment with the following command from the repository root:

```
python vgg16-net/run.py
```

## Project Structure

The skeletal overview of the project is as follows:

```
.
├── vgg16-net/
│   ├── run.py       # Runs the experiments on the VGGNet architecture
│   └── model.py     # VGGNet model
├── resnet-18/
│   ├── run.py       # Runs the experiments on the ResNet architecture
│   └── model.py     # ResNet-18 model
└── wide-resnet/
    ├── run.py       # Runs the experiments on the Wide ResNet architecture
    └── model.py     # Wide ResNet-18 model
```

The following files are generated in each model directory after you run an experiment:

```
├── model_{optimizer}_{dataset}.csv   # Logs for the experiment
└── model_{optimizer}_{dataset}.h5    # Weights of the final trained model
```

## Defining Experiment Configuration

You can set the experiment configuration by editing the dictionary in each `run.py` file. This dictionary contains all the hyperparameters for each optimizer, i.e. Adam, AMSGrad, SGD with momentum, and Padam.
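For illustration, such a configuration dictionary might look like the sketch below. The exact keys live in the `run.py` files and may differ; every key name and value here is an assumption, except the 200-epoch budget and the optimizer/dataset choices, which match the experiments described in this README.

```python
# Hypothetical experiment configuration -- the actual keys in run.py may differ.
config = {
    "optimizer": "padam",   # one of: "adam", "amsgrad", "sgd_momentum", "padam"
    "dataset": "cifar10",   # or "cifar100"
    "epochs": 200,          # all experiments in this repo run for 200 epochs
    "batch_size": 128,      # assumed value
    "learning_rate": 0.1,   # assumed value
    "beta1": 0.9,           # first-moment decay (Adam-family optimizers)
    "beta2": 0.999,         # second-moment decay
    "p": 0.125,             # Padam's partially adaptive parameter, p in (0, 0.5]
}
```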

## Experiments

We carry out experiments comparing the performance of four optimizers — Adam, AMSGrad, SGD with momentum, and the proposed Padam — on three modern deep learning architectures (ResNet18, WideResNet18, and VGGNet16) over the CIFAR-10 and CIFAR-100 datasets. All experiments are run for 200 epochs using the categorical cross-entropy loss.
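For reference, Padam's key idea is to replace Adam's square-root denominator with a partially adaptive exponent `p` in `(0, 1/2]`: `p = 1/2` recovers AMSGrad, while `p → 0` approaches SGD with momentum. A minimal NumPy sketch of one update step (our own illustrative code, not the repo's TensorFlow implementation) might look like:

```python
import numpy as np

def padam_step(theta, grad, m, v, v_hat,
               lr=0.1, beta1=0.9, beta2=0.999, p=0.125, eps=1e-8):
    """One Padam update on parameters theta (illustrative sketch)."""
    m = beta1 * m + (1 - beta1) * grad         # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2    # second-moment estimate
    v_hat = np.maximum(v_hat, v)               # AMSGrad-style running maximum
    theta = theta - lr * m / (v_hat ** p + eps)  # partially adaptive step
    return theta, m, v, v_hat
```

With `p` small, the denominator stays close to 1, so the step behaves more like momentum SGD, which is the mechanism the paper credits for the improved generalization.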

## Results

We successfully reproduced the results reported in the paper for CIFAR-10 and CIFAR-100. We observe that Padam indeed generalizes better than the other adaptive gradient methods, although it has a few shortcomings, as discussed in our report. Here we show the results for VGGNet16; the rest of the results are included in the report.

Results on the CIFAR-10 dataset for VGGNet.