- DeepMind paper: https://arxiv.org/abs/1506.02025
- Uber AI paper: https://arxiv.org/pdf/1807.03247.pdf
- PyTorch tutorial: https://pytorch.org/tutorials/intermediate/spatial_transformer_tutorial.html
- PyTorch implementation of CoordConv: https://github.com/walsvid/CoordConv
Spatial transformer networks (STNs) are a generalization of differentiable attention to any spatial transformation. An STN allows a neural network to learn how to perform spatial transformations on the input image in order to enhance the geometric invariance of the model. For example, it can crop a region of interest, or scale and correct the orientation of an image. This can be a useful mechanism because CNNs are not invariant to rotation, scale, or more general affine transformations.
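In essence, a small localization network regresses a 2x3 affine matrix, which is then used to resample the input. A minimal sketch along the lines of the linked PyTorch tutorial (layer sizes assume 1x28x28 MNIST inputs and are illustrative, not necessarily the exact ones used in this repo):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleSTN(nn.Module):
    """Minimal spatial transformer: a localization network predicts a
    2x3 affine matrix, which is used to resample the input image."""

    def __init__(self):
        super().__init__()
        # Localization network that extracts features for the transform regression.
        self.localization = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=7), nn.MaxPool2d(2, stride=2), nn.ReLU(True),
            nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2, stride=2), nn.ReLU(True),
        )
        # Regressor for the 2x3 affine transformation matrix.
        self.fc_loc = nn.Sequential(
            nn.Linear(10 * 3 * 3, 32), nn.ReLU(True), nn.Linear(32, 2 * 3),
        )
        # Start from the identity transformation.
        self.fc_loc[2].weight.data.zero_()
        self.fc_loc[2].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        xs = self.localization(x)
        theta = self.fc_loc(xs.view(xs.size(0), -1)).view(-1, 2, 3)
        # Sample the input on the grid defined by the predicted transform.
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)
```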
Goals of the project:
- Investigate whether using CoordConv layers instead of standard Conv layers helps to improve performance.
- Compare the performance of the new model on the evaluation metrics and motivate the choice of metrics.
- Explore new ideas that might achieve better performance than conventional STNs.
Set up the environment:
```bash
docker build -t pytorch-stn .
```
Train and test the STN with different layers:
```bash
docker run -v "$(pwd):/app" pytorch-stn --layer='conv' --epochs=50
docker run -v "$(pwd):/app" pytorch-stn --layer='coordconv' --epochs=50
```
Output images:
- `imgs/stn.png`: visualization of a batch of input images and the corresponding batch transformed by the STN.
- `imgs/cm.png`: confusion matrix where the predictions are summarized with count values.
The proposed CoordConv layer is a simple extension to the standard convolutional layer. Convolutional layers are used in a lot of applications because they often work well, perhaps due to some combination of three factors:
- they have relatively few learned parameters.
- they are fast to compute on modern GPUs.
- they learn a function that is translation invariant.
The following figure shows a comparison of the 2D Conv and CoordConv layers.
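As a rough sketch of the idea (not necessarily identical to the linked implementation), a CoordConv layer is a standard Conv2d whose input is concatenated with two extra channels holding the normalized pixel coordinates:

```python
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    """Standard Conv2d applied to the input concatenated with two extra
    channels that hold the normalized (x, y) coordinate of each pixel."""

    def __init__(self, in_channels, out_channels, **kwargs):
        super().__init__()
        # Two extra input channels for the coordinate maps.
        self.conv = nn.Conv2d(in_channels + 2, out_channels, **kwargs)

    def forward(self, x):
        n, _, h, w = x.shape
        # Coordinate grids scaled to [-1, 1].
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(n, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(n, 1, h, w)
        return self.conv(torch.cat([x, xs, ys], dim=1))
```

Such a layer is a drop-in replacement: for example, `CoordConv2d(1, 10, kernel_size=5)` can stand in for `nn.Conv2d(1, 10, kernel_size=5)` wherever a coordinate-aware convolution is wanted.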
The Uber AI paper suggests that including CoordConv layers can boost performance. In order to verify this hypothesis, we compare the performance of models using Conv and CoordConv layers, training each for 50 epochs. We evaluate the accuracy for each digit of the MNIST dataset, as well as the average loss and accuracy over the whole test set. The following tables show the results:
Layer | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|---|
Conv | 99% | 99% | 99% | 99% | 99% | 98% | 99% | 98% | 99% | 98% |
CoordConv | 99% | 99% | 99% | 99% | 98% | 98% | 98% | 99% | 99% | 98% |
Layer | Average loss | Accuracy |
---|---|---|
Conv | 0.0296 | 9921/10000 (99%) |
CoordConv | 0.0312 | 9908/10000 (99%) |
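These figures come from a standard MNIST test loop. A minimal sketch of how the average loss and accuracy could be computed (the `model` and `test_loader` names, and the assumption that the model outputs log-probabilities, are ours):

```python
import torch
import torch.nn.functional as F

def evaluate(model, test_loader, device="cpu"):
    """Return the average test loss and the test-set accuracy."""
    model.eval()
    test_loss, correct = 0.0, 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)  # assumed to be log-probabilities (log_softmax output)
            test_loss += F.nll_loss(output, target, reduction="sum").item()
            correct += (output.argmax(dim=1) == target).sum().item()
    n = len(test_loader.dataset)
    return test_loss / n, correct / n
```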
As we can see in the previous tables, the performance using Conv and CoordConv layers is very similar. We also compute the confusion matrix in order to summarize the predictions broken down by each digit.
Confusion Matrix Conv Layer | Confusion Matrix CoordConv Layer |
---|---|
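A minimal sketch of how a confusion matrix like these could be built from the collected predictions (tensor names are illustrative):

```python
import torch

def confusion_matrix(preds, targets, num_classes=10):
    """Rows are the true digits, columns the predicted digits;
    each cell counts how many test images fall in that combination."""
    cm = torch.zeros(num_classes, num_classes, dtype=torch.long)
    for t, p in zip(targets.view(-1), preds.view(-1)):
        cm[t, p] += 1
    return cm

# Per-class accuracy is the diagonal divided by the row sums:
# per_class_accuracy = cm.diag().float() / cm.sum(dim=1).float()
```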
For this image classification problem, using the CoordConv layer does not improve classification performance. Although the previous tables show that accuracy is slightly worse with the CoordConv layer after 50 training epochs, we also evaluated the performance with fewer training epochs.
All of the experiments showed that the accuracy does not improve considerably when using CoordConv layers. In image classification we do not expect to see much improvement, because Conv layers are already designed to be spatially invariant.
We will try to boost the performance by using the Leaky ReLU activation function instead of ReLU in the spatial transformer network, keeping the CoordConv layers as well. The derivative of Leaky ReLU is not 0 in the negative part: the function has a small slope there that allows gradients to keep flowing. Let's verify whether this helps improve the performance, following the same evaluation methodology as in the previous experiments.
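A sketch of what this change amounts to in PyTorch (the layers shown are illustrative, not the exact localization network of this repo):

```python
import torch.nn as nn

# ReLU zeroes every negative activation, so its gradient there is 0.
# Leaky ReLU keeps a small slope (0.01 by default), so gradients can
# still flow through negative activations.
relu_block = nn.Sequential(nn.Conv2d(1, 8, kernel_size=7), nn.ReLU(inplace=True))
leaky_block = nn.Sequential(nn.Conv2d(1, 8, kernel_size=7), nn.LeakyReLU(negative_slope=0.01, inplace=True))
```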
Command to reproduce the experiment:
```bash
docker run -v "$(pwd):/app" pytorch-stn --function='leakyrelu' --epochs=50
```
Activation | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|---|
Leaky ReLU | 99% | 99% | 99% | 99% | 99% | 98% | 99% | 99% | 99% | 98% |
Activation | Average loss | Accuracy |
---|---|---|
Leaky ReLU | 0.0293 | 9913/10000 (99%) |
Confusion Matrix Conv Layer | Confusion Matrix Leaky ReLU Activation |
---|---|
We observed that using the Leaky ReLU activation function in the STN does not improve performance on the classification task: replacing ReLU with Leaky ReLU does not significantly affect the results.