<a href="https://colab.research.google.com/github/weiyunna/Deep-Learning-with-Tensorflow/blob/master/How_to_handle_images_of_large_sizes_in_CNN%3F.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How to handle images of large sizes in CNN?


I assume that by downsampling you mean scaling down the input before passing it into the CNN. A convolutional layer lets you downsample the image inside the network by picking **a large stride**, which saves resources for the following layers. In fact, that's what it has to do, otherwise your model won't fit on the GPU.
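To get a feel for how much a large stride saves, the standard output-size formula for a convolution can be evaluated directly. This is only a sketch in plain Python: the helper name `conv_output_size` and the kernel, stride, and padding values are illustrative assumptions, not taken from any particular network.

```python
def conv_output_size(input_size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial output size of a convolution along one dimension.

    Standard formula: floor((n + 2p - k) / s) + 1.
    """
    return (input_size + 2 * padding - kernel) // stride + 1


# Hypothetical first layer on a 2400x2400 input: 7x7 kernel, padding 3.
side_stride_1 = conv_output_size(2400, kernel=7, stride=1, padding=3)
side_stride_4 = conv_output_size(2400, kernel=7, stride=4, padding=3)

# Each output channel holds side * side values, so the ratio below is the
# per-channel reduction in activations passed on to the next layer.
reduction = (side_stride_1 ** 2) // (side_stride_4 ** 2)

print(side_stride_1, side_stride_4, reduction)
```

With these assumed values, stride 4 shrinks each side by a factor of 4, so every later layer works on 16 times fewer values per channel.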

## Are there any techniques to handle such large images which are to be trained?

Researchers commonly **scale the images to a reasonable size**. If that's not an option for you, you'll need to **restrict your CNN**. In addition to downsampling in the early layers, I would recommend getting rid of the FC layer (which normally holds most of the parameters) in favor of a convolutional layer. You will also have to **stream your data in each epoch**, because it won't all fit into GPU memory.
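The saving from dropping the FC layer can be estimated by counting parameters. The following is a rough sketch in plain Python; the feature-map shape (75x75x256) and the 1000 outputs are made-up numbers chosen only for illustration, and the helper names are hypothetical.

```python
def dense_params(height: int, width: int, channels: int, units: int) -> int:
    """Weights plus biases of an FC layer applied to a flattened feature map."""
    return height * width * channels * units + units


def conv1x1_params(channels: int, filters: int) -> int:
    """Weights plus biases of a 1x1 convolution; independent of spatial size."""
    return channels * filters + filters


# Hypothetical final feature map of 75x75x256, mapped to 1000 outputs.
fc = dense_params(75, 75, 256, 1000)
conv = conv1x1_params(256, 1000)

print(f"FC layer: {fc:,} parameters")
print(f"1x1 conv: {conv:,} parameters")
```

A 1x1 convolution followed by global average pooling is one common way to get a fixed-size output without an FC layer. Its parameter count does not grow with the spatial size of the feature map, which is what matters when the input is large.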

Note that none of this will prevent heavy computational load in the early layers, exactly because the input is so large: **convolution is an expensive operation and the first layers will perform a lot of them in each forward and backward pass**. In short, training will be slow.

## What batch size is reasonable to use?

Here's another problem. A single image takes `2400x2400x3x4` bytes (3 channels and 4 bytes per channel value), which is ~70 MB, so you can hardly afford even a batch size of 10. A more realistic value would be 5. Note that **most of the memory will be taken by CNN parameters**. I think in this case it makes sense to reduce the size by using 16-bit values rather than 32-bit; this way you'll be able to double the batch size.
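The arithmetic above is easy to reproduce. This small sketch (plain Python, with a hypothetical helper `batch_bytes`) counts only the raw input tensor, not the parameters or activations, and compares 32-bit and 16-bit storage.

```python
def batch_bytes(height: int, width: int, channels: int,
                bytes_per_value: int, batch_size: int) -> int:
    """Raw size in bytes of one input batch (inputs only, no activations)."""
    return height * width * channels * bytes_per_value * batch_size


for bytes_per_value, name in ((4, "float32"), (2, "float16")):
    for batch_size in (1, 5, 10):
        size_mb = batch_bytes(2400, 2400, 3, bytes_per_value, batch_size) / 1e6
        print(f"{name}, batch of {batch_size:2d}: {size_mb:7.1f} MB")
```

Halving the bytes per value halves the footprint, so a 16-bit batch of 10 takes the same space as a 32-bit batch of 5.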

## Are there any precautions to take, or any increase or decrease in hardware resources that I can make?

Your bottleneck is GPU memory. If you can afford another GPU, get it and split the network across them. Everything else is insignificant compared to GPU memory.

* For images, the feature set is usually the pixel intensity values, which in this case leads to quite a big feature set. Downsampling the images is also not recommended, as you may (actually will) lose important data.

* There are techniques that can help you reduce the feature set size: approaches like PCA (Principal Component Analysis) help you select an important feature subset.

* Other than that, to reduce the computational expense of training your neural network, you can use **Stochastic Gradient Descent** rather than conventional (full-batch) Gradient Descent. It reduces the amount of data used in each iteration, and therefore the time each training step takes.

* The **exact batch size** to use depends on how your data is distributed between the training and testing datasets; a common split is 70-30. You can also use the stochastic approach mentioned above to reduce the required time.



*   Rescale all your images to smaller dimensions. You can rescale them to 112x112 pixels. In your case, because you have a square image, there will be no need for cropping. You will still not be able to load all these images into your RAM in one go.
*   The best option is to **use a generator function that will feed the data in batches**. Please refer to the use of **fit_generator** in Keras (recent Keras versions deprecate it because `fit` accepts generators directly). If your model parameters become too big to fit into GPU memory, consider using batch normalization or a residual model to reduce your number of parameters.
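The idea of feeding data in batches can be sketched without any framework. The generator below is a minimal illustration in plain Python, not the Keras API: `batch_generator`, `fake_load`, and the file names are hypothetical, and a real loader would read and decode an image file instead of returning a string.

```python
import random
from typing import Callable, Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def batch_generator(paths: Sequence[str], batch_size: int,
                    load_fn: Callable[[str], T], shuffle: bool = True,
                    seed: Optional[int] = None) -> Iterator[List[T]]:
    """Yield one batch of loaded items at a time, covering one epoch.

    Only the items of the current batch are loaded, so the whole
    dataset never has to sit in memory at once.
    """
    order = list(range(len(paths)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        batch_indices = order[start:start + batch_size]
        yield [load_fn(paths[i]) for i in batch_indices]


def fake_load(path: str) -> str:
    """Stand-in loader (assumption): a real one would decode the image."""
    return f"pixels-of-{path}"


paths = [f"img_{i}.png" for i in range(10)]
for batch in batch_generator(paths, batch_size=4, load_fn=fake_load, shuffle=False):
    print(len(batch), batch[0])
```

The last batch is smaller when the dataset size is not a multiple of the batch size. To use something like this with Keras, the generator would typically yield `(inputs, targets)` array pairs and loop indefinitely, with the number of steps per epoch given to `fit`.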



# To what resolution should I resize my images to use as training dataset for deep learning?

* It really depends on the size of your network and your GPU. You need to `fit a reasonably sized batch (16-64 images)` in GPU memory. That can easily be very big: you can compute the size in bytes of the intermediate activations of one layer (with 32-bit values) as `4*batch_size*num_feature_maps*height*width`. Say you take 32 square images of 112x112 with 64 feature maps. That would be `100 MB just for activations` and `the same amount for gradients`. Take a relatively `big network (for example, VGG16)` and you already need a few GB.

* Another aspect is the size of the receptive field. If you follow the current advice to prefer a `small filter size (3x3)` and `take big images`, you can end up with either quite a shallow network (because you can't fit a lot of layers into GPU memory) or a narrow network (which is OK if you know how to train it). The former will necessarily have small effective receptive fields, and will therefore approximate a more local and simpler function.

* So the rule of thumb is to use images of about `256x256 for ImageNet-scale` networks and about `96x96` for something smaller and easier. I have heard that on Kaggle people sometimes train on 512x512, but you will need to compromise on something. Or just buy a GPU cluster.

* If you train fully convolutional networks like `Faster R-CNN`, you can take much bigger images (say `800x600`) because you have batch size = 1.

* First of all, it is not necessary to have square images. Secondly, bigger images mean more computation per layer as well as higher memory requirements. `However, this doesn’t affect how the conv layers work, as a conv layer doesn’t take the full image as one input, but rather a fixed window that slides over the image (the convolution operation)`.

* In training we need to work with batches, that is, sets of data represented as tensors. Each tensor is taken through the conv layer operations, convolution followed by ReLU, to get the activation volume (the input to the following layer). All of this has to be done in memory. `For better performance, the tensor should fit fully in memory, either RAM or GPU memory.`

* `If you are scaling down the images, keep the aspect ratio, as otherwise you will distort the structural relations in your data`. Moreover, if your data is a scaled-down version of high-resolution images, your network will be able to see key features in the initial layers. If your images are large, these key features might only be learned later, toward the end of the network.
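Two points from the list above reduce to simple arithmetic: the activation memory estimate and an aspect-ratio-preserving resize. The sketch below is plain Python with hypothetical helper names (`activation_bytes`, `resize_keep_aspect`); it assumes 4 bytes per value and considers the activations of a single layer only.

```python
from typing import Tuple


def activation_bytes(batch_size: int, num_feature_maps: int,
                     height: int, width: int, bytes_per_value: int = 4) -> int:
    """Memory for one layer's activations: 4*batch*maps*height*width by default."""
    return bytes_per_value * batch_size * num_feature_maps * height * width


def resize_keep_aspect(width: int, height: int, target_long_side: int) -> Tuple[int, int]:
    """New (width, height) whose longer side equals target_long_side."""
    scale = target_long_side / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


# 32 images of 112x112 with 64 feature maps, as in the example above.
act = activation_bytes(32, 64, 112, 112)
print(f"activations: {act / 1e6:.1f} MB (gradients need about the same again)")

# Hypothetical 2400x1800 photo scaled so that its longer side is 256 pixels.
print(resize_keep_aspect(2400, 1800, 256))
```

The activation figure covers only one layer; a deep network holds many such tensors during training, which is why the total quickly reaches gigabytes.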