# What is Data Augmentation?

Data augmentation artificially expands a training dataset by creating modified copies of the existing samples.

It is useful when you want to reduce overfitting, when the initial dataset is too small to train on, or when you want to squeeze better performance out of your model.

## Data Augmentation techniques
* Geometric transformations: randomly flip, crop, rotate or translate images.
* Color space transformations: shift or rescale the RGB color channels, for example to intensify one color.
* Kernel filters: sharpen or blur an image.
* Random erasing: delete a randomly chosen part of the initial image.
* Mixing images: blend two or more images into one.
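
The techniques listed above can be sketched with plain NumPy operations. This is a minimal illustration only, not how any of the frameworks implement them: the two input images are random synthetic arrays, and the helper names (`horizontal_flip`, `scale_channel`, `box_blur`, `random_erase`, `mix`) are hypothetical names chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic stand-ins for two RGB images: (height, width, channels), values 0..255.
image_a = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
image_b = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

def horizontal_flip(img):
    """Geometric transformation: mirror the image left to right."""
    return img[:, ::-1, :]

def scale_channel(img, channel, factor):
    """Color space transformation: intensify (or dampen) one RGB channel."""
    out = img.astype(np.float32)
    out[..., channel] *= factor
    return np.clip(out, 0, 255).astype(np.uint8)

def box_blur(img):
    """Kernel filter: 3x3 mean blur, with edge pixels repeated as padding."""
    padded = np.pad(img.astype(np.float32), ((1, 1), (1, 1), (0, 0)), mode="edge")
    h, w = img.shape[:2]
    total = sum(padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3))
    return (total / 9.0).round().astype(np.uint8)

def random_erase(img, rng, size=16):
    """Random erasing: zero out a randomly placed square patch."""
    out = img.copy()
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    out[top:top + size, left:left + size] = 0
    return out

def mix(img1, img2, alpha=0.5):
    """Mixing images: pixel-wise weighted average of two same-sized images."""
    blended = alpha * img1.astype(np.float32) + (1 - alpha) * img2.astype(np.float32)
    return blended.round().astype(np.uint8)

flipped = horizontal_flip(image_a)
redder = scale_channel(image_a, channel=0, factor=1.5)
blurred = box_blur(image_a)
erased = random_erase(image_a, rng)
mixed = mix(image_a, image_b)
```

In practice the frameworks listed in this notebook ship ready-made versions of these operations, so hand-written helpers like these are mainly useful for understanding what each transformation does to the pixel array.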

## Data Augmentation Frameworks
* TensorFlow
* Keras
* PyTorch

Sources:
* https://neptune.ai/blog/data-augmentation-in-python
* https://pytorch.org/tutorials/beginner/data_loading_tutorial.html?highlight=dataloader
* https://towardsdatascience.com/a-comprehensive-guide-to-image-augmentation-using-pytorch-fb162f2444be

In [4]:
!pip install torch torchvision







# Import Data and images

In [1]:
import pandas as pd
import numpy as np
import torchvision.transforms as T
import matplotlib.pyplot as plt

In [2]:
df = pd.read_csv('dataset_faces.csv')

In [3]:
df.head()

Unnamed: 0,filename,age,gender,ethnicity
0,100_1_0_20170110183726390.jpg,100,1,0
1,100_1_2_20170105174847679.jpg,100,1,2
2,100_1_2_20170110182836729.jpg,100,1,2
3,101_1_2_20170105174739309.jpg,101,1,2
4,10_0_0_20161220222308131.jpg,10,0,0


# Get pictures as numpy arrays

In [7]:
from PIL import Image
import numpy as np

images_as_arrays = []

# Load every picture listed in the dataframe and store it as a NumPy array.
for filename in df['filename']:
    path = 'pic/' + filename
    with Image.open(path) as img:
        images_as_arrays.append(np.array(img))


In [6]:
print(images_as_arrays[0])

[[[231 234 241]
  [234 237 244]
  [233 236 243]
  ...
  [220 225 229]
  [223 227 230]
  [223 227 230]]

 [[222 225 232]
  [226 229 236]
  [231 234 241]
  ...
  [224 229 233]
  [228 232 235]
  [228 232 235]]

 [[222 225 232]
  [221 224 231]
  [226 229 236]
  ...
  [226 231 235]
  [214 217 222]
  [213 216 221]]

 ...

 [[255 255 255]
  [255 255 255]
  [255 255 255]
  ...
  [255 255 255]
  [255 255 255]
  [255 255 255]]

 [[255 255 255]
  [255 255 255]
  [255 255 255]
  ...
  [255 255 255]
  [255 255 255]
  [255 255 255]]

 [[255 255 255]
  [255 255 255]
  [255 255 255]
  ...
  [255 255 255]
  [255 255 255]
  [255 255 255]]]


# Data Augmentation in Keras

# Data Augmentation in PyTorch

PyTorch's companion package torchvision provides the transforms module for data augmentation.

The transforms module contains different image transformations that can be chained together using the Compose class.

## Resize

One issue we can face is that the samples are not all the same size. Most neural networks expect images of a fixed size.


In [None]:
# Investigate the shape of the first 5 pictures:

for idx, image_array in enumerate(images_as_arrays[0:5]):
    print(f"Image {idx} has shape {image_array.shape}")


In [None]:
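resized_images = []
size = (128, 128)
resize = T.Resize(size=size)
for img in images_as_arrays:
    # T.Resize expects a PIL image or a tensor, so convert the NumPy array first.
    resized_img = resize(Image.fromarray(img))
    resized_images.append(resized_img)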

In [None]:
# Check the shape of the first 5 resized pictures:
for idx, image in enumerate(resized_images[0:5]):
    print(f"Image {idx} has shape {np.array(image).shape}")

In [None]:
plt.imshow(resized_images[0])