# Autoencoders with Keras, TensorFlow, and Deep Learning

![image](https://pyimagesearch.com/wp-content/uploads/2020/02/keras_autoencoders_header.png)

In this tutorial, we’ll discuss what autoencoders are, including how convolutional autoencoders can be applied to image data. We’ll also discuss the difference between autoencoders and other generative models, such as Generative Adversarial Networks (GANs).

From there, I’ll show you how to implement and train a convolutional autoencoder using Keras and TensorFlow.

Finally, we’ll review the results of the training script, including a visualization of how well the autoencoder reconstructs the input data.

I’ll recommend next steps to you if you are interested in learning more about deep learning applied to image datasets.

### What are autoencoders?

Autoencoders are a type of unsupervised neural network (i.e., no class labels or labeled data) that seek to:

1. Accept an input set of data (i.e., the input).
2. Internally compress the input data into a latent-space representation (i.e., a single vector that compresses and quantifies the input).
3. Reconstruct the input data from this latent representation (i.e., the output).

Typically, we think of an autoencoder as having two components/subnetworks:

- Encoder: Accepts the input data and compresses it into the latent space. If we denote our input data as x and the encoder as E, then the output latent-space representation, s, would be $s = E(x)$.
- Decoder: Accepts the latent-space representation s and reconstructs the original input. If we denote the decoder function as D and the output of the decoder as o, then we can represent the decoder as $o = D(s)$.

Using our mathematical notation, the entire autoencoder can be written as:

$$o = D(E(x))$$

![image](https://pyimagesearch.com/wp-content/uploads/2020/02/keras_autoencoder_arch_flow.png)

Figure 1: Autoencoders with Keras, TensorFlow, Python, and Deep Learning don’t have to be complex. Breaking the concept down to its parts, you’ll have an input image that is passed through the autoencoder which results in a similar output image. (figure inspired by Nathan Hubens’ article, Deep inside: Autoencoders)

Here you can see that:

1. We input a digit to the autoencoder.
2. The encoder subnetwork creates a latent representation of the digit. This latent representation is substantially smaller (in terms of dimensionality) than the input.
3. The decoder subnetwork then reconstructs the original digit from the latent representation.

You can thus think of an autoencoder as a network that reconstructs its input!

To train an autoencoder, we input our data, attempt to reconstruct it, and then minimize the mean squared error (or similar loss function).

Ideally, the output of the autoencoder will be near identical to the input.
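
As a minimal sketch of that objective, the snippet below uses a hand-written toy encoder and decoder (hypothetical stand-ins named `encode` and `decode`, not the convolutional network trained later) and computes the mean squared error between an input and its reconstruction.

```python
def encode(x):
    # toy encoder E: average each pair of neighbouring values (halves the size)
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x), 2)]

def decode(s):
    # toy decoder D: repeat every latent value twice to restore the length
    return [v for v in s for _ in range(2)]

def mse(x, o):
    # mean squared error between the input x and the reconstruction o
    return sum((a - b) ** 2 for a, b in zip(x, o)) / len(x)

x = [0.0, 0.2, 0.8, 1.0]   # pixel intensities scaled to [0, 1]
s = encode(x)              # latent representation, s = E(x)
o = decode(s)              # reconstruction, o = D(E(x))
loss = mse(x, o)           # roughly 0.01 for this input
print(s, o, loss)
```

Training a real autoencoder amounts to adjusting the weights of E and D so that this loss shrinks across the whole dataset.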

During the training process, our goal is to train a network that can learn how to reconstruct our input data — but the true value of the autoencoder lives inside that latent-space representation.

Keep in mind that autoencoders compress our input data and, more to the point, when we train autoencoders, what we really care about is the encoder, E, and the latent-space representation, $s = E(x)$.

The decoder, $o = D(s)$, is used to train the autoencoder end-to-end, but in practical applications, we often (but not always) care more about the encoder and the latent-space.

Later in this tutorial, we’ll be training an autoencoder on the MNIST dataset. The MNIST dataset consists of digits that are 28×28 pixels with a single channel, meaning that each digit is represented by 28 × 28 = 784 values. The autoencoder we’ll be training here will be able to compress those digits into a vector of only 16 values, a reduction of nearly 98%!
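
The arithmetic behind that figure is easy to check. The variable names below are illustrative only:

```python
height, width, depth = 28, 28, 1   # one MNIST digit
latent_dim = 16                    # size of the latent vector

input_values = height * width * depth         # 784
reduction = 1.0 - latent_dim / input_values   # fraction of values removed

print(f"{input_values} values -> {latent_dim} values "
      f"({reduction:.1%} reduction)")
# prints: 784 values -> 16 values (98.0% reduction)
```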

So what can we do if an input data point is compressed into such a small vector?

That’s where things get really interesting.

### What are applications of autoencoders?

![image](https://pyimagesearch.com/wp-content/uploads/2020/02/keras_autoencoders_applications.png)

Autoencoders are typically used for dimensionality reduction, denoising, and anomaly/outlier detection. Outside of computer vision, they are extremely useful for Natural Language Processing (NLP) and text comprehension. In this tutorial, we’ll use Python and Keras/TensorFlow to train a deep learning autoencoder. (image source)

Autoencoders are typically used for:

- Dimensionality reduction (i.e., think PCA but more powerful/intelligent).
- Denoising (e.g., removing noise and preprocessing images to improve OCR accuracy).
- Anomaly/outlier detection (e.g., detecting mislabeled data points in a dataset or detecting when an input data point falls well outside our typical data distribution).

Outside of the computer vision field, you’ll see autoencoders applied to Natural Language Processing (NLP) and text comprehension problems, including understanding the semantic meaning of words, constructing word embeddings, and even text summarization.
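
To make the anomaly detection use case concrete: one common approach is to flag inputs whose reconstruction error is unusually large. The sketch below assumes the per-sample errors have already been computed. The error values are invented purely for illustration, and the "mean plus two standard deviations" threshold is an arbitrary choice, not something this tutorial prescribes.

```python
import statistics

# hypothetical per-sample reconstruction errors (MSE); illustrative values only
errors = [0.008, 0.009, 0.007, 0.010, 0.008, 0.045, 0.009]

# assumption: anything above mean + 2 standard deviations counts as an outlier
threshold = statistics.mean(errors) + 2 * statistics.pstdev(errors)

# indices of the samples the autoencoder reconstructs poorly
outliers = [i for i, e in enumerate(errors) if e > threshold]
print(outliers)  # prints: [5]
```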

### Project Structure

```bash
$ tree --dirsfirst
.
├── pyimagesearch
│   ├── __init__.py
│   └── convautoencoder.py
├── output.png
├── plot.png
└── train_conv_autoencoder.py

1 directory, 5 files
```

In [1]:
# import the necessary packages
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import Conv2DTranspose
from tensorflow.keras.layers import LeakyReLU
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Reshape
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K
import numpy as np

In [2]:
height, width, depth = 28, 28, 1
filters=(32, 64)
latentDim=16


inputShape = (height, width, depth)
chanDim = -1  # batch normalization axis: -1 is the last dimension, i.e. the channels ("channels last")

# define the input to the encoder
inputs = Input(shape=inputShape)
x = inputs
# loop over the number of filters
for f in filters:
    # apply a CONV => RELU => BN operation
    x = Conv2D(f, (3, 3), strides=2, padding="same")(x)
    x = LeakyReLU(alpha=0.2)(x)  # like ReLU, but negative inputs are scaled by alpha (the slope of the negative part) instead of being zeroed
    x = BatchNormalization(axis=chanDim)(x)
# flatten the network and then construct our latent vector
volumeSize = K.int_shape(x)  # store the shape of the last conv volume: (None, 7, 7, 64)
x = Flatten()(x)
latent = Dense(latentDim)(x)  # this Dense layer has latentDim = 16 units
# build the encoder model
encoder = Model(inputs, latent, name="encoder")  # this part is the encoder
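
For intuition about the `LeakyReLU(alpha=0.2)` layer used in the cell above, here is a scalar sketch of the function it applies elementwise. The helper `leaky_relu` is a hypothetical plain-Python stand-in, not the Keras implementation.

```python
def leaky_relu(x, alpha=0.2):
    # positive inputs pass through unchanged; negative inputs are scaled by alpha
    return x if x > 0 else alpha * x

values = [leaky_relu(v) for v in (-2.0, -0.5, 0.0, 1.5)]
print(values)
```

A plain ReLU would map both negative inputs to zero. The leaky variant keeps a small, non-zero response (and gradient) for them.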



In [3]:
encoder.summary()
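
The shapes that `encoder.summary()` reports can be predicted by hand. The sketch below assumes the configuration used in this notebook (28×28×1 input, 3×3 convolutions with `strides=2` and `padding="same"`, filters `(32, 64)`). The helper `conv_same_out` is a hypothetical name introduced for illustration.

```python
import math

def conv_same_out(size, stride):
    # with padding="same", the output size is ceil(input size / stride)
    return math.ceil(size / stride)

size, filters, latent_dim = 28, (32, 64), 16
shapes = [(size, size, 1)]               # input volume
for f in filters:
    size = conv_same_out(size, stride=2)
    shapes.append((size, size, f))       # volume after each strided Conv2D

# number of values Flatten() hands to the Dense(latent_dim) layer
flat = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(shapes, flat, latent_dim)
# prints: [(28, 28, 1), (14, 14, 32), (7, 7, 64)] 3136 16
```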

In [4]:
latentInputs = Input(shape=(latentDim,))  # the trailing comma makes the shape a 1-tuple, (16,)
x = Dense(np.prod(volumeSize[1:]))(latentInputs)  # volumeSize includes the batch dimension, so drop its first entry
x = Reshape((volumeSize[1], volumeSize[2], volumeSize[3]))(x)  # back to the 7 x 7 x 64 volume
# loop over our number of filters again, but this time in
# reverse order
for f in filters[::-1]:
	# apply a CONV_TRANSPOSE => RELU => BN operation
	x = Conv2DTranspose(f, (3, 3), strides=2,
		padding="same")(x)
	x = LeakyReLU(alpha=0.2)(x)
	x = BatchNormalization(axis=chanDim)(x)

# apply a single CONV_TRANSPOSE layer used to recover the
# original depth of the image
x = Conv2DTranspose(depth, (3, 3), padding="same")(x)
outputs = Activation("sigmoid")(x)  # sigmoid outputs lie in [0, 1], matching the input pixels once they are scaled to [0, 1]
# build the decoder model
decoder = Model(latentInputs, outputs, name="decoder")
# our autoencoder is the encoder + decoder



In [5]:
decoder.summary()
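
The decoder mirrors the encoder, so its shapes can be traced the same way. This sketch assumes the layer settings shown in the cell above: `Conv2DTranspose` with `padding="same"` multiplies the spatial size by the stride, and the final layer uses the default stride of 1.

```python
volume = (7, 7, 64)             # encoder volume before flattening (volumeSize[1:])
filters, depth = (32, 64), 1

# units in the first Dense layer, later reshaped to 7 x 7 x 64
dense_units = volume[0] * volume[1] * volume[2]

h, w, _ = volume
shapes = [volume]
for f in filters[::-1]:
    # strides=2 with padding="same" doubles height and width
    h, w = h * 2, w * 2
    shapes.append((h, w, f))
# the last Conv2DTranspose keeps the spatial size and restores the image depth
shapes.append((h, w, depth))
print(dense_units, shapes)
# prints: 3136 [(7, 7, 64), (14, 14, 64), (28, 28, 32), (28, 28, 1)]
```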

In [6]:
autoencoder = Model(inputs, decoder(encoder(inputs)),
			name="autoencoder")
autoencoder.summary()
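
As a cross-check on the summaries, the parameter totals can be recomputed by hand. The helper functions below are hypothetical and only encode the standard counting rules. The totals should match `summary()` as long as the layers are defined exactly as in the cells above, and they include the non-trainable moving statistics of the batch normalization layers. `LeakyReLU`, `Flatten`, `Reshape`, and `Activation` contribute no parameters.

```python
def conv_params(k, c_in, c_out):
    # k x k weights per (input, output) channel pair, plus one bias per filter
    # (the same count applies to Conv2D and Conv2DTranspose)
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    # one weight per (input, output) pair, plus one bias per output unit
    return n_in * n_out + n_out

def bn_params(channels):
    # gamma, beta, moving mean, and moving variance for each channel
    return 4 * channels

encoder_total = (conv_params(3, 1, 32) + bn_params(32)
                 + conv_params(3, 32, 64) + bn_params(64)
                 + dense_params(7 * 7 * 64, 16))
decoder_total = (dense_params(16, 7 * 7 * 64)
                 + conv_params(3, 64, 64) + bn_params(64)
                 + conv_params(3, 64, 32) + bn_params(32)
                 + conv_params(3, 32, 1))
print(encoder_total, decoder_total, encoder_total + decoder_total)
# prints: 69392 109377 178769
```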

In [11]:
## CLASS CONTAINING THE AUTOENCODER

import numpy as np

class ConvAutoencoder:
	@staticmethod
	def build(width, height, depth, filters=(32, 64), latentDim=16):
		# initialize the input shape to be "channels last" along with
		# the channels dimension itself
		inputShape = (height, width, depth)
		chanDim = -1

		# ENCODER

		# define the input to the encoder
		inputs = Input(shape=inputShape)
		x = inputs
		# loop over the number of filters
		for f in filters:
			# apply a CONV => RELU => BN operation
			x = Conv2D(f, (3, 3), strides=2, padding="same")(x)
			x = LeakyReLU(alpha=0.2)(x)
			x = BatchNormalization(axis=chanDim)(x)
		# flatten the network and then construct our latent vector
		volumeSize = K.int_shape(x)  # shape of the last conv volume, batch dimension first
		x = Flatten()(x)
		latent = Dense(latentDim)(x)
		# build the encoder model
		encoder = Model(inputs, latent, name="encoder")

		# DECODER

		# start building the decoder model which will accept the
		# output of the encoder as its inputs
		latentInputs = Input(shape=(latentDim,))
		x = Dense(np.prod(volumeSize[1:]))(latentInputs)  # skip the batch dimension
		x = Reshape((volumeSize[1], volumeSize[2], volumeSize[3]))(x)
		# loop over our number of filters again, but this time in
		# reverse order
		for f in filters[::-1]:
			# apply a CONV_TRANSPOSE => RELU => BN operation
			x = Conv2DTranspose(f, (3, 3), strides=2,
				padding="same")(x)
			x = LeakyReLU(alpha=0.2)(x)
			x = BatchNormalization(axis=chanDim)(x)

		# apply a single CONV_TRANSPOSE layer used to recover the
		# original depth of the image
		x = Conv2DTranspose(depth, (3, 3), padding="same")(x)
		outputs = Activation("sigmoid")(x) 
		# build the decoder model
		decoder = Model(latentInputs, outputs, name="decoder")
		# our autoencoder is the encoder + decoder
		autoencoder = Model(inputs, decoder(encoder(inputs)),
			name="autoencoder")
		# return a 3-tuple of the encoder, decoder, and autoencoder
		return (encoder, decoder, autoencoder)


In [7]:
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")


# import the necessary packages
# from pyimagesearch.convautoencoder import ConvAutoencoder
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.datasets import mnist
import matplotlib.pyplot as plt
import numpy as np
# import argparse
import cv2

In [8]:
samples = 8
output = "output.png"
plot = "plot.png"

In [9]:
# initialize the number of epochs to train for and batch size
EPOCHS = 5
BS = 32
# load the MNIST dataset
print("[INFO] loading MNIST dataset...")
((trainX, _), (testX, _)) = mnist.load_data()
# add a channel dimension to every image in the dataset, then scale
# the pixel intensities to the range [0, 1]
trainX = np.expand_dims(trainX, axis=-1)  # add a trailing channel dimension (MNIST images are grayscale, so one channel)
testX = np.expand_dims(testX, axis=-1)
trainX = trainX.astype("float32") / 255.0
testX = testX.astype("float32") / 255.0

[INFO] loading MNIST dataset...


In [12]:
# construct our convolutional autoencoder
print("[INFO] building autoencoder...")
(encoder, decoder, autoencoder) = ConvAutoencoder.build(28, 28, 1)
opt = Adam(learning_rate=1e-3)
autoencoder.compile(loss="mse", optimizer=opt)
# train the convolutional autoencoder
H = autoencoder.fit(
	trainX, trainX,
	validation_data=(testX, testX),
	epochs=EPOCHS,
	batch_size=BS)

[INFO] building autoencoder...




Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 35s 17ms/step - loss: 0.0328 - val_loss: 0.0113
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 35s 19ms/step - loss: 0.0106 - val_loss: 0.0098
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 33s 18ms/step - loss: 0.0094 - val_loss: 0.0095
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 33s 18ms/step - loss: 0.0088 - val_loss: 0.0094
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 34s 18ms/step - loss: 0.0084 - val_loss: 0.0080

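Two numbers in the training log can be sanity-checked by hand: the 1875 steps per epoch, and what a validation MSE of 0.0080 means in pixel terms. The sketch below assumes the standard MNIST split (60,000 training and 10,000 test images) and the batch size of 32 set earlier. Converting the MSE to an RMSE on the 0-255 scale is an interpretation aid added here, not something the training script computes.

```python
import math

train_samples, test_samples = 60000, 10000   # standard MNIST split sizes
batch_size = 32

train_steps = math.ceil(train_samples / batch_size)   # batches per training epoch
test_steps = math.ceil(test_samples / batch_size)     # batches for one pass over the test set

val_mse = 0.0080                        # final val_loss reported in the log
rmse_0_255 = math.sqrt(val_mse) * 255   # root-mean-square pixel error in 0-255 units
print(train_steps, test_steps, round(rmse_0_255, 1))
# prints: 1875 313 22.8
```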

In [13]:
# construct a plot that plots and saves the training history
N = np.arange(0, EPOCHS)
plt.style.use("ggplot")
plt.figure()
plt.plot(N, H.history["loss"], label="train_loss")
plt.plot(N, H.history["val_loss"], label="val_loss")
plt.title("Training and Validation Loss")
plt.xlabel("Epoch #")
plt.ylabel("Loss")
plt.legend(loc="lower left")
plt.savefig(plot)

In [14]:
# use the convolutional autoencoder to make predictions on the
# testing images, then initialize our list of output images
print("[INFO] making predictions...")
decoded = autoencoder.predict(testX)
outputs = None
# loop over our number of output samples
for i in range(0, samples):
	# grab the original image and reconstructed image
	original = (testX[i] * 255).astype("uint8")
	recon = (decoded[i] * 255).astype("uint8")
	# stack the original and reconstructed image side-by-side
	# (named "pair" so the "output" path defined earlier is not overwritten)
	pair = np.hstack([original, recon])
	# if the outputs array is empty, initialize it as the current
	# side-by-side image display
	if outputs is None:
		outputs = pair
	# otherwise, vertically stack the outputs
	else:
		outputs = np.vstack([outputs, pair])
# save the outputs image to the path stored in "output"
cv2.imwrite(output, outputs)

[INFO] making predictions...
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step


True