# Deep Learning

Convolutional neural networks started a completely new approach to machine learning, generally called "deep learning". In principle, it is based on the idea that a task can be solved by devising a suitable network architecture that learns the solution from data.

## Backbone network 1: AlexNet

AlexNet was presented at the NeurIPS 2012 conference, and that single paper started the revolution in ML. The paper introduced various concepts for training networks with many convolutional layers of neurons (for example ReLU activations, dropout and GPU training).


## Backbone network 2: VGGNet

In their paper, the Oxford group investigated various strategies for building a better backbone network for image classification and proposed VGGNet, which is in many ways a simplified version of AlexNet.


## More backbones
A number of pre-trained backbone networks are available in Keras (https://keras.io/api/applications/), including, for example, various versions of ResNet, Inception, MobileNet and EfficientNet. You can easily use them in your own applications. VGG16 often performs rather well across applications and is thus a good starting point.

### Demo 1: CNN with some AlexNet flavors (MNIST Digits)

In [None]:
import tensorflow as tf
import keras
import numpy as np
print("TensorFlow version:", tf.__version__)

In [None]:
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)
print(f'Min y (class) value is {np.min(y_test)} and max is {np.max(y_test)}')

In [None]:
model2 = tf.keras.models.Sequential()

model2.add(keras.layers.Input(shape=(28,28,1)))
print(model2.output_shape)

# Convolution layer
model2.add(keras.layers.Conv2D(16,kernel_size=(5,5),strides=(2,2)))
print(model2.output_shape)

# Flatten the feature maps to a vector
model2.add(keras.layers.Flatten())
print(model2.output_shape)

# Add a fully connected layer
model2.add(keras.layers.Dense(10, activation='sigmoid'))
print(model2.output_shape)
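
The output shapes printed above can be checked by hand. The sketch below uses the standard output-size formula for a convolution; the helper name `conv_output_size` is made up for this example, and it assumes the Keras default `padding="valid"` (no padding) as in the `Conv2D` layer above.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    # Number of positions where the kernel fits along one axis
    return (input_size + 2 * padding - kernel_size) // stride + 1

# 28x28 input, 5x5 kernel, stride 2, no padding
side = conv_output_size(28, kernel_size=5, stride=2)

# 16 filters, so the flattened vector has side * side * 16 elements
flattened = side * side * 16

# Trainable parameters: weights plus one bias per output unit / filter
conv_params = 5 * 5 * 1 * 16 + 16
dense_params = flattened * 10 + 10

print(side, flattened, conv_params, dense_params)  # 12 2304 416 23050
```

These numbers should match the shapes printed by the cell above and the parameter counts reported by `model2.summary()`.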

In [None]:
# Alternative (commented out): SparseCategoricalCrossentropy takes integer class
# labels directly, so y would not need one-hot encoding (see https://keras.io/api/losses/).
# Note: from_logits=True is only correct if the last layer has no activation.
#loss_fn2 = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
#model2.compile(optimizer='adam',
#              loss=loss_fn2,
#              metrics=['accuracy'])
#model2.summary()

model2.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model2.summary()

We need to convert the training data to the format expected by a convolutional layer (add an explicit channel dimension) and convert y explicitly to one-hot encoding.

In [None]:
# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")

# convert class vectors to binary class matrices
y_train_cat = keras.utils.to_categorical(y_train, 10)
y_test_cat = keras.utils.to_categorical(y_test, 10)
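
To make the two conversions concrete, here is a small NumPy-only sketch of what they do. The toy labels and the all-zero images are made up for illustration, and indexing an identity matrix is just one way to build one-hot vectors, not necessarily how `to_categorical` is implemented.

```python
import numpy as np

labels = np.array([5, 0, 4, 1, 9])      # integer class labels
num_classes = 10

# One-hot encoding: row i of the identity matrix is the code for class i
one_hot = np.eye(num_classes, dtype="float32")[labels]

# Adding the channel dimension: (N, 28, 28) -> (N, 28, 28, 1)
images = np.zeros((5, 28, 28))
images_with_channel = np.expand_dims(images, -1)

# argmax undoes the one-hot encoding
recovered = np.argmax(one_hot, axis=1)

print(one_hot.shape, images_with_channel.shape, recovered)
```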

Now training.

In [None]:
model2.fit(x_train, y_train_cat, epochs=1)

Testing.

In [None]:
# Predict class scores and compare the first 10 predictions with the true labels
y_test_hat = model2.predict(x_test)
y_test_hat = y_test_hat[0:10,:]
print(y_test[0:10])
print(np.argmax(y_test_hat,axis=1))

# Loss and accuracy on the full test set
model2.evaluate(x_test,  y_test_cat, verbose=2)
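
The accuracy metric is simply the fraction of samples whose highest-scoring class equals the true label. A minimal sketch with made-up scores for three classes (the helper name `accuracy_from_scores` is hypothetical):

```python
import numpy as np

def accuracy_from_scores(scores, labels):
    # Pick the class with the highest score for each sample
    predicted = np.argmax(scores, axis=1)
    # Fraction of samples where the prediction matches the label
    return float(np.mean(predicted == labels))

scores = np.array([[0.1, 0.7, 0.2],
                   [0.8, 0.1, 0.1],
                   [0.3, 0.3, 0.4],
                   [0.2, 0.5, 0.3]])
labels = np.array([1, 0, 2, 2])

print(accuracy_from_scores(scores, labels))  # 0.75
```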

The following model uses more tricks and flavors, but to understand them we need to learn about backbone networks first.

In [None]:
# With more tricks (ReLU, max pooling and dropout)
model3 = tf.keras.models.Sequential()

model3.add(keras.layers.Input(shape=(28,28,1)))
print(model3.output_shape)

# Convolution layer 1 (with ReLU activation)
model3.add(keras.layers.Conv2D(16,kernel_size=(3,3),activation='relu'))
print(model3.output_shape)

# Max pooling
model3.add(keras.layers.MaxPooling2D(pool_size=(2,2)))
print(model3.output_shape)

# Convolution layer 2 (with ReLU activation)
model3.add(keras.layers.Conv2D(32,kernel_size=(3,3),activation='relu'))
print(model3.output_shape)

# Max pooling
model3.add(keras.layers.MaxPooling2D(pool_size=(2,2)))
print(model3.output_shape)

# Flatten the feature maps to a vector
model3.add(keras.layers.Flatten())
print(model3.output_shape)

# Add dropout "layer"
model3.add(keras.layers.Dropout(0.2))
print(model3.output_shape)

# Add a fully connected layer
model3.add(keras.layers.Dense(10, activation='softmax'))
print(model3.output_shape)
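
The two tricks named in the cell above, ReLU and max pooling, are simple operations. The NumPy sketch below applies them to a single made-up 4x4 feature map; the function names are hypothetical and the pooling covers only the 2x2, non-overlapping case used here.

```python
import numpy as np

def relu(x):
    # Negative values are set to zero, positive values pass through
    return np.maximum(x, 0.0)

def max_pool_2x2(feature_map):
    # Keep the largest value of each non-overlapping 2x2 block;
    # an odd last row/column is dropped, as with "valid" padding
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

feature_map = np.array([[ 1., -2.,  3.,  0.],
                        [-1.,  5., -3.,  2.],
                        [ 0., -1., -4., -6.],
                        [ 7., -8., -2., -1.]])

pooled = max_pool_2x2(relu(feature_map))
print(pooled)  # [[5. 3.] [7. 0.]]
```

In `model3`, 3x3 convolutions without padding and 2x2 pooling take the spatial size from 28 to 26, 13, 11 and finally 5, so the flattened vector has 5 * 5 * 32 = 800 elements.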

In [None]:
model3.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model3.summary()

In [None]:
model3.fit(x_train, y_train_cat, epochs=1)

## Special architectures

### Autoencoders
The autoencoder architecture is a good starting point for understanding recent developments, as many great ideas can be derived from it.

### Demo: autoencoder
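
A minimal sketch of the idea, assuming a purely linear encoder and decoder trained with plain gradient descent in NumPy. The toy data, layer sizes and learning rate are made up for illustration; a practical autoencoder would use nonlinear (often convolutional) layers in a framework such as Keras.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in 8 dimensions that really live on a 2-D subspace
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 8))
x = latent @ mixing

# Encoder (8 -> 2) and decoder (2 -> 8) weights, small random initialisation
w_enc = rng.normal(scale=0.1, size=(8, 2))
w_dec = rng.normal(scale=0.1, size=(2, 8))

def reconstruction_loss(x, w_enc, w_dec):
    # Mean squared error between the input and its reconstruction
    code = x @ w_enc        # compress to the 2-D bottleneck
    x_hat = code @ w_dec    # reconstruct the 8-D input
    return np.mean((x_hat - x) ** 2)

initial_loss = reconstruction_loss(x, w_enc, w_dec)

learning_rate = 0.01
for step in range(2000):
    code = x @ w_enc
    x_hat = code @ w_dec
    error = x_hat - x
    # Gradients of the mean squared error with respect to both weight matrices
    grad_out = 2.0 * error / error.size
    grad_dec = code.T @ grad_out
    grad_enc = x.T @ (grad_out @ w_dec.T)
    w_dec -= learning_rate * grad_dec
    w_enc -= learning_rate * grad_enc

final_loss = reconstruction_loss(x, w_enc, w_dec)
print("loss before:", initial_loss, "after:", final_loss)
```

The target of the training is the input itself, so no labels are needed: the network has to learn a compact code in the bottleneck from which the input can be rebuilt.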

### DeepFakes

See: https://www.alanzucconi.com/2018/03/14/understanding-the-technology-behind-deepfakes/

Video: https://www.youtube.com/watch?v=OmB7fmi8JwY

Code: many open-source implementations are publicly available.

### Generative Adversarial Network (GAN)

StyleGAN: https://en.wikipedia.org/wiki/StyleGAN

Code: https://github.com/NVlabs/stylegan

### It's all about data - crazy ideas to collect data

Frozen people: https://mannequin-depth.github.io/

### Natural language processing - Transformers

The most striking results in AI currently come from NLP. The results of large-scale language models are especially impressive (as of Aug 2022), and transformer architectures are now being quickly adopted in vision and other fields as well.

Question & answer demo: https://www.pragnakalp.com/demos/BERT-NLP-QnA-Demo/

Text generation: https://talktotransformer.com/

DALL-E: https://gpt3demo.com/apps/openai-dall-e

DALL-E 2: https://openai.com/dall-e-2/

PaLM: https://thenextweb.com/news/google-palm-ai-sucks-at-telling-jokes-but-great-at-analyzing-them 

## References

TensorFlow Tutorials. URL: https://www.tensorflow.org/tutorials

Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton (2012): "ImageNet Classification with Deep Convolutional Neural Networks". In Proc. of the NeurIPS. URL: https://papers.nips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html

Karen Simonyan, Andrew Zisserman (2015): "Very Deep Convolutional Networks for Large-Scale Image Recognition". In Proc. of the ICLR. URL: https://arxiv.org/abs/1409.1556