<a href="https://colab.research.google.com/github/vessln/Deep_learning/blob/main/3_Neural_Networks_for_Images.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [48]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from skimage.transform import resize

import tensorflow as tf

from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Input, Conv2D, MaxPool2D, UpSampling2D, Conv2DTranspose, Dense, Flatten, GlobalMaxPool2D, GlobalAvgPool2D, Add
from tensorflow.keras.datasets import cifar10

from tensorflow.keras.applications import vgg19, resnet50, inception_resnet_v2

# Neural Networks for Images

## Convolutional neural networks

Convolution extracts features: it recognizes local patterns such as edges and shapes.

A convolutional neural network is a type of neural network specialized for image processing (classification, segmentation), video processing and other data that have a spatial structure.
I have an input image, represented as a two-dimensional matrix of numbers, where each number is the intensity of a pixel (a value between 0 and 255). The filter (kernel) is a small matrix, typically 3x3 or 5x5, which slides over the input image as a sliding window.
- Sliding - the filter is positioned on the input image and multiplied element by element with the part it covers. The products are summed and the result is written into the output matrix, called the feature map.
- Stride s - how many pixels the filter moves with each step (s = 1 means the filter is shifted by 1 pixel).
- Padding - adds a frame of pixels (zeros, mirrored values, etc.) around the input image, so that the output does not shrink after convolution.
- Valid padding - no frame is added, so the output is smaller than the input.
- Same padding - enough padding is added that, with stride 1, the input and output sizes are the same.
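The sliding-window description above can be sketched in plain NumPy. This is only an illustration for a single-channel image, not how Keras implements Conv2D; the function name `conv2d_single_channel` and the example kernel are assumptions made for this sketch.

```python
import numpy as np

def conv2d_single_channel(image, kernel, stride=1, padding=0):
    """Slide `kernel` over `image` without flipping it, as deep learning libraries do."""
    # Zero padding: add a frame of `padding` zeros around the image.
    padded = np.pad(image, padding, mode="constant")
    kh, kw = kernel.shape
    out_h = (padded.shape[0] - kh) // stride + 1
    out_w = (padded.shape[1] - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The part of the image currently covered by the filter.
            window = padded[i * stride:i * stride + kh, j * stride:j * stride + kw]
            # Multiply element by element and sum into one output value.
            feature_map[i, j] = np.sum(window * kernel)
    return feature_map

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)  # a simple vertical-edge filter

valid = conv2d_single_channel(image, kernel)             # no padding: 5x5 -> 3x3
same = conv2d_single_channel(image, kernel, padding=1)   # one-pixel frame: 5x5 -> 5x5
print(valid.shape, same.shape)
```

With a 3x3 kernel and stride 1, a one-pixel frame is exactly what "same" padding needs, which is why the second call keeps the 5x5 size.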

In [2]:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 ━━━━━━━━━━━━━━━━━━━━ 2s 0us/step


In [3]:
# this image has height=32, width=32, channels=3
x_train[0]

Two-dimensional convolutions (Conv2D) operate on three-dimensional volumes: the filter slides along height and width only, but it spans all channels.

In TensorFlow (Keras default, channels last): height, width, channels.
In PyTorch (channels first): channels, height, width.

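Parameters:
- filters - the number of filters, which is also the number of output channels;
- kernel_size - the height and width of each filter;
- padding - "valid" (no padding, so the output shrinks) or "same" (pad so that the output keeps the input dimensions when the stride is 1).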
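As a rough check on what these parameters imply, the helper functions below (hypothetical names, written for this note) compute the number of trainable parameters of a Conv2D layer and its output side length. They follow the usual rules: one bias per filter, "same" giving ceil(n / stride) and "valid" giving floor((n - k) / stride) + 1.

```python
def conv2d_param_count(kernel_size, in_channels, filters):
    """Weights of every filter (kh * kw * in_channels) plus one bias per filter."""
    kh, kw = kernel_size
    return (kh * kw * in_channels + 1) * filters

def conv_output_size(n, k, stride=1, padding="same"):
    """Side length of the output for an input side of n and a kernel side of k."""
    if padding == "same":
        return -(-n // stride)        # ceil(n / stride)
    return (n - k) // stride + 1      # "valid": no padding

# A 3x3 convolution with 17 filters on a 32x32 RGB image:
print(conv2d_param_count((3, 3), in_channels=3, filters=17))  # 476
print(conv_output_size(32, 3, padding="same"))                # 32
print(conv_output_size(32, 3, padding="valid"))               # 30
```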

In [4]:
# TensorFlow: height, width, channels
# PyTorch: channels, height, width

cnn_model = Sequential([
    Input((32, 32, 3)),
    Conv2D(filters = 17, kernel_size = (3, 3), padding = "same", activation = "relu"),
    Conv2D(filters = 15, kernel_size = (3, 3), padding = "same", activation = "relu"),
    Conv2D(filters = 12, kernel_size = (3, 3), padding = "same", activation = "relu"),
    Conv2D(filters = 10, kernel_size = (3, 3), padding = "same", activation = "relu"),
    Conv2D(filters = 5, kernel_size = (3, 3), padding = "same", activation = "relu"),
])

In [5]:
cnn_model.summary()

Flatten() reshapes the multidimensional output of the convolutional layers into the one-dimensional vector that Dense layers expect: 32 * 32 * 5 = 5120 values per image.
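The same arithmetic can be reproduced with a NumPy reshape. This is a sketch of the idea only; the variable names are made up for illustration.

```python
import numpy as np

batch = np.zeros((2, 32, 32, 5))            # two feature volumes of shape 32 x 32 x 5
flat = batch.reshape(batch.shape[0], -1)    # keep the batch axis, merge all the others
print(flat.shape)                           # (2, 5120)

# A Dense layer with 20 units on top of 5120 inputs: weights plus biases.
dense_units = 20
dense_params = flat.shape[1] * dense_units + dense_units
print(dense_params)                         # 102420
```

This makes it visible why a Dense layer placed right after Flatten() tends to dominate the parameter count.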

In [6]:
dense_model = Sequential([
    Input((32, 32, 5)),
    Flatten(),
    Dense(20, activation= "relu"),
    Dense(10, activation= "softmax"),
])

In [7]:
dense_model.summary()

In [8]:
# I can combine cnn_model and dense_model:

com_model = Sequential([
    Input((32, 32, 3)),
    Conv2D(filters = 19, kernel_size = (3, 3), padding = "same", activation = "relu"),
    Conv2D(filters = 19, kernel_size = (3, 3), padding = "same", activation = "relu"),
    Conv2D(filters = 17, kernel_size = (3, 3), padding = "same", activation = "relu"),
    Conv2D(filters = 17, kernel_size = (3, 3), padding = "same", activation = "relu"),

    Conv2D(filters = 15, kernel_size = (3, 3), padding = "same", activation = "relu"),
    Conv2D(filters = 12, kernel_size = (3, 3), padding = "same", activation = "relu"),
    Conv2D(filters = 10, kernel_size = (3, 3), padding = "same", activation = "relu"),
    Conv2D(filters = 5, kernel_size = (3, 3), padding = "same", activation = "relu"),

    Flatten(),
    Dense(40, activation= "relu"),
    Dense(20, activation= "relu"),
    Dense(10, activation= "softmax"),
])

# summary() returns None, so call it separately to keep the model in com_model
com_model.summary()

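I have too many parameters and I need to reduce them (Total params: 220,699); most of them belong to the first Dense layer after Flatten(). **MaxPool2D** reduces dimensionality by aggregation: with the default 2x2 window it keeps the maximum of each window, which halves height and width and makes the convolutional volume four times smaller.

**GlobalMaxPool2D** collapses the whole feature map of each channel into a single value (its maximum), so the output has one number per channel. It is an alternative to Flatten() when the convolutional volume is large. **GlobalAvgPool2D**, which takes the mean instead of the maximum, is more commonly used.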
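Both kinds of pooling can be sketched in NumPy for a single (height, width, channels) volume. The functions `max_pool_2x2` and `global_avg_pool` are illustrative stand-ins, assuming the default 2x2 window with stride 2; they are not the Keras layers themselves.

```python
import numpy as np

def max_pool_2x2(volume):
    """2x2 max pooling with stride 2 on a (height, width, channels) array."""
    h, w, c = volume.shape
    trimmed = volume[:h - h % 2, :w - w % 2]            # drop an odd last row/column
    # Group pixels into non-overlapping 2x2 blocks, then keep the max of each block.
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(1, 3))

def global_avg_pool(volume):
    """Average over height and width, leaving one value per channel."""
    return volume.mean(axis=(0, 1))

volume = np.arange(32 * 32 * 5, dtype=float).reshape(32, 32, 5)
pooled = max_pool_2x2(volume)
summary = global_avg_pool(pooled)

print(volume.shape, "->", pooled.shape)   # (32, 32, 5) -> (16, 16, 5)
print(volume.size // pooled.size)         # 4
print(summary.shape)                      # (5,)
```

For comparison, a Dense layer with 40 units needs 5120 * 40 + 40 = 204,840 parameters after flattening a 32 x 32 x 5 volume, but only 5 * 40 + 40 = 240 after global pooling of a 5-channel volume.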

In [9]:
comb_model = Sequential([
    Input((32, 32, 3)),
    Conv2D(filters = 19, kernel_size = (3, 3), padding = "same", activation = "relu"),
    Conv2D(filters = 19, kernel_size = (3, 3), padding = "same", activation = "relu"),
    Conv2D(filters = 17, kernel_size = (3, 3), padding = "same", activation = "relu"),
    Conv2D(filters = 17, kernel_size = (3, 3), padding = "same", activation = "relu"),
    MaxPool2D(),

    Conv2D(filters = 15, kernel_size = (3, 3), padding = "same", activation = "relu"),
    Conv2D(filters = 12, kernel_size = (3, 3), padding = "same", activation = "relu"),
    Conv2D(filters = 10, kernel_size = (3, 3), padding = "same", activation = "relu"),
    Conv2D(filters = 5, kernel_size = (3, 3), padding = "same", activation = "relu"),
    MaxPool2D(),

    # Flatten(),
    GlobalAvgPool2D(),
    Dense(40, activation= "relu"),
    Dense(20, activation= "relu"),
    Dense(10, activation= "softmax"),
])

In [10]:
comb_model.summary()

In [11]:
comb_model.compile(loss = "sparse_categorical_crossentropy", optimizer = "adam")

In [12]:
comb_model.fit(x_train[:5000], y_train[:5000], epochs = 5)

Epoch 1/5
157/157 ━━━━━━━━━━━━━━━━━━━━ 30s 161ms/step - loss: 2.3224
Epoch 2/5
157/157 ━━━━━━━━━━━━━━━━━━━━ 24s 150ms/step - loss: 2.1718
Epoch 3/5
157/157 ━━━━━━━━━━━━━━━━━━━━ 24s 151ms/step - loss: 1.9560
Epoch 4/5
157/157 ━━━━━━━━━━━━━━━━━━━━ 23s 146ms/step - loss: 1.8696
Epoch 5/5
157/157 ━━━━━━━━━━━━━━━━━━━━ 41s 149ms/step - loss: 1.8079


<keras.src.callbacks.history.History at 0x7df9bc3c2020>

## VGG19

In [13]:
vgg_model = vgg19.VGG19()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg19/vgg19_weights_tf_dim_ordering_tf_kernels.h5
574710816/574710816 ━━━━━━━━━━━━━━━━━━━━ 6s 0us/step


In [14]:
vgg_model.summary()

In [15]:
vgg_model.layers[12].weights

[<KerasVariable shape=(3, 3, 256, 512), dtype=float32, path=block4_conv1/kernel>,
 <KerasVariable shape=(512,), dtype=float32, path=block4_conv1/bias>]

In [16]:
# preprocess_input does not resize the images
preprocessed = vgg19.preprocess_input(x_train[:10])

In [17]:
preprocessed.shape

(10, 32, 32, 3)
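The shape is unchanged because preprocessing only touches the pixel values. As far as I know, VGG preprocessing in Keras uses the "caffe" mode: it converts RGB to BGR and subtracts the per-channel ImageNet means, without scaling. The function `vgg_style_preprocess` below is a hypothetical NumPy re-creation of that behaviour, written only to illustrate it.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (assumed values of the "caffe" mode).
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg_style_preprocess(images_rgb):
    """Convert RGB to BGR and zero-center each channel; the shape stays the same."""
    x = images_rgb.astype(np.float32)
    x = x[..., ::-1]                  # reverse the channel axis: RGB -> BGR
    return x - IMAGENET_BGR_MEAN

batch = np.zeros((10, 32, 32, 3), dtype=np.uint8)
batch[..., 0], batch[..., 1], batch[..., 2] = 10, 20, 30   # R, G, B
out = vgg_style_preprocess(batch)

print(out.shape)       # (10, 32, 32, 3)
print(out[0, 0, 0])    # B, G, R values minus their means
```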

VGG19 expects inputs of shape (224, 224, 3), but my images are (32, 32, 3), so I need to resize them first.

In [18]:
from tensorflow.keras.layers import Rescaling

In [19]:
resized_img = resize(x_train[0], (224, 224), preserve_range=True).astype(np.uint8)

In [20]:
resized_img.shape

(224, 224, 3)

In [21]:
resized_img

In [22]:
images = np.array([resize(x_train[i], (224, 224), preserve_range=True).astype(np.uint8) for i in range(50)])

In [23]:
images.shape

(50, 224, 224, 3)

In [24]:
preprocessed_images = vgg19.preprocess_input(images)

In [25]:
predictions_probabilities = vgg_model.predict(preprocessed_images)

2/2 ━━━━━━━━━━━━━━━━━━━━ 34s 13s/step


In [26]:
predictions_probabilities

array([[6.8698684e-04, 1.5271391e-02, 1.0795880e-05, ..., 4.5592600e-04,
        5.3814106e-04, 6.9145835e-06],
       [8.9981211e-07, 8.0726272e-07, 2.9883868e-08, ..., 3.0135602e-08,
        1.5150373e-06, 1.9761706e-07],
       [1.0972826e-04, 3.1540945e-05, 6.4884462e-06, ..., 6.3131665e-06,
        1.7730249e-04, 6.3365096e-06],
       ...,
       [1.6427335e-05, 8.4834255e-06, 4.4948418e-05, ..., 8.9718901e-07,
        2.2703007e-06, 1.3351939e-07],
       [1.0207788e-04, 3.4475374e-06, 1.1897511e-06, ..., 5.0614108e-07,
        6.7692454e-06, 1.1243643e-06],
       [6.4456835e-06, 3.0237233e-04, 2.1267369e-06, ..., 8.3076299e-07,
        2.1651318e-05, 2.2697895e-07]], dtype=float32)

In [27]:
vgg19.decode_predictions(predictions_probabilities)

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/imagenet_class_index.json
35363/35363 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


[[('n03347037', 'fire_screen', 0.12202412),
  ('n04443257', 'tobacco_shop', 0.035200235),
  ('n03871628', 'packet', 0.02293932),
  ('n07590611', 'hot_pot', 0.020755785),
  ('n01773549', 'barn_spider', 0.02065492)],
 [('n03796401', 'moving_van', 0.89656836),
  ('n04467665', 'trailer_truck', 0.04767666),
  ('n03776460', 'mobile_home', 0.031516373),
  ('n03417042', 'garbage_truck', 0.0033560575),
  ('n03895866', 'passenger_car', 0.002592392)],
 [('n04428191', 'thresher', 0.23830506),
  ('n03134739', 'croquet_ball', 0.05163732),
  ('n03000684', 'chain_saw', 0.044336446),
  ('n02950826', 'cannon', 0.03177263),
  ('n03498962', 'hatchet', 0.027883796)],
 [('n01795545', 'black_grouse', 0.25616363),
  ('n02422106', 'hartebeest', 0.14226277),
  ('n02002724', 'black_stork', 0.08712437),
  ('n01871265', 'tusker', 0.04051196),
  ('n02114855', 'coyote', 0.034868747)],
 [('n03796401', 'moving_van', 0.72487336),
  ('n04467665', 'trailer_truck', 0.099333934),
  ('n02690373', 'airliner', 0.04269375),
  

In [28]:
x_train[3]

In [29]:
tf.keras.backend.clear_session()

## ResNet50

In [30]:
resnet = resnet50.ResNet50()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels.h5
102967424/102967424 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step


In [31]:
resnet.summary()

In [32]:
inputs = Input((224, 224, 3))
conv1 = Conv2D(20, (3, 3), activation = "relu", padding = "same")(inputs)
conv2 = Conv2D(20, (3, 3), activation = "relu", padding = "same")(conv1)
conv3 = Conv2D(20, (3, 3), activation = "relu", padding = "same")(conv2)
conv4 = Conv2D(20, (3, 3), activation = "relu", padding = "same")(conv3)

add_result = Add()([conv1, conv4])

residual_block = Model(inputs = inputs, outputs = add_result)

In [33]:
residual_block.summary()

## Inception-ResNet

In [35]:
inception_resnet_v2.InceptionResNetV2().summary()

## Encoder-Decoder architecture

The encoder processes the input data and converts it into a compact representation (a context vector) that summarizes the important information in the input. The decoder uses this representation to generate the output data step by step.

In [40]:
encoder = Sequential([
    Input((224, 224, 3)),
    Conv2D(256, (3, 3), activation = "relu", padding = "same"),
    Conv2D(128, (3, 3), activation = "relu", padding = "same"),
    MaxPool2D(),

    Conv2D(128, (3, 3), activation = "relu", padding = "same"),
    Conv2D(64, (3, 3), activation = "relu", padding = "same"),
    MaxPool2D(),

    Conv2D(64, (3, 3), activation = "relu", padding = "same"),
    Conv2D(32, (3, 3), activation = "relu", padding = "same"),
    MaxPool2D(),
])

In [41]:
encoder.summary()
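The encoder above contains three MaxPool2D layers, and each one halves height and width. A small helper (the name `trace_spatial_size` is made up for this note) traces the side length through the stages, which explains why a matching decoder can start from a 28 x 28 volume and needs three doubling steps to return to 224 x 224.

```python
def trace_spatial_size(size, stages, factor=2, upsample=False):
    """Return the side length before and after each pooling (or upsampling) stage."""
    sizes = [size]
    for _ in range(stages):
        size = size * factor if upsample else size // factor
        sizes.append(size)
    return sizes

encoder_sizes = trace_spatial_size(224, stages=3)
decoder_sizes = trace_spatial_size(encoder_sizes[-1], stages=3, upsample=True)

print(encoder_sizes)  # [224, 112, 56, 28]
print(decoder_sizes)  # [28, 56, 112, 224]
```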

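UpSampling2D is the counterpart of MaxPool2D: by default it doubles height and width by repeating each pixel. It restores the shape but is not an exact inverse, because the values discarded by pooling cannot be recovered.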
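A tiny NumPy sketch of the two operations shows how they relate: upsampling brings back the shape, while the values removed by pooling stay lost. Both functions are illustrative stand-ins that assume 2x2 pooling and nearest-neighbour upsampling on a single channel.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a single-channel image with even sides."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample_2x(x):
    """Nearest-neighbour upsampling: repeat every row and every column twice."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 10, 13, 14],
              [11, 12, 15, 16]])

pooled = max_pool_2x2(x)         # [[4, 8], [12, 16]]
restored = upsample_2x(pooled)   # 4x4 again, but every 2x2 block is constant

print(restored.shape == x.shape)      # True
print(np.array_equal(restored, x))    # False
```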

In [45]:
decoder = Sequential([
    Input((28, 28, 32)),
    UpSampling2D(),
    Conv2D(32, (3, 3), activation = "relu", padding = "same"),
    Conv2D(64, (3, 3), activation = "relu", padding = "same"),

    UpSampling2D(),
    Conv2D(64, (3, 3), activation = "relu", padding = "same"),
    Conv2D(128, (3, 3), activation = "relu", padding = "same"),

    UpSampling2D(),
    Conv2D(128, (3, 3), activation = "relu", padding = "same"),
    Conv2D(256, (3, 3), activation = "relu", padding = "same"),

    Conv2D(3, (3, 3), padding = "same"),
])

In [46]:
decoder.summary()

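The decoder typically uses UpSampling2D and Conv2DTranspose.
Conv2DTranspose is a transposed convolution (often loosely called an inverse convolution or deconvolution); it reverses the shape change of a convolution, not its values.

To go from a larger convolutional volume to a smaller one, forward convolutions (Conv2D) are used. To go from a smaller volume to a larger one, transposed convolutions (Conv2DTranspose) are used.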
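To make the direction of Conv2DTranspose concrete, here is a single-channel NumPy sketch without padding: every input value "stamps" a scaled copy of the kernel onto the output, so the output side is (n - 1) * stride + k. The function name is hypothetical and the code is not the Keras implementation. Note that with stride 1 and padding "same", as in the model that follows, the layer keeps the spatial size and the enlargement comes from UpSampling2D.

```python
import numpy as np

def conv2d_transpose_single_channel(x, kernel, stride=1):
    """Transposed convolution without padding on a single-channel input."""
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            # Add a copy of the kernel, scaled by x[i, j], at the shifted position.
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * kernel
    return out

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
kernel = np.ones((3, 3))

out = conv2d_transpose_single_channel(x, kernel, stride=2)
print(out.shape)    # (5, 5): a 2x2 input grows to 5x5
print(out[2, 2])    # 10.0, the cell where all four stamped kernels overlap
```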

In [47]:
encoder_decoder = Sequential([
    # encoder:
    Input((224, 224, 3)),
    Conv2D(256, (3, 3), activation = "relu", padding = "same"),
    Conv2D(128, (3, 3), activation = "relu", padding = "same"),
    MaxPool2D(),
    Conv2D(128, (3, 3), activation = "relu", padding = "same"),
    Conv2D(64, (3, 3), activation = "relu", padding = "same"),
    MaxPool2D(),
    Conv2D(64, (3, 3), activation = "relu", padding = "same"),
    Conv2D(32, (3, 3), activation = "relu", padding = "same"),
    MaxPool2D(),

    # decoder:
    UpSampling2D(),
    Conv2DTranspose(32, (3, 3), activation = "relu", padding = "same"),
    Conv2DTranspose(64, (3, 3), activation = "relu", padding = "same"),
    UpSampling2D(),
    Conv2DTranspose(64, (3, 3), activation = "relu", padding = "same"),
    Conv2DTranspose(128, (3, 3), activation = "relu", padding = "same"),
    UpSampling2D(),
    Conv2DTranspose(128, (3, 3), activation = "relu", padding = "same"),
    Conv2DTranspose(256, (3, 3), activation = "relu", padding = "same"),

    # segmentation with 50 classes:
    Conv2DTranspose(50, (3, 3), activation = "softmax", padding = "same"),
])