                                    TOPIC: Understanding Pooling and Padding in CNN

Q1.  Describe the purpose and benefits of pooling in CNN.

Pooling, also known as subsampling or downsampling, is a fundamental operation in Convolutional Neural Networks (CNNs). It is applied to the feature maps generated by convolutional layers, primarily to reduce their dimensionality and to make the extracted features less sensitive to small translations. The purpose and benefits of pooling in CNNs can be outlined as follows:

1. Dimensionality Reduction: One of the primary purposes of pooling is to reduce the spatial dimensions (width and height) of the feature maps while retaining important information. This reduction helps in controlling the number of parameters and computational complexity of the subsequent layers in the network. As the spatial dimensions decrease, the computational burden decreases, allowing for deeper networks to be trained more efficiently.

2. Translation Invariance: Pooling helps in achieving translation invariance, which means that the exact location of a feature in the input image becomes less important. By aggregating information from neighboring regions, pooling layers create features that are more robust to small variations in the position of the features within the input image. This property is particularly useful for tasks such as object recognition, where the position of the object in the image may vary.

3. Feature Generalization: Pooling aids in capturing the most important features present in the feature maps by summarizing information from local neighborhoods. By selecting the maximum (Max Pooling) or average (Average Pooling) value within a window, pooling layers emphasize the most active features while suppressing noise and irrelevant details. This process helps in generalizing features and making the network more robust to variations and distortions in the input data.

4. Parameter Reduction and Overfitting Mitigation: Pooling layers have no trainable parameters of their own, but by shrinking the feature maps they reduce the number of parameters needed in subsequent layers (especially fully connected ones). This helps in mitigating overfitting, especially in scenarios with limited training data. By summarizing information and discarding redundant details, pooling makes it harder for the network to memorize the training data.

5. Computational Efficiency: Pooling operations are computationally efficient compared to convolutional operations, as they involve simple operations such as max or average calculations within local regions. This efficiency enables faster training and inference times, making CNNs with pooling layers suitable for real-time applications and deployment on resource-constrained devices.
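The dimensionality reduction described above can be illustrated with a minimal NumPy sketch. The helper name `pool2d` and the 4x4 example values are assumptions made for illustration; deep learning frameworks provide their own optimized pooling layers.

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Pool a 2-D feature map with a square window and no padding."""
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    reduce_fn = np.max if mode == "max" else np.mean
    out = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Slice out the current window and summarize it with one value.
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = reduce_fn(window)
    return out

x = np.array([[1, 3, 2, 0],
              [4, 6, 1, 1],
              [0, 2, 9, 5],
              [1, 1, 7, 3]], dtype=float)

max_out = pool2d(x, mode="max")   # [[6, 2], [2, 9]]
avg_out = pool2d(x, mode="avg")   # [[3.5, 1.0], [1.0, 6.0]]
print(max_out.shape)              # (2, 2): a 4x4 map becomes 2x2
```

With a 2x2 window and a stride of 2, each spatial dimension is halved, so the next layer processes a quarter of the values.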

Q2. Explain the differences between min pooling and max pooling in CNN.


Max pooling and min pooling are two pooling operations used in Convolutional Neural Networks (CNNs) for downsampling feature maps. Max pooling is by far the more common of the two, while min pooling is rarely used in practice. While both serve the purpose of reducing the spatial dimensions of the feature maps, they differ in how they aggregate information within local regions. Here are the main differences between min pooling and max pooling:

1. Aggregation Function:

Max Pooling: In max pooling, the maximum value within each local region (typically defined by a sliding window) is retained and propagated to the output feature map. This means that only the most active feature within each region is preserved, discarding less significant features.

Min Pooling: In min pooling, the minimum value within each local region is retained and propagated to the output feature map. Unlike max pooling, min pooling emphasizes the least active feature within each region, which can be useful in certain scenarios, such as edge detection or boundary localization.

2. Feature Emphasis:

Max Pooling: Max pooling emphasizes the most prominent features within each local region, effectively highlighting the presence of specific patterns or structures. It is particularly useful for capturing the most salient features while discarding noise and irrelevant details.

Min Pooling: Min pooling, on the other hand, emphasizes the least active features within each region, which may be useful for tasks where the presence of certain structures or boundaries is of interest. It can help in accentuating subtle features and enhancing the network's sensitivity to specific patterns.

3. Applications:

Max Pooling: Max pooling is commonly used in CNN architectures for tasks such as image classification, object detection, and feature extraction. Its ability to preserve the most dominant features makes it suitable for tasks where identifying key patterns is crucial.

Min Pooling: Min pooling is less commonly used compared to max pooling. It finds applications in scenarios where the emphasis is on detecting boundaries, edges, or regions of low intensity. However, its usage is more specialized and less prevalent in standard CNN architectures.

4. Robustness to Noise:

Max Pooling: Max pooling tends to be more robust to noise compared to min pooling, as it focuses on the most significant features and suppresses the impact of noisy or less relevant information.

Min Pooling: Min pooling may amplify the effects of noise, especially if the least active features correspond to noise or artifacts in the input data. Therefore, careful consideration is needed when using min pooling in noisy environments.
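The difference in aggregation can be seen on a small example. This is a sketch under assumptions: the helper `pool2x2` and the sample values are made up for illustration, and the input is non-negative, as activations are after a ReLU.

```python
import numpy as np

def pool2x2(x, reduce_fn):
    """Non-overlapping 2x2 pooling via reshape; height and width must be even."""
    h, w = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2)   # blocks[i, a, j, b] == x[2*i + a, 2*j + b]
    return reduce_fn(blocks, axis=(1, 3))

x = np.array([[0.9, 0.1, 0.0, 0.2],
              [0.3, 0.8, 0.7, 0.0],
              [0.0, 0.0, 0.4, 0.5],
              [0.6, 0.2, 0.1, 0.0]])

max_pooled = pool2x2(x, np.max)        # [[0.9, 0.7], [0.6, 0.5]]
min_pooled = pool2x2(x, np.min)        # [[0.1, 0.0], [0.0, 0.0]]

# Min pooling is equivalent to max pooling applied to the negated input.
min_via_max = -pool2x2(-x, np.max)
```

On non-negative, sparse activations like these, min pooling returns mostly zeros, which is one plausible reason it is uncommon in standard architectures.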

Q3. Discuss the concept of padding in CNN and its significance.

Padding is a technique used in Convolutional Neural Networks (CNNs) to preserve the spatial dimensions of feature maps throughout the convolutional layers. It involves adding additional border pixels (zeros or values derived from the image boundary) around the input image or feature map before applying convolutional operations. Padding affects the size of the output feature maps after convolution and plays a significant role in the design and performance of CNN architectures. Here's a discussion on the concept of padding and its significance:

1. Preservation of Spatial Information:

Padding helps in maintaining the spatial dimensions of feature maps. With a stride of 1 and a suitable amount of padding ("same" padding), the output has the same width and height as the input. Without padding, each convolution with a kernel larger than 1x1 shrinks the feature map, so applying convolutional operations repeatedly progressively reduces the spatial dimensions, potentially leading to loss of spatial information and resolution.

2. Centering the Kernel:

Padding ensures that the convolutional kernel is centered on each input pixel, allowing for better coverage of the input image. When the kernel is centered, it can extract features from all parts of the input image, which is essential for capturing both global and local patterns effectively.

3. Mitigation of Border Effects:

Without padding, pixels near the border are covered by fewer kernel positions than pixels in the interior, so information at the edges is under-represented and the output feature map is smaller than the input. Padding helps in mitigating these border effects by providing additional pixels around the borders, ensuring that convolutional operations can be applied uniformly across the entire input image.

4. Control Over Output Size:

By adjusting the amount of padding applied, the size of the output feature maps can be controlled. This flexibility allows network designers to precisely manage the spatial dimensions of feature maps at each layer, facilitating the design of CNN architectures tailored to specific tasks and input data characteristics.

5. Improved Network Performance:

Properly padded convolutional layers can lead to better performance in terms of feature representation and learning. Padding helps in preserving spatial information, which is crucial for tasks such as object localization and segmentation, where precise spatial relationships between features are essential for accurate predictions.

6. Handling Arbitrary Input Sizes:

Padding enables CNNs to handle input images of arbitrary sizes. By padding the input images to a consistent size before feeding them into the network, CNNs can process inputs with varying dimensions without requiring modifications to the network architecture.

7. Reduction of Boundary Artifacts:

Padding can help in reducing boundary artifacts, such as aliasing or distortion effects, that may occur at the edges of feature maps during convolution. By providing additional pixels around the borders, padding ensures that the convolutional operation is applied more smoothly, minimizing the risk of artifacts in the output feature maps.
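The effect of padding on output size follows the standard relation out = floor((n + 2p - k) / s) + 1 for input size n, kernel size k, padding p per side, and stride s. The sketch below applies it; the function names are chosen here for illustration.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length along one spatial dimension: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

def same_padding(kernel):
    """Padding per side that preserves the size for stride 1 and an odd kernel."""
    return (kernel - 1) // 2

no_pad = conv_output_size(32, kernel=5)                               # 28
with_pad = conv_output_size(32, kernel=5, padding=same_padding(5))    # 32
strided = conv_output_size(32, kernel=5, stride=2, padding=2)         # 16

print(no_pad, with_pad, strided)
```

Note that padding preserves the size only when the stride is 1. With a stride of 2 the output is roughly halved even with padding.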

                                          TOPIC: Exploring LeNet

Q1. Provide a brief overview on LeNet-5 architecture.

LeNet-5 is a convolutional neural network (CNN) architecture designed by Yann LeCun et al. in 1998, primarily for handwritten digit recognition tasks. It played a significant role in popularizing CNNs and laying the foundation for modern deep learning architectures. Here's a brief overview of its architecture:

1. Input Layer: The network takes as input grayscale images of size 32x32 pixels.

2. Convolutional and Subsampling Layers: LeNet-5 has two convolutional layers, each followed by a subsampling layer:

Convolutional layer 1: Applies convolution with 6 filters (5x5) followed by a hyperbolic tangent (tanh) activation function.

Subsampling layer 1: Performs down-sampling using average pooling over 2x2 regions with a stride of 2.

Convolutional layer 2: Applies convolution with 16 filters (5x5) followed by a tanh activation.

Subsampling layer 2: Another down-sampling layer, similar to the first one.


3. Fully Connected Layers: After the convolutional and subsampling layers, the feature maps are flattened and connected to fully connected layers:

Fully connected layer 1: Consists of 120 neurons with a tanh activation function.
Fully connected layer 2: Followed by another fully connected layer with 84 neurons, also using a tanh activation function.

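4. Output Layer: The final layer has 10 units, representing the 10 possible classes (digits 0 through 9). The original network used Euclidean radial basis function (RBF) units in this layer; modern reimplementations typically use a softmax layer instead.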

5. Activation Functions: Throughout the network, the hyperbolic tangent (tanh) activation function is used in convolutional and fully connected layers.

6. Training: LeNet-5 was trained with gradient-based learning via backpropagation. Modern reimplementations typically use stochastic gradient descent (SGD), often with momentum, or an adaptive optimizer such as Adam.

7. Loss Function: Modern reimplementations are typically trained using cross-entropy loss, which is suitable for multi-class classification tasks like digit recognition.

8. Other Features:

LeNet-5 uses local connectivity and shared weights, reducing the number of parameters compared to fully connected networks.
It employs a simple architecture with alternating convolutional and subsampling layers, which helps capture hierarchical features.
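The layer sizes above can be checked by tracing the feature-map size through the network. This is a minimal sketch assuming no padding in any layer and the standard output-size relation; the helper `out_size` is introduced here for illustration.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output length along one spatial dimension."""
    return (n + 2 * padding - kernel) // stride + 1

size = 32                          # 32x32 grayscale input
trace = [("input", size, 1)]
for name, kernel, stride, channels in [
    ("conv1", 5, 1, 6),            # 5x5 convolution, 6 filters
    ("pool1", 2, 2, 6),            # 2x2 subsampling, stride 2
    ("conv2", 5, 1, 16),           # 5x5 convolution, 16 filters
    ("pool2", 2, 2, 16),           # 2x2 subsampling, stride 2
]:
    size = out_size(size, kernel, stride)
    trace.append((name, size, channels))

for name, s, c in trace:
    print(f"{name}: {s}x{s}x{c}")

# Number of values passed to the first fully connected layer.
flattened = trace[-1][1] ** 2 * trace[-1][2]   # 5 * 5 * 16 = 400
```

The spatial size goes 32, 28, 14, 10, 5, so 400 values feed the 120-neuron fully connected layer.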


Q2. Describe the key components of LeNet-5 and their respective purposes.
 
 LeNet-5, being one of the pioneering convolutional neural network (CNN) architectures, comprises several key components, each serving a specific purpose in the network's design. Here's a breakdown of the key components and their purposes:
 
 1. Convolutional Layers:

Purpose: The convolutional layers are responsible for extracting features from the input images by performing convolution operations with learnable filters.

Components:

Convolution Operation: This operation involves sliding a small filter/kernel over the input image, computing element-wise multiplications, and summing the results to produce feature maps.

Learnable Filters: These filters are parameters of the network that are learned during the training process to detect various patterns and features present in the input images.

2. Subsampling (Pooling) Layers:

Purpose: Subsampling layers reduce the spatial dimensions of the feature maps, effectively downsampling them while retaining the most important information.

Components:

Pooling Operation: Common pooling operations include max pooling or average pooling, where the maximum or average value within each pooling region is retained, respectively.

Downsampling: By reducing the spatial dimensions, subsampling layers help in controlling the number of parameters and computational complexity of the network while enhancing translational invariance.

3. Fully Connected Layers:

Purpose: Fully connected layers at the end of the network serve for classification by taking the high-level features extracted by convolutional and pooling layers and mapping them to class scores.

Components:

Neurons: Each neuron in the fully connected layers receives inputs from all the neurons in the previous layer and applies a weighted sum followed by an activation function.

Activation Functions: Activation functions like hyperbolic tangent (tanh) or rectified linear unit (ReLU) introduce non-linearity into the network, enabling it to learn complex relationships between features and classes.

4. Activation Functions:

Purpose: Activation functions introduce non-linearities into the network, allowing it to learn and represent complex mappings between inputs and outputs.

Components:

Tanh Activation: LeNet-5 uses the tanh activation function to introduce non-linearity while keeping the output values within a bounded range (-1 to 1).

ReLU Activation: While not used in LeNet-5, modern architectures often employ Rectified Linear Unit (ReLU) activation for faster convergence and mitigating the vanishing gradient problem.

5. Output Layer:

Purpose: The output layer generates the final predictions or class probabilities based on the features extracted and processed through the preceding layers.

Components:

Softmax Activation: Softmax activation function is typically used in the output layer for multi-class classification tasks, producing a probability distribution over the classes.
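As a small illustration of the fully connected neuron and the softmax output described above, here is a pure-Python sketch. The weights, inputs, and class scores are made-up values, and `dense_tanh` and `softmax` are helper names chosen for this example.

```python
import math

def dense_tanh(inputs, weights, bias):
    """One fully connected neuron: weighted sum of inputs, then tanh."""
    return math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)

def softmax(scores):
    """Turn raw class scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]   # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

hidden = dense_tanh([0.5, -1.0, 2.0], weights=[0.2, 0.4, 0.1], bias=0.0)
probs = softmax([2.0, 1.0, 0.1])
predicted = probs.index(max(probs))              # index of the most likely class

print(hidden, probs, predicted)
```

The tanh output always lies strictly between -1 and 1, and the softmax outputs are positive and sum to 1, so they can be read as class probabilities.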

Q3. Discuss the advantages and limitations of LeNet-5 in the context of image classification tasks.

LeNet-5, being one of the pioneering convolutional neural network (CNN) architectures, brought several advantages to image classification tasks. However, it also has its limitations, especially when compared to more modern architectures. Let's discuss the advantages and limitations of LeNet-5:

Advantages:

1. Hierarchical Feature Extraction: LeNet-5 employs a hierarchical architecture with alternating convolutional and subsampling layers, allowing it to extract features at multiple levels of abstraction. This hierarchical feature extraction is effective in capturing patterns and structures in images.

2. Parameter Sharing: LeNet-5 utilizes parameter sharing in convolutional layers, where the same set of weights is used across different spatial locations. This reduces the number of parameters in the network, making it more computationally efficient and reducing the risk of overfitting, especially in scenarios with limited training data.

3. Local Connectivity: By focusing on local connectivity within the receptive fields, LeNet-5 can capture spatial dependencies in the input images more effectively. This local connectivity aids in learning translation-invariant features, making the network robust to variations in object position within the image.

4. Simple Architecture: LeNet-5 has a relatively simple architecture compared to modern CNNs, which makes it easier to understand, implement, and train. This simplicity also contributes to faster training times and reduced computational overhead, especially on hardware with limited resources.

Limitations:

1. Limited Capacity: Due to its simple architecture, LeNet-5 has limited capacity to learn complex features and relationships in large and diverse datasets. This limitation may result in suboptimal performance when applied to more challenging image classification tasks beyond simple digit recognition.

2. Small Receptive Fields: The size of the receptive fields in LeNet-5's convolutional layers is relatively small (5x5), which may restrict its ability to capture larger and more complex spatial structures present in high-resolution images or objects with intricate details.

3. Saturating Activations: LeNet-5 predominantly uses the hyperbolic tangent (tanh) activation function, which saturates for inputs of large magnitude. In the saturated regions its gradient approaches zero, which slows learning and contributes to the vanishing gradient problem. Modern activation functions like ReLU do not saturate for positive inputs, which is one reason they tend to train faster in deeper networks.

4. Pooling Loss: While subsampling layers (pooling) help in reducing spatial dimensions and controlling the number of parameters, they also lead to information loss, potentially discarding useful spatial information. This may impact the network's ability to precisely localize objects or capture fine-grained details.

5. Performance on Complex Datasets: LeNet-5 was primarily designed for handwritten digit recognition tasks and may not generalize well to more complex datasets with diverse object categories, varying backgrounds, and occlusions. Its performance may degrade significantly when applied to such datasets without appropriate modifications or enhancements.
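The parameter-sharing advantage listed above can be made concrete with a rough count. The sketch compares the first convolutional layer with a hypothetical fully connected layer producing the same number of outputs; that dense layer is an assumption for comparison only and is not part of LeNet-5.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per filter; shared across all spatial positions."""
    return out_channels * (kernel * kernel * in_channels + 1)

def dense_params(n_in, n_out):
    """One weight per input-output pair plus one bias per output."""
    return n_in * n_out + n_out

# First convolutional layer: six 5x5 filters on a single-channel 32x32 input.
c1_conv = conv_params(kernel=5, in_channels=1, out_channels=6)

# Hypothetical dense layer mapping 32*32 inputs to the same 28x28x6 outputs.
c1_dense = dense_params(32 * 32, 28 * 28 * 6)

print(c1_conv)    # 156
print(c1_dense)   # 4821600
```

Under these assumptions, weight sharing reduces the parameter count of the first layer by more than four orders of magnitude.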


Q4. Implement LeNet-5 using the deep learning framework TensorFlow, and train it on a publicly available dataset (e.g., MNIST). Evaluate its performance and provide insights.

In [7]:
pip install tensorflow

Note: you may need to restart the kernel to use updated packages.


In [6]:
import tensorflow as tf
print(tf.__version__)

2.15.0


In [44]:
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

In [45]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

In [46]:
train_images = train_images.reshape((60000, 28, 28, 1)).astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1)).astype('float32') / 255
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)


In [15]:
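# LeNet-5-style model adapted to 28x28 MNIST input.
# Note: this variant uses ReLU and max pooling, whereas the original LeNet-5
# uses tanh and average pooling on 32x32 input.
model = models.Sequential([
    layers.Conv2D(6, (5, 5), activation='relu', input_shape=(28, 28, 1)),  # -> 24x24x6
    layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),                 # -> 12x12x6
    layers.Conv2D(16, (5, 5), activation='relu'),                          # -> 8x8x16
    layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),                 # -> 4x4x16
    layers.Flatten(),                                                      # -> 256 values
    layers.Dense(120, activation='relu'),
    layers.Dense(84, activation='relu'),
    layers.Dense(10, activation='softmax')                                 # one unit per digit
])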

In [16]:
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])


In [50]:
model.fit(train_images, train_labels, epochs=10, batch_size=128, validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7f90c42d6c50>

In [51]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('Test accuracy:', test_acc)

Test accuracy: 0.9900000095367432


                                                                    TOPIC: Analyzing AlexNet

Q1. Present an overview of the AlexNet architecture.
 
 AlexNet is a pioneering convolutional neural network (CNN) architecture designed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. It was the winning entry in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012, significantly advancing the field of computer vision. Here's an overview of the AlexNet architecture:
 
1. Input Layer:

AlexNet takes as input RGB images of size 224x224 pixels.

2. Convolutional Layers:

AlexNet consists of five convolutional layers.
The first convolutional layer applies 96 filters (11x11x3) with a stride of 4 and uses the ReLU activation function.
The second convolutional layer uses 256 filters of size 5x5, and the third, fourth, and fifth use 3x3 filters (384, 384, and 256 filters respectively), all with a stride of 1.

3. Max Pooling Layers:

After the first, second, and fifth convolutional layers, there are max-pooling layers applied over 3x3 regions with a stride of 2.

4. Normalization Layers:

Local Response Normalization (LRN) layers are used after the first and second convolutional layers to normalize the responses and enhance generalization.

5. Flattening Layer:

The output from the last convolutional layer is flattened into a single vector to be fed into the fully connected layers.

6. Fully Connected Layers:

AlexNet contains three fully connected layers: two hidden layers with 4096 neurons each, followed by the 1000-unit output layer. Dropout is applied to the two hidden layers to prevent overfitting.
The ReLU activation function is used in the two hidden fully connected layers.

7. Output Layer:

The final layer is a fully connected softmax output layer with 1000 units, representing the 1000 ImageNet classes.

8. Dropout:

Dropout regularization (with a drop probability of 0.5) is applied in the first two fully connected layers to mitigate overfitting. It randomly drops neurons during training to reduce co-adaptation of neurons.

9. Activation Function:

AlexNet predominantly uses the Rectified Linear Unit (ReLU) activation function, which introduces non-linearity and accelerates convergence compared to traditional activation functions like tanh or sigmoid.

10. Training:

AlexNet was trained using stochastic gradient descent (SGD) with momentum.
Data augmentation techniques, such as image translations, horizontal reflections, and altering the intensity of RGB channels, were employed during training to increase the size of the training set and improve generalization.

Architecture Advancements:

AlexNet introduced several architectural innovations, including the use of deeper networks, ReLU activation functions, dropout regularization, and data augmentation, which significantly improved the performance of CNNs on image classification tasks.
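The layer configuration above can be sanity-checked by tracing feature-map sizes. This is a sketch under assumptions: the padding values (2 for the 5x5 layer, 1 for the 3x3 layers) follow common reimplementations, and an input size of 227 is used because, as is widely noted, the stated 224x224 input does not produce the 55x55 first-layer maps without extra padding.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output length along one spatial dimension."""
    return (n + 2 * padding - kernel) // stride + 1

layers_spec = [
    # (name, kernel, stride, padding, channels)
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, 96),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, 256),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, 256),
]

size = 227
sizes = {}
for name, kernel, stride, padding, channels in layers_spec:
    size = out_size(size, kernel, stride, padding)
    sizes[name] = (size, channels)
    print(f"{name}: {size}x{size}x{channels}")

flattened = sizes["pool5"][0] ** 2 * sizes["pool5"][1]   # 6 * 6 * 256 = 9216
conv1_from_224 = out_size(224, 11, 4)                    # 54, not 55
```

The 9216 flattened values are what the first 4096-neuron fully connected layer receives.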

Q2. Explain the architectural innovations introduced in AlexNet that contributed to its breakthrough performance.

AlexNet introduced several architectural innovations that played a crucial role in its breakthrough performance in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. These innovations contributed to the improved performance of convolutional neural networks (CNNs) on image classification tasks. Here are the key architectural innovations introduced in AlexNet:

1. Deeper Architecture:

AlexNet was one of the first CNN architectures to utilize a relatively deep network with multiple layers.
Deeper networks allowed for more abstract and hierarchical feature representations to be learned, capturing increasingly complex patterns in the data.

2. ReLU Activation Function:

AlexNet replaced traditional activation functions like hyperbolic tangent (tanh) or sigmoid with the Rectified Linear Unit (ReLU) activation function.
ReLU activation helped mitigate the vanishing gradient problem, enabling faster convergence during training by allowing gradients to flow more freely through the network.
The simplicity and computational efficiency of ReLU made it feasible to train deeper networks more effectively.

3. Local Response Normalization (LRN):

AlexNet introduced Local Response Normalization (LRN) layers after the first and second convolutional layers.
LRN layers helped improve generalization by normalizing each activation by the activity of adjacent feature maps (channels) at the same spatial position.
LRN layers acted as a form of lateral inhibition, enhancing the contrast between activated neurons and suppressing responses of neighboring neurons.

4. Overlapping Max Pooling:

AlexNet utilized max-pooling layers with overlapping regions (3x3 with a stride of 2) after convolutional layers.
Overlapping pooling reduced spatial resolution while preserving more spatial information compared to non-overlapping pooling.
It helped capture richer spatial hierarchies and provided some translation invariance while reducing computational complexity.

5. Dropout Regularization:

AlexNet incorporated dropout regularization in the first two fully connected layers to prevent overfitting.
Dropout randomly drops a fraction of neurons during training, forcing the network to learn more robust features and reducing co-adaptation of neurons.
Dropout regularization helped improve generalization and prevent the model from memorizing noise in the training data.

6. Data Augmentation:

AlexNet employed data augmentation techniques during training, such as image translations, horizontal reflections, and altering the intensity of RGB channels.
Data augmentation increased the effective size of the training dataset, providing the model with more diverse examples and improving its ability to generalize to unseen data.
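Of the innovations above, Local Response Normalization is the least self-explanatory, so a minimal NumPy sketch may help. It assumes a channels-last activation tensor and the commonly cited hyperparameters k=2, n=5, alpha=1e-4, beta=0.75; treat these values and the helper name `local_response_norm` as assumptions rather than a reference implementation.

```python
import numpy as np

def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Normalize each channel by the squared activity of up to n adjacent channels.

    a has shape (H, W, C); the sum runs over channels at the same spatial position.
    """
    channels = a.shape[-1]
    squared = a ** 2
    out = np.empty_like(a, dtype=float)
    half = n // 2
    for i in range(channels):
        lo, hi = max(0, i - half), min(channels - 1, i + half)
        denom = (k + alpha * squared[..., lo:hi + 1].sum(axis=-1)) ** beta
        out[..., i] = a[..., i] / denom
    return out

rng = np.random.default_rng(0)
acts = np.maximum(rng.normal(size=(4, 4, 8)), 0.0)   # ReLU-like, non-negative activations
normed = local_response_norm(acts)
print(normed.shape)                                   # (4, 4, 8)
```

A channel whose neighbours are strongly active is scaled down more than one whose neighbours are quiet, which is the lateral-inhibition effect described in the text.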


Q3. Discuss the role of convolutional layers, pooling layers, and fully connected layers in AlexNet.
 
 In AlexNet, convolutional layers, pooling layers, and fully connected layers play distinct yet complementary roles in the network's architecture, contributing to its effectiveness in image classification tasks. Here's a discussion of the roles of each of these components in AlexNet:
 
1. Convolutional Layers:

Feature Extraction: The primary role of convolutional layers in AlexNet is to extract hierarchical features from input images.

Local Receptive Fields: Convolutional layers employ learnable filters that slide across the input image, capturing local patterns and features. These local receptive fields help the network learn spatial hierarchies of features.

Non-linearity: After each convolution operation, a non-linear activation function (ReLU) is applied, introducing non-linearity into the network and enabling it to learn complex mappings between inputs and outputs.

Hierarchical Representation: Successive convolutional layers capture increasingly abstract and high-level features by combining information from preceding layers. This hierarchical representation enables the network to learn complex patterns and variations present in the input data.

2. Pooling Layers:

Spatial Subsampling: Pooling layers reduce the spatial dimensions of feature maps while retaining the most salient information.

Translation Invariance: By performing max pooling over small regions, pooling layers provide some degree of translation invariance, making the network robust to variations in object position within the image.

Dimensionality Reduction: Pooling layers help control the number of parameters and computational complexity of the network by reducing the spatial dimensions of feature maps, thus aiding in preventing overfitting.

Feature Generalization: Pooling layers help generalize features by capturing the most dominant features within local regions of the feature maps, further enhancing the network's ability to generalize to unseen data.

3. Fully Connected Layers:

High-Level Representation: Fully connected layers at the end of the network process the high-level features extracted by convolutional and pooling layers.

Classification: The role of fully connected layers is to map the learned features to class scores or probabilities, enabling the network to make predictions about the input image's class.

Global Context: Fully connected layers aggregate information from all neurons in the preceding layer, providing a global context for making predictions.

Non-linear Mapping: Like convolutional layers, fully connected layers employ non-linear activation functions (ReLU) to introduce non-linearity and enable the network to learn complex decision boundaries between different classes.
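A rough parameter count shows how differently these layer types contribute to model size. The sketch below assumes the layer sizes given in the overview (96 filters of 11x11x3 in the first convolution, a 6x6x256 feature map entering the fully connected layers) and ignores the two-GPU split of the original network.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per filter."""
    return out_channels * (kernel * kernel * in_channels + 1)

def dense_params(n_in, n_out):
    """One weight per input-output pair plus one bias per output."""
    return n_in * n_out + n_out

conv1 = conv_params(11, 3, 96)            # first convolutional layer
fc6 = dense_params(6 * 6 * 256, 4096)     # first fully connected layer
fc7 = dense_params(4096, 4096)            # second fully connected layer
fc8 = dense_params(4096, 1000)            # 1000-way output layer

print(conv1)                 # 34944
print(fc6, fc7, fc8)         # 37752832 16781312 4097000
```

Under these assumptions the fully connected layers hold tens of millions of parameters while the first convolutional layer holds about 35 thousand. This is consistent with applying dropout to the fully connected layers, where the risk of overfitting is highest.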


Q4. Implement AlexNet using a deep learning framework of your choice and evaluate its performance on a dataset of your choice.

In [2]:
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

In [None]:
import tensorflow as tf
from tensorflow.keras import layers, models

# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape((-1, 28, 28, 1)).astype('float32') / 255.0
x_test = x_test.reshape((-1, 28, 28, 1)).astype('float32') / 255.0

In [None]:
y_train = tf.keras.utils.to_categorical(y_train, num_classes=10)
y_test = tf.keras.utils.to_categorical(y_test, num_classes=10)


In [None]:
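# Define a scaled-down AlexNet-style architecture for 28x28 grayscale MNIST images.
# The layer list must be passed to Sequential(...); padding='same' keeps the
# feature maps from shrinking below the 3x3 kernel size in the later layers.
model = models.Sequential([
    layers.Conv2D(64, (3, 3), padding='same', activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D(pool_size=(2, 2)),   # 28x28 -> 14x14
    layers.Conv2D(128, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2)),   # 14x14 -> 7x7
    layers.Conv2D(256, (3, 3), padding='same', activation='relu'),
    layers.Conv2D(256, (3, 3), padding='same', activation='relu'),
    layers.Conv2D(128, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2)),   # 7x7 -> 3x3
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])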


In [None]:
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

In [None]:
history = model.fit(x_train, y_train, epochs=8, batch_size=70, validation_split=0.2)

In [None]:
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test accuracy:', test_acc)