# MNIST digit classification before and after shuffling

**Task:**
In this notebook you will train two convolutional neural networks (CNNs), one on the original and one on the pixel-shuffled MNIST dataset, and compare their performance. You have already used a fully connected NN on the same data.


**Dataset:** You work with the MNIST dataset. It provides 60'000 28x28 pixel greyscale training images of digits (plus 10'000 test images), which we want to classify into the right label (0-9). In the second part, the pixels of each image are shuffled.

**Content:**
* load the original MNIST data and create a randomly pixel shuffled version of the data
* visualize samples of the original and shuffled version of the data
* use keras to train a CNN with the original and shuffled data and compare the performance on new unseen test data
* check if the local structure of the pixels within the images has an impact on the classification performance when you use a CNN




#### Imports

In the next two cells, we load all the required libraries, download the MNIST data, normalize the pixel values to be between 0 and 1, and split the training data into a training and a validation set.

In [None]:
# load required libraries
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('default')
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

import os
os.environ["KERAS_BACKEND"] = "torch"
import keras
import torch  # used here only to print the version

print(f'Keras_version: {keras.__version__}')# 3.5.0
print(f'torch_version: {torch.__version__}')# 2.5.1+cu121
print(f'keras backend: {keras.backend.backend()}')

# Keras Building blocks
from keras.models import Sequential
from keras.layers import Dense, Convolution2D, MaxPooling2D, Flatten , Activation
from keras.optimizers import SGD
from keras.utils import to_categorical
from keras import optimizers


In [None]:
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# separate x_train in X_train and X_val, same for y_train
X_train=x_train[0:50000] / 255 #divide by 255 so that they are in range 0 to 1
Y_train=to_categorical(y_train[0:50000],10) # one-hot encoding

X_val=x_train[50000:60000] / 255
Y_val=to_categorical(y_train[50000:60000],10)

X_test=x_test / 255
Y_test=to_categorical(y_test,10)

del x_train, y_train, x_test, y_test

X_train=np.reshape(X_train, (X_train.shape[0],28,28,1))
X_val=np.reshape(X_val, (X_val.shape[0],28,28,1))
X_test=np.reshape(X_test, (X_test.shape[0],28,28,1))

print(X_train.shape)
print(X_val.shape)
print(X_test.shape)

Let's visualize the first 4 MNIST images before shuffling the pixels randomly. It is very easy to recognise the true label of the digits.

In [None]:
# visualize the 4 first mnist images before shuffling the pixels
plt.figure(figsize=(12,12))
for i in range(0,4):
    plt.subplot(1,4,(i+1))
    plt.imshow((X_train[i,:,:,0]),cmap="gray")
    plt.title('true label: ' + str(np.argmax(Y_train,axis=1)[i]))
    #plt.axis('off')

In the next cell we shuffle the pixels of each image randomly. Note that we shuffle every image in the same manner!

In [None]:
# function to shuffle the pixel order within an image
# used to shuffle the pixels of all MNIST images in the same manner
def shuffel_pixels(idx, data):
    data_new=np.zeros((data.shape))
    for i,img in enumerate(data):
        data_new[i] = img.flatten()[idx].reshape((28,28,1))
    return data_new

np.random.seed(42)
shuffel_idx = np.random.permutation(np.arange(28*28))
X_train_shuffle = shuffel_pixels(shuffel_idx, X_train)
X_val_shuffle = shuffel_pixels(shuffel_idx, X_val)
X_test_shuffle = shuffel_pixels(shuffel_idx, X_test)
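
The loop above can also be written as a single indexing operation, and because the same permutation is used for every image, the shuffle can be undone. The following is a minimal sketch on random stand-in data (not MNIST); the names `images`, `perm`, and `inverse_perm` are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

# Stand-in data with the same layout as X_train: (n_images, 28, 28, 1).
rng = np.random.default_rng(0)
images = rng.random((5, 28, 28, 1))

# One fixed permutation of the 784 pixel positions, shared by all images.
perm = rng.permutation(28 * 28)

# Vectorized shuffle: flatten each image, reorder the columns, restore the shape.
flat = images.reshape(len(images), -1)
shuffled = flat[:, perm].reshape(images.shape)

# np.argsort(perm) is the inverse permutation, so the shuffle is reversible.
inverse_perm = np.argsort(perm)
restored = shuffled.reshape(len(images), -1)[:, inverse_perm].reshape(images.shape)

# Shuffling only moves pixels around; the set of pixel values per image is unchanged.
same_pixels = np.array_equal(
    np.sort(flat, axis=1),
    np.sort(shuffled.reshape(len(images), -1), axis=1),
)
print(np.array_equal(restored, images), same_pixels)
```

No information is lost by the shuffling; only the spatial arrangement of the pixels changes.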

Let's visualize the first 4 MNIST images after randomly shuffling the pixels. As a human, you now have no chance of recognising the true label of the digits.

In [None]:
# visualize the 4 first mnist images after shuffling the pixels
plt.figure(figsize=(12,12))
for i in range(0,4):
    plt.subplot(1,4,(i+1))
    plt.imshow((X_train_shuffle[i,:,:,0]), cmap="gray")
    plt.title('true label: ' + str(np.argmax(Y_train,axis=1)[i]))

# CNN as classification model for MNIST data

Now, we train a CNN to classify the MNIST data. We use the same network architecture to train first with the original data and then with the shuffled data.
* Use a CNN with 2 convolution blocks and 2 fully connected layers as classification model
* train it once on the original train data and check the performance on the original test data
* train it once on the shuffled train data and check the performance on the accordingly shuffled test data

### Train the CNN on the **original data**

In [None]:
# check the shape of the original data
# the CNN expects 4D input arrays: (samples, height, width, channels)
X_train.shape,Y_train.shape,X_val.shape,Y_val.shape

In the next cell we define the hyperparameters and architecture of the CNN. We use:
- the relu activation function
- a batch size of 128
- a kernel size of 3x3
- a pooling size of 2x2
- the greyscale MNIST images as inputs, so the input shape is 28x28x1
- 2 convolutional layers with 8 filters each followed by a max-pooling layer, then again 2 convolutional layers with 16 filters each followed by a max-pooling layer
- a flatten layer, followed by a fully connected layer with 40 nodes and an output layer with 10 nodes and the softmax activation.
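
To see what the layer list above implies, the following minimal sketch traces the feature-map size and the receptive field (how many input pixels along one axis a single unit can see) through the convolutional part. It uses the standard receptive-field recurrence and assumes `'same'` padding for the convolutions and a pooling stride equal to the pooling size; the names `layers`, `receptive`, `jump`, and `trace` are illustrative only.

```python
# (layer kind, kernel size, stride) for the convolutional part described above.
layers = [
    ("conv", 3, 1), ("conv", 3, 1), ("pool", 2, 2),
    ("conv", 3, 1), ("conv", 3, 1), ("pool", 2, 2),
]

size = 28       # spatial size of the feature map (28x28 input)
receptive = 1   # input pixels along one axis seen by a single unit
jump = 1        # distance in input pixels between neighboring units

trace = []
for kind, kernel, stride in layers:
    # 'same' padding keeps the size for convolutions; pooling divides it by the stride.
    size = size // stride
    # Each layer widens the receptive field by (kernel - 1) steps of the current jump.
    receptive += (kernel - 1) * jump
    jump *= stride
    trace.append((kind, size, receptive))
    print(f"{kind}: feature map {size}x{size}, receptive field {receptive}x{receptive}")
```

Under these assumptions, a unit after the second pooling layer sees a 16x16 patch of the input, and the flattened output has 7 * 7 * 16 = 784 values.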

In [None]:
# here we define the hyperparameters of the CNN
batch_size = 128
nb_classes = 10
img_rows, img_cols = 28, 28
kernel_size = (3, 3)
input_shape = (img_rows, img_cols, 1)
pool_size = (2, 2)

In [None]:
# define CNN with 2 convolution blocks and 2 fully connected layers
model = Sequential()

model.add(Convolution2D(8,kernel_size,padding='same',input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(8, kernel_size,padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))

##### Your code here ######
##### add two convolutional layers with 16 filters each and a max-pooling layer

model.add(Convolution2D(16, kernel_size,padding='same'))
model.add(Activation('relu'))
model.add(Convolution2D(16,kernel_size,padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))

##### End of your code  ######

model.add(Flatten())
model.add(Dense(40))
model.add(Activation('relu'))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

# clone the architecture (with freshly initialized weights) for the shuffled data
model2 = keras.models.clone_model(model)

# compile the model: set loss, optimizer, and metric
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
# summarize model along with number of model weights
model.summary()

In [None]:
# train the model
history=model.fit(X_train, Y_train,
                  batch_size=128,
                  epochs=10,
                  verbose=1,
                  validation_data=(X_val, Y_val)
                 )

In [None]:
# plot the development of the accuracy and loss during training
plt.figure(figsize=(12,4))
plt.subplot(1,2,(1))
plt.plot(history.history['accuracy'],linestyle='-.')
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'valid'], loc='lower right')
plt.subplot(1,2,(2))
plt.plot(history.history['loss'],linestyle='-.')
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'valid'], loc='upper right')

#### Prediction on the test set after training on original data

Now, let us use the CNN that was trained on the original data to predict new unseen data (our test data). We determine the confusion matrix and the accuracy on the test data to evaluate the classification performance.

In [None]:
#### Exercise
#### Use the trained model to calculate the accuracy and the confusion matrix on the test data

### Your code here ###

# predict each instance of the testset
pred=model.predict(X_test)
# get confusion matrix
cm = confusion_matrix(np.argmax(Y_test, axis=1), np.argmax(pred, axis=1))

acc_cnn = np.sum(np.argmax(Y_test,axis=1)==np.argmax(pred,axis=1))/len(pred)
print("Accuracy = " , acc_cnn)

disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot(cmap='viridis')
plt.title('Confusion Matrix')
plt.show()
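
The accuracy and the confusion matrix are closely related: the diagonal of the matrix counts the correct predictions. The following is a minimal sketch with made-up toy labels (not MNIST results) that builds the matrix with plain NumPy, using the same convention as scikit-learn (rows are true classes, columns are predicted classes). The names `y_true`, `y_pred`, and `toy_cm` are illustrative.

```python
import numpy as np

# Toy labels for three classes (illustrative values, not MNIST results).
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 1])

n_classes = 3
toy_cm = np.zeros((n_classes, n_classes), dtype=int)
# Rows index the true class, columns the predicted class.
np.add.at(toy_cm, (y_true, y_pred), 1)

# Correct predictions sit on the diagonal, so accuracy = trace / total count.
accuracy_from_cm = np.trace(toy_cm) / toy_cm.sum()
accuracy_direct = np.mean(y_true == y_pred)

print(toy_cm)
print(accuracy_from_cm, accuracy_direct)
```

Off-diagonal entries show which classes get confused with each other, which the accuracy alone cannot tell you.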

### Train the CNN on the **shuffled data**

In [None]:
# check the shape of the shuffled data
# the CNN expects 4D input arrays: (samples, height, width, channels)
X_train_shuffle.shape,Y_train.shape,X_val_shuffle.shape,Y_val.shape

In [None]:
# compile the model: set loss, optimizer, and metric
model2.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
# train the model
history2=model2.fit(X_train_shuffle, Y_train,
                  batch_size=128,
                  epochs=10,
                  verbose=1,
                  validation_data=(X_val_shuffle, Y_val)
                 )

In [None]:
# plot the development of the accuracy and loss during training
plt.figure(figsize=(12,4))
plt.subplot(1,2,(1))
plt.plot(history2.history['accuracy'],linestyle='-.')
plt.plot(history2.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'valid'], loc='lower right')
plt.subplot(1,2,(2))
plt.plot(history2.history['loss'],linestyle='-.')
plt.plot(history2.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'valid'], loc='upper right')


In [None]:
#### Exercise
#### Use the trained model to calculate the accuracy and the confusion matrix on the test data

### Your code here ###



### 🔧 **YOUR TASK:**
#### Prediction on the test set after training on the shuffled data
- Use the CNN that was trained on the shuffled data to predict new unseen data (our test data). Determine the confusion matrix and the accuracy on the test data to evaluate the classification performance.

#### Comparison

- Compare the performance of the fully connected NN (fcNN) on the MNIST dataset with the performance of the CNN. What do you observe?  
- Compare the performance of the CNN on the original and on the shuffled dataset. Try to explain the differences.  


In [None]:
### YOUR CODE HERE ###

### 🔑 **Solution:**


In [None]:
# @title Solution Code { display-mode: "form" }
# predict each instance of the testset
pred=model2.predict(X_test_shuffle)
# get confusion matrix
cm = confusion_matrix(np.argmax(Y_test, axis=1), np.argmax(pred, axis=1))

acc_cnn_shuffled = np.sum(np.argmax(Y_test,axis=1)==np.argmax(pred,axis=1))/len(pred)
print("Accuracy = " , acc_cnn_shuffled)

disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot(cmap='viridis')
plt.title('Confusion Matrix')
plt.show()

<details>
  <summary>🔑 Click here to View Answers:</summary>



----------
The MLP (multilayer perceptron, aka fully connected NN) from notebook 03 had **84060 trainable parameters**, an NLL of ~0.1, and achieved an accuracy of around 0.97.

The CNN used in this notebook has only **35962 parameters**, which is less than **half the size** of the MLP, and reaches an accuracy of ~0.98 and a loss of ~0.05.
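
The CNN parameter count can be reproduced by hand. This is a minimal sketch assuming the architecture defined in this notebook (3x3 kernels, biases in every layer, and a 7x7x16 feature map before the flatten layer); the helper names `conv_params` and `dense_params` are illustrative.

```python
def conv_params(kernel, in_channels, filters):
    # kernel x kernel weights per input channel and filter, plus one bias per filter
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    # one weight per input and unit, plus one bias per unit
    return inputs * units + units

counts = {
    "conv (1 -> 8)": conv_params(3, 1, 8),
    "conv (8 -> 8)": conv_params(3, 8, 8),
    "conv (8 -> 16)": conv_params(3, 8, 16),
    "conv (16 -> 16)": conv_params(3, 16, 16),
    "dense (7*7*16 -> 40)": dense_params(7 * 7 * 16, 40),
    "dense (40 -> 10)": dense_params(40, 10),
}

for name, n in counts.items():
    print(f"{name}: {n}")

total = sum(counts.values())
print("total:", total)
```

Most of the parameters sit in the first fully connected layer; the four convolutional layers together contribute only a small share.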




----------






Surprisingly, the accuracy on the test set is still ~0.95 when the dataset is shuffled! However, it is worse than on the original dataset (~0.98).
There are still structures in the shuffled dataset that a CNN can learn, since all images are shuffled in the same way.

The overall structure, and therefore the local dependencies in the image, get destroyed by the shuffling. As opposed to the FC NN, the performance of the CNN suffers if the images are shuffled. This makes sense, since a CNN assumes by architecture that the ordering and local neighborhoods of the pixels matter (this is called a "model bias", also known as an inductive bias): each neuron receives as input only a patch of neighboring pixels from the previous layer. Far-reaching correlations are learned by stacking more layers and hence increasing the receptive field. Since a CNN has the "model bias" that neighborhood is essential, it does not have to learn that from scratch, as an FC NN must. Therefore, the CNN outperforms the FC NN when used on original MNIST image data but has roughly the same performance as an FC NN when provided with shuffled image data.
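
Why a fully connected layer is indifferent to a fixed pixel shuffle can be checked numerically. The following is a minimal sketch with random stand-in values (not trained weights): if the columns of the weight matrix are reordered with the same permutation as the pixels, the layer output is unchanged. The names `W`, `b`, `perm`, and `W_permuted` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(784)              # one flattened image (stand-in values)
W = rng.normal(size=(40, 784))   # weights of a hypothetical first dense layer
b = rng.normal(size=40)          # biases of that layer

perm = rng.permutation(784)
x_shuffled = x[perm]             # same shuffle rule as used in this notebook

# Reordering the weight columns with the same permutation compensates the shuffle.
W_permuted = W[:, perm]

out_original = W @ x + b
out_shuffled = W_permuted @ x_shuffled + b
print(np.allclose(out_original, out_shuffled))
```

For every dense network on the original images there is therefore an equivalent one on the shuffled images, so training is not made harder by the shuffle. A convolutional layer has no such counterpart, because its small kernels can only combine pixels that are neighbors in the shuffled image.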

</details>