## Understanding the Task:
+ We are tasked with analyzing a pretrained autoencoder neural network, specifically the per-image Mean Squared Error (MSE) between MNIST test images and their reconstructions. The autoencoder is stored in an HDF5 file named mnist_AE.h5. Our aim is to test whether these MSE values follow a normal distribution using the Kolmogorov-Smirnov (KS) test.

## Setting Up the Environment: 
+ We start by setting up the Python environment and importing the necessary libraries. The set_seed function seeds the random number generators so that results are reproducible.

In [2]:
import numpy as np
import random
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from scipy import stats

def set_seed(seed):
    # Seed NumPy, Python's random module, and TensorFlow for reproducibility
    np.random.seed(seed)
    random.seed(seed)
    tf.random.set_seed(seed)

set_seed(810109203)
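
As a quick sanity check, the sketch below reseeds with the same value twice and confirms that the draws repeat. It uses only NumPy and the standard library (TensorFlow is left out so the snippet stays self-contained), and the helper `draw` is hypothetical, introduced just for this illustration.

```python
import random

import numpy as np


def set_seed(seed):
    # Seed NumPy's global RNG and Python's built-in RNG
    np.random.seed(seed)
    random.seed(seed)


def draw():
    # Hypothetical helper: draw from both generators so two runs can be compared
    return np.random.rand(3), random.random()


set_seed(810109203)
first_np, first_py = draw()

set_seed(810109203)
second_np, second_py = draw()

print(np.array_equal(first_np, second_np), first_py == second_py)  # True True
```

TensorFlow keeps its own generator, which is seeded separately with `tf.random.set_seed`.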





## Loading the Pretrained Model:
+ We load the mnist_AE.h5 file, which contains the pretrained autoencoder model.

In [3]:
model = tf.keras.models.load_model('mnist_AE.h5')





## Preparing the MNIST Dataset: 

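+ The MNIST dataset contains 70,000 grayscale 28×28 images of handwritten digits (60,000 for training and 10,000 for testing) and is a standard benchmark in machine learning. We scale the pixel values to [0, 1] and add a trailing channel dimension.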


In [5]:
# Load MNIST data
(x_train, _), (x_test, _) = mnist.load_data()

# Normalize and reshape the data
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))
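
To show what this preprocessing does to the array shapes and value range without downloading MNIST, here is a minimal sketch on a hypothetical batch of random uint8 "images" standing in for `mnist.load_data()` output:

```python
import numpy as np

# Hypothetical stand-in for mnist.load_data(): a small batch of random uint8 "images"
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28)).astype(np.uint8)

# Same preprocessing as above: scale to [0, 1] and add a trailing channel axis
scaled = fake_images.astype('float32') / 255.
with_channel = np.reshape(scaled, (len(scaled), 28, 28, 1))

print(with_channel.shape, with_channel.dtype)                # (5, 28, 28, 1) float32
print(with_channel.min() >= 0.0, with_channel.max() <= 1.0)  # True True
```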


## Model Evaluation:
+ We pass each test image through the autoencoder (the encoder compresses it and the decoder reconstructs it), then compute the MSE between each original image and its reconstruction, giving one MSE value per image.

In [7]:
# Reshape the test data to match the input shape expected by the model
x_test_reshaped = x_test.reshape(x_test.shape[0], 784)
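
Going by the comment in the cell above, the pretrained model expects flattened 784-pixel vectors, while `x_test` has shape `(N, 28, 28, 1)`. The sketch below, on a small synthetic batch, shows that flattening and reshaping back is lossless and keeps each row aligned with its image:

```python
import numpy as np

# Small synthetic batch shaped like x_test: (N, 28, 28, 1)
images = np.random.default_rng(1).random((4, 28, 28, 1), dtype=np.float32)

# Flatten each image into a 784-vector (the layout a Dense input layer expects)
flat = images.reshape(images.shape[0], 784)

# Restore the original layout, as is done for the reconstructions
restored = flat.reshape(images.shape)

print(flat.shape)                        # (4, 784)
print(np.array_equal(images, restored))  # True
```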

In [9]:
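# Reconstruct the test images with the pretrained autoencoder
reconstructed = model.predict(x_test_reshaped)

# Reshape the reconstructed images to match the original images' shape
reconstructed_reshaped = reconstructed.reshape(x_test.shape)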

# Calculate MSE for each image
mse = np.mean(np.square(x_test - reconstructed_reshaped), axis=(1, 2, 3))
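
To make the `axis=(1, 2, 3)` reduction concrete, here is a toy example with two hypothetical 2×2 single-channel images. Averaging over height, width, and channel leaves exactly one MSE value per image:

```python
import numpy as np

# Two tiny 2x2 single-channel "images" (shape: N, H, W, C)
originals = np.zeros((2, 2, 2, 1))
reconstructions = np.array([
    [[[0.5], [0.5]], [[0.5], [0.5]]],  # every pixel off by 0.5
    [[[1.0], [1.0]], [[0.0], [0.0]]],  # top row off by 1.0, bottom row exact
])

# Average the squared error over height, width and channel -> one value per image
per_image_mse = np.mean(np.square(originals - reconstructions), axis=(1, 2, 3))
print(per_image_mse.tolist())  # [0.25, 0.5]
```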


## Statistical Test:
+ Finally, we use the Kolmogorov-Smirnov test to check whether the MSE values follow a normal distribution, taking the sample mean and standard deviation of the MSE values as the parameters of the reference normal distribution.

In [10]:
# Calculate mean and standard deviation of MSE
mean = np.mean(mse)
std = np.std(mse)

# Perform the Kolmogorov-Smirnov test
ks_statistic, p_value = stats.kstest(mse, cdf='norm', args=(mean, std))

print(f"KS Statistic: {ks_statistic}, P-value: {p_value}")


KS Statistic: 0.0700142082808849, P-value: 4.5397725566600726e-43
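
With a p-value this small, the null hypothesis that the MSE values are normally distributed is rejected at any conventional significance level. The KS statistic itself is the largest vertical gap between the sample's empirical CDF and the reference CDF, $D = \sup_x |F_n(x) - F(x)|$. The sketch below computes it by hand and checks it against `stats.kstest`. The gamma-distributed sample is a hypothetical, right-skewed stand-in for the MSE values, not the notebook's data, and `ks_statistic_normal` is a hypothetical helper.

```python
import numpy as np
from scipy import stats


def ks_statistic_normal(sample, mean, std):
    """Two-sided KS statistic D = sup_x |F_n(x) - F(x)| against N(mean, std^2)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    cdf = stats.norm.cdf(x, loc=mean, scale=std)  # model CDF at each sorted point
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - cdf)                  # ECDF above the model CDF
    d_minus = np.max(cdf - (i - 1) / n)           # model CDF above the ECDF just before each jump
    return max(d_plus, d_minus)


# Hypothetical right-skewed sample standing in for per-image MSE values
rng = np.random.default_rng(42)
sample = rng.gamma(shape=2.0, scale=0.01, size=1000)
mean, std = sample.mean(), sample.std()

manual = ks_statistic_normal(sample, mean, std)
scipy_stat, p_value = stats.kstest(sample, 'norm', args=(mean, std))
print(manual, scipy_stat, np.isclose(manual, scipy_stat))
```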


## Step 1: Setup and Data Preparation
+ We import TensorFlow, NumPy, and the other required libraries, and set the seed as specified in the instructions so that random values are consistent across runs. We then load the MNIST dataset, scale the pixels to [0, 1], and flatten each image into a 784-dimensional vector for the fully connected autoencoder.

In [4]:
import numpy as np
import random
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense
from scipy import stats

# Set the seed
def set_seed(seed):
    np.random.seed(seed)
    random.seed(seed)
    tf.random.set_seed(seed)

set_seed(810109203)

# Load MNIST dataset
(x_train, _), (x_test, _) = mnist.load_data()

# Normalize and reshape the data
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((-1, 28*28))
x_test = x_test.reshape((-1, 28*28))


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


## Step 2: Building and Training the Autoencoder
+ We build a simple fully connected autoencoder. The encoder compresses each 784-pixel MNIST image into a 64-dimensional latent vector, and the decoder tries to reconstruct the image from it.

In [6]:
# Autoencoder model
input_img = Input(shape=(28*28,))
encoded = Dense(128, activation='relu')(input_img)
encoded = Dense(64, activation='relu')(encoded)

decoded = Dense(128, activation='relu')(encoded)
decoded = Dense(28*28, activation='sigmoid')(decoded)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

# Train the autoencoder
autoencoder.fit(x_train, x_train, epochs=50, batch_size=256, shuffle=True, validation_data=(x_test, x_test))




<keras.src.callbacks.History at 0x264107b4a90>
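
For reference, each Dense layer has one weight per input-unit pair plus one bias per unit, so the layer sizes above fix the model's size. The sketch below does that arithmetic. It should match the trainable-parameter count reported by `autoencoder.summary()`, assuming Keras' default `use_bias=True`. The 64-unit bottleneck compresses each image by a factor of 784 / 64 = 12.25.

```python
# Each Dense layer has (inputs x units) weights plus one bias per unit
layer_sizes = [28 * 28, 128, 64, 128, 28 * 28]  # input -> encoder -> latent -> decoder -> output

params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
print(params_per_layer)       # [100480, 8256, 8320, 101136]
print(sum(params_per_layer))  # 218192
```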

## Step 3: Data Reconstruction and MSE Calculation
+ We use the trained autoencoder to reconstruct the test set and compute the MSE for each image.

In [7]:
# Predict (reconstruct) the test set
reconstructed_imgs = autoencoder.predict(x_test)

# Calculate MSE
mse = np.mean(np.power(x_test - reconstructed_imgs, 2), axis=1)
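
This section computes the MSE on flattened vectors with `axis=1`, while the pretrained-model section uses image-shaped arrays with `axis=(1, 2, 3)`. The sketch below, on synthetic data, checks that both reductions give the same per-image values. Any difference between the two sets of results therefore comes from the models themselves, not from how the MSE is computed.

```python
import numpy as np

rng = np.random.default_rng(7)
# Synthetic originals and reconstructions in the image layout (N, 28, 28, 1)
originals = rng.random((3, 28, 28, 1))
reconstructions = rng.random((3, 28, 28, 1))

# Per-image MSE in the image layout (as in the pretrained-model section)
mse_images = np.mean(np.square(originals - reconstructions), axis=(1, 2, 3))

# Per-image MSE on flattened 784-vectors (as in this section)
flat_orig = originals.reshape(-1, 28 * 28)
flat_rec = reconstructions.reshape(-1, 28 * 28)
mse_flat = np.mean(np.power(flat_orig - flat_rec, 2), axis=1)

print(np.allclose(mse_images, mse_flat))  # True
```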




## Step 4: Normality Test
+ Finally, we run a Kolmogorov-Smirnov test to check whether the per-image MSE values are normally distributed.

In [8]:
mean = np.mean(mse)
std = np.std(mse)

# KS test for normality
ks_statistic, p_value = stats.kstest(mse, cdf='norm', args=(mean, std))
print(f'KS Statistic: {ks_statistic}, P-value: {p_value}')


KS Statistic: 0.09343222023897935, P-value: 2.0121003016299243e-76
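
Because the mean and standard deviation are estimated from the same MSE values being tested, the p-value from `stats.kstest` is not exactly calibrated. It tends to be conservative (too large), so the rejection above would only get stronger under a correction. The Lilliefors test accounts for the estimated parameters. statsmodels ships a tabulated version as `lilliefors` in `statsmodels.stats.diagnostic`. Below is a minimal Monte Carlo sketch with NumPy and SciPy. The helper `lilliefors_mc_pvalue` and the synthetic inputs are hypothetical, not part of the original analysis.

```python
import numpy as np
from scipy import stats


def lilliefors_mc_pvalue(sample, n_sim=1000, seed=0):
    """Monte Carlo p-value for a KS normality test with estimated mean and std.

    Simulates normal samples of the same size, re-estimates mean/std on each,
    and counts how often their KS statistic reaches the observed one.
    """
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    d_obs = stats.kstest(sample, 'norm', args=(sample.mean(), sample.std())).statistic
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_sim):
        # The statistic is location/scale invariant once parameters are re-estimated,
        # so standard normal draws suffice
        sim = rng.standard_normal(n)
        d_sim = stats.kstest(sim, 'norm', args=(sim.mean(), sim.std())).statistic
        exceed += int(d_sim >= d_obs)
    # Add-one correction keeps the Monte Carlo estimate strictly positive
    return d_obs, (exceed + 1) / (n_sim + 1)


rng = np.random.default_rng(1)
skewed = rng.exponential(scale=0.01, size=200)        # hypothetical skewed "MSE" values
normal = rng.normal(loc=0.02, scale=0.005, size=200)  # hypothetical normal values

d_skewed, p_skewed = lilliefors_mc_pvalue(skewed, n_sim=200)
d_normal, p_normal = lilliefors_mc_pvalue(normal, n_sim=200)
print(d_skewed, p_skewed)  # small p-value: normality rejected
print(d_normal, p_normal)  # typically a much larger p-value
```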
