Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from 10 classes.

In this assignment, we
- build an MLP classifier for the Fashion-MNIST dataset.
- use PCA to reduce the dimensionality of the dataset while preserving 95% of the explained variance. (20 points)
- train a classifier on the dimensionality-reduced dataset using the same network topology as the previous classifier, and compare its classification accuracy with the accuracy obtained on the original dataset. (10 points)
- check whether we observe anything surprising. (10 points)
- follow and improve the example from the text to fine-tune the neural network hyperparameters using RandomizedSearchCV. Use the dataset after the PCA step, which makes the search less time-consuming. (40 points)
- report the test result using the best model obtained from the randomized search. Show the summary of the model and compare this result with the previous results. (20 points)

In [2]:
import numpy as np
import os

# To plot pretty figures
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12

Read the data.

In [3]:
import tensorflow as tf
fashion_mnist = tf.keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()

Set random seeds.

In [4]:
np.random.seed(42)
tf.random.set_seed(42)

In [5]:
from keras import models
from keras import layers

network = models.Sequential()
network.add(layers.Dense(64, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))

In [6]:
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

In [7]:
X_train_full = X_train_full.reshape((60000, 28 * 28))
X_train_full = X_train_full.astype('float32') / 255

X_test = X_test.reshape((10000, 28 * 28))
X_test = X_test.astype('float32') / 255

In [8]:
from tensorflow.keras.utils import to_categorical

y_train_full = to_categorical(y_train_full)
y_test = to_categorical(y_test)
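
The `to_categorical` call above turns integer class labels into one-hot rows, which is the target format `categorical_crossentropy` expects. As a rough illustration only, the same encoding can be sketched in plain NumPy; the helper name `one_hot` and the example labels are hypothetical and not part of the notebook.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Set the column given by each label to 1 in the matching row.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

example_labels = [9, 0, 3]  # hypothetical labels, one per image
encoded = one_hot(example_labels, num_classes=10)
print(encoded.shape)
print(encoded.argmax(axis=1))  # recovers the integer labels
```

Taking `argmax` along the class axis recovers the original integer labels, which is handy when comparing predictions with targets.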

In [9]:
X_valid = X_train_full[:5000]
y_valid = y_train_full[:5000]
X_train = X_train_full[5000:]
y_train = y_train_full[5000:]

In [10]:
network.fit(X_train, y_train, validation_data=(X_valid, y_valid), epochs=20, batch_size=128)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.src.callbacks.History at 0x203ab4f65d0>

In [11]:
test_loss, test_acc = network.evaluate(X_test, y_test)



In [12]:
print('test_acc:', test_acc)

test_acc: 0.8666999936103821


Now we read the data again, so that we start from the raw images and integer labels.

In [13]:
import tensorflow as tf
fashion_mnist = tf.keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()

Use the same random seeds.

In [14]:
np.random.seed(42)
tf.random.set_seed(42)

In [15]:
X_train_full.shape

(60000, 28, 28)

In [16]:
X_test.shape

(10000, 28, 28)

In [17]:
X_train_full = X_train_full.reshape((60000,28*28))
X_test = X_test.reshape((10000,28*28))

Conduct fit and transform on X_train_full using PCA. (10 points)

In [18]:
from sklearn.decomposition import PCA

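# fill in code here
# 187 components are used here with the aim of preserving about 95% of the variance;
# PCA(n_components=0.95) would let scikit-learn select the count automatically.
pca = PCA(n_components=187)
X_train_reduced_full = pca.fit_transform(X_train_full)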

Transform X_test using the PCA. (10 points)

In [19]:
# fill in code here
X_test_reduced = pca.transform(X_test)
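
The number of components needed to reach a variance target can be checked from the cumulative explained-variance ratio. The following is a minimal NumPy sketch of that check, not the notebook's method: it uses an SVD on synthetic stand-in data, and the helper name `components_for_variance` and the data `X_demo` are assumptions made for illustration. scikit-learn's `PCA` exposes the same ratios as `explained_variance_ratio_`.

```python
import numpy as np

def components_for_variance(X, target=0.95):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `target`."""
    X_centered = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variance captured by each principal component.
    singular_values = np.linalg.svd(X_centered, compute_uv=False)
    variance_ratio = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(variance_ratio)
    # First index where the cumulative ratio reaches the target, as a count.
    n_components = int(np.searchsorted(cumulative, target) + 1)
    return min(n_components, cumulative.size), cumulative

rng = np.random.default_rng(42)
# Synthetic stand-in data: 500 samples, 20 features with rapidly decaying scale.
X_demo = rng.normal(size=(500, 20)) * (0.5 ** np.arange(20))
n_components, cumulative = components_for_variance(X_demo, target=0.95)
print(n_components, cumulative[n_components - 1])
```

On the real data, the same idea would be applied to the flattened training images only, so that the test set does not influence the choice.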

Fill in the input_shape in the following code. (10 points)

In [20]:
from keras import models
from keras import layers

network = models.Sequential()
# fill in code
network.add(layers.Dense(64, activation='relu', input_shape=(187, )))
network.add(layers.Dense(10, activation='softmax'))

In [21]:
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

In [22]:
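# PCA is linear, so dividing the projected features by 255 matches
# the scale they would have had if PCA had been fit on pixels in [0, 1].
X_train_reduced_full = X_train_reduced_full.astype('float32') / 255
X_test_reduced = X_test_reduced.astype('float32') / 255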

In [23]:
from tensorflow.keras.utils import to_categorical

y_train_full = to_categorical(y_train_full)
y_test = to_categorical(y_test)

In [24]:
X_valid_reduced = X_train_reduced_full[:5000]
y_valid = y_train_full[:5000]
X_train_reduced = X_train_reduced_full[5000:]
y_train = y_train_full[5000:]

In [25]:
network.fit(X_train_reduced, y_train, validation_data=(X_valid_reduced, y_valid), epochs=20, batch_size=128)

Epoch 1/20


Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.src.callbacks.History at 0x203abfc3e90>

In [26]:
test_loss, test_acc = network.evaluate(X_test_reduced, y_test)



In [27]:
print('test_acc:', test_acc)

test_acc: 0.8822000026702881


Compare these two accuracy results and check whether we see anything surprising. (10 points)

The accuracy result ...

In [28]:
np.random.seed(42)
tf.random.set_seed(42)

Modify the code provided by this module and use RandomizedSearchCV to find a model that beats the previous accuracy results. (40 points)

Hint: you can speed up the search by using n_jobs=-1 in RandomizedSearchCV, which runs the fits in parallel on all available cores.

In [29]:
X_valid = X_train_reduced[:5000]
y_valid = y_train[:5000]
X_train = X_train_reduced[5000:]
y_train = y_train[5000:]

In [30]:
X_train.shape, X_valid.shape

((50000, 187), (5000, 187))

In [31]:
y_train.shape, y_valid.shape, y_test.shape

((50000, 10), (5000, 10), (10000, 10))

In [32]:
from tensorflow import keras

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

In [33]:
def build_model(n_hidden=1, n_neurons=128, learning_rate=3e-3, input_shape=(X_train_reduced.shape[1],)):
    model = keras.models.Sequential()
    model.add(keras.layers.InputLayer(input_shape=input_shape))
    for layer in range(n_hidden):
        # fill in code
        model.add(keras.layers.Dense(n_neurons, activation="relu"))
    # fill in code
    model.add(keras.layers.Dense(10, activation="softmax"))
    # Pass the optimizer object so that learning_rate actually takes effect;
    # the string 'rmsprop' would silently use the default learning rate.
    optimizer = keras.optimizers.RMSprop(learning_rate=learning_rate)
    model.compile(optimizer=optimizer,
                  # fill in code
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    return model

In [34]:
from tensorflow import keras
from sklearn.base import BaseEstimator, RegressorMixin

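# Modify the KerasRegressorWrapper class to accept hyperparameters.
# Note: RegressorMixin provides score() as R^2, so RandomizedSearchCV ranks
# candidates by R^2 on the one-hot targets rather than by classification accuracy.
class KerasRegressorWrapper(BaseEstimator, RegressorMixin):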
    def __init__(self, n_hidden=1, n_neurons=200, learning_rate=1e-3):
        self.n_hidden = n_hidden
        self.n_neurons = n_neurons
        self.learning_rate = learning_rate

    def fit(self, X, y, **kwargs):
        self.model = build_model(self.n_hidden, self.n_neurons, self.learning_rate)
        self.model.fit(X, y, **kwargs)
        return self

    def predict(self, X):
        return self.model.predict(X)

# Create an instance of the KerasRegressorWrapper
keras_reg = KerasRegressorWrapper()


In [35]:
keras_reg.fit(X_train, y_train, epochs=100,
              # fill in code
              validation_data=(X_valid, y_valid),
              callbacks=[keras.callbacks.EarlyStopping(patience=10)])

Epoch 1/100


Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
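
The run above ends at epoch 15 of 100, which is consistent with `EarlyStopping(patience=10)` halting training once the monitored validation loss has gone 10 epochs without improving. The following is a simplified pure-Python sketch of that patience logic, not Keras's actual implementation; the function name `early_stopping_epoch` and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None
    if the patience budget is never exhausted."""
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses: the best value occurs at epoch 2.
val_losses = [1.00, 0.90, 0.95, 0.96, 0.97]
print(early_stopping_epoch(val_losses, patience=2))
```

A larger `patience` tolerates noisier validation curves at the cost of extra epochs after the best one.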


In [38]:
from scipy.stats import reciprocal
from sklearn.model_selection import RandomizedSearchCV

param_distribs = {
    # fill in code
    "n_hidden": [0, 1, 2],
    "n_neurons": np.arange(1, 400),
}

rnd_search_cv = RandomizedSearchCV(keras_reg, param_distribs, n_iter=20, cv=3, verbose=2, n_jobs=-1)
rnd_search_cv.fit(X_train, y_train, epochs=100,
                  # fill in code
                  validation_data=(X_valid, y_valid),
                  callbacks=[keras.callbacks.EarlyStopping(patience=10)])

Fitting 3 folds for each of 20 candidates, totalling 60 fits
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
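
The search space above covers only `n_hidden` and `n_neurons`. If a learning rate were added to it, a log-uniform distribution is a common choice, because it spreads samples evenly across orders of magnitude. The following is a minimal NumPy sketch of such sampling; the helper `sample_log_uniform` and the bounds 3e-4 and 3e-2 are assumptions for illustration and are not taken from the notebook.

```python
import numpy as np

def sample_log_uniform(low, high, size, rng):
    """Draw samples whose logarithm is uniform on [log(low), log(high))."""
    return np.exp(rng.uniform(np.log(low), np.log(high), size=size))

rng = np.random.default_rng(42)
# Hypothetical bounds for a learning rate, not taken from the notebook.
learning_rates = sample_log_uniform(3e-4, 3e-2, size=1000, rng=rng)
print(learning_rates.min(), learning_rates.max())

# Roughly half of the samples fall below the geometric mean of the bounds.
geometric_mean = np.sqrt(3e-4 * 3e-2)
print((learning_rates < geometric_mean).mean())
```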


In [39]:
rnd_search_cv.best_params_

{'n_neurons': 392, 'n_hidden': 1}

Show the summary of the best model obtained from the randomized search. Report the test result using the best model, and compare this result with the previous results. (20 points)

In [40]:
# fill in code
model = rnd_search_cv.best_estimator_.model
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_2 (Dense)             (None, 392)               73696     
                                                                 
 dense_3 (Dense)             (None, 10)                3930      
                                                                 
Total params: 77626 (303.23 KB)
Trainable params: 77626 (303.23 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
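
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The short sketch below reproduces the arithmetic for the 187 PCA features, 392 hidden neurons, and 10 output classes; the helper name `dense_params` is hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

hidden = dense_params(187, 392)  # PCA features -> hidden layer
output = dense_params(392, 10)   # hidden layer -> 10 classes
print(hidden, output, hidden + output)
```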


In [41]:
model.evaluate(X_test_reduced, y_test)



[0.47874706983566284, 0.8899000287055969]

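The result from the randomized search:
- The best model obtained is "sequential_1", with one hidden layer of 392 neurons.
- On the test set it reaches an accuracy of about 0.890, slightly above the 0.882 of the PCA model and the 0.867 of the model trained on the original data. Its test loss (about 0.479) is higher than that of the models we built above.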