## Data Preparation - Normalization
* Here, we use the Fashion MNIST image dataset that ships with Keras to practice model training.
* Normalize the pixel values from the 0-255 range to 0-1, which keeps the inputs on a small, consistent scale and generally makes gradient-based training more stable.

In [1]:
from keras.datasets import fashion_mnist
import keras

In [2]:
# Load the training and test data.
# This dataset has no validation split by default: Keras provides only the training and test splits.
# However, you can create a validation set yourself by splitting the training data.

(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
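
Because only training and test splits are provided, a validation set has to be carved out of the training data. The following is a minimal sketch, assuming a random hold-out of 10,000 samples (an arbitrary size chosen for illustration). It uses random stand-in arrays with the Fashion MNIST training shapes so that it runs without downloading the dataset, and all variable names are hypothetical:

```python
import numpy as np

# Stand-in arrays with the Fashion MNIST training shapes (assumption:
# the real arrays would come from fashion_mnist.load_data()).
rng = np.random.default_rng(0)
x_train_full = rng.integers(0, 256, size=(60000, 28, 28), dtype=np.uint8)
y_train_full = rng.integers(0, 10, size=(60000,), dtype=np.uint8)

# Hypothetical hold-out size, chosen only for illustration.
val_size = 10000

# Shuffle the indices once so the validation set is a random sample.
indices = rng.permutation(len(x_train_full))
val_idx, train_idx = indices[:val_size], indices[val_size:]

x_val, y_val = x_train_full[val_idx], y_train_full[val_idx]
x_train, y_train = x_train_full[train_idx], y_train_full[train_idx]

print(x_train.shape, x_val.shape)
```

As an alternative, `model.fit` accepts a `validation_split` argument that holds out a fraction of the training arrays without manual indexing.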


In [3]:
train_images, test_images = train_images / 255.0, test_images / 255.0
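
To see what the division does, here is a small sketch on a made-up 2x2 array of `uint8` pixel values (the array is an illustrative stand-in, not taken from the dataset):

```python
import numpy as np

# A tiny stand-in "image" with uint8 pixel values (assumption: the real
# images are uint8 arrays returned by fashion_mnist.load_data()).
pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Dividing by 255.0 promotes the array to floating point and maps
# the range [0, 255] onto [0, 1].
scaled = pixels / 255.0

print(scaled.dtype, scaled.min(), scaled.max())
```

The result is a floating-point array; casting it with `scaled.astype("float32")` is a common optional step to reduce memory use.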

- The fashion_mnist dataset has 10 categories: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, Ankle boot
- The class labels are integers (0, 1, 2, ..., 9) and are converted into one-hot encoded vectors using the to_categorical function
- One-hot encoding represents each class as a binary vector whose length equals the number of classes:
    - The output layer has one node per class, so one-hot labels line up directly with the softmax output, and the categorical_crossentropy loss used below expects labels in this format.
    - For example:
         - [[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # Corresponds to class 0
         - [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # Corresponds to class 1
         - [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],  # Corresponds to class 2
         - [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]]  # Corresponds to class 3
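
The encoding above can be reproduced by hand. This is a NumPy-only sketch of the idea, not the Keras implementation of `to_categorical`, and the sample labels are made up for illustration:

```python
import numpy as np

labels = np.array([0, 1, 2, 3, 9])  # illustrative integer class labels
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with the labels yields one vector per label.
one_hot = np.eye(num_classes, dtype=np.float32)[labels]

# argmax recovers the original integer labels.
recovered = one_hot.argmax(axis=1)

print(one_hot)
print(recovered)
```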



In [4]:
train_labels = keras.utils.to_categorical(train_labels, 10) 
test_labels = keras.utils.to_categorical(test_labels, 10) 

## Construct the network

In [5]:
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D

#### Create a convolutional neural network model using the Keras Sequential API
- A Sequential model stacks layers in linear order, where the output of one layer serves as the input to the next
- Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)) means:
  - a 2D convolutional layer with 32 filters
  - each filter has a size of 3*3
  - the 'relu' activation function introduces non-linearity
  - the input image is 28*28 pixels with 1 color channel (grayscale); a value of 3 would mean 3 color channels (RGB)
- MaxPooling2D((2, 2)):
  - down-samples the feature map by taking the maximum value in each 2*2 region
  - (2, 2) is the pool size, reducing the spatial dimensions by a factor of 2
  - as a result, the smaller feature maps reduce the computational cost of later layers
- additional Conv2D() layers help extract more complex features
- additional MaxPooling2D() layers reduce the spatial dimensions further while retaining the strongest activations
- Flatten(): converts the multi-dimensional feature maps into a 1D vector, preparing them for the dense layers
- Dense(64, activation='relu'): fully connected layer with 64 neurons
- Dense(10, activation='softmax'): fully connected output layer with 10 neurons, one per class; the softmax activation ensures the output probabilities sum to 1
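
The ReLU and max-pooling steps described above can be illustrated on a small array. This is a NumPy sketch of the operations on a single made-up 4x4 feature map, not how Keras implements them; the helper names `relu` and `max_pool_2x2` are hypothetical:

```python
import numpy as np

def relu(x):
    # ReLU keeps positive values and replaces negatives with zero.
    return np.maximum(x, 0)

def max_pool_2x2(feature_map):
    # Non-overlapping 2x2 max pooling on a 2D array; an odd trailing
    # row or column is dropped.
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    blocks = feature_map[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

# Made-up feature map for illustration.
fmap = np.array([
    [ 1, -2,  3,  0],
    [ 4,  5, -6,  2],
    [-1,  0,  7,  8],
    [ 2,  3, -4,  1],
])

pooled = max_pool_2x2(relu(fmap))
print(pooled)  # [[5 3]
               #  [3 8]]
```

Each output value is the largest activation in its 2x2 block, so the 4x4 map shrinks to 2x2.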

In [10]:
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])
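
To check how the feature-map size and parameter count evolve through the model above, the arithmetic can be sketched in plain Python. The helper functions are hypothetical and assume what the model definition implies: stride-1 convolutions without padding and non-overlapping 2x2 pooling.

```python
def conv_out(size, kernel=3):
    # A convolution without padding and with stride 1 shrinks each
    # side by kernel - 1.
    return size - kernel + 1

def pool_out(size, pool=2):
    # Non-overlapping pooling floors the division.
    return size // pool

def conv_params(kernel, in_ch, out_ch):
    # One kernel x kernel x in_ch weight block plus one bias per filter.
    return (kernel * kernel * in_ch + 1) * out_ch

def dense_params(n_in, n_out):
    # One weight per input plus one bias, for each output neuron.
    return (n_in + 1) * n_out

s = 28
s = conv_out(s)    # 26 after the first Conv2D
s = pool_out(s)    # 13 after the first MaxPooling2D
s = conv_out(s)    # 11 after the second Conv2D
s = pool_out(s)    # 5 after the second MaxPooling2D
flat = s * s * 64  # 1600 values after Flatten

total = (conv_params(3, 1, 32) + conv_params(3, 32, 64)
         + dense_params(flat, 64) + dense_params(64, 10))
print(s, flat, total)  # 5 1600 121930
```

These numbers can be compared against the output of `model.summary()`.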


## Compile and train the model

- model.compile sets up the optimizer, loss function, and metrics for training
    - optimizer 'adam': Adam (adaptive moment estimation) is an optimization algorithm that combines the benefits of RMSProp and SGD with momentum
        - it adjusts the learning rate for each parameter dynamically, based on estimates of the first and second moments of the gradients
        - it usually works well with minimal tuning and handles sparse gradients effectively
    - loss function 'categorical_crossentropy':
        - used for multi-class classification with one-hot encoded labels
        - computes the difference between the predicted probability distribution from the softmax layer and the true label distribution
    - metrics 'accuracy': measures the fraction of correctly classified samples

- model.fit trains the model for the specified number of epochs using mini-batches of data
    - epochs=10:
        - Specifies the number of complete passes through the entire training dataset.
        - During each epoch, the model updates its weights after every batch, based on the loss and the optimizer.
    - batch_size:
        - Defines the number of samples processed before the model updates its weights.
            - Smaller batch sizes: Provide more frequent but noisier updates.
            - Larger batch sizes: Offer smoother updates but require more memory.
        - If batch_size is not specified, as in the call below, the default of 32 is used; this matches the 1875 steps per epoch in the training log (60000 / 32 = 1875). Passing batch_size=64 would halve the number of updates per epoch.

- The model learns to minimize the loss function, so training accuracy improves over time
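
To make the loss concrete, here is a NumPy sketch of softmax followed by categorical cross-entropy on two made-up samples with three classes. It illustrates the formula only and is not the Keras implementation; the function names, the `eps` clipping value, and the sample numbers are assumptions chosen for the example:

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Mean over samples of -sum(true * log(pred)); clipping avoids log(0).
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Made-up raw scores and one-hot labels for illustration.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
print(probs.sum(axis=1), loss)
```

Because the labels are one-hot, only the predicted probability of the true class contributes to each sample's loss, so the loss shrinks as that probability approaches 1.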

In [11]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

model.fit(train_images, train_labels, epochs=10)

Epoch 1/10
[1m1875/1875[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 5ms/step - accuracy: 0.7651 - loss: 0.6577
Epoch 2/10
[1m1875/1875[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 5ms/step - accuracy: 0.8841 - loss: 0.3215
Epoch 3/10
[1m1875/1875[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 6ms/step - accuracy: 0.8995 - loss: 0.2731
Epoch 4/10
[1m1875/1875[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 6ms/step - accuracy: 0.9125 - loss: 0.2378
Epoch 5/10
[1m1875/1875[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 6ms/step - accuracy: 0.9219 - loss: 0.2104
Epoch 6/10
[1m1875/1875[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 5ms/step - accuracy: 0.9305 - loss: 0.1908
Epoch 7/10
[1m1875/1875[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 5ms/step - accuracy: 0.9363 - loss: 0.1742
Epoch 8/10
[1m1875/1875[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 5ms/step - accuracy: 0.9432 - loss: 0.1532
Epoch 9/10
[1m18

<keras.src.callbacks.history.History at 0x13f5b8200>
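
The 1875 steps per epoch shown in the training log follow from the dataset size and the batch size. A short sketch of that arithmetic, assuming the 60,000 training images of Fashion MNIST (`steps_per_epoch` is a hypothetical helper, not a Keras function):

```python
import math

def steps_per_epoch(num_samples, batch_size):
    # The last batch may be smaller than the others, hence the ceiling.
    return math.ceil(num_samples / batch_size)

print(steps_per_epoch(60000, 32))  # 1875
print(steps_per_epoch(60000, 64))  # 938
```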

In [12]:
test_loss, test_acc = model.evaluate(test_images, test_labels)

313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9042 - loss: 0.3131

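The accuracy metric reported above is the fraction of samples whose most probable predicted class matches the true class. A NumPy sketch of that computation on made-up predictions for three classes (the `accuracy` helper and the sample values are illustrative assumptions, not Keras internals):

```python
import numpy as np

def accuracy(y_true_one_hot, y_prob):
    # Compare the index of the true class with the most probable class.
    true_cls = y_true_one_hot.argmax(axis=1)
    pred_cls = y_prob.argmax(axis=1)
    return float((true_cls == pred_cls).mean())

# Made-up one-hot labels for true classes 0, 1, 2, 1.
y_true = np.eye(3)[[0, 1, 2, 1]]
y_prob = np.array([
    [0.7, 0.2, 0.1],   # predicted 0, correct
    [0.1, 0.8, 0.1],   # predicted 1, correct
    [0.3, 0.4, 0.3],   # predicted 1, wrong (true class is 2)
    [0.2, 0.5, 0.3],   # predicted 1, correct
])

acc = accuracy(y_true, y_prob)
print(acc)  # 0.75
```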