In [1]:
import tensorflow as tf
from tensorflow import keras 

# Loading data

Feeding: Providing data to the model for training or testing. The data must have the shape and dtype the model expects, and training usually goes better when the inputs are normalized or standardized.

batch: A subset of the training data that is processed together. The weights of the model are updated once per batch.
epoch: One full pass of the entire training dataset through the neural network, forward and backward.
backpropagation: The algorithm that computes the gradients of the loss function with respect to the weights. The optimizer then uses these gradients to update the weights.
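
To make the relationship between batches and epochs concrete, here is a small sketch that counts the weight updates for the MNIST training set. It assumes the default batch size of 32 that `model.fit` uses when `batch_size` is not given; the variable names are illustrative.

```python
import math

num_samples = 60000   # size of the MNIST training set
batch_size = 32       # default batch size of model.fit in Keras
epochs = 10

# One weight update per batch; the last batch may be smaller than batch_size.
steps_per_epoch = math.ceil(num_samples / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 1875
print(total_updates)    # 18750
```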

In [2]:
from tensorflow.keras.datasets import mnist # 28x28 images of handwritten digits 0-9
(x_train, y_train), (x_test, y_test) = mnist.load_data() 

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [3]:
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

(60000, 28, 28) (60000,)
(10000, 28, 28) (10000,)


In [4]:
tf.random.set_seed(42)

# Modeling

In [5]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)), # Flatten each 28x28 image into a 784-element vector, so the Dense layers receive one feature vector per image.
  tf.keras.layers.Dense(512, activation='relu'),
  tf.keras.layers.Dense(512, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax') # 10 units, one per digit class. Softmax turns the outputs into class probabilities that sum to 1.
])
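
As a rough check of what this architecture does, the sketch below reproduces the flattening step with NumPy and counts the trainable parameters by hand. The dummy `images` array and the `layer_sizes` list are illustrative and not part of the notebook.

```python
import numpy as np

# A dummy batch of 2 grayscale images, 28x28 each (illustrative data).
images = np.zeros((2, 28, 28), dtype=np.float32)
flat = images.reshape(len(images), -1)  # same effect as Flatten: (2, 28, 28) -> (2, 784)
print(flat.shape)  # (2, 784)

# A Dense layer has inputs * units weights plus one bias per unit.
layer_sizes = [784, 512, 512, 10]
params = [n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
print(params)       # [401920, 262656, 5130]
print(sum(params))  # 669706
```

The total should match the number of trainable parameters that `model.summary()` reports for the model above.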

In [6]:
# The labels are integers (0-9), so we use sparse_categorical_crossentropy.
# For one-hot encoded labels, use categorical_crossentropy instead.
# Both compute the same loss; only the label format differs. The integer labels are class indices, so their numeric size carries no meaning.
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', 
              metrics=['accuracy'])
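
The following NumPy sketch illustrates that the sparse and the one-hot form of the cross-entropy give the same loss values. The `softmax` helper, the logits, and the labels are made up for illustration; Keras computes the loss internally and this is not its implementation.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])       # made-up scores: 2 samples, 3 classes
probs = softmax(logits)

sparse_labels = np.array([0, 1])           # integer class indices
one_hot_labels = np.eye(3)[sparse_labels]  # the same labels, one-hot encoded

# Sparse form: pick the predicted probability of the true class.
sparse_loss = -np.log(probs[np.arange(len(sparse_labels)), sparse_labels])
# One-hot form: sum over classes, weighted by the one-hot vector.
categorical_loss = -(one_hot_labels * np.log(probs)).sum(axis=1)

print(np.allclose(sparse_loss, categorical_loss))  # True
```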

# Training

In [7]:
model.fit(x_train, y_train, epochs=10)  

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f72487c3d90>

# Normalization

Each image has 28 * 28 = 784 pixels, and each pixel value is an integer between 0 and 255. Dividing the values by 255 rescales them to the range [0, 1]. Inputs on a small, consistent scale usually make gradient-based training faster and more stable, because large raw inputs can produce large activations and gradients early in training.

In [8]:
import numpy as np
x_train = (x_train / 255.0).astype(np.float32)              
x_test  = (x_test  / 255.0).astype(np.float32)

In [9]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(512, activation='relu'),
  tf.keras.layers.Dense(512, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10) 

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f724820dfa0>

With normalized pixel values, the model learns faster and reaches a higher accuracy than with the raw 0-255 values used in the first run.

In [10]:
model.evaluate(x_test, y_test)



[0.09356796741485596, 0.9814000129699707]

The test accuracy is 0.98 and the training accuracy is 0.99, so the model fits the training data slightly better than unseen data. This small gap is a sign of mild overfitting. Overfitting is a common problem in machine learning. It occurs when the model memorizes details of the training data instead of learning the underlying patterns, so it performs well on the training data but worse on new data. Techniques such as dropout, weight regularization, and early stopping help reduce it.
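
Of these techniques, early stopping is the easiest to show in isolation. The sketch below is a plain-Python illustration of the patience rule, not the Keras implementation: `early_stopping_epoch` is a hypothetical helper and the validation losses are made up. In Keras, the same idea is available as the `tf.keras.callbacks.EarlyStopping` callback, which is passed to `model.fit` through `callbacks` together with validation data.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)  # rule never triggered: train for all epochs

# Made-up validation losses: improvement stalls after epoch 4.
history = [0.30, 0.21, 0.17, 0.15, 0.16, 0.18, 0.22]
print(early_stopping_epoch(history, patience=2))  # 6
```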

# Standardization

Standardization rescales the features so that they have a mean of 0 and a standard deviation of 1. It does not change the shape of the distribution, so the data does not become normally distributed unless it already was. Like normalization, it puts the inputs on a scale that usually makes training faster and more stable.
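
A minimal sketch of the formula on toy data, assuming made-up `train` and `test` arrays: the mean and standard deviation are computed on the training values only and then reused for the test values, so both sets go through the same transformation.

```python
import numpy as np

train = np.array([0.0, 50.0, 100.0, 150.0, 200.0])  # made-up training values
test = np.array([25.0, 175.0])                      # made-up test values

mean, std = train.mean(), train.std()
train_std = (train - mean) / std
test_std = (test - mean) / std   # reuse the training statistics

# The standardized training values have mean 0 and standard deviation 1.
print(np.isclose(train_std.mean(), 0.0), np.isclose(train_std.std(), 1.0))  # True True
```

The standardized test values will generally not have exactly mean 0 and standard deviation 1, which is expected.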

In [11]:
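# Load the dataset again to standardize the raw pixel values
(x_train, y_train), (x_test, y_test) = mnist.load_data()
mean = np.mean(x_train)  # statistics are computed on the training set only
std = np.std(x_train)
x_train = ((x_train - mean) / std).astype(np.float32)  # Standardization. X = (X - mean) / std
x_test = ((x_test - mean) / std).astype(np.float32)    # reuse the training mean and std for the test set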

In [12]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(512, activation='relu'),
  tf.keras.layers.Dense(512, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10) 

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f72559f8100>