In [12]:
import tensorflow as tf
import numpy as np
import torch
print('tensorflow version : {}'.format(tf.__version__))
print('torch version : {}'.format(torch.__version__))

tensorflow version : 2.12.0
torch version : 2.0.1+cu118


In [3]:
mnist = tf.keras.datasets.mnist
(x_train,y_train),(x_test,y_test) = mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [6]:
x_train,x_test = x_train/255.0,x_test/255.0

In [13]:
# torch.tensor(x_train[0])
x_train.shape

(60000, 28, 28)

**Sequential is useful for stacking layers** where each layer has one input tensor and one output tensor. Layers are functions with a known mathematical structure that can be reused and have trainable variables. Most TensorFlow models are composed of layers. This model uses the **Flatten**, **Dense**, and **Dropout** layers.

In [15]:
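# tf.keras.Sequential model: layers are applied in the order listed
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),  # (28, 28) image -> vector of 784 values
    tf.keras.layers.Dense(128, activation='relu'),  # fully connected layer with ReLU activation
    tf.keras.layers.Dropout(0.2),                   # regularization: drops 20% of units during training
    tf.keras.layers.Dense(10)                       # one logit per class (10 digits), no activation
])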
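As a rough illustration of what this stack computes at inference time, the sketch below mirrors the forward pass in NumPy. The parameters `W1`, `b1`, `W2`, `b2` are hypothetical random stand-ins, not the values Keras initializes, so only the shapes are meaningful. Dropout is left out because it is inactive outside training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in parameters (assumption: random values; only the shapes match the model above)
W1, b1 = rng.normal(scale=0.05, size=(784, 128)), np.zeros(128)
W2, b2 = rng.normal(scale=0.05, size=(128, 10)), np.zeros(10)

def forward(x):
    """Flatten -> Dense(128, relu) -> Dense(10); returns logits."""
    flat = x.reshape(x.shape[0], -1)        # (batch, 28, 28) -> (batch, 784)
    hidden = np.maximum(flat @ W1 + b1, 0)  # ReLU activation
    return hidden @ W2 + b2                 # raw scores, one per class

batch = rng.random((1, 28, 28))             # fake image with pixels in [0, 1)
logits = forward(batch)
print(logits.shape)
```

The result has one row per input image and one column per class, the same shape the Keras model returns for `x_train[:1]`.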

In [18]:
x_train[:1].shape

(1, 28, 28)

In [17]:
model(x_train[:1]).numpy() # outputs are logits

array([[ 0.6778041 , -0.03232397, -0.1614123 ,  0.22880659,  0.04342312,
         0.69481796,  0.4003253 , -0.08614585, -0.1426954 , -0.3351947 ]],
      dtype=float32)

In [21]:
predictions = model(x_train[:1]).numpy()
predictions

array([[ 0.6778041 , -0.03232397, -0.1614123 ,  0.22880659,  0.04342312,
         0.69481796,  0.4003253 , -0.08614585, -0.1426954 , -0.3351947 ]],
      dtype=float32)

**tf.nn.softmax** converts these logits to probabilities for each class.

In [24]:
tf.nn.softmax(predictions).numpy()

array([[0.16296797, 0.08011199, 0.07041013, 0.10401718, 0.08641598,
        0.16576439, 0.12347946, 0.0759142 , 0.0717404 , 0.0591783 ]],
      dtype=float32)
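To see what the softmax does to these numbers, here is a minimal NumPy sketch applied to the logits printed above. The helper name `softmax` is only illustrative, and this is not how TensorFlow implements the op internally.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Logits copied from the model output above
logits = np.array([[0.6778041, -0.03232397, -0.1614123, 0.22880659, 0.04342312,
                    0.69481796, 0.4003253, -0.08614585, -0.1426954, -0.3351947]])

probs = softmax(logits)
print(probs.round(4))
print(probs.sum())  # the probabilities add up to 1
```

Each entry is exp(logit) divided by the sum of exp over all ten logits, so larger logits get larger probabilities and the row sums to 1.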

The loss function takes a vector of ground-truth values and a vector of logits and returns a scalar loss for each example. This loss is equal to the negative log probability of the true class: it is zero if the model is sure of the correct class.

The untrained model gives probabilities close to random (about 1/10 for each class), so the initial loss should be close to -tf.math.log(1/10) ≈ 2.3.
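Both claims can be checked with a small NumPy sketch. The function `sparse_ce_from_logits` is a hypothetical stand-in for the per-example loss described here, not the Keras implementation, and the two logit vectors are made-up inputs.

```python
import numpy as np

def sparse_ce_from_logits(label, logits):
    """Negative log-probability of the true class, computed from raw logits."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())  # log-softmax
    return -log_probs[label]

uniform = np.zeros(10)      # equal logits -> probability 1/10 for every class
confident = np.zeros(10)
confident[3] = 10.0         # almost all probability mass on class 3

print(sparse_ce_from_logits(3, uniform))    # equals log(10), about 2.3
print(sparse_ce_from_logits(3, confident))  # small positive value, close to 0
```

With equal logits the loss is log(10) whatever the label is, and it shrinks toward zero as the model puts more probability on the correct class.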

In [27]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits = True)

In [28]:
loss_fn(y_train[:1],predictions).numpy()

1.7971878

In [32]:
y_train[:1],predictions

(array([5], dtype=uint8),
 array([[ 0.6778041 , -0.03232397, -0.1614123 ,  0.22880659,  0.04342312,
          0.69481796,  0.4003253 , -0.08614585, -0.1426954 , -0.3351947 ]],
       dtype=float32))
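The loss value reported above can be reproduced by hand from this label and these logits. The sketch below is a plain NumPy calculation under the assumption that the loss is log(sum(exp(logits))) minus the logit of the true class, which is the negative log of the softmax probability.

```python
import numpy as np

# Label and logits copied from the cell output above
label = 5
logits = np.array([0.6778041, -0.03232397, -0.1614123, 0.22880659, 0.04342312,
                   0.69481796, 0.4003253, -0.08614585, -0.1426954, -0.3351947])

log_sum_exp = np.log(np.exp(logits).sum())  # log of the softmax denominator
loss = log_sum_exp - logits[label]          # same as -log(softmax(logits)[label])
print(loss)                                 # about 1.797
```

The result agrees with the value returned by `loss_fn`. It is below 2.3 because the randomly initialized model happens to give class 5 a probability of about 0.166 rather than 0.1.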

Before training, configure and compile the model using Model.compile. Set the optimizer class to adam, set the loss to the loss_fn function defined earlier, and specify a metric to be evaluated for the model by setting the metrics parameter to accuracy.

In [33]:
model.compile(
    optimizer = 'adam',
    loss = loss_fn,
    metrics = ['accuracy']
)

In [34]:
model.fit(x_train,y_train,epochs = 5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x798e05185390>

In [35]:
model.evaluate(x_test,y_test,verbose = 2)

313/313 - 1s - loss: 0.0699 - accuracy: 0.9782 - 738ms/epoch - 2ms/step


[0.06987787783145905, 0.9782000184059143]

If we want the model to return probabilities instead of logits, we can wrap the trained model and attach a softmax layer to it.

In [36]:
probability_model = tf.keras.Sequential([
    model,
    tf.keras.layers.Softmax()
])

In [39]:
probability_model(x_test[[5]])

<tf.Tensor: shape=(1, 10), dtype=float32, numpy=
array([[4.8685731e-09, 9.9970800e-01, 6.6094590e-06, 9.0015374e-07,
        3.4260574e-05, 3.2199335e-08, 8.2648569e-08, 2.2099742e-04,
        2.9001381e-05, 6.1511464e-08]], dtype=float32)>

In [40]:
x_test[[5]].shape # indexing with a list keeps the batch axis, so the input stays 3-dimensional

(1, 28, 28)

In [41]:
np.argsort(probability_model(x_test[[5]])) # ascending order: the last index is the most probable class

array([[0, 5, 9, 6, 3, 2, 8, 4, 7, 1]])
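Since `np.argsort` sorts in ascending order, the predicted digit is the last entry of that row. A minimal NumPy sketch, using the probabilities printed earlier for this test image, shows two common ways to read off the prediction directly.

```python
import numpy as np

# Probabilities copied from the probability_model output above
probs = np.array([[4.8685731e-09, 9.9970800e-01, 6.6094590e-06, 9.0015374e-07,
                   3.4260574e-05, 3.2199335e-08, 8.2648569e-08, 2.2099742e-04,
                   2.9001381e-05, 6.1511464e-08]])

predicted = np.argmax(probs, axis=-1)               # most probable class per row
top3 = np.argsort(probs, axis=-1)[:, ::-1][:, :3]   # three most probable classes, best first

print(predicted)  # [1]
print(top3)       # [[1 7 4]]
```

`np.argmax` is enough when only the top class is needed; reversing the `argsort` result is useful when the runner-up classes matter too.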