# MNIST Image Classification with TensorFlow

## Introduction

This repository contains two scripts, `run.py` and `run2.py`, for image classification using the MNIST dataset with TensorFlow. The scripts demonstrate two different approaches to training a neural network for this task.

## Summary

### run.py

The `run.py` script uses TensorFlow's high-level Keras API to classify MNIST digits. It first loads the dataset, rescales the pixel values to the [0, 1] range, and one-hot encodes the labels. The model is defined with the Keras Functional API: a `Flatten` layer followed by three dense layers (128 and 64 ReLU units, then a 10-unit softmax output). The model is compiled with the Adam optimizer and the categorical cross-entropy loss, then trained with `model.fit`, which handles the training loop internally and evaluates on the test set at the end of each epoch.
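To make the preprocessing step concrete, here is a minimal NumPy-only sketch of rescaling and one-hot encoding. It uses a small made-up batch rather than MNIST itself, and indexing into `np.eye` stands in for `to_categorical`; the array names are illustrative and do not come from `run.py`.

```python
import numpy as np

# Hypothetical stand-in for a tiny batch of MNIST images: uint8 pixels in [0, 255].
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9, 3])

# Rescale pixel values to the [0, 1] range as float32.
images_scaled = images.astype("float32") / 255

# One-hot encode: row i of the identity matrix is the encoding of class i.
num_classes = 10
labels_one_hot = np.eye(num_classes, dtype="float32")[labels]

print(images_scaled.dtype, images_scaled.shape)
print(labels_one_hot.shape)
print(labels_one_hot[0])  # a single 1.0 at index 3
```

Each label becomes a length-10 vector with a single 1, which is the target format that categorical cross-entropy expects.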

### run2.py

The `run2.py` script implements a manual training loop using TensorFlow's lower-level API. Similar to `run.py`, it loads and preprocesses the MNIST dataset. The model architecture is also defined using the Functional API with three dense layers. However, instead of using `model.compile` and `model.fit`, the script defines a custom training loop using `tf.GradientTape` to manually compute gradients and update the model's weights. The script also includes a validation step at the end of each epoch to evaluate model performance on the test set.
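As a rough illustration of how small this network is, the sketch below counts the trainable parameters of the three dense layers. The layer sizes are read off the scripts (a 28x28 input flattened to 784 features, then 128, 64, and 10 units); the helper name `dense_param_count` is made up for this example.

```python
# Layer sizes read off the scripts: a 28x28 input is flattened to 784 features,
# then passed through Dense(128), Dense(64), and Dense(10).
layer_sizes = [28 * 28, 128, 64, 10]


def dense_param_count(n_in: int, n_out: int) -> int:
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


# Pair each layer's input size with its output size.
per_layer = [dense_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [100480, 8256, 650]
print(total)      # 109386
```

The `Flatten` layer only reshapes its input, so it contributes no parameters.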

## Differences

The primary difference between `run.py` and `run2.py` lies in how the model training process is handled. `run.py` utilizes TensorFlow's high-level API methods (`model.compile` and `model.fit`), which abstract away the details of the training loop, making the code more concise and easier to understand. On the other hand, `run2.py` bypasses these high-level methods in favor of a handcrafted training loop using `tf.GradientTape`. This approach provides more control over the training process, allowing for custom training logic, but it also requires more code and a deeper understanding of TensorFlow's lower-level operations.

By comparing these two scripts, users can gain insight into both high-level and low-level approaches to training neural networks with TensorFlow, each with its own advantages and trade-offs.


### `run.py`

%%time

import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Flatten

# Load MNIST data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Rescale data
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# One-hot encode labels
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Build model using Functional API
inputs = Input(shape=(28, 28))
x = Flatten()(inputs)
x = Dense(128, activation='relu')(x)
x = Dense(64, activation='relu')(x)
outputs = Dense(10, activation='softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

# Compile model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train model
model.fit(x_train, y_train, epochs=10, batch_size=32, validation_data=(x_test, y_test))

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
CPU times: user 1min 36s, sys: 7.67 s, total: 1min 44s
Wall time: 1min 48s


<keras.src.callbacks.History at 0x79474024ee60>

This run used a CPU. The cell reported a wall time of 1 min 48 s for 10 epochs, which includes downloading the dataset.
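For a rough sense of scale, the reported wall time can be turned into a per-epoch and per-step figure. This is back-of-the-envelope arithmetic only: it assumes the standard 60,000-image MNIST training split, and because the wall time also covers imports, the download, and validation, the per-step number is an upper bound rather than a measurement.

```python
import math

train_samples = 60_000      # size of the MNIST training split
batch_size = 32             # from the model.fit call
epochs = 10                 # from the model.fit call
wall_time_s = 1 * 60 + 48   # "Wall time: 1min 48s"

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch)  # 1875
print(total_steps)      # 18750
print(f"{wall_time_s / epochs:.1f} s per epoch (upper bound)")
print(f"{1000 * wall_time_s / total_steps:.2f} ms per step (upper bound)")
```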

### `run2.py`

The model and preprocessing are the same as in `run.py`. The difference is that this script uses `tf.GradientTape` to write the training loop as an explicit `for` loop, which makes the training process more transparent. Inside the loop, the code computes the loss, takes its gradients with respect to the model's weights, and applies those gradients to update the weights. These are the steps that `model.fit` otherwise performs internally.
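The three steps (forward pass and loss, gradients, weight update) can be sketched without TensorFlow at all. The NumPy example below is only an illustration under simplifying assumptions: a single softmax layer on a made-up toy batch, a hand-derived gradient in place of `tf.GradientTape`, and plain gradient descent in place of Adam. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (hypothetical sizes, not MNIST): 8 samples, 4 features, 3 classes.
x = rng.normal(size=(8, 4))
y = np.eye(3)[rng.integers(0, 3, size=8)]  # one-hot labels

# Parameters of a single softmax layer, initialised to zero.
w = np.zeros((4, 3))
b = np.zeros(3)


def forward(x, w, b):
    """Return softmax probabilities for each sample."""
    z = x @ w + b
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(y, p):
    """Mean categorical cross-entropy over the batch."""
    return float(-np.mean(np.sum(y * np.log(p + 1e-12), axis=1)))


# 1. Forward pass and loss.
p = forward(x, w, b)
loss_before = cross_entropy(y, p)

# 2. Gradients of the mean loss with respect to w and b.
#    For softmax + cross-entropy, dL/dz = (p - y) / batch_size.
dz = (p - y) / x.shape[0]
grad_w = x.T @ dz
grad_b = dz.sum(axis=0)

# 3. Weight update (plain gradient descent here, not Adam).
learning_rate = 0.1
w -= learning_rate * grad_w
b -= learning_rate * grad_b

loss_after = cross_entropy(y, forward(x, w, b))
print(f"loss before: {loss_before:.4f}, after one step: {loss_after:.4f}")
```

In `run2.py`, step 2 is what `tape.gradient` computes automatically for every layer, and step 3 is what `optimizer.apply_gradients` does.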

%%time

import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Flatten
from tensorflow.keras.optimizers import Adam
import numpy as np

# Load MNIST data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Rescale data
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# One-hot encode labels
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Build model using Functional API
inputs = Input(shape=(28, 28))
x = Flatten()(inputs)
x = Dense(128, activation='relu')(x)
x = Dense(64, activation='relu')(x)
outputs = Dense(10, activation='softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

# Define loss function and optimizer
loss_fn = tf.keras.losses.CategoricalCrossentropy()
optimizer = Adam()

# Training parameters
epochs = 10
batch_size = 32
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(buffer_size=1024).batch(batch_size)
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(batch_size)

# Training loop
for epoch in range(epochs):
    print(f'Epoch {epoch + 1}/{epochs}')
    train_loss = tf.keras.metrics.Mean()
    train_accuracy = tf.keras.metrics.CategoricalAccuracy()

    # Training step
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            # The model ends in a softmax layer, so it returns class probabilities,
            # which CategoricalCrossentropy expects by default (from_logits=False).
            probs = model(x_batch_train, training=True)
            loss_value = loss_fn(y_batch_train, probs)
        # Differentiate the loss with respect to the weights, then let Adam update them.
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        train_loss.update_state(loss_value)
        train_accuracy.update_state(y_batch_train, probs)

        if step % 500 == 0:
            print(f'Step {step}: loss = {loss_value.numpy()}')

    # Validation step: evaluate on the test set without updating any weights
    val_loss = tf.keras.metrics.Mean()
    val_accuracy = tf.keras.metrics.CategoricalAccuracy()
    for x_batch_test, y_batch_test in test_dataset:
        test_probs = model(x_batch_test, training=False)
        val_loss.update_state(loss_fn(y_batch_test, test_probs))
        val_accuracy.update_state(y_batch_test, test_probs)

    print(f'Epoch {epoch + 1} - Loss: {train_loss.result().numpy()}, Accuracy: {train_accuracy.result().numpy()}, '
          f'Val Loss: {val_loss.result().numpy()}, Val Accuracy: {val_accuracy.result().numpy()}')

Epoch 1/10
Step 0: loss = 2.3509457111358643
Step 500: loss = 0.4965081810951233
Step 1000: loss = 0.3434571325778961
Step 1500: loss = 0.06301655620336533
Epoch 1 - Loss: 0.23892655968666077, Accuracy: 0.930400013923645, Val Loss: 0.12120147794485092, Val Accuracy: 0.9617999792098999
Epoch 2/10
Step 0: loss = 0.08149765431880951
Step 500: loss = 0.07721756398677826
Step 1000: loss = 0.059084922075271606
Step 1500: loss = 0.11262740939855576
Epoch 2 - Loss: 0.09931076318025589, Accuracy: 0.9697999954223633, Val Loss: 0.10143530368804932, Val Accuracy: 0.9677000045776367
Epoch 3/10
Step 0: loss = 0.13360005617141724
Step 500: loss = 0.008813761174678802
Step 1000: loss = 0.0466441847383976
Step 1500: loss = 0.008642111904919147
Epoch 3 - Loss: 0.06859397143125534, Accuracy: 0.9794333577156067, Val Loss: 0.09209497272968292, Val Accuracy: 0.9732000231742859
Epoch 4/10
Step 0: loss = 0.07262778282165527
Step 500: loss = 0.12308397889137268
Step 1000: loss = 0.016974708065390587
Step 1500: