
# Task
Implement an Artificial Neural Network with backpropagation for handwritten digit recognition using the MNIST dataset, covering data loading and preprocessing, defining the network architecture, training the model, evaluating its performance, and visualizing predictions.

## Load and Preprocess Data

### Subtask:
Load the handwritten digit dataset (e.g., MNIST). Normalize pixel values to a [0, 1] range and reshape images to a 1D array of 784 features. Convert labels to one-hot encoded format.


**Reasoning**:
First, I'll import the necessary libraries, TensorFlow and Keras, to load and preprocess the MNIST dataset as specified in the instructions. This is a prerequisite for all subsequent steps.



In [None]:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

print("TensorFlow and Keras modules loaded successfully.")

**Reasoning**:
Following the instructions, I will now load the MNIST dataset, normalize pixel values, reshape the images, and convert the labels to one-hot encoded format in a single code block.



In [None]:
print("Loading MNIST dataset...")
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print("MNIST dataset loaded.")

# Normalize pixel values to [0, 1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
print("Pixel values normalized.")

# Reshape images to 1D array of 784 features
x_train = x_train.reshape((x_train.shape[0], 28 * 28))
x_test = x_test.reshape((x_test.shape[0], 28 * 28))
print("Images reshaped to 1D array.")

# Convert labels to one-hot encoded format
num_classes = 10
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)
print("Labels converted to one-hot encoded format.")

print(f"x_train shape: {x_train.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"x_test shape: {x_test.shape}")
print(f"y_test shape: {y_test.shape}")
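
To make the three preprocessing steps concrete, here is a minimal NumPy-only sketch that applies them to a tiny synthetic batch. The random `images` and the hand-picked `labels` are assumptions standing in for MNIST, and `np.eye(10)[labels]` is used as an illustrative equivalent of `to_categorical`, not as the Keras implementation.

```python
import numpy as np

# Synthetic stand-in for MNIST (assumption): 4 grayscale 28x28 images with uint8 pixels
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9, 3])

# Normalize to [0, 1], then flatten each 28x28 image into 784 features
flat = (images.astype("float32") / 255.0).reshape(len(images), 28 * 28)

# One-hot encode: row i has a 1.0 in column labels[i]
one_hot = np.eye(10, dtype="float32")[labels]

print(flat.shape)      # (4, 784)
print(one_hot.shape)   # (4, 10)
print(one_hot[0])      # 1.0 at index 3, zeros elsewhere
```

Dividing by 255 keeps every feature on the same small scale, which tends to make gradient-based training better behaved than feeding raw 0 to 255 pixel values.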

## Define ANN Architecture

### Subtask:
Define the Artificial Neural Network architecture with an input layer of 784 neurons, a hidden layer of 128 neurons with a suitable activation function (e.g., ReLU), and an output layer of 10 neurons with a Softmax activation function for multi-class classification.


**Reasoning**:
To define the ANN architecture as per the subtask, I will import necessary Keras modules, create a Sequential model, add the specified Dense layers with their respective activation functions and input shape, and then print the model summary.



In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

print("Building the Artificial Neural Network architecture...")

# Build the model as a stack of layers. An explicit Input layer replaces the
# `input_shape` argument on the first Dense layer, which newer Keras versions warn about.
model = Sequential([
    Input(shape=(784,)),              # 784 input features (flattened 28x28 image)
    Dense(128, activation='relu'),    # Hidden layer: 128 neurons, ReLU activation
    Dense(10, activation='softmax'),  # Output layer: 10 neurons (one per digit), Softmax activation
])

print("ANN architecture built successfully.")

# Print a summary of the model
print("\nModel Summary:")
model.summary()
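
As a rough illustration of what this architecture computes, the following NumPy sketch runs a forward pass through layers of the same shapes. The weights are random placeholders (an assumption), so the output probabilities are meaningless; the point is only the sequence of operations and the resulting shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomly initialized parameters with the same shapes as the two Dense layers
# (the values are illustrative, not trained weights)
W1, b1 = rng.normal(0, 0.05, size=(784, 128)), np.zeros(128)
W2, b2 = rng.normal(0, 0.05, size=(128, 10)), np.zeros(10)

def forward(x):
    """Forward pass: Dense(128, relu) followed by Dense(10, softmax)."""
    hidden = np.maximum(0.0, x @ W1 + b1)                  # ReLU
    logits = hidden @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)    # stabilize the exponentials
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)            # softmax

x = rng.random((5, 784))   # 5 fake flattened images
probs = forward(x)
print(probs.shape)         # (5, 10)
print(probs.sum(axis=1))   # each row sums to 1 (up to rounding)
```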

## Implement Backpropagation

### Subtask:
Implement the backpropagation algorithm for training the ANN. This includes defining the loss function (e.g., Categorical Crossentropy) and an optimizer (e.g., Stochastic Gradient Descent or Adam) to update weights and biases based on the calculated gradients.


**Reasoning**:
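Following the instructions, I will compile the previously defined ANN model with the Adam optimizer, the categorical crossentropy loss function, and the accuracy metric. Keras computes the gradients by backpropagation automatically during training, so this step only selects the loss to minimize and the optimizer that applies the weight updates.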



In [None]:
print("Compiling the model...")

# Compile the model with the Adam optimizer, categorical crossentropy loss, and accuracy metric
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print("Model compiled successfully.")
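
For intuition about the loss chosen here, the sketch below computes categorical crossentropy by hand in NumPy for two hypothetical sets of predictions. The function name, the sample arrays, and the `eps` clipping constant are assumptions made for illustration; the sketch is not the Keras implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over the batch of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Two samples, three classes; the true classes are 1 and 0
y_true = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
confident = np.array([[0.05, 0.90, 0.05], [0.80, 0.10, 0.10]])
uncertain = np.array([[0.40, 0.30, 0.30], [0.30, 0.40, 0.30]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_uncertain = categorical_crossentropy(y_true, uncertain)
print(f"confident: {loss_confident:.4f}")   # confident: 0.1643
print(f"uncertain: {loss_uncertain:.4f}")   # uncertain: 1.2040
```

Only the probability assigned to the true class contributes to each sample's loss, so the loss shrinks as that probability approaches 1.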

**Reasoning**:
The 'model' object was not defined because the preceding cell, which built the model architecture, was not executed. To resolve the `NameError`, I need to include the model's definition along with the compilation step in this code block.



In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

print("Building the Artificial Neural Network architecture...")

# Build the model as a stack of layers. An explicit Input layer replaces the
# `input_shape` argument on the first Dense layer, which newer Keras versions warn about.
model = Sequential([
    Input(shape=(784,)),              # 784 input features (flattened 28x28 image)
    Dense(128, activation='relu'),    # Hidden layer: 128 neurons, ReLU activation
    Dense(10, activation='softmax'),  # Output layer: 10 neurons (one per digit), Softmax activation
])

print("ANN architecture built successfully.")

# Print a summary of the model
print("\nModel Summary:")
model.summary()

print("\nCompiling the model...")

# Compile the model with the Adam optimizer, categorical crossentropy loss, and accuracy metric
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print("Model compiled successfully.")

**Reasoning**:
The next logical step in implementing backpropagation, as described in the subtask, is to train the model using the preprocessed data. This involves calling the `fit` method on the compiled model with the training data, specifying epochs and batch size.



In [None]:
print("Training the model...")

# Train the model
# Using x_train and y_train from the data preprocessing step
history = model.fit(
    x_train,
    y_train,
    epochs=10,       # Number of times to iterate over the entire dataset
    batch_size=128,  # Number of samples per gradient update
    validation_split=0.2 # Use 20% of the training data for validation
)

print("Model training complete.")
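
The `fit` arguments above determine how many gradient updates the model receives. The arithmetic can be sketched as follows, assuming the standard 60,000-image MNIST training set and that Keras holds out the last `validation_split` fraction of the samples before shuffling; the variable names are illustrative.

```python
import math

num_train_samples = 60_000   # size of the MNIST training set
validation_split = 0.2
batch_size = 128
epochs = 10

# Samples actually used for gradient updates, and samples held out for validation
num_fit = math.floor(num_train_samples * (1.0 - validation_split))
num_val = num_train_samples - num_fit

# One gradient update per batch
steps_per_epoch = math.ceil(num_fit / batch_size)

print(num_fit, num_val)            # 48000 12000
print(steps_per_epoch)             # 375
print(steps_per_epoch * epochs)    # 3750
```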

**Reasoning**:
The `NameError: name 'x_train' is not defined` indicates that the data loading and preprocessing steps, which define `x_train` and `y_train`, were not executed or their state was lost. To resolve this, I will re-include the data loading, normalization, reshaping, and one-hot encoding logic directly before the model training step to ensure all necessary variables are defined.



In [None]:
print("Loading MNIST dataset...")
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print("MNIST dataset loaded.")

# Normalize pixel values to [0, 1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
print("Pixel values normalized.")

# Reshape images to 1D array of 784 features
x_train = x_train.reshape((x_train.shape[0], 28 * 28))
x_test = x_test.reshape((x_test.shape[0], 28 * 28))
print("Images reshaped to 1D array.")

# Convert labels to one-hot encoded format
num_classes = 10
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)
print("Labels converted to one-hot encoded format.")

print(f"x_train shape: {x_train.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"x_test shape: {x_test.shape}")
print(f"y_test shape: {y_test.shape}")

print("\nTraining the model...")

# Train the model
history = model.fit(
    x_train,
    y_train,
    epochs=10,       # Number of times to iterate over the entire dataset
    batch_size=128,  # Number of samples per gradient update
    validation_split=0.2 # Use 20% of the training data for validation
)

print("Model training complete.")

**Reasoning**:
The `NameError: name 'mnist' is not defined` occurred because the necessary `tensorflow.keras.datasets` and other Keras modules were not imported in the current execution environment. To resolve this, I will combine all the required import statements, data loading and preprocessing steps, model definition, model compilation, and model training into a single robust code block. This ensures that all dependencies are met and variables are defined sequentially before use, addressing all previous `NameError` issues.



In [None]:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

print("Loading MNIST dataset...")
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print("MNIST dataset loaded.")

# Normalize pixel values to [0, 1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
print("Pixel values normalized.")

# Reshape images to 1D array of 784 features
x_train = x_train.reshape((x_train.shape[0], 28 * 28))
x_test = x_test.reshape((x_test.shape[0], 28 * 28))
print("Images reshaped to 1D array.")

# Convert labels to one-hot encoded format
num_classes = 10
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)
print("Labels converted to one-hot encoded format.")

print(f"x_train shape: {x_train.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"x_test shape: {x_test.shape}")
print(f"y_test shape: {y_test.shape}")


print("\nBuilding the Artificial Neural Network architecture...")

# Build the model as a stack of layers. An explicit Input layer replaces the
# `input_shape` argument on the first Dense layer, which newer Keras versions warn about.
model = Sequential([
    tf.keras.Input(shape=(784,)),     # 784 input features (flattened 28x28 image)
    Dense(128, activation='relu'),    # Hidden layer: 128 neurons, ReLU activation
    Dense(10, activation='softmax'),  # Output layer: 10 neurons (one per digit), Softmax activation
])

print("ANN architecture built successfully.")

# Print a summary of the model
print("\nModel Summary:")
model.summary()

print("\nCompiling the model...")

# Compile the model with the Adam optimizer, categorical crossentropy loss, and accuracy metric
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print("Model compiled successfully.")

print("\nTraining the model...")

# Train the model
history = model.fit(
    x_train,
    y_train,
    epochs=10,       # Number of times to iterate over the entire dataset
    batch_size=128,  # Number of samples per gradient update
    validation_split=0.2 # Use 20% of the training data for validation
)

print("Model training complete.")
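
Keras hides the backward pass inside `fit`. As a rough sketch of what happens for each batch, the NumPy code below performs the forward pass, backpropagates the gradients by hand, and updates the weights for a network of the same 784-128-10 shape. Several things here are assumptions made to keep the example self-contained: the batch is random synthetic data rather than MNIST, the update rule is plain gradient descent rather than Adam, and the learning rate and initialization scale are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out, batch = 784, 128, 10, 32

# Synthetic batch standing in for MNIST (assumption): random pixels, random labels
x = rng.random((batch, n_in))
y = np.eye(n_out)[rng.integers(0, n_out, size=batch)]

W1, b1 = rng.normal(0, 0.05, (n_in, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(0, 0.05, (n_hidden, n_out)), np.zeros(n_out)

def forward(x):
    z1 = x @ W1 + b1
    h = np.maximum(0.0, z1)                               # ReLU
    logits = h @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    p = p / p.sum(axis=1, keepdims=True)                  # softmax
    return z1, h, p

learning_rate = 0.05
losses = []
for step in range(20):
    z1, h, p = forward(x)
    losses.append(float(-np.mean(np.sum(y * np.log(p + 1e-12), axis=1))))

    # Backward pass: apply the chain rule from the output back to the input
    d_logits = (p - y) / batch        # gradient of softmax + crossentropy
    dW2 = h.T @ d_logits
    db2 = d_logits.sum(axis=0)
    d_h = d_logits @ W2.T
    d_z1 = d_h * (z1 > 0)             # ReLU passes gradient only where z1 > 0
    dW1 = x.T @ d_z1
    db1 = d_z1.sum(axis=0)

    # Plain gradient-descent update (the notebook itself uses Adam)
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The loss on this fixed batch should fall over the 20 steps, which is the same signal the per-epoch loss values in the Keras training log provide at a larger scale.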

**Reasoning**:
The previous code block successfully completed the model training. The next logical step, according to the overall task, is to evaluate the trained model's performance on unseen data (the test set). This will involve using the `evaluate` method of the Keras model.



In [None]:
print("Evaluating the model...")

# Evaluate the model on the test data
loss, accuracy = model.evaluate(x_test, y_test, verbose=0)

print(f"\nTest Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy:.4f}")

print("Model evaluation complete.")
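
The accuracy reported by `evaluate` is the fraction of samples whose highest-probability class matches the true class. A minimal sketch on hypothetical data (the arrays below are made up for illustration and are not model output):

```python
import numpy as np

# Hypothetical softmax outputs for 4 samples and 3 classes
y_prob = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.2, 0.6],
])
# One-hot true labels; the third sample's true class is 0, but the model favors class 1
y_true_one_hot = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
])

predicted = np.argmax(y_prob, axis=1)
actual = np.argmax(y_true_one_hot, axis=1)
accuracy = float(np.mean(predicted == actual))
print(accuracy)  # 0.75
```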

## Visualize Predictions

### Subtask:
Display a few example handwritten digits from the test set along with the model's predicted digit and the actual digit to visually inspect the model's performance. Include legends for clarity.


**Reasoning**:
To visualize the model's performance, I will generate Python code that selects a few random test images, uses the trained model to predict their digits, and then displays each image along with its actual and predicted labels. This requires importing `matplotlib` for plotting and `numpy` for array manipulation.



In [None]:
import matplotlib.pyplot as plt
import numpy as np

print("Visualizing a few predictions...")

# Select a few random indices from the test dataset
num_examples = 9
random_indices = np.random.choice(len(x_test), num_examples, replace=False)

# Predict all selected images in one batched call instead of one call per image
prediction_probabilities = model.predict(x_test[random_indices], verbose=0)
predicted_labels = np.argmax(prediction_probabilities, axis=1)

# Convert the one-hot encoded true labels back to single digits
true_labels = np.argmax(y_test[random_indices], axis=1)

plt.figure(figsize=(10, 10))

for i, index in enumerate(random_indices):
    # Retrieve the original image (reshaping the 1D x_test back to 28x28 for display)
    image = x_test[index].reshape(28, 28)

    true_label = true_labels[i]
    predicted_label = predicted_labels[i]

    # Create a subplot for each example
    plt.subplot(3, 3, i + 1)
    plt.imshow(image, cmap='gray') # Display the image

    # Set the title with Actual and Predicted digits
    color = 'green' if predicted_label == true_label else 'red'
    plt.title(f'Actual: {true_label}\nPredicted: {predicted_label}', color=color)

    # Turn off the axis ticks for cleaner visualization
    plt.axis('off')

plt.tight_layout() # Adjust the layout of the plots to prevent overlap
plt.show()

print("Prediction visualization complete.")

## Final Task

### Subtask:
Summarize the implementation of the ANN with backpropagation for handwritten digit recognition, including the model's architecture, training process, and evaluation results.


## Summary:

### Q&A
The implementation of the Artificial Neural Network (ANN) with backpropagation for handwritten digit recognition involved the following:

*   **Model's Architecture**: The ANN consisted of an input layer of 784 neurons, a hidden layer with 128 neurons using a ReLU activation function, and an output layer with 10 neurons (for 10 classes) using a Softmax activation function. This architecture resulted in a total of 101,770 trainable parameters.
*   **Training Process**:
    *   The model was compiled using the Adam optimizer, 'categorical_crossentropy' as the loss function, and 'accuracy' as the metric.
    *   Training was performed for 10 epochs with a batch size of 128, utilizing a 20% validation split.
    *   During training, the model showed continuous improvement, with training accuracy increasing and loss decreasing.
*   **Evaluation Results**:
    *   After training, the model achieved a Test Loss of 0.0757 and a Test Accuracy of 0.9761 on the unseen test dataset.
    *   Visual inspection of nine randomly chosen test images was consistent with this result: the model identified most digits correctly.
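
The parameter total quoted above follows directly from the layer sizes. A short worked check (the helper name `dense_params` is hypothetical):

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(784, 128)   # 784 * 128 weights + 128 biases
output = dense_params(128, 10)    # 128 * 10 weights + 10 biases
total = hidden + output
print(hidden, output, total)      # 100480 1290 101770
```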

### Data Analysis Key Findings
*   The MNIST dataset was successfully loaded and preprocessed: pixel values were normalized to the [0, 1] range, images were reshaped from (28, 28) to a 1D array of 784 features, and labels were one-hot encoded for 10 classes.
*   An Artificial Neural Network was defined with an input layer (784 neurons), a hidden layer (128 neurons, ReLU activation), and an output layer (10 neurons, Softmax activation), comprising a total of 101,770 trainable parameters.
*   The model was compiled with the Adam optimizer and categorical cross-entropy loss, then trained for 10 epochs with a batch size of 128 and a 20% validation split.
*   Upon evaluation, the trained model demonstrated strong performance on the test set, achieving a Test Loss of approximately 0.0757 and a Test Accuracy of 0.9761.
*   Visualizations of nine random test images were consistent with the measured accuracy, with correct predictions titled in green and incorrect ones in red.

### Insights or Next Steps
*   The ANN model achieved an accuracy of over 97% on the MNIST test set, indicating that it is effective for handwritten digit recognition. Further improvements could involve experimenting with more complex architectures (e.g., adding more hidden layers or increasing the neuron count) or trying different optimizers and learning rates.
*   While the current model performs well, analyzing specific misclassified digits (the 'red' predictions in the visualization) could provide insights into patterns of errors and inform strategies for targeted model enhancement or data augmentation.
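
One possible way to start the error analysis suggested above is to collect the misclassified indices and tally which (true, predicted) digit pairs occur most often. The sketch below uses small hypothetical label arrays; in the notebook they would be assumed to come from `np.argmax(y_test, axis=1)` and `np.argmax(model.predict(x_test), axis=1)`.

```python
import numpy as np

# Hypothetical true and predicted digits for 10 test samples (not real model output)
true_labels = np.array([4, 9, 4, 7, 9, 3, 5, 4, 7, 9])
pred_labels = np.array([4, 4, 4, 7, 4, 3, 3, 9, 7, 9])

# Indices of the misclassified samples
wrong = np.flatnonzero(true_labels != pred_labels)

# Confusion matrix: rows are true digits, columns are predicted digits
confusion = np.zeros((10, 10), dtype=int)
np.add.at(confusion, (true_labels, pred_labels), 1)

# Most frequent off-diagonal (true, predicted) pair
errors = confusion.copy()
np.fill_diagonal(errors, 0)
true_digit, pred_digit = np.unravel_index(np.argmax(errors), errors.shape)

print(wrong.tolist())                                                          # [1, 4, 6, 7]
print(int(true_digit), int(pred_digit), int(errors[true_digit, pred_digit]))  # 9 4 2
```

The indices in `wrong` can be passed to the plotting loop from the visualization step to display only the misclassified digits.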
