<a href="https://colab.research.google.com/github/utkarsh-284/Deep-Learning/blob/main/DNN_using__NumPy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1 align="center">Digit Recognition using DNN (only NumPy)</h1>

# Introduction
Digit recognition is a fundamental problem in computer vision and machine learning, serving as a benchmark for evaluating the effectiveness of various algorithms. The Modified National Institute of Standards and Technology (MNIST) dataset, consisting of 28×28 grayscale images of handwritten digits (0–9), is widely used for training and testing machine learning models. In this project, we implement a **Deep Neural Network (DNN) from scratch using only NumPy**, demonstrating the core principles of neural networks—forward propagation, backpropagation, and gradient descent—without relying on high-level frameworks like TensorFlow or PyTorch.

## Key Objectives:
* **Understand Neural Network Fundamentals:** Implement a multi-layer perceptron (MLP) with input, hidden, and output layers.

* **Hands-on Learning:** Build all components—activation functions (ReLU, Softmax), loss functions (Cross-Entropy), and optimization (Mini-batch Gradient Descent)—using NumPy.

* **Achieve High Accuracy:** Train the model to classify digits with ~95% validation accuracy, validating the correctness of the implementation.

* **Scalability:** Ensure the code can be extended for deeper architectures or other datasets.

## Challenges Addressed:
* **Numerical Stability:** Stabilizing softmax by subtracting the column-wise maximum before exponentiating, and adding a small constant (1e-8) inside the log so the loss never evaluates log(0).

* **Efficient Backpropagation:** Correctly computing gradients for weight updates.

* **Hyperparameter Tuning:** Selecting learning rates, batch sizes, and initialization (He initialization) for optimal convergence.

# Importing libraries

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Loading and Exploring data

In [None]:
train_df = pd.read_csv('train.csv')

train_df.head()

Unnamed: 0,label,pixel0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,...,pixel774,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783
0,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,4,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [None]:
print(f"Training data shape: {train_df.shape}")

Training data shape: (42000, 785)


**Splitting data into train and validation sets:**

In [None]:
X = train_df.drop("label", axis=1)
y = train_df["label"]

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2,
                                                      random_state=42)

print(f"X_train shape: {X_train.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"X_valid shape: {X_valid.shape}")
print(f"y_valid shape: {y_valid.shape}")

X_train shape: (33600, 784)
y_train shape: (33600,)
X_valid shape: (8400, 784)
y_valid shape: (8400,)


**Converting the data to NumPy arrays and scaling pixel values to [0, 1]:**

In [None]:
X_train = np.array(X_train) / 255.  # Scale pixel values from [0, 255] to [0, 1]
y_train = np.array(y_train)
X_valid = np.array(X_valid) / 255.
y_valid = np.array(y_valid)
m, n = X_train.shape

print(f"Training labels shape: {y_train.shape}")
print(f"Training data shape: {X_train.shape}")
print(f"m: {m}, n: {n}")

Training labels shape: (33600,)
Training data shape: (33600, 784)
m: 33600, n: 784


# Defining Functions for DNN

**Since the network outputs 10 classes, one per digit, the labels must be one-hot encoded (converted to dummy variables).**

In [None]:
def one_hot(y, num_class = 10):
    return np.eye(num_class, dtype=np.float32)[y]

y_train = one_hot(y_train.astype(int))
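
As a small illustration of what `one_hot` returns, the sketch below encodes three made-up labels (the `labels` array is hypothetical, not taken from the dataset) and shows that `argmax` along the class axis recovers them.

```python
import numpy as np

def one_hot(y, num_class=10):
    # Row y[i] of the identity matrix is the one-hot vector for label y[i]
    return np.eye(num_class, dtype=np.float32)[y]

labels = np.array([3, 0, 9])      # hypothetical labels, for illustration only
encoded = one_hot(labels)

print(encoded.shape)              # one row per label, one column per class
print(encoded[0])                 # a single 1.0 at index 3, zeros elsewhere

# argmax along the class axis inverts the encoding
recovered = np.argmax(encoded, axis=1)
print(recovered)
```

Indexing an identity matrix with an integer array avoids an explicit loop, which is why the labels are cast with `astype(int)` before the call.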

**We use He initialization for the weights, which is designed for ReLU activations and helps training converge faster.**

In [None]:
def init_param():
    np.random.seed(1)
    W1 = np.random.randn(128, 784).astype(np.float32) * np.sqrt(2. / 784)
    b1 = np.zeros((128, 1), dtype = np.float32)
    W2 = np.random.randn(64, 128).astype(np.float32) * np.sqrt(2. / 128)
    b2 = np.zeros((64, 1), dtype=np.float32)
    W3 = np.random.randn(10, 64).astype(np.float32) * np.sqrt(2. / 64)
    b3 = np.zeros((10, 1), dtype=np.float32)
    return {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2, 'W3': W3, 'b3': b3}
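
To see why the `sqrt(2 / fan_in)` factor matters, here is a minimal sketch that compares He-scaled weights with an arbitrarily small scale of 0.01. It assumes zero-mean, unit-variance stand-in inputs rather than the real pixel data, and the names `W_he`, `W_small`, and the seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)              # arbitrary seed for this sketch
fan_in, fan_out, samples = 784, 128, 1000

def relu(Z):
    return np.maximum(0, Z)

# Stand-in inputs with unit variance (an assumption; real pixels are in [0, 1])
X = rng.standard_normal((fan_in, samples))

W_he = rng.standard_normal((fan_out, fan_in)) * np.sqrt(2.0 / fan_in)
W_small = rng.standard_normal((fan_out, fan_in)) * 0.01

ms_in = np.mean(X ** 2)                      # mean square of the inputs
ms_he = np.mean(relu(W_he @ X) ** 2)         # stays close to the input scale
ms_small = np.mean(relu(W_small @ X) ** 2)   # shrinks sharply after one layer

print(f"input: {ms_in:.3f}, He: {ms_he:.3f}, small init: {ms_small:.3f}")
```

Under these assumptions the He-scaled layer keeps the mean squared activation near that of its input, while the smaller scale shrinks it; stacked over several layers, that shrinkage is what slows learning.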

**Defining the activation functions and the ReLU derivative for forward and backward propagation:**

In [None]:
def relu(Z):
    return np.maximum(0, Z)

def relu_deriv(Z):
    return (Z > 0).astype(np.float32)

def softmax(Z):
    exp_Z = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return exp_Z / np.sum(exp_Z, axis=0, keepdims=True)
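
The max-subtraction in `softmax` is there for numerical stability. The sketch below contrasts it with a direct implementation on deliberately large logits; `naive_softmax` and `stable_softmax` are hypothetical names used only for this comparison.

```python
import numpy as np

def naive_softmax(Z):
    # Direct formula: np.exp overflows for large logits
    e = np.exp(Z)
    return e / np.sum(e, axis=0, keepdims=True)

def stable_softmax(Z):
    # Subtracting the column-wise max does not change the result
    # mathematically, but keeps every exponent <= 0
    e = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return e / np.sum(e, axis=0, keepdims=True)

logits = np.array([[1000.0], [1001.0], [1002.0]])  # one sample, three classes

with np.errstate(over="ignore", invalid="ignore"):
    naive = naive_softmax(logits)    # exp(1000) is inf, and inf / inf is nan

stable = stable_softmax(logits)

print("naive: ", naive.ravel())
print("stable:", stable.ravel())
```

Because softmax depends only on differences between logits, shifting each column by its maximum leaves the probabilities unchanged.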

**Forward Propagation:**

In [None]:
def forward_prop(X, params):
    X = X.T

    # Layer 1: Z1 = W1 * X + b1
    Z1 = np.dot(params['W1'], X) + params['b1']
    A1 = relu(Z1)

    # Layer 2: Z2 = W2 * A1 + b2
    Z2 = np.dot(params['W2'], A1) + params['b2']
    A2 = relu(Z2)

    # Layer 3: Z3 = W3 * A2 + b3
    Z3 = np.dot(params['W3'], A2) + params['b3']
    A3 = softmax(Z3)

    return {
        'Z1': Z1, 'A1': A1,
        'Z2': Z2, 'A2': A2,
        'Z3': Z3, 'A3': A3,
        'X': X
    }
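
A quick way to check the forward pass is to trace array shapes through the three layers. The sketch below uses random stand-in weights and a hypothetical mini-batch of 5 samples; only the shapes are meaningful, not the values.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = 5                                    # hypothetical mini-batch size

def relu(Z):
    return np.maximum(0, Z)

def softmax(Z):
    exp_Z = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return exp_Z / np.sum(exp_Z, axis=0, keepdims=True)

# Random stand-ins for the real parameters; only the shapes matter here
W1, b1 = rng.standard_normal((128, 784)) * 0.05, np.zeros((128, 1))
W2, b2 = rng.standard_normal((64, 128)) * 0.1, np.zeros((64, 1))
W3, b3 = rng.standard_normal((10, 64)) * 0.1, np.zeros((10, 1))

X = rng.random((batch, 784))                 # rows are samples, as in X_train
A0 = X.T                                     # (784, 5): columns are samples

A1 = relu(W1 @ A0 + b1)                      # (128, 5); b1 broadcasts over columns
A2 = relu(W2 @ A1 + b2)                      # (64, 5)
A3 = softmax(W3 @ A2 + b3)                   # (10, 5); each column sums to 1

for name, arr in [("A0", A0), ("A1", A1), ("A2", A2), ("A3", A3)]:
    print(name, arr.shape)
```

Transposing the input once means every layer works with one column per sample, so the `(units, 1)` bias vectors broadcast across the batch without reshaping.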

**Loss Function:**<br>
Using categorical **Cross-Entropy Loss** rather than a binary loss function, since the model must distinguish 10 classes.

In [None]:
def cross_entropy(y_pred, y_true):
    m = y_true.shape[0]  # number of samples; y_true is (m, 10), y_pred is (10, m)
    log_probs = np.log(y_pred.T + 1e-8)
    loss = -np.sum(y_true * log_probs) / m
    return loss
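
A small worked example may help here. The sketch below evaluates the loss on two made-up samples with three classes and compares it with the hand-computed value; it also shows what the `1e-8` term does when the predicted probability of the true class is exactly zero. All arrays are hypothetical.

```python
import numpy as np

def cross_entropy(y_pred, y_true):
    m = y_true.shape[0]
    log_probs = np.log(y_pred.T + 1e-8)
    return -np.sum(y_true * log_probs) / m

# y_pred is (classes, samples); y_true is (samples, classes)
y_pred = np.array([[0.7, 0.1],
                   [0.2, 0.8],
                   [0.1, 0.1]])
y_true = np.array([[1, 0, 0],
                   [0, 1, 0]], dtype=np.float32)

loss = cross_entropy(y_pred, y_true)

# Only the true-class probabilities contribute: 0.7 and 0.8
by_hand = -(np.log(0.7) + np.log(0.8)) / 2
print(f"loss: {loss:.4f}, by hand: {by_hand:.4f}")

# A prediction that assigns probability 0 to the true class:
# without the epsilon the loss would be infinite
worst = cross_entropy(np.array([[0.0], [1.0], [0.0]]),
                      np.array([[1, 0, 0]], dtype=np.float32))
print(f"worst case: {worst:.2f}")            # -log(1e-8), finite
```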

**Backward Propagation:**

In [None]:
def backward_prop(y_true, params, cache):
    m = y_true.shape[0]  # number of training instances in the batch

    # Output layer gradients
    dZ3 = cache['A3'] - y_true.T
    dW3 = np.dot(dZ3, cache['A2'].T) / m
    db3 = np.sum(dZ3, axis=1, keepdims=True) / m

    # Hidden layer 2 gradients
    dA2 = np.dot(params['W3'].T, dZ3)
    dZ2 = dA2 * relu_deriv(cache['Z2'])
    dW2 = np.dot(dZ2, cache['A1'].T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m

    # Hidden layer 1 gradients
    dA1 = np.dot(params['W2'].T, dZ2)
    dZ1 = dA1 * relu_deriv(cache['Z1'])
    dW1 = np.dot(dZ1, cache['X'].T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m

    return {'dW3': dW3, 'db3': db3,
            'dW2': dW2, 'db2': db2,
            'dW1': dW1, 'db1': db1}
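
The output-layer gradient relies on the identity that, for softmax followed by cross-entropy, the gradient with respect to `Z3` is `A3 - Y`. One way to gain confidence in it is a finite-difference check, sketched below on a tiny hypothetical problem (4 classes, 3 samples) in float64. Here the `1 / m` factor is folded into the analytic gradient, whereas `backward_prop` applies it when forming `dW3` and `db3`.

```python
import numpy as np

def softmax(Z):
    e = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return e / np.sum(e, axis=0, keepdims=True)

def loss_fn(Z, Y):
    # Mean cross-entropy; Z is (classes, samples), Y is (samples, classes)
    m = Y.shape[0]
    return -np.sum(Y * np.log(softmax(Z).T + 1e-8)) / m

rng = np.random.default_rng(0)
classes, m = 4, 3                            # tiny sizes, for illustration only
Z = rng.standard_normal((classes, m))
Y = np.eye(classes)[rng.integers(0, classes, size=m)]

# Analytic gradient of the mean loss with respect to Z
analytic = (softmax(Z) - Y.T) / m

# Central finite differences, one entry of Z at a time
numeric = np.zeros_like(Z)
h = 1e-5
for i in range(classes):
    for j in range(m):
        Z_plus, Z_minus = Z.copy(), Z.copy()
        Z_plus[i, j] += h
        Z_minus[i, j] -= h
        numeric[i, j] = (loss_fn(Z_plus, Y) - loss_fn(Z_minus, Y)) / (2 * h)

max_diff = np.max(np.abs(analytic - numeric))
print(f"max abs difference: {max_diff:.2e}")
```

A difference several orders of magnitude below the gradient values themselves suggests the analytic formula matches the loss; the same loop could in principle be pointed at individual weights of the full network.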

**Updating Parameters:**

In [None]:
def update_params(params, grads, lr=0.01):
    params['W1'] -= lr * grads['dW1']
    params['b1'] -= lr * grads['db1']
    params['W2'] -= lr * grads['dW2']
    params['b2'] -= lr * grads['db2']
    params['W3'] -= lr * grads['dW3']
    params['b3'] -= lr * grads['db3']
    return params

**Defining the training loop: shuffle the data each epoch, then run forward propagation, backpropagation, and a parameter update for every mini-batch:**

In [None]:
def train(X, y, epochs=10, batch_size=64, lr=0.01):
    params = init_param()
    n = X.shape[0]

    for epoch in range(epochs):
        # Shuffle Data
        perm = np.random.permutation(n)
        X_shuffled = X[perm]
        y_shuffled = y[perm]

        epoch_loss = 0
        batches = 0

        for i in range(0, n, batch_size):
            # Mini batch
            X_batch = X_shuffled[i : i + batch_size]
            y_batch = y_shuffled[i : i + batch_size]

            # Forward propagation
            cache = forward_prop(X_batch, params)
            loss = cross_entropy(cache['A3'], y_batch)
            epoch_loss += loss

            # Backward propagation
            grads = backward_prop(y_batch, params, cache)

            # Update Parameters
            params = update_params(params, grads, lr)
            batches += 1

        print(f"Epoch {epoch+1} / {epochs}, Loss: {epoch_loss/batches:.4f}")

    return params
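
The shuffling and slicing inside `train` can be isolated into a generator, sketched below on toy data. `iterate_minibatches` is a hypothetical helper, not part of the notebook; it reproduces the same permutation-then-slice pattern and makes it easy to confirm that every sample is visited exactly once per epoch.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs covering every sample once."""
    n = X.shape[0]
    perm = rng.permutation(n)
    for start in range(0, n, batch_size):
        idx = perm[start:start + batch_size]
        yield X[idx], y[idx]

rng = np.random.default_rng(0)
X = np.arange(10).reshape(10, 1)             # toy data: 10 samples, 1 feature
y = np.arange(10)

batches = list(iterate_minibatches(X, y, batch_size=4, rng=rng))

sizes = [len(yb) for _, yb in batches]       # the last batch may be smaller
seen = np.sort(np.concatenate([yb for _, yb in batches]))
aligned = all((xb[:, 0] == yb).all() for xb, yb in batches)

print(sizes, seen, aligned)
```

With the 33,600 training samples and `batch_size=64` used here, the split is exact: each epoch performs 525 parameter updates.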

**Defining functions for prediction and evaluating the model's performance:**

In [None]:
def predict(X, params):
    cache = forward_prop(X, params)
    return np.argmax(cache['A3'], axis=0)

def accuracy(y_pred, y_true):
    return np.mean(y_pred == y_true)
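
Overall accuracy hides which digits are confused with which. As a possible complement, the sketch below builds a confusion matrix in plain NumPy; `confusion_matrix` is a hypothetical helper and the labels are made up, with three classes to keep the output small.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_class=10):
    # cm[i, j] counts samples whose true class is i and predicted class is j
    cm = np.zeros((num_class, num_class), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)       # accumulates repeated index pairs
    return cm

y_true = np.array([0, 1, 2, 2, 1, 0])        # hypothetical labels
y_pred = np.array([0, 2, 2, 2, 1, 0])        # one class-1 sample predicted as 2

cm = confusion_matrix(y_true, y_pred, num_class=3)

overall = np.trace(cm) / cm.sum()            # same value as accuracy()
per_class = np.diag(cm) / cm.sum(axis=1)     # fraction correct for each true class

print(cm)
print(f"overall: {overall:.3f}, per class: {per_class}")
```

`np.add.at` is used instead of `cm[y_true, y_pred] += 1` because the latter counts each repeated index pair only once.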

# Training & Evaluating

In [None]:
# Train Parameters
trained_params = train(X_train, y_train, epochs=20)

Epoch 1 / 20, Loss: 1.1287
Epoch 2 / 20, Loss: 0.4378
Epoch 3 / 20, Loss: 0.3492
Epoch 4 / 20, Loss: 0.3085
Epoch 5 / 20, Loss: 0.2817
Epoch 6 / 20, Loss: 0.2614
Epoch 7 / 20, Loss: 0.2442
Epoch 8 / 20, Loss: 0.2305
Epoch 9 / 20, Loss: 0.2179
Epoch 10 / 20, Loss: 0.2068
Epoch 11 / 20, Loss: 0.1969
Epoch 12 / 20, Loss: 0.1879
Epoch 13 / 20, Loss: 0.1798
Epoch 14 / 20, Loss: 0.1718
Epoch 15 / 20, Loss: 0.1653
Epoch 16 / 20, Loss: 0.1582
Epoch 17 / 20, Loss: 0.1527
Epoch 18 / 20, Loss: 0.1470
Epoch 19 / 20, Loss: 0.1418
Epoch 20 / 20, Loss: 0.1366


In [None]:
# Validation accuracy
y_pred_valid = predict(X_valid, trained_params)
acc = accuracy(y_pred_valid, y_valid)
print(f"Validation accuracy: {acc * 100:.2f}%")

Validation accuracy: 95.07%


# Conclusion
The project implements a **3-layer DNN (128-64-10 architecture)** using only NumPy and reaches **~95% validation accuracy** (95.07% after 20 epochs) on the MNIST dataset. The steadily decreasing training loss and the validation score indicate that forward propagation, backpropagation, and the parameter updates work together as intended.

## Key Achievements:
* **NumPy Proficiency:** The implementation reinforces understanding of matrix operations, gradient computations, and neural network mechanics.

* **Performance:** The model's accuracy (~95%) is competitive with basic implementations using high-level frameworks, which supports the correctness of the custom implementation.

* **Modular Design:** Functions for initialization, activation, loss, and training are decoupled, making the code reusable and extensible.


## Future Work:
* **Hyperparameter Optimization:**
 * Experiment with learning rate schedules (e.g., exponential decay).

 * Adjust batch sizes and network depth/width for better performance.

* **Advanced Techniques:**

 * Add **Batch Normalization** to accelerate training.

 * Implement **L2 Regularization** or **Dropout** to reduce overfitting.

* **Architecture Improvements:**

 * Replace ReLU with **LeakyReLU** or **Swish** for potential accuracy gains.

 * Extend to **Convolutional Neural Networks (CNNs)** for spatial feature learning.

* **Deployment:**

 * Convert the model to ONNX/TFLite for edge device deployment.

 * Build a web interface for real-time digit recognition.
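
Three of the ideas above are small enough to sketch directly: an exponential learning-rate decay, LeakyReLU with its derivative, and the extra gradient term from L2 regularization. None of these were run in this project; the function names and the default values (`decay_rate=0.95`, `alpha=0.01`) are assumptions chosen for illustration.

```python
import numpy as np

def exp_decay_lr(lr0, epoch, decay_rate=0.95):
    # Learning rate shrinks by a constant factor every epoch
    return lr0 * decay_rate ** epoch

def leaky_relu(Z, alpha=0.01):
    # Like ReLU, but negative inputs keep a small slope alpha
    return np.where(Z > 0, Z, alpha * Z)

def leaky_relu_deriv(Z, alpha=0.01):
    return np.where(Z > 0, 1.0, alpha)

def add_l2_grad(dW, W, lam, m):
    # Gradient of the penalty (lam / (2 * m)) * sum(W ** 2) added to dW
    return dW + (lam / m) * W

lrs = [exp_decay_lr(0.01, e) for e in range(3)]
print(lrs)

Z = np.array([-2.0, 0.5, 3.0])
print(leaky_relu(Z), leaky_relu_deriv(Z))

W = np.array([[1.0, -2.0]])
dW = np.zeros_like(W)
print(add_l2_grad(dW, W, lam=0.1, m=10))     # pulls each weight toward zero
```

If adopted, `exp_decay_lr` would be called once per epoch inside `train`, the LeakyReLU pair would replace `relu` and `relu_deriv` in both passes, and `add_l2_grad` would be applied to the weight gradients only, not the biases.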

## Final Thoughts:
This project serves as a strong foundation for deeper exploration into deep learning. By reimplementing core algorithms from scratch, we gain insights often abstracted away by high-level libraries. Future enhancements could bridge the gap between educational implementation and production-ready systems.

## Contributor
**Utkarsh Bhardwaj**  
[![LinkedIn](https://img.shields.io/badge/LinkedIn-Utkarsh284-blue)](https://www.linkedin.com/in/utkarsh284/)
[![GitHub](https://img.shields.io/badge/GitHub-utkarsh--284-lightgrey)](https://github.com/utkarsh-284)  
**Contact**: ubhardwaj284@gmail.com  
**Publish Date**: 8th June, 2025  
