* Made in Windows-10
* Version of python used: Python 3.12.3
* Currently there is ***no open-source project of sign calculator. This is first of it's kind.***
* Please enable internet access so that the images in the notebook can load properly.
* Block Camera by Hand to stop the App & Evaluate the answer on screen
* Do not let your face come in the camera screen, otherwise auto brightness/contrast will over-brighten the fingers, giving poor accuracy

# Project Technical Description: Sign Calculator

## Introduction
I have developed an innovative sign calculator that utilizes hand gestures to input numerical digits and arithmetic operators. The system employs a webcam to capture hand signs, processes these inputs, and evaluates the resulting mathematical expression. The final computed result is displayed on the screen when the user places their hand over the webcam, causing it to stop capturing.

### A Demo of the APP in Action  

<img src="https://raw.githubusercontent.com/souvikcseiitk/sign-calculator/main/extras/demo.gif" alt="Animation" width="600" height="338" />


### My Project is on PS No. 4: Enhancing Accessibility for Students with Disabilities 
In addressing the issue that not all learning materials are accessible to students with disabilities, this sign calculator project offers a promising solution. Traditional educational tools often fail to accommodate the diverse needs of students with disabilities. By incorporating an inclusive learning solution that leverages hand signs for input, this project caters to students with limited mobility or dexterity who may find using keyboards or touchscreens challenging. This sign calculator can be integrated with text-to-speech systems to provide auditory feedback, ensuring that visually impaired students can also benefit from the tool. Furthermore, the project can be expanded to include closed captions for instructional videos and alternative learning formats, making mathematical learning more accessible. This holistic approach not only facilitates better engagement and understanding for students with disabilities but also fosters an inclusive educational environment where all students have the opportunity to succeed.

### Accessibility Benefits for Disabled Individuals
This sign calculator project can significantly enhance accessibility for individuals with disabilities by providing an intuitive and interactive way to perform mathematical operations. For those with limited mobility or dexterity, traditional input methods such as keyboards and touchscreens can be challenging. This project allows users to input numbers and operators through hand signs, which can be especially beneficial for individuals with motor impairments. By leveraging sign language and hand gestures, users can interact with the calculator more comfortably and efficiently, reducing the physical strain of conventional input methods. Additionally, the visual feedback and real-time evaluation of expressions ensure a seamless and user-friendly experience, making technology more inclusive and accessible for everyone.


### Signs and their meanings
<img src="https://raw.githubusercontent.com/souvikcseiitk/sign-calculator/main/extras/signs%20meaning.png" alt="Animation" width="600" height="338" />

## Key Features
1. **Hand Sign Input**: The calculator takes numerical digits (0-5) and arithmetic operators (+, -, *, /) as inputs through hand signs.
2. **Expression Evaluation**: The system evaluates the captured expression and displays the result on the screen.
3. **Webcam Integration**: The webcam captures the hand signs, and the process halts when the user places their hand on the webcam, making the screen dark. This triggers the evaluation and display of the computed output.
4. **Dataset**: The model is trained using a Kaggle dataset specifically designed for hand sign recognition.
5. **Model Scope**: Currently, the model is trained to recognize digits 0 to 5 and operators "+, -, *, /".
6. **Usage Precaution**: Users are advised to keep their face away from the webcam during prediction to prevent auto brightness/contrast adjustments that could interfere with finger visibility.
7. **Notebook Structure**: 
   - **Cell One**: Training the dataset.
   - **Cell Two**: Testing on the test dataset.
   - **Cell Three**: Capturing hand signs using the webcam and displaying the evaluated answer. The camera stops capturing once the user places their hand over it, making the screen dark, and immediately evaluates and displays the expression.

## Model Description
### Architecture
The model employed is a Convolutional Neural Network (CNN) designed to recognize hand signs. The architecture consists of multiple layers to effectively extract and learn features from the input images.

1. **Convolutional Layers**: Three convolutional layers with ReLU activation are used to capture spatial hierarchies in the input images.
   - First layer: 32 filters, kernel size (3, 3)
   - Second layer: 64 filters, kernel size (3, 3)
   - Third layer: 128 filters, kernel size (3, 3)
   
2. **Pooling Layers**: MaxPooling layers follow each convolutional layer to reduce the spatial dimensions and retain essential features.
   - Pooling size: (2, 2)
   
3. **Dropout Layers**: Dropout is used after each pooling layer to prevent overfitting by randomly dropping units during training.
   - Dropout rate: 0.25 after first two pooling layers, 0.5 after the third pooling layer and fully connected layer.
   
4. **Fully Connected Layers**: Flattening the output from the last pooling layer, the model includes a dense layer to learn complex features.
   - Dense layer: 512 units, ReLU activation
   
5. **Output Layer**: A softmax layer for classification into 10 classes (0-5 digits and 4 operators).

### Training
The model is trained using the categorical cross-entropy loss function and the Adam optimizer, which adapts the learning rate during training. The training process includes data augmentation techniques to enhance the robustness of the model.

### Evaluation
The trained model is evaluated on a test dataset to assess its accuracy and generalization capability. The accuracy score provides an indication of the model’s performance on unseen data.

### Handling Wrong/Invalid Expressions   
<img src= "https://raw.githubusercontent.com/souvikcseiitk/sign-calculator/main/extras/demo_fail.gif" alt="Animation" width="600" height="338" />

### Deployment
The real-time deployment involves capturing hand signs through the webcam. The captured frames are processed and fed into the trained model to predict the corresponding digit or operator. The predictions are accumulated to form a mathematical expression, which is evaluated and displayed when the webcam capture is halted.


## Conclusion
This sign calculator project demonstrates the practical application of CNNs in real-time hand sign recognition and expression evaluation. The integration of machine learning, computer vision, and real-time processing offers a unique and interactive approach to performing arithmetic calculations. With further enhancements and training on a more comprehensive dataset, the model can be extended to recognize a wider range of digits and operators.


## Cell-A
Training the model

In [1]:
import os
import numpy as np
import cv2
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

def load_data(data_dir):
    images = []
    labels = []
    label_mapping = {
        '0': 0, '1': 1, '2': 2, '3': 3, '4': 4, '5': 5,
        'ADD': 6, 'MUL': 7, 'SUB': 8, 'DIV': 9
    }
    for folder_name, label in label_mapping.items():
        label_dir = os.path.join(data_dir, folder_name)
        for filename in os.listdir(label_dir):
            if filename.endswith(".png") or filename.endswith(".jpg"):
                img_path = os.path.join(label_dir, filename)
                img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
                _, img = cv2.threshold(img, 120, 255, cv2.THRESH_BINARY)
                img = cv2.resize(img, (64, 64))  # resize to 64x64 for uniformity
                images.append(img)
                labels.append(label)
    images = np.array(images)
    labels = np.array(labels)
    return images, labels

data_dir = r'data\train'
images, labels = load_data(data_dir)
images = images.reshape(-1, 64, 64, 1)  # reshaping for CNN input
images = images / 255.0  # Normalize pixel values to [0, 1]
labels = to_categorical(labels, num_classes=10)  # Adjust num_classes to 10

X_train, X_test, y_train, y_test = train_test_split(images, labels, test_size=0.2, random_state=42)

# Data augmentation
datagen = ImageDataGenerator(
    rotation_range=15,
    zoom_range=0.2,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True
)
datagen.fit(X_train)

# Initializing the CNN
model = Sequential()

# First convolution layer and pooling
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# Second convolution layer and pooling
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# Third convolution layer and pooling
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# Flattening the layers
model.add(Flatten())

# Adding a fully connected layer
model.add(Dense(units=512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(units=10, activation='softmax'))  # softmax for 10 classes

# Compiling the CNN
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Adding debug information
print("Training data shape:", X_train.shape)
print("Validation data shape:", X_test.shape)
print("Training labels shape:", y_train.shape)
print("Validation labels shape:", y_test.shape)

# Adding callbacks
checkpoint = ModelCheckpoint('best_model.keras', monitor='val_accuracy', save_best_only=True, mode='max')
early_stopping = EarlyStopping(monitor='val_accuracy', patience=5, mode='max')

# Training the model
try:
    history = model.fit(
        datagen.flow(X_train, y_train, batch_size=32),
        steps_per_epoch=len(X_train) // 32,
        epochs=30,
        validation_data=(X_test, y_test),
        callbacks=[checkpoint, early_stopping]
    )
    # Saving the final model
    model.save('hand_sign_model.keras')
except Exception as e:
    print(f"Error during training: {e}")


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


Training data shape: (20545, 64, 64, 1)
Validation data shape: (5137, 64, 64, 1)
Training labels shape: (20545, 10)
Validation labels shape: (5137, 10)
Epoch 1/30


  self._warn_if_super_not_called()


[1m642/642[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m68s[0m 102ms/step - accuracy: 0.4351 - loss: 1.4405 - val_accuracy: 0.9173 - val_loss: 0.2413
Epoch 2/30
[1m  1/642[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1:03[0m 99ms/step - accuracy: 0.7500 - loss: 0.5722

  self.gen.throw(value)


[1m642/642[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 4ms/step - accuracy: 0.7500 - loss: 0.5722 - val_accuracy: 0.9188 - val_loss: 0.2406
Epoch 3/30
[1m642/642[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m61s[0m 94ms/step - accuracy: 0.8196 - loss: 0.4668 - val_accuracy: 0.9422 - val_loss: 0.1492
Epoch 4/30
[1m642/642[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 4ms/step - accuracy: 0.9375 - loss: 0.3002 - val_accuracy: 0.9330 - val_loss: 0.1644
Epoch 5/30
[1m642/642[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m60s[0m 94ms/step - accuracy: 0.8809 - loss: 0.3193 - val_accuracy: 0.9669 - val_loss: 0.0895
Epoch 6/30
[1m642/642[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 4ms/step - accuracy: 0.9375 - loss: 0.1637 - val_accuracy: 0.9665 - val_loss: 0.0891
Epoch 7/30
[1m642/642[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m60s[0m 93ms/step - accuracy: 0.9006 - loss: 0.2662 - val_accuracy: 0.9763 - val_loss: 0.0735
Epoch 8/30
[1m642/642[0m [32m━

## Cell-B
Testing the Model

In [4]:
import os
import numpy as np
import cv2
import tensorflow as tf
from sklearn.metrics import accuracy_score

# Load the trained model
model = tf.keras.models.load_model('hand_sign_model.keras')

def preprocess_image(img_path):
    """Preprocess the image for prediction."""
    # Load the image in grayscale
    img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    
    # Apply thresholding to binary
    _, img = cv2.threshold(img, 120, 255, cv2.THRESH_BINARY)
    
    # Resize the image to the size expected by the model
    img = cv2.resize(img, (64, 64))
    
    # Normalize the image to [0, 1]
    img = img / 255.0
    
    # Reshape for the model
    img = img.reshape(1, 64, 64, 1)
    
    return img

def predict_hand_sign(img_path):
    """Predict the hand sign for the given image path."""
    img = preprocess_image(img_path)
    prediction = model.predict(img)
    predicted_class = np.argmax(prediction)
    return predicted_class

def evaluate_model(test_data_dir):
    """Evaluate the model on the test dataset."""
    y_true = []
    y_pred = []

    # Define label mapping based on folder names
    label_mapping = {
        '0': 0, '1': 1, '2': 2, '3': 3, '4': 4, '5': 5,
        'ADD': 6, 'MUL': 7, 'SUB': 8, 'DIV': 9
    }
    
    # Reverse mapping for easy lookup
    reverse_label_mapping = {v: k for k, v in label_mapping.items()}

    # Iterate through each folder in the test data directory
    for label in os.listdir(test_data_dir):
        label_dir = os.path.join(test_data_dir, label)
        if os.path.isdir(label_dir):
            # Check if the label is in the mapping
            if label in label_mapping:
                true_label = label_mapping[label]
                # Iterate through each image file in the folder
                for filename in os.listdir(label_dir):
                    if filename.endswith(".png") or filename.endswith(".jpg"):
                        img_path = os.path.join(label_dir, filename)
                        
                        # Predict the label
                        predicted_class = predict_hand_sign(img_path)
                        
                        # Append true and predicted labels
                        y_true.append(true_label)
                        y_pred.append(predicted_class)
    
    # Calculate accuracy
    accuracy = accuracy_score(y_true, y_pred)
    print(f"Model accuracy on test data: {accuracy * 100:.2f}%")

# Path to the test data directory
test_data_dir = r'data\test'

# Evaluate the model
evaluate_model(test_data_dir)


[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 283ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 28ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 26ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 29ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 28ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 30ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 32ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 33ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 33ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 26ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 30ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 30ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 32ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3

## Cell-C  [The APP]
* Using the Model for Real-Time Numeric Hand Sign Recognition & Real Time On-Screen Expression Evaluation.
* Block Camera by Hand to stop the App & Evaluate the answer on screen
* Do not let your face come in the camera screen, otherwise auto brightness/contrast will over-brighten the fingers, giving poor accuracy


In [1]:
import cv2
import numpy as np
import tensorflow as tf
from collections import deque
from time import time, sleep
import re

# Load the trained model
model = tf.keras.models.load_model('hand_sign_model.keras')

def preprocess_image(img):
    """Preprocess the image for prediction."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 120, 255, cv2.THRESH_BINARY)
    resized = cv2.resize(binary, (64, 64))
    normalized = resized / 255.0
    reshaped = normalized.reshape(1, 64, 64, 1)
    return reshaped

def predict_hand_sign(img):
    """Predict the hand sign for the given image."""
    preprocessed_img = preprocess_image(img)
    prediction = model.predict(preprocessed_img)
    predicted_class = np.argmax(prediction)
    return predicted_class

def is_frame_black(frame, threshold=20):
    """Check if the frame is almost black."""
    return np.mean(frame) < threshold

def map_prediction_to_symbol(predicted_class):
    """Map the predicted class to a corresponding symbol."""
    symbol_map = {
        0: '0',
        1: '1',
        2: '2',
        3: '3',
        4: '4',
        5: '5',
        6: '+',
        7: '*',
        8: '-',
        9: '/'
    }
    return symbol_map.get(predicted_class, '?')  # Return '?' if the class is unknown

def preprocess_expression(expression):
    """Preprocess the expression to remove leading zeros from numbers."""
    def remove_leading_zeros(match):
        number = match.group()
        return str(int(number))
    processed_expression = re.sub(r'\b0*(\d+)\b', remove_leading_zeros, expression)
    return processed_expression

def evaluate_expression(expression):
    """Evaluate the given mathematical expression string."""
    try:
        processed_expression = preprocess_expression(expression)
        result = eval(processed_expression)
        return result
    except Exception as e:
        return f"Illegal expression: '{expression}'"

# Initialize webcam
cap = cv2.VideoCapture(0)

# Define the box coordinates (start and end points)
box_start = (100, 100)
box_end = (300, 300)

# Initialize a deque to store predictions
predictions = deque(maxlen=150)  # Increase the max length to accumulate more predictions

# Initialize a timer
start_time = time()

# Initialize an empty string to store predicted digits
predicted_digits = ""

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Check if the frame is black
    if is_frame_black(frame):
        break

    # Draw the ROI box
    cv2.rectangle(frame, box_start, box_end, (0, 255, 0), 2)
    
    # Extract the ROI
    roi = frame[box_start[1]:box_end[1], box_start[0]:box_end[0]]
    
    # Predict the hand sign
    hand_sign = predict_hand_sign(roi)
    symbol = map_prediction_to_symbol(hand_sign)
    
    # Append the prediction to the deque
    predictions.append(symbol)
    
    # If 5 seconds have passed, determine the most frequent prediction
    if time() - start_time > 10:
        most_frequent_prediction = max(set(predictions), key=predictions.count)
        predicted_digits += most_frequent_prediction
        
        # Reset the deque and timer
        predictions.clear()
        start_time = time()
        
    # Display the thresholded image inside the ROI box
    thresholded_roi = preprocess_image(roi).reshape(64, 64) * 255
    thresholded_roi = np.uint8(thresholded_roi)
    thresholded_roi = cv2.resize(thresholded_roi, (box_end[0] - box_start[0], box_end[1] - box_start[1]))
    thresholded_roi_colored = cv2.cvtColor(thresholded_roi, cv2.COLOR_GRAY2BGR)
    frame[box_start[1]:box_end[1], box_start[0]:box_end[0]] = thresholded_roi_colored

    # Display the predicted digits string on the screen
    cv2.putText(frame, f"Predicted digits: {predicted_digits}", (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2, cv2.LINE_AA)

    # Show the frame
    cv2.imshow('Hand Sign Recognition', frame)

    # Wait a bit to slow down the loop
    if cv2.waitKey(300) & 0xFF == ord('q'):  # Slight delay to slow down the loop
        break

# Store the final predicted digits string in a global variable
final_predicted_digits = predicted_digits

# Evaluate the final predicted digits
calculated_value = evaluate_expression(final_predicted_digits)

# Create a blank frame to display the final result
result_frame = np.zeros((480, 640, 3), dtype=np.uint8)

# Display the final predicted digits and the calculated value
cv2.putText(result_frame, f"Expression: {final_predicted_digits}", (50, 200), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2, cv2.LINE_AA)
cv2.putText(result_frame, f"Result: {calculated_value}", (50, 300), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2, cv2.LINE_AA)

# Show the result frame
cv2.imshow('Result', result_frame)
cv2.waitKey(5000)  # Wait for 5 seconds

cap.release()
cv2.destroyAllWindows()


[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 108ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 25ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 24ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 24ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 25ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 24ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 27ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 26ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 27ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 24ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 24ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 25ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 26ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2

# Kaggle Dataset Citation

1. https://www.kaggle.com/datasets/roshea6/finger-digits-05  [Ryan O'Shea]
2. https://www.kaggle.com/datasets/grassknoted/asl-alphabet/data [Akash] 

# Requirements.txt

numpy==1.21.0  
opencv-python==4.5.3.56  
tensorflow==2.5.0  
scikit-learn==0.24.2  
pandas==1.3.0  
matplotlib==3.4.2  
jupyter==1.0.0  

# Aknowledgement

* Thank you to everyone mentioned below :)
* This project development would not have been possible without your support:  



Hackconclave-24 IIT Guwahati

<img src="https://raw.githubusercontent.com/souvikcseiitk/sign-calculator/main/extras/IITG.png" alt="Hackconclave24 IITG" width="200" height="150" />  

Open-AI, Chat-GPT

<img src="https://raw.githubusercontent.com/souvikcseiitk/sign-calculator/main/extras/GPT.PNG" alt="GPT" width="450" height="150" /> 

Stack Overflow

<img src="https://raw.githubusercontent.com/souvikcseiitk/sign-calculator/main/extras/SO.PNG" alt="SO" width="500" height="120" /> 




