Spam Message Detector

title	emoji	colorFrom	colorTo	sdk	sdk_version	app_file	pinned
Spam Detector	🛡️	blue	green	docker	0.104.1	app.py	false

A Natural Language Processing (NLP) and machine learning web application, built with FastAPI, that classifies a message as spam or legitimate with about 98% accuracy on the test data.

Live Demo: Try the app online at https://huggingface.co/spaces/Adonpm/spam-detector

Overview

This web application leverages Natural Language Processing (NLP) techniques and a trained Random Forest classifier to determine whether a given text message is spam or legitimate. The model uses Bag of Words (BOW) with n-grams and custom stopword removal to achieve high accuracy.

Features

Clean, simple web interface for message submission
Real-time spam detection with high accuracy
FastAPI backend for speedy processing
Pre-trained model with 98% accuracy rate
Custom stopword handling for enhanced performance

Machine Learning Model

The spam detection model was built using the following approach:

Text Preprocessing:
- Custom stopwords removal (keeping negation words)
- Text normalization and stemming using Porter Stemmer
- Regular expression to filter out non-alphanumeric characters
Feature Engineering:
- Bag of Words (BOW) representation
- N-grams (1-2) to capture phrase context
- Maximum of 2500 features to optimize performance
Model Selection:
- Random Forest classifier, selected for its performance on the test data
- About 98% accuracy on test data
- Precision of 1.00 and recall of 0.89 on the spam class
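
The preprocessing and feature steps above can be sketched without external dependencies. This is an illustrative approximation, not the project's code: the project uses NLTK's Porter stemmer and a fitted scikit-learn CountVectorizer, while this sketch skips stemming and counts n-grams by hand. The stopword list, the negation set, and all function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list. The project uses its own custom list,
# which deliberately keeps negation words such as "not" and "no".
STOPWORDS = {"a", "an", "the", "is", "to", "you", "have", "not", "no"}
NEGATIONS = {"not", "no", "nor"}
CUSTOM_STOPWORDS = STOPWORDS - NEGATIONS


def preprocess(text):
    """Lowercase, replace non-alphanumeric characters, drop stopwords."""
    cleaned = re.sub(r"[^a-zA-Z0-9]", " ", text).lower()
    return [tok for tok in cleaned.split() if tok not in CUSTOM_STOPWORDS]


def ngrams(tokens, n_min=1, n_max=2):
    """Return all n-grams of length n_min..n_max as space-joined strings."""
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return grams


def build_vocabulary(messages, max_features=2500):
    """Keep the max_features most frequent n-grams across all messages."""
    counts = Counter()
    for message in messages:
        counts.update(ngrams(preprocess(message)))
    return [gram for gram, _ in counts.most_common(max_features)]


def vectorize(message, vocabulary):
    """Bag-of-words count vector for one message over a fixed vocabulary."""
    counts = Counter(ngrams(preprocess(message)))
    return [counts[gram] for gram in vocabulary]


messages = ["You have WON a free prize!", "I am not free today"]
vocabulary = build_vocabulary(messages)
vector = vectorize("free prize, not a trick", vocabulary)
```

Keeping "not" in the token stream matters for the bigrams: "not free" and "free" become different features, which is the kind of phrase context the 1-2 n-gram range is meant to capture.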

Project Structure

spam-detector/
│
├── .github/workflows/
│   └── huggingface-deploy.yml   # CI/CD pipeline for deployment
│
├── cv/
│   └── count_vectorizer.pkl     # Fitted CountVectorizer for text transformation
│
├── data/
│   ├── SMSSpamCollection        # Dataset used for training the model
│   └── readme.md                # Dataset documentation
│
├── models/
│   └── spam_model.pkl           # Trained Random Forest model
│
├── nltk_data/                   
│   ├── corpora/                 # NLTK corpora datasets (e.g., stopwords, wordnet)
│   └── tokenizers/              # Tokenizer models and data (e.g., punkt)
│
├── notebooks/
│   └── spam_classifier.ipynb    # Notebook for creating CountVectorizer and ML model
│
├── static/
│   ├── css/                     # CSS styling files
│   └── js/                      # JavaScript files (if any)
│
├── templates/
│   └── index.html               # Frontend HTML template
│
├── .gitattributes               # Git LFS configuration file
│
├── app.py                       # FastAPI application
│
├── Dockerfile                   # Docker containerization file
│
├── README.md                    # Project documentation (this file)
│
└── requirements.txt             # Python dependencies
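
Given the layout above, the two serialized artifacts could be loaded along these lines. This is a sketch, not the code in app.py: the helper name `load_artifacts` and its `base_dir` parameter are hypothetical, and only the two file paths come from the tree. Loading the real files also requires scikit-learn to be installed, because pickle needs the classes of the stored objects.

```python
import pickle
from pathlib import Path


def load_artifacts(base_dir="."):
    """Load the fitted vectorizer and the trained model from their pickles.

    The paths follow the project tree; the function is a hypothetical helper.
    Only unpickle files you trust: pickle can execute code on load.
    """
    base = Path(base_dir)
    with open(base / "cv" / "count_vectorizer.pkl", "rb") as f:
        vectorizer = pickle.load(f)
    with open(base / "models" / "spam_model.pkl", "rb") as f:
        model = pickle.load(f)
    return vectorizer, model
```

With the real artifacts, a prediction would then presumably go through the standard scikit-learn calls, `vectorizer.transform([...])` followed by `model.predict(...)`, applied to text preprocessed the same way as during training.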

Installation

Clone this repository:

git clone https://github.com/Adonpm/spam-detection-app.git
cd spam-detection-app

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:
```
pip install -r requirements.txt
```

Download NLTK resources:

import nltk
nltk.download('stopwords')
nltk.download('punkt')

Usage

Start the FastAPI server using uvicorn:
```
uvicorn app:app --reload --port 8000
```
Open your browser and navigate to:
```
http://127.0.0.1:8000/
```
Enter a message in the input field and click "Submit" to see the prediction.

API Endpoints

GET / - Displays the home page with the message input form
POST /predict - Accepts a message and returns the prediction (Spam/Not Spam)
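
A client call to the prediction endpoint might look like the sketch below. The request format is an assumption: this README does not say whether `/predict` expects form data or JSON, or what the field is called, so the form field name `message` and the helpers `build_predict_request` and `predict` are hypothetical. Check app.py or templates/index.html for the actual field name before relying on it.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_predict_request(message, base_url="http://127.0.0.1:8000"):
    """Build a form-encoded POST request for /predict.

    Assumption: the endpoint reads a form field named "message".
    """
    body = urlencode({"message": message}).encode("utf-8")
    return Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def predict(message, base_url="http://127.0.0.1:8000"):
    """Send the request to a running server and return the raw response text."""
    with urlopen(build_predict_request(message, base_url)) as response:
        return response.read().decode("utf-8")
```

Calling `predict("Win a free prize now")` only works while the uvicorn server from the Usage section is running; the response body is returned as text because its exact shape (HTML page or JSON) is not documented here.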

Model Performance

The Random Forest model achieved:

Accuracy: 98.5%
Confusion Matrix:
```
[[955   0]
 [ 17 143]]
```

Classification Report:

           precision    recall  f1-score   support
        0       0.98      1.00      0.99       955
        1       1.00      0.89      0.94       160

Here class 0 is legitimate messages and class 1 is spam. The results indicate that:

The model correctly identified all legitimate messages (955/955)
The model correctly identified 89% of spam messages (143/160)
17 spam messages were incorrectly classified as legitimate (false negatives)
No legitimate messages were incorrectly classified as spam (false positives)
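
The reported figures can be re-derived from the confusion matrix. The short script below is a worked check rather than project code; it assumes rows are actual classes and columns are predicted classes, in the order (legitimate, spam), which is the scikit-learn convention.

```python
# Confusion matrix from the report above.
# Assumed layout: rows = actual class, columns = predicted class,
# class order = (legitimate, spam).
tn, fp = 955, 0    # actual legitimate: predicted legitimate, predicted spam
fn, tp = 17, 143   # actual spam:       predicted legitimate, predicted spam

total = tn + fp + fn + tp
accuracy = (tp + tn) / total

# Metrics for the spam class (label 1 in the classification report).
spam_precision = tp / (tp + fp)
spam_recall = tp / (tp + fn)
spam_f1 = 2 * spam_precision * spam_recall / (spam_precision + spam_recall)

# Metrics for the legitimate class (label 0 in the classification report).
ham_precision = tn / (tn + fn)
ham_recall = tn / (tn + fp)

print(f"accuracy:       {accuracy:.3f}")
print(f"spam precision: {spam_precision:.2f}")
print(f"spam recall:    {spam_recall:.2f}")
print(f"spam F1:        {spam_f1:.2f}")
print(f"ham precision:  {ham_precision:.2f}")
print(f"ham recall:     {ham_recall:.2f}")
```

The accuracy works out to 1098/1115, which rounds to the 98.5% quoted above, and the spam recall of 143/160 matches the 0.89 in the classification report.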

Technologies Used

Backend: FastAPI
Frontend: HTML, Jinja2 templates
Machine Learning: scikit-learn, NLTK
Natural Language Processing (NLP): Text preprocessing, tokenization, stemming, stopword removal
Data Processing: Pandas, NumPy
Model Serialization: Pickle
Containerization: Docker

Deployment

This project is containerized using Docker for easy deployment and scalability.

Dockerized App: The application is packaged with a Dockerfile, exposing port 7860 and ready to run in any container environment.
CI/CD Pipeline: The GitHub repository is integrated with a CI/CD pipeline that automates testing, building, and deploying the application whenever code is pushed.
Hugging Face Spaces: The app is deployed live on Hugging Face Spaces, enabling easy sharing and real-time use of the Spam Detector.
