A Flask-based web application that predicts the genre of a book from its plot summary, using a pre-trained CatBoost classifier and a fitted CountVectorizer. The model is trained on the CMU Book Summary Dataset, the front end is built with Bootstrap 5 and custom CSS, and the app is deployed on an AWS EC2 instance.
- Features
- Technologies
- Prerequisites
- Setup Instructions
- Project Structure
- Usage
- Dataset
- Troubleshooting
- Contributing
- License
- Predicts book genres (Fantasy, Science Fiction, Crime Fiction, Historical Novel, Horror, Thriller) from user-provided summaries.
- Modern, responsive UI with a cosmic-themed gradient background, glassmorphism effects, and animations.
- Input validation to handle empty summaries with user-friendly error messages.
- Deployed on AWS EC2 for cloud-based access.
- Uses a pre-trained CatBoost classifier with CountVectorizer features for genre prediction.
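The input-validation feature can be pictured with a short sketch. This is not the app's actual code: the function name `validate_summary` and the error text are hypothetical, chosen only to illustrate rejecting an empty summary before it reaches the model.

```python
def validate_summary(raw):
    """Return (cleaned_summary, error_message); exactly one of them is None."""
    # Treat None and whitespace-only input the same as an empty field.
    cleaned = (raw or "").strip()
    if not cleaned:
        # Hypothetical message; the real app may word this differently.
        return None, "Please enter a book summary before predicting."
    return cleaned, None


summary, error = validate_summary("   ")
print(error)
```

Returning the error as a value instead of raising keeps the route handler simple: it can pass the message straight to the template.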
- Backend: Python 3, Flask
- Machine Learning: CatBoost, scikit-learn, NLTK
- Frontend: HTML, Bootstrap 5, Custom CSS
- Deployment: AWS EC2, PuTTY, PuTTYgen, WinSCP
- Dependencies: Listed in requirements.txt
- Local Development:
- Python 3.8+
- pip
- Git
- AWS Deployment:
- AWS account with an EC2 instance (Ubuntu recommended)
- PuTTY and PuTTYgen for SSH access
- WinSCP for file transfer
- Security group configured to allow SSH (port 22) and inbound TCP on port 8080, where the app is served
- Files:
- catboostclassifier.pkl (pre-trained model)
- count_vectorizer.pkl (fitted CountVectorizer)
- requirements.txt (dependency list)
- Clone the Repository:
git clone <repository-url>
cd book-genre-predictor
- Install Dependencies:
- Ensure Python 3 is installed:
python3 --version
- Install required packages:
pip install -r requirements.txt
- Download NLTK data:
import nltk
nltk.download('stopwords')
nltk.download('wordnet')
- Verify Model and Vectorizer:
import pickle

# Load the pre-trained model and the fitted vectorizer.
# The "with" blocks close each file even if unpickling fails.
with open('catboostclassifier.pkl', 'rb') as model_file:
    model = pickle.load(model_file)
with open('count_vectorizer.pkl', 'rb') as vectorizer_file:
    vectorizer = pickle.load(vectorizer_file)

# Vectorize a sample summary and print the predicted genre label.
text = "A magical world where dragons rule the skies."
vector = vectorizer.transform([text])
prediction = model.predict(vector)
print(prediction)
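If the verification step fails because a file is missing, the default traceback can be hard to read. The sketch below wraps the loading step in a small helper that reports the missing path clearly. The name `load_artifact` is hypothetical and not part of the project.

```python
import pickle
from pathlib import Path


def load_artifact(path):
    """Unpickle one file, raising a clear error if it is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Expected artifact not found: {path}")
    # The with-block closes the file even if unpickling fails.
    with path.open("rb") as handle:
        return pickle.load(handle)
```

Under these assumptions, `load_artifact("catboostclassifier.pkl")` and `load_artifact("count_vectorizer.pkl")` would replace the two loading blocks above. Unpickling executes code stored in the file, so only load .pkl files from a source you trust.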
- Run the App
python app.py
- Launch an EC2 Instance:
- Create an Ubuntu EC2 instance (e.g., t2.micro).
- Download the .pem key file.
- Configure the security group to allow:
- SSH (port 22) for PuTTY access.
- Custom TCP (port 8080) for web access, since the app is served on port 8080.
- Use PuTTYgen to convert the .pem key to .ppk:
- Open PuTTYgen, load the .pem file, and save as .ppk.
- Connect to the EC2 instance using WinSCP:
- Hostname:
- Username: ubuntu
- Private key: Select the .ppk file.
- Upload the project files (app.py, templates/index.html, catboostclassifier.pkl, count_vectorizer.pkl, requirements.txt) to /home/ubuntu/book-genre-predictor.
- Open PuTTY, set:
- Hostname: ubuntu@
- Port: 22
- Private key: Load the .ppk file under Connection > SSH > Auth.
- Connect to the instance.
sudo apt-get update
sudo apt-get install python3 python3-pip
sudo apt install python-is-python3
cd /home/ubuntu/book-genre-predictor
pip install -r requirements.txt --break-system-packages
python -c "import nltk; nltk.download('stopwords'); nltk.download('wordnet')"
python app.py
- Access the app at http://:8080.
- To keep the app running after closing PuTTY, use nohup:
nohup python app.py &
- To stop the app, find the process ID and kill it:
ps aux | grep python
kill <pid>
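If the page does not load after deployment, it helps to check whether port 8080 is reachable at all before debugging the app itself. The following is a minimal sketch using only the standard library. The helper name `port_is_open` is hypothetical, and the host you pass in is your own instance address.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

A `False` result from your own machine while the app is running on the instance usually points to the security group or a firewall, not to the Flask code.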
book-genre-predictor/
├── app.py # Flask application
├── templates/
│ └── index.html # HTML template with Bootstrap 5 and custom CSS
├── catboostclassifier.pkl # Pre-trained CatBoost model
├── count_vectorizer.pkl # Fitted CountVectorizer
├── requirements.txt # Python dependencies
└── README.md # This file
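Before running or uploading the project, a quick check that every file from the tree above is present can save a failed start. This sketch assumes the layout shown. The names `REQUIRED_FILES` and `missing_files` are hypothetical.

```python
from pathlib import Path

# File names taken from the project tree above.
REQUIRED_FILES = [
    "app.py",
    "templates/index.html",
    "catboostclassifier.pkl",
    "count_vectorizer.pkl",
    "requirements.txt",
]


def missing_files(project_dir="."):
    """Return the required files that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `missing_files("book-genre-predictor")` returns an empty list when the layout is complete.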
- Open the app in a browser (http://:8080 or http://127.0.0.1:8080 locally).
- Enter a book summary in the textarea (e.g., "A wizard embarks on a quest to defeat a dark sorcerer in a magical realm.").
- Click "Predict Genre" to see the predicted genre (e.g., Fantasy, Thriller).
- If the summary is empty, an error message will appear.
- The result displays the first 100 characters of the summary and the predicted genre in a styled alert.
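The 100-character preview in the result can be sketched as a small helper. This is an illustration, not the app's code: the name `summary_preview` is hypothetical, and the trailing "..." on truncated text is an added assumption.

```python
def summary_preview(summary, limit=100):
    """Return the first `limit` characters, marking any truncation."""
    if len(summary) <= limit:
        return summary
    # Assumption: the trailing "..." is illustrative; the app may omit it.
    return summary[:limit] + "..."
```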