Flask app predicting book genres from summaries using CatBoost, with a cosmic-themed UI, deployed on AWS EC2.

jarif87/lit-genre-classifier

Book Genre Predictor

A Flask-based web application that predicts the genre of a book from its plot summary using a pre-trained CatBoost classifier and a fitted CountVectorizer. The model was trained on the CMU Book Summary Dataset. The front end uses Bootstrap 5 and custom CSS, and the app is deployed on an AWS EC2 instance.

Features

  • Predicts book genres (Fantasy, Science Fiction, Crime Fiction, Historical Novel, Horror, Thriller) from user-provided summaries.
  • Modern, responsive UI with a cosmic-themed gradient background, glassmorphism effects, and animations.
  • Input validation to handle empty summaries with user-friendly error messages.
  • Deployed on AWS EC2 for cloud-based access.
  • Uses a pre-trained CatBoost classifier and a fitted CountVectorizer for genre prediction.
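
To illustrate the last point: a CountVectorizer turns each summary into a vector of word counts over a fixed vocabulary, and the classifier predicts a genre from that vector. The following is a minimal pure-Python sketch of the idea, not scikit-learn's implementation. The function names and sample texts are hypothetical; the token pattern mirrors CountVectorizer's default of keeping words with two or more characters.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens of 2+ characters."""
    return re.findall(r"\b\w\w+\b", text.lower())

def build_vocabulary(documents):
    """Map every distinct token in the corpus to a column index (sorted for stability)."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}

def count_vector(text, vocabulary):
    """Return the per-column token counts for one text; unknown tokens are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector

# Hypothetical two-document corpus used to fit the vocabulary.
corpus = ["dragons rule the skies", "a detective hunts the killer"]
vocab = build_vocabulary(corpus)
print(vocab)
print(count_vector("the dragons fight the dragons", vocab))
```

Words that were not seen when the vocabulary was built ("fight" here) contribute nothing to the vector, which is why the fitted vectorizer has to be shipped alongside the model.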

Technologies

  • Backend: Python 3, Flask
  • Machine Learning: CatBoost, scikit-learn, NLTK
  • Frontend: HTML, Bootstrap 5, Custom CSS
  • Deployment: AWS EC2, PuTTY, PuTTYgen, WinSCP
  • Dependencies: Listed in requirements.txt

Prerequisites

  • Local Development:
    • Python 3.8+
    • pip
    • Git
  • AWS Deployment:
    • AWS account with an EC2 instance (Ubuntu recommended)
    • PuTTY and PuTTYgen for SSH access
    • WinSCP for file transfer
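    • Security group configured to allow SSH (port 22) and inbound traffic on the port the app is served on (8080 in the steps below; HTTP on port 80 is only needed if you serve the app there)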
  • Files:
    • catboostclassifier.pkl (pre-trained model)
    • count_vectorizer.pkl (fitted CountVectorizer)
    • requirements.txt (dependency list)

Setup Instructions

Local Setup

  1. Clone the Repository:
    git clone <repository-url>
    cd book-genre-predictor
    
  2. Install Dependencies:
    • Ensure Python 3 is installed:
      python3 --version
    • Install required packages:
      pip install -r requirements.txt
  3. Download NLTK data (run in a Python interpreter):
    import nltk
    nltk.download('stopwords')
    nltk.download('wordnet')

  4. Verify Model and Vectorizer (run in Python from the project directory):
    import pickle

    # Load the pre-trained CatBoost model (catboost must be installed to unpickle it)
    with open('catboostclassifier.pkl', 'rb') as model_file:
        model = pickle.load(model_file)

    # Load the fitted CountVectorizer (scikit-learn must be installed to unpickle it)
    with open('count_vectorizer.pkl', 'rb') as vectorizer_file:
        vectorizer = pickle.load(vectorizer_file)

    # Vectorize a sample summary and print the predicted genre
    text = "A magical world where dragons rule the skies."
    vector = vectorizer.transform([text])
    prediction = model.predict(vector)
    print(prediction)
  5. Run the App:
    python app.py
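
The stopwords and wordnet downloads suggest that summaries are cleaned by removing stop words and lemmatizing before they are vectorized, but this README does not show that code. The following is a minimal sketch under that assumption. `clean_summary` and `clean_with_nltk` are hypothetical helpers, and the stop-word set and lemmatizer are passed in so the first function can be tried without NLTK data.

```python
import re

def clean_summary(text, stop_words, lemmatize=lambda word: word):
    """Lowercase, keep alphabetic tokens, drop stop words, lemmatize the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(lemmatize(tok) for tok in tokens if tok not in stop_words)

def clean_with_nltk(text):
    """Same cleaning with NLTK's English stop words and WordNet lemmatizer.

    Requires nltk plus the 'stopwords' and 'wordnet' data downloaded earlier.
    """
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer
    lemmatizer = WordNetLemmatizer()
    return clean_summary(text, set(stopwords.words("english")), lemmatizer.lemmatize)

# Tiny stand-in stop-word list so the example runs without NLTK installed.
demo_stop_words = {"a", "the", "in", "to", "on"}
print(clean_summary("A wizard embarks on a quest in the Realm!", demo_stop_words))
```

Whatever cleaning was applied at training time has to be applied identically at prediction time, otherwise the vectorizer sees tokens it was never fitted on.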

AWS EC2 Deployment

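Launch an EC2 Instance:

  • Create an Ubuntu EC2 instance (e.g., t2.micro).
  • Download the .pem key file.
  • Configure the security group to allow:
    • SSH (port 22) for PuTTY and WinSCP access.
    • Inbound TCP on port 8080 for web access, since the app is served on port 8080 in the steps below (HTTP on port 80 is only needed if you serve the app there).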

Convert PEM to PPK:

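  • Use PuTTYgen to convert the .pem key to .ppk:
    • Open PuTTYgen, click Load and select the .pem file (set the file type filter to All Files so it is listed), then click Save private key to write the .ppk file.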

Transfer Files with WinSCP:

  • Connect to the EC2 instance using WinSCP:
    • Hostname: <EC2-public-IP>
    • Username: ubuntu
    • Private key: Select the .ppk file.
  • Upload the project files (app.py, templates/index.html, catboostclassifier.pkl, count_vectorizer.pkl, requirements.txt) to /home/ubuntu/book-genre-predictor.

SSH into EC2 with PuTTY:

  • Open PuTTY, set:
    • Hostname: ubuntu@<EC2-public-IP>
    • Port: 22
    • Private key: Load the .ppk file under Connection > SSH > Auth (Auth > Credentials in recent PuTTY versions).
  • Connect to the instance.

Install Dependencies on EC2:

Update the package list and install Python:


sudo apt-get update
sudo apt-get install python3 python3-pip
sudo apt install python-is-python3

Navigate to the project directory:

cd /home/ubuntu/book-genre-predictor

Install Python packages:

pip install -r requirements.txt --break-system-packages

Install NLTK data (run these lines in a Python interpreter, started with python):


import nltk
nltk.download('stopwords')
nltk.download('wordnet')

Run the Flask App:

python app.py
  • Access the app at http://<EC2-public-IP>:8080.

  • To keep the app running after closing PuTTY, use nohup:


nohup python app.py &

  • To stop the app, find the process ID and kill it:

ps aux | grep python
kill <pid>
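
If the page does not load after starting the app, it helps to check whether the port is reachable at all, which separates security-group problems from application errors. The following is a minimal sketch using only the standard library. `port_is_open` is a hypothetical helper, and the host you pass in is your instance's public address.

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False

if __name__ == "__main__":
    # 127.0.0.1 checks a locally running app; substitute the EC2 public address.
    print(port_is_open("127.0.0.1", 8080))
```

A result of False from your own machine but True when run on the instance itself usually points at the security group, or at the app listening only on 127.0.0.1 rather than 0.0.0.0.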

Project Structure

book-genre-predictor/
├── app.py                    # Flask application
├── templates/
│   └── index.html            # HTML template with Bootstrap 5 and custom CSS
├── catboostclassifier.pkl    # Pre-trained CatBoost model
├── count_vectorizer.pkl      # Fitted CountVectorizer
├── requirements.txt          # Python dependencies
└── README.md                 # This file
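
Before starting the app, or after uploading with WinSCP, a quick check that every file from the tree above is in place can save a confusing traceback. The following is a minimal sketch; `missing_files` is a hypothetical helper, and the file list is copied from the structure shown above.

```python
from pathlib import Path

# File names taken from the project structure above.
REQUIRED_FILES = [
    "app.py",
    "templates/index.html",
    "catboostclassifier.pkl",
    "count_vectorizer.pkl",
    "requirements.txt",
]

def missing_files(project_dir="."):
    """Return the required paths that do not exist under project_dir."""
    root = Path(project_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]

if __name__ == "__main__":
    missing = missing_files()
    print("All files present." if not missing else f"Missing: {', '.join(missing)}")
```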

Usage

  • Open the app in a browser (http://<EC2-public-IP>:8080 on EC2, or http://127.0.0.1:8080 locally).

  • Enter a book summary in the textarea (e.g., "A wizard embarks on a quest to defeat a dark sorcerer in a magical realm.").

  • Click "Predict Genre" to see the predicted genre (e.g., Fantasy, Thriller).

  • If the summary is empty, an error message will appear.

  • The result displays the first 100 characters of the summary and the predicted genre in a styled alert.
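
The flow above could be implemented along the following lines. This is a sketch, not the repository's app.py: the route, the form field name `summary`, the template variable names, the error text, and the helper `build_result` are all assumptions. Only the empty-input check, the 100-character preview, port 8080, and the pickle file names come from this README. The model may also return an encoded label that needs mapping back to a genre name, which is not handled here.

```python
import pickle

def build_result(summary, predict):
    """Validate the summary and return (error, preview, genre).

    `predict` is any callable that maps a summary string to a genre label.
    """
    summary = (summary or "").strip()
    if not summary:
        return "Please enter a book summary.", None, None
    return None, summary[:100], predict(summary)

def create_app():
    """Wire build_result into a Flask app (Flask is imported lazily here)."""
    from flask import Flask, render_template, request

    with open("catboostclassifier.pkl", "rb") as f:
        model = pickle.load(f)
    with open("count_vectorizer.pkl", "rb") as f:
        vectorizer = pickle.load(f)

    def predict(text):
        # flatten() copes with predict() returning a 1-D or 2-D array
        return str(model.predict(vectorizer.transform([text])).flatten()[0])

    app = Flask(__name__)

    @app.route("/", methods=["GET", "POST"])
    def index():
        error = preview = genre = None
        if request.method == "POST":
            error, preview, genre = build_result(request.form.get("summary"), predict)
        return render_template("index.html", error=error, preview=preview, genre=genre)

    return app

# To serve on the port used in this README:
# create_app().run(host="0.0.0.0", port=8080)
```

Keeping the validation in a plain function makes it testable without Flask or the model files. Binding to 0.0.0.0 rather than the default 127.0.0.1 is what makes the app reachable from outside the EC2 instance.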
