This project develops a multi-label classifier that assigns each quote to one or more categories. The model combines a Bidirectional Long Short-Term Memory (BiLSTM) network with an attention mechanism to improve the accuracy and efficiency of multi-label classification.
- Data Folder: Contains both raw and cleaned datasets used for training and testing the model.
- DataCrawler: Contains the data-crawling code, which is meant to be run with the Scrapy command-line tool.
- Model.ipynb: This Jupyter notebook contains the complete code for training the classification model. It also provides a detailed analysis of the model's performance and results.
- EDA.ipynb: This notebook offers an exploratory data analysis (EDA) of the dataset, providing insights into its structure and key characteristics.
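
For orientation, the BiLSTM-with-attention design named in the project description might look roughly like the PyTorch sketch below. This is not the code from Model.ipynb: the class name `BiLSTMAttentionClassifier`, the layer sizes, the simple tanh-based attention scoring, and the use of `BCEWithLogitsLoss` for the multi-label objective are all assumptions made for illustration. It also assumes every input sequence contains at least one non-padding token.

```python
import torch
import torch.nn as nn


class BiLSTMAttentionClassifier(nn.Module):
    """Hypothetical BiLSTM + attention multi-label classifier (illustrative only)."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, num_labels, pad_idx=0):
        super().__init__()
        self.pad_idx = pad_idx
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        # One scalar attention score per token, computed from its BiLSTM state.
        self.attn_score = nn.Linear(2 * hidden_dim, 1)
        self.classifier = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer tensor, padded with pad_idx.
        mask = token_ids != self.pad_idx
        states, _ = self.lstm(self.embedding(token_ids))  # (batch, seq_len, 2*hidden)
        scores = self.attn_score(torch.tanh(states)).squeeze(-1)  # (batch, seq_len)
        # Padding positions get zero attention weight after the softmax.
        scores = scores.masked_fill(~mask, float("-inf"))
        weights = torch.softmax(scores, dim=1)
        # Attention-weighted sum of BiLSTM states gives one vector per quote.
        context = (weights.unsqueeze(-1) * states).sum(dim=1)
        return self.classifier(context)  # raw logits, one per label


# Toy usage with made-up sizes and token ids (0 is the padding index).
model = BiLSTMAttentionClassifier(vocab_size=100, embed_dim=16, hidden_dim=32, num_labels=5)
batch = torch.tensor([[4, 8, 15, 0], [16, 23, 42, 7]])
targets = torch.tensor([[1.0, 0.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0, 0.0]])

logits = model(batch)
loss = nn.BCEWithLogitsLoss()(logits, targets)  # independent sigmoid per label
probs = torch.sigmoid(logits)  # per-label probabilities in [0, 1]
```

Because each label gets its own sigmoid rather than sharing a softmax, a quote can receive several categories at once, which is what distinguishes the multi-label setting from ordinary multi-class classification.
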
To run and interact with the project, the following technologies and libraries are required:
- NLTK (Natural Language Toolkit)
- Python
- PyTorch
- Scrapy
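
A quick way to confirm that the libraries above are available is a small check script such as the sketch below. The helper name `missing_dependencies` is hypothetical and not part of this repository, and no specific versions are assumed because none are stated here. The import names `nltk`, `torch`, and `scrapy` are the standard ones for these packages.

```python
import importlib.util
import sys

# Display name -> import name for the libraries listed above.
REQUIRED_MODULES = {"NLTK": "nltk", "PyTorch": "torch", "Scrapy": "scrapy"}


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the display names of libraries whose import name cannot be found."""
    return [
        name
        for name, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_dependencies()
    print("Missing:", ", ".join(missing) if missing else "none")
```
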