TopAgrume/NLP_Project

NLP_Project: Poem Classification and Generation

Project Overview

This project covers two tasks, poem classification and poem generation, and also includes a web scraper that we used to build our own dataset. The project is split into several components, each using different models and frameworks.

Datasets Used

  1. First dataset for generation: Kaggle - Poetry Foundation Poems
  2. Second dataset for generation: Kaggle - Complete Poetryfoundationorg Dataset
  3. Third dataset for generation: Kaggle - Poem Classification NLP
  4. Our first dataset for classification (144 possible classes): Kaggle - Poems Dataset NLP (topics part)
  5. Our own dataset for classification (5 possible classes), created by scraping: Kaggle - Poems Classification Dataset
  6. Poetry Foundation Terms of Service for Robots: Poetry Foundation Robots.txt

We built our classification dataset by scraping the Poetry Foundation website. It covers five topics (nature, art & sciences, love, relationships, and religion), and the poems are fairly evenly distributed across them.

See: Kaggle Dataset
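Because the dataset was scraped, a scraper should consult the site's robots.txt before each request. The sketch below is not our scraper's code; it only shows how Python's standard `urllib.robotparser` can check whether a URL may be fetched and which crawl delay the rules request. The user-agent string, the helper names `load_rules` and `check_url`, and the sample rules are hypothetical placeholders, not the Poetry Foundation's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent string; a real scraper should identify itself honestly.
USER_AGENT = "poem-dataset-scraper"


def load_rules(robots_lines):
    """Build a parser from the lines of a robots.txt file."""
    parser = RobotFileParser()
    # parse() marks the rules as loaded, so can_fetch() works without read().
    parser.parse(robots_lines)
    return parser


def check_url(parser, url, user_agent=USER_AGENT):
    """Return (allowed, delay): whether `url` may be fetched, and the requested
    Crawl-delay in seconds (None if the rules do not set one)."""
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)


if __name__ == "__main__":
    # Made-up rules for illustration only.
    sample_rules = [
        "User-agent: *",
        "Disallow: /private/",
        "Crawl-delay: 5",
    ]
    rules = load_rules(sample_rules)
    print(check_url(rules, "https://www.example.org/poems/123"))     # (True, 5)
    print(check_url(rules, "https://www.example.org/private/page"))  # (False, 5)
```

Against a live site, you would instead call `parser.set_url("https://www.poetryfoundation.org/robots.txt")` and then `parser.read()` before querying, and sleep for the returned delay between requests.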

Technologies and Frameworks Used

The source code is organized by task, with one folder per model family:

src
├── classification
│   ├── FNN
│   ├── Logistic Regression & Naive Bayes
│   ├── RNN / LSTM
│   ├── Transformers
│   └── XGBoost
└── generation
    ├── Ngram
    ├── Transformers
    └── RNN
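The classification folder includes Logistic Regression & Naive Bayes baselines. As a rough illustration of how a Naive Bayes topic classifier works, here is a minimal pure-Python multinomial Naive Bayes sketch with add-one (Laplace) smoothing. It is not the project's implementation: the tokenizer, the class name `NaiveBayesTopicClassifier`, and the toy training poems are hypothetical, and only three of the five topic labels are used.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep runs of letters/apostrophes (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTopicClassifier:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # number of documents per topic
        self.word_counts = defaultdict(Counter)  # topic -> word -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def log_scores(self, text):
        """Unnormalized log P(topic) + sum of log P(word | topic), per topic."""
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_label_docs in self.class_counts.items():
            score = math.log(n_label_docs / self.n_docs)  # log prior
            denom = self.total_words[label] + vocab_size
            for t in tokens:
                score += math.log((self.word_counts[label][t] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy, made-up training poems (not taken from the project's dataset).
TRAIN_TEXTS = [
    "the forest river and the mountain trees",
    "green leaves and spring rain in the forest",
    "my heart aches with love for you my darling",
    "a kiss and a longing heart full of love",
    "pray to god and the holy spirit in heaven",
    "the church bells ring and angels sing to god",
]
TRAIN_LABELS = ["nature", "nature", "love", "love", "religion", "religion"]

if __name__ == "__main__":
    clf = NaiveBayesTopicClassifier().fit(TRAIN_TEXTS, TRAIN_LABELS)
    print(clf.predict("rain falls on the mountain trees"))  # nature
    print(clf.predict("my heart is full of love"))          # love
    print(clf.predict("angels sing in heaven"))             # religion
```

Scoring in log space avoids numerical underflow when many small probabilities are multiplied, and add-one smoothing keeps a single word that never appeared under a topic from driving that topic's probability to zero.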

Project Results

images/results.png

Poem Generation Examples

images/gpt2-examples.png

Members

  • angelo.eap
  • valentin.san
  • christophe.nguyen
  • alexandre.devaux-riviere
  • paul.duhot
  • mael.reynaud