Unsupervised learning. My Natural Language Processing project on Topic Extraction and Text Clustering.

vectorkoz/my-nlp

my-nlp

This repo contains my Natural Language Processing projects.

topic-extraction-text-clustering

This project looked at two separate NLP problems:

  • Clustering texts with unknown labels,
  • Selecting most relevant words for each label from texts with known labels.

This project was completed using Python, Jupyter Notebook, scikit-learn, pandas, numpy, matplotlib, seaborn.

The texts in the project had labels that split them into distinct groups. The project showed that texts can be represented as term-frequency (TF) or TF-IDF vectors, reduced in dimensionality with t-distributed Stochastic Neighbor Embedding (t-SNE), and then grouped by a clustering algorithm into clusters that align well with the actual text labels (see below).

[Figure: Clustering visualization]
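The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the code from the project notebook: the toy corpus, the function name `cluster_texts`, the choice of k-means as the clustering algorithm, and the perplexity heuristic are all assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE
from sklearn.metrics import adjusted_rand_score

# Hypothetical toy corpus with known labels (0 = space, 1 = cooking).
TEXTS = [
    "the rocket launched into orbit around the planet",
    "astronauts repaired the satellite in orbit",
    "the telescope observed a distant planet and its moon",
    "a rocket carried the satellite to the moon",
    "simmer the sauce and add garlic and onion",
    "bake the bread in a hot oven with butter",
    "chop the onion and fry it in butter",
    "add garlic to the sauce and bake in the oven",
]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]


def cluster_texts(texts, n_clusters, random_state=0):
    """TF-IDF -> t-SNE (2D) -> k-means. Returns (embedding, cluster_ids)."""
    # Represent each text as a TF-IDF vector.
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)

    # t-SNE requires perplexity < n_samples; this heuristic keeps it small
    # for tiny corpora (an assumption of this sketch, not a tuned value).
    perplexity = max(1.0, min(30.0, (len(texts) - 1) / 3))
    embedding = TSNE(
        n_components=2,
        perplexity=perplexity,
        init="random",
        random_state=random_state,
    ).fit_transform(tfidf.toarray())  # dense input keeps the call simple

    # k-means is an assumed stand-in for "a clustering algorithm".
    cluster_ids = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=random_state
    ).fit_predict(embedding)
    return embedding, cluster_ids


embedding, cluster_ids = cluster_texts(TEXTS, n_clusters=2)
# Adjusted Rand index compares clusters to the known labels (1.0 = identical).
print("ARI:", adjusted_rand_score(LABELS, cluster_ids))
```

On a corpus this small the t-SNE layout is unstable, so the score is only meaningful on a realistically sized dataset. The 2D `embedding` is also what a scatter plot coloured by label or cluster would be drawn from.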

The project also showed that text topics can be obtained by applying Non-Negative Matrix Factorization (NNMF) to the TF-IDF data. NNMF yields the most relevant words for each topic. Because the NNMF topics usually aligned well with the original text labels, those words also characterize the original labelled "topics" (see below).

[Figure: Top 10 words]
