my-nlp

This repo contains my Natural Language Processing projects.

topic-extraction-text-clustering

This project looked at two separate NLP problems:

Clustering texts with unknown labels.
Selecting the most relevant words for each label from texts with known labels.

This project was completed using Python, Jupyter Notebook, scikit-learn, pandas, numpy, matplotlib, and seaborn.

The texts in the project had labels that split them into distinct groups. The project showed that representing texts via TF or TF-IDF, reducing dimensionality with t-distributed Stochastic Neighbor Embedding (t-SNE), and then applying a clustering algorithm produces clusters that align well with the actual text labels (see below).
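The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the toy corpus, the `cluster_texts` helper, the choice of K-means as the clustering algorithm, and all parameter values are assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE
from sklearn.metrics import adjusted_rand_score

# Hypothetical toy corpus; the real project used CNN and FOX News articles.
texts = [
    "the team won the football match",
    "the striker scored a late goal",
    "the coach praised the players after the game",
    "fans cheered as the team lifted the trophy",
    "the senate passed the budget bill",
    "the president signed the new law",
    "voters went to the polls in the election",
    "congress debated the tax reform bill",
]
# Known labels, used only to evaluate the clusters, never to build them.
true_labels = [0, 0, 0, 0, 1, 1, 1, 1]


def cluster_texts(texts, n_clusters, random_state=0):
    """TF-IDF -> t-SNE -> K-means (K-means is an assumed choice)."""
    # Represent each text as a TF-IDF vector.
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)
    # Reduce to 2 dimensions; perplexity must be smaller than the sample count.
    tsne = TSNE(n_components=2, perplexity=3, init="random",
                random_state=random_state)
    embedding = tsne.fit_transform(tfidf.toarray())
    # Cluster the low-dimensional points.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=random_state)
    return embedding, kmeans.fit_predict(embedding)


embedding, predicted = cluster_texts(texts, n_clusters=2)
# Adjusted Rand index: 1.0 means the clusters match the labels exactly.
print("Adjusted Rand index:", adjusted_rand_score(true_labels, predicted))
```

On a corpus this small the score says little; the adjusted Rand index is shown only as one possible way to check how well clusters align with known labels.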

It was also shown that text topics can be obtained by applying Non-Negative Matrix Factorization (NNMF) to TF-IDF data. NNMF yields the most relevant words for each topic. Because the NNMF topics usually aligned well with the original text labels, these words also describe the original labelled "topics" (see below).

Links

Data source - texts of news articles posted on CNN and FOX News websites in 2014, labelled by the news website sections.
All code for the project
Detailed project report

vectorkoz/my-nlp

Folders and files

Latest commit

History

Repository files navigation

my-nlp

topic-extraction-text-clustering

Links

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages