Dimensionality Reduction and Clustering + Interpretability

This project consists of two parts:

Tokenizing and embedding the text using:
- Word2Vec
- GloVe
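As a rough illustration of how word-level vectors can become one vector per document, the sketch below averages the vectors of the words in each text. The averaging step, the `toy_vectors` table, and the `embed_document` helper are assumptions made for this example; the notebook may build document vectors differently, and real Word2Vec or GloVe vectors would be loaded from a pretrained model.

```python
import numpy as np

# Hypothetical toy embedding table; real Word2Vec or GloVe vectors
# would come from a pretrained model instead.
toy_vectors = {
    "neural": np.array([0.9, 0.1, 0.0]),
    "network": np.array([0.8, 0.2, 0.1]),
    "football": np.array([0.0, 0.9, 0.3]),
    "match": np.array([0.1, 0.8, 0.4]),
}


def embed_document(text, vectors, dim):
    """Average the vectors of the known words in a whitespace-tokenized text."""
    tokens = text.lower().split()
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        # No known word in the text: fall back to a zero vector.
        return np.zeros(dim)
    return np.mean(known, axis=0)


docs = ["Neural network", "football match", "unknown words only"]
X = np.vstack([embed_document(d, toy_vectors, dim=3) for d in docs])
print(X.shape)
```

Each row of `X` is then one document, which is the input shape the clustering steps expect.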
Clustering on the original data (Classic4, which consists of abstracts of scientific articles, and BBC, which contains headlines from the BBC news channel):
- This is done on the data without any dimensionality reduction technique
- We will be using this to benchmark against later methods
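A minimal sketch of such a baseline, assuming scikit-learn is available: K-means is run directly on the unreduced features and scored against known labels. The synthetic data from `make_blobs` stands in for the document embeddings, and the choice of NMI and ARI as scores is an assumption; the README does not say which metrics the benchmark uses.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Synthetic stand-in for document embeddings (hypothetical data).
X, y_true = make_blobs(
    n_samples=300, n_features=50, centers=4, cluster_std=1.0, random_state=0
)

# Cluster the full-dimensional data, with no reduction step.
baseline = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = baseline.fit_predict(X)

# Score the partition against the known labels (metric choice is an assumption).
nmi = normalized_mutual_info_score(y_true, labels)
ari = adjusted_rand_score(y_true, labels)
print(f"baseline NMI={nmi:.3f} ARI={ari:.3f}")
```

The same two scores can be computed for every later method, so that each one is compared against this unreduced run.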
Clustering with tandem methods, which apply:
- a dimensionality reduction technique (PCA, t-SNE, UMAP, autoencoders)
- followed by a clustering technique (K-means, spherical K-means, factorial K-means, hierarchical clustering with Ward, complete, and single linkage, and HDBSCAN)
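The two-step tandem idea can be sketched as follows, assuming scikit-learn and using PCA with K-means as one possible pairing. The `tandem_cluster` helper, the synthetic data, and the component and cluster counts are illustrative assumptions, not values taken from the notebook.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA


def tandem_cluster(X, reducer, clusterer):
    """Reduce the data first, then cluster the reduced representation."""
    X_reduced = reducer.fit_transform(X)
    labels = clusterer.fit_predict(X_reduced)
    return X_reduced, labels


# Synthetic stand-in for document embeddings (hypothetical data).
X, _ = make_blobs(n_samples=300, n_features=50, centers=4, random_state=0)

X_reduced, tandem_labels = tandem_cluster(
    X,
    reducer=PCA(n_components=10, random_state=0),
    clusterer=KMeans(n_clusters=4, n_init=10, random_state=0),
)
print(X_reduced.shape, len(set(tandem_labels)))
```

Because the two steps are independent, any reducer with `fit_transform` and any clusterer with `fit_predict` can be swapped in. The reduction is fitted without knowledge of the clusters, which is the limitation the combined methods try to address.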
Clustering using combined methods (where dimensionality reduction and clustering are done at the same time):
- Reduced K-means
- Factorial K-means
- Deep Clustering Network
- Deep K-means
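To make "at the same time" concrete, here is a small NumPy sketch of reduced K-means written as an alternating procedure: the projection is re-estimated from the current clusters, and the clusters are re-estimated in the current projection. It is a simplified illustration under stated assumptions (random initial partition, fixed iteration cap, hypothetical `reduced_kmeans` name and toy data), not the implementation used in the notebook.

```python
import numpy as np


def reduced_kmeans(X, n_clusters, n_components, n_iter=50, seed=0):
    """Alternate between a subspace update and a cluster assignment update."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)  # center the columns
    n = X.shape[0]
    labels = rng.integers(n_clusters, size=n)  # random initial partition
    for _ in range(n_iter):
        # Cluster means in the full space; an empty cluster is re-seeded
        # with a randomly chosen data point.
        means = np.vstack([
            X[labels == k].mean(axis=0) if np.any(labels == k)
            else X[rng.integers(n)]
            for k in range(n_clusters)
        ])
        counts = np.bincount(labels, minlength=n_clusters)
        # Loadings A: leading eigenvectors of the between-cluster scatter.
        between = (means * counts[:, None]).T @ means
        _, eigvecs = np.linalg.eigh(between)  # eigenvalues in ascending order
        A = eigvecs[:, ::-1][:, :n_components]
        # Reassign each point to the nearest centroid in the reduced space.
        F = means @ A
        Y = X @ A
        dists = ((Y[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, A


# Hypothetical toy data: 3 groups separated in 2 dimensions, plus 8 noise dimensions.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
signal = np.vstack([c + rng.normal(size=(50, 2)) for c in centers])
X = np.hstack([signal, rng.normal(size=(150, 8))])

labels, A = reduced_kmeans(X, n_clusters=3, n_components=2)
print(A.shape)
```

Unlike the tandem approach, the projection `A` here depends on the cluster assignments, so the subspace is chosen to separate the clusters rather than to preserve overall variance. Like ordinary K-means, the result depends on the initial partition, so several restarts are usually advisable.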

Tokenizing and embedding the text using:
- BERT
- RoBERTa
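Models such as BERT and RoBERTa produce one vector per token, so some pooling step is needed to get one vector per document. The README does not say which pooling is used; the sketch below assumes masked mean pooling and uses random placeholder arrays in place of real model outputs. The `mean_pool` helper is hypothetical.

```python
import numpy as np


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per document, ignoring padded positions."""
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Guard against division by zero for fully padded rows.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


rng = np.random.default_rng(0)
# Placeholder for model output: (documents, tokens, hidden size).
token_embeddings = rng.normal(size=(2, 4, 8))
# 1 marks a real token, 0 marks padding.
attention_mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])

doc_vectors = mean_pool(token_embeddings, attention_mask)
print(doc_vectors.shape)
```

Using the vector of the first token is a common alternative; either way the result is a matrix with one row per document that the clustering methods can consume.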
Clustering on the original data (Classic4, which consists of abstracts of scientific articles; BBC, which contains headlines from the BBC news channel; article1, which consists of news headlines; and article2, which consists of Wikipedia summaries):
- This is done on the data without any dimensionality reduction technique
- We will be using this to benchmark against later methods.
Clustering with tandem methods, which apply:
- a dimensionality reduction technique (PCA, t-SNE, UMAP, autoencoders)
- followed by a clustering technique (K-means, spherical K-means, factorial K-means, hierarchical clustering with Ward, complete, and single linkage, and HDBSCAN)
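For the hierarchical variants, the linkage criterion is the main thing that changes between runs. A minimal sketch, assuming scikit-learn, with PCA as the reduction step and synthetic data standing in for the sentence embeddings (the sizes and cluster count are illustrative assumptions):

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic stand-in for sentence embeddings (hypothetical data).
X, _ = make_blobs(n_samples=200, n_features=30, centers=3, random_state=0)
X_reduced = PCA(n_components=5, random_state=0).fit_transform(X)

# Run the same reduced data through each linkage criterion.
results = {}
for linkage in ("ward", "complete", "single"):
    model = AgglomerativeClustering(n_clusters=3, linkage=linkage)
    results[linkage] = model.fit_predict(X_reduced)

for linkage, labels in results.items():
    print(linkage, len(set(labels)))
```

Ward linkage tends to give compact, similarly sized clusters, while single linkage can chain points together, so the three label sets may differ noticeably on real text data.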
Clustering using combined methods (where dimensionality reduction and clustering are done at the same time):
- Reduced K-means
- Factorial K-means
- Deep Clustering Network
- Deep K-means
