This is a class I put together as an exercise in programming machine learning algorithms. The methods here work with 2D NumPy arrays, which are fast for matrix manipulation.
With this class, you can cluster items with Train.KMeansClustering() and then pass the generated cluster centroids to Train.NearestNeighborClassifier() to classify unseen data. The latter is essentially a K Nearest Neighbor algorithm with k=1.
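To illustrate the idea of classifying against centroids with k=1, here is a minimal sketch. It is not the code inside Train.NearestNeighborClassifier(); the function name `nearest_centroid_labels`, the use of squared Euclidean distance, and the toy data are all assumptions made for the example.

```python
import numpy as np


def nearest_centroid_labels(X, centroids):
    """Return, for each row of X, the index of its closest centroid.

    Hypothetical helper: assumes squared Euclidean distance, which may
    differ from the similarity function the class actually uses.
    """
    # Broadcast to shape (n_samples, n_centroids, n_features).
    diffs = X[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # k=1: keep only the single nearest centroid per sample.
    return np.argmin(sq_dists, axis=1)


# Toy data: two centroids and three unseen points.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
X_new = np.array([[1.0, -1.0], [9.0, 11.0], [4.0, 4.0]])
labels = nearest_centroid_labels(X_new, centroids)
```

Each unseen point receives the index of its nearest centroid, so the cluster assignment doubles as the predicted class.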
A working demonstration of the class is shown in run_k-means_on_20-newsgroups-dataset.ipynb, where I cluster documents from the 20 Newsgroups data set. The clusters generated by Train.KMeansClustering() allow for classification with Train.NearestNeighborClassifier() on an unseen test set. The demonstration reaches a test-set classification accuracy of 82.50%, which supports the correctness of both algorithms.
- K Means Clustering Algorithm
- K Means Model Classifier
- Random Cluster Centroid Generator
- Various similarity functions
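
The components listed above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the class's actual implementation: the function names (`cosine_similarity`, `random_centroids`, `k_means`), the choice of data points as initial centroids, the Euclidean assignment step, and the toy data are all hypothetical.

```python
import numpy as np


def cosine_similarity(A, B):
    """Cosine similarity between every row of A and every row of B.

    One possible similarity function; assumes no all-zero rows.
    """
    A_unit = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_unit = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A_unit @ B_unit.T


def random_centroids(X, k, rng):
    """Pick k distinct rows of X as starting centroids (an assumption)."""
    idx = rng.choice(X.shape[0], size=k, replace=False)
    return X[idx].copy()


def k_means(X, k, n_iter=100, seed=0):
    """Plain Lloyd-style k-means using squared Euclidean distance."""
    rng = np.random.default_rng(seed)
    centroids = random_centroids(X, k, rng)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every sample.
        sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its members.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave empty clusters where they are
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # converged
        centroids = new_centroids
    return centroids, labels


# Toy data: two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = k_means(X, k=2)
sims = cosine_similarity(np.array([[1.0, 0.0]]),
                         np.array([[2.0, 0.0], [0.0, 3.0]]))
```

Swapping the Euclidean assignment step for a similarity function such as `cosine_similarity` would mean taking the argmax of similarity instead of the argmin of distance.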