README.md
pca.py
pca_test7.py
test0.py
test1.py
test2.py
test2kmeans.py
test2littlegroups.py
test3.py
test4.py
test5.py
test5kmeans.py
test6.py
test7.py
test7littlegroups.py
test7random.py
test8.py
test9.py
test_init.py


doc_clustering

To run basic K-means clustering with PySpark and K=20, run these scripts in order: test_init, test0, test1, test2kmeans, test7. Choose the initialization mode in test7 (random, k-means++ or k-means||).
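To illustrate what the initialization choice changes, here is a minimal pure-Python sketch of random seeding versus k-means++ seeding. It is not the code in test7: the function names (`random_init`, `kmeans_pp_init`) and the toy data are assumptions made for illustration, and k-means|| (the parallel variant used by Spark) is not shown.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_init(points, k, rng):
    """Random seeding: pick k distinct points uniformly at random."""
    return rng.sample(points, k)


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new center is drawn with probability
    proportional to its squared distance to the nearest center so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [
            min(squared_distance(p, c) for c in centers) for p in points
        ]
        if sum(weights) == 0:
            # Every point coincides with an existing center; fall back
            # to a uniform draw so the loop still terminates.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


# Hypothetical toy data: three well-separated pairs of 2-D points.
toy_points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0),
              (5.1, 5.0), (10.0, 0.0), (10.1, 0.0)]
seeds = kmeans_pp_init(toy_points, 3, random.Random(0))
```

Because already-chosen points get weight zero, k-means++ tends to spread the initial centers across the data, whereas random seeding can place several centers in the same group.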

To run basic K-means clustering with PySpark and K=7, run: test_init, test0, test1, test2littlegroups, test7littlegroups.

To run Naive Bayes, run: test_init, test0, test1, test2, test3, test4, test5, test6.

To run standard K-means with the additional PCA step, run: test_init, test0, test1, test2kmeans, pca, pca_test7.
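The run orders listed above can be captured in a small helper script. This is only a sketch: the pipeline keys (`kmeans_k20`, `kmeans_k7`, `naive_bayes`, `kmeans_pca`) and the functions `build_commands` and `run_pipeline` are hypothetical names, and it assumes each script can be launched as a standalone `.py` file. If the scripts need `spark-submit` or an interactive PySpark shell instead, pass a different launcher or adapt accordingly.

```python
import subprocess
import sys

# Script order for each workflow, copied from the README.
# The dictionary keys are hypothetical labels, not names used in the repo.
PIPELINES = {
    "kmeans_k20": ["test_init", "test0", "test1", "test2kmeans", "test7"],
    "kmeans_k7": ["test_init", "test0", "test1",
                  "test2littlegroups", "test7littlegroups"],
    "naive_bayes": ["test_init", "test0", "test1", "test2",
                    "test3", "test4", "test5", "test6"],
    "kmeans_pca": ["test_init", "test0", "test1",
                   "test2kmeans", "pca", "pca_test7"],
}


def build_commands(pipeline, launcher=(sys.executable,)):
    """Return one command (as an argument list) per script, in run order."""
    return [list(launcher) + [name + ".py"] for name in PIPELINES[pipeline]]


def run_pipeline(pipeline, launcher=(sys.executable,)):
    """Run each script in order; stop at the first one that fails."""
    for command in build_commands(pipeline, launcher):
        subprocess.run(command, check=True)
```

For example, `run_pipeline("kmeans_pca", launcher=("spark-submit",))` would run the six PCA workflow scripts through `spark-submit`, assuming it is on the PATH.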

To move data from PySpark to SparkR, use the files test8 and test9. Follow the commented instructions in test9 to run the files in SparkR (you will need to install the sparcl package).
