STA663-Final-Project: Scalable K-Means

Team Members: Yunlu Hao, Lianghui Li

Include:

Kmeans.ipynb: main file
data: include all data files
test: intermediate code
readme.md

Package:

package_kmeans

This is the final project for course STA663. In this project, we implement the K-means++ and K-means||algorithm in Python following Bahmani's paper "Scalable k-means++" and speed up using Cython,JIT and multi-core processing.

We apply it to the simulated GAUSSMIXTURE dataset, SPAM and Housing dataset from UC Irvine Machine Learning repository. In this part, we compare clustering cost and runtime of the k-means|| algorithm with these of random initialization and the k-means++.

From the implementation on the simulated dataset and the real-world dataset, the k-means|| and k-means++ find a better initial centroids than random in most cases, which leads to a better final clustering performance. However, for our algorithm, k-means++ runs faster than k-means|| in most cases, in future anlysis, we should focus more on how to implement k-means|| in a more robust setting(in terms of cost reduction and better parallel application.

We also created an easily instable package for our functions, which can be found in https://github.com/yumegaaru/package_kmeans

To install, please go to Terminal and type in:

!pip install --upgrade pip git+git://github.com/yumegaaru/kmeans_package.git

then run the following code in your python:

import kemeans_package

Data and test code used in our analysis are also provided in folders under this repository.

Name		Name	Last commit message	Last commit date
Latest commit History 41 Commits
data		data
testing		testing
Kmeans.ipynb		Kmeans.ipynb
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data

data

testing

testing

Kmeans.ipynb

Kmeans.ipynb

LICENSE

LICENSE

README.md

README.md

Repository files navigation

STA663-Final-Project: Scalable K-Means

Team Members: Yunlu Hao, Lianghui Li

About

Releases

Packages

Contributors 2

Languages

License

yumegaaru/Scalable_Kmeans

Folders and files

Latest commit

History

Repository files navigation

STA663-Final-Project: Scalable K-Means

Team Members: Yunlu Hao, Lianghui Li

About

Resources

License

Stars

Watchers

Forks

Languages