Truncated variational GMM

This is the variational GMM algorithm of the paper:
Can clustering scale sublinearly with its clusters? A variational EM acceleration of GMMs and k-means
Dennis Forster and Jörg Lücke, AISTATS, 2018

1. Prerequisites

Python versions: 2.7.x or 3.x
Python packages:
- numpy
- scipy
- sklearn
- matplotlib
- mpi4py
- kmc2

2. Execution

For execution, run
python main.py [algorithm] [parameters]
where [algorithm] is either var-GMM-X, var-GMM-X+1, var-GMM-S or var-GMM-S+1,
and [parameters] is an optional list of free model parameters.

To start, you can for example simply run python main.py var-GMM-S+1

3. Parameters

dataset=BIRCH[dims]-[clusters]-[nsamples], default: dataset=BIRCH2-400-100. Parameters for artificial BIRCH data set. [dims] is the dimensionality and [clusters] is the number of BIRCH clusters and [nsamples] is the number of data points per clusters.
C=[int], default: C=400. Number of clusters.
Cprime=[int], default: Cprime=5. Truncation paramer: number of non-zero values in posteriors. Cprime=1 recovers k-means-like algorithms.
G=[int], default: G=5. Cluster neighborhood size. Number of clusters that are considered as neighbors to each cluster c (including c itself) in the variational algorithms.
Niter=[int], default: Niter=25. Number of learning iterations.
Ninit=[int], default: Ninit=5. Number of initial E-step iterations to find better initial neighbor guesses.
VERBOSE=[True/False], default: VERBOSE=True. If True the loglikelihood, free energy, quantization error, purity, NMI, AMI and number of distance evaluations in each iteration are calculated and saved in the /output/ directory, as well as an image of the current clusters and the data set. If False, only the number of distance evaluations are counted and saved per iteration. After learning is complete, all final values will be calculated and saved.

4. Disclaimer

This algorithm is not optimized for fast execution speed. As measure for speed, we show the number of distance evaluations wrt the number of distance evaluations used by full GMMs/k-means per iteration.

For an implementation of the var-GMM-S algorithm with focus on fast real-time execution speed, see the following repository which uses C++ source code vc-GMM. The repository is part of the github-organization Variational Sublinear Clustering, which will integrate further developments/repositories.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
Logs		Logs
output		output
utils		utils
.gitignore		.gitignore
README.md		README.md
main.py		main.py
truncated_GMM.py		truncated_GMM.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Truncated variational GMM

1. Prerequisites

2. Execution

3. Parameters

4. Disclaimer

About

Releases

Packages

Contributors 2

Languages

variational-sublinear-clustering/var-GMM

Folders and files

Latest commit

History

Repository files navigation

Truncated variational GMM

1. Prerequisites

2. Execution

3. Parameters

4. Disclaimer

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages