This is an extension for the MOA framework that provides an experiment environment for the BICO algorithm. To use it, add moa-bico-experiment.jar to the classpath when launching MOA.
Example:
$ java -cp moa-bico-experiment.jar:moa.jar -javaagent:sizeofag.jar moa.gui.GUI
For a clustering experiment with the MOA framework you should use the following settings.
Stream: class moa.streams.clustering.SimpleCSVStream
  csvFile: path to input file
  splitChar: input CSV split character (optional, typical ",")
  classIndex: true if the last component of an input point is a class label (optional, typical false)
  decayHorizon: number of input points
  decayThreshold: (is not needed)
  evaluationFrequency: number of input points
Algorithm: class moa.clusterers.bico.BICO
  Cluster: number of desired centers
  Dimensions: dimension of an input point
  MaxClusterFeatures: coreset size (typical Cluster * 200)
  Projections: number of random projections used for nearest neighbor search in first level (typical Dimensions)
  evaluateMicroClustering: true if the coreset should be the result (optional, typical false)
$ java -cp moa-bico-experiment.jar:moa.jar -javaagent:sizeofag.jar moa.DoTask EvaluateClustering -s \(moa.streams.clustering.SimpleCSVStream -f csvFile -s splitChar -c classIndex -h decayHorizon -e evaluationFrequency\) -l \(moa.clusterers.bico.BICO -k Cluster -d Dimensions -n MaxClusterFeatures -p Projections -M evaluateMicroClustering\) -i instanceLimit -d dumpFile
  csvFile: path to input file
  splitChar: input CSV split character (optional, typical ",")
  classIndex: true if the last component of an input point is a class label (optional, typical false)
  decayHorizon: number of input points
  evaluationFrequency: number of input points
  Cluster: number of desired centers
  Dimensions: dimension of an input point
  MaxClusterFeatures: coreset size (typical Cluster * 200)
  Projections: number of random projections used for nearest neighbor search in first level (typical Dimensions)
  evaluateMicroClustering: true if the coreset should be the result (optional, typical false)
  instanceLimit: number of input points
  dumpFile: path to summary file (comma-separated)
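The typical values listed above can be filled in mechanically. The following is a minimal sketch, not part of the extension: the function name bico_moa_command and the example values (points.csv, summary.csv, 5 centers, 10 dimensions, 100000 points) are hypothetical. It leaves out the optional parameters, sets decayHorizon, evaluationFrequency and instanceLimit to the number of input points as one reading of the descriptions above, and only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# bico_moa_command CSV K D N DUMP
# Print (do not run) an EvaluateClustering command for BICO.
# Hypothetical helper; only the option letters are taken from the command above.
bico_moa_command() {
    csv=$1; k=$2; d=$3; n=$4; dump=$5
    max_cf=$((k * 200))   # typical coreset size: Cluster * 200
    projections=$d        # typical number of projections: Dimensions
    # Parentheses stay escaped so the printed line can be pasted into a shell.
    stream="\\(moa.streams.clustering.SimpleCSVStream -f $csv -h $n -e $n\\)"
    learner="\\(moa.clusterers.bico.BICO -k $k -d $d -n $max_cf -p $projections\\)"
    printf '%s\n' "java -cp moa-bico-experiment.jar:moa.jar -javaagent:sizeofag.jar moa.DoTask EvaluateClustering -s $stream -l $learner -i $n -d $dump"
}

# Hypothetical example values: 5 centers, 10 dimensions, 100000 points.
bico_moa_command points.csv 5 10 100000 summary.csv
```

With these example values the printed command contains -n 1000 (5 * 200) and -p 10.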
This extension also provides three different classes to launch the BICO algorithm without the MOA framework.
$ java -cp moa-bico-experiment.jar:moa.jar -javaagent:sizeofag.jar moa.clusterers.bico.experiment.Experiments input n k d space output projections [seed]
  input: path to input file (space-separated)
  n: number of input points
  k: number of desired centers
  d: dimension of an input point
  space: coreset size (typical k * 200)
  output: path to output file (space-separated)
  projections: number of random projections used for nearest neighbor search in first level (typical d)
  seed: random seed (optional)
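Experiments reads space-separated input. If the data is comma-separated, one possible conversion is sketched below. The function name csv_to_space and the sample values are hypothetical, and the sketch assumes plain numeric fields: no header line and no quoted fields that contain commas.

```shell
#!/bin/sh
# csv_to_space FILE
# Write FILE to standard output with every comma replaced by a space.
# Hypothetical helper; assumes no header line and no quoted fields.
csv_to_space() {
    tr ',' ' ' < "$1"
}

# Tiny illustrative data set: 3 points with 2 dimensions each.
tmpdir=$(mktemp -d)
printf '1.0,2.0\n3.0,4.0\n5.0,6.0\n' > "$tmpdir/points.csv"
csv_to_space "$tmpdir/points.csv" > "$tmpdir/points.txt"
cat "$tmpdir/points.txt"
```

The converted file can then be passed as input, here with n = 3 and d = 2.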
$ java -cp moa-bico-experiment.jar:moa.jar -javaagent:sizeofag.jar moa.clusterers.bico.experiment.Quickstart input n k d space output projections [splitchar [seed]]
  input: path to input file
  n: number of input points
  k: number of desired centers
  d: dimension of an input point
  space: coreset size (typical k * 200)
  output: path to output file (space-separated)
  projections: number of random projections used for nearest neighbor search in first level (typical d)
  splitchar: input CSV split character (optional, typical ",")
  seed: random seed (optional)
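The arguments n and d can be read off the input file instead of being counted by hand. The sketch below is an illustration under stated assumptions, not part of the extension: the helper names count_points, point_dimension and quickstart_command are hypothetical, the file is assumed to hold one point per line with no header and no class label column, and the split character is assumed to be a single character. It uses the typical values k * 200 for space and d for projections, and prints the command instead of running it.

```shell
#!/bin/sh
# count_points FILE: number of non-empty lines, used as n.
count_points() {
    awk 'NF > 0 { n++ } END { print n + 0 }' "$1"
}

# point_dimension FILE SPLITCHAR: field count of the first non-empty line, used as d.
point_dimension() {
    awk -F "$2" 'NF > 0 { print NF; exit }' "$1"
}

# quickstart_command INPUT K OUTPUT SPLITCHAR
# Print (do not run) a Quickstart command with n and d derived from INPUT.
quickstart_command() {
    input=$1; k=$2; output=$3; splitchar=$4
    n=$(count_points "$input")
    d=$(point_dimension "$input" "$splitchar")
    space=$((k * 200))    # typical coreset size: k * 200
    printf '%s %s %s\n' \
        "java -cp moa-bico-experiment.jar:moa.jar -javaagent:sizeofag.jar" \
        "moa.clusterers.bico.experiment.Quickstart" \
        "$input $n $k $d $space $output $d $splitchar"
}
```

A call such as quickstart_command points.csv 5 centers.txt , (file names hypothetical) prints a command line that can be checked and then executed.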
$ java -cp moa-bico-experiment.jar:moa.jar -javaagent:sizeofag.jar moa.clusterers.bico.experiment.Script input k [r]
With this class, the BICO algorithm evaluates all points of the input file, uses k * 200 as the coreset size, and uses the dimension of the input points as the number of random projections for nearest neighbor search in the first level.
  input: path to input file (comma-separated)
  k: number of desired centers
  r: number of runs (optional, typical 1)
Hendrik Fichtenberger, Marc Gillé, Melanie Schmidt, Chris Schwiegelshohn, Christian Sohler: BICO: BIRCH Meets Coresets for k-Means Clustering. ESA 2013: 481-492 (2013) http://ls2-www.cs.tu-dortmund.de/bico/
See License