SCHISM software

Authors - Karlton Sequeira, Mohammed Zaki (2004)

SCHISM finds interesting subspace clusters.

Karlton Sequeira and Mohammed J. Zaki. SCHISM: A New Approach for Interesting Subspace Mining. In Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM), 2004.

Karlton Sequeira and Mohammed J. Zaki. SCHISM: a new approach to interesting subspace mining. International Journal of Business Intelligence and Data Mining, 1(2):137–160, 2005. doi:10.1504/IJBIDM.2005.008360.

FUNCTIONALITY

dataGen creates high-dimensional ASCII datasets in horizontal format that contain embedded subspaces.

convertData converts the horizontal format ASCII datasets output by dataGen (high-dimensional, with embedded subspaces) to another format: vertical, binary, IBM, WEKA, LDR, or MAFIA. Only the IBM formats have been tested recently.

schism finds the embedded subspaces.

RUNNING THEM

Command line arguments for convertData and dataGen can be found by running them without any arguments.

Typically, run them as follows:

    dataGen -o /tmp/schism/data/swhatever.ha -d 60

This creates a horizontal ASCII dataset with 60 dimensions, using default values for all other parameters.

    convertData -i /tmp/schism/data/swhatever -d 60

This creates an IBM format dataset, using default parameters, from the horizontal ASCII file created in the previous step.

    schism -i /tmp/schism/data/swhatever.ibm

This mines the IBM format file for embedded subspaces using default parameters.

It prints "<execution time (minutes)> <number of interesting subspaces found>". To print the actual subspaces, add the parameter "-o 1".
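If the summary needs to be consumed by another script, it can be split into its two fields. The following is a minimal sketch, assuming the summary is a single line of two whitespace-separated fields as described above. The helper name `parse_schism_summary` is hypothetical and not part of SCHISM, and the sample line is made up for illustration rather than taken from a real run.

```shell
#!/bin/sh
# Hypothetical helper (not part of SCHISM): split the summary line
# "<minutes> <count>" read from stdin into labelled fields.
parse_schism_summary() {
    # read splits on whitespace and drops the trailing blank
    read -r ps_minutes ps_count || return 1
    echo "minutes=$ps_minutes subspaces=$ps_count"
}

# Made-up summary line, used only to exercise the helper.
echo "0.25 12 " | parse_schism_summary
```

With the real program, the same helper would be used as `schism -i /tmp/schism/data/swhatever.ibm | parse_schism_summary`, provided the output format matches the assumption above.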

To run these programs with as few parameters as possible, suffix the name of the horizontal ASCII file with ".ha" and place all new files in the /tmp/schism/data directory.

Example use may be seen in the shell scripts dataSCHISM.sh and testSCHISM.sh. Paths must be appropriately changed to run the scripts.
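The naming convention above can also be captured in a small helper that prints the three commands for a given dataset name and dimension count. This is only a sketch and not a replacement for the bundled scripts: the function name `schism_commands` and the `DATA_DIR` variable are hypothetical, the `mkdir -p` step is an assumption that the data directory may not exist yet, and the binaries are assumed to be on the PATH.

```shell
#!/bin/sh
# Directory that the README's convention expects all files to live in.
DATA_DIR=/tmp/schism/data

# Hypothetical helper (not part of SCHISM): print the commands for one
# dataset, given its base name (without ".ha") and its dimension count.
schism_commands() {
    sc_name=$1
    sc_dims=$2
    # Assumption: the data directory may need to be created first.
    echo "mkdir -p ${DATA_DIR}"
    # Generate the horizontal ASCII dataset; the file carries the .ha suffix.
    echo "dataGen -o ${DATA_DIR}/${sc_name}.ha -d ${sc_dims}"
    # Convert it; the input is given without the .ha suffix, as in the README.
    echo "convertData -i ${DATA_DIR}/${sc_name} -d ${sc_dims}"
    # Mine the converted IBM format file.
    echo "schism -i ${DATA_DIR}/${sc_name}.ibm"
}

# Print the commands for the README's example dataset.
schism_commands swhatever 60
```

Printing the commands rather than running them keeps the sketch safe to try without the binaries. To execute them, the output can be piped to a shell, for example `schism_commands swhatever 60 | sh -e`.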
