SCluster

an implementation of spectral clustering for documents

Homepage: http://github.com/whym/scluster

Contact: http://whym.org

Overview

Spectral clustering is a modern clustering technique considered effective for image clustering, among other tasks. [1] [2]

This software finds clusters among documents based on the bag-of-words representation [3] and TF-IDF weighting [4].
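
To make the bag-of-words and TF-IDF terms concrete, here is a minimal sketch in plain Python 3. It is not SCluster's own code: the function names `tfidf_vectors` and `cosine`, the whitespace tokenizer, and the particular TF-IDF variant (relative term frequency times log inverse document frequency) are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Turn raw-text documents into sparse TF-IDF vectors (dict: term -> weight)."""
    # Bag of words: each document becomes a multiset of lowercase tokens,
    # so word order is discarded. (Whitespace tokenization is an assumption.)
    bags = [Counter(doc.lower().split()) for doc in documents]

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())

    n_docs = len(documents)
    vectors = []
    for bag in bags:
        total = sum(bag.values())
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    "palm oil prices rise",
    "vegetable oil prices fall",
    "ship sails from port",
]
vecs = tfidf_vectors(docs)
# The two oil-related documents share terms; the shipping one shares none.
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))
```

Pairwise similarities of this kind are what a clustering method typically takes as input.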

[1]	Ulrike von Luxburg, A Tutorial on Spectral Clustering, 2006. http://arxiv.org/abs/0711.0189

[2]	Chris H. Q. Ding, Spectral Clustering, 2004. http://ranger.uta.edu/~chqding/Spectral/

[3]	http://en.wikipedia.org/wiki/Bag_of_words_model

[4]	http://en.wikipedia.org/wiki/Tf%E2%80%93idf
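
As a rough illustration of the idea behind spectral clustering (see the tutorial in [1]), the sketch below splits a toy similarity graph into two groups using the eigenvectors of its graph Laplacian. It is a simplified two-cluster variant written with NumPy, not the method SCluster implements; the function name `spectral_bipartition` and the toy matrix are made up for this example.

```python
import numpy as np


def spectral_bipartition(similarity):
    """Split items into two groups using the sign of the Fiedler vector."""
    W = np.asarray(similarity, dtype=float)
    # Unnormalized graph Laplacian L = D - W, where D is the degree matrix.
    degree = np.diag(W.sum(axis=1))
    laplacian = degree - W
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    _, eigenvectors = np.linalg.eigh(laplacian)
    # The eigenvector of the second-smallest eigenvalue (the Fiedler vector)
    # carries the coarsest cluster structure of a connected graph.
    fiedler = eigenvectors[:, 1]
    return (fiedler > 0).astype(int)


# Toy similarity matrix: items 0 and 1 are similar, items 2 and 3 are similar.
W = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.9],
    [0.0, 0.1, 0.9, 0.0],
]
labels = spectral_bipartition(W)
print(labels)  # which group gets label 0 or 1 depends on the eigenvector's sign
```

For more than two clusters, the usual approach described in [1] is to take several of the smallest eigenvectors and run a method such as k-means on their rows.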

Requirements

The following software is required:

Python 2.7 or 3.4
NumPy
SciPy

How to use

Clone this repository.
Prepare documents as raw-text files, and put them in a directory, for example, 'reuters'.
Prepare a category file. For example, 'cats.txt' may contain:
```
14833 palm-oil veg-oil
14839 ship
```
This means that the file '14833' has 'palm-oil' and 'veg-oil' as its categories, and '14839' has 'ship' as its category.
Run: python -m scluster.clusterer cats.txt reuters/ -m kmeans
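
The category file format described above can be read with a few lines of Python. This is only a sketch of the format as shown in the example (a file name followed by whitespace-separated categories); the helper name `parse_categories` is hypothetical, and SCluster's own parser may behave differently.

```python
def parse_categories(lines):
    """Map each file name to its list of categories."""
    categories = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # First field is the file name, the rest are its categories.
        name, labels = fields[0], fields[1:]
        categories[name] = labels
    return categories


sample = """\
14833 palm-oil veg-oil
14839 ship
"""
cats = parse_categories(sample.splitlines())
print(cats["14833"])  # ['palm-oil', 'veg-oil']
```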

Notes

When you use the Reuters set [5], note that file 17980 may contain a non-Unicode character on line 10. The line should probably read: "world economic growth-side measures ..."

[5]	http://www.daviddlewis.com/resources/testcollections/reuters21578/
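
To locate problems like the one noted above, a small check can report which lines of a file fail to decode. This sketch assumes that "non-Unicode" means "not valid UTF-8"; the helper name `find_undecodable_lines` is hypothetical, and the demo uses a temporary file rather than real Reuters data.

```python
import os
import tempfile


def find_undecodable_lines(path, encoding="utf-8"):
    """Return (line_number, raw_bytes) for every line that fails to decode."""
    bad = []
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError:
                bad.append((number, raw))
    return bad


# Demo on a throwaway file whose second line holds an invalid UTF-8 byte.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"plain ascii line\n")
    tmp.write(b"bad byte here: \xff\n")
demo_path = tmp.name

problems = find_undecodable_lines(demo_path)
print([number for number, _ in problems])  # [2]
os.remove(demo_path)
```

Running the function over each file in the document directory would list the files and lines that need manual correction.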

