pyrandomprojection
------------------

    by Joseph Turian

Random projection library for Python, converting a dictionary to
low-dimensional numpy vector
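Below is a minimal sketch of converting a dictionary to a low-dimensional vector. It assumes each feature key maps deterministically to a Gaussian random row, and that the output is the value-weighted sum of those rows. `randomrow` and `project` here are hypothetical stand-ins, and their signatures are assumptions rather than this library's API. The library itself builds on common.gaussian and common.deterministicrandom (Murmur hashes), not the MD5 seeding used below.

```python
import hashlib

import numpy as np


def randomrow(feature, dimensions):
    """Hypothetical stand-in for randomrow: a Gaussian row determined only by `feature`."""
    # Built-in hash() of a str is salted per process, so seed from a stable digest instead.
    digest = hashlib.md5(repr(feature).encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "little")
    return np.random.default_rng(seed).standard_normal(dimensions)


def project(features, dimensions):
    """Hypothetical project(): map a sparse {feature: value} dict to a dense vector."""
    vector = np.zeros(dimensions)
    for feature, value in features.items():
        # Each nonzero contributes value * (the feature's fixed random row).
        vector += value * randomrow(feature, dimensions)
    return vector


vector = project({"the": 2.0, "cat": 1.0}, dimensions=50)
print(vector.shape)  # (50,)
```

Because each row depends only on its key, the same feature always lands in the same direction, so vectors projected in separate runs remain comparable.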

"Random projection is a simple geometric technique for reducing
the dimensionality of a set of points in Euclidean space while
preserving pairwise distances approximately" -Santosh Vempala
<http://www-math.mit.edu/~vempala/rp.html>
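The sketch below shows the distance-preservation property in practice. It projects random high-dimensional points with a dense Gaussian matrix scaled by 1/sqrt(k). That scaling is an assumed textbook choice, not necessarily what this library does. The ratios of projected to original pairwise distances should cluster around 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, original_dim, projected_dim = 20, 5_000, 500

points = rng.standard_normal((n_points, original_dim))
# Entries ~ N(0, 1/k), so squared distances are preserved in expectation.
projection = rng.standard_normal((original_dim, projected_dim)) / np.sqrt(projected_dim)
projected = points @ projection


def pairwise_distances(x):
    """Euclidean distance between every pair of rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


off_diagonal = ~np.eye(n_points, dtype=bool)
ratios = pairwise_distances(projected)[off_diagonal] / pairwise_distances(points)[off_diagonal]
print(ratios.min(), ratios.max())  # both close to 1
```

Increasing `projected_dim` tightens the spread of the ratios, at the cost of a larger output vector.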

See also:
    http://github.com/turian/random-indexing-wordrepresentations
    A more end-to-end package (rather than a drop-in function) specifically
    for inducing word representations over a corpus using random indexing
    (a particular application of random projection).
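Here is a rough sketch of random indexing under a common formulation. Each word gets a fixed sparse index vector containing a few random +1/-1 entries. A word's representation is the sum of the index vectors of the words that appear within a small window around it. The function name and its parameters (`dimensions`, `nonzeros`, `window`) are hypothetical, and the linked package may differ in details such as weighting and normalization.

```python
from collections import defaultdict

import numpy as np


def random_indexing(sentences, dimensions=100, nonzeros=4, window=2, seed=0):
    """Sketch of random indexing: sum sparse ternary index vectors of context words."""
    rng = np.random.default_rng(seed)
    index_vectors = {}

    def index_vector(word):
        # Each word gets one fixed sparse vector with a few +1/-1 entries.
        if word not in index_vectors:
            vec = np.zeros(dimensions)
            positions = rng.choice(dimensions, size=nonzeros, replace=False)
            vec[positions] = rng.choice([-1.0, 1.0], size=nonzeros)
            index_vectors[word] = vec
        return index_vectors[word]

    context_vectors = defaultdict(lambda: np.zeros(dimensions))
    for sentence in sentences:
        for i, word in enumerate(sentence):
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    context_vectors[word] += index_vector(sentence[j])
    return dict(context_vectors)


vectors = random_indexing([["the", "cat", "sat"], ["the", "dog", "sat"]], dimensions=50)
print(np.allclose(vectors["cat"], vectors["dog"]))  # True: identical contexts
```

Summing sparse random index vectors is equivalent to multiplying the word-by-context co-occurrence matrix by a sparse random projection matrix. This is why random indexing counts as a special case of random projection.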

WARNING:
    * Read the warnings in common.gaussian and common.deterministicrandom,
      which describe potential issues with the RNG and hash function:
        http://github.com/turian/common/blob/master/gaussian.py
        http://github.com/turian/common/blob/master/deterministicrandom.py

NOTE:
    * The runtime is typically dominated by the number of calls to
    randomrow. If you have a single instance, simply call project()
    on it. If you have a list of instances, don't just call project()
    on each instance, because that makes one randomrow call per nonzero
    entry (#nonzeros calls in total). Instead, iterate over the columns
    (i.e. in feature-major order, not instance-major order), so that you
    make only one randomrow call per feature type. Example code that does
    this is here:
        http://github.com/glorotxa/DeepANN/blob/master/exp_scripts/randomprojection.py


TODO:
    A test suite is needed to verify that projected values are consistent
    across architectures.
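Here is one possible shape for such a test, sketched under assumptions. You would record `randomrow` outputs for a fixed set of features on a reference machine, commit the file, and then compare against it on other architectures. `randomrow`, `FEATURES`, `DIMENSIONS`, and `check_against_reference` are hypothetical names used for illustration only.

```python
import hashlib
import json
import os

import numpy as np


def randomrow(feature, dimensions):
    """Hypothetical deterministic row generator standing in for the library's randomrow."""
    digest = hashlib.md5(repr(feature).encode("utf-8")).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "little"))
    return rng.standard_normal(dimensions)


# Hypothetical fixed inputs covering a few key types.
FEATURES = ["the", "cat", ("bigram", "the", "cat"), 42]
DIMENSIONS = 8


def check_against_reference(path, rtol=1e-12):
    """Compare randomrow output with values recorded on a reference machine.

    If `path` does not exist yet, write the current values there and return True;
    commit that file, then rerun on other architectures to compare against it.
    """
    current = {repr(f): randomrow(f, DIMENSIONS).tolist() for f in FEATURES}
    if not os.path.exists(path):
        with open(path, "w") as fh:
            json.dump(current, fh)
        return True
    with open(path) as fh:
        reference = json.load(fh)
    return all(
        np.allclose(reference[key], values, rtol=rtol, atol=0.0)
        for key, values in current.items()
    )
```

A tight but nonzero `rtol` allows for last-bit floating-point differences between platforms while still catching a changed hash or RNG stream.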

REQUIREMENTS:
    * numpy
        Output is put in a (dense) numpy vector

    * My Python common library:
        http://github.com/turian/common

        which in turn will require:

        * Murmur:
            easy_install murmur
            http://pypi.python.org/pypi/Murmur/
            Provides fast murmur hashes for strings, files, and zipped files.