This repo is no longer actively maintained. Don't be disappointed, though: check out https://github.com/waylonflinn/bvec instead!
# Fast Dot Products on Pretty Big Data
Bdot does big dot products (by making your RAM bigger on the inside). It's based on Bcolz and includes transparent disk-based storage.
Supports `matrix . vector` and `matrix . matrix` products for the most common numpy numeric data types.
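The core idea can be sketched in plain numpy: stream the matrix through memory one block of rows at a time, so the full (uncompressed) matrix never has to be resident at once. This is an illustrative sketch only, not bdot's actual implementation; bdot does the equivalent with compressed bcolz chunks, decompressing each one on the fly.

```python
import numpy as np

def chunked_dot(matrix, v, chunklen=4096):
    # illustrative sketch: accumulate the matrix . vector product one
    # block of rows at a time, so only `chunklen` rows need to be in
    # (decompressed) memory at any moment
    out = np.empty(matrix.shape[0], dtype=np.result_type(matrix, v))
    for start in range(0, matrix.shape[0], chunklen):
        block = matrix[start:start + chunklen]  # bdot decompresses a chunk here
        out[start:start + chunklen] = block.dot(v)
    return out
```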
```
pip install bdot
```
or build from source (requires bcolz >= 0.9.0)
```
python setup.py build_ext --inplace
python setup.py install
```
## Matrix . Vector
Multiply a matrix (`carray`) with a vector (`numpy.ndarray`); returns a vector (`numpy.ndarray`).
```python
import bdot
import numpy as np

# note: np.random.random_integers is deprecated in newer numpy;
# np.random.randint(0, 12001, size=...) is the modern equivalent
matrix = np.random.random_integers(0, 12000, size=(300000, 100))
bcarray = bdot.carray(matrix, chunklen=2**13, cparams=bdot.cparams(clevel=2))

# take the first row as the vector to multiply by
v = bcarray[0]

result = bcarray.dot(v)
expected = matrix.dot(v)

# should return True
(expected == result).all()
```
## Matrix . Matrix
Multiply a matrix (`carray`) with the transpose of a matrix (`carray`); returns a matrix (`numpy.ndarray`).
```python
import bdot
import numpy as np

matrix = np.random.random_integers(0, 120, size=(1000, 100))
bcarray1 = bdot.carray(matrix, chunklen=2**9, cparams=bdot.cparams(clevel=2))
bcarray2 = bdot.carray(matrix, chunklen=2**9, cparams=bdot.cparams(clevel=2))

# calculates bcarray1 . bcarray2.T (transpose)
result = bcarray1.dot(bcarray2)
expected = matrix.dot(matrix.T)

# should return True
(expected == result).all()
```
## Save Result to Disk (Experimental)
Save really big results directly to disk.
```python
# create a correctly sized container (helper method, not required)
output = bcarray1.empty_like_dot(bcarray2, rootdir='/path/to/bcolz/output')

# generate results directly on disk
bcarray1.dot(bcarray2, out=output)

# make sure the last bits get written
output.flush()
```
The `out` parameter can also be used to get `carray` output with an `ndarray` vector input. If you don't want disk-based storage, just leave out the `rootdir` parameter. You can also use your own `carray` container, as long as it has the correct shape.
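The preallocate-then-fill pattern behind the `out` parameter can be tried in plain numpy, which supports the same idiom in its own `dot`. A minimal sketch (pure numpy, no bdot; the array sizes are arbitrary):

```python
import numpy as np

matrix = np.random.randint(0, 120, size=(1000, 100))
other = np.random.randint(0, 120, size=(500, 100))

# preallocate a correctly sized container up front
# (analogous to what empty_like_dot does for a carray)
out = np.empty((matrix.shape[0], other.shape[0]),
               dtype=np.result_type(matrix, other))

# write the matrix . other.T result directly into the container,
# with no intermediate result array allocated
np.dot(matrix, other.T, out=out)
```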
## Benchmarks

Benchmarks were done on data structures generated by the code above; they are very informal and vary a bit across data sets.

* compression ratio: 3.5
* percent performance: 68%
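To make those two numbers concrete (the 240 MB starting size below is hypothetical, chosen only for illustration):

```python
# illustrative arithmetic only; raw_mb is a made-up example size
raw_mb = 240.0   # hypothetical uncompressed ndarray size
ratio = 3.5      # reported compression ratio
perf = 0.68      # reported fraction of numpy's dot-product speed

# memory the compressed carray would actually occupy
compressed_mb = raw_mb / ratio
print(round(compressed_mb, 1))
```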
This project has three goals, each slightly more fantastic than the last:

1. Allow computation on (compressed) data which is (~5-10x) larger than RAM at approximately the same speed as numpy
2. Allow computation on (slightly compressed) data at speeds that improve on numpy
3. Allow computation on (compressed) data which resides on disk at some sizable percentage (~30-50%) of the speed of numpy
So far, the first goal has been met.
Awesome TARDIS can be found here