Bdot

This repo is no longer actively maintained. Don't be disappointed, though: check out https://github.com/waylonflinn/bvec instead!


Fast Dot Products on Pretty Big Data

Bdot does big dot products (by making your RAM bigger on the inside). It's based on Bcolz and includes transparent disk-based storage.

Supports matrix . vector and matrix . matrix products for the most common numpy numeric data types (numpy.int64, numpy.int32, numpy.float64, numpy.float32).

Install

pip install bdot

or build from source (requires bcolz >= 0.9.0)

python setup.py build_ext --inplace
python setup.py install

Usage

Matrix . Vector

Multiply a matrix (carray) by a vector (numpy.ndarray). Returns a vector (numpy.ndarray).

import bdot
import numpy as np

# random_integers is deprecated; randint excludes the upper bound, hence 12001
matrix = np.random.randint(0, 12001, size=(300000, 100), dtype=np.int64)
bcarray = bdot.carray(matrix, chunklen=2**13, cparams=bdot.cparams(clevel=2))

v = bcarray[0]

result = bcarray.dot(v)
expected = matrix.dot(v)

# should return True
(expected == result).all()
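
To make the role of chunklen more concrete, here is a minimal pure-NumPy sketch of a chunk-wise matrix . vector product. It is an illustration only, not bdot's actual implementation: the helper name chunked_matvec is hypothetical, the matrix is smaller than the one above to keep the sketch quick, and real carray chunks are also compressed, which this sketch leaves out.

```python
import numpy as np


def chunked_matvec(matrix, vector, chunklen=2**13):
    """Multiply matrix by vector one block of rows at a time (hypothetical helper)."""
    n_rows = matrix.shape[0]
    result = np.empty(n_rows, dtype=np.result_type(matrix, vector))
    for start in range(0, n_rows, chunklen):
        stop = min(start + chunklen, n_rows)
        # Only this block of rows is touched in each step.
        result[start:stop] = matrix[start:stop].dot(vector)
    return result


rng = np.random.default_rng(0)
matrix = rng.integers(0, 12001, size=(30000, 100))
v = matrix[0]

chunked = chunked_matvec(matrix, v)

# Integer arithmetic is exact, so the chunked result matches numpy's.
print((chunked == matrix.dot(v)).all())
```

Because each step only needs one block of rows, the same loop would work if the blocks were decompressed or read from disk on demand, which is the idea a chunked container makes possible.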

Matrix . Matrix

Multiply a matrix (carray) by the transpose of a matrix (carray). Returns a matrix (carray).

import bdot
import numpy as np

# random_integers is deprecated; randint excludes the upper bound, hence 121
matrix = np.random.randint(0, 121, size=(1000, 100), dtype=np.int64)
bcarray1 = bdot.carray(matrix, chunklen=2**9, cparams=bdot.cparams(clevel=2))
bcarray2 = bdot.carray(matrix, chunklen=2**9, cparams=bdot.cparams(clevel=2))

# calculates bcarray1 . bcarray2.T (transpose)
result = bcarray1.dot(bcarray2)
expected = matrix.dot(matrix.T)

# should return True
(expected == result).all()
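
Since the second operand is transposed implicitly, both operands need the same number of columns, and entry [i, j] of the result is the dot product of row i of the first matrix with row j of the second. The following plain NumPy sketch (no bdot involved) shows the resulting shape and that relationship; the indices i and j are arbitrary picks for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
matrix = rng.integers(0, 121, size=(1000, 100))

product = matrix.dot(matrix.T)

# One output row and one output column per input row:
# (1000, 100) . (100, 1000) -> (1000, 1000)
print(product.shape)

# Entry [i, j] is the dot product of row i with row j.
i, j = 3, 7
print(product[i, j] == matrix[i].dot(matrix[j]))
```

Note that the result grows with the square of the row count, which is why a large input can produce an output too big for RAM.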

Save Result to Disk (Experimental)

Save really big results directly to disk

# create correctly sized container (helper method, not required)
output = bcarray1.empty_like_dot(bcarray2, rootdir='/path/to/bcolz/output')

# generate results directly on disk
bcarray1.dot(bcarray2, out=output)

# make sure the last bits get written
output.flush()

The out parameter can also be used to get carray output with an ndarray vector input. If you don't want disk-based storage, just leave out the rootdir parameter. You can also use your own carray container, as long as it has the correct shape.
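
The same pattern (pre-allocate a correctly shaped container, fill it, flush) can be sketched in plain NumPy with a memory-mapped file. This is an analogy under stated assumptions, not bdot's API: it uses numpy's open_memmap instead of a carray, the file name output.npy and the temporary directory are placeholders, and the inputs are small enough to verify in memory.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 121, size=(1000, 100))
b = rng.integers(0, 121, size=(500, 100))

# Placeholder output location; a temporary directory keeps the sketch self-contained.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "output.npy")

# Pre-allocate a correctly shaped container on disk: rows of a by rows of b.
out = np.lib.format.open_memmap(
    path, mode="w+", dtype=np.result_type(a, b), shape=(a.shape[0], b.shape[0])
)

# Fill the container block by block, computing one block of a . b.T per step.
chunklen = 2**8
for start in range(0, a.shape[0], chunklen):
    stop = min(start + chunklen, a.shape[0])
    out[start:stop] = a[start:stop].dot(b.T)

# Make sure the last bits get written.
out.flush()

# Verification only: the inputs here are small enough to compare in memory.
reloaded = np.load(path, mmap_mode="r")
print((reloaded == a.dot(b.T)).all())
```

The required container shape is (rows of the first matrix, rows of the second), which is what a helper like empty_like_dot saves you from working out by hand.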

Test

nosetests bdot

Simple Benchmarks

These benchmarks are very informal. They were run on the data structures generated by the code above, and results vary a bit across data sets.

Space

numpy ~229MB
bdot ~64MB

compression ratio: 3.5

Time

numpy ~33 ms
bdot ~48 ms

relative speed: bdot runs at ~68% of the speed of numpy
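
As a sanity check, the figures above can be reproduced with a little arithmetic. This sketch assumes the space benchmark refers to the 300000 x 100 matrix from the Matrix . Vector example, stored as int64, and that MB means 2**20 bytes; neither assumption is stated explicitly in the benchmark notes.

```python
import numpy as np

# Uncompressed size of a 300000 x 100 int64 matrix (assumed benchmark input).
n_bytes = 300000 * 100 * np.dtype(np.int64).itemsize
numpy_mb = n_bytes / 2**20
print(round(numpy_mb))

# Ratios implied by the approximate figures quoted above.
compression_ratio = 229 / 64   # numpy size / bdot size
relative_speed = 33 / 48       # numpy time / bdot time
print(round(compression_ratio, 2))
print(round(relative_speed, 2))
```

Under those assumptions the uncompressed size comes out at about 229 MB, the compression ratio at roughly 3.5, and the relative speed at just under 70%.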

Goals

This project has three goals, each slightly more fantastic than the last:

1. Allow computation on (compressed) data which is (~5-10x) larger than RAM at approximately the same speed as numpy.dot
2. Allow computation on (slightly compressed) data at speeds that improve on numpy.dot
3. Allow computation on (compressed) data which resides on disk at some sizable percentage (~30-50%) of the speed of numpy.dot

So far, the first goal has been met.

Acknowledgements

This library wouldn't be possible without all the talented people who worked hard to create Bcolz (and the libraries on which it's based). Initial code was also heavily influenced by Bquery.

Awesome TARDIS can be found here

