Skip to content

Latest commit

 

History

History
29 lines (21 loc) · 1.12 KB

README.rst

File metadata and controls

29 lines (21 loc) · 1.12 KB

shuffle

https://travis-ci.org/tr11/shuffle.png?branch=master

Shuffle if a small C++ library with Python bindings designed to shuffle an array in a similar manner to what the Shuffle filter in the HDF5 file format does. This is particularly useful when compressing arrays of numerical values of similar magnitudes.

For example, in Python:

import shuffle
import numpy
import zlib
import gzip

vec = numpy.random.rand(1024*1024)      # 8 MB in a 64bit machine
print('Vector size:                   %d' % len(bytes(vec.data)))
print('Compressed size:               %d' % len(gzip.compress(vec.data)))
print('Compressed size with shuffle:  %d' % len(gzip.compress(shuffle.shuffle_ndarray(vec).data)))
print('Compressed difference:         %0.2f%% smaller' % (100*(1 - len(gzip.compress(shuffle.shuffle_ndarray(vec).data)) / len(gzip.compress(vec.data)))))

gives results along the lines of:

Vector size:                  8388608
Compressed size:              7910645
Compressed size with shuffle: 7355671
Compressed difference:        7.02% smaller