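Shuffle is a small C++ library with Python bindings that shuffles the bytes of an array in a similar way to the Shuffle filter of the HDF5 file format. This is particularly useful before compressing arrays of numerical values of similar magnitude. The shuffle gathers the first byte of every element together, then the second byte, and so on. As a result, the bytes that barely change from one element to the next (such as the sign and exponent bytes of floating-point numbers) end up next to each other, where a general-purpose compressor can exploit their redundancy.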
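To make the transformation concrete, here is a minimal pure-Python sketch of a byte shuffle and its inverse. It assumes HDF5-style semantics, where byte k of every element is gathered into the k-th contiguous plane. The names `byte_shuffle` and `byte_unshuffle` are hypothetical helpers for illustration and are not part of this library's API.

```python
def byte_shuffle(data: bytes, itemsize: int) -> bytes:
    """Group byte k of every element together, for k = 0 .. itemsize-1.

    Hypothetical helper, not the library's API: a pure-Python sketch of an
    HDF5-style byte shuffle. len(data) must be a multiple of itemsize.
    """
    if len(data) % itemsize:
        raise ValueError("buffer length is not a multiple of itemsize")
    # data[k::itemsize] picks byte k of every element, in element order.
    return b"".join(data[k::itemsize] for k in range(itemsize))


def byte_unshuffle(data: bytes, itemsize: int) -> bytes:
    """Inverse of byte_shuffle: scatter each byte plane back into place."""
    if len(data) % itemsize:
        raise ValueError("buffer length is not a multiple of itemsize")
    n = len(data) // itemsize  # number of elements
    out = bytearray(len(data))
    for k in range(itemsize):
        # Plane k occupies data[k*n:(k+1)*n]; its bytes return to positions k, k+itemsize, ...
        out[k::itemsize] = data[k * n:(k + 1) * n]
    return bytes(out)


# Three 4-byte elements: all first bytes come first, then all second bytes, and so on.
raw = bytes([0x01, 0x02, 0x03, 0x04,
             0x11, 0x12, 0x13, 0x14,
             0x21, 0x22, 0x23, 0x24])
shuffled = byte_shuffle(raw, 4)
print(shuffled.hex(" "))  # 01 11 21 02 12 22 03 13 23 04 14 24
assert byte_unshuffle(shuffled, 4) == raw
```

The transform only reorders bytes, so it never changes the data size, and the inverse must be applied after decompression to recover the original values.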
For example, in Python:
```python
import gzip

import numpy
import shuffle

vec = numpy.random.rand(1024 * 1024)  # 1M float64 values = 8 MiB
plain = gzip.compress(vec.data)
shuffled = gzip.compress(shuffle.shuffle_ndarray(vec).data)

print('Vector size: %d' % vec.nbytes)
print('Compressed size: %d' % len(plain))
print('Compressed size with shuffle: %d' % len(shuffled))
print('Compressed difference: %0.2f%% smaller' % (100 * (1 - len(shuffled) / len(plain))))
```
gives results along the lines of:
```
Vector size: 8388608
Compressed size: 7910645
Compressed size with shuffle: 7355671
Compressed difference: 7.02% smaller
```
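The gain above comes from random values spread over [0, 1), whose exponents vary from element to element. As a rough way to check the effect of similar magnitudes, the sketch below compares gzip sizes for values in [0, 1) against values confined to [1, 2), where every element has identical sign and exponent bits. It uses only the standard library and a hypothetical pure-Python `byte_shuffle` (assumed to behave like the HDF5 shuffle, not the library's own code), so it prints the sizes rather than assuming them, and they will differ from the numbers above.

```python
import gzip
import random
from array import array


def byte_shuffle(data: bytes, itemsize: int) -> bytes:
    # Hypothetical helper (not the library's API): group byte k of every element together.
    return b"".join(data[k::itemsize] for k in range(itemsize))


def compressed_sizes(values):
    """Return (gzip size of raw float64 bytes, gzip size after byte shuffle)."""
    raw = array("d", values).tobytes()  # native-endian float64, 8 bytes each
    return len(gzip.compress(raw)), len(gzip.compress(byte_shuffle(raw, 8)))


random.seed(0)
n = 1 << 16
# Values spanning [0, 1): exponents differ between elements.
wide = [random.random() for _ in range(n)]
# Values in [1, 2): every element shares the same sign and exponent bits.
narrow = [1.0 + random.random() for _ in range(n)]

for name, values in (("[0, 1)", wide), ("[1, 2)", narrow)):
    plain, shuffled = compressed_sizes(values)
    print("%s: plain %d bytes, shuffled %d bytes" % (name, plain, shuffled))
```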