Serialize boolean arrays more space-efficiently #428
`numpy.save()` stores boolean arrays with one full byte per element (the same footprint as `uint8`), which wastes 7 of every 8 bits.
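A quick way to see the overhead, sketched with NumPy alone: `np.save` writes one byte per boolean, while `np.packbits` stores eight booleans per byte. The 1920x1080 shape is only an illustrative mask size, and the mask contents do not affect the sizes.

```python
import io

import numpy as np

# Illustrative 1920x1080 boolean mask (contents do not affect the sizes below).
mask = np.zeros((1080, 1920), dtype=bool)

# np.save writes a short header followed by one byte per boolean element.
buf = io.BytesIO()
np.save(buf, mask)
npy_size = len(buf.getvalue())

# np.packbits stores eight booleans per byte, zero-padding the final byte.
packed = np.packbits(mask)

print("bytes per element in .npy payload:", mask.dtype.itemsize)
print(".npy size:", npy_size)
print("packed size:", packed.nbytes)
```

For this shape the packed payload is 2,073,600 / 8 = 259,200 bytes, versus 2,073,600 bytes plus header for the `.npy` file.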
@@ -197,7 +197,18 @@ def serialize_numpy_array(array):
# We currently serialize in numpy format. Other alternatives considered
# were `pickle.dumps(array)` and HDF5
#
header = None
if array.dtype == bool:
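The diff excerpt stops at the `bool` branch, so the PR's actual encoding isn't visible here. As a minimal sketch under assumed conventions, a bit-packed format could record the array shape in a small header and append the packed bits. The names `pack_bool_array` / `unpack_bool_array`, the length-prefixed JSON header, and the C-order, most-significant-bit-first layout (NumPy's defaults) are all hypothetical choices for illustration, not this PR's format.

```python
import json
import struct

import numpy as np


def pack_bool_array(array: np.ndarray) -> bytes:
    """Hypothetical encoding: 4-byte header length, JSON header, packed bits."""
    if array.dtype != bool:
        raise TypeError(f"expected a bool array, got {array.dtype}")
    header = json.dumps({"shape": list(array.shape)}).encode("utf-8")
    # Flattens in C order; np.packbits puts the first element in the most
    # significant bit of each byte and zero-pads the last byte.
    bits = np.packbits(array, axis=None)
    return struct.pack("<I", len(header)) + header + bits.tobytes()


def unpack_bool_array(data: bytes) -> np.ndarray:
    """Inverse of pack_bool_array."""
    (header_len,) = struct.unpack_from("<I", data, 0)
    header = json.loads(data[4:4 + header_len].decode("utf-8"))
    shape = tuple(header["shape"])
    count = int(np.prod(shape))
    bits = np.frombuffer(data, dtype=np.uint8, offset=4 + header_len)
    # count= drops the padding bits appended by np.packbits.
    flat = np.unpackbits(bits, count=count)
    return flat.astype(bool).reshape(shape)
```

Storing the shape is what lets the reader discard the padding bits in the final byte; the `count` argument of `np.unpackbits` requires NumPy 1.17 or newer.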
What about compressing the serialized bytes with something off-the-shelf?
The concern with introducing custom code here is that it adds extra work for any client that wants to deserialize these masks in its own code.
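The off-the-shelf alternative could be as simple as passing the existing `.npy` bytes through zlib (DEFLATE). A sketch, with hypothetical helper names; the decoder needs nothing beyond a zlib implementation and a `.npy` reader, which is the point of the suggestion.

```python
import io
import zlib

import numpy as np


def serialize_compressed(array: np.ndarray, level: int = 6) -> bytes:
    """Hypothetical: standard .npy bytes, DEFLATE-compressed with zlib."""
    buf = io.BytesIO()
    np.save(buf, array, allow_pickle=False)
    return zlib.compress(buf.getvalue(), level)


def deserialize_compressed(data: bytes) -> np.ndarray:
    """Inverse: zlib-decompress, then parse the standard .npy format."""
    return np.load(io.BytesIO(zlib.decompress(data)), allow_pickle=False)


# Usage: an illustrative mask with one rectangular region set.
mask = np.zeros((1080, 1920), dtype=bool)
mask[100:200, 300:500] = True
blob = serialize_compressed(mask)
restored = deserialize_compressed(blob)
print(len(blob), "compressed bytes")
```

NumPy also ships `np.savez_compressed`, which applies the same DEFLATE algorithm inside a zip archive, if a container format is preferred over raw zlib bytes.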
I'll investigate zlib or something similar as well. Bit packing is guaranteed to achieve an 8x reduction for arbitrary bit arrays, though. I don't think zlib can achieve that, and introducing zlib on the client side is considerably more complex, whereas bit unpacking has already been implemented there.
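One way to check the "guaranteed 8x" point against zlib: bit packing always yields ceil(n/8) bytes, whereas DEFLATE's output size depends on the data. The sketch compares an all-false mask with a random one; random bits carry a full bit of entropy per pixel, so zlib cannot realistically shrink them below n/8 bytes. Both test masks are illustrative assumptions, not data from this PR.

```python
import io
import zlib

import numpy as np


def npy_bytes(array: np.ndarray) -> bytes:
    """Serialize with np.save into an in-memory buffer."""
    buf = io.BytesIO()
    np.save(buf, array)
    return buf.getvalue()


shape = (1080, 1920)
rng = np.random.default_rng(0)
masks = {
    "all_false": np.zeros(shape, dtype=bool),  # highly regular
    "random": rng.random(shape) < 0.5,  # no structure to exploit
}

sizes = {}
for name, mask in masks.items():
    raw = npy_bytes(mask)
    sizes[name] = {
        "npy": len(raw),
        "zlib(npy)": len(zlib.compress(raw)),
        "packbits": np.packbits(mask).nbytes,
    }
    print(name, sizes[name])
```

Masks with large uniform regions are where zlib can beat 8x; unstructured masks are where the fixed size of the packed format wins.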
Yeah, I think that will work, but we'll need to investigate client-side performance. In initial tests, compressing a single large (1920x1080) mask takes around 150 ms; decompressing may be faster.
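The 150 ms figure comes from client-side tests that this thread doesn't show. For a rough comparison on the server side, a timing harness like the following could measure zlib against bit packing on a 1920x1080 mask. It runs in CPython with NumPy, so its numbers are not directly comparable to the client runtime; `time_call` is a hypothetical helper.

```python
import time
import zlib

import numpy as np


def time_call(fn, *args, repeats=3):
    """Run fn(*args) several times; return (best time in ms, last result)."""
    best_ms = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        best_ms = min(best_ms, (time.perf_counter() - start) * 1000.0)
    return best_ms, result


rng = np.random.default_rng(0)
mask = rng.random((1080, 1920)) < 0.5  # illustrative random mask
raw = mask.tobytes()  # one byte (0 or 1) per pixel, like the .npy payload

compress_ms, compressed = time_call(zlib.compress, raw)
decompress_ms, restored = time_call(zlib.decompress, compressed)
pack_ms, packed = time_call(np.packbits, mask)
unpack_ms, unpacked = time_call(np.unpackbits, packed)

print(f"zlib compress:   {compress_ms:7.1f} ms -> {len(compressed)} bytes")
print(f"zlib decompress: {decompress_ms:7.1f} ms")
print(f"np.packbits:     {pack_ms:7.1f} ms -> {packed.nbytes} bytes")
print(f"np.unpackbits:   {unpack_ms:7.1f} ms")
```

Taking the best of several runs reduces noise from warm-up and scheduling; a random mask is roughly the worst case for zlib's speed and size, so structured masks would likely compress faster.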
Let's get the system working first, then think about the trade-offs between CPU cost, bandwidth, and interface complexity.