Serialize boolean arrays more space-efficiently #428

Closed
wants to merge 1 commit into from

@lethosor lethosor commented Feb 11, 2020

numpy.save() stores boolean arrays with one byte per element (as if they were uint8 arrays), which wastes 7 of every 8 bits.

  • Follow-up to #426 (Switch to numpy for (de)serializing numpy arrays)
  • Open to changes that would improve the readability of this code
  • Taking base64 encoding into account, this seems to save between 3x and 4x space for a 33x33 array (savings for larger arrays will be more significant, since fixed overhead matters less)
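
For readers who want to reproduce the size comparison, here is a minimal sketch of the bit-packing idea using `numpy.packbits()`. It is not the code from this PR: the helper names (`save_bytes`, `pack_bool_array`, `unpack_bool_array`) and the random 33x33 test mask are assumptions made for illustration, and the exact ratio will depend on how the shape and header are stored.

```python
import base64
import io

import numpy as np


def save_bytes(array):
    """Serialize an array with numpy.save() into an in-memory buffer."""
    buf = io.BytesIO()
    np.save(buf, array, allow_pickle=False)
    return buf.getvalue()


def pack_bool_array(array):
    """Pack a boolean array into bits (hypothetical helper).

    Returns the original shape, which is needed to undo the packing,
    and a flat uint8 array holding 8 booleans per byte.
    """
    return array.shape, np.packbits(array.ravel())


def unpack_bool_array(shape, packed):
    """Inverse of pack_bool_array() (hypothetical helper)."""
    count = int(np.prod(shape))
    # packbits pads the last byte with zeros, so trim to the real length
    bits = np.unpackbits(packed)[:count]
    return bits.astype(bool).reshape(shape)


# Assumed test data: a random 33x33 mask
rng = np.random.default_rng(0)
mask = rng.random((33, 33)) > 0.5

shape, packed = pack_bool_array(mask)
plain_b64 = base64.b64encode(save_bytes(mask))
packed_b64 = base64.b64encode(save_bytes(packed))
restored = unpack_bool_array(shape, packed)

print("base64 size, plain: ", len(plain_b64))
print("base64 size, packed:", len(packed_b64))
```

The shape has to travel alongside the packed bytes, because the packed array alone cannot distinguish padding bits from real data.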

`numpy.save()` saves these as uint8 arrays, which wastes 7/8 bits
@lethosor lethosor added the enhancement Code enhancement label Feb 11, 2020
@lethosor lethosor requested review from brimoor and a team February 11, 2020 17:35
@lethosor lethosor self-assigned this Feb 11, 2020
lethosor pushed a commit to voxel51/player51 that referenced this pull request Feb 11, 2020
@@ -197,7 +197,18 @@ def serialize_numpy_array(array):
    # We currently serialize in numpy format. Other alternatives considered
    # were `pickle.dumps(array)` and HDF5
    #
    header = None
    if array.dtype == bool:
Contributor

What about compressing the serialized bytes using something off-the-shelf?

The concern with introducing custom code here is that it adds extra work for any client that wants to deserialize these masks in its own code.

Contributor Author

I'll investigate zlib or something similar as well. Bit packing is guaranteed to achieve an 8x reduction for arbitrary bit arrays, though. I don't think zlib can guarantee that, and introducing zlib on the client side is considerably more complex, whereas bit unpacking has already been implemented there.
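
A rough way to check the 8x claim against zlib is sketched below. It is only an illustration, not a measurement from this PR: the random 1920x1080 mask is an assumed worst case (random bits are essentially incompressible), and real masks with large uniform regions may compress much better than 8x under zlib.

```python
import zlib

import numpy as np

# Assumed worst case: every bit is random, so there is no structure to exploit
rng = np.random.default_rng(0)
mask = rng.random((1080, 1920)) > 0.5

raw = mask.tobytes()  # one byte per boolean, as numpy stores it
packed = np.packbits(mask.ravel()).tobytes()  # one bit per boolean
deflated = zlib.compress(raw)  # off-the-shelf compression of the raw bytes

print("bit packing ratio:", len(raw) / len(packed))
print("zlib ratio:       ", len(raw) / len(deflated))

# Both representations are lossless
roundtrip = zlib.decompress(deflated)
unpacked = np.unpackbits(np.frombuffer(packed, dtype=np.uint8))[: mask.size]
```

The packed size is fixed by the element count alone, while the zlib size depends on the content of the mask.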

Contributor

Contributor Author

Yeah, I think that will work, but we'll need to investigate client-side performance. From initial tests, compressing a single large (1920x1080) mask takes around 150 ms; decompressing may be faster.
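
A timing check along these lines can be sketched in Python as follows. This is an assumption-heavy illustration: it times Python's `zlib` module rather than the client-side implementation discussed here, and the rectangular test mask is made up, so the numbers it prints are not comparable to the 150 ms figure above.

```python
import time
import zlib

import numpy as np

# Hypothetical mask: a single filled rectangle on a 1920x1080 frame
mask = np.zeros((1080, 1920), dtype=bool)
mask[200:800, 400:1500] = True

raw = mask.tobytes()  # one byte per boolean

start = time.perf_counter()
compressed = zlib.compress(raw)
compress_ms = (time.perf_counter() - start) * 1000

start = time.perf_counter()
restored = zlib.decompress(compressed)
decompress_ms = (time.perf_counter() - start) * 1000

print(f"compress:   {compress_ms:.1f} ms ({len(raw)} -> {len(compressed)} bytes)")
print(f"decompress: {decompress_ms:.1f} ms")
```

A single run is noisy; repeating the measurement and taking the median would give a steadier number.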

Contributor

Let's get the system working and then think about implications of CPU vs bandwidth vs interface complexity

@lethosor lethosor closed this Feb 12, 2020