Fix: reliable export hdf5 with small chunks size #2280

Merged
merged 2 commits into from
Dec 2, 2022


JovanVeljanoski
Member

@JovanVeljanoski JovanVeljanoski commented Nov 24, 2022

This fixes some issues we've encountered when exporting certain types of data to hdf5. The unit-tests have been scaled down to reproduce the issues with minimal data, but the same issues are encountered with the default_chunk_size on larger datasets.

I suspect this is due to a combination of chunk_size and the amount of missing/masked values in a particular chunk.
I do not know the exact origin of the errors at this point, so the tests have rather generic names.
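
To make the suspicion above concrete, here is a minimal sketch of how the number of missing values per chunk depends on chunk_size. It is plain Python and does not touch the export code itself; the helper names (`missing_per_chunk`, `fully_missing_chunks`) and the sample data are hypothetical, and `None` merely stands in for a missing/masked value.

```python
from typing import List, Optional, Sequence, Tuple


def missing_per_chunk(
    values: Sequence[Optional[float]], chunk_size: int
) -> List[Tuple[int, int, int]]:
    """Return (start, stop, n_missing) for each consecutive chunk of `values`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    report = []
    for start in range(0, len(values), chunk_size):
        # The last chunk may be shorter than chunk_size.
        stop = min(start + chunk_size, len(values))
        n_missing = sum(1 for v in values[start:stop] if v is None)
        report.append((start, stop, n_missing))
    return report


def fully_missing_chunks(
    values: Sequence[Optional[float]], chunk_size: int
) -> List[Tuple[int, int]]:
    """Return the (start, stop) ranges of chunks that contain only missing values."""
    return [
        (start, stop)
        for start, stop, n_missing in missing_per_chunk(values, chunk_size)
        if n_missing == stop - start
    ]


# Hypothetical sample data; None stands in for a missing/masked value.
data = [1.0, None, None, None, 2.0, 3.0, None]

print(missing_per_chunk(data, chunk_size=2))
print(fully_missing_chunks(data, chunk_size=2))
print(fully_missing_chunks(data, chunk_size=4))
```

With a small chunk_size some chunks end up consisting entirely of missing values, while a larger chunk_size on the same data may produce none. That would be consistent with the errors depending on both factors, though it does not establish the actual cause.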

Each of the unit-tests raises a different error, which is why I created separate tests rather than one in which the amount of data is varied.

Exporting the same data under the same conditions (i.e. chunk_size) to arrow or parquet format works just fine.

Checklist:

  • make unit-tests
  • make tests pass
    • (optional) rename tests with more informative/relevant names
  • party

@maartenbreddels maartenbreddels merged commit 2a01c10 into master Dec 2, 2022

2 participants