Helper function to concatenate many hdf5 files. Tested against hundreds of thousands of files.

I could imagine using this when a user globs with a `.open`, where vaex can call this to concat the files (maybe make the `os.remove` optional) and create the final file for vaex.

TODO: You'll see in the code that I handle string columns less than ideally. I know that vaex creates a `data` and `indices` group for string columns. I was able to recreate and append to that successfully, but was unable to get vaex to read it properly. I believe that is because vaex cannot mmap string columns from chunked hdf5 files, but that may be incorrect (just my best guess from reading the source code).

So currently the columns would come back as byte arrays, and would need to be cast like so:
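For illustration only, here is a minimal sketch of that kind of cast using plain NumPy. It assumes the column comes back as a fixed-width bytes array (dtype `S`) holding UTF-8 text; the array `raw` is a hypothetical stand-in, and this is not necessarily the cast used in this PR or the way vaex would do it internally.

```python
import numpy as np

# Hypothetical stand-in for a string column read back as fixed-width bytes.
raw = np.array([b"alpha", b"beta", b"gamma"], dtype="S5")

# Decode each element from UTF-8 bytes into a unicode string.
decoded = np.char.decode(raw, "utf-8")

print(decoded.dtype.kind)  # "U" means a unicode string array
print(decoded.tolist())
```

The decode copies the data into a new array, so it gives up the memory-mapping benefit for that column, which is part of why this is only a stopgap.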
I'm sure we can figure out a better solution here.
CC @maartenbreddels