More progress on file round-trips #23
Need to take a break from this for a couple of days. I will complete this and get it to a stable place where Python/R bindings can be built, hopefully on Friday or Saturday.
I'm on a cross-country flight shortly and will pick this back up to see how far I can get.
Alright, I'm able to round-trip primitive arrays (fixed-size only, no BINARY/UTF8 yet) and their metadata to an in-memory buffer!
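For anyone following along, here is a minimal sketch of what round-tripping a fixed-width primitive array through an in-memory buffer involves. The layout (an int64 length header followed by the raw values) and the names WritePrimitive/ReadPrimitive are hypothetical, chosen for illustration; they are not the format or API in this PR, and the sketch leaves metadata out entirely.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical layout for illustration only (not Feather's actual format):
// [int64 length][length * sizeof(T) bytes of raw values]
template <typename T>
std::vector<uint8_t> WritePrimitive(const std::vector<T>& values) {
  static_assert(std::is_trivially_copyable<T>::value, "fixed-width types only");
  const int64_t length = static_cast<int64_t>(values.size());
  std::vector<uint8_t> out(sizeof(length) + values.size() * sizeof(T));
  std::memcpy(out.data(), &length, sizeof(length));
  if (!values.empty()) {
    std::memcpy(out.data() + sizeof(length), values.data(), values.size() * sizeof(T));
  }
  return out;
}

template <typename T>
std::vector<T> ReadPrimitive(const std::vector<uint8_t>& in) {
  static_assert(std::is_trivially_copyable<T>::value, "fixed-width types only");
  int64_t length = 0;
  if (in.size() < sizeof(length)) throw std::runtime_error("truncated header");
  std::memcpy(&length, in.data(), sizeof(length));
  if (length < 0) throw std::runtime_error("negative length");
  const size_t nbytes = static_cast<size_t>(length) * sizeof(T);
  if (in.size() - sizeof(length) < nbytes) throw std::runtime_error("truncated values");
  std::vector<T> values(static_cast<size_t>(length));
  // Copy the raw bytes straight back into typed storage.
  if (nbytes > 0) std::memcpy(values.data(), in.data() + sizeof(length), nbytes);
  return values;
}
```

Variable-width types such as BINARY/UTF8 would need an offsets buffer on top of this, which is why they are a separate step.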
@hadley could you review this and let me know if you have any comments on what I have working so far?
You can see the complete API that you would use in the application domain (e.g. an R/Python wrapper library) in …

I have enough working now to start in on the Python library wrapper; it would be nice to get some initial perf numbers. It's a crazy busy week with the Arrow launch and Spark Summit, so the earliest I'll be able to resume working on this is Thursday.
Column(ColumnType::type type,
       const std::shared_ptr<metadata::Column>& metadata,
       const PrimitiveArray& values,
       const std::shared_ptr<Buffer>& buffer) :
I'm contemplating making buffer a member variable of PrimitiveArray.
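A rough sketch of that idea, using hypothetical minimal Buffer and PrimitiveArray classes (not the ones in this PR): the array holds a shared_ptr to the Buffer backing its values, so the memory stays alive as long as anything still refers to the array.

```cpp
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical minimal Buffer: a data pointer and its size.
class Buffer {
 public:
  Buffer(const uint8_t* data, int64_t size) : data_(data), size_(size) {}
  virtual ~Buffer() = default;
  const uint8_t* data() const { return data_; }
  int64_t size() const { return size_; }

 protected:
  const uint8_t* data_;
  int64_t size_;
};

// Hypothetical: the array shares ownership of the buffer backing its values,
// so no separate buffer handle has to be passed around next to it.
class PrimitiveArray {
 public:
  PrimitiveArray(int64_t length, std::shared_ptr<Buffer> values)
      : length_(length), values_(std::move(values)) {}
  int64_t length() const { return length_; }
  const std::shared_ptr<Buffer>& values() const { return values_; }

 private:
  int64_t length_;
  std::shared_ptr<Buffer> values_;
};
```

If PrimitiveArray owned its buffer this way, the Column constructor above could presumably drop its separate buffer argument.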
I think I'm missing a rough outline of how you imagine the different buffer subclasses fitting together.
I can add better code documentation, but basically we have two different circumstances in the library right now:
The intent with the Buffer class is to create a uniform interface to a data pointer and its size, while allowing a subclass to attach RAII memory deallocation (e.g. case 1) that runs when the buffer is destructed (e.g. the …
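A minimal sketch of that split, assuming a non-owning base class plus an owning subclass. OwnedBuffer and mutable_data() are hypothetical names for illustration. The owning subclass stands in for the RAII case the comment calls case 1 and frees its memory when destroyed; the base class just wraps memory owned elsewhere.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

// Non-owning view: a data pointer and its size. Destroying it frees nothing,
// which suits memory owned by someone else.
class Buffer {
 public:
  Buffer(const uint8_t* data, int64_t size) : data_(data), size_(size) {}
  virtual ~Buffer() = default;  // virtual so subclasses clean up via Buffer*
  Buffer(const Buffer&) = delete;
  Buffer& operator=(const Buffer&) = delete;
  const uint8_t* data() const { return data_; }
  int64_t size() const { return size_; }

 protected:
  const uint8_t* data_;
  int64_t size_;
};

// Hypothetical owning subclass: allocates zeroed memory up front and releases it
// automatically in its destructor (RAII), so callers only manage the Buffer.
class OwnedBuffer : public Buffer {
 public:
  explicit OwnedBuffer(int64_t size)
      : Buffer(nullptr, size), storage_(new uint8_t[static_cast<size_t>(size)]()) {
    data_ = storage_.get();  // point the base-class view at the owned memory
  }
  uint8_t* mutable_data() { return storage_.get(); }

 private:
  std::unique_ptr<uint8_t[]> storage_;  // freed when the OwnedBuffer is destroyed
};
```

Readers only ever see the Buffer interface, so whether the memory is owned or borrowed is decided by which subclass created it.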
I'm gonna shoot to have a working Python wrapper by early next week. Will post perf numbers (read/write throughput) when I have them. People I've spoken with privately about this project have been asking how Feather performance will compare with using Parquet (+ R/Python metadata) as the on-disk serialization format (once apache/parquet-cpp is complete, of course), so it will be nice to put a number on it ("XXX megabytes/second").
Sounds good. I'll have more brain space to think about this when I get back from Australia, and I'll put some time into making an R wrapper too. I can also ask JJ for a code review once you think it's ready.
Perfect, that sounds great.
Merging this. I'll see if I can get this working with pandas this weekend.
I'm just testing with data in memory for now, but we can add an on-disk output stream and test that round-trip once enough things work with the in-memory stream (which is easier).
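A sketch of that approach, assuming a small hypothetical OutputStream interface (the class names here are illustrative, not the library's). The writing code only sees the interface, so the round-trip can be tested against an in-memory stream first and pointed at a file later without changing it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <vector>

// Hypothetical interface that writers target instead of a concrete sink.
class OutputStream {
 public:
  virtual ~OutputStream() = default;
  virtual void Write(const uint8_t* data, int64_t nbytes) = 0;
  virtual int64_t Tell() const = 0;  // bytes written so far
};

// Accumulates everything in a growable byte vector (easy to inspect in tests).
class InMemoryOutputStream : public OutputStream {
 public:
  void Write(const uint8_t* data, int64_t nbytes) override {
    buffer_.insert(buffer_.end(), data, data + nbytes);
  }
  int64_t Tell() const override { return static_cast<int64_t>(buffer_.size()); }
  const std::vector<uint8_t>& contents() const { return buffer_; }

 private:
  std::vector<uint8_t> buffer_;
};

// Wraps an already-open FILE*; the caller keeps ownership and closes it.
class FileOutputStream : public OutputStream {
 public:
  explicit FileOutputStream(std::FILE* file) : file_(file) {}
  void Write(const uint8_t* data, int64_t nbytes) override {
    const size_t n = static_cast<size_t>(nbytes);
    if (std::fwrite(data, 1, n, file_) != n) throw std::runtime_error("short write");
    position_ += nbytes;
  }
  int64_t Tell() const override { return position_; }

 private:
  std::FILE* file_;
  int64_t position_ = 0;
};

// Code written against the interface works with either stream.
// The four bytes are placeholders, not a real file header.
void WriteHeader(OutputStream* out) {
  const uint8_t header[] = {0x01, 0x02, 0x03, 0x04};
  out->Write(header, sizeof(header));
}
```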