More progress on file round-trips #23

Closed
wants to merge 11 commits into from

Conversation

wesm
Owner

@wesm wesm commented Feb 10, 2016

I'm just testing with in-memory data for now, but we can add an on-disk output stream and test that round-trip once enough things work with the in-memory stream (which is easier).

@wesm
Owner Author

wesm commented Feb 10, 2016

Need to take a break from this for a couple of days -- I will complete this and get it to a stable place where Python/R bindings can be built, hopefully on Friday or Saturday.

@wesm
Owner Author

wesm commented Feb 13, 2016

On a cross-country flight shortly and will pick this back up and see how far I can get.

@wesm
Owner Author

wesm commented Feb 15, 2016

Alright, I'm able to round-trip primitive arrays (fixed-size, no BINARY/UTF8 yet) and their metadata to an in-memory buffer!

@wesm
Owner Author

wesm commented Feb 16, 2016

@hadley could you review and let me know any comments on what I have working so far?

  • fixed-byte-width primitive arrays (e.g. numbers)
  • variable-length byte arrays (e.g. UTF8 and BINARY)

You can see the complete API that you would use in the application domain (e.g. R/Python wrapper library) in feather/writer-test.cc
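
As a rough illustration of what round-tripping the two kinds of arrays through an in-memory stream involves, here is a minimal sketch. It is not the Feather API or its file layout: `ByteStream`, `WriteFixed`, `ReadFixed`, `WriteStrings`, and `ReadStrings` are hypothetical names, and the variable-length encoding (length + 1 `int32` offsets followed by the concatenated bytes) is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical in-memory output stream: an append-only byte vector.
using ByteStream = std::vector<uint8_t>;

// Append the raw bytes of a fixed-width array; returns the start offset.
template <typename T>
size_t WriteFixed(ByteStream* out, const std::vector<T>& values) {
  size_t start = out->size();
  if (!values.empty()) {
    const uint8_t* raw = reinterpret_cast<const uint8_t*>(values.data());
    out->insert(out->end(), raw, raw + values.size() * sizeof(T));
  }
  return start;
}

// Read `length` fixed-width values back from the stream.
template <typename T>
std::vector<T> ReadFixed(const ByteStream& in, size_t start, size_t length) {
  assert(start + length * sizeof(T) <= in.size());
  std::vector<T> values(length);
  if (length > 0) {
    std::memcpy(values.data(), in.data() + start, length * sizeof(T));
  }
  return values;
}

// Assumed layout for variable-length values: (length + 1) int32 offsets,
// then the concatenated bytes of every value.
size_t WriteStrings(ByteStream* out, const std::vector<std::string>& values) {
  std::vector<int32_t> offsets;
  std::string data;
  offsets.push_back(0);
  for (const std::string& v : values) {
    data += v;
    offsets.push_back(static_cast<int32_t>(data.size()));
  }
  size_t start = WriteFixed(out, offsets);
  out->insert(out->end(), data.begin(), data.end());
  return start;
}

std::vector<std::string> ReadStrings(const ByteStream& in, size_t start,
                                     size_t length) {
  std::vector<int32_t> offsets = ReadFixed<int32_t>(in, start, length + 1);
  size_t data_start = start + (length + 1) * sizeof(int32_t);
  assert(data_start + static_cast<size_t>(offsets.back()) <= in.size());
  const char* data = reinterpret_cast<const char*>(in.data() + data_start);
  std::vector<std::string> values;
  for (size_t i = 0; i < length; ++i) {
    values.emplace_back(data + offsets[i],
                        static_cast<size_t>(offsets[i + 1] - offsets[i]));
  }
  return values;
}
```

The caller has to remember each array's start offset and length in order to read it back, which is the role the column metadata plays in the discussion above.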

I have enough working now to start in on the Python library wrapper -- it would be nice to get some initial perf numbers. Crazy busy week with the Arrow launch and Spark Summit, so the earliest I'll be able to resume working on this is Thursday.

Column(ColumnType::type type,
       const std::shared_ptr<metadata::Column>& metadata,
       const PrimitiveArray& values,
       const std::shared_ptr<Buffer>& buffer) :
Owner Author

I'm contemplating making buffer a member variable of PrimitiveArray.
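
One way to picture that idea is the sketch below. It is only an assumption about what such a change could look like, not the code in this pull request: the `Buffer` and `PrimitiveArray` structs are simplified stand-ins, and `MakeArray` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Simplified stand-in for the library's Buffer (hypothetical): owns its bytes.
struct Buffer {
  std::vector<uint8_t> bytes;
};

// Hypothetical layout: the array holds a shared reference to its buffer, so
// the `values` pointer can never outlive the memory it points into.
struct PrimitiveArray {
  std::shared_ptr<Buffer> buffer;
  const uint8_t* values = nullptr;
  int64_t length = 0;
};

// Hypothetical helper that allocates a zero-filled array of `length` bytes.
PrimitiveArray MakeArray(int64_t length) {
  assert(length >= 0);
  PrimitiveArray out;
  out.buffer = std::make_shared<Buffer>();
  out.buffer->bytes.assign(static_cast<size_t>(length), 0);
  out.values = out.buffer->bytes.data();
  out.length = length;
  return out;
}
```

With this arrangement a constructor like the one above would no longer need a separate buffer argument, since copying the array also copies the reference that keeps the memory alive.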

@hadley
Collaborator

hadley commented Feb 16, 2016

I think I'm missing a rough outline of how you imagine the different buffer subclasses fitting together.

@wesm
Owner Author

wesm commented Feb 16, 2016

I can add better code documentation, but basically we have two different circumstances in the library right now:

  1. Data is being buffered from a file into newly-allocated memory (e.g. using fread)
  2. Data is already in-memory (or memory-mapped)

The intent with the Buffer class is to provide a uniform interface to a data pointer and its size, while allowing a subclass (e.g. the OwnedMutableBuffer) to attach RAII memory deallocation that runs when the buffer is destructed, which is what case 1 needs.
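
The two cases can be sketched roughly as follows. The class names `Buffer` and `OwnedMutableBuffer` come from the comment above, but every member, constructor, and the `SumBytes` helper here are assumptions made for illustration, not the library's actual definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Case 2: a non-owning view over bytes that something else manages
// (data already in memory, or a memory-mapped region).
class Buffer {
 public:
  Buffer(const uint8_t* data, int64_t size) : data_(data), size_(size) {
    assert(size >= 0);
  }
  virtual ~Buffer() = default;

  const uint8_t* data() const { return data_; }
  int64_t size() const { return size_; }

 protected:
  const uint8_t* data_;
  int64_t size_;
};

// Case 1: a buffer that owns newly allocated memory (e.g. the target of a
// file read). The std::vector member releases the memory when the buffer is
// destructed, which is the RAII part.
class OwnedMutableBuffer : public Buffer {
 public:
  explicit OwnedMutableBuffer(int64_t size)
      : Buffer(nullptr, size), storage_(static_cast<size_t>(size)) {
    data_ = storage_.data();
  }

  uint8_t* mutable_data() { return storage_.data(); }

 private:
  std::vector<uint8_t> storage_;
};

// Consumers only depend on the uniform pointer-plus-size interface.
int64_t SumBytes(const std::shared_ptr<Buffer>& buffer) {
  int64_t total = 0;
  for (int64_t i = 0; i < buffer->size(); ++i) total += buffer->data()[i];
  return total;
}
```

Code that reads a column can then take a `std::shared_ptr<Buffer>` without knowing whether the bytes were copied out of a file or are a view over memory that already existed.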

@wesm
Owner Author

wesm commented Feb 18, 2016

I'm gonna shoot to have a working Python wrapper by early next week. Will post perf numbers (read/write throughput) when I have them. I'm getting questions from people I've spoken with privately about this project about how Feather performance will compare with using Parquet (+ R/Python metadata) as the on-disk serialization format (once apache/parquet-cpp is complete, of course), so it will be nice to put a number on it ("XXX megabytes/second")

@hadley
Collaborator

hadley commented Feb 18, 2016

Sounds good - I'll have more brain space to think about this when I get back from Australia, and I'll put some time into making an R wrapper too. I can also ask JJ for a code review once you think it's ready.

@wesm
Owner Author

wesm commented Feb 18, 2016

Perfect, that sounds great.

@wesm
Owner Author

wesm commented Feb 20, 2016

Merging this. I'll see if I can get this working with pandas this weekend.

@wesm wesm closed this in eaa1406 Feb 20, 2016
@wesm wesm deleted the basic-complete-read-write branch February 20, 2016 18:28

2 participants