Invite Go to the party? #94

Closed
kokes opened this issue Apr 3, 2016 · 13 comments

@kokes

kokes commented Apr 3, 2016

In the spirit of #15 and #17, an API for Go would be quite good.

I've taken a quick stab at it over here. It supports only strings and numbers for now and is read-only. So far the only hiccup was the Flatbuffers schema, where the type property clashed with a Go keyword, so you have to edit the generated code (only once, though).

I'll take a closer look at the null bitarrays and categoricals/date/time next week. Any contributions on that (or tests, benchmarks or anything else) are obviously welcome.

O.

@wesm
Owner

wesm commented Apr 3, 2016

Do you want to contribute these as patches to this repo? We can definitely make changes to the flatbuffers schema to suit other PLs. That's part of why we are not guaranteeing stability of the binary format at this point.

@kokes
Author

kokes commented Apr 3, 2016

I can contribute it to this repo if you guys want, no problem. I'll just wait until it's a viable library; at this point it's just a bunch of untested functions. I'd also like some Gophers to chime in on the API, so that it's in Go's spirit.

I haven't seen the C++ implementation, but I presume libraries in low-level languages should provide direct access without knowing the type (e.g. let abc = read_column('abc')) as well as unmarshalling into existing arrays/structs when the schema is known at compile time. I've started on the former and I'll think about the latter; it will probably mimic the JSON implementation in core Go.
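The two access styles described above could look roughly like the sketch below. Everything in it is hypothetical and not taken from any existing Feather library: `Table` is an in-memory stand-in for a parsed file, `ReadColumn` and `Unmarshal` are illustrative names, and the `feather:"..."` struct tag is an assumed convention modelled loosely on `encoding/json`.

```go
package main

import (
	"fmt"
	"reflect"
)

// Table is a hypothetical stand-in for a parsed Feather file:
// column name -> typed slice ([]int64, []string, ...).
type Table map[string]interface{}

// ReadColumn returns a column without the caller knowing its type up front.
func (t Table) ReadColumn(name string) (interface{}, error) {
	col, ok := t[name]
	if !ok {
		return nil, fmt.Errorf("no column %q", name)
	}
	return col, nil
}

// Unmarshal fills a *[]T, where T is a struct whose exported fields carry
// an (assumed) `feather:"column"` tag. Untagged fields are left untouched.
func (t Table) Unmarshal(out interface{}) error {
	ptr := reflect.ValueOf(out)
	if ptr.Kind() != reflect.Ptr || ptr.Elem().Kind() != reflect.Slice ||
		ptr.Elem().Type().Elem().Kind() != reflect.Struct {
		return fmt.Errorf("out must be a pointer to a slice of structs, got %T", out)
	}
	sliceType := ptr.Elem().Type()
	rowType := sliceType.Elem()
	var rows reflect.Value // allocated once the first column length is known
	for i := 0; i < rowType.NumField(); i++ {
		field := rowType.Field(i)
		name := field.Tag.Get("feather")
		if name == "" {
			continue
		}
		raw, err := t.ReadColumn(name)
		if err != nil {
			return err
		}
		col := reflect.ValueOf(raw)
		if col.Kind() != reflect.Slice || col.Type().Elem() != field.Type {
			return fmt.Errorf("column %q is %T, field %s wants []%s",
				name, raw, field.Name, field.Type)
		}
		if !rows.IsValid() {
			rows = reflect.MakeSlice(sliceType, col.Len(), col.Len())
		} else if col.Len() != rows.Len() {
			return fmt.Errorf("column %q has %d rows, want %d",
				name, col.Len(), rows.Len())
		}
		// Transpose: copy column value r into field i of row r.
		for r := 0; r < col.Len(); r++ {
			rows.Index(r).Field(i).Set(col.Index(r))
		}
	}
	if rows.IsValid() {
		ptr.Elem().Set(rows)
	}
	return nil
}

// Row is an example schema known at compile time.
type Row struct {
	ID   int64  `feather:"id"`
	Name string `feather:"name"`
}

func main() {
	tbl := Table{"id": []int64{1, 2}, "name": []string{"a", "b"}}

	// Style 1: untyped access, with a type switch at the call site.
	col, _ := tbl.ReadColumn("id")
	if ids, ok := col.([]int64); ok {
		fmt.Println("id column has", len(ids), "values")
	}

	// Style 2: unmarshal into a slice of structs.
	var rows []Row
	if err := tbl.Unmarshal(&rows); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(rows)
}
```

The untyped style pushes the type switch onto the caller, while the struct style trades that for reflection and a row-wise copy of data that is stored column-wise on disk.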

As for the flatbuffers glitch - it's actually an open issue in the library, so it might make more sense to have it resolved upstream. On the other hand, adopting dtype in place of type might prevent problems like these in other language implementations.

@wesm
Owner

wesm commented Apr 3, 2016

I'm happy to create a Go integration branch for you. While the project is developing, having a bunch of implementations all over the place will make it harder to collaborate. I had hoped the Julia folks would be able to do this also (see JuliaData/Feather.jl#1) but they have some design limitation in their package manager that makes putting a julia/ directory in this repo not work right for them.

@kokes
Author

kokes commented Apr 4, 2016

I've read through the Julia issue; the dilemma will be quite similar in Go (and Rust and other C-interfaceable languages). If a C99 implementation emerged, it would certainly make sense to wrap it in Go and compile it using cgo, which exists for exactly this purpose. A single consistent implementation would be a real advantage, especially in a language like Go, where the number of potential users/developers is probably quite low. Yes, wrapping C in Go leads to problems in tooling, loss of clean cross compilation, garbage collection funsies etc., but I guess it might be worth it, especially if the format evolves.

But then again, the feather file format is fairly simple, so a pure, unwrapped implementation might end up being not much longer in SLOC terms. And with a robust test suite common across implementations, we might be good here as well.

I'll think about it, both approaches make sense in a way. (There's no C++ interoperability, so the number of options is down to these two.)

As for the code hosting - there are no issues with the code residing in your repository in terms of packaging. The Go tools interact with git repositories directly and are happy to just load stuff from a folder within a repository, no metadata or further structure necessary.

@wesm
Owner

wesm commented Apr 4, 2016

I started working on a C99 wrapper, so I'll get a working patch up in the next day or so.

@wesm
Owner

wesm commented Apr 4, 2016

see #96

@dmbates
Contributor

dmbates commented Apr 4, 2016

Thank you, @wesm, for creating the C API. I expect this will be fine for my purposes. I hope to have a Julia interface by tomorrow.

@dmbates
Contributor

dmbates commented Apr 4, 2016

I haven't looked at the feather code itself but my version of libfeather.so provides feather_reader_num_rows but not feather_reader_num_columns.

@dmbates
Contributor

dmbates commented Apr 4, 2016

@wesm It does seem that feather_reader_num_columns was skipped in feather-c.cc. I can submit a PR if you wish.

@wesm
Owner

wesm commented Apr 4, 2016

Sure, sorry about the oversight. I just merged #96, so feel free to add more C API wrappers as needed.

@kokes
Author

kokes commented Apr 8, 2016

The pure Go implementation works fine for primitive arrays of bools, numbers, strings and binary. Metadata-based series are not supported yet (should I go to the Arrow spec to get their layout? Last time I checked it wasn't here). Null bitmaps are supported as well; they're currently fully parsed into bool arrays. I may switch to lazy evaluation, which would be more memory efficient and faster (at the point of parsing).
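The eager-versus-lazy trade-off for null bitmaps mentioned above can be sketched as follows. The `Bitmap` type and its methods are hypothetical names, and the sketch assumes the Arrow convention: least-significant-bit-first ordering within each byte, with a set bit meaning the value is present (not null).

```go
package main

import "fmt"

// Bitmap wraps the raw bitmap bytes of a column without copying them.
// Assumption: LSB-first bit order, set bit = value is present.
type Bitmap struct {
	bits []byte // raw bytes as they appear in the file
	n    int    // number of values in the column
}

// IsValid is the lazy path: one bit is decoded only when asked for.
// The caller must keep i within [0, n).
func (b Bitmap) IsValid(i int) bool {
	return b.bits[i/8]&(1<<uint(i%8)) != 0
}

// Expand is the eager path: decode every bit up front into a []bool,
// which costs one byte per value instead of one bit.
func (b Bitmap) Expand() []bool {
	out := make([]bool, b.n)
	for i := range out {
		out[i] = b.IsValid(i)
	}
	return out
}

func main() {
	// 0x05 = 0b00000101: values 0 and 2 are present, value 1 is null.
	bm := Bitmap{bits: []byte{0x05}, n: 3}

	fmt.Println(bm.IsValid(1)) // lazy lookup of a single value
	fmt.Println(bm.Expand())   // eager expansion of the whole column
}
```

The lazy form keeps the bitmap at one bit per value and does no work at parse time; the eager form uses eight times the memory but gives callers a plain slice to range over.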

@wesm
Owner

wesm commented Apr 9, 2016

Cool. Feel free to submit a patch here (still working on finding a longer-term home for the repo than wesm/feather, though maybe it's okay) since we can set up integration tests, for example.

@wesm
Owner

wesm commented May 24, 2017

It would be really great to have a native Go implementation of Apache Arrow. We already have some Go examples using GObject Introspection (which uses the C GLib bindings): https://github.com/apache/arrow/tree/master/c_glib/example/go

@wesm wesm closed this as completed May 24, 2017