Add Support for Apache Arrow in TensorFlow Dataset #23001
Labels: comp:ops (OPs related issues), stat:awaiting response (Status - Awaiting response from author), type:feature (Feature requests)
Apache Arrow is a standard format for in-memory columnar data. It provides a cross-language platform for systems to communicate and operate on data efficiently.
Adding Arrow support in TensorFlow Dataset will allow systems to interface with TensorFlow in a well-defined way, without the need to develop custom converters, serialize data, or write to specialized files.
It would be straightforward to add a base layer of Arrow support that works on Arrow record batches (the common unit of data exchanged in Arrow IPC) and extend that layer to support different kinds of Arrow Ops:
A slightly more involved Op could use Arrow Flight, which provides Arrow-based messaging over gRPC. Additionally, it would be possible to define Ops that connect directly to other systems that can export Arrow data.