Support NumPy for tensorflow-io #68
Comments
@yongtang Arrow supports reading NumPy arrays to record batches. So I don't think it would be much effort to add this, but there would be a limit on dimensionality - for now at least. Hopefully that will change in the future.
@BryanCutler I think there are two issues: one is the conversion between Apache Arrow and NumPy in memory, another is to read data from
Actually the conversion from NumPy to Arrow is zero-copy, so it wouldn't consume any more memory, but you're right, Arrow doesn't support reading these files. If you're able to read them directly in the op, that would be cool!
Performant solutions using tf.data and disk are quite painful right now, so this would be a welcome addition. I just want to mention that it's particularly useful if you support a case where data looks like:
|
@areeh PR #407 has been opened, which should support your case. The PR allows reading a local NumPy array through the same address space. It may have limitations, but if your process is local then performance might be good for a large NumPy array (as there is no serialization overhead beforehand). Support for a dict/tuple of features has been added as well.
@yongtang The PR looks great, it supports everything I had in mind when I wrote the comment. Thank you.
@yongtang can this be closed?
@kvignesh1420 Ah yes thanks for the reminder 👍
In TensorFlow's "Importing Data" guide:
https://www.tensorflow.org/guide/datasets
it is possible to read input data directly from TFRecord (TFRecordDataset), text (TextLineDataset), and CSV (CsvDataset) files, but not from NumPy. Reading input from NumPy still requires the less elegant approach used in the guide's example code.
It should be possible to implement NumPy support so that input can be read from NumPy in the same fashion as the other input formats. This could also improve performance, since the data would not necessarily have to be read into memory all at once (remotely related: tensorflow/tensorflow#16933).
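To illustrate the point about not reading everything into memory immediately, here is a minimal sketch that uses only NumPy. It assumes the data is stored as a single `.npy` file; `iter_npy_batches` and the file name `features.npy` are hypothetical names made up for this example, and are not part of TensorFlow or tensorflow-io.

```python
import os
import tempfile

import numpy as np


def iter_npy_batches(path, batch_size):
    """Yield consecutive batches from a .npy file without loading it whole.

    Hypothetical helper for illustration only; not a TensorFlow or
    tensorflow-io API.
    """
    # mmap_mode="r" maps the file read-only, so data is paged in on access
    # instead of being read into memory up front.
    array = np.load(path, mmap_mode="r")
    for start in range(0, array.shape[0], batch_size):
        # np.array copies just this slice into an ordinary in-memory array.
        yield np.array(array[start:start + batch_size])


# Write a small example file, then stream it back in batches of 4 rows.
tmp_dir = tempfile.mkdtemp()
npy_path = os.path.join(tmp_dir, "features.npy")
np.save(npy_path, np.arange(20, dtype=np.float32).reshape(10, 2))

batches = list(iter_npy_batches(npy_path, batch_size=4))
batch_shapes = [b.shape for b in batches]
```

A generator like this could in principle be wrapped with `tf.data.Dataset.from_generator`, though a native dataset op, as proposed here, would avoid the Python-level iteration.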