Can tfdf work with a streaming tf dataset? #10

Closed
sibyjackgrove opened this issue May 25, 2021 · 1 comment

Comments

@sibyjackgrove

My training data is in a multi-GB CSV file. I have built a data pipeline using tf.data to stream this data and do some pre-processing. Can I use these dataset objects in tfdf model.fit (similar to how it is done in Keras), or does tfdf need the whole dataset to be stored in memory?

@achoum
Collaborator

achoum commented May 27, 2021

Currently, the entire dataset needs to fit in memory.

You can feed the dataset as a stream using a tf.data.Dataset, and it is a good idea in this case. See the dataset section of the migration guide for more details. However, the memory consumption will still be ~4 bytes per value + index.
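As a rough sketch of what feeding a streamed CSV could look like, assuming the file has a header row and a label column: the helper names `make_streaming_dataset` and `train`, the batch size, and the choice of `GradientBoostedTreesModel` are illustrative assumptions, not something prescribed in this thread. The migration guide remains the reference for the exact dataset requirements.

```python
import tensorflow as tf


def make_streaming_dataset(csv_pattern: str, label: str,
                           batch_size: int = 1024) -> tf.data.Dataset:
    """Stream (features, label) batches from CSV files instead of loading them up front."""
    return tf.data.experimental.make_csv_dataset(
        csv_pattern,
        batch_size=batch_size,
        label_name=label,
        num_epochs=1,   # one finite pass over the data, no repeat
        shuffle=False,  # keep the file order; shuffling is skipped in this sketch
    )


def train(csv_pattern: str, label: str):
    # Imported here so the pipeline above can be built and inspected without TF-DF.
    import tensorflow_decision_forests as tfdf

    dataset = make_streaming_dataset(csv_pattern, label)
    model = tfdf.keras.GradientBoostedTreesModel()
    # The CSV is read batch by batch, but the model still collects
    # all of the examples in memory before training starts.
    model.fit(dataset)
    return model
```

Calling `train("data/*.csv", "target")` (hypothetical path and column name) would read the files through tf.data, so the raw CSV text never has to be held in memory all at once, even though the decoded values are.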

See my comments on this issue for some details on how to optimize the RAM consumption.
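To get a feel for what ~4 bytes per value means in practice, here is a back-of-the-envelope estimate. It is only a sketch: the function name and the example table size are made up for illustration, and the index overhead is left out because its size is not given in this thread.

```python
def estimate_dataset_ram_bytes(num_rows: int, num_columns: int,
                               bytes_per_value: int = 4) -> int:
    """Rough lower bound: one fixed-size value per cell, index overhead not counted."""
    return num_rows * num_columns * bytes_per_value


# Hypothetical table: 50 million rows with 20 columns.
estimate = estimate_dataset_ram_bytes(50_000_000, 20)
print(f"{estimate / 1e9:.1f} GB")  # prints "4.0 GB"
```

If the estimate is close to the available RAM, reducing the number of rows or columns before training is the obvious lever.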

@achoum achoum closed this as completed May 31, 2021