Can tfdf work with a streaming tf dataset? #10

sibyjackgrove · 2021-05-25T18:58:15Z

My training data is in a multi-GB CSV file. I have built a data pipeline using tf.data to stream this data and do some pre-processing. Can I use these dataset objects in tfdf model.fit (similar to how it is done in Keras), or does tfdf need the dataset to have all the data stored in memory?

achoum · 2021-05-27T13:23:45Z

Currently, the entire dataset needs to fit in memory.

You can feed the dataset as a stream using a tf.data.Dataset, and that is a good idea in this case. See the dataset section of the migration guide for more details. However, the memory consumption will still be about 4 bytes per value, plus the index.

See my comments on this issue for some details on how to optimize RAM consumption.
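
As a rough illustration of the "about 4 bytes per value, plus the index" figure, the sketch below estimates how much RAM a dataset might need once loaded. The function name `estimate_dataset_ram_bytes`, the `index_bytes` parameter, and the example row and column counts are hypothetical. The size of the index is not stated in this thread, so it defaults to 0 here and the result should be read as a lower bound.

```python
def estimate_dataset_ram_bytes(num_examples, num_features,
                               bytes_per_value=4, index_bytes=0):
    """Back-of-the-envelope RAM estimate for an in-memory dataset.

    Assumes every value costs `bytes_per_value` bytes (about 4, per the
    comment above). `index_bytes` is a placeholder for the index overhead,
    whose size is not given in the thread.
    """
    if num_examples < 0 or num_features < 0:
        raise ValueError("num_examples and num_features must be non-negative")
    return num_examples * num_features * bytes_per_value + index_bytes


# Hypothetical dataset: 50 million rows, 20 columns.
estimate = estimate_dataset_ram_bytes(50_000_000, 20)
print(f"{estimate / 1e9:.1f} GB")  # 4.0 GB, before any index overhead
```

If the estimate exceeds the available RAM, reducing the number of rows or columns fed to training is the obvious lever, since the estimate scales linearly with both.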

achoum closed this as completed May 31, 2021
