Allow datasets to provide the number of examples they contain #36531
Labels: comp:data, stat:awaiting tensorflower, TF 2.1, type:feature
System information
Describe the feature and the current behavior/state.
Currently there is no good way to get the number of samples or batches contained in a dataset, although the information is usually available.
What you can do:
sum(1 for _ in dataset)
but this might not do what one wants: when the dataset is batched, it returns the number of batches, including the trailing one.
MultiWorkerMirroredStrategy can't handle that. Usually this information is already available, see e.g. tensorflow/datasets#1403
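To make the counting problem concrete, here is a minimal sketch in plain Python. It does not use tf.data: `batched` is a hypothetical stand-in for dataset batching, and `count_elements` is just the `sum(1 for _ in dataset)` idiom wrapped in a function. It shows that iterating a batched dataset counts batches rather than examples, and that the result depends on whether the trailing, smaller batch is kept.

```python
import math


def count_elements(dataset):
    """Count elements by iterating once; this costs a full pass over the data."""
    return sum(1 for _ in dataset)


def batched(examples, batch_size, drop_remainder=False):
    """Hypothetical stand-in for dataset batching (not the tf.data implementation)."""
    batch = []
    for example in examples:
        batch.append(example)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Emit the trailing, smaller batch unless the caller asked to drop it.
    if batch and not drop_remainder:
        yield batch


examples = list(range(10))
batch_size = 4

num_examples = count_elements(examples)
num_batches = count_elements(batched(examples, batch_size))
num_full_batches = count_elements(batched(examples, batch_size, drop_remainder=True))

# With the trailing batch the count is ceil(n / batch_size), without it floor.
assert num_batches == math.ceil(num_examples / batch_size)
assert num_full_batches == num_examples // batch_size
```

If the number of examples were known up front, both batch counts could be computed from it directly instead of iterating over the whole dataset.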
Will this change the current api? How?
Add a member num_examples and/or an overload for __len__.
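A rough sketch of what such an interface could look like, again in plain Python rather than tf.data. The class `SizedDataset` and the helper `steps_per_epoch` are hypothetical names invented for this illustration. The point is only that a count supplied by the data source can be exposed through `num_examples` and `__len__` without iterating the data.

```python
class SizedDataset:
    """Hypothetical wrapper: any iterable plus an example count supplied by its source."""

    def __init__(self, examples, num_examples):
        self._examples = examples
        self.num_examples = num_examples

    def __iter__(self):
        return iter(self._examples)

    def __len__(self):
        # No iteration needed: the count was provided when the dataset was built.
        return self.num_examples


def steps_per_epoch(dataset, batch_size, drop_remainder=False):
    """Number of batches per pass, derived from len(dataset) alone (hypothetical helper)."""
    full, rest = divmod(len(dataset), batch_size)
    if drop_remainder or rest == 0:
        return full
    return full + 1


# A generator has no length of its own, so the count must come from metadata.
dataset = SizedDataset((i for i in range(10)), num_examples=10)
steps = steps_per_epoch(dataset, batch_size=4)
full_steps = steps_per_epoch(dataset, batch_size=4, drop_remainder=True)
```

A real implementation would also have to decide how the count propagates through transformations such as batching or filtering, which this sketch leaves out.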
Who will benefit with this feature?
Users of MultiWorkerMirroredStrategy, and anyone who has to set steps_per_epoch by hand or sees progress reported as 10/Unknown.
Any Other info.
There is an experimental op, cardinality, which might be closely related. However, it often (always?) returns "Unknown". Tested with MNIST from TFDS.