v4.2.0

Released by @Conchylicultor on 06 Jan 15:41

API:

  • Add tfds build to the CLI. See documentation.
  • DownloadManager now returns Pathlib-like objects
  • Datasets returned by tfds.as_numpy are compatible with len(ds)
  • New tfds.features.Dataset to represent nested datasets
  • Add tfds.ReadConfig(add_tfds_id=True) to add a unique id to the example ex['tfds_id'] (e.g. b'train.tfrecord-00012-of-01024__123')
  • Add num_parallel_calls option to tfds.ReadConfig to override the default AUTOTUNE option
  • tfds.ImageFolder now supports tfds.decode.SkipDecoding
  • Add multichannel audio support to tfds.features.Audio
  • Better tfds.as_dataframe visualization (ffmpeg video if installed, bounding boxes,...)
  • Add try_gcs to tfds.builder(..., try_gcs=True)
  • Simpler BuilderConfig definition: class-level VERSION and RELEASE_NOTES are applied to all BuilderConfigs. Config description is now optional (see the sketch after this list).
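
These reading options compose with the standard builder API. Below is a minimal sketch (assuming the public mnist dataset and a recent tensorflow-datasets install; MyDataset and its config names are hypothetical) showing add_tfds_id, try_gcs, len() on tfds.as_numpy, and the simpler BuilderConfig definition:

```python
import tensorflow_datasets as tfds

# try_gcs=True reads already-prepared data from the public GCS bucket when available.
builder = tfds.builder('mnist', try_gcs=True)
builder.download_and_prepare()

# add_tfds_id=True attaches a unique ex['tfds_id'] to every example.
read_config = tfds.ReadConfig(add_tfds_id=True)
ds = builder.as_dataset(split='train', read_config=read_config)

# Datasets returned by tfds.as_numpy are compatible with len().
ds_numpy = tfds.as_numpy(ds)
print(len(ds_numpy))
for ex in ds_numpy:
  print(ex['tfds_id'])  # e.g. b'train.tfrecord-00012-of-01024__123'
  break


# Simpler BuilderConfig definition: VERSION and RELEASE_NOTES declared on the
# builder class apply to all its BuilderConfigs, and the config description is
# now optional. MyDataset and its config names are hypothetical; the usual
# _info/_split_generators/_generate_examples methods are omitted for brevity.
class MyDataset(tfds.core.GeneratorBasedBuilder):
  VERSION = tfds.core.Version('1.0.0')
  RELEASE_NOTES = {'1.0.0': 'Initial release.'}
  BUILDER_CONFIGS = [
      tfds.core.BuilderConfig(name='small'),
      tfds.core.BuilderConfig(name='large'),
  ]
```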

Breaking compatibility changes:

  • Removed configs for all text datasets. Only the plain text version is kept. For example: multi_nli/plain_text -> multi_nli.
  • To guarantee better determinism, new validations are performed on the keys when creating a dataset: keys are restricted to str, bytes, and int, and filenames should not be used as keys (non-deterministic). New errors likely indicate an issue in the dataset implementation (see the sketch after this list).
  • tfds.core.benchmark now returns a pd.DataFrame (instead of a dict)
  • tfds.units is no longer visible from the public API
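
For dataset implementers hit by the new key validation, here is a minimal, hypothetical _generate_examples-style sketch (the directory layout and label logic are assumptions): stable int keys come from enumerating a sorted list rather than from file paths.

```python
import pathlib


def _generate_examples(images_dir: pathlib.Path):
  """Yields (key, example) pairs; keys must now be str, bytes, or int."""
  # Enumerating a sorted list gives stable int keys; raw file paths as keys
  # are what the new validation flags as non-deterministic.
  for index, path in enumerate(sorted(images_dir.glob('*/*.jpg'))):
    yield index, {
        'image': path,
        'label': path.parent.name,  # hypothetical: label taken from the folder name
    }
```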

Bug fixes:

  • Support 0-length sequences with images of dynamic shape (fix #2616)
  • Progress bar is now correctly updated when copying files.
  • Many bug fixes (GPath consistency with pathlib, S3 compatibility, TQDM visual artifacts, GCS crash on Windows, re-download when checksums are updated, ...)
  • Better debugging and error messages (e.g. human-readable sizes, ...)
  • Allow --max_examples_per_splits=0 in tfds build to test _split_generators only (without _generate_examples); see the example after this list.
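
A hedged example of that last item (my_dataset is a placeholder for a locally defined dataset):

```sh
# Runs _split_generators only and skips _generate_examples, which is a quick
# way to validate split setup without generating any examples.
tfds build my_dataset --max_examples_per_splits=0
```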

And of course, many new datasets and dataset updates.

Thank you to the community for the many valuable contributions and for supporting us in this project!