Update TFDS to 4.2.0
API:

 * Add `tfds build` to the CLI. See [documentation](https://www.tensorflow.org/datasets/cli#tfds_build_download_and_prepare_a_dataset).
 * DownloadManager now returns [Pathlib-like](https://docs.python.org/3/library/pathlib.html#basic-use) objects
 * Datasets returned by `tfds.as_numpy` are compatible with `len(ds)`
 * New `tfds.features.Dataset` to represent nested datasets
 * Add `tfds.ReadConfig(add_tfds_id=True)` to add a unique identifier to each example as `ex['tfds_id']` (e.g. `b'train.tfrecord-00012-of-01024__123'`)
 * Add `num_parallel_calls` option to `tfds.ReadConfig` to override the default `AUTOTUNE` option
 * `tfds.ImageFolder` now supports `tfds.decode.SkipDecoder`
 * Add multichannel audio support to `tfds.features.Audio`
 * Better `tfds.as_dataframe` visualization (ffmpeg video if installed, bounding boxes,...)
 * Add `try_gcs` to `tfds.builder(..., try_gcs=True)`
 * Simpler `BuilderConfig` definition: global `VERSION` and `RELEASE_NOTES` are applied to all `BuilderConfig`. Config description is now optional.
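
As an illustration of the `tfds_id` entry above, here is a minimal sketch of how such an identifier could be split into its parts. The layout `<split>.tfrecord-<shard>-of-<num_shards>__<index>` is an assumption inferred only from the example value in these notes, and `TfdsId` and `parse_tfds_id` are hypothetical helpers, not part of the TFDS API.

```python
import re
from typing import NamedTuple


class TfdsId(NamedTuple):
    """Hypothetical container for the parts of a `tfds_id` value."""
    split: str
    shard: int
    num_shards: int
    index: int


# Assumed layout, inferred from the example in the release notes:
#   <split>.tfrecord-<shard>-of-<num_shards>__<index>
_TFDS_ID_RE = re.compile(
    r"(?P<split>.+)\.tfrecord-(?P<shard>\d+)-of-(?P<num_shards>\d+)__(?P<index>\d+)"
)


def parse_tfds_id(tfds_id: bytes) -> TfdsId:
    """Splits a `tfds_id` byte string into split name, shard and example index."""
    match = _TFDS_ID_RE.fullmatch(tfds_id.decode("utf-8"))
    if match is None:
        raise ValueError(f"Unrecognized tfds_id: {tfds_id!r}")
    return TfdsId(
        split=match["split"],
        shard=int(match["shard"]),
        num_shards=int(match["num_shards"]),
        index=int(match["index"]),
    )


parsed = parse_tfds_id(b"train.tfrecord-00012-of-01024__123")
print(parsed)
```

In practice the byte string would come from `ex['tfds_id']` after loading a dataset with `tfds.ReadConfig(add_tfds_id=True)`. If the real identifier format differs from the assumed one, the helper raises `ValueError` rather than guessing.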

Breaking compatibility changes:

* Removed the non-plain-text configs of text datasets and dropped the remaining config name: `multi_nli/plain_text` -> `multi_nli`
* To guarantee better determinism, new validations are performed on the keys when creating a dataset (to avoid filenames as keys, which are non-deterministic, and to restrict keys to `str`, `bytes` and `int`). New errors likely indicate an issue in the dataset implementation.
* `tfds.core.benchmark` now returns a `pd.DataFrame` (instead of a `dict`)
* `tfds.units` is not visible anymore from the public API
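
The key validation described above can be pictured with a small sketch. This is not the TFDS implementation: `validate_key` is a hypothetical helper, and rejecting path-like objects is only one assumed way of catching filenames used as keys.

```python
import os
from pathlib import PurePosixPath


def validate_key(key):
    """Hypothetical check: accept only `str`, `bytes` or `int` example keys."""
    # Path objects usually point at download or extraction locations, which
    # can differ between machines, so they make non-deterministic keys.
    if isinstance(key, os.PathLike):
        raise TypeError(
            f"Path-like key {key!r} is not allowed; use a stable str, bytes "
            "or int key instead."
        )
    if not isinstance(key, (str, bytes, int)):
        raise TypeError(
            f"Key must be str, bytes or int, got {type(key).__name__}: {key!r}"
        )
    return key


print(validate_key("example_0001"))  # a stable string key passes
print(validate_key(42))              # so does an integer key

try:
    validate_key(PurePosixPath("/tmp/extracted/img_0001.png"))
except TypeError as err:
    print(f"rejected: {err}")
```

A dataset whose `_generate_examples` yields keys derived from stable identifiers (an index, or a name relative to the archive root) would pass a check like this, while one that yields path objects would fail early.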

Bug fixes:

* Support 0-len sequence with images of dynamic shape (Fix #2616)
* Progress bar is now correctly updated when copying files.
* Many bug fixes (GPath consistency with pathlib, S3 compatibility, TQDM visual artifacts, GCS crash on Windows, re-download when checksums are updated,...)
* Better debugging and error messages (e.g. human-readable size,...)
* Allow `max_examples_per_splits=0` in `tfds build --max_examples_per_splits=0` to test `_split_generators` only (without `_generate_examples`).

And of course, new datasets and many dataset updates.

Thank you to the community for their many valuable contributions and for supporting us in this project!

PiperOrigin-RevId: 350344016
Conchylicultor authored and Copybara-Service committed Jan 6, 2021
1 parent 7a40c59 commit ccb1bbc
Showing 3 changed files with 461 additions and 58 deletions.