Commit
API:

* Add `tfds build` to the CLI. See [documentation](https://www.tensorflow.org/datasets/cli#tfds_build_download_and_prepare_a_dataset).
* DownloadManager now returns [Pathlib-like](https://docs.python.org/3/library/pathlib.html#basic-use) objects.
* Datasets returned by `tfds.as_numpy` are compatible with `len(ds)`.
* New `tfds.features.Dataset` to represent nested datasets.
* Add `tfds.ReadConfig(add_tfds_id=True)` to add a unique identifier to each example, available as `ex['tfds_id']` (e.g. `b'train.tfrecord-00012-of-01024__123'`).
* Add `num_parallel_calls` option to `tfds.ReadConfig` to override the default `AUTOTUNE` option.
* `tfds.ImageFolder` now supports `tfds.decode.SkipDecoder`.
* Add multichannel audio support to `tfds.features.Audio`.
* Better `tfds.as_dataframe` visualization (ffmpeg video if installed, bounding boxes,...).
* Add `try_gcs` to `tfds.builder(..., try_gcs=True)`.
* Simpler `BuilderConfig` definition: global `VERSION` and `RELEASE_NOTES` are applied to all `BuilderConfig`. Config description is now optional.

Breaking compatibility changes:

* Removed non-plain-text configs of text datasets and removed the config: `multi_nli/plain_text` -> `multi_nli`.
* To guarantee better determinism, new validations are performed on the keys when creating a dataset, to avoid filenames as keys (non-deterministic) and to restrict keys to `str`, `bytes` and `int`. New errors likely indicate an issue in the dataset implementation.
* `tfds.core.benchmark` now returns a `pd.DataFrame` (instead of a `dict`).
* `tfds.units` is no longer visible from the public API.

Bug fixes:

* Support 0-len sequence with images of dynamic shape (Fix #2616).
* Progress bar is correctly updated when copying files.
* Many bug fixes (GPath consistency with pathlib, s3 compatibility, TQDM visual artifacts, GCS crash on Windows, re-download when checksums updated,...).
* Better debugging and error messages (e.g. human readable size,...).
* Allow `max_examples_per_splits=0` in `tfds build --max_examples_per_splits=0` to test `_split_generators` only (without `_generate_examples`).

And of course, new datasets and many dataset updates.

Thank you to the community for their many valuable contributions and for supporting us in this project!

PiperOrigin-RevId: 350344016
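
To illustrate the `tfds_id` value mentioned in the API notes, here is a minimal sketch that splits an id such as `b'train.tfrecord-00012-of-01024__123'` into a shard file name and a trailing index. It assumes the layout `<shard file name>__<index>`, which is inferred from that single example and is not specified by these notes. `TfdsId` and `parse_tfds_id` are hypothetical names, not part of TFDS.

```python
from typing import NamedTuple


class TfdsId(NamedTuple):
    """Parts of a `tfds_id`, under the assumed `<shard file>__<index>` layout."""
    shard_file: str
    index: int


def parse_tfds_id(tfds_id: bytes) -> TfdsId:
    """Hypothetical helper: split a `tfds_id` into shard file name and index."""
    text = tfds_id.decode("utf-8")
    # Split on the last "__" so a shard name that itself contains "__" stays whole.
    shard_file, sep, index = text.rpartition("__")
    if not sep or not index.isdecimal():
        raise ValueError(f"Unexpected tfds_id layout: {tfds_id!r}")
    return TfdsId(shard_file=shard_file, index=int(index))


# The example id quoted in the release notes.
parsed = parse_tfds_id(b"train.tfrecord-00012-of-01024__123")
print(parsed.shard_file, parsed.index)
```

Because the layout is an assumption, the helper raises `ValueError` on anything it does not recognize rather than guessing.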