v4.7.0

@marcenacp marcenacp released this 05 Oct 10:23
f00f1e3

Added

  • [API] Added TfDataBuilder, which is handy for storing experimental ad hoc TFDS datasets in notebook-like environments so that they can be versioned, described, and easily shared with teammates.
  • [API] Added options to create format-specific dataset builders. The new API now includes a number of NLP-specific builders, such as:
  • [API] Added tfds.beam.inc_counter to reduce beam.metrics.Metrics.counter boilerplate
  • [API] Added options to group together existing TFDS datasets into dataset collections and to perform simple operations over them.
  • [Documentation] Updated documentation, specifically:
    • New guide on format-specific dataset builders;
    • New guide on adding new dataset collections to TFDS;
    • Updated TFDS CLI documentation.
  • [TFDS CLI] Supports custom configs passed as JSON (e.g. tfds build my_dataset --config='{"name": "my_custom_config", "description": "Abc"}')
  • New datasets:
  • Updated datasets:
    • C4 was updated to version 3.1.
    • common_voice was updated to a more recent snapshot.
    • wikipedia was updated with the 20220620 snapshot.
  • New dataset collections, such as xtreme and LongT5
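The JSON --config form of the TFDS CLI listed above is easy to get wrong when quoting by hand. A minimal sketch, assuming you want to assemble the command string from Python: the helper name tfds_build_command is hypothetical and not part of TFDS, and only the standard library is used to serialize and shell-quote the config.

```python
import json
import shlex


def tfds_build_command(dataset: str, config: dict) -> str:
    """Return a shell-safe `tfds build` command (hypothetical helper, not a TFDS API)."""
    # Serialize the config dict to JSON, then quote it so the shell passes it as one argument.
    config_json = json.dumps(config)
    return f"tfds build {shlex.quote(dataset)} --config={shlex.quote(config_json)}"


command = tfds_build_command(
    "my_dataset",
    {"name": "my_custom_config", "description": "Abc"},
)
print(command)
```

The resulting string can be pasted into a shell or split with shlex.split before being handed to subprocess.run. Building it from a dict keeps the JSON valid even when descriptions contain quotes.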

Changed

  • The base Logger class expects more information to be passed to the as_dataset method. This should only be relevant to people who have implemented and registered custom Logger class(es).
  • You can set DEFAULT_BUILDER_CONFIG_NAME in a DatasetBuilder to change the default config if it shouldn't be the first builder config defined in BUILDER_CONFIGS.
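The selection rule for the default config can be illustrated in plain Python without importing TFDS. This is a sketch of the behavior described above, not the TFDS implementation: SketchConfig, SketchBuilder, LargeByDefaultBuilder, and default_config are hypothetical stand-ins, and only the attribute names BUILDER_CONFIGS and DEFAULT_BUILDER_CONFIG_NAME come from the release note.

```python
from dataclasses import dataclass
from typing import ClassVar, Optional, Sequence


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for a builder config; only the name matters here."""
    name: str


class SketchBuilder:
    """Hypothetical stand-in for a DatasetBuilder with two configs."""
    BUILDER_CONFIGS: ClassVar[Sequence[SketchConfig]] = (
        SketchConfig("small"),
        SketchConfig("large"),
    )
    # None means "fall back to the first config in BUILDER_CONFIGS".
    DEFAULT_BUILDER_CONFIG_NAME: ClassVar[Optional[str]] = None


class LargeByDefaultBuilder(SketchBuilder):
    """Same configs, but the default is overridden by name."""
    DEFAULT_BUILDER_CONFIG_NAME = "large"


def default_config(builder_cls: type) -> SketchConfig:
    """Pick the default config: the named one if set, otherwise the first one."""
    configs = builder_cls.BUILDER_CONFIGS
    wanted = builder_cls.DEFAULT_BUILDER_CONFIG_NAME
    if wanted is None:
        return configs[0]
    for config in configs:
        if config.name == wanted:
            return config
    raise ValueError(f"No builder config named {wanted!r}")


print(default_config(SketchBuilder).name)
print(default_config(LargeByDefaultBuilder).name)
```

In a real builder, the same two class attributes would be set on the DatasetBuilder subclass, and the lookup itself is left to TFDS.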

Deprecated

Removed

Fixed

  • Various datasets
  • On Linux, when loading a dataset from a directory that is not your home (~) directory, a new ~ directory is no longer created in the current directory (fixes #4117).

Security