Rename dataset_as_numpy as_numpy and update docs
PiperOrigin-RevId: 232039691
Ryan Sepassi authored and Copybara-Service committed Feb 1, 2019
1 parent da219a0 commit 78bb6ee
Showing 33 changed files with 594 additions and 110 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -99,27 +99,27 @@ print(info)
)
```
### NumPy Usage with `tfds.dataset_as_numpy`
### NumPy Usage with `tfds.as_numpy`
As a convenience for users that want simple NumPy arrays in their programs, you
can use `tfds.dataset_as_numpy` to return a generator that yields NumPy array
can use `tfds.as_numpy` to return a generator that yields NumPy array
records out of a `tf.data.Dataset`. This allows you to build high-performance
input pipelines with `tf.data` but use whatever you'd like for your model
components.
```python
train_ds = tfds.load("mnist", split=tfds.Split.TRAIN)
train_ds = train_ds.shuffle(1024).batch(128).repeat(5).prefetch(10)
for example in tfds.dataset_as_numpy(train_ds):
for example in tfds.as_numpy(train_ds):
numpy_images, numpy_labels = example["image"], example["label"]
```
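The loop above can be sketched end-to-end without TensorFlow installed. The generator below is a hypothetical stand-in for what `tfds.as_numpy` yields from a batched MNIST dataset (one dict of NumPy arrays per batch, keyed by feature name); it is not the library implementation:

```python
import numpy as np

def mock_as_numpy(num_batches=2, batch_size=128):
    # Hypothetical stand-in for tfds.as_numpy(train_ds): yields one
    # dict of NumPy arrays per batch, keyed like the dataset features.
    for _ in range(num_batches):
        yield {
            "image": np.zeros((batch_size, 28, 28, 1), dtype=np.uint8),
            "label": np.zeros((batch_size,), dtype=np.int64),
        }

for example in mock_as_numpy():
    numpy_images, numpy_labels = example["image"], example["label"]
    assert numpy_images.shape == (128, 28, 28, 1)
    assert numpy_labels.shape == (128,)
```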
You can also use `tfds.dataset_as_numpy` in conjunction with `batch_size=-1` to
You can also use `tfds.as_numpy` in conjunction with `batch_size=-1` to
get the full dataset in NumPy arrays from the returned `tf.Tensor` object:
```python
train_data = tfds.load("mnist", split=tfds.Split.TRAIN, batch_size=-1)
numpy_data = tfds.dataset_as_numpy(train_data)
numpy_data = tfds.as_numpy(train_data)
numpy_images, numpy_labels = numpy_data["image"], numpy_data["label"]
```
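Because `batch_size=-1` loads the full split as single tensors, the converted result is a plain dict of arrays rather than a generator. A hypothetical sketch of the resulting structure (the shapes assume the 60,000-example MNIST train split):

```python
import numpy as np

# Hypothetical stand-in for tfds.as_numpy(tfds.load("mnist",
# split=tfds.Split.TRAIN, batch_size=-1)): one dict of full-size arrays.
numpy_data = {
    "image": np.zeros((60000, 28, 28, 1), dtype=np.uint8),
    "label": np.zeros((60000,), dtype=np.int64),
}
numpy_images, numpy_labels = numpy_data["image"], numpy_data["label"]
assert numpy_images.shape == (60000, 28, 28, 1)
assert numpy_labels.shape == (60000,)
```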
4 changes: 4 additions & 0 deletions docs/api_docs/python/_toc.yaml
@@ -4,6 +4,8 @@ toc:
section:
- title: Overview
path: /datasets/api_docs/python/tfds
- title: as_numpy
path: /datasets/api_docs/python/tfds/as_numpy
- title: builder
path: /datasets/api_docs/python/tfds/builder
- title: dataset_as_numpy
@@ -34,6 +36,8 @@ toc:
path: /datasets/api_docs/python/tfds/core/lazy_imports
- title: NamedSplit
path: /datasets/api_docs/python/tfds/core/NamedSplit
- title: SplitBase
path: /datasets/api_docs/python/tfds/core/SplitBase
- title: SplitDict
path: /datasets/api_docs/python/tfds/core/SplitDict
- title: SplitGenerator
2 changes: 2 additions & 0 deletions docs/api_docs/python/index.md
@@ -3,13 +3,15 @@
* <a href="./tfds.md"><code>tfds</code></a>
* <a href="./tfds/download/GenerateMode.md"><code>tfds.GenerateMode</code></a>
* <a href="./tfds/Split.md"><code>tfds.Split</code></a>
* <a href="./tfds/as_numpy.md"><code>tfds.as_numpy</code></a>
* <a href="./tfds/builder.md"><code>tfds.builder</code></a>
* <a href="./tfds/core.md"><code>tfds.core</code></a>
* <a href="./tfds/core/BuilderConfig.md"><code>tfds.core.BuilderConfig</code></a>
* <a href="./tfds/core/DatasetBuilder.md"><code>tfds.core.DatasetBuilder</code></a>
* <a href="./tfds/core/DatasetInfo.md"><code>tfds.core.DatasetInfo</code></a>
* <a href="./tfds/core/GeneratorBasedBuilder.md"><code>tfds.core.GeneratorBasedBuilder</code></a>
* <a href="./tfds/core/NamedSplit.md"><code>tfds.core.NamedSplit</code></a>
* <a href="./tfds/core/SplitBase.md"><code>tfds.core.SplitBase</code></a>
* <a href="./tfds/core/SplitDict.md"><code>tfds.core.SplitDict</code></a>
* <a href="./tfds/core/SplitGenerator.md"><code>tfds.core.SplitGenerator</code></a>
* <a href="./tfds/core/SplitInfo.md"><code>tfds.core.SplitInfo</code></a>
4 changes: 3 additions & 1 deletion docs/api_docs/python/tfds.md
@@ -55,7 +55,9 @@ Documentation:

## Functions

[`dataset_as_numpy(...)`](./tfds/dataset_as_numpy.md): Converts a `tf.data.Dataset` to an iterable of NumPy arrays.
[`as_numpy(...)`](./tfds/as_numpy.md): Converts a `tf.data.Dataset` to an iterable of NumPy arrays.

[`dataset_as_numpy(...)`](./tfds/dataset_as_numpy.md): DEPRECATED. Renamed <a href="./tfds/as_numpy.md"><code>tfds.as_numpy</code></a>.

[`builder(...)`](./tfds/builder.md): Fetches a <a href="./tfds/core/DatasetBuilder.md"><code>tfds.core.DatasetBuilder</code></a> by string name.

2 changes: 2 additions & 0 deletions docs/api_docs/python/tfds/Split.md
@@ -32,6 +32,8 @@ stages of training and evaluation.
* `ALL`: Special value corresponding to all existing splits of a dataset
merged together

Note: All splits, including compositions, inherit from <a href="../tfds/core/SplitBase.md"><code>tfds.core.SplitBase</code></a>

See the
[guide on splits](https://github.com/tensorflow/datasets/tree/master/docs/splits.md)
for more information.
57 changes: 47 additions & 10 deletions docs/api_docs/python/tfds/_api_cache.json
@@ -1,5 +1,5 @@
{
"current_doc_full_name": "tfds.core.DatasetBuilder.IN_DEVELOPMENT",
"current_doc_full_name": "tfds.features.text.TextEncoder.__hash__",
"duplicate_of": {
"tfds.GenerateMode": "tfds.download.GenerateMode",
"tfds.GenerateMode.FORCE_REDOWNLOAD": "tfds.download.GenerateMode.FORCE_REDOWNLOAD",
@@ -9,7 +9,7 @@
"tfds.Split.__format__": "tfds.core.BuilderConfig.__format__",
"tfds.Split.__getattribute__": "tfds.core.BuilderConfig.__getattribute__",
"tfds.Split.__hash__": "tfds.core.BuilderConfig.__hash__",
"tfds.Split.__init__": "tfds.core.Version.__init__",
"tfds.Split.__init__": "tfds.core.SplitBase.__init__",
"tfds.Split.__reduce__": "tfds.core.BuilderConfig.__reduce__",
"tfds.Split.__reduce_ex__": "tfds.core.BuilderConfig.__reduce_ex__",
"tfds.Split.__repr__": "tfds.core.DatasetBuilder.__repr__",
@@ -58,6 +58,18 @@
"tfds.core.NamedSplit.__reduce_ex__": "tfds.core.BuilderConfig.__reduce_ex__",
"tfds.core.NamedSplit.__setattr__": "tfds.core.BuilderConfig.__setattr__",
"tfds.core.NamedSplit.__sizeof__": "tfds.core.BuilderConfig.__sizeof__",
"tfds.core.NamedSplit.__weakref__": "tfds.core.SplitBase.__weakref__",
"tfds.core.SplitBase.__delattr__": "tfds.core.BuilderConfig.__delattr__",
"tfds.core.SplitBase.__format__": "tfds.core.BuilderConfig.__format__",
"tfds.core.SplitBase.__getattribute__": "tfds.core.BuilderConfig.__getattribute__",
"tfds.core.SplitBase.__hash__": "tfds.core.BuilderConfig.__hash__",
"tfds.core.SplitBase.__new__": "tfds.core.BuilderConfig.__new__",
"tfds.core.SplitBase.__reduce__": "tfds.core.BuilderConfig.__reduce__",
"tfds.core.SplitBase.__reduce_ex__": "tfds.core.BuilderConfig.__reduce_ex__",
"tfds.core.SplitBase.__repr__": "tfds.core.DatasetBuilder.__repr__",
"tfds.core.SplitBase.__setattr__": "tfds.core.BuilderConfig.__setattr__",
"tfds.core.SplitBase.__sizeof__": "tfds.core.BuilderConfig.__sizeof__",
"tfds.core.SplitBase.__str__": "tfds.core.BuilderConfig.__str__",
"tfds.core.SplitDict.__delattr__": "tfds.core.BuilderConfig.__delattr__",
"tfds.core.SplitDict.__format__": "tfds.core.BuilderConfig.__format__",
"tfds.core.SplitDict.__reduce__": "tfds.core.BuilderConfig.__reduce__",
@@ -87,6 +99,7 @@
"tfds.core.SplitInfo.__str__": "tfds.core.BuilderConfig.__str__",
"tfds.core.Version.__delattr__": "tfds.core.BuilderConfig.__delattr__",
"tfds.core.Version.__format__": "tfds.core.BuilderConfig.__format__",
"tfds.core.Version.__init__": "tfds.core.SplitBase.__init__",
"tfds.core.Version.__reduce__": "tfds.core.BuilderConfig.__reduce__",
"tfds.core.Version.__reduce_ex__": "tfds.core.BuilderConfig.__reduce_ex__",
"tfds.core.Version.__setattr__": "tfds.core.BuilderConfig.__setattr__",
@@ -150,7 +163,7 @@
"tfds.features.BBox.__getslice__": "tfds.core.Version.__getslice__",
"tfds.features.BBox.__gt__": "tfds.core.Version.__gt__",
"tfds.features.BBox.__hash__": "tfds.core.Version.__hash__",
"tfds.features.BBox.__init__": "tfds.core.Version.__init__",
"tfds.features.BBox.__init__": "tfds.core.SplitBase.__init__",
"tfds.features.BBox.__iter__": "tfds.core.Version.__iter__",
"tfds.features.BBox.__le__": "tfds.core.Version.__le__",
"tfds.features.BBox.__len__": "tfds.core.Version.__len__",
@@ -199,7 +212,7 @@
"tfds.features.FeatureConnector.__format__": "tfds.core.BuilderConfig.__format__",
"tfds.features.FeatureConnector.__getattribute__": "tfds.core.BuilderConfig.__getattribute__",
"tfds.features.FeatureConnector.__hash__": "tfds.core.BuilderConfig.__hash__",
"tfds.features.FeatureConnector.__init__": "tfds.core.Version.__init__",
"tfds.features.FeatureConnector.__init__": "tfds.core.SplitBase.__init__",
"tfds.features.FeatureConnector.__new__": "tfds.core.BuilderConfig.__new__",
"tfds.features.FeatureConnector.__reduce__": "tfds.core.BuilderConfig.__reduce__",
"tfds.features.FeatureConnector.__reduce_ex__": "tfds.core.BuilderConfig.__reduce_ex__",
@@ -292,7 +305,7 @@
"tfds.features.TensorInfo.__getslice__": "tfds.core.Version.__getslice__",
"tfds.features.TensorInfo.__gt__": "tfds.core.Version.__gt__",
"tfds.features.TensorInfo.__hash__": "tfds.core.Version.__hash__",
"tfds.features.TensorInfo.__init__": "tfds.core.Version.__init__",
"tfds.features.TensorInfo.__init__": "tfds.core.SplitBase.__init__",
"tfds.features.TensorInfo.__iter__": "tfds.core.Version.__iter__",
"tfds.features.TensorInfo.__le__": "tfds.core.Version.__le__",
"tfds.features.TensorInfo.__len__": "tfds.core.Version.__len__",
@@ -365,7 +378,7 @@
"tfds.features.text.TextEncoder.__format__": "tfds.core.BuilderConfig.__format__",
"tfds.features.text.TextEncoder.__getattribute__": "tfds.core.BuilderConfig.__getattribute__",
"tfds.features.text.TextEncoder.__hash__": "tfds.core.BuilderConfig.__hash__",
"tfds.features.text.TextEncoder.__init__": "tfds.core.Version.__init__",
"tfds.features.text.TextEncoder.__init__": "tfds.core.SplitBase.__init__",
"tfds.features.text.TextEncoder.__new__": "tfds.core.BuilderConfig.__new__",
"tfds.features.text.TextEncoder.__reduce__": "tfds.core.BuilderConfig.__reduce__",
"tfds.features.text.TextEncoder.__reduce_ex__": "tfds.core.BuilderConfig.__reduce_ex__",
@@ -423,7 +436,7 @@
"tfds.file_adapter.FileFormatAdapter.__format__": "tfds.core.BuilderConfig.__format__",
"tfds.file_adapter.FileFormatAdapter.__getattribute__": "tfds.core.BuilderConfig.__getattribute__",
"tfds.file_adapter.FileFormatAdapter.__hash__": "tfds.core.BuilderConfig.__hash__",
"tfds.file_adapter.FileFormatAdapter.__init__": "tfds.core.Version.__init__",
"tfds.file_adapter.FileFormatAdapter.__init__": "tfds.core.SplitBase.__init__",
"tfds.file_adapter.FileFormatAdapter.__new__": "tfds.core.BuilderConfig.__new__",
"tfds.file_adapter.FileFormatAdapter.__reduce__": "tfds.core.BuilderConfig.__reduce__",
"tfds.file_adapter.FileFormatAdapter.__reduce_ex__": "tfds.core.BuilderConfig.__reduce_ex__",
@@ -448,7 +461,7 @@
"tfds.percent.__format__": "tfds.core.BuilderConfig.__format__",
"tfds.percent.__getattribute__": "tfds.core.BuilderConfig.__getattribute__",
"tfds.percent.__hash__": "tfds.core.BuilderConfig.__hash__",
"tfds.percent.__init__": "tfds.core.Version.__init__",
"tfds.percent.__init__": "tfds.core.SplitBase.__init__",
"tfds.percent.__new__": "tfds.core.BuilderConfig.__new__",
"tfds.percent.__reduce__": "tfds.core.BuilderConfig.__reduce__",
"tfds.percent.__reduce_ex__": "tfds.core.BuilderConfig.__reduce_ex__",
@@ -488,6 +501,7 @@
"tfds.Split.__str__": true,
"tfds.Split.__subclasshook__": true,
"tfds.Split.__weakref__": true,
"tfds.as_numpy": false,
"tfds.builder": false,
"tfds.core": false,
"tfds.core.BuilderConfig": false,
@@ -564,10 +578,9 @@
"tfds.core.DatasetInfo.description": true,
"tfds.core.DatasetInfo.download_checksums": true,
"tfds.core.DatasetInfo.features": true,
"tfds.core.DatasetInfo.initialize_from_package_data": true,
"tfds.core.DatasetInfo.initialize_from_bucket": true,
"tfds.core.DatasetInfo.initialized": true,
"tfds.core.DatasetInfo.name": true,
"tfds.core.DatasetInfo.num_examples": true,
"tfds.core.DatasetInfo.read_from_directory": true,
"tfds.core.DatasetInfo.size_in_bytes": true,
"tfds.core.DatasetInfo.splits": true,
@@ -626,6 +639,29 @@
"tfds.core.NamedSplit.__weakref__": true,
"tfds.core.NamedSplit.get_read_instruction": true,
"tfds.core.NamedSplit.subsplit": true,
"tfds.core.SplitBase": false,
"tfds.core.SplitBase.__abstractmethods__": true,
"tfds.core.SplitBase.__add__": true,
"tfds.core.SplitBase.__delattr__": true,
"tfds.core.SplitBase.__dict__": true,
"tfds.core.SplitBase.__doc__": true,
"tfds.core.SplitBase.__eq__": true,
"tfds.core.SplitBase.__format__": true,
"tfds.core.SplitBase.__getattribute__": true,
"tfds.core.SplitBase.__hash__": true,
"tfds.core.SplitBase.__init__": true,
"tfds.core.SplitBase.__module__": true,
"tfds.core.SplitBase.__new__": true,
"tfds.core.SplitBase.__reduce__": true,
"tfds.core.SplitBase.__reduce_ex__": true,
"tfds.core.SplitBase.__repr__": true,
"tfds.core.SplitBase.__setattr__": true,
"tfds.core.SplitBase.__sizeof__": true,
"tfds.core.SplitBase.__str__": true,
"tfds.core.SplitBase.__subclasshook__": true,
"tfds.core.SplitBase.__weakref__": true,
"tfds.core.SplitBase.get_read_instruction": true,
"tfds.core.SplitBase.subsplit": true,
"tfds.core.SplitDict": false,
"tfds.core.SplitDict.__cmp__": true,
"tfds.core.SplitDict.__contains__": true,
@@ -673,6 +709,7 @@
"tfds.core.SplitDict.popitem": true,
"tfds.core.SplitDict.setdefault": true,
"tfds.core.SplitDict.to_proto": true,
"tfds.core.SplitDict.total_num_examples": true,
"tfds.core.SplitDict.update": true,
"tfds.core.SplitDict.values": true,
"tfds.core.SplitDict.viewitems": true,
34 changes: 34 additions & 0 deletions docs/api_docs/python/tfds/as_numpy.md
@@ -0,0 +1,34 @@
<div itemscope itemtype="http://developers.google.com/ReferenceObject">
<meta itemprop="name" content="tfds.as_numpy" />
<meta itemprop="path" content="Stable" />
</div>

# tfds.as_numpy

``` python
tfds.as_numpy(
dataset,
graph=None
)
```



Defined in [`core/dataset_utils.py`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/core/dataset_utils.py).

Converts a `tf.data.Dataset` to an iterable of NumPy arrays.

`as_numpy` converts a possibly nested structure of `tf.data.Dataset`s
and `tf.Tensor`s to iterables of NumPy arrays and NumPy arrays, respectively.

#### Args:

* <b>`dataset`</b>: a possibly nested structure of `tf.data.Dataset`s and/or
`tf.Tensor`s.
* <b>`graph`</b>: `tf.Graph`, optional, explicitly set the graph to use.


#### Returns:

A structure matching `dataset` where `tf.data.Dataset`s are converted to
generators of NumPy arrays and `tf.Tensor`s are converted to NumPy arrays.
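The nested-structure contract described in the Args and Returns sections can be illustrated with a small pure-Python sketch. `as_numpy_sketch` below is a hypothetical illustration (not the library code), in which lists stand in for `tf.data.Dataset`s and scalars stand in for `tf.Tensor`s:

```python
import numpy as np

def as_numpy_sketch(structure):
    # Sketch of the documented behavior: dataset-like values become
    # generators of NumPy arrays; tensor-like values become NumPy arrays.
    if isinstance(structure, dict):
        return {k: as_numpy_sketch(v) for k, v in structure.items()}
    if isinstance(structure, list):   # stand-in for a tf.data.Dataset
        return (np.asarray(elem) for elem in structure)
    return np.asarray(structure)      # stand-in for a tf.Tensor

converted = as_numpy_sketch({"ds": [[1, 2], [3, 4]], "t": 7})
assert next(converted["ds"]).tolist() == [1, 2]
assert isinstance(converted["t"], np.ndarray)
```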
2 changes: 2 additions & 0 deletions docs/api_docs/python/tfds/core.md
@@ -25,6 +25,8 @@ API to define datasets.

[`class NamedSplit`](../tfds/core/NamedSplit.md): Descriptor corresponding to a named split (train, test, ...).

[`class SplitBase`](../tfds/core/SplitBase.md): Abstract base class for Split compositionality.

[`class SplitDict`](../tfds/core/SplitDict.md): Split info object.

[`class SplitGenerator`](../tfds/core/SplitGenerator.md): Defines the split information for the generator.
6 changes: 3 additions & 3 deletions docs/api_docs/python/tfds/core/DatasetBuilder.md
@@ -52,7 +52,7 @@ assert isinstance(train_dataset, tf.data.Dataset)
# And then the rest of your input pipeline
train_dataset = train_dataset.repeat().shuffle(1024).batch(128)
train_dataset = train_dataset.prefetch(2)
features = train_dataset.make_one_shot_iterator().get_next()
features = tf.compat.v1.data.make_one_shot_iterator(train_dataset).get_next()
image, label = features['image'], features['label']
```

@@ -110,8 +110,8 @@ Callers must pass arguments as keyword arguments.

#### Args:

* <b>`split`</b>: <a href="../../tfds/Split.md"><code>tfds.Split</code></a>, which subset of the data to read. If None (default),
returns all splits in a dict
* <b>`split`</b>: <a href="../../tfds/core/SplitBase.md"><code>tfds.core.SplitBase</code></a>, which subset(s) of the data to read. If None
(default), returns all splits in a dict
`<key: tfds.Split, value: tf.data.Dataset>`.
* <b>`batch_size`</b>: `int`, batch size. Note that variable-length features will
be 0-padded if `batch_size > 1`. Users that want more custom behavior
17 changes: 6 additions & 11 deletions docs/api_docs/python/tfds/core/DatasetInfo.md
@@ -9,15 +9,14 @@
<meta itemprop="property" content="features"/>
<meta itemprop="property" content="initialized"/>
<meta itemprop="property" content="name"/>
<meta itemprop="property" content="num_examples"/>
<meta itemprop="property" content="size_in_bytes"/>
<meta itemprop="property" content="splits"/>
<meta itemprop="property" content="supervised_keys"/>
<meta itemprop="property" content="urls"/>
<meta itemprop="property" content="version"/>
<meta itemprop="property" content="__init__"/>
<meta itemprop="property" content="compute_dynamic_properties"/>
<meta itemprop="property" content="initialize_from_package_data"/>
<meta itemprop="property" content="initialize_from_bucket"/>
<meta itemprop="property" content="read_from_directory"/>
<meta itemprop="property" content="write_to_directory"/>
</div>
@@ -113,10 +112,6 @@ Whether DatasetInfo has been fully initialized.



<h3 id="num_examples"><code>num_examples</code></h3>



<h3 id="size_in_bytes"><code>size_in_bytes</code></h3>


@@ -149,20 +144,20 @@



<h3 id="initialize_from_package_data"><code>initialize_from_package_data</code></h3>
<h3 id="initialize_from_bucket"><code>initialize_from_bucket</code></h3>

``` python
initialize_from_package_data()
initialize_from_bucket()
```

Initialize DatasetInfo from package data, returns True on success.
Initialize DatasetInfo from GCS bucket info files.

<h3 id="read_from_directory"><code>read_from_directory</code></h3>

``` python
read_from_directory(
dataset_info_dir,
from_packaged_data=False
from_bucket=False
)
```

@@ -177,7 +172,7 @@ This will overwrite all previous metadata.

* <b>`dataset_info_dir`</b>: `str` The directory containing the metadata file. This
should be the root directory of a specific dataset version.
* <b>`from_packaged_data`</b>: `bool`, If data is restored from packaged data,
* <b>`from_bucket`</b>: `bool`, If data is restored from info files on GCS,
then only the information not defined in the code is updated


4 changes: 2 additions & 2 deletions docs/api_docs/python/tfds/core/GeneratorBasedBuilder.md
@@ -83,8 +83,8 @@ Callers must pass arguments as keyword arguments.

#### Args:

* <b>`split`</b>: <a href="../../tfds/Split.md"><code>tfds.Split</code></a>, which subset of the data to read. If None (default),
returns all splits in a dict
* <b>`split`</b>: <a href="../../tfds/core/SplitBase.md"><code>tfds.core.SplitBase</code></a>, which subset(s) of the data to read. If None
(default), returns all splits in a dict
`<key: tfds.Split, value: tf.data.Dataset>`.
* <b>`batch_size`</b>: `int`, batch size. Note that variable-length features will
be 0-padded if `batch_size > 1`. Users that want more custom behavior
2 changes: 1 addition & 1 deletion docs/api_docs/python/tfds/core/NamedSplit.md
@@ -12,7 +12,7 @@

## Class `NamedSplit`


Inherits From: [`SplitBase`](../../tfds/core/SplitBase.md)



