From 4994dbbfd86492f3977b649a8600ee1f08f2ca24 Mon Sep 17 00:00:00 2001 From: Mark Daoust Date: Wed, 7 Nov 2018 14:18:40 -0800 Subject: [PATCH 1/7] Rewrite intro Put more emphasis on `tf.data`. Use `tf.symbol` more, (these are auto-linked to api pages in the site.) Drop base64 Add note about feature_description set privete_outputs Use reduced size images (ipython inlines them at full res size even if they're only shown with a limited size.) --- site/en/tutorials/load_data/tf-records.ipynb | 589 ++++++++++++------- 1 file changed, 383 insertions(+), 206 deletions(-) diff --git a/site/en/tutorials/load_data/tf-records.ipynb b/site/en/tutorials/load_data/tf-records.ipynb index f155d72fb5c..a88f765466e 100644 --- a/site/en/tutorials/load_data/tf-records.ipynb +++ b/site/en/tutorials/load_data/tf-records.ipynb @@ -6,10 +6,45 @@ "name": "tf-records.ipynb", "version": "0.3.2", "provenance": [], - "collapsed_sections": [] + "private_outputs": true, + "collapsed_sections": [], + "toc_visible": true } }, "cells": [ + { + "metadata": { + "id": "pL--_KGdYoBz", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + "##### Copyright 2018 The TensorFlow Authors." + ] + }, + { + "metadata": { + "id": "uBDvXpYzYnGj", + "colab_type": "code", + "colab": {} + }, + "cell_type": "code", + "source": [ + "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ], + "execution_count": 0, + "outputs": [] + }, { "metadata": { "id": "HQzaEQuJiW_d", @@ -17,7 +52,37 @@ }, "cell_type": "markdown", "source": [ - "# Using TFRecords and TF Examples" + "# Using TFRecords and `tf.Example`\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " View on TensorFlow.org\n", + " \n", + " Run in Google Colab\n", + " \n", + " View source on GitHub\n", + "
" + ] + }, + { + "metadata": { + "id": "3pkUd_9IZCFO", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + "To read data efficiently it can be helpful to serialize your data and store it in a set of files (100-200MB Each) that can each be read linearly. This is especially true if the data is being streamed over a network. This can also be useful for caching any data-preprocessing.\n", + "\n", + "The TFRecord format is a simple format for storing a sequence of binary records.\n", + "\n", + "[Protocol buffers](https://developers.google.com/protocol-buffers/) are a cross-platform, cross-language library for efficient serialization of structured data.\n", + "\n", + "Protocol messages are defined by `.proto` files, these are often the easiest way to understand a Message type. \n", + "\n", + "The `tf.Example` message (or protobuf) is a flexible message type that represents a `{\"string\": value}` mapping. It is designed for use with TensorFlow, and is used throughout the higher-level APIs such as [TFX](https://www.tensorflow.org/tfx/)." ] }, { @@ -27,11 +92,20 @@ }, "cell_type": "markdown", "source": [ - "The Example structure as well as the TFRecord format are extremely useful for describing input data in the TensorFlow API. It allows developers to preprocess their data only once for multiple purposes, and allows developers to store their data locally. \n", "\n", - "The [`tf.Example`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/example/example.proto) [protocol buffer](https://developers.google.com/protocol-buffers/) (a protocol buffer is also called a message) is specifically designed for use with TensorFlow, as well as higher-level APIs such as [TFX](https://www.tensorflow.org/tfx/) and [Keras](https://www.tensorflow.org/guide/keras). This notebook will demonstrate how to create, parse, and use the `tf.Example` message, and then store, read, and write this data in the `.tfrecords` format. This tutorial includes an end-to-end example of reading/writing image data as TF Examples in the TFRecord format. \n", + "This notebook will demonstrate how to create, parse, and use the `tf.Example` message, and then store, read, and write `tf.Example` messages to `.tfrecord` files.\n", "\n", - "Note that, while extremely useful, using these structures is ultimately optional if using the [tf.data API](https://www.tensorflow.org/api_docs/python/tf/data) makes more sense." + "Note: While useful, using these structures is ultimately optional. There is no need to convert to existing code to use TFRecords, unless you are using [`tf.data`](https://www.tensorflow.org/guide/datasets) and reading data is still the bottleneck to training. See [Data Input Pipeline Performance](https://www.tensorflow.org/guide/performance/datasets) for dataset performance tips." 
+ ] + }, + { + "metadata": { + "id": "WkRreBf1eDVc", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + "## Setup" ] }, { @@ -49,11 +123,22 @@ "import tensorflow as tf\n", "tf.enable_eager_execution()\n", "\n", - "import numpy as np" + "import numpy as np\n", + "import IPython.display as display" ], "execution_count": 0, "outputs": [] }, + { + "metadata": { + "id": "e5Kq88ccUWQV", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + "## `tf.Example`" + ] + }, { "metadata": { "id": "VrdQHgvNijTi", @@ -61,7 +146,7 @@ }, "cell_type": "markdown", "source": [ - "## Data Types In `tf.Example`" + "### Data types for `tf.Example`" ] }, { @@ -71,19 +156,21 @@ }, "cell_type": "markdown", "source": [ - "The `tf.Example` type is generic enough to accept a wide range of data types. While the following [three types](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/example/feature.proto#L4) of features are compatible with `tf.Example`, most other generic types can be coerced into one of these.\n", + "Fundamentally a `tf.Example` is a `{\"string\": tf.train.Feature}` mapping.\n", + "\n", + "The `tf.train.Feature` message type can accept one of the following three types ([ref]((https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/example/feature.proto))). Most other generic types can be coerced into one of these.\n", "\n", - "1. [`bytes_list`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/example/feature.proto#L65) (the following types can be coerced)\n", + "1. `tf.train.BytesList` (the following types can be coerced)\n", "\n", " - `string`\n", " - `byte`\n", "\n", - "1. [`float_list`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/example/feature.proto#L68) (the following types can be coerced)\n", + "1. `tf.train.FloatList` (the following types can be coerced)\n", "\n", " - `float` (`float32`)\n", " - `double` (`float64`)\n", "\n", - "1. [`int64_list`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/example/feature.proto#L65) (the following types can be coerced)\n", + "1. `tf.train.Int64List` (the following types can be coerced)\n", "\n", " - `bool`\n", " - `enum`\n", @@ -100,7 +187,9 @@ }, "cell_type": "markdown", "source": [ - "In order to convert a standard type to a `tf.Example`-compatible type, we can use the following functions. Each function takes a single input value and returns one of the 3 `list` types above." + "In order to convert a standard TensorFlow type to a `tf.Example`-compatible `tf.train.Feature`, we can use the following shortcut functions:\n", + "\n", + "Each function takes a scalar input value and returns a `tf.train.Feature` containing one of the 3 `list` types above." ] }, { @@ -129,6 +218,16 @@ "execution_count": 0, "outputs": [] }, + { + "metadata": { + "id": "Wst0v9O8hgzy", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + "Note: To stay simple, this example only uses scalar inputs. The simplest way to handle non-scalar features is to use `tf.serialize_tensor` to convert tensors to binary-strings. Strings are scalars in tensorflow. Use `tf.parse_tensor` to convert the binary-string back to a tensor." + ] + }, { "metadata": { "id": "vsMbkkC8xxtB", @@ -158,6 +257,31 @@ "execution_count": 0, "outputs": [] }, + { + "metadata": { + "id": "nj1qpfQU5qmi", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + "All proto messages can be serialized to a binary-string using the `.SerializeToString` method." 
+ ] + }, + { + "metadata": { + "id": "5afZkORT5pjm", + "colab_type": "code", + "colab": {} + }, + "cell_type": "code", + "source": [ + "feature = _float_feature(np.exp(1))\n", + "\n", + "feature.SerializeToString()" + ], + "execution_count": 0, + "outputs": [] + }, { "metadata": { "id": "laKnw9F3hL-W", @@ -165,7 +289,7 @@ }, "cell_type": "markdown", "source": [ - "## Creating A `tf.Example` Message" + "### Creating a `tf.Example` message" ] }, { @@ -177,7 +301,7 @@ "source": [ "Suppose you want to create a `tf.Example` message from existing data. In practice, the dataset may come from anywhere, but the procedure of creating the `tf.Example` message from a single observation will be the same. \n", "\n", - "1. Within each observation, each value needs to be converted to one of the 3 compatible types, using one of the functions above. \n", + "1. Within each observation, each value needs to be converted to a `tf.train.Feature` containing one of the 3 compatible types, using one of the functions above. \n", "\n", "1. We create a map (dictionary) from the feature name string to the encoded feature value produced in #1.\n", "\n", @@ -298,6 +422,49 @@ "execution_count": 0, "outputs": [] }, + { + "metadata": { + "id": "_pbGATlG6u-4", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + "Or the serialized form:" + ] + }, + { + "metadata": { + "id": "dGim-mEm6vit", + "colab_type": "code", + "colab": {} + }, + "cell_type": "code", + "source": [ + "create_example(example_observation).SerializeToString()" + ], + "execution_count": 0, + "outputs": [] + }, + { + "metadata": { + "id": "jyg1g3gU7DNn", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + "## TFRecord files using tf.python_io" + ] + }, + { + "metadata": { + "id": "3FXG3miA7Kf1", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + "The `tf.python_io` module contains python functions for reading and writing TFRecord files. " + ] + }, { "metadata": { "id": "CKn5uql2lAaN", @@ -305,7 +472,7 @@ }, "cell_type": "markdown", "source": [ - "## Writing `tf.Example` Messages To A `.tfrecords` File" + "### Writing a TFRecord file" ] }, { @@ -328,13 +495,10 @@ "source": [ "# Write the tf.Example observations to test.tfrecords.\n", "\n", - "writer = tf.python_io.TFRecordWriter('test.tfrecords')\n", - "\n", - "for i in range(n_observations):\n", - " example = create_example([feature0[i], feature1[i], feature2[i], feature3[i]])\n", - " writer.write(example.SerializeToString())\n", - "\n", - "writer.close()" + "with tf.python_io.TFRecordWriter('test.tfrecords') as writer:\n", + " for i in range(n_observations):\n", + " example = create_example([feature0[i], feature1[i], feature2[i], feature3[i]])\n", + " writer.write(example.SerializeToString())" ], "execution_count": 0, "outputs": [] @@ -359,7 +523,7 @@ }, "cell_type": "markdown", "source": [ - "## Reading A `.tfrecords` File" + "### Reading a TFRecord File" ] }, { @@ -474,7 +638,27 @@ }, "cell_type": "markdown", "source": [ - "## Using The `Dataset` Object" + "## TFRecord files using `tf.data`" + ] + }, + { + "metadata": { + "id": "GmehkCCT81Ez", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + "The `tf.data` module also provides tools for reading and writing TFRecord files. The writer, `tf.data.experimental.TFRecordWriter`, is not covered by this tutorial." 
+ ] + }, + { + "metadata": { + "id": "6aV0GQhV8tmp", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + "### Reading a TFRecord File" ] }, { @@ -484,7 +668,11 @@ }, "cell_type": "markdown", "source": [ - "We can also read the `.tfrecords` file into a [`dataset` object](https://www.tensorflow.org/api_docs/python/tf/data/Dataset). More information on consuming the TFRecord object into a Dataset can be found [here](https://www.tensorflow.org/guide/datasets#consuming_tfrecord_data). Using this datatset structure can be useful for standardizing input data and optimizing performance. It is also easier and quicker to use this object." + "We can also read the TFRecord file using the `tf.data.TFRecordDataset` class. \n", + "\n", + "More information on consuming TFRecord files using `tf.data` can be found [here](https://www.tensorflow.org/guide/datasets#consuming_tfrecord_data). \n", + "\n", + "Using this datatset structure can be useful for standardizing input data and optimizing performance. It is also easier and quicker to use this object." ] }, { @@ -496,7 +684,35 @@ "cell_type": "code", "source": [ "filenames = ['test.tfrecords']\n", - "dataset = tf.data.TFRecordDataset(filenames)" + "dataset_raw = tf.data.TFRecordDataset(filenames)\n", + "\n", + "dataset_raw" + ], + "execution_count": 0, + "outputs": [] + }, + { + "metadata": { + "id": "6_EQ9i2E_-Fz", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + "At this point the dataset contains serialized `tf.train.Example` Messages. When iterated over it returns these as scalar string tensors:\n", + "\n", + "Note: iterating over a `tf.data.Dataset` only works with eager execution enabled." + ] + }, + { + "metadata": { + "id": "hxVXpLz_AJlm", + "colab_type": "code", + "colab": {} + }, + "cell_type": "code", + "source": [ + "for record in dataset_raw.take(10):\n", + " print(repr(record))" ], "execution_count": 0, "outputs": [] @@ -508,7 +724,9 @@ }, "cell_type": "markdown", "source": [ - "Each record in this dataset is an `EagerTensor` type, as [eager execution](https://www.tensorflow.org/guide/eager) was enabled at the start of this notebook. These tensors can be parsed using the function below." + "These tensors can be parsed using the function below.\n", + "\n", + "Note: The `feature_description` is necessary here because datasets use graph-execution, and need this description to build their shape and type signature." ] }, { @@ -519,20 +737,51 @@ }, "cell_type": "code", "source": [ + "# Create a description of the features. 
\n", + "feature_description = {\n", + " 'feature0': tf.FixedLenFeature([], tf.int64, default_value=0),\n", + " 'feature1': tf.FixedLenFeature([], tf.string, default_value=''),\n", + " 'feature2': tf.FixedLenFeature([], tf.int64, default_value=0),\n", + " 'feature3': tf.FixedLenFeature([], tf.float32, default_value=0.0),\n", + "}\n", + "\n", "def _parse_function(example_proto):\n", - " \n", - " # Create a dictionary of features.\n", - " \n", - " features = {\n", - " 'feature0': tf.FixedLenFeature([], tf.int64, default_value=0),\n", - " 'feature1': tf.FixedLenFeature([], tf.string, default_value=''),\n", - " 'feature2': tf.FixedLenFeature([], tf.int64, default_value=0),\n", - " 'feature3': tf.FixedLenFeature([], tf.float32, default_value=0.0),\n", - " }\n", - " \n", " # Parse the input tf.Example proto using the dictionary above.\n", - " \n", - " return tf.parse_single_example(example_proto, features)" + " return tf.parse_single_example(example_proto, feature_description)" + ], + "execution_count": 0, + "outputs": [] + }, + { + "metadata": { + "id": "gWETjUqhEQZf", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + "Or use `tf.parse example` to parse a whole batch at once." + ] + }, + { + "metadata": { + "id": "AH73hav6Bnmg", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + "Apply this finction to each item in the dataset using the `tf.data.Dataset.map` method:" + ] + }, + { + "metadata": { + "id": "6Ob7D-zmBm1w", + "colab_type": "code", + "colab": {} + }, + "cell_type": "code", + "source": [ + "dataset_parsed = dataset_raw.map(_parse_function)\n", + "dataset_parsed " ], "execution_count": 0, "outputs": [] @@ -555,8 +804,8 @@ }, "cell_type": "code", "source": [ - "for record in dataset.take(10):\n", - " print(_parse_function(record))" + "for record in dataset_parsed.take(10):\n", + " print(repr(record))" ], "execution_count": 0, "outputs": [] @@ -568,7 +817,7 @@ }, "cell_type": "markdown", "source": [ - "## Reading/Writing Image Data" + "## Walkthrough: Reading/Writing Image Data" ] }, { @@ -578,7 +827,7 @@ }, "cell_type": "markdown", "source": [ - "This is an example of how to read and write image data using TFRecords. The purpose of this is to show how, end to end, input data (in this case an image) and write the data as a `.tfrecords` file, then read the file back and display the image.\n", + "This is an example of how to read and write image data using TFRecords. The purpose of this is to show how, end to end, input data (in this case an image) and write the data as a TFRecord file, then read the file back and display the image.\n", "\n", "This can be useful if, for example, you want to use several models on the same input dataset. Instead of storing the image data raw, it can be preprocessed into the TFRecords format, and that can be used in all further processing and modelling. 
\n", "\n", @@ -587,48 +836,77 @@ }, { "metadata": { - "id": "BbK8nGxvU9d0", + "id": "5Lk2qrKvN0yu", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + "### Fetch the images" + ] + }, + { + "metadata": { + "id": "3a0fmwg8lHdF", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ - "# These imports are relevant for displaying and encoding image strings.\n", - "\n", - "import base64\n", - "\n", - "from IPython.display import Image" + "cat_in_snow = tf.keras.utils.get_file('320px-Felis_catus-cat_on_snow.jpg', 'https://upload.wikimedia.org/wikipedia/commons/thumb/b/b6/Felis_catus-cat_on_snow.jpg/320px-Felis_catus-cat_on_snow.jpg')\n", + "williamsburg_bridge = tf.keras.utils.get_file('194px-New_East_River_Bridge_from_Brooklyn_det.4a09796u.jpg','https://upload.wikimedia.org/wikipedia/commons/thumb/f/fe/New_East_River_Bridge_from_Brooklyn_det.4a09796u.jpg/194px-New_East_River_Bridge_from_Brooklyn_det.4a09796u.jpg')" ], "execution_count": 0, "outputs": [] }, { "metadata": { - "id": "3a0fmwg8lHdF", + "id": "ELE4ueh4o3OM", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ - "!wget -O 'cat_in_snow.jpg' 'https://upload.wikimedia.org/wikipedia/commons/b/b6/Felis_catus-cat_on_snow.jpg'\n", - "!wget -O 'williamsburg_bridge.jpg' 'https://upload.wikimedia.org/wikipedia/commons/f/fe/New_East_River_Bridge_from_Brooklyn_det.4a09796u.jpg'" + "!ls" ], "execution_count": 0, "outputs": [] }, { "metadata": { - "id": "ELE4ueh4o3OM", + "id": "7aJJh7vENeE4", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ - "!ls" + "display.Image(filename=cat_in_snow)" + ], + "execution_count": 0, + "outputs": [] + }, + { + "metadata": { + "id": "KkW0uuhcXZqA", + "colab_type": "code", + "colab": {} + }, + "cell_type": "code", + "source": [ + "display.Image(filename=williamsburg_bridge)" ], "execution_count": 0, "outputs": [] }, + { + "metadata": { + "id": "VSOgJSwoN5TQ", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + "### Write a TFRecord file" + ] + }, { "metadata": { "id": "Azx83ryQEU6T", @@ -648,8 +926,8 @@ "cell_type": "code", "source": [ "image_labels = {\n", - " 'cat_in_snow.jpg': 0,\n", - " 'williamsburg_bridge.jpg': 1,\n", + " cat_in_snow : 0,\n", + " williamsburg_bridge : 1,\n", "}" ], "execution_count": 0, @@ -664,26 +942,27 @@ "cell_type": "code", "source": [ "# This is an example, just using the cat image.\n", + "image_string = open(cat_in_snow, 'rb').read()\n", "\n", - "file = open('cat_in_snow.jpg', 'rb').read()\n", - "\n", - "image_shape = tf.image.decode_jpeg(file).shape\n", - "image_string = base64.b64encode(file)\n", - "\n", - "label = image_labels['cat_in_snow.jpg']\n", + "label = image_labels[cat_in_snow]\n", "\n", "# Create a dictionary with features that may be relevant.\n", + "def image_example(image_string, label):\n", + " image_shape = tf.image.decode_jpeg(image_string).shape\n", "\n", - "feature = {\n", - " 'height': _int64_feature(image_shape[0]),\n", - " 'width': _int64_feature(image_shape[1]),\n", - " 'depth': _int64_feature(image_shape[2]),\n", - " 'label': _int64_feature(label),\n", - " 'image_raw': _bytes_feature(image_string),\n", - "}\n", + " feature = {\n", + " 'height': _int64_feature(image_shape[0]),\n", + " 'width': _int64_feature(image_shape[1]),\n", + " 'depth': _int64_feature(image_shape[2]),\n", + " 'label': _int64_feature(label),\n", + " 'image_raw': _bytes_feature(image_string),\n", + " }\n", + "\n", + " return tf.train.Example(features=tf.train.Features(feature=feature))\n", "\n", - 
"tf_example = tf.train.Example(features=tf.train.Features(feature=feature))\n", - "print(tf_example)" + "for line in str(image_example(image_string, label)).split('\\n')[:15]:\n", + " print(line)\n", + "print('...')" ], "execution_count": 0, "outputs": [] @@ -710,27 +989,14 @@ "# First, process the two images into tf.Example messages.\n", "# Then, write to a .tfrecords file.\n", "\n", - "writer = tf.python_io.TFRecordWriter('images.tfrecords')\n", - "\n", - "for filename, label in image_labels.items():\n", - " \n", - " file = open(filename, 'rb').read()\n", - "\n", - " image_shape = tf.image.decode_jpeg(file).shape\n", - " image_string = base64.b64encode(file)\n", - "\n", - " feature = {\n", - " 'height': _int64_feature(image_shape[0]),\n", - " 'width': _int64_feature(image_shape[1]),\n", - " 'depth': _int64_feature(image_shape[2]),\n", - " 'label': _int64_feature(label),\n", - " 'image_raw': _bytes_feature(image_string),\n", - " }\n", - " \n", - " tf_example = tf.train.Example(features=tf.train.Features(feature=feature))\n", - " writer.write(tf_example.SerializeToString())\n", - "\n", - "writer.close()" + "with tf.python_io.TFRecordWriter('images.tfrecords') as writer:\n", + " for filename, label in image_labels.items():\n", + " \n", + " image_string = open(filename, 'rb').read()\n", + " \n", + " tf_example = image_example(image_string, label)\n", + " \n", + " writer.write(tf_example.SerializeToString())\n" ], "execution_count": 0, "outputs": [] @@ -755,6 +1021,8 @@ }, "cell_type": "markdown", "source": [ + "### Read the file\n", + "\n", "We now have the file `images.tfrecords`. We can now iterate over the records in the file to read back what we wrote. Since, for our use case we will just reproduce the image, the only feature we need is the raw image string. We can extract that using the getters described above, namely `example.features.feature['image_raw'].bytes_list.value[0]`. We also use the labels to determine which record is the cat as opposed to the bridge." ] }, @@ -766,155 +1034,64 @@ }, "cell_type": "code", "source": [ - "record_iterator = tf.python_io.tf_record_iterator(path='images.tfrecords')\n", - "\n", - "# Create a dictionary mapping the image label to the bytes string.\n", + "image_dataset_raw = tf.data.TFRecordDataset('images.tfrecords')\n", "\n", - "image_bytes = {}\n", + "# Create a dictionary describing the features. \n", + "image_feature_description = {\n", + " 'height': tf.FixedLenFeature([], tf.int64),\n", + " 'width': tf.FixedLenFeature([], tf.int64),\n", + " 'depth': tf.FixedLenFeature([], tf.int64),\n", + " 'label': tf.FixedLenFeature([], tf.int64),\n", + " 'image_raw': tf.FixedLenFeature([], tf.string),\n", + "}\n", "\n", - "for string_record in record_iterator:\n", - " example = tf.train.Example()\n", - " example.ParseFromString(string_record)\n", - " \n", - " label = example.features.feature['label'].int64_list.value[0]\n", - " \n", - " image_bytes[label] = example.features.feature['image_raw'].bytes_list.value[0]" - ], - "execution_count": 0, - "outputs": [] - }, - { - "metadata": { - "id": "qTkNHH9pid40", - "colab_type": "text" - }, - "cell_type": "markdown", - "source": [ - "Now, we create new blank JPEG files, that we will write the decoded image strings to." 
- ] - }, - { - "metadata": { - "id": "eSzTHYZkVGTd", - "colab_type": "code", - "colab": {} - }, - "cell_type": "code", - "source": [ - "with open('cat_in_snow_from_tfrecords.jpg', 'w') as f:\n", - " f.write(base64.b64decode(image_bytes[image_labels['cat_in_snow.jpg']]))\n", + "def _parse_image_function(example_proto):\n", + " # Parse the input tf.Example proto using the dictionary above.\n", + " return tf.parse_single_example(example_proto, image_feature_description)\n", "\n", - "with open('williamsburg_bridge_from_tfrecords.jpg', 'w') as f:\n", - " f.write(base64.b64decode(image_bytes[image_labels['williamsburg_bridge.jpg']]))" + "image_dataset_parsed = image_dataset_raw.map(_parse_image_function)\n", + "image_dataset_parsed" ], "execution_count": 0, "outputs": [] }, { "metadata": { - "id": "azilbA7Pjeu2", + "id": "0PEEFPk4NEg1", "colab_type": "text" }, "cell_type": "markdown", "source": [ - "Let's display these images! Remember these are not the raw images, these have been encoded as a `.tfrecords` file and then read back into raw image format." + "Now we can recover the images from the TFRecord file:" ] }, { "metadata": { - "colab_type": "code", - "id": "sQq8cG07U6NG", - "colab": {} - }, - "cell_type": "code", - "source": [ - "Image(filename='cat_in_snow_from_tfrecords.jpg', width=500)" - ], - "execution_count": 0, - "outputs": [] - }, - { - "metadata": { - "id": "KVoldzEVjqIX", + "id": "yZf8jOyEIjSF", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ - "Image(filename='williamsburg_bridge_from_tfrecords.jpg', width=500)" + "for image_features in image_dataset_parsed:\n", + " image_raw = image_features['image_raw'].numpy()\n", + " display.display(display.Image(data=image_raw))" ], "execution_count": 0, "outputs": [] }, { "metadata": { - "id": "OiP8jBE44mEF", - "colab_type": "text" - }, - "cell_type": "markdown", - "source": [ - "In practice however, it is less practical to work directly with the raw TFRecords format than with the `dataset` object. This is also true for images, where we can easily load image files into a dataset that is ready to use. This example follows the documentation [here](https://www.tensorflow.org/guide/datasets#decoding_image_data_and_resizing_it). We first define another `_parse_function` to parse an image file into a decoded image, then leverage the `from_tensor_slices` method to load these images into a dataset." - ] - }, - { - "metadata": { - "id": "kG66dtkRpNFQ", - "colab_type": "code", - "colab": {} - }, - "cell_type": "code", - "source": [ - "def _parse_function(filename, label):\n", - " image_string = tf.read_file(filename)\n", - " image_decoded = tf.image.decode_jpeg(image_string)\n", - " \n", - " return image_decoded, label" - ], - "execution_count": 0, - "outputs": [] - }, - { - "metadata": { - "id": "Q25yC0VlwLeG", - "colab_type": "code", - "colab": {} - }, - "cell_type": "code", - "source": [ - "filenames = tf.constant(['cat_in_snow.jpg', 'williamsburg_bridge.jpg'])\n", - "labels = tf.constant([0, 1])\n", - "\n", - "dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))\n", - "dataset = dataset.map(_parse_function)" - ], - "execution_count": 0, - "outputs": [] - }, - { - "metadata": { - "id": "2FC8JOld5Ar-", - "colab_type": "text" - }, - "cell_type": "markdown", - "source": [ - "Again using eager execution, we can print the records in the dataset. Each record is a tuple of a part of the image and the label. 
Each has type `tf.Tensor`, the first is an array of non-trivial shape (as it is part of an image), and the second is the label.\n", - "\n", - "From the below, we see that the first record comes from the cat image (as the label is `0`) and the second is from the bridge image (as the label is `1`)." - ] - }, - { - "metadata": { - "id": "s5zte_y3yGq8", + "id": "HJrI-k1QY2DY", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ - "for record in dataset.take(2):\n", - " print(record)" + "" ], "execution_count": 0, "outputs": [] } ] -} +} \ No newline at end of file From 7f2b2d97c0956a9c49c85521fd9da88bbd64afa6 Mon Sep 17 00:00:00 2001 From: Mark Daoust Date: Mon, 12 Nov 2018 18:12:25 -0800 Subject: [PATCH 2/7] Resolve review comments. --- site/en/tutorials/load_data/tf-records.ipynb | 77 +++++++------------- 1 file changed, 25 insertions(+), 52 deletions(-) diff --git a/site/en/tutorials/load_data/tf-records.ipynb b/site/en/tutorials/load_data/tf-records.ipynb index a88f765466e..985db4d67ad 100644 --- a/site/en/tutorials/load_data/tf-records.ipynb +++ b/site/en/tutorials/load_data/tf-records.ipynb @@ -26,7 +26,8 @@ "metadata": { "id": "uBDvXpYzYnGj", "colab_type": "code", - "colab": {} + "colab": {}, + "cellView": "form" }, "cell_type": "code", "source": [ @@ -74,13 +75,13 @@ }, "cell_type": "markdown", "source": [ - "To read data efficiently it can be helpful to serialize your data and store it in a set of files (100-200MB Each) that can each be read linearly. This is especially true if the data is being streamed over a network. This can also be useful for caching any data-preprocessing.\n", + "To read data efficiently it can be helpful to serialize your data and store it in a set of files (100-200MB each) that can each be read linearly. This is especially true if the data is being streamed over a network. This can also be useful for caching any data-preprocessing.\n", "\n", "The TFRecord format is a simple format for storing a sequence of binary records.\n", "\n", "[Protocol buffers](https://developers.google.com/protocol-buffers/) are a cross-platform, cross-language library for efficient serialization of structured data.\n", "\n", - "Protocol messages are defined by `.proto` files, these are often the easiest way to understand a Message type. \n", + "Protocol messages are defined by `.proto` files, these are often the easiest way to understand a message type. \n", "\n", "The `tf.Example` message (or protobuf) is a flexible message type that represents a `{\"string\": value}` mapping. It is designed for use with TensorFlow, and is used throughout the higher-level APIs such as [TFX](https://www.tensorflow.org/tfx/)." ] @@ -95,7 +96,7 @@ "\n", "This notebook will demonstrate how to create, parse, and use the `tf.Example` message, and then store, read, and write `tf.Example` messages to `.tfrecord` files.\n", "\n", - "Note: While useful, using these structures is ultimately optional. There is no need to convert to existing code to use TFRecords, unless you are using [`tf.data`](https://www.tensorflow.org/guide/datasets) and reading data is still the bottleneck to training. See [Data Input Pipeline Performance](https://www.tensorflow.org/guide/performance/datasets) for dataset performance tips." + "Note: While useful, using these structures is ultimately optional. There is no need to convert existing code to use TFRecords, unless you are using [`tf.data`](https://www.tensorflow.org/guide/datasets) and reading data is still the bottleneck to training. 
See [Data Input Pipeline Performance](https://www.tensorflow.org/guide/performance/datasets) for dataset performance tips." ] }, { @@ -158,7 +159,7 @@ "source": [ "Fundamentally a `tf.Example` is a `{\"string\": tf.train.Feature}` mapping.\n", "\n", - "The `tf.train.Feature` message type can accept one of the following three types ([ref]((https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/example/feature.proto))). Most other generic types can be coerced into one of these.\n", + "The `tf.train.Feature` message type can accept one of the following three types (See the [`.proto` file]((https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/example/feature.proto) for reference). Most other generic types can be coerced into one of these.\n", "\n", "1. `tf.train.BytesList` (the following types can be coerced)\n", "\n", @@ -672,7 +673,7 @@ "\n", "More information on consuming TFRecord files using `tf.data` can be found [here](https://www.tensorflow.org/guide/datasets#consuming_tfrecord_data). \n", "\n", - "Using this datatset structure can be useful for standardizing input data and optimizing performance. It is also easier and quicker to use this object." + "Using this dataset structure can be useful for standardizing input data and optimizing performance. It is also easier and quicker to use this object." ] }, { @@ -684,9 +685,8 @@ "cell_type": "code", "source": [ "filenames = ['test.tfrecords']\n", - "dataset_raw = tf.data.TFRecordDataset(filenames)\n", - "\n", - "dataset_raw" + "raw_dataset = tf.data.TFRecordDataset(filenames)\n", + "raw_dataset" ], "execution_count": 0, "outputs": [] @@ -698,7 +698,9 @@ }, "cell_type": "markdown", "source": [ - "At this point the dataset contains serialized `tf.train.Example` Messages. When iterated over it returns these as scalar string tensors:\n", + "At this point the dataset contains serialized `tf.train.Example` messages. When iterated over it returns these as scalar string tensors. \n", + "\n", + "Use the `.take` method to only show the first 10 records.\n", "\n", "Note: iterating over a `tf.data.Dataset` only works with eager execution enabled." 
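Without eager execution, one graph-mode alternative might be an explicit iterator. This is only a sketch; this notebook itself runs eagerly, so the session below is not needed here.

```python
# Graph-mode sketch: pull records through an iterator inside a session.
iterator = raw_dataset.make_one_shot_iterator()
next_record = iterator.get_next()

with tf.Session() as sess:
    print(sess.run(next_record))
```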
] @@ -711,8 +713,8 @@ }, "cell_type": "code", "source": [ - "for record in dataset_raw.take(10):\n", - " print(repr(record))" + "for raw_record in raw_dataset.take(10):\n", + " print(repr(raw_record))" ], "execution_count": 0, "outputs": [] @@ -780,8 +782,8 @@ }, "cell_type": "code", "source": [ - "dataset_parsed = dataset_raw.map(_parse_function)\n", - "dataset_parsed " + "parsed_dataset = raw_dataset.map(_parse_function)\n", + "parsed_dataset " ], "execution_count": 0, "outputs": [] @@ -804,8 +806,8 @@ }, "cell_type": "code", "source": [ - "for record in dataset_parsed.take(10):\n", - " print(repr(record))" + "for parsed_record in parsed_dataset.take(10):\n", + " print(repr(raw_record))" ], "execution_count": 0, "outputs": [] @@ -858,19 +860,6 @@ "execution_count": 0, "outputs": [] }, - { - "metadata": { - "id": "ELE4ueh4o3OM", - "colab_type": "code", - "colab": {} - }, - "cell_type": "code", - "source": [ - "!ls" - ], - "execution_count": 0, - "outputs": [] - }, { "metadata": { "id": "7aJJh7vENeE4", @@ -904,7 +893,7 @@ }, "cell_type": "markdown", "source": [ - "### Write a TFRecord file" + "### Write the TFRecord file" ] }, { @@ -991,12 +980,9 @@ "\n", "with tf.python_io.TFRecordWriter('images.tfrecords') as writer:\n", " for filename, label in image_labels.items():\n", - " \n", " image_string = open(filename, 'rb').read()\n", - " \n", " tf_example = image_example(image_string, label)\n", - " \n", - " writer.write(tf_example.SerializeToString())\n" + " writer.write(tf_example.SerializeToString())" ], "execution_count": 0, "outputs": [] @@ -1021,7 +1007,7 @@ }, "cell_type": "markdown", "source": [ - "### Read the file\n", + "### Read the TFRecord file\n", "\n", "We now have the file `images.tfrecords`. We can now iterate over the records in the file to read back what we wrote. Since, for our use case we will just reproduce the image, the only feature we need is the raw image string. We can extract that using the getters described above, namely `example.features.feature['image_raw'].bytes_list.value[0]`. We also use the labels to determine which record is the cat as opposed to the bridge." ] @@ -1034,7 +1020,7 @@ }, "cell_type": "code", "source": [ - "image_dataset_raw = tf.data.TFRecordDataset('images.tfrecords')\n", + "raw_image_dataset = tf.data.TFRecordDataset('images.tfrecords')\n", "\n", "# Create a dictionary describing the features. \n", "image_feature_description = {\n", @@ -1049,8 +1035,8 @@ " # Parse the input tf.Example proto using the dictionary above.\n", " return tf.parse_single_example(example_proto, image_feature_description)\n", "\n", - "image_dataset_parsed = image_dataset_raw.map(_parse_image_function)\n", - "image_dataset_parsed" + "parsed_image_dataset = raw_image_dataset.map(_parse_image_function)\n", + "parsed_image_dataset" ], "execution_count": 0, "outputs": [] @@ -1073,25 +1059,12 @@ }, "cell_type": "code", "source": [ - "for image_features in image_dataset_parsed:\n", + "for image_features in parsed_image_dataset:\n", " image_raw = image_features['image_raw'].numpy()\n", " display.display(display.Image(data=image_raw))" ], "execution_count": 0, "outputs": [] - }, - { - "metadata": { - "id": "HJrI-k1QY2DY", - "colab_type": "code", - "colab": {} - }, - "cell_type": "code", - "source": [ - "" - ], - "execution_count": 0, - "outputs": [] } ] } \ No newline at end of file From 748c74f80e5e0af14c6fe41089b863f1c21d1d02 Mon Sep 17 00:00:00 2001 From: Mark Daoust Date: Tue, 13 Nov 2018 17:22:41 -0800 Subject: [PATCH 3/7] Work in progress. 
--- site/en/tutorials/load_data/tf-records.ipynb | 348 ++++++++++++------- 1 file changed, 222 insertions(+), 126 deletions(-) diff --git a/site/en/tutorials/load_data/tf-records.ipynb b/site/en/tutorials/load_data/tf-records.ipynb index 985db4d67ad..9a908d4ac96 100644 --- a/site/en/tutorials/load_data/tf-records.ipynb +++ b/site/en/tutorials/load_data/tf-records.ipynb @@ -26,8 +26,8 @@ "metadata": { "id": "uBDvXpYzYnGj", "colab_type": "code", - "colab": {}, - "cellView": "form" + "cellView": "form", + "colab": {} }, "cell_type": "code", "source": [ @@ -341,11 +341,12 @@ "# boolean feature, encoded as False or True\n", "feature0 = np.random.choice([False, True], n_observations)\n", "\n", - "# bytes feature\n", - "feature1 = np.random.bytes(n_observations)\n", - "\n", "# integer feature, random between -10000 and 10000\n", - "feature2 = np.random.randint(-10000, 10000, n_observations)\n", + "feature1 = np.random.randint(0, 5, n_observations)\n", + "\n", + "# bytes feature\n", + "strings = np.array(['cat','dog','chicken','horse','goat'])\n", + "feature2 = strings[feature1]\n", "\n", "# float feature, from a standard normal distribution\n", "feature3 = np.random.randn(n_observations)" @@ -371,27 +372,25 @@ }, "cell_type": "code", "source": [ - "def create_example(features):\n", + "def serialize_example(feature0, feature1, feature2, feature3):\n", " \"\"\"\n", " Creates a tf.Example message ready to be written to a file.\n", - " \n", - " Inputs:\n", - " - features: a 4-list of the values in the observation\n", " \"\"\"\n", " \n", " # Create a dictionary mapping the feature name to the tf.Example-compatible\n", " # data type.\n", " \n", " feature = {\n", - " 'feature0': _int64_feature(features[0]),\n", - " 'feature1': _bytes_feature(features[1]),\n", - " 'feature2': _int64_feature(features[2]),\n", - " 'feature3': _float_feature(features[3]),\n", + " 'feature0': _int64_feature(feature0),\n", + " 'feature1': _int64_feature(feature1),\n", + " 'feature2': _bytes_feature(feature2),\n", + " 'feature3': _float_feature(feature3),\n", " }\n", " \n", " # Create a Features message using tf.train.Example.\n", " \n", - " return tf.train.Example(features=tf.train.Features(feature=feature))" + " example_proto = tf.train.Example(features=tf.train.Features(feature=feature))\n", + " return example_proto.SerializeToString()" ], "execution_count": 0, "outputs": [] @@ -403,7 +402,7 @@ }, "cell_type": "markdown", "source": [ - "For example, suppose we have a single observation from the dataset, `[False, bytes('example'), -1234, 0.9876]`. We can create and print the `tf.Example` message for this observation using `create_message()`. Each single observation will be written as a `Features` message as per the above. Note that the `tf.Example` [message](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/example/example.proto#L88) is just a wrapper around the `Features` message." + "For example, suppose we have a single observation from the dataset, `[False, 4, bytes('goat'), 0.9876]`. We can create and print the `tf.Example` message for this observation using `create_message()`. Each single observation will be written as a `Features` message as per the above. Note that the `tf.Example` [message](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/example/example.proto#L88) is just a wrapper around the `Features` message." 
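To make the wrapping concrete, the same structure can be spelled out by hand. The feature name `'my_feature'` below is made up for illustration.

```python
example = tf.train.Example(features=tf.train.Features(feature={
    'my_feature': _int64_feature(1),
}))

# The Example wraps a Features message, which maps names to Feature values.
example.features.feature['my_feature'].int64_list.value[0]  # => 1
```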
] }, { @@ -416,9 +415,10 @@ "source": [ "# This is an example observation from the dataset.\n", "\n", - "example_observation = [False, bytes('example'), -1234, 0.9876]\n", + "example_observation = []\n", "\n", - "print(create_example(example_observation))" + "serialized_example = serialize_example(False, 4, bytes('goat'), 0.9876)\n", + "serialized_example" ], "execution_count": 0, "outputs": [] @@ -430,7 +430,7 @@ }, "cell_type": "markdown", "source": [ - "Or the serialized form:" + "To decode the message use the `tf.train.Example.FromString` method." ] }, { @@ -441,373 +441,469 @@ }, "cell_type": "code", "source": [ - "create_example(example_observation).SerializeToString()" + "example_proto = tf.train.Example.FromString(serialized_example)\n", + "example_proto" ], "execution_count": 0, "outputs": [] }, { "metadata": { - "id": "jyg1g3gU7DNn", + "id": "y-Hjmee-fbLH", "colab_type": "text" }, "cell_type": "markdown", "source": [ - "## TFRecord files using tf.python_io" + "## TFRecord files using `tf.data`" ] }, { "metadata": { - "id": "3FXG3miA7Kf1", + "id": "GmehkCCT81Ez", "colab_type": "text" }, "cell_type": "markdown", "source": [ - "The `tf.python_io` module contains python functions for reading and writing TFRecord files. " + "The `tf.data` module also provides tools for reading and writing data in tensorflow." ] }, { "metadata": { - "id": "CKn5uql2lAaN", + "id": "1FISEuz8ubu3", "colab_type": "text" }, "cell_type": "markdown", "source": [ - "### Writing a TFRecord file" + "### Writing a TFRecord file\n", + "\n", + "The easiest wayt to get the data into a dataset is to use the `from_tensor_slices` method.\n", + "\n", + "Applied to an array, it returns a dataset of scalars." ] }, { "metadata": { - "id": "LNW_FA-GQWXs", + "id": "mXeaukvwu5_-", + "colab_type": "code", + "colab": {} + }, + "cell_type": "code", + "source": [ + "tf.data.Dataset.from_tensor_slices(feature1)" + ], + "execution_count": 0, + "outputs": [] + }, + { + "metadata": { + "id": "f-q0VKyZvcad", "colab_type": "text" }, "cell_type": "markdown", "source": [ - "We now write the 10,000 observations to the file `test.tfrecords`. Each observation is converted to a `tf.Example` message, then written to file. We can then verify that the file `test.tfrecords` has been created." 
+ "Applies to a tuple of arrays, it returns a dataset of tuples:" ] }, { "metadata": { - "id": "MKPHzoGv7q44", + "id": "H5sWyu1kxnvg", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ - "# Write the tf.Example observations to test.tfrecords.\n", - "\n", - "with tf.python_io.TFRecordWriter('test.tfrecords') as writer:\n", - " for i in range(n_observations):\n", - " example = create_example([feature0[i], feature1[i], feature2[i], feature3[i]])\n", - " writer.write(example.SerializeToString())" + "features_dataset = tf.data.Dataset.from_tensor_slices((feature0, feature1, feature2, feature3))\n", + "features_dataset" ], "execution_count": 0, "outputs": [] }, { "metadata": { - "id": "EjdFHHJMpUUo", + "id": "m1C-t71Nywze", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ - "!ls" + "# Just take one element from the dataset.\n", + "for f0,f1,f2,f3 in features_dataset.take(1):\n", + " print(f0)\n", + " print(f1)\n", + " print(f2)\n", + " print(f3)" ], "execution_count": 0, "outputs": [] }, { "metadata": { - "id": "wtQ7k0YWQ1cz", + "id": "mhIe63awyZYd", "colab_type": "text" }, "cell_type": "markdown", "source": [ - "### Reading a TFRecord File" + "Use the `tf.data.Dataset.map` method to apply a function to each element of a `Dataset`.\n", + "\n", + "The mapped function must be a TensorFlow-aware function. To make `create_example` TensorFlow-aware, wrap it with py_func:" ] }, { "metadata": { - "id": "utkozytkQ-2K", + "id": "apB5KYrJzjPI", + "colab_type": "code", + "colab": {} + }, + "cell_type": "code", + "source": [ + "def tf_serialize_example(f0,f1,f2,f3):\n", + " return tf.py_func(serialize_example, (f0,f1,f2,f3), tf.string)" + ], + "execution_count": 0, + "outputs": [] + }, + { + "metadata": { + "id": "6aV0GQhV8tmp", "colab_type": "text" }, "cell_type": "markdown", "source": [ - "Suppose we now want to read this data back, to be input as data into a model.\n", + "### Reading a TFRecord file" + ] + }, + { + "metadata": { + "id": "o3J5D4gcSy8N", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + "We can also read the TFRecord file using the `tf.data.TFRecordDataset` class. \n", "\n", - "The following example imports the data as is, as a `tf.Example` message. This can be useful to verify that a the file contains the data that we expect. This can also be useful if the input data is stored as TFRecords but you would prefer to input NumPy data (or some other input data type), for example [here](https://www.tensorflow.org/guide/datasets#consuming_numpy_arrays), since this example allows us to read the values themselves.\n", + "More information on consuming TFRecord files using `tf.data` can be found [here](https://www.tensorflow.org/guide/datasets#consuming_tfrecord_data). \n", "\n", - "We iterate through the TFRecords in the infile, extract the `tf.Example` message, and can read/store the values within." + "Using this dataset structure can be useful for standardizing input data and optimizing performance. It is also easier and quicker to use this object." 
] }, { "metadata": { - "id": "36ltP9B8OezA", + "id": "6OjX6UZl-bHC", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ - "record_iterator = tf.python_io.tf_record_iterator(path='test.tfrecords')\n", - "\n", - "for string_record in record_iterator:\n", - " example = tf.train.Example()\n", - " example.ParseFromString(string_record)\n", - " \n", - " print(example)\n", - " \n", - " # Exit after 1 iteration as this is purely demonstrative.\n", - " break" + "filenames = ['test.tfrecords']\n", + "raw_dataset = tf.data.TFRecordDataset(filenames)\n", + "raw_dataset" ], "execution_count": 0, "outputs": [] }, { "metadata": { - "id": "i3uquiiGTZTK", + "id": "6_EQ9i2E_-Fz", "colab_type": "text" }, "cell_type": "markdown", "source": [ - "The features of the `example` object (created above of type `tf.Example`) can be accessed using its getters (similarly to any protocol buffer message). `example.features` returns a `repeated feature` message, then getting the `feature` message returns a map of feature name to feature value (stored in Python as a dictionary)." + "At this point the dataset contains serialized `tf.train.Example` messages. When iterated over it returns these as scalar string tensors. \n", + "\n", + "Use the `.take` method to only show the first 10 records.\n", + "\n", + "Note: iterating over a `tf.data.Dataset` only works with eager execution enabled." ] }, { "metadata": { - "id": "-UNzS7vsUBs0", + "id": "hxVXpLz_AJlm", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ - "print(dict(example.features.feature))" + "for raw_record in raw_dataset.take(10):\n", + " print(repr(raw_record))" ], "execution_count": 0, "outputs": [] }, { "metadata": { - "id": "u1M-WrbqUUVW", + "id": "W-6oNzM4luFQ", "colab_type": "text" }, "cell_type": "markdown", "source": [ - "From this dictionary, you can get any given value as with a dictionary." + "These tensors can be parsed using the function below.\n", + "\n", + "Note: The `feature_description` is necessary here because datasets use graph-execution, and need this description to build their shape and type signature." ] }, { "metadata": { - "id": "2yCBu70IUb2H", + "id": "zQjbIR1nleiy", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ - "print(example.features.feature['feature3'])" + "# Create a description of the features. \n", + "feature_description = {\n", + " 'feature0': tf.FixedLenFeature([], tf.int64, default_value=0),\n", + " 'feature1': tf.FixedLenFeature([], tf.string, default_value=''),\n", + " 'feature2': tf.FixedLenFeature([], tf.int64, default_value=0),\n", + " 'feature3': tf.FixedLenFeature([], tf.float32, default_value=0.0),\n", + "}\n", + "\n", + "def _parse_function(example_proto):\n", + " # Parse the input tf.Example proto using the dictionary above.\n", + " return tf.parse_single_example(example_proto, feature_description)" ], "execution_count": 0, "outputs": [] }, { "metadata": { - "id": "4dw6_OI9UiNZ", + "id": "gWETjUqhEQZf", "colab_type": "text" }, "cell_type": "markdown", "source": [ - "Now, we can access the value using the getters again." + "Or use `tf.parse example` to parse a whole batch at once." 
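A sketch of what that batch parsing might look like (the actual symbol is `tf.parse_example`, with an underscore):

```python
# Parse whole batches of serialized examples in a single op.
parsed_batches = raw_dataset.batch(32).map(
    lambda records: tf.parse_example(records, feature_description))
```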
] }, { "metadata": { - "id": "BdDYjDnDUlFe", + "id": "AH73hav6Bnmg", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + "Apply this finction to each item in the dataset using the `tf.data.Dataset.map` method:" + ] + }, + { + "metadata": { + "id": "6Ob7D-zmBm1w", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ - "print(example.features.feature['feature3'].float_list.value)" + "parsed_dataset = raw_dataset.map(_parse_function)\n", + "parsed_dataset " ], "execution_count": 0, "outputs": [] }, { "metadata": { - "id": "y-Hjmee-fbLH", + "colab_type": "text", + "id": "sNV-XclGnOvn" + }, + "cell_type": "markdown", + "source": [ + "Now, we can use eager execution to display the observations in the dataset. Note that there are 10,000 observations in this dataset, but we only display the first 10. The data is displayed as a dictionary of features. Each item is a `tf.Tensor`, and the `numpy` element of this tensor displays the value of the feature." + ] + }, + { + "metadata": { + "id": "x2LT2JCqhoD_", + "colab_type": "code", + "colab": {} + }, + "cell_type": "code", + "source": [ + "for parsed_record in parsed_dataset.take(10):\n", + " print(repr(raw_record))" + ], + "execution_count": 0, + "outputs": [] + }, + { + "metadata": { + "id": "jyg1g3gU7DNn", "colab_type": "text" }, "cell_type": "markdown", "source": [ - "## TFRecord files using `tf.data`" + "## TFRecord files using tf.python_io" ] }, { "metadata": { - "id": "GmehkCCT81Ez", + "id": "3FXG3miA7Kf1", "colab_type": "text" }, "cell_type": "markdown", "source": [ - "The `tf.data` module also provides tools for reading and writing TFRecord files. The writer, `tf.data.experimental.TFRecordWriter`, is not covered by this tutorial." + "The `tf.python_io` module contains python functions for reading and writing TFRecord files. " ] }, { "metadata": { - "id": "6aV0GQhV8tmp", + "id": "CKn5uql2lAaN", "colab_type": "text" }, "cell_type": "markdown", "source": [ - "### Reading a TFRecord File" + "### Writing a TFRecord file" ] }, { "metadata": { - "id": "o3J5D4gcSy8N", + "id": "LNW_FA-GQWXs", "colab_type": "text" }, "cell_type": "markdown", "source": [ - "We can also read the TFRecord file using the `tf.data.TFRecordDataset` class. \n", - "\n", - "More information on consuming TFRecord files using `tf.data` can be found [here](https://www.tensorflow.org/guide/datasets#consuming_tfrecord_data). \n", - "\n", - "Using this dataset structure can be useful for standardizing input data and optimizing performance. It is also easier and quicker to use this object." + "We now write the 10,000 observations to the file `test.tfrecords`. Each observation is converted to a `tf.Example` message, then written to file. We can then verify that the file `test.tfrecords` has been created." 
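Since `serialize_example` above already returns a serialized binary-string, the write loop might be reduced to a sketch like this:

```python
with tf.python_io.TFRecordWriter('test.tfrecords') as writer:
    for i in range(n_observations):
        writer.write(serialize_example(feature0[i], feature1[i],
                                       feature2[i], feature3[i]))
```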
] }, { "metadata": { - "id": "6OjX6UZl-bHC", + "id": "MKPHzoGv7q44", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ - "filenames = ['test.tfrecords']\n", - "raw_dataset = tf.data.TFRecordDataset(filenames)\n", - "raw_dataset" + "# Write the tf.Example observations to test.tfrecords.\n", + "\n", + "with tf.python_io.TFRecordWriter('test.tfrecords') as writer:\n", + " for i in range(n_observations):\n", + " example = create_example([feature0[i], feature1[i], feature2[i], feature3[i]])\n", + " writer.write(example.SerializeToString())" ], "execution_count": 0, "outputs": [] }, { "metadata": { - "id": "6_EQ9i2E_-Fz", - "colab_type": "text" - }, - "cell_type": "markdown", - "source": [ - "At this point the dataset contains serialized `tf.train.Example` messages. When iterated over it returns these as scalar string tensors. \n", - "\n", - "Use the `.take` method to only show the first 10 records.\n", - "\n", - "Note: iterating over a `tf.data.Dataset` only works with eager execution enabled." - ] - }, - { - "metadata": { - "id": "hxVXpLz_AJlm", + "id": "EjdFHHJMpUUo", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ - "for raw_record in raw_dataset.take(10):\n", - " print(repr(raw_record))" + "!ls" ], "execution_count": 0, "outputs": [] }, { "metadata": { - "id": "W-6oNzM4luFQ", + "id": "wtQ7k0YWQ1cz", "colab_type": "text" }, "cell_type": "markdown", "source": [ - "These tensors can be parsed using the function below.\n", + "### Reading a TFRecord file" + ] + }, + { + "metadata": { + "id": "utkozytkQ-2K", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + "Suppose we now want to read this data back, to be input as data into a model.\n", "\n", - "Note: The `feature_description` is necessary here because datasets use graph-execution, and need this description to build their shape and type signature." + "The following example imports the data as is, as a `tf.Example` message. This can be useful to verify that a the file contains the data that we expect. This can also be useful if the input data is stored as TFRecords but you would prefer to input NumPy data (or some other input data type), for example [here](https://www.tensorflow.org/guide/datasets#consuming_numpy_arrays), since this example allows us to read the values themselves.\n", + "\n", + "We iterate through the TFRecords in the infile, extract the `tf.Example` message, and can read/store the values within." ] }, { "metadata": { - "id": "zQjbIR1nleiy", + "id": "36ltP9B8OezA", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ - "# Create a description of the features. 
\n", - "feature_description = {\n", - " 'feature0': tf.FixedLenFeature([], tf.int64, default_value=0),\n", - " 'feature1': tf.FixedLenFeature([], tf.string, default_value=''),\n", - " 'feature2': tf.FixedLenFeature([], tf.int64, default_value=0),\n", - " 'feature3': tf.FixedLenFeature([], tf.float32, default_value=0.0),\n", - "}\n", + "record_iterator = tf.python_io.tf_record_iterator(path='test.tfrecords')\n", "\n", - "def _parse_function(example_proto):\n", - " # Parse the input tf.Example proto using the dictionary above.\n", - " return tf.parse_single_example(example_proto, feature_description)" + "for string_record in record_iterator:\n", + " example = tf.train.Example()\n", + " example.ParseFromString(string_record)\n", + " \n", + " print(example)\n", + " \n", + " # Exit after 1 iteration as this is purely demonstrative.\n", + " break" ], "execution_count": 0, "outputs": [] }, { "metadata": { - "id": "gWETjUqhEQZf", + "id": "i3uquiiGTZTK", "colab_type": "text" }, "cell_type": "markdown", "source": [ - "Or use `tf.parse example` to parse a whole batch at once." + "The features of the `example` object (created above of type `tf.Example`) can be accessed using its getters (similarly to any protocol buffer message). `example.features` returns a `repeated feature` message, then getting the `feature` message returns a map of feature name to feature value (stored in Python as a dictionary)." ] }, { "metadata": { - "id": "AH73hav6Bnmg", + "id": "-UNzS7vsUBs0", + "colab_type": "code", + "colab": {} + }, + "cell_type": "code", + "source": [ + "print(dict(example.features.feature))" + ], + "execution_count": 0, + "outputs": [] + }, + { + "metadata": { + "id": "u1M-WrbqUUVW", "colab_type": "text" }, "cell_type": "markdown", "source": [ - "Apply this finction to each item in the dataset using the `tf.data.Dataset.map` method:" + "From this dictionary, you can get any given value as with a dictionary." ] }, { "metadata": { - "id": "6Ob7D-zmBm1w", + "id": "2yCBu70IUb2H", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ - "parsed_dataset = raw_dataset.map(_parse_function)\n", - "parsed_dataset " + "print(example.features.feature['feature3'])" ], "execution_count": 0, "outputs": [] }, { "metadata": { - "colab_type": "text", - "id": "sNV-XclGnOvn" + "id": "4dw6_OI9UiNZ", + "colab_type": "text" }, "cell_type": "markdown", "source": [ - "Now, we can use eager execution to display the observations in the dataset. Note that there are 10,000 observations in this dataset, but we only display the first 10. The data is displayed as a dictionary of features. Each item is a `tf.Tensor`, and the `numpy` element of this tensor displays the value of the feature." + "Now, we can access the value using the getters again." 
    {
      "metadata": {
-        "id": "x2LT2JCqhoD_",
+        "id": "BdDYjDnDUlFe",
         "colab_type": "code",
         "colab": {}
       },
       "cell_type": "code",
       "source": [
-        "for parsed_record in parsed_dataset.take(10):\n",
-        "  print(repr(raw_record))"
+        "print(example.features.feature['feature3'].float_list.value)"
       ],
       "execution_count": 0,
       "outputs": []

From bf27fbda2424d90193ab1c29f8c5bdbdc3006d1b Mon Sep 17 00:00:00 2001
From: Mark Daoust
Date: Wed, 14 Nov 2018 08:37:10 -0800
Subject: [PATCH 4/7] Finished the section on writing a TFRecord file with
 `tf.data`

---
 site/en/tutorials/load_data/tf-records.ipynb | 92 +++++++++++++++++---
 1 file changed, 78 insertions(+), 14 deletions(-)

diff --git a/site/en/tutorials/load_data/tf-records.ipynb b/site/en/tutorials/load_data/tf-records.ipynb
index 9a908d4ac96..fb8aa28d29e 100644
--- a/site/en/tutorials/load_data/tf-records.ipynb
+++ b/site/en/tutorials/load_data/tf-records.ipynb
@@ -526,7 +526,7 @@
       },
       "cell_type": "code",
       "source": [
-        "# Just take one element from the dataset.\n",
+        "# Use `take(1)` to only pull one example from the dataset.\n",
         "for f0,f1,f2,f3 in features_dataset.take(1):\n",
         "  print(f0)\n",
         "  print(f1)\n",
@@ -545,7 +545,9 @@
       "source": [
         "Use the `tf.data.Dataset.map` method to apply a function to each element of a `Dataset`.\n",
         "\n",
-        "The mapped function must be a TensorFlow-aware function. To make `create_example` TensorFlow-aware, wrap it with py_func:"
+        "The mapped function must operate in TensorFlow graph mode: it must operate on and return `tf.Tensors`. A non-tensor function, like `serialize_example`, can be wrapped with `tf.py_func` to make it compatible.\n",
+        "\n",
+        "Using `tf.py_func` requires that you specify the shape and type information that is otherwise unavailable:"
       ]
     },
     {
       "metadata": {
@@ -557,7 +559,60 @@
       "cell_type": "code",
       "source": [
         "def tf_serialize_example(f0,f1,f2,f3):\n",
-        "  return tf.py_func(serialize_example, (f0,f1,f2,f3), tf.string)"
+        "  tf_string = tf.py_func(\n",
+        "    serialize_example,\n",
+        "    (f0,f1,f2,f3),  # Pass these args to the above function.\n",
+        "    tf.string)      # The return type is `tf.string`.\n",
+        "  return tf.reshape(tf_string, ())  # The result is a scalar."
       ],
       "execution_count": 0,
       "outputs": []
     },
     {
       "metadata": {
+        "id": "CrFZ9avE3HUF",
+        "colab_type": "text"
+      },
+      "cell_type": "markdown",
+      "source": [
+        "Apply this function to each element in the dataset:"
+      ]
+    },
+    {
+      "metadata": {
+        "id": "VDeqYVbW3ww9",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "cell_type": "code",
+      "source": [
+        "serialized_features_dataset = features_dataset.map(tf_serialize_example)\n",
+        "serialized_features_dataset"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "metadata": {
+        "id": "p6lw5VYpjZZC",
+        "colab_type": "text"
+      },
+      "cell_type": "markdown",
+      "source": [
+        "And write them to a TFRecord file:"
+      ]
+    },
+    {
+      "metadata": {
+        "id": "vP1VgTO44UIE",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "cell_type": "code",
+      "source": [
+        "filename = 'test.tfrecord'\n",
+        "writer = tf.data.experimental.TFRecordWriter(filename)\n",
+        "writer.write(serialized_features_dataset)"
+      ],
       "execution_count": 0,
       "outputs": []
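An alternative to wrapping `serialize_example` with `tf.py_func` is to build the serialized dataset from a plain Python generator. This is a sketch of one possible approach (not the patch's method), assuming the `serialize_example` function and the `feature0`..`feature3` arrays defined earlier in the notebook:

```python
def generator():
  for f0, f1, f2, f3 in zip(feature0, feature1, feature2, feature3):
    yield serialize_example(f0, f1, f2, f3)  # Yields serialized proto strings.

serialized_features_dataset = tf.data.Dataset.from_generator(
    generator, output_types=tf.string, output_shapes=())
```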
+ "Using `TFRecordDataset`s can be useful for standardizing input data and optimizing performance." ] }, { @@ -594,7 +649,7 @@ }, "cell_type": "code", "source": [ - "filenames = ['test.tfrecords']\n", + "filenames = [filename]\n", "raw_dataset = tf.data.TFRecordDataset(filenames)\n", "raw_dataset" ], @@ -652,8 +707,8 @@ "# Create a description of the features. \n", "feature_description = {\n", " 'feature0': tf.FixedLenFeature([], tf.int64, default_value=0),\n", - " 'feature1': tf.FixedLenFeature([], tf.string, default_value=''),\n", - " 'feature2': tf.FixedLenFeature([], tf.int64, default_value=0),\n", + " 'feature1': tf.FixedLenFeature([], tf.int64, default_value=0),\n", + " 'feature2': tf.FixedLenFeature([], tf.string, default_value=''),\n", " 'feature3': tf.FixedLenFeature([], tf.float32, default_value=0.0),\n", "}\n", "\n", @@ -722,6 +777,16 @@ "execution_count": 0, "outputs": [] }, + { + "metadata": { + "id": "Cig9EodTlDmg", + "colab_type": "text" + }, + "cell_type": "markdown", + "source": [ + "Note how the `tf.parse_example` function unpacks the `tf.Example` fields into standard tensors." + ] + }, { "metadata": { "id": "jyg1g3gU7DNn", @@ -739,7 +804,7 @@ }, "cell_type": "markdown", "source": [ - "The `tf.python_io` module contains python functions for reading and writing TFRecord files. " + "The `tf.python_io` module also contains pure-python functions for reading and writing TFRecord files. " ] }, { @@ -770,12 +835,11 @@ }, "cell_type": "code", "source": [ - "# Write the tf.Example observations to test.tfrecords.\n", - "\n", - "with tf.python_io.TFRecordWriter('test.tfrecords') as writer:\n", + "# Write the `tf.Example` observations to the file.\n", + "with tf.python_io.TFRecordWriter(filename) as writer:\n", " for i in range(n_observations):\n", - " example = create_example([feature0[i], feature1[i], feature2[i], feature3[i]])\n", - " writer.write(example.SerializeToString())" + " example = serialize_example(feature0[i], feature1[i], feature2[i], feature3[i])\n", + " writer.write(example)" ], "execution_count": 0, "outputs": [] @@ -825,7 +889,7 @@ }, "cell_type": "code", "source": [ - "record_iterator = tf.python_io.tf_record_iterator(path='test.tfrecords')\n", + "record_iterator = tf.python_io.tf_record_iterator(path=filename)\n", "\n", "for string_record in record_iterator:\n", " example = tf.train.Example()\n", From 2b74d7b20098a744726eff72cc1f618918718993 Mon Sep 17 00:00:00 2001 From: Mark Daoust Date: Wed, 14 Nov 2018 08:49:28 -0800 Subject: [PATCH 5/7] Image attribution. --- site/en/tutorials/load_data/tf-records.ipynb | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/site/en/tutorials/load_data/tf-records.ipynb b/site/en/tutorials/load_data/tf-records.ipynb index fb8aa28d29e..c1c481eb538 100644 --- a/site/en/tutorials/load_data/tf-records.ipynb +++ b/site/en/tutorials/load_data/tf-records.ipynb @@ -94,7 +94,7 @@ "cell_type": "markdown", "source": [ "\n", - "This notebook will demonstrate how to create, parse, and use the `tf.Example` message, and then store, read, and write `tf.Example` messages to `.tfrecord` files.\n", + "This notebook will demonstrate how to create, parse, and use the `tf.Example` message, and then serialize, write, and read `tf.Example` messages to and from `.tfrecord` files.\n", "\n", "Note: While useful, using these structures is ultimately optional. 
From 2b74d7b20098a744726eff72cc1f618918718993 Mon Sep 17 00:00:00 2001
From: Mark Daoust
Date: Wed, 14 Nov 2018 08:49:28 -0800
Subject: [PATCH 5/7] Image attribution.

---
 site/en/tutorials/load_data/tf-records.ipynb | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/site/en/tutorials/load_data/tf-records.ipynb b/site/en/tutorials/load_data/tf-records.ipynb
index fb8aa28d29e..c1c481eb538 100644
--- a/site/en/tutorials/load_data/tf-records.ipynb
+++ b/site/en/tutorials/load_data/tf-records.ipynb
@@ -94,7 +94,7 @@
       "cell_type": "markdown",
       "source": [
         "\n",
-        "This notebook will demonstrate how to create, parse, and use the `tf.Example` message, and then store, read, and write `tf.Example` messages to `.tfrecord` files.\n",
+        "This notebook will demonstrate how to create, parse, and use the `tf.Example` message, and then serialize, write, and read `tf.Example` messages to and from `.tfrecord` files.\n",
         "\n",
         "Note: While useful, using these structures is ultimately optional. There is no need to convert existing code to use TFRecords, unless you are using [`tf.data`](https://www.tensorflow.org/guide/datasets) and reading data is still the bottleneck to training. See [Data Input Pipeline Performance](https://www.tensorflow.org/guide/performance/datasets) for dataset performance tips."
       ]
     },
@@ -1028,7 +1028,8 @@
       },
       "cell_type": "code",
       "source": [
-        "display.Image(filename=cat_in_snow)"
+        "display.display(display.Image(filename=cat_in_snow))\n",
+        "display.display(display.HTML('Image cc-by: Von.grzanka'))"
       ],
       "execution_count": 0,
       "outputs": []
@@ -1041,7 +1042,8 @@
       },
       "cell_type": "code",
       "source": [
-        "display.Image(filename=williamsburg_bridge)"
+        "display.display(display.Image(filename=williamsburg_bridge))\n",
+        "display.display(display.HTML('source'))"
       ],
       "execution_count": 0,
       "outputs": []

From 5f22f74f323d092550ea71e9bd0daa76f751a69c Mon Sep 17 00:00:00 2001
From: Mark Daoust
Date: Thu, 29 Nov 2018 13:19:41 -0800
Subject: [PATCH 6/7] py3

---
 site/en/tutorials/load_data/tf-records.ipynb | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/site/en/tutorials/load_data/tf-records.ipynb b/site/en/tutorials/load_data/tf-records.ipynb
index c1c481eb538..74975f0ac13 100644
--- a/site/en/tutorials/load_data/tf-records.ipynb
+++ b/site/en/tutorials/load_data/tf-records.ipynb
@@ -9,6 +9,10 @@
       "private_outputs": true,
       "collapsed_sections": [],
       "toc_visible": true
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
     }
   },
   "cells": [
@@ -247,8 +251,8 @@
       },
       "cell_type": "code",
       "source": [
-        "print(_bytes_feature('test_string'))\n",
-        "print(_bytes_feature(bytes('test_bytes')))\n",
+        "print(_bytes_feature(b'test_string'))\n",
+        "print(_bytes_feature(u'test_bytes'.encode('utf-8')))\n",
         "\n",
         "print(_float_feature(np.exp(1)))\n",
@@ -345,7 +349,7 @@
         "feature1 = np.random.randint(0, 5, n_observations)\n",
         "\n",
         "# bytes feature\n",
-        "strings = np.array(['cat','dog','chicken','horse','goat'])\n",
+        "strings = np.array([b'cat', b'dog', b'chicken', b'horse', b'goat'])\n",
         "feature2 = strings[feature1]\n",
         "\n",
         "# float feature, from a standard normal distribution\n",
@@ -417,7 +421,7 @@
         "\n",
         "example_observation = []\n",
         "\n",
-        "serialized_example = serialize_example(False, 4, bytes('goat'), 0.9876)\n",
+        "serialized_example = serialize_example(False, 4, b'goat', 0.9876)\n",
         "serialized_example"
       ],
       "execution_count": 0,
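The py3 changes above work around the fact that `bytes('test_bytes')` raises a `TypeError` under Python 3. Another option is to make the bytes helper tolerant of text input; this is a sketch of one possibility, not the notebook's actual `_bytes_feature`:

```python
def _bytes_feature(value):
  """Returns a bytes_list Feature from a text or binary string."""
  if isinstance(value, str):
    value = value.encode('utf-8')  # Encode text so callers can pass either type.
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
```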
From 33fd058b4147f437062e593f3a5d9970c6d51eef Mon Sep 17 00:00:00 2001
From: Billy Lamberta
Date: Thu, 29 Nov 2018 15:04:05 -0800
Subject: [PATCH 7/7] a few updates

---
 site/en/tutorials/load_data/tf-records.ipynb | 24 ++++++++++----------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/site/en/tutorials/load_data/tf-records.ipynb b/site/en/tutorials/load_data/tf-records.ipynb
index 74975f0ac13..3e4cec99b26 100644
--- a/site/en/tutorials/load_data/tf-records.ipynb
+++ b/site/en/tutorials/load_data/tf-records.ipynb
@@ -87,7 +87,7 @@
         "\n",
         "Protocol messages are defined by `.proto` files, these are often the easiest way to understand a message type. \n",
         "\n",
-        "The `tf.Example` message (or protobuf) is a flexible message type that represents a `{\"string\": value}` mapping. It is designed for use with TensorFlow, and is used throughout the higher-level APIs such as [TFX](https://www.tensorflow.org/tfx/)."
+        "The `tf.Example` message (or protobuf) is a flexible message type that represents a `{\"string\": value}` mapping. It is designed for use with TensorFlow and is used throughout the higher-level APIs such as [TFX](https://www.tensorflow.org/tfx/)."
       ]
     },
@@ -100,7 +100,7 @@
         "\n",
         "This notebook will demonstrate how to create, parse, and use the `tf.Example` message, and then serialize, write, and read `tf.Example` messages to and from `.tfrecord` files.\n",
         "\n",
-        "Note: While useful, using these structures is ultimately optional. There is no need to convert existing code to use TFRecords, unless you are using [`tf.data`](https://www.tensorflow.org/guide/datasets) and reading data is still the bottleneck to training. See [Data Input Pipeline Performance](https://www.tensorflow.org/guide/performance/datasets) for dataset performance tips."
+        "Note: While useful, these structures are optional. There is no need to convert existing code to use TFRecords, unless you are using [`tf.data`](https://www.tensorflow.org/guide/datasets) and reading data is still the bottleneck to training. See [Data Input Pipeline Performance](https://www.tensorflow.org/guide/performance/datasets) for dataset performance tips."
       ]
     },
@@ -192,9 +192,9 @@
       },
       "cell_type": "markdown",
       "source": [
-        "In order to convert a standard TensorFlow type to a `tf.Example`-compatible `tf.train.Feature`, we can use the following shortcut functions:\n",
+        "In order to convert a standard TensorFlow type to a `tf.Example`-compatible `tf.train.Feature`, you can use the following shortcut functions:\n",
         "\n",
-        "Each function takes a scalar input value and returns a `tf.train.Feature` containing one of the 3 `list` types above."
+        "Each function takes a scalar input value and returns a `tf.train.Feature` containing one of the three `list` types above."
       ]
     },
@@ -480,7 +480,7 @@
       "source": [
         "### Writing a TFRecord file\n",
         "\n",
-        "The easiest wayt to get the data into a dataset is to use the `from_tensor_slices` method.\n",
+        "The easiest way to get the data into a dataset is to use the `from_tensor_slices` method.\n",
         "\n",
         "Applied to an array, it returns a dataset of scalars."
       ]
     },
@@ -764,7 +764,7 @@
       },
       "cell_type": "markdown",
       "source": [
-        "Now, we can use eager execution to display the observations in the dataset. Note that there are 10,000 observations in this dataset, but we only display the first 10. The data is displayed as a dictionary of features. Each item is a `tf.Tensor`, and the `numpy` element of this tensor displays the value of the feature."
+        "Use eager execution to display the observations in the dataset. There are 10,000 observations in this dataset, but we only display the first 10. The data is displayed as a dictionary of features. Each item is a `tf.Tensor`, and the `numpy` element of this tensor displays the value of the feature."
       ]
     },
@@ -788,7 +788,7 @@
       },
       "cell_type": "markdown",
       "source": [
-        "Note how the `tf.parse_example` function unpacks the `tf.Example` fields into standard tensors."
+        "Here, the `tf.parse_example` function unpacks the `tf.Example` fields into standard tensors."
       ]
     },
@@ -808,7 +808,7 @@
       },
       "cell_type": "markdown",
       "source": [
-        "The `tf.python_io` module also contains pure-python functions for reading and writing TFRecord files. "
+        "The `tf.python_io` module also contains pure-Python functions for reading and writing TFRecord files. "
       ]
     },
@@ -828,7 +828,7 @@
       },
       "cell_type": "markdown",
       "source": [
-        "We now write the 10,000 observations to the file `test.tfrecords`. Each observation is converted to a `tf.Example` message, then written to file. We can then verify that the file `test.tfrecords` has been created."
+        "Now write the 10,000 observations to the file `test.tfrecord`. Each observation is converted to a `tf.Example` message, then written to the file. You can then verify that the file `test.tfrecord` has been created."
       ]
     },
@@ -997,7 +997,7 @@
         "\n",
         "This can be useful if, for example, you want to use several models on the same input dataset. Instead of storing the image data raw, it can be preprocessed into the TFRecords format, and that can be used in all further processing and modelling. \n",
         "\n",
-        "First, let's download [this](https://commons.wikimedia.org/wiki/File:Felis_catus-cat_on_snow.jpg) adorable image of a cat in the snow, and [this](https://upload.wikimedia.org/wikipedia/commons/f/fe/New_East_River_Bridge_from_Brooklyn_det.4a09796u.jpg) awesome picture of the Williamsburg Bridge, NYC under construction."
+        "First, let's download [this image](https://commons.wikimedia.org/wiki/File:Felis_catus-cat_on_snow.jpg) of a cat in the snow and [this photo](https://upload.wikimedia.org/wikipedia/commons/f/fe/New_East_River_Bridge_from_Brooklyn_det.4a09796u.jpg) of the Williamsburg Bridge, NYC under construction."
       ]
     },
@@ -1214,7 +1214,7 @@
       },
       "cell_type": "markdown",
       "source": [
-        "Now we can recover the images from the TFRecord file:"
+        "Recover the images from the TFRecord file:"
       ]
     },
@@ -1233,4 +1233,4 @@
       "outputs": []
     }
   ]
-}
\ No newline at end of file
+}
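For reference, recovering the images mentioned at the end of the patch series might look roughly like the sketch below. The file name `images.tfrecords` and the feature keys here are illustrative assumptions; the notebook's actual names may differ:

```python
raw_image_dataset = tf.data.TFRecordDataset('images.tfrecords')

# Hypothetical feature keys; these must match whatever was used when writing.
image_feature_description = {
    'height': tf.FixedLenFeature([], tf.int64),
    'width': tf.FixedLenFeature([], tf.int64),
    'label': tf.FixedLenFeature([], tf.int64),
    'image_raw': tf.FixedLenFeature([], tf.string),
}

def _parse_image_function(example_proto):
  return tf.parse_single_example(example_proto, image_feature_description)

parsed_image_dataset = raw_image_dataset.map(_parse_image_function)

for image_features in parsed_image_dataset.take(1):
  image = tf.image.decode_jpeg(image_features['image_raw'])
```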