##### Copyright 2019 The TensorFlow Authors.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Estimator


> Warning: Estimators are not recommended for new code.  Estimators run `v1.Session`-style code which is more difficult to write correctly, and can behave unexpectedly, especially when combined with TF 2 code. Estimators do fall under our [compatibility guarantees](https://tensorflow.org/guide/versions), but will receive no fixes other than security vulnerabilities. See the [migration guide](https://tensorflow.org/guide/migrate) for details.

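This document introduces `tf.estimator`, a high-level TensorFlow API. Estimators encapsulate the following actions:

- Training
- Evaluation
- Prediction
- Export for serving

You can either use the pre-made Estimators provided or write your own custom Estimators. All Estimators, pre-made or custom, are classes based on the `tf.estimator.Estimator` class.

For a quick example, try the [Estimator tutorials](../tutorials/estimator/linear.ipynb). For an overview of the API design, check the [white paper](https://arxiv.org/abs/1708.02637).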

## Setup

In [None]:
!pip install -U tensorflow_datasets

In [None]:
import tempfile
import os

import tensorflow as tf
import tensorflow_datasets as tfds

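## Advantages

Similar to a `tf.keras.Model`, an `estimator` is a model-level abstraction. `tf.estimator` provides some capabilities that are still under development for `tf.keras`. These include:

- Parameter server based training
- Full [TFX](http://tensorflow.org/tfx) integration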

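## Estimator capabilities

Estimators provide the following benefits:

- You can run Estimator-based models on a local host or in a distributed multi-server environment without changing your model. Furthermore, you can run Estimator-based models on CPUs, GPUs, or TPUs without recoding your model.
- Estimators provide a safe distributed training loop that controls how and when to:
    - Load data
    - Handle exceptions
    - Create checkpoint files and recover from failures
    - Save summaries for TensorBoard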
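The checkpoint-and-recover behaviour can be pictured with a small framework-free loop: state is written to disk periodically, and a restarted run resumes from the last saved step instead of step 0. Everything in this sketch (`run_training`, the JSON file layout, the simulated failure) is a made-up illustration of the idea, not how `tf.estimator` implements it.

```python
import json
import os
import tempfile

def run_training(model_dir, total_steps, save_every, fail_at=None):
    """Run a toy training loop that checkpoints its step counter to model_dir."""
    ckpt_path = os.path.join(model_dir, "checkpoint.json")
    step = 0
    if os.path.exists(ckpt_path):
        # Resume from the most recent checkpoint.
        with open(ckpt_path) as f:
            step = json.load(f)["step"]
    while step < total_steps:
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated failure at step %d" % step)
        step += 1  # one "training step"
        if step % save_every == 0:
            with open(ckpt_path, "w") as f:
                json.dump({"step": step}, f)
    return step

model_dir = tempfile.mkdtemp()
try:
    run_training(model_dir, total_steps=10, save_every=2, fail_at=5)
except RuntimeError as err:
    print(err)  # simulated failure at step 5

# The second run picks up from the checkpoint written at step 4.
final_step = run_training(model_dir, total_steps=10, save_every=2)
print(final_step)  # 10
```

The first run fails before finishing, yet only the work since the last checkpoint is lost.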

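When writing an application with Estimators, you must separate the data input pipeline from the model. This separation simplifies experiments with different datasets.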

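## Structure of a pre-made Estimators program

Pre-made Estimators enable you to work at a much higher conceptual level than the base TensorFlow APIs. You no longer have to worry about creating the computational graph or sessions, since Estimators handle all the "plumbing" for you. Furthermore, pre-made Estimators let you experiment with different model architectures by making only minimal code changes. `tf.estimator.DNNClassifier`, for example, is a pre-made Estimator class that trains classification models based on dense, feed-forward neural networks.

A TensorFlow program relying on a pre-made Estimator typically consists of the following four steps:

### 1. Write one or more dataset importing functions.

For example, you might create one function to import the training set and another function to import the test set. Each dataset importing function must return two objects:

- A dictionary in which the keys are feature names and the values are Tensors (or SparseTensors) containing the corresponding feature data
- A Tensor containing one or more labels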

The `input_fn` should return a `tf.data.Dataset` that yields pairs in that format.
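As a rough, framework-free illustration of that format, the sketch below groups CSV rows into batches of `(features, labels)` pairs using only the Python standard library. The helper name `batch_rows`, the inline sample data, and the column names are assumptions made for this example; a real `input_fn` would return a `tf.data.Dataset` instead.

```python
import csv
import io

# Hypothetical sample data standing in for a CSV file.
SAMPLE_CSV = """survived,age,class
0,22.0,Third
1,38.0,First
1,26.0,Third
"""

def batch_rows(csv_text, label_name, batch_size):
    """Yield (features, labels) pairs; features maps column name -> list of values."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        labels = [int(row[label_name]) for row in chunk]
        feature_names = [name for name in chunk[0] if name != label_name]
        features = {name: [row[name] for row in chunk] for name in feature_names}
        yield features, labels

batches = list(batch_rows(SAMPLE_CSV, label_name="survived", batch_size=2))
first_features, first_labels = batches[0]
print(first_features)  # {'age': ['22.0', '38.0'], 'class': ['Third', 'First']}
print(first_labels)    # [0, 1]
```

Each batch keeps the features keyed by name and the labels in a separate sequence, which is the shape an Estimator expects from its input function.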

For example, the following code illustrates the basic skeleton for an input function:

In [None]:
def train_input_fn():
  titanic_file = tf.keras.utils.get_file("train.csv", "https://storage.googleapis.com/tf-datasets/titanic/train.csv")
  titanic = tf.data.experimental.make_csv_dataset(
      titanic_file, batch_size=32,
      label_name="survived")
  titanic_batches = (
      titanic.cache().repeat().shuffle(500)
      .prefetch(tf.data.AUTOTUNE))
  return titanic_batches

See the data guide for details.

### 2. Define the feature columns.

Each `tf.feature_column` identifies a feature name, its type, and any input pre-processing.

For example, the following snippet creates three feature columns.

- The first uses the `age` feature directly as a floating-point input.
- The second uses the `class` feature as a categorical input.
- The third uses the `embark_town` as a categorical input, but uses the `hashing trick` to avoid the need to enumerate the options, and to set the number of options.
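To make the hashing trick concrete, here is a minimal pure-Python sketch: each string is hashed and reduced modulo a fixed bucket count, so no vocabulary has to be listed up front. The function name `hash_bucket` and the use of MD5 are assumptions for illustration only; TensorFlow's hashed categorical columns use their own fingerprint function, so the bucket ids will not match.

```python
import hashlib

def hash_bucket(value, num_buckets):
    """Map a string to a stable bucket id in [0, num_buckets)."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

towns = ["Southampton", "Cherbourg", "Queenstown", "Southampton"]
bucket_ids = [hash_bucket(town, num_buckets=32) for town in towns]
print(bucket_ids)
```

Distinct values can collide in the same bucket; a larger `num_buckets` makes collisions less likely at the cost of more parameters.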

For further information, check the [feature columns tutorial](https://www.tensorflow.org/tutorials/keras/feature_columns).

In [None]:
# Define three feature columns for the Titanic data loaded in Step 1.
age = tf.feature_column.numeric_column('age')
cls = tf.feature_column.categorical_column_with_vocabulary_list(
    'class', ['First', 'Second', 'Third'])
embark = tf.feature_column.categorical_column_with_hash_bucket('embark_town', 32)

### 3. Instantiate the relevant pre-made Estimator.

For example, here's a sample instantiation of a pre-made Estimator named `LinearClassifier`:

In [None]:
# Instantiate an estimator, passing the feature columns.
model_dir = tempfile.mkdtemp()
model = tf.estimator.LinearClassifier(
    model_dir=model_dir,
    feature_columns=[embark, cls, age],
    n_classes=2)

For further information, check the [linear classifier tutorial](https://www.tensorflow.org/tutorials/estimator/linear).

### 4. Call a training, evaluation, or inference method.

For example, all Estimators provide a `train` method, which trains a model.

In [None]:
# `train_input_fn` is the function created in Step 1.
model = model.train(input_fn=train_input_fn, steps=2000)


In [None]:
result = model.evaluate(train_input_fn, steps=10)

for key, value in result.items():
  print(key, ":", value)

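In [None]:
# Run inference and print the first prediction only,
# since `train_input_fn` repeats the data indefinitely.
for pred in model.predict(train_input_fn):
  for key, value in pred.items():
    print(key, ":", value)
  break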

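### Benefits of pre-made Estimators

Pre-made Estimators encode best practices, providing the following benefits:

- Best practices for determining where different parts of the computational graph should run, implementing strategies on a single machine or on a cluster.
- Best practices for event (summary) writing and universally useful summaries.

If you don't use pre-made Estimators, you must implement the preceding features yourself.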

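## Custom Estimators

The heart of every Estimator, whether pre-made or custom, is its *model function*, a method that builds graphs for training, evaluation, and prediction. When you are using a pre-made Estimator, someone else has already implemented the model function. When relying on a custom Estimator, you must write the model function yourself.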
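Conceptually, a model function receives the features, the labels, and a mode, and returns whatever that mode needs: predictions only for inference, plus a loss for evaluation, plus an update for training. The toy below mirrors that shape in plain Python for a one-weight linear model. The names (`toy_model_fn`, the `"train"`/`"eval"`/`"predict"` strings, the returned dict) are invented for this sketch; a real `model_fn` builds TensorFlow ops and returns a `tf.estimator.EstimatorSpec`.

```python
def toy_model_fn(features, labels, mode, weight=0.0, learning_rate=0.1):
    """Toy stand-in for a model function: y = weight * x with squared error."""
    predictions = [weight * x for x in features["x"]]
    if mode == "predict":
        return {"predictions": predictions}

    errors = [p - y for p, y in zip(predictions, labels)]
    loss = sum(e * e for e in errors) / len(errors)
    if mode == "eval":
        return {"predictions": predictions, "loss": loss}

    # mode == "train": take one gradient-descent step on the weight.
    gradient = sum(2 * e * x for e, x in zip(errors, features["x"])) / len(errors)
    return {"loss": loss, "new_weight": weight - learning_rate * gradient}

features = {"x": [1.0, 2.0]}
labels = [2.0, 4.0]
print(toy_model_fn(features, None, "predict", weight=1.0))  # {'predictions': [1.0, 2.0]}
print(toy_model_fn(features, labels, "eval", weight=1.0))   # {'predictions': [1.0, 2.0], 'loss': 2.5}
```

Note that the prediction branch never touches `labels`, which is why inference can run without them.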

> Note: A custom `model_fn` will still run in 1.x-style graph mode. This means there is no eager execution and no automatic control dependencies. You should plan to migrate away from `tf.estimator` with custom `model_fn`. The alternative APIs are `tf.keras` and `tf.distribute`. If you still need an `Estimator` for some part of your training you can use the `tf.keras.estimator.model_to_estimator` converter to create an `Estimator` from a `keras.Model`.

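## Create an Estimator from a Keras model

You can convert existing Keras models to Estimators with `tf.keras.estimator.model_to_estimator`. This lets your Keras model use Estimator strengths such as distributed training.

Instantiate a Keras MobileNet V2 model and compile it with the optimizer, loss, and metrics to train with: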

In [None]:
keras_mobilenet_v2 = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False)
keras_mobilenet_v2.trainable = False

estimator_model = tf.keras.Sequential([
    keras_mobilenet_v2,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1)
])

# Compile the model
estimator_model.compile(
    optimizer='adam',
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=['accuracy'])

Create an `Estimator` from the compiled Keras model. The initial model state of the Keras model is preserved in the created `Estimator`:

In [None]:
est_mobilenet_v2 = tf.keras.estimator.model_to_estimator(keras_model=estimator_model)

Treat the derived `Estimator` as you would any other `Estimator`.

In [None]:
IMG_SIZE = 160  # All images will be resized to 160x160

def preprocess(image, label):
  image = tf.cast(image, tf.float32)
  image = (image/127.5) - 1
  image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
  return image, label
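The scaling step in `preprocess` maps 8-bit pixel values from `[0, 255]` to `[-1, 1]`. A quick plain-Python check of that arithmetic (the helper name `scale_pixel` is made up for this illustration):

```python
def scale_pixel(value):
    """Apply the same formula as `preprocess`: (value / 127.5) - 1."""
    return (value / 127.5) - 1

print(scale_pixel(0))      # -1.0
print(scale_pixel(127.5))  # 0.0
print(scale_pixel(255))    # 1.0
```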

In [None]:
def train_input_fn(batch_size):
  data = tfds.load('cats_vs_dogs', as_supervised=True)
  train_data = data['train']
  train_data = train_data.map(preprocess).shuffle(500).batch(batch_size)
  return train_data

To train, call the Estimator's train function:

In [None]:
est_mobilenet_v2.train(input_fn=lambda: train_input_fn(32), steps=500)

Similarly, to evaluate, call the Estimator's evaluate function:

In [None]:
est_mobilenet_v2.evaluate(input_fn=lambda: train_input_fn(32), steps=10)

For more details, refer to the documentation for `tf.keras.estimator.model_to_estimator`.

## Saving object-based checkpoints with Estimator

By default, Estimators save checkpoints with variable names rather than the object graph described in the [Checkpoint guide](checkpoint.ipynb). `tf.train.Checkpoint` will read name-based checkpoints, but variable names may change when moving parts of a model outside of the Estimator's `model_fn`. For forward compatibility, saving object-based checkpoints makes it easier to train a model inside an Estimator and then use it outside of one.

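In [None]:
import tensorflow.compat.v1 as tf_compat  # provides get_global_step and Scaffold below
tfds.disable_progress_bar()

In [None]:
def toy_dataset():
  # Assumed definition: `toy_dataset` is used by `est.train` below but is not
  # defined elsewhere in this notebook. It yields small synthetic batches.
  inputs = tf.range(10.)[:, None]
  labels = inputs * 5. + tf.range(5.)[None, :]
  return tf.data.Dataset.from_tensor_slices(
      dict(x=inputs, y=labels)).repeat().batch(2)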

In [None]:
class Net(tf.keras.Model):
  """A simple linear model."""

  def __init__(self):
    super(Net, self).__init__()
    self.l1 = tf.keras.layers.Dense(5)

  def call(self, x):
    return self.l1(x)

In [None]:
def model_fn(features, labels, mode):
  net = Net()
  opt = tf.keras.optimizers.Adam(0.1)
  ckpt = tf.train.Checkpoint(step=tf_compat.train.get_global_step(),
                             optimizer=opt, net=net)
  with tf.GradientTape() as tape:
    output = net(features['x'])
    loss = tf.reduce_mean(tf.abs(output - features['y']))
  variables = net.trainable_variables
  gradients = tape.gradient(loss, variables)
  return tf.estimator.EstimatorSpec(
    mode,
    loss=loss,
    train_op=tf.group(opt.apply_gradients(zip(gradients, variables)),
                      ckpt.step.assign_add(1)),
    # Tell the Estimator to save "ckpt" in an object-based format.
    scaffold=tf_compat.train.Scaffold(saver=ckpt))

tf.keras.backend.clear_session()
est = tf.estimator.Estimator(model_fn, './tf_estimator_example/')
est.train(toy_dataset, steps=10)

`tf.train.Checkpoint` can then load the Estimator's checkpoints from its `model_dir`.

In [None]:
opt = tf.keras.optimizers.Adam(0.1)
net = Net()
ckpt = tf.train.Checkpoint(
  step=tf.Variable(1, dtype=tf.int64), optimizer=opt, net=net)
ckpt.restore(tf.train.latest_checkpoint('./tf_estimator_example/'))
ckpt.step.numpy()  # From est.train(..., steps=10)

## SavedModels from Estimators

Estimators export SavedModels through `tf.estimator.Estimator.export_saved_model`.

In [None]:
input_column = tf.feature_column.numeric_column("x")

estimator = tf.estimator.LinearClassifier(feature_columns=[input_column])

def input_fn():
  return tf.data.Dataset.from_tensor_slices(
    ({"x": [1., 2., 3., 4.]}, [1, 1, 0, 0])).repeat(200).shuffle(64).batch(16)
estimator.train(input_fn)

To save an `Estimator` you need to create a `serving_input_receiver`. This function builds a part of a `tf.Graph` that parses the raw data received by the SavedModel. 

The `tf.estimator.export` module contains functions to help build these `receivers`.


The following code builds a receiver, based on the `feature_columns`, that accepts serialized `tf.Example` protocol buffers, which are often used with [tf-serving](https://tensorflow.org/serving).

In [None]:
tmpdir = tempfile.mkdtemp()

serving_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
  tf.feature_column.make_parse_example_spec([input_column]))

estimator_base_path = os.path.join(tmpdir, 'from_estimator')
estimator_path = estimator.export_saved_model(estimator_base_path, serving_input_fn)

You can also load and run that model from Python:

In [None]:
imported = tf.saved_model.load(estimator_path)

def predict(x):
  example = tf.train.Example()
  example.features.feature["x"].float_list.value.extend([x])
  return imported.signatures["predict"](
    examples=tf.constant([example.SerializeToString()]))

In [None]:
print(predict(1.5))
print(predict(3.5))

`tf.estimator.export.build_raw_serving_input_receiver_fn` allows you to create input functions which take raw tensors rather than `tf.train.Example`s.

## Using `tf.distribute.Strategy` with Estimator (Limited support)

`tf.estimator` is a distributed training TensorFlow API that originally supported the async parameter server approach. `tf.estimator` now supports `tf.distribute.Strategy`. If you're using `tf.estimator`, you can change to distributed training with very few changes to your code. With this, Estimator users can now do synchronous distributed training on multiple GPUs and multiple workers, as well as use TPUs. This support in Estimator is, however, limited. Check out the [What's supported now](#estimator_support) section below for more details.

Using `tf.distribute.Strategy` with Estimator is slightly different than in the Keras case. Instead of using `strategy.scope`, now you pass the strategy object into the `RunConfig` for the Estimator.

You can refer to the [distributed training guide](distributed_training.ipynb) for more information.

Here is a snippet of code that shows this with a premade Estimator `LinearRegressor` and `MirroredStrategy`:


In [None]:
mirrored_strategy = tf.distribute.MirroredStrategy()
config = tf.estimator.RunConfig(
    train_distribute=mirrored_strategy, eval_distribute=mirrored_strategy)
regressor = tf.estimator.LinearRegressor(
    feature_columns=[tf.feature_column.numeric_column('feats')],
    optimizer='SGD',
    config=config)

Here, you use a premade Estimator, but the same code works with a custom Estimator as well. `train_distribute` determines how training will be distributed, and `eval_distribute` determines how evaluation will be distributed. This is another difference from Keras where you use the same strategy for both training and eval.

Now you can train and evaluate this Estimator with an input function:


In [None]:
def input_fn(dataset):
  ...  # manipulate dataset, extracting the feature dict and the label
  return feature_dict, label


Another difference between Estimator and Keras is the input handling. In Keras, each batch of the dataset is split automatically across the multiple replicas. Estimator, however, neither splits batches automatically nor shards the data across workers automatically. You have full control over how your data is distributed across workers and devices, and you must provide an `input_fn` that specifies how to distribute it.

Your `input_fn` is called once per worker, thus giving one dataset per worker. Then one batch from that dataset is fed to one replica on that worker, thereby consuming N batches for N replicas on 1 worker. In other words, the dataset returned by the `input_fn` should provide batches of size `PER_REPLICA_BATCH_SIZE`. And the global batch size for a step can be obtained as `PER_REPLICA_BATCH_SIZE * strategy.num_replicas_in_sync`.
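The batch-size relationship above is plain arithmetic. In this sketch the constants and the helper `per_replica_batch_size` are illustrative assumptions, with `NUM_REPLICAS_IN_SYNC` standing in for `strategy.num_replicas_in_sync`:

```python
PER_REPLICA_BATCH_SIZE = 64   # batch size produced by the dataset from `input_fn`
NUM_REPLICAS_IN_SYNC = 4      # stand-in for strategy.num_replicas_in_sync

global_batch_size = PER_REPLICA_BATCH_SIZE * NUM_REPLICAS_IN_SYNC
print(global_batch_size)  # 256

def per_replica_batch_size(target_global_batch_size, num_replicas):
    """Return the batch size `input_fn` should use to hit a target global batch size."""
    if target_global_batch_size % num_replicas != 0:
        raise ValueError("global batch size must be divisible by the number of replicas")
    return target_global_batch_size // num_replicas

print(per_replica_batch_size(256, 4))  # 64
```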

When performing multi-worker training, you should either split your data across the workers, or shuffle with a random seed on each. You can check an example of how to do this in the [Multi-worker training with Estimator](../tutorials/distribute/multi_worker_with_estimator.ipynb) tutorial.
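One way to split data across workers is to give each worker every `num_workers`-th element, starting at its own index (`tf.data.Dataset.shard` follows the same idea). The sketch below shows the arithmetic with a plain Python list; `shard` here is a hypothetical helper, not the TensorFlow method.

```python
def shard(elements, num_workers, worker_index):
    """Return the elements assigned to one worker: those at positions
    where position % num_workers == worker_index."""
    return [item for position, item in enumerate(elements)
            if position % num_workers == worker_index]

records = list(range(10))
worker_0 = shard(records, num_workers=3, worker_index=0)
worker_1 = shard(records, num_workers=3, worker_index=1)
worker_2 = shard(records, num_workers=3, worker_index=2)
print(worker_0)  # [0, 3, 6, 9]
print(worker_1)  # [1, 4, 7]
print(worker_2)  # [2, 5, 8]
```

Every record lands on exactly one worker, so no example is processed twice within an epoch.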

And similarly, you can use multi worker and parameter server strategies as well. The code remains the same, but you need to use `tf.estimator.train_and_evaluate`, and set `TF_CONFIG` environment variables for each binary running in your cluster.
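`TF_CONFIG` is a JSON string with a `cluster` entry listing the addresses of every task and a `task` entry identifying the current process. A minimal sketch of setting it from Python follows; the host names, ports, cluster layout, and the helper `make_tf_config` are placeholders chosen for this example, and each process in the cluster would use its own `task` values.

```python
import json
import os

# Placeholder addresses; replace with the real hosts in your cluster.
cluster = {
    "chief": ["chief0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
    "ps": ["ps0.example.com:2222"],
}

def make_tf_config(task_type, task_index):
    """Build the TF_CONFIG JSON string for one process in the cluster."""
    return json.dumps({
        "cluster": cluster,
        "task": {"type": task_type, "index": task_index},
    })

# This process acts as the first worker.
os.environ["TF_CONFIG"] = make_tf_config("worker", 0)
print(os.environ["TF_CONFIG"])
```

The variable must be set before the Estimator's `RunConfig` is created, since that is when the cluster description is read.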

<a name="estimator_support"></a>

### What's supported now?

There is limited support for training with Estimator using all strategies except `TPUStrategy`. Basic training and evaluation should work, but a number of advanced features such as `v1.train.Scaffold` do not. There may also be a number of bugs in this integration and there are no plans to actively improve this support (the focus is on Keras and custom training loop support). If at all possible, you should prefer to use `tf.distribute` with those APIs instead.

Training API | MirroredStrategy | TPUStrategy | MultiWorkerMirroredStrategy | CentralStorageStrategy | ParameterServerStrategy
:-- | :-- | :-- | :-- | :-- | :--
Estimator API | Limited support | Not supported | Limited support | Limited support | Limited support

### Examples and tutorials

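The following end-to-end examples show how to use distribution strategies with Estimator:

1. The [Multi-worker training with Estimator tutorial](../tutorials/distribute/multi_worker_with_estimator.ipynb) shows how to train on multiple workers with `MultiWorkerMirroredStrategy` on the MNIST dataset.
2. An end-to-end example of [running multi-worker training with distribution strategies](https://github.com/tensorflow/ecosystem/tree/master/distribution_strategy) in `tensorflow/ecosystem` using Kubernetes templates. It starts with a Keras model and converts it to an Estimator using the `tf.keras.estimator.model_to_estimator` API.

If other suitable pre-made Estimators are available, run experiments to determine which one produces the best results. Where possible, you can improve the model further by building your own custom Estimator.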