##### Copyright 2018 The TensorFlow Authors.


In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Distributed training with TensorFlow

<table class="tfo-notebook-buttons" align="left">
  <td><a target="_blank" href="https://tensorflow.google.cn/guide/distributed_training" class=""><img src="https://tensorflow.google.cn/images/tf_logo_32px.png" class="">View on TensorFlow.org</a></td>
  <td><a target="_blank" href="https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/guide/distributed_training.ipynb" class=""><img src="https://tensorflow.google.cn/images/colab_logo_32px.png">Run in Google Colab</a></td>
  <td><a target="_blank" href="https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/guide/distributed_training.ipynb" class=""><img src="https://tensorflow.google.cn/images/GitHub-Mark-32px.png">View source on GitHub</a></td>
  <td><a href="https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/zh-cn/guide/distributed_training.ipynb" class=""><img src="https://tensorflow.google.cn/images/download_logo_32px.png">Download notebook</a></td>
</table>

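## Overview

`tf.distribute.Strategy` is a TensorFlow API for distributing training across multiple GPUs, multiple machines, or TPUs. Using this API, you can distribute your existing models and training code with minimal code changes.

`tf.distribute.Strategy` has been designed with these key goals in mind:

- Be easy to use and support multiple user segments, including researchers, machine learning engineers, etc.
- Provide good performance out of the box.
- Make it easy to switch between strategies.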

You can distribute training using `tf.distribute.Strategy` with a high-level API like Keras `Model.fit`, as well as [custom training loops](https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch) (and, in general, any computation using TensorFlow).

In TensorFlow 2.x, you can execute your programs eagerly, or in a graph using [`tf.function`](function.ipynb). `tf.distribute.Strategy` intends to support both these modes of execution, but works best with `tf.function`. Eager mode is only recommended for debugging purposes and not supported for `tf.distribute.TPUStrategy`. Although training is the focus of this guide, this API can also be used for distributing evaluation and prediction on different platforms.

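You can use `tf.distribute.Strategy` with very few changes to your code, because the underlying components of TensorFlow have been changed to become strategy-aware. These components include variables, layers, optimizers, metrics, summaries, and checkpoints.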

In this guide, you will learn about various types of strategies and how you can use them in different situations. To learn how to debug performance issues, check out the [Optimize TensorFlow GPU performance](gpu_performance_analysis.md) guide.

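Note: For a deeper understanding of these concepts, watch the deep-dive presentation [Inside TensorFlow: `tf.distribute.Strategy`](https://youtu.be/jKV53r9-H14). This is especially recommended if you plan to write your own training loop.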


## Set up TensorFlow

In [None]:
import tensorflow as tf

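## Types of strategies

`tf.distribute.Strategy` intends to cover a number of use cases along different axes. Some of these combinations are currently supported and others will be added in the future. Some of these axes are:

- *Synchronous vs asynchronous training*: These are two common ways of distributing training with data parallelism. In sync training, all workers train over different slices of the input data in sync, and gradients are aggregated at each step. In async training, all workers train over the input data independently and update variables asynchronously. Typically, sync training is implemented via all-reduce and async training via a parameter server architecture.
- *Hardware platform*: You may want to scale your training onto multiple GPUs on one machine, onto multiple machines in a network (with 0 or more GPUs each), or onto Cloud TPUs.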

In order to support these use cases, TensorFlow has `MirroredStrategy`, `TPUStrategy`, `MultiWorkerMirroredStrategy`, `ParameterServerStrategy`, `CentralStorageStrategy`, as well as other strategies available. The next section explains which of these are supported in which scenarios in TensorFlow. Here is a quick overview:

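Training API | `MirroredStrategy` | `TPUStrategy` | `MultiWorkerMirroredStrategy` | `CentralStorageStrategy` | `ParameterServerStrategy`
:-- | :-- | :-- | :-- | :-- | :--
**Keras `Model.fit`** | Supported | Supported | Supported | Experimental support | Experimental support
**Custom training loop** | Supported | Supported | Supported | Experimental support | Experimental support
**Estimator API** | Limited support | Not supported | Limited support | Limited support | Limited support

Note: [Experimental support](https://tensorflow.google.cn/guide/versions#what_is_not_covered) means the API is not covered by any compatibility guarantees.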

Warning: Estimator support is limited. Basic training and evaluation are experimental, and advanced features—such as scaffold—are not implemented. You should be using Keras or custom training loops if a use case is not covered. Estimators are not recommended for new code. Estimators run `v1.Session`-style code which is more difficult to write correctly, and can behave unexpectedly, especially when combined with TF 2 code. Estimators do fall under our [compatibility guarantees](https://tensorflow.org/guide/versions), but will receive no fixes other than security vulnerabilities. Go to the [migration guide](https://tensorflow.org/guide/migrate) for details.

### MirroredStrategy

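`tf.distribute.MirroredStrategy` supports synchronous distributed training on multiple GPUs on one machine. It creates one replica per GPU device. Each variable in the model is mirrored across all the replicas. Together, these variables form a single conceptual variable called `MirroredVariable`. These variables are kept in sync with each other by applying identical updates.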

Efficient all-reduce algorithms are used to communicate the variable updates across the devices. All-reduce aggregates tensors across all the devices by adding them up, and makes them available on each device. It’s a fused algorithm that is very efficient and can reduce the overhead of synchronization significantly. There are many all-reduce algorithms and implementations available, depending on the type of communication available between devices. By default, it uses the NVIDIA Collective Communication Library ([NCCL](https://developer.nvidia.com/nccl)) as the all-reduce implementation. You can choose from a few other options or write your own.
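
As a rough mental model of what all-reduce does, the following pure-Python sketch sums the per-device gradients and hands the same total back to every device. It is a toy illustration only: `all_reduce_sum`, the gradient values, and the learning rate are hypothetical, and nothing here reflects how NCCL or TensorFlow actually implement the operation.

```python
def all_reduce_sum(per_device_tensors):
    """Toy all-reduce: every device receives the element-wise sum of all inputs."""
    total = [sum(values) for values in zip(*per_device_tensors)]
    # Each device gets its own copy of the aggregated tensor.
    return [list(total) for _ in per_device_tensors]


# Hypothetical gradients computed independently on three devices.
per_device_gradients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
reduced = all_reduce_sum(per_device_gradients)

# Every mirrored copy starts from the same weights and applies the same update.
learning_rate = 0.5
mirrored_weights = [[10.0, 20.0] for _ in per_device_gradients]
for weights, gradient in zip(mirrored_weights, reduced):
    for i, g in enumerate(gradient):
        weights[i] -= learning_rate * g

print(reduced[0])        # [9.0, 12.0]
print(mirrored_weights)  # all three copies are [5.5, 14.0]
```

Because every device ends up with the same aggregated gradient, applying it to identical starting weights keeps the mirrored copies identical after the step.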

Here is the simplest way of creating `MirroredStrategy`:

In [None]:
mirrored_strategy = tf.distribute.MirroredStrategy()

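This creates a `MirroredStrategy` instance that uses all the GPUs visible to TensorFlow, with NCCL for cross-device communication.

If you wish to use only some of the GPUs on your machine, you can do so like this: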

In [None]:
mirrored_strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])

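If you wish to override the cross-device communication, you can do so using the `cross_device_ops` argument by supplying an instance of `tf.distribute.CrossDeviceOps`. Currently, `tf.distribute.HierarchicalCopyAllReduce` and `tf.distribute.ReductionToOneDevice` are the two options other than `tf.distribute.NcclAllReduce`, which is the default.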

In [None]:
mirrored_strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())

### TPUStrategy

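`tf.distribute.TPUStrategy` lets you run your TensorFlow training on Tensor Processing Units (TPUs). TPUs are Google's specialized ASICs designed to dramatically accelerate machine learning workloads. They are available on Google Colab, the [TensorFlow Research Cloud](https://tensorflow.google.cn/tfrc), and [Cloud TPU](https://cloud.google.com/tpu).

In terms of distributed training architecture, `TPUStrategy` is the same as `MirroredStrategy`: it implements synchronous distributed training. TPUs provide their own implementation of efficient all-reduce and other collective operations across multiple TPU cores, which `TPUStrategy` uses.

Here is how you would instantiate `TPUStrategy`: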

Note: To run any TPU code in Colab, you should select TPU as the Colab runtime. Refer to the [Use TPUs](tpu.ipynb) guide for a complete example.

```python
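# `tpu_address` is the name or address of your TPU; in Colab, call
# TPUClusterResolver() without arguments instead.
cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
    tpu=tpu_address)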
tf.config.experimental_connect_to_cluster(cluster_resolver)
tf.tpu.experimental.initialize_tpu_system(cluster_resolver)
tpu_strategy = tf.distribute.TPUStrategy(cluster_resolver)
```

The `TPUClusterResolver` instance helps locate the TPUs. In Colab, you don't need to specify any arguments to it.

If you want to use this for Cloud TPUs:

- Specify the name of your TPU resource in the `tpu` argument.
- You must initialize the TPU system explicitly at the *start* of the program. This is required before TPUs can be used for computation. Initializing the TPU system also wipes out the TPU memory, so it's important to complete this step first in order to avoid losing state.

### MultiWorkerMirroredStrategy

`tf.distribute.MultiWorkerMirroredStrategy` is very similar to `MirroredStrategy`. It implements synchronous distributed training across multiple workers, each with potentially multiple GPUs. Similar to `tf.distribute.MirroredStrategy`, it creates copies of all variables in the model on each device across all workers.

Here is the simplest way of creating `MultiWorkerMirroredStrategy`:

In [None]:
strategy = tf.distribute.MultiWorkerMirroredStrategy()

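`MultiWorkerMirroredStrategy` has two implementations for cross-device communication. `CommunicationImplementation.RING` is [RPC](https://en.wikipedia.org/wiki/Remote_procedure_call)-based and supports both CPUs and GPUs. `CommunicationImplementation.NCCL` uses NCCL and provides state-of-the-art performance on GPUs, but it doesn't support CPUs. `CommunicationImplementation.AUTO` defers the choice to TensorFlow. You can specify them in the following way: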


In [None]:
communication_options = tf.distribute.experimental.CommunicationOptions(
    implementation=tf.distribute.experimental.CommunicationImplementation.NCCL)
strategy = tf.distribute.MultiWorkerMirroredStrategy(
    communication_options=communication_options)

One of the key differences to get multi worker training going, as compared to multi-GPU training, is the multi-worker setup. The `'TF_CONFIG'` environment variable is the standard way in TensorFlow to specify the cluster configuration to each worker that is part of the cluster. Learn more in the [setting up TF_CONFIG section](#TF_CONFIG) of this document.

For more details about `MultiWorkerMirroredStrategy`, consider the following tutorials:

- [Multi-worker training with Keras Model.fit](../tutorials/distribute/multi_worker_with_keras.ipynb)
- [Multi-worker training with a custom training loop](../tutorials/distribute/multi_worker_with_ctl.ipynb)

### ParameterServerStrategy

Parameter server training is a common data-parallel method to scale up model training on multiple machines. A parameter server training cluster consists of workers and parameter servers. Variables are created on parameter servers and they are read and updated by workers in each step. Check out the [Parameter server training](../tutorials/distribute/parameter_server_training.ipynb) tutorial for details.
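
The read-and-update cycle described above can be pictured with a toy, single-process simulation. This is only a sketch under stated assumptions: the `ParameterServer` class, `worker_step`, the quadratic loss, and all numbers are hypothetical and are not TensorFlow APIs. Real workers run concurrently on separate machines, whereas here the steps are simply executed one after another without any synchronization barrier.

```python
class ParameterServer:
    """Toy stand-in that stores a shard of the model variables."""

    def __init__(self):
        self.variables = {}

    def create(self, name, value):
        self.variables[name] = value

    def read(self, name):
        return self.variables[name]

    def apply_gradient(self, name, gradient, learning_rate):
        self.variables[name] -= learning_rate * gradient


def worker_step(server, name, target, learning_rate):
    """One worker step: read the variable, compute a gradient, push the update."""
    value = server.read(name)
    gradient = 2.0 * (value - target)  # derivative of (value - target) ** 2
    server.apply_gradient(name, gradient, learning_rate)


# Two parameter servers, each holding one variable.
servers = {"w": ParameterServer(), "b": ParameterServer()}
servers["w"].create("w", 0.0)
servers["b"].create("b", 4.0)

# Three workers each run one step; no worker waits for the others.
for worker_id in range(3):
    worker_step(servers["w"], "w", target=1.0, learning_rate=0.25)
    worker_step(servers["b"], "b", target=2.0, learning_rate=0.25)

print(servers["w"].read("w"))  # 0.875
print(servers["b"].read("b"))  # 2.25
```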

In TensorFlow 2, parameter server training uses a central coordinator-based architecture via the `tf.distribute.experimental.coordinator.ClusterCoordinator` class.

In this implementation, the `worker` and `parameter server` tasks run `tf.distribute.Server`s that listen for tasks from the coordinator. The coordinator creates resources, dispatches training tasks, writes checkpoints, and deals with task failures.

In the program running on the coordinator, you will use a `ParameterServerStrategy` object to define a training step and use a `ClusterCoordinator` to dispatch training steps to remote workers. Here is the simplest way to create them:

```python
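# `variable_partitioner` must be defined beforehand; it controls how
# variables are sharded across the parameter servers.
strategy = tf.distribute.experimental.ParameterServerStrategy(
    tf.distribute.cluster_resolver.TFConfigClusterResolver(),
    variable_partitioner=variable_partitioner)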
coordinator = tf.distribute.experimental.coordinator.ClusterCoordinator(
    strategy)
```

To learn more about `ParameterServerStrategy`, check out the [Parameter server training with Keras Model.fit and a custom training loop](../tutorials/distribute/parameter_server_training.ipynb) tutorial.

Note: You will need to configure the `'TF_CONFIG'` environment variable if you use `TFConfigClusterResolver`. It is similar to <a href="#TF_CONFIG" data-md-type="link">`'TF_CONFIG'`</a> in `MultiWorkerMirroredStrategy` but has additional caveats.

In TensorFlow 1, `ParameterServerStrategy` is available only with an Estimator, via the `tf.compat.v1.distribute.experimental.ParameterServerStrategy` symbol.

Note: This strategy is [`experimental`](https://www.tensorflow.org/guide/versions#what_is_not_covered), as it is currently under active development.

### CentralStorageStrategy

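`tf.distribute.experimental.CentralStorageStrategy` does synchronous training as well. Variables are not mirrored; instead, they are placed on the CPU and operations are replicated across all local GPUs. If there is only one GPU, all variables and operations are placed on that GPU.

Create an instance of `CentralStorageStrategy` with: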


In [None]:
central_storage_strategy = tf.distribute.experimental.CentralStorageStrategy()

This creates a `CentralStorageStrategy` instance that uses all visible GPUs and the CPU. Updates to variables on the replicas are aggregated before being applied to the variables.

Note: This strategy is [`experimental`](https://www.tensorflow.org/guide/versions#what_is_not_covered), as it is currently a work in progress.

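### Other strategies

In addition to the above strategies, there are two other strategies that might be useful for prototyping and debugging when using the `tf.distribute` APIs.

#### Default Strategy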

The Default Strategy is a distribution strategy which is present when no explicit distribution strategy is in scope. It implements the `tf.distribute.Strategy` interface but is a pass-through and provides no actual distribution. For instance, `Strategy.run(fn)` will simply call `fn`. Code written using this strategy should behave exactly as code written without any strategy. You can think of it as a "no-op" strategy.

The Default Strategy is a singleton—and one cannot create more instances of it. It can be obtained using `tf.distribute.get_strategy` outside any explicit strategy's scope (the same API that can be used to get the current strategy inside an explicit strategy's scope).

In [None]:
default_strategy = tf.distribute.get_strategy()

This strategy serves two main purposes:

- It allows writing distribution-aware library code unconditionally. For example, in `tf.keras.optimizers` you can use `tf.distribute.get_strategy` and use that strategy for reducing gradients—it will always return a strategy object on which you can call the `Strategy.reduce` API.


In [None]:
# In optimizer or other library code
# Get currently active strategy
strategy = tf.distribute.get_strategy()
strategy.reduce("SUM", 1., axis=None)  # reduce some values

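- Similar to library code, it can be used to write end users' programs that work with and without a distribution strategy, without requiring conditional logic. Here's a sample code snippet illustrating this: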

In [None]:
if tf.config.list_physical_devices('GPU'):
  strategy = tf.distribute.MirroredStrategy()
else:  # Use the Default Strategy
  strategy = tf.distribute.get_strategy()

with strategy.scope():
  # Do something interesting
  print(tf.Variable(1.))
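
To see the difference between the Default Strategy and an explicit one, the sketch below checks `tf.distribute.has_strategy` and `tf.distribute.get_strategy` outside and inside a scope. It assumes TensorFlow 2.x is installed; `OneDeviceStrategy` on `"/cpu:0"` is used here only as a convenient explicit strategy that runs on a machine without GPUs, and the variable names are illustrative.

```python
import tensorflow as tf

# Outside any explicit scope, only the Default Strategy is active.
outside_has_strategy = tf.distribute.has_strategy()
default_strategy = tf.distribute.get_strategy()

# "/cpu:0" is chosen only so this sketch runs without a GPU.
explicit_strategy = tf.distribute.OneDeviceStrategy(device="/cpu:0")
with explicit_strategy.scope():
    inside_has_strategy = tf.distribute.has_strategy()
    current_strategy = tf.distribute.get_strategy()

print(outside_has_strategy)                   # False
print(default_strategy.num_replicas_in_sync)  # 1
print(inside_has_strategy)                    # True
print(current_strategy is explicit_strategy)  # True
```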

#### OneDeviceStrategy

`tf.distribute.OneDeviceStrategy` is a strategy that places all variables and computation on a single specified device.

```python
strategy = tf.distribute.OneDeviceStrategy(device="/gpu:0")
```

This strategy is distinct from the Default Strategy in a number of ways. In the Default Strategy, the variable placement logic remains unchanged when compared to running TensorFlow without any distribution strategy. But when using `OneDeviceStrategy`, all variables created in its scope are explicitly placed on the specified device. Moreover, any functions called via `OneDeviceStrategy.run` will also be placed on the specified device.

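Input distributed through this strategy is prefetched to the specified device. In the Default Strategy, there is no input distribution.

Similar to the Default Strategy, this strategy can also be used to test your code before switching to other strategies that actually distribute to multiple devices or machines. It exercises the distribution strategy machinery somewhat more than the Default Strategy, but not to the full extent of using, for example, `MirroredStrategy` or `TPUStrategy`. If you want code that behaves as if there is no strategy, use the Default Strategy.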
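
A minimal sketch of this placement behavior follows, assuming TensorFlow 2.x is installed. The device `"/cpu:0"` is used so the example runs without a GPU, and `weight` and `scale` are illustrative names rather than part of any API.

```python
import tensorflow as tf

strategy = tf.distribute.OneDeviceStrategy(device="/cpu:0")

# Variables created inside the scope are placed on the strategy's device.
with strategy.scope():
    weight = tf.Variable(2.0)


def scale(x):
    return weight * x


# The function passed to `run` executes on the same device.
result = strategy.run(scale, args=(tf.constant(3.0),))
local_results = strategy.experimental_local_results(result)

print(weight.device)              # a device string ending in CPU:0
print(len(local_results))         # 1, since there is a single replica
print(float(local_results[0]))    # 6.0
```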

So far you've learned about different strategies and how you can instantiate them. The next few sections show the different ways in which you can use them to distribute your training.

## Use tf.distribute.Strategy with Keras Model.fit

`tf.distribute.Strategy` is integrated into `tf.keras`, which is TensorFlow's implementation of the [Keras API specification](https://keras.io). `tf.keras` is a high-level API to build and train models. By integrating into the `tf.keras` backend, it's seamless for you to distribute your training written in the Keras training framework [using Model.fit](/keras/customizing_what_happens_in_fit.ipynb).

You need to make the following changes to your code:

1. Create an instance of the appropriate `tf.distribute.Strategy`.
2. Move the creation of Keras model, optimizer and metrics inside `strategy.scope`. Thus the code in the model's `call()`, `train_step()`, and `test_step()` methods will all be distributed and executed on the accelerator(s).

TensorFlow distribution strategies support all types of Keras models: Sequential, Functional, and subclassed.

Here is a snippet of code to do this for a very simple Keras model with one `Dense` layer:

In [None]:
mirrored_strategy = tf.distribute.MirroredStrategy()

with mirrored_strategy.scope():
  model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
  # Compile inside the scope so the optimizer is created there too.
  model.compile(loss='mse', optimizer='sgd')

This example uses `MirroredStrategy`, so you can run this on a machine with multiple GPUs. `strategy.scope()` indicates to Keras which strategy to use to distribute the training. Creating models/optimizers/metrics inside this scope allows you to create distributed variables instead of regular variables. Once this is set up, you can fit your model like you would normally. `MirroredStrategy` takes care of replicating the model's training on the available GPUs, aggregating gradients, and more.

In [None]:
dataset = tf.data.Dataset.from_tensors(([1.], [1.])).repeat(100).batch(10)
model.fit(dataset, epochs=2)
model.evaluate(dataset)

Here a `tf.data.Dataset` provides the training and evaluation input. You can also use NumPy arrays:

In [None]:
import numpy as np

inputs, targets = np.ones((100, 1)), np.ones((100, 1))
model.fit(inputs, targets, epochs=2, batch_size=10)

In both cases—with `Dataset` or NumPy—each batch of the given input is divided equally among the multiple replicas. For instance, if you are using the `MirroredStrategy` with 2 GPUs, each batch of size 10 will be divided among the 2 GPUs, with each receiving 5 input examples in each step. Each epoch will then train faster as you add more GPUs. Typically, you would want to increase your batch size as you add more accelerators, so as to make effective use of the extra computing power. You will also need to re-tune your learning rate, depending on the model. You can use `strategy.num_replicas_in_sync` to get the number of replicas.

In [None]:
mirrored_strategy.num_replicas_in_sync

In [None]:
# Compute a global batch size using a number of replicas.
BATCH_SIZE_PER_REPLICA = 5
global_batch_size = (BATCH_SIZE_PER_REPLICA *
                     mirrored_strategy.num_replicas_in_sync)
dataset = tf.data.Dataset.from_tensors(([1.], [1.])).repeat(100)
dataset = dataset.batch(global_batch_size)

LEARNING_RATES_BY_BATCH_SIZE = {5: 0.1, 10: 0.15, 20:0.175}
learning_rate = LEARNING_RATES_BY_BATCH_SIZE[global_batch_size]
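
The arithmetic behind the global batch size can be illustrated without TensorFlow. In this sketch, `split_global_batch` is a hypothetical helper, `num_replicas` stands in for `strategy.num_replicas_in_sync`, and the batch is assumed to divide evenly among the replicas.

```python
def split_global_batch(global_batch, num_replicas):
    """Split one global batch into equally sized per-replica batches (toy helper)."""
    if len(global_batch) % num_replicas != 0:
        raise ValueError("this sketch assumes the batch divides evenly")
    per_replica = len(global_batch) // num_replicas
    return [global_batch[i * per_replica:(i + 1) * per_replica]
            for i in range(num_replicas)]


BATCH_SIZE_PER_REPLICA = 5
num_replicas = 2  # stands in for strategy.num_replicas_in_sync
global_batch_size = BATCH_SIZE_PER_REPLICA * num_replicas

# Ten hypothetical examples, identified by their index.
global_batch = list(range(global_batch_size))
per_replica_batches = split_global_batch(global_batch, num_replicas)

print(global_batch_size)    # 10
print(per_replica_batches)  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
```

Keeping the per-replica batch size fixed and deriving the global batch size from the replica count means each accelerator sees the same amount of work per step as more replicas are added.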

### Currently supported strategies

Training API | `MirroredStrategy` | `TPUStrategy` | `MultiWorkerMirroredStrategy` | `ParameterServerStrategy` | `CentralStorageStrategy`
--- | --- | --- | --- | --- | ---
Keras `Model.fit` | Supported | Supported | Supported | Experimental support | Experimental support

### Examples and tutorials

Here is a list of tutorials and examples that illustrate the above integration end-to-end with Keras `Model.fit`:

1. [Tutorial](../tutorials/distribute/keras.ipynb): Training with `Model.fit` and `MirroredStrategy`.
2. [Tutorial](../tutorials/distribute/multi_worker_with_keras.ipynb): Training with `Model.fit` and `MultiWorkerMirroredStrategy`.
3. [Guide](tpu.ipynb): Contains an example of using `Model.fit` and `TPUStrategy`.
4. [Tutorial](../tutorials/distribute/parameter_server_training.ipynb): Parameter server training with `Model.fit` and `ParameterServerStrategy`.
5. [Tutorial](https://www.tensorflow.org/text/tutorials/bert_glue): Fine-tuning BERT for many tasks from the GLUE benchmark with `Model.fit` and `TPUStrategy`.
6. TensorFlow Model Garden [repository](https://github.com/tensorflow/models/tree/master/official) containing collections of state-of-the-art models implemented using various strategies.

## Use tf.distribute.Strategy with custom training loops

As demonstrated above, using `tf.distribute.Strategy` with Keras `Model.fit` requires changing only a couple lines of your code. With a little more effort, you can also use `tf.distribute.Strategy` [with custom training loops](/keras/writing_a_training_loop_from_scratch.ipynb).

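If you need more flexibility and control over your training loops than is possible with Estimator or Keras, you can write custom training loops. For instance, when using a GAN, you may want to take a different number of generator or discriminator steps each round. Similarly, the high-level frameworks are not very suitable for reinforcement learning training.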

The `tf.distribute.Strategy` classes provide a core set of methods to support custom training loops. Using these may require minor restructuring of the code initially, but once that is done, you should be able to switch between GPUs, TPUs, and multiple machines simply by changing the strategy instance.

Below is a brief snippet illustrating this use case for a simple training example using the same Keras model as before.


First, create the model and optimizer inside the strategy's scope. This ensures that any variables created with the model and optimizer are mirrored variables.

In [None]:
with mirrored_strategy.scope():
  model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
  optimizer = tf.keras.optimizers.SGD()

Next, create the input dataset and call `tf.distribute.Strategy.experimental_distribute_dataset` to distribute the dataset based on the strategy.

In [None]:
dataset = tf.data.Dataset.from_tensors(([1.], [1.])).repeat(1000).batch(
    global_batch_size)
dist_dataset = mirrored_strategy.experimental_distribute_dataset(dataset)

Then, define one step of the training. Use `tf.GradientTape` to compute gradients and optimizer to apply those gradients to update your model's variables. To distribute this training step, put it in a function `train_step` and pass it to `tf.distribute.Strategy.run` along with the dataset inputs you got from the `dist_dataset` created before:

In [None]:
loss_object = tf.keras.losses.BinaryCrossentropy(
  from_logits=True,
  reduction=tf.keras.losses.Reduction.NONE)

def compute_loss(labels, predictions):
  per_example_loss = loss_object(labels, predictions)
  return tf.nn.compute_average_loss(per_example_loss, global_batch_size=global_batch_size)

def train_step(inputs):
  features, labels = inputs

  with tf.GradientTape() as tape:
    predictions = model(features, training=True)
    loss = compute_loss(labels, predictions)

  gradients = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))
  return loss

@tf.function
def distributed_train_step(dist_inputs):
  per_replica_losses = mirrored_strategy.run(train_step, args=(dist_inputs,))
  return mirrored_strategy.reduce(
      tf.distribute.ReduceOp.SUM, per_replica_losses, axis=None)

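A few other things to note in the code above:

1. You used `tf.nn.compute_average_loss` to compute the loss. `tf.nn.compute_average_loss` sums the per-example loss and divides the sum by `global_batch_size`. This is important because later, after the gradients are calculated on each replica, they are aggregated across the replicas by **summing** them.
2. You also used the `tf.distribute.Strategy.reduce` API to aggregate the results returned by `tf.distribute.Strategy.run`. `tf.distribute.Strategy.run` returns results from each local replica in the strategy, and there are multiple ways to consume this result. You can `reduce` them to get an aggregated value. You can also call `tf.distribute.Strategy.experimental_local_results` to get the list of values contained in the result, one per local replica.
3. When you call `apply_gradients` within a distribution strategy scope, its behavior is modified. Specifically, before applying gradients on each parallel instance during synchronous training, it performs a sum-over-all-replicas of the gradients.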
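
The reason for dividing by the global batch size, rather than the per-replica batch size, can be checked with plain arithmetic. The sketch below uses hypothetical per-example loss values split across two replicas and does not call TensorFlow.

```python
# Hypothetical per-example losses for one global batch of four examples.
per_example_losses = [0.5, 1.5, 2.0, 4.0]
global_batch_size = len(per_example_losses)

# Two replicas, each seeing half of the global batch.
replica_shards = [per_example_losses[:2], per_example_losses[2:]]

# Each replica scales its summed loss by the GLOBAL batch size.
replica_losses = [sum(shard) / global_batch_size for shard in replica_shards]

# Summing across replicas recovers the single-device mean loss.
summed_across_replicas = sum(replica_losses)
single_device_mean = sum(per_example_losses) / global_batch_size

# Dividing by the per-replica batch size instead would overcount
# by a factor equal to the number of replicas.
wrongly_scaled = sum(sum(shard) / len(shard) for shard in replica_shards)

print(replica_losses)          # [0.5, 1.5]
print(summed_across_replicas)  # 2.0
print(single_device_mean)      # 2.0
print(wrongly_scaled)          # 4.0
```

The same reasoning carries over to gradients, since the gradient of a sum of scaled losses is the sum of the scaled gradients.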


Finally, once you have defined the training step, you can iterate over `dist_dataset` and run the training in a loop:

In [None]:
for dist_inputs in dist_dataset:
  print(distributed_train_step(dist_inputs))

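In the example above, you iterated over `dist_dataset` to provide input to your training. `tf.distribute.Strategy.experimental_make_numpy_dataset` is also available to support NumPy inputs. You can use this API to create a dataset before calling `tf.distribute.Strategy.experimental_distribute_dataset`.

Another way of iterating over your data is to use iterators explicitly. You may want to do this when you want to run for a given number of steps rather than iterating over the entire dataset. The iteration above can be modified to first create an iterator and then explicitly call `next` on it to get the input data.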

In [None]:
iterator = iter(dist_dataset)
for _ in range(10):
  print(distributed_train_step(next(iterator)))

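This covers the simplest case of using the `tf.distribute.Strategy` API to distribute custom training loops.

### Currently supported strategies

Training API | `MirroredStrategy` | `TPUStrategy` | `MultiWorkerMirroredStrategy` | `ParameterServerStrategy` | `CentralStorageStrategy`
:-- | :-- | :-- | :-- | :-- | :--
Custom training loop | Supported | Supported | Supported | Experimental support | Experimental support

### Examples and tutorials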

Here are some examples for using distribution strategies with custom training loops:

1. [Tutorial](../tutorials/distribute/custom_training.ipynb): Training with a custom training loop and `MirroredStrategy`.
2. [Tutorial](../tutorials/distribute/multi_worker_with_ctl.ipynb): Training with a custom training loop and `MultiWorkerMirroredStrategy`.
3. [Guide](tpu.ipynb): Contains an example of a custom training loop with `TPUStrategy`.
4. [Tutorial](../tutorials/distribute/parameter_server_training.ipynb): Parameter server training with a custom training loop and `ParameterServerStrategy`.
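5. TensorFlow Model Garden [repository](https://github.com/tensorflow/models/tree/master/official) containing collections of state-of-the-art models implemented using various strategies.

## Other topics

This section covers some topics that are relevant to multiple use cases.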

<a name="TF_CONFIG"></a>

### Setting up the TF_CONFIG environment variable

For multi-worker training, as mentioned before, you need to set up the `'TF_CONFIG'` environment variable for each binary running in your cluster. The `'TF_CONFIG'` environment variable is a JSON string which specifies what tasks constitute a cluster, their addresses and each task's role in the cluster. The [`tensorflow/ecosystem`](https://github.com/tensorflow/ecosystem) repo provides a Kubernetes template, which sets up `'TF_CONFIG'` for your training tasks.

There are two components of `'TF_CONFIG'`: a cluster and a task.

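- A cluster provides information about the training cluster, which is a dict consisting of different types of jobs, such as workers. In multi-worker training, there is usually one worker that takes on a little more responsibility, like saving checkpoints and writing summary files for TensorBoard, in addition to what a regular worker does. Such a worker is referred to as the "chief" worker, and it is customary that the worker with index `0` is appointed as the chief worker (in fact, this is how `tf.distribute.Strategy` is implemented).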
- A task on the other hand provides information about the current task. The first component cluster is the same for all workers, and the second component task is different on each worker and specifies the type and index of that worker.

One example of `'TF_CONFIG'` is:

```python
import json
import os

# "host1:port" etc. are placeholders for real host:port addresses.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {
        "worker": ["host1:port", "host2:port", "host3:port"],
        "ps": ["host4:port", "host5:port"]
    },
    "task": {"type": "worker", "index": 1}
})
```


This `'TF_CONFIG'` specifies that there are three workers and two `"ps"` tasks in the `"cluster"` along with their hosts and ports. The `"task"` part specifies the role of the current task in the `"cluster"`—worker `1` (the second worker). Valid roles in a cluster are `"chief"`, `"worker"`, `"ps"`, and `"evaluator"`. There should be no `"ps"` job except when using `tf.distribute.experimental.ParameterServerStrategy`.
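
Since the cluster part is identical for every task and only the task part changes, the per-task values can be generated programmatically. The sketch below uses only the standard library. `make_tf_config` and `is_chief` are hypothetical helpers, the host names and ports are made up, and the chief rule (an explicit `"chief"` job if present, otherwise worker `0`) is an assumption based on the convention described above.

```python
import json

# Hypothetical host names and ports, for illustration only.
CLUSTER = {
    "worker": ["host1:2222", "host2:2222", "host3:2222"],
    "ps": ["host4:2222", "host5:2222"],
}


def make_tf_config(cluster, task_type, task_index):
    """Build the JSON string one task would export as TF_CONFIG."""
    return json.dumps({
        "cluster": cluster,
        "task": {"type": task_type, "index": task_index},
    })


def is_chief(tf_config_json):
    """Assumed rule: an explicit chief job if present, otherwise worker 0."""
    config = json.loads(tf_config_json)
    task = config["task"]
    if "chief" in config["cluster"]:
        return task["type"] == "chief"
    return task["type"] == "worker" and task["index"] == 0


# One TF_CONFIG value per worker; only the "task" part differs.
worker_configs = [make_tf_config(CLUSTER, "worker", i)
                  for i in range(len(CLUSTER["worker"]))]
chief_flags = [is_chief(config) for config in worker_configs]

print(json.loads(worker_configs[1])["task"])  # {'type': 'worker', 'index': 1}
print(chief_flags)                            # [True, False, False]
```

Each generated string would then be exported as `TF_CONFIG` in the environment of the corresponding task before TensorFlow starts.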

## What's next?

`tf.distribute.Strategy` is actively under development. Try it out and provide your feedback using [GitHub issues](https://github.com/tensorflow/tensorflow/issues/new).