##### Copyright 2020 The TensorFlow Authors.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

<table class="tfo-notebook-buttons" align="left">
  <td><a target="_blank" href="https://tensorflow.google.cn/federated/tutorials/tff_for_federated_learning_research_compression"><img src="https://tensorflow.google.cn/images/tf_logo_32px.png">View on TensorFlow.org</a></td>
  <td>     <a target="_blank" href="https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/federated/tutorials/tff_for_federated_learning_research_compression.ipynb"><img src="https://tensorflow.google.cn/images/colab_logo_32px.png">Run in Google Colab</a>
</td>
  <td>     <a target="_blank" href="https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/federated/tutorials/tff_for_federated_learning_research_compression.ipynb"><img src="https://tensorflow.google.cn/images/GitHub-Mark-32px.png">View source on GitHub</a>
</td>
  <td>     <a href="https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/zh-cn/federated/tutorials/tff_for_federated_learning_research_compression.ipynb"><img src="https://tensorflow.google.cn/images/download_logo_32px.png">Download notebook</a> </td>
</table>

# TFF for Federated Learning Research: Model and Update Compression

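**NOTE**: This colab has been verified to work with the [latest released version](https://github.com/tensorflow/federated#compatibility) of the `tensorflow_federated` pip package, but the TensorFlow Federated project is still in pre-release development and may not work on `master`.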

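In this tutorial, we use the [EMNIST](https://tensorflow.google.cn/federated/api_docs/python/tff/simulation/datasets/emnist) dataset to demonstrate how to enable lossy compression algorithms through the `tff.learning` API in order to reduce the communication cost of the Federated Averaging algorithm. For more details on Federated Averaging, see the paper [Communication-Efficient Learning of Deep Networks from Decentralized Data](https://arxiv.org/abs/1602.05629).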

## Before we start

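Before we start, please run the following to make sure that your environment is set up correctly. If you don't see a greeting, please refer to the [Installation](../install.md) guide for instructions.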

In [None]:
#@test {"skip": true}
!pip install --quiet --upgrade tensorflow-federated
!pip install --quiet --upgrade tensorflow-model-optimization
!pip install --quiet --upgrade nest-asyncio

import nest_asyncio
nest_asyncio.apply()

In [1]:
%load_ext tensorboard

import functools

import numpy as np
import tensorflow as tf
import tensorflow_federated as tff

Verify that TFF is working.

In [2]:
@tff.federated_computation
def hello_world():
  return 'Hello, World!'

hello_world()

b'Hello, World!'

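## Preparing the input data

In this section we load and preprocess the EMNIST dataset included in TFF. Please check out the [Federated Learning for Image Classification](https://tensorflow.google.cn/federated/tutorials/federated_learning_for_image_classification#preparing_the_input_data) tutorial for more details about the EMNIST dataset.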


In [3]:
# This value only applies to EMNIST dataset, consider choosing appropriate
# values if switching to other datasets.
MAX_CLIENT_DATASET_SIZE = 418

CLIENT_EPOCHS_PER_ROUND = 1
CLIENT_BATCH_SIZE = 20
TEST_BATCH_SIZE = 500

emnist_train, emnist_test = tff.simulation.datasets.emnist.load_data(
    only_digits=True)

def reshape_emnist_element(element):
  return (tf.expand_dims(element['pixels'], axis=-1), element['label'])

def preprocess_train_dataset(dataset):
  """Preprocessing function for the EMNIST training dataset."""
  return (dataset
          # Shuffle according to the largest client dataset
          .shuffle(buffer_size=MAX_CLIENT_DATASET_SIZE)
          # Repeat to do multiple local epochs
          .repeat(CLIENT_EPOCHS_PER_ROUND)
          # Batch to a fixed client batch size
          .batch(CLIENT_BATCH_SIZE, drop_remainder=False)
          # Preprocessing step
          .map(reshape_emnist_element))

emnist_train = emnist_train.preprocess(preprocess_train_dataset)
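The order of `shuffle`, `repeat`, and `batch` above determines how many batches each client contributes per round. The following is a minimal sketch of that arithmetic in plain Python; the helper names are hypothetical and only the constants are taken from the cell above.

```python
import math

# Constants copied from the preprocessing cell above.
CLIENT_EPOCHS_PER_ROUND = 1
CLIENT_BATCH_SIZE = 20
MAX_CLIENT_DATASET_SIZE = 418


def batches_repeat_then_batch(num_examples, epochs, batch_size):
  """Batch count when repeat() precedes batch(drop_remainder=False)."""
  # Epochs are concatenated first, so a batch may span an epoch boundary
  # and only the very last batch can be partial.
  return math.ceil(num_examples * epochs / batch_size)


def batches_batch_then_repeat(num_examples, epochs, batch_size):
  """Batch count if batch() were applied before repeat() (for comparison)."""
  # Every epoch ends with its own partial batch.
  return math.ceil(num_examples / batch_size) * epochs


largest_client_batches = batches_repeat_then_batch(
    MAX_CLIENT_DATASET_SIZE, CLIENT_EPOCHS_PER_ROUND, CLIENT_BATCH_SIZE)
print(largest_client_batches)  # 21

# With two local epochs the two orderings differ for a 45-example client.
print(batches_repeat_then_batch(45, 2, 20))  # 5
print(batches_batch_then_repeat(45, 2, 20))  # 6
```

With one local epoch the two orderings give the same count, so the distinction only matters once `CLIENT_EPOCHS_PER_ROUND` is raised.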

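## Defining a model

Here we define a Keras model based on the original FedAvg CNN, and then wrap the Keras model in an instance of [tff.learning.Model](https://tensorflow.google.cn/federated/api_docs/python/tff/learning/Model) so that it can be consumed by TFF.

Note that we need a **function** that produces a model, rather than the model itself. In addition, the function **cannot** simply capture a pre-constructed model; it must create the model in the context in which it is called. The reason is that TFF is designed to go to devices, and needs control over when resources are constructed so that they can be captured and packaged up.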

In [4]:
def create_original_fedavg_cnn_model(only_digits=True):
  """The CNN model used in https://arxiv.org/abs/1602.05629."""
  data_format = 'channels_last'

  max_pool = functools.partial(
      tf.keras.layers.MaxPooling2D,
      pool_size=(2, 2),
      padding='same',
      data_format=data_format)
  conv2d = functools.partial(
      tf.keras.layers.Conv2D,
      kernel_size=5,
      padding='same',
      data_format=data_format,
      activation=tf.nn.relu)

  model = tf.keras.models.Sequential([
      tf.keras.layers.InputLayer(input_shape=(28, 28, 1)),
      conv2d(filters=32),
      max_pool(),
      conv2d(filters=64),
      max_pool(),
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(512, activation=tf.nn.relu),
      tf.keras.layers.Dense(10 if only_digits else 62),
      tf.keras.layers.Softmax(),
  ])

  return model

# Gets the type information of the input data. TFF is a strongly typed
# functional programming framework, and needs type information about inputs to 
# the model.
input_spec = emnist_train.create_tf_dataset_for_client(
    emnist_train.client_ids[0]).element_spec

def tff_model_fn():
  keras_model = create_original_fedavg_cnn_model()
  return tff.learning.from_keras_model(
      keras_model=keras_model,
      input_spec=input_spec,
      loss=tf.keras.losses.SparseCategoricalCrossentropy(),
      metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

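## Training the model and outputting training metrics

Now we are ready to construct a Federated Averaging algorithm and train the defined model on the EMNIST dataset.

First we need to build a Federated Averaging algorithm using the [tff.learning.algorithms.build_weighted_fed_avg](https://tensorflow.google.cn/federated/api_docs/python/tff/learning/algorithms/build_weighted_fed_avg) API.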

In [5]:
federated_averaging = tff.learning.algorithms.build_weighted_fed_avg(
    model_fn=tff_model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0))

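Now let's run the Federated Averaging algorithm. From the perspective of TFF, executing a federated learning algorithm looks like this:

1. Initialize the algorithm and get the initial server state. The server state contains the information necessary to run the algorithm. Recall that, because TFF is functional, this state includes both any optimizer state the algorithm uses (e.g. momentum terms) and the model parameters themselves; these are passed as arguments to, and returned as results from, TFF computations.
2. Execute the algorithm round by round. In each round, a new server state is returned as the result of each client training the model on its own data. A round typically consists of the following:
    1. The server broadcasts the model to all participating clients.
    2. Each client performs work based on the model and its own data.
    3. The server aggregates all the model updates to produce a server state that contains a new model.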

For more details, please see the [Custom Federated Algorithms, Part 2: Implementing Federated Averaging](https://tensorflow.google.cn/federated/tutorials/custom_federated_algorithms_2) tutorial.
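The round structure listed above (broadcast, client work, aggregation) can be sketched without TFF. The following is a toy illustration, not the TFF implementation: it assumes a one-parameter model `y = w * x`, synthetic noiseless data, and hypothetical helper names `client_update` and `run_round`.

```python
import random


def client_update(server_weight, examples, learning_rate=0.05):
  """Client work: local SGD on squared error, starting from the server model."""
  w = server_weight
  for x, y in examples:
    w -= learning_rate * 2.0 * x * (w * x - y)
  # Return the model delta and the number of examples used as its weight.
  return w - server_weight, len(examples)


def run_round(server_weight, client_datasets, server_learning_rate=1.0):
  """One round: broadcast, client work, example-weighted aggregation."""
  # Broadcast: every client starts from the same server weight.
  results = [client_update(server_weight, data) for data in client_datasets]
  # Aggregate: average the deltas, weighted by each client's example count.
  total_examples = sum(n for _, n in results)
  mean_delta = sum(delta * n for delta, n in results) / total_examples
  # Server update: with a learning rate of 1.0 the mean delta is applied as is.
  return server_weight + server_learning_rate * mean_delta


rng = random.Random(0)
TRUE_WEIGHT = 3.0  # Hypothetical ground truth used to generate the toy data.
client_datasets = []
for size in (20, 50, 30):
  xs = [rng.uniform(-1.0, 1.0) for _ in range(size)]
  client_datasets.append([(x, TRUE_WEIGHT * x) for x in xs])

state = 0.0  # The "server state" here is just the single model weight.
for round_num in range(30):
  state = run_round(state, client_datasets)
print('learned weight after 30 rounds:', state)
```

In TFF the server state also carries optimizer state and the aggregation step is pluggable, which is what the rest of this tutorial relies on when swapping in a compressing aggregator.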

Training metrics are written to the TensorBoard directory for display after training.

In [6]:
#@title Load utility functions

def format_size(size):
  """A helper function for creating a human-readable size."""
  size = float(size)
  for unit in ['bit','Kibit','Mibit','Gibit']:
    if size < 1024.0:
      return "{size:3.2f}{unit}".format(size=size, unit=unit)
    size /= 1024.0
  return "{size:.2f}{unit}".format(size=size, unit='Tibit')

def set_sizing_environment():
  """Creates an environment that contains sizing information."""
  # Creates a sizing executor factory to output communication cost
  # after the training finishes. Note that sizing executor only provides an
  # estimate (not exact) of communication cost, and doesn't capture cases like
  # compression of over-the-wire representations. However, it's perfect for
  # demonstrating the effect of compression in this tutorial.
  sizing_factory = tff.framework.sizing_executor_factory()

  # TFF has a modular runtime you can configure yourself for various
  # environments and purposes, and this example just shows how to configure one
  # part of it to report the size of things.
  context = tff.framework.ExecutionContext(executor_fn=sizing_factory)
  tff.framework.set_default_context(context)

  return sizing_factory

In [7]:
def train(federated_averaging_process, num_rounds, num_clients_per_round, summary_writer):
  """Trains the federated averaging process and outputs metrics."""
  # Create an environment to get communication cost.
  environment = set_sizing_environment()

  # Initialize the Federated Averaging algorithm to get the initial server state.
  state = federated_averaging_process.initialize()

  with summary_writer.as_default():
    for round_num in range(num_rounds):
      # Sample the clients participating in this round.
      sampled_clients = np.random.choice(
          emnist_train.client_ids,
          size=num_clients_per_round,
          replace=False)
      # Create a list of `tf.data.Dataset` instances from the data of sampled clients.
      sampled_train_data = [
          emnist_train.create_tf_dataset_for_client(client)
          for client in sampled_clients
      ]
      # Run one round of the algorithm based on the server state and client data
      # and output the new state and metrics.
      result = federated_averaging_process.next(state, sampled_train_data)
      state = result.state
      train_metrics = result.metrics['client_work']['train']

      # For more about size_info, please see https://tensorflow.google.cn/federated/api_docs/python/tff/framework/SizeInfo
      size_info = environment.get_size_info()
      broadcasted_bits = size_info.broadcast_bits[-1]
      aggregated_bits = size_info.aggregate_bits[-1]

      print('round {:2d}, train_metrics={}, broadcasted_bits={}, aggregated_bits={}'.format(
          round_num, train_metrics, format_size(broadcasted_bits), format_size(aggregated_bits)))

      # Add metrics to Tensorboard.
      for name, value in train_metrics.items():
        tf.summary.scalar(name, value, step=round_num)

      # Add broadcasted and aggregated data size to Tensorboard.
      tf.summary.scalar('cumulative_broadcasted_bits', broadcasted_bits, step=round_num)
      tf.summary.scalar('cumulative_aggregated_bits', aggregated_bits, step=round_num)
      summary_writer.flush()

In [8]:
# Clean the log directory to avoid conflicts.
try:
  tf.io.gfile.rmtree('/tmp/logs/scalars')
except tf.errors.OpError as e:
  pass  # Path doesn't exist

# Set up the log directory and writer for Tensorboard.
logdir = "/tmp/logs/scalars/original/"
summary_writer = tf.summary.create_file_writer(logdir)

train(federated_averaging_process=federated_averaging, num_rounds=10,
      num_clients_per_round=10, summary_writer=summary_writer)

round  0, train_metrics=OrderedDict([('sparse_categorical_accuracy', 0.092454836), ('loss', 2.310193), ('num_examples', 941), ('num_batches', 51)]), broadcasted_bits=507.62Mibit, aggregated_bits=507.62Mibit
round  1, train_metrics=OrderedDict([('sparse_categorical_accuracy', 0.10029791), ('loss', 2.3102622), ('num_examples', 1007), ('num_batches', 55)]), broadcasted_bits=1015.24Mibit, aggregated_bits=1015.25Mibit
round  2, train_metrics=OrderedDict([('sparse_categorical_accuracy', 0.10710711), ('loss', 2.3048222), ('num_examples', 999), ('num_batches', 54)]), broadcasted_bits=1.49Gibit, aggregated_bits=1.49Gibit
round  3, train_metrics=OrderedDict([('sparse_categorical_accuracy', 0.1061061), ('loss', 2.3066027), ('num_examples', 999), ('num_batches', 55)]), broadcasted_bits=1.98Gibit, aggregated_bits=1.98Gibit
round  4, train_metrics=OrderedDict([('sparse_categorical_accuracy', 0.1287594), ('loss', 2.2999024), ('num_examples', 1064), ('num_batches', 58)]), broadcasted_bits=2.48Gibit, a

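Start TensorBoard with the root log directory specified above to display the training metrics. It can take a few seconds for the data to load. Besides loss and accuracy, we also output the amount of broadcast and aggregated data. Broadcast data refers to the tensors the server pushes to each client, while aggregated data refers to the tensors each client returns to the server.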

In [None]:
#@test {"skip": true}
%tensorboard --logdir /tmp/logs/scalars/ --port=0
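The `507.62Mibit` reported for the first round can be cross-checked by counting the parameters of the CNN defined earlier. The sketch below is a back-of-the-envelope calculation, assuming every weight is a 32-bit float and that only the model weights are counted for each of the 10 sampled clients; the helper names are hypothetical.

```python
def conv2d_params(kernel_size, in_channels, filters):
  """Kernel weights plus one bias per filter."""
  return kernel_size * kernel_size * in_channels * filters + filters


def dense_params(inputs, units):
  """Weight matrix plus one bias per unit."""
  return inputs * units + units


# 'same' padding keeps 28x28 through each convolution; each 2x2 max pool
# halves the spatial size: 28 -> 14 -> 7. Flatten then yields 7 * 7 * 64.
flattened = 7 * 7 * 64

layer_params = {
    'conv2d_32': conv2d_params(5, 1, 32),
    'conv2d_64': conv2d_params(5, 32, 64),
    'dense_512': dense_params(flattened, 512),
    'dense_10': dense_params(512, 10),  # only_digits=True
}
total_params = sum(layer_params.values())

BITS_PER_FLOAT32 = 32
NUM_CLIENTS_PER_ROUND = 10
bits_per_round = total_params * BITS_PER_FLOAT32 * NUM_CLIENTS_PER_ROUND
mibit_per_round = bits_per_round / 1024**2

print(total_params)                      # 1663370
print('{:.2f}Mibit'.format(mibit_per_round))  # 507.62Mibit
```

The estimate agrees with the logged value, and the later rounds are multiples of it because the training loop reports the last (cumulative) entry of the size info.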

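## Build a custom aggregation function

Now let's implement the function that applies lossy compression to the aggregated data. We will use TFF's API to create a `tff.aggregators.AggregationFactory` for this. While researchers may often want to implement their own (which can be done via the `tff.aggregators` API), we will use a built-in method for doing so, specifically `tff.learning.compression_aggregator`.

It is important to note that this aggregator does not apply compression to the entire model at once. Instead, compression is applied only to those variables in the model that are sufficiently large. Small variables such as biases are generally more sensitive to inaccuracy, and because they are small, the potential communication savings are also small.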

In [12]:
compression_aggregator = tff.learning.compression_aggregator()
isinstance(compression_aggregator, tff.aggregators.WeightedAggregationFactory)

True

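Above, you can see that the compression aggregator is a *weighted* aggregation factory, which means it involves weighted aggregation (in contrast to aggregators meant for differential privacy, which are often unweighted).

This aggregation factory can be plugged directly into FedAvg via its `model_aggregator` argument.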
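To make the weighted versus unweighted distinction concrete, here is a small sketch in plain Python. The client values and example counts are made up for illustration, and `weighted_mean` is a hypothetical helper rather than a TFF function.

```python
def weighted_mean(values, weights):
  """Mean of `values` where each entry counts in proportion to its weight."""
  return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical scalar updates from three clients and their example counts.
client_updates = [1.0, 2.0, 4.0]
num_examples = [10, 30, 60]

weighted = weighted_mean(client_updates, num_examples)
unweighted = sum(client_updates) / len(client_updates)

print(weighted)    # 3.1
print(unweighted)  # 2.3333333333333335
```

The weighted result is pulled toward the client with the most data, which is the behavior Federated Averaging uses when it weights clients by their number of examples.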

In [13]:
federated_averaging_with_compression = tff.learning.algorithms.build_weighted_fed_avg(
    tff_model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0),
    model_aggregator=compression_aggregator)
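The idea described in this section, compressing only variables above a size threshold, can be illustrated with a simple uniform quantizer. This is a sketch under stated assumptions and not the algorithm inside `tff.learning.compression_aggregator`: the threshold, the 8-bit width, the per-variable min/max side information, and all function names are hypothetical.

```python
import random

FLOAT_BITS = 32


def quantize(values, bits=8):
  """Maps floats to integer codes on a uniform grid between min and max."""
  lo, hi = min(values), max(values)
  levels = (1 << bits) - 1
  step = (hi - lo) / levels if hi > lo else 1.0
  return [round((v - lo) / step) for v in values], lo, step


def dequantize(codes, lo, step):
  """Reconstructs approximate floats from the integer codes."""
  return [lo + c * step for c in codes]


def compress_update(variables, threshold, bits=8):
  """Quantizes only variables with more than `threshold` elements.

  Returns the values the server would reconstruct and the bits sent.
  """
  restored, bits_sent = {}, 0
  for name, values in variables.items():
    if len(values) > threshold:
      codes, lo, step = quantize(values, bits)
      restored[name] = dequantize(codes, lo, step)
      # Codes plus two floats (offset and step) as side information.
      bits_sent += len(values) * bits + 2 * FLOAT_BITS
    else:
      # Small variables are sent uncompressed.
      restored[name] = list(values)
      bits_sent += len(values) * FLOAT_BITS
  return restored, bits_sent


rng = random.Random(0)
variables = {
    'kernel': [rng.gauss(0.0, 1.0) for _ in range(1000)],
    'bias': [rng.gauss(0.0, 1.0) for _ in range(10)],
}
restored, compressed_bits = compress_update(variables, threshold=100)
uncompressed_bits = sum(len(v) for v in variables.values()) * FLOAT_BITS

kernel_step = (max(variables['kernel']) - min(variables['kernel'])) / 255
max_error = max(
    abs(a - b) for a, b in zip(variables['kernel'], restored['kernel']))

print(uncompressed_bits, compressed_bits)  # 32320 8384
print(max_error <= kernel_step / 2 + 1e-9)  # True
```

The large variable loses at most half a quantization step per element, while the small one is transmitted exactly, which mirrors the trade-off described above.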

## Training the model again

Now let's run the new Federated Averaging algorithm.

In [14]:
logdir_for_compression = "/tmp/logs/scalars/compression/"
summary_writer_for_compression = tf.summary.create_file_writer(
    logdir_for_compression)

train(federated_averaging_process=federated_averaging_with_compression, 
      num_rounds=10,
      num_clients_per_round=10,
      summary_writer=summary_writer_for_compression)

round  0, train_metrics=OrderedDict([('sparse_categorical_accuracy', 0.087804876), ('loss', 2.3126457), ('num_examples', 1025), ('num_batches', 55)]), broadcasted_bits=507.62Mibit, aggregated_bits=146.47Mibit
round  1, train_metrics=OrderedDict([('sparse_categorical_accuracy', 0.073267326), ('loss', 2.3111901), ('num_examples', 1010), ('num_batches', 56)]), broadcasted_bits=1015.24Mibit, aggregated_bits=292.93Mibit
round  2, train_metrics=OrderedDict([('sparse_categorical_accuracy', 0.08925144), ('loss', 2.3071017), ('num_examples', 1042), ('num_batches', 57)]), broadcasted_bits=1.49Gibit, aggregated_bits=439.40Mibit
round  3, train_metrics=OrderedDict([('sparse_categorical_accuracy', 0.07985144), ('loss', 2.3061485), ('num_examples', 1077), ('num_batches', 59)]), broadcasted_bits=1.98Gibit, aggregated_bits=585.86Mibit
round  4, train_metrics=OrderedDict([('sparse_categorical_accuracy', 0.11947791), ('loss', 2.302166), ('num_examples', 996), ('num_batches', 55)]), broadcasted_bits=2.48

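Start TensorBoard again to compare the training metrics between the two runs.

As you can see in TensorBoard, there is a significant reduction between the `original` and `compression` curves in the `cumulative_aggregated_bits` plot, while in the `loss` and `sparse_categorical_accuracy` plots the two curves are very similar.

In conclusion, we implemented a compression algorithm that achieves performance similar to the original Federated Averaging algorithm while significantly reducing the communication cost.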

In [None]:
#@test {"skip": true}
%tensorboard --logdir /tmp/logs/scalars/ --port=0
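The size of the reduction can be read directly from the two training logs. The short calculation below uses only the round 0 values printed above (in Mibit); the variable names are ours.

```python
# Round 0 values copied from the two training logs above (Mibit).
original_aggregated = 507.62
compressed_aggregated = 146.47
broadcast_original = 507.62
broadcast_compressed = 507.62

ratio = original_aggregated / compressed_aggregated
savings = 1.0 - compressed_aggregated / original_aggregated

print('aggregated data shrinks by a factor of {:.2f}'.format(ratio))
print('uplink savings: {:.1%}'.format(savings))
print('broadcast unchanged:', broadcast_original == broadcast_compressed)
```

Only the client-to-server direction shrinks, by roughly 3.5x in this run; the broadcast size is identical in both logs because the compression aggregator is applied to aggregation only.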

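## Exercises

To implement a custom compression algorithm and apply it to the training loop, you can:

1. Implement a new compression algorithm as a subclass of [tff.aggregators.MeanFactory](https://tensorflow.google.cn/federated/api_docs/python/tff/aggregators/MeanFactory).
2. Perform training with the compression algorithm to see if it does better than the algorithm above.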

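Potentially valuable open research questions include non-uniform quantization, lossless compression such as Huffman coding, and mechanisms for adapting compression based on information from previous training rounds.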
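As a starting point for the lossless direction mentioned above, the sketch below computes Huffman code lengths for a stream of quantized symbols and compares the result with a fixed-width encoding. The symbol distribution is made up, and `huffman_code_lengths` is a hypothetical helper, not part of TFF.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
  """Returns a dict mapping each distinct symbol to its Huffman code length."""
  counts = Counter(symbols)
  if len(counts) == 1:
    # A single distinct symbol still needs one bit per occurrence.
    return {next(iter(counts)): 1}
  # Heap entries: (count, tie_breaker, {symbol: depth_so_far}).
  heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(counts.items())]
  heapq.heapify(heap)
  tie_breaker = len(heap)
  while len(heap) > 1:
    count_a, _, depths_a = heapq.heappop(heap)
    count_b, _, depths_b = heapq.heappop(heap)
    # Merging two subtrees pushes every symbol in them one level deeper.
    merged = {s: d + 1 for s, d in {**depths_a, **depths_b}.items()}
    heapq.heappush(heap, (count_a + count_b, tie_breaker, merged))
    tie_breaker += 1
  return heap[0][2]


# Hypothetical 2-bit quantized update in which most entries fall in bucket 0.
quantized = [0] * 70 + [1] * 15 + [2] * 10 + [3] * 5

lengths = huffman_code_lengths(quantized)
huffman_bits = sum(lengths[s] for s in quantized)
fixed_bits = len(quantized) * 2

print(huffman_bits, fixed_bits)  # 145 200
```

The saving depends entirely on how skewed the quantized values are, so a uniform distribution over buckets would gain nothing from this step.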

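Recommended reading materials:

- [Expanding the Reach of Federated Learning by Reducing Client Resource Requirements](https://research.google/pubs/pub47774/)
- [Federated Learning: Strategies for Improving Communication Efficiency](https://research.google/pubs/pub45648/)
- *Section 3.5 Communication and Compression* in [Advanced and Open Problems in Federated Learning](https://arxiv.org/abs/1912.04977)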