##### Copyright 2021 The TensorFlow Authors.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Migrating from TPU embedding_columns (with TPUEstimator) to TPUEmbedding layer (TF2)

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://www.tensorflow.org/guide/migrate/tpu_embedding">
    <img src="https://www.tensorflow.org/images/tf_logo_32px.png" />
    View on TensorFlow.org</a>
  </td>
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/guide/migrate/tpu_embedding.ipynb">
    <img src="https://www.tensorflow.org/images/colab_logo_32px.png" />
    Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/docs/blob/master/site/en/guide/migrate/tpu_embedding.ipynb">
    <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />
    View source on GitHub</a>
  </td>
  <td>
    <a href="https://storage.googleapis.com/tensorflow_docs/docs/site/en/guide/migrate/tpu_embedding.ipynb"><img src="https://www.tensorflow.org/images/download_logo_32px.png" />Download notebook</a>
  </td>
</table>

In TF1, `tf.compat.v1.estimator.tpu.TPUEstimator` is a high-level API that encapsulates training, evaluation, prediction, and export for serving with TPUs. It has special support for `tf.compat.v1.tpu.experimental.embedding_column`, a TPU-based embedding feature column that lets you use embedding tables larger than the memory of a single TPU device, or feed sparse or ragged input to embeddings on the TPU.

In TF2, use `tf.keras` with a Keras [model](https://www.tensorflow.org/api_docs/python/tf/keras/Model?version=nightly), [layers](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Layer?version=nightly), [optimizers](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Optimizer?version=nightly), and [metrics](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/Metric) for the aforementioned tasks. For Keras and TPUs, there is a dedicated `tfrs.layers.embedding.TPUEmbedding` layer that performs the same functions as the TPU `embedding_column`s with `TPUEstimator`.

In this guide, you will first go through the basic setup of a `TPUEstimator` training and evaluation program that uses an `embedding_column`. Then, you will create the equivalent training and evaluation program in TF2 using Keras and the `TPUEmbedding` layer, with the help of `tf.distribute.TPUStrategy`.

## Setup

For both the TF1 and TF2 examples demonstrated below, start with a couple of necessary TensorFlow imports.

**Note**: `tf-nightly` and `cloud_tpu_client` are only necessary here until TensorFlow 2.6 is available in Colab.

In [None]:
!pip install tensorflow-recommenders
!pip uninstall -y tensorflow keras tensorflow-estimator
!pip install tf-nightly
!pip install cloud_tpu_client

In [None]:
import tensorflow as tf
import tensorflow.compat.v1 as tf1

# The TPUEmbedding layer is not part of TensorFlow itself; it ships with TensorFlow Recommenders.
import tensorflow_recommenders as tfrs

# Import the CloudTPU client to set the runtime version to match the current tensorflow version
try:
  from cloud_tpu_client import Client
  c = Client(tpu='')
  c.configure_tpu_version(tf.__version__, restart_type='ifNeeded')
except ImportError:
  pass

Next, prepare some simple data for demonstration:

In [None]:
features = [[1., 1.5]]
embedding_features_indices = [[0, 0], [0, 1]]
embedding_features_values = [0, 5]
labels = [[0.3]]
eval_features = [[4., 4.5]]
eval_embedding_features_indices = [[0, 0], [0, 1]]
eval_embedding_features_values = [4, 3]
eval_labels = [[0.8]]
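
The sparse feature above is written in coordinate (COO) form: each entry of `embedding_features_indices` is a `[row, column]` position, and the matching entry of `embedding_features_values` is the ID stored at that position. As a rough plain-Python illustration (the helper `coo_to_dense` is hypothetical and not a TensorFlow API; the dense shape `[1, 2]` is the one passed to `SparseTensor` in the input functions of this guide), each example expands to a single row holding two IDs:

```python
def coo_to_dense(indices, values, dense_shape, default=0):
  """Expands a 2-D COO description into a nested list (illustration only)."""
  num_rows, num_cols = dense_shape
  dense = [[default] * num_cols for _ in range(num_rows)]
  for (row, col), value in zip(indices, values):
    dense[row][col] = value
  return dense

# Same indices, values, and shape as the training and evaluation data above.
train_ids = coo_to_dense([[0, 0], [0, 1]], [0, 5], [1, 2])
eval_ids = coo_to_dense([[0, 0], [0, 1]], [4, 3], [1, 2])

print(train_ids)  # [[0, 5]]
print(eval_ids)   # [[4, 3]]
```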

## TF1: TPUEstimator.train/evaluate

When using TPU embeddings with `TPUEstimator`, you need to define a feature column for each feature you want to look up, and then pass that information to the `TPUEstimator` via the `embedding_config_spec` argument.

In [None]:
# Create the embedding column, the key must match one of the keys in the dict of input features.
# The num_buckets is the vocabulary size for the embedding table.
embedding_id_column = (
      tf1.feature_column.categorical_column_with_identity(
          key="sparse_feature", num_buckets=10))

# dimension is the width of the embedding table.
embedding_column = tf1.tpu.experimental.embedding_column(
    embedding_id_column, dimension=5)

embedding_config_spec = tf1.estimator.tpu.experimental.EmbeddingConfigSpec(
    feature_columns=(embedding_column,),
    optimization_parameters=(
        tf1.tpu.experimental.AdagradParameters(0.05)))
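
To make `num_buckets` and `dimension` concrete: together they describe a table with 10 rows of 5 floats each, and a lookup replaces each ID with its row. The sketch below is plain Python rather than TensorFlow, and it assumes that the IDs of one example are combined by averaging (a "mean" combiner). The table values and the `lookup_mean` helper are made up for illustration; a real table is randomly initialized and then trained.

```python
NUM_BUCKETS = 10  # Vocabulary size: valid IDs are 0..9.
DIMENSION = 5     # Width of each embedding vector.

# Hypothetical table in which row i is filled with the value i.
table = [[float(row)] * DIMENSION for row in range(NUM_BUCKETS)]

def lookup_mean(table, ids):
  """Looks up each ID and averages the resulting rows element-wise."""
  rows = [table[i] for i in ids]
  return [sum(column) / len(rows) for column in zip(*rows)]

# The training example in this guide contains the IDs 0 and 5.
embedded = lookup_mean(table, [0, 5])

print(len(table), len(table[0]))  # 10 5
print(embedded)                   # [2.5, 2.5, 2.5, 2.5, 2.5]
```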

Next, to use a `TPUEstimator`, you should define a few functions: an input function for the training data, an evaluation input function for the evaluation data, and a model function that tells the `TPUEstimator` how the training op is defined with the features and labels.

In [None]:
def _input_fn(params):
  dataset = tf1.data.Dataset.from_tensor_slices((
      {"dense_feature": features,
       "sparse_feature": tf1.SparseTensor(
           embedding_features_indices,
           embedding_features_values, [1, 2])},
           labels))
  dataset = dataset.repeat()
  return dataset.batch(params['batch_size'], drop_remainder=True)

def _eval_input_fn(params):
  dataset = tf1.data.Dataset.from_tensor_slices((
      {"dense_feature": eval_features,
       "sparse_feature": tf1.SparseTensor(
           eval_embedding_features_indices,
           eval_embedding_features_values, [1, 2])},
           eval_labels))
  dataset = dataset.repeat()
  return dataset.batch(params['batch_size'], drop_remainder=True)

def _model_fn(features, labels, mode, params):
  embedding_features = tf1.keras.layers.DenseFeatures(embedding_column)(features)
  concatenated_features = tf1.keras.layers.Concatenate(axis=1)(
      [embedding_features, features["dense_feature"]])
  logits = tf1.layers.Dense(1)(concatenated_features)
  loss = tf1.losses.mean_squared_error(labels=labels, predictions=logits)
  optimizer = tf1.train.AdagradOptimizer(0.05)
  optimizer = tf1.tpu.CrossShardOptimizer(optimizer)
  train_op = optimizer.minimize(loss, global_step=tf1.train.get_global_step())
  return tf1.estimator.tpu.TPUEstimatorSpec(mode, loss=loss, train_op=train_op)

With those functions defined, you will now create a `tf.distribute.cluster_resolver.TPUClusterResolver` that provides the cluster information, and a `tf.compat.v1.estimator.tpu.RunConfig` object. Using the model function you have defined, you can now create a `TPUEstimator`. Checkpoint saving is skipped to simplify the example. Note that you need to specify the batch size for both training and evaluation for a `TPUEstimator`.

In [None]:
cluster_resolver = tf1.distribute.cluster_resolver.TPUClusterResolver(tpu='')
print("All devices: ", tf1.config.list_logical_devices('TPU'))

In [None]:
tpu_config = tf1.estimator.tpu.TPUConfig(
    iterations_per_loop=10,
    per_host_input_for_training=tf1.estimator.tpu.InputPipelineConfig
          .PER_HOST_V2)
config = tf1.estimator.tpu.RunConfig(
    cluster=cluster_resolver,
    save_checkpoints_steps=None,
    tpu_config=tpu_config)
estimator = tf1.estimator.tpu.TPUEstimator(
    model_fn=_model_fn, config=config, train_batch_size=8, eval_batch_size=8,
    embedding_config_spec=embedding_config_spec)

Call `train` to train the `TPUEstimator`:

In [None]:
estimator.train(_input_fn, steps=1)

And call `evaluate` to evaluate the model using the evaluation data you have prepared.

In [None]:
estimator.evaluate(_eval_input_fn, steps=1)

## TF2: Keras training API with `tf.distribute.TPUStrategy`

In TF2, you will use a `tf.keras` model with a `tf.distribute.TPUStrategy` to train your models on the TPU workers. To begin with, you will create a `TPUClusterResolver` to provide the cluster information, and connect to the cluster.

In [None]:
cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
tf.config.experimental_connect_to_cluster(cluster_resolver)
tf.tpu.experimental.initialize_tpu_system(cluster_resolver)
print("All devices: ", tf.config.list_logical_devices('TPU'))

Next, prepare your data. This is similar to how you created a dataset in TensorFlow 1, except the dataset function is now passed a `tf.distribute.InputContext` object rather than a `params` dict. You can use this object to determine the local batch size (and which host this pipeline is for, so you can properly partition your data).

When using TPU embeddings, it is important to pass `drop_remainder=True` when batching, since the TPU Embedding API requires a fixed batch size. Moreover, the same batch size must be used for evaluation and training if they take place on the same set of devices.
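
The two batching rules can be sketched in plain Python. This is only an illustration: the helpers `per_replica_batch_size` and `batch` are hypothetical stand-ins for what `InputContext.get_per_replica_batch_size` and `Dataset.batch` do, and the replica count of 2 is a made-up number (on a real TPU it comes from the strategy).

```python
def per_replica_batch_size(global_batch_size, num_replicas):
  """Splits a global batch evenly across replicas, or fails."""
  if global_batch_size % num_replicas != 0:
    raise ValueError("The global batch size must be divisible by the number "
                     "of replicas.")
  return global_batch_size // num_replicas

def batch(examples, batch_size, drop_remainder):
  """Groups examples into lists of batch_size items."""
  batches = [examples[i:i + batch_size]
             for i in range(0, len(examples), batch_size)]
  if drop_remainder and batches and len(batches[-1]) < batch_size:
    batches.pop()  # Discard the short final batch.
  return batches

local_batch_size = per_replica_batch_size(global_batch_size=8, num_replicas=2)
examples = list(range(10))

ragged = batch(examples, local_batch_size, drop_remainder=False)
fixed = batch(examples, local_batch_size, drop_remainder=True)

print(local_batch_size)         # 4
print([len(b) for b in ragged])  # [4, 4, 2]
print([len(b) for b in fixed])   # [4, 4]
```

Without `drop_remainder=True`, the last batch has a different size from the others, which is exactly the situation a fixed batch size rules out.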

Finally, you should use `tf.keras.utils.experimental.DatasetCreator` along with the special input option `experimental_fetch_to_device=False` as seen below.

In [None]:
global_batch_size = 8

def _input_dataset(context: tf.distribute.InputContext):
  dataset = tf.data.Dataset.from_tensor_slices((
      {"dense_feature": features,
       "sparse_feature": tf.SparseTensor(
           embedding_features_indices,
           embedding_features_values, [1, 2])},
           labels))
  dataset = dataset.shuffle(10).repeat()
  dataset = dataset.batch(
      context.get_per_replica_batch_size(global_batch_size),
      drop_remainder=True)
  return dataset.prefetch(2)

def _eval_dataset(context: tf.distribute.InputContext):
  dataset = tf.data.Dataset.from_tensor_slices((
      {"dense_feature": eval_features,
       "sparse_feature": tf.SparseTensor(
           eval_embedding_features_indices,
           eval_embedding_features_values, [1, 2])},
           eval_labels))
  dataset = dataset.repeat()
  dataset = dataset.batch(
      context.get_per_replica_batch_size(global_batch_size),
      drop_remainder=True)
  return dataset.prefetch(2)

input_options = tf.distribute.InputOptions(
    experimental_fetch_to_device=False)

input_dataset = tf.keras.utils.experimental.DatasetCreator(
    _input_dataset, input_options=input_options)

eval_dataset = tf.keras.utils.experimental.DatasetCreator(
    _eval_dataset, input_options=input_options)

Next, once your data is prepared, you will create a `TPUStrategy`, and define your model, metrics, and optimizer under the scope of this strategy. You should pick a number for `steps_per_execution` in `Model.compile` because it specifies the number of batches to run during each `tf.function` call, and is critical for performance. This argument is similar to `iterations_per_loop` used in `TPUEstimator`.
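
As a back-of-the-envelope sketch of why `steps_per_execution` matters, the plain-Python snippet below estimates how many `tf.function` calls one epoch needs. It assumes that each call runs up to `steps_per_execution` batches and that a shorter final call covers any remainder; the helper `calls_per_epoch` is hypothetical and not part of Keras.

```python
import math

def calls_per_epoch(steps_per_epoch, steps_per_execution):
  """Estimates the number of tf.function calls needed for one epoch."""
  return math.ceil(steps_per_epoch / steps_per_execution)

# Values used in this guide: 10 steps per epoch, 10 steps per execution.
print(calls_per_epoch(10, 10))  # 1
# Running one batch per call needs ten times as many calls.
print(calls_per_epoch(10, 1))   # 10
```

Fewer calls mean fewer round trips between the host and the TPU, which is the same trade-off that `iterations_per_loop` controls in `TPUEstimator`.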

The feature and table configurations that were previously specified via `tf1.tpu.experimental.embedding_column` and `tf1.tpu.experimental.shared_embedding_column` are now specified directly via a pair of configuration objects, `tf.tpu.experimental.embedding.FeatureConfig` and `tf.tpu.experimental.embedding.TableConfig`. See the associated documentation for more details.

In [None]:
strategy = tf.distribute.TPUStrategy(cluster_resolver)
with strategy.scope():
  optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.05)
  dense_input = tf.keras.Input(shape=(2,), dtype=tf.float32, batch_size=global_batch_size)
  sparse_input = tf.keras.Input(shape=(), dtype=tf.int32, batch_size=global_batch_size)
  embedded_input = tfrs.layers.embedding.TPUEmbedding(
      feature_config=tf.tpu.experimental.embedding.FeatureConfig(
          table=tf.tpu.experimental.embedding.TableConfig(
              vocabulary_size=10,
              dim=5,
              initializer=tf.initializers.TruncatedNormal(mean=0.0, stddev=1)),
          name="sparse_input"),
      optimizer=optimizer)(sparse_input)
  # Avoid shadowing the built-in `input` function.
  concatenated_input = tf.keras.layers.Concatenate(axis=1)([dense_input, embedded_input])
  result = tf.keras.layers.Dense(1)(concatenated_input)
  model = tf.keras.Model(inputs={"dense_feature": dense_input, "sparse_feature": sparse_input}, outputs=result)
  model.compile(optimizer, "mse", steps_per_execution=10)

With that, you are ready to train the model with the training dataset:

In [None]:
model.fit(input_dataset, epochs=5, steps_per_epoch=10)

Then evaluate the model using the evaluation dataset:

In [None]:
model.evaluate(eval_dataset, steps=1, return_dict=True)

For further customization, such as a custom optimization procedure, Keras APIs allow you to define your own training step. See [Customize what happens in Model.fit](https://www.tensorflow.org/guide/keras/customizing_what_happens_in_fit) for an example of this workflow, and [Use TPUs](https://www.tensorflow.org/guide/tpu) for more information on TPU usage, including an example of a custom training loop.