6 contributors

Users who have contributed to this file

@rsepassi @lukaszkaiser @stefan-it @martinpopel @nikiparmar @joel-shor

Running on Cloud TPUs

Tensor2Tensor supports running on Google Cloud Platform's TPUs, chips specialized for ML training. See the official tutorials for running the T2T Transformer for text on Cloud TPUs and the Transformer for Speech Recognition.

Other models on TPU

Many of Tensor2Tensor's models work on TPU.

You can provision a VM and TPU with ctpu up. Use the t2t-trainer command on the VM as usual with the additional flags --use_tpu and --cloud_tpu_name=$TPU_NAME.

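Note that the TPUEstimator does not catch the OutOfRangeError during evaluation, so you should ensure that --eval_steps is small enough that evaluation does not exhaust the evaluation data. Each evaluation step consumes one batch, so --eval_steps multiplied by the evaluation batch size should not exceed the number of evaluation examples.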
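As a rough illustration of choosing --eval_steps, the sketch below computes the largest step count that stays within the evaluation data. The numbers are assumptions for illustration only: 10,000 evaluation examples (the size of the CIFAR-10 test set) and a hypothetical global batch size of 1024. Substitute the values for your own problem and hparams set.

```shell
# Hypothetical values; replace with the real ones for your problem and hparams.
EVAL_EXAMPLES=10000   # assumed number of evaluation examples
BATCH_SIZE=1024       # assumed global evaluation batch size

# Integer division rounds down, so the result never overruns the data.
MAX_EVAL_STEPS=$(( EVAL_EXAMPLES / BATCH_SIZE ))

echo "--eval_steps should be at most ${MAX_EVAL_STEPS}"
```

Rounding down means the last partial batch is skipped rather than read past the end of the data.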

A non-exhaustive list of T2T models that work on TPU:

  • Image generation: imagetransformer with imagetransformer_base_tpu (or imagetransformer_tiny_tpu)
  • Super-resolution: img2img_transformer with img2img_transformer_base_tpu (or img2img_transformer_tiny_tpu)
  • resnet with resnet_50 (or resnet_18 or resnet_34)
  • revnet with revnet_104 (or revnet_38_cifar)
  • shake_shake with shakeshake_tpu (or shakeshake_small)

Example invocation

Use ctpu up to bring up the VM and TPU machines; once the machines are ready it will SSH you into the VM and you can run the following:

# DATA_DIR and OUT_DIR should be GCS buckets
# TPU_NAME should have been set automatically by the ctpu tool

t2t-trainer \
  --model=shake_shake \
  --hparams_set=shakeshake_tpu \
  --problem=image_cifar10 \
  --train_steps=180000 \
  --eval_steps=9 \
  --local_eval_frequency=100 \
  --data_dir=$DATA_DIR \
  --output_dir=$OUT_DIR \
  --use_tpu \
  --cloud_tpu_name=$TPU_NAME
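
Since DATA_DIR and OUT_DIR should be GCS buckets, it may help to check them before starting a long run. The following is a minimal sketch, not part of Tensor2Tensor: is_gcs_path is a hypothetical helper, and the gs://my-bucket/... defaults are placeholder values.

```shell
# Hypothetical helper: succeeds only if the argument looks like gs://<something>.
is_gcs_path() {
  case "$1" in
    gs://?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Placeholder defaults for illustration; replace with your own bucket paths.
DATA_DIR="${DATA_DIR:-gs://my-bucket/t2t/data}"
OUT_DIR="${OUT_DIR:-gs://my-bucket/t2t/output}"

# Warn about any directory that is not a GCS path.
for dir in "$DATA_DIR" "$OUT_DIR"; do
  if ! is_gcs_path "$dir"; then
    echo "Not a GCS path: $dir" >&2
  fi
done
```

The check only inspects the gs:// prefix; it does not verify that the bucket exists or is accessible.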