## Getting Started with PyTorch on Cloud TPUs

This notebook will show you how to:

* Install PyTorch/XLA on Colab, which lets you use PyTorch with TPUs.
* Run basic PyTorch functions on TPUs, like creating and adding tensors.
* Run PyTorch modules and autograd on TPUs.
* Run PyTorch networks on TPUs.

PyTorch/XLA is a package that lets PyTorch connect to Cloud TPUs and use TPU cores as devices. Colab provides a free Cloud TPU system (a remote CPU host plus four TPU chips with two cores each), and installing PyTorch/XLA takes only a couple of minutes.

To use PyTorch on Cloud TPUs in your own Colab notebook, you can copy this one, or copy the setup cell below and configure your Colab environment to use TPUs.




<h3>  &nbsp;&nbsp;Use Colab Cloud TPU&nbsp;&nbsp; <a href="https://cloud.google.com/tpu/"><img valign="middle" src="https://raw.githubusercontent.com/GoogleCloudPlatform/tensorflow-without-a-phd/master/tensorflow-rl-pong/images/tpu-hexagon.png" width="50"></a></h3>

* On the main menu, click Runtime and select **Change runtime type**. Set "TPU" as the hardware accelerator.
* The cell below makes sure you have access to a TPU on Colab.


In [1]:
import os
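# .get() returns None when the variable is unset, so the assertion message is shown instead of a KeyError
assert os.environ.get('COLAB_TPU_ADDR'), 'Make sure to select TPU from Runtime > Change runtime type > Hardware accelerator'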

## Installing PyTorch/XLA

Run the following cell (or copy it into your own notebook!) to install PyTorch, Torchvision, and PyTorch/XLA. It will take a couple of minutes to run.

The PyTorch/XLA package lets PyTorch connect to Cloud TPUs. (It's named PyTorch/XLA, not PyTorch/TPU, because XLA is the name of the TPU compiler.) In particular, PyTorch/XLA makes TPU cores available as PyTorch devices. This lets PyTorch create and manipulate tensors on TPUs.

In [2]:
!pip install cloud-tpu-client==0.10 torch==2.0.0 torchvision==0.15.1 https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-2.0-cp310-cp310-linux_x86_64.whl

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting torch-xla==2.0
  Downloading https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-2.0-cp310-cp310-linux_x86_64.whl (162.9 MB)
Collecting cloud-tpu-client==0.10
  Downloading cloud_tpu_client-0.10-py3-none-any.whl (7.4 kB)
Collecting torch==2.0.0
  Downloading torch-2.0.0-cp310-cp310-manylinux1_x86_64.whl (619.9 MB)
Collecting torchvision==0.15.1
  Downloading torchvision-0.15.1-cp310-cp310-manylinux1_x86_64.whl (6.0 MB)
Collecting google-api-python-client==1.8.0 (from cloud-tpu-client==0.10)


## Creating and Manipulating Tensors on TPUs

PyTorch uses Cloud TPUs just like it uses CPU or CUDA devices, as the next few cells will show. Each core of a Cloud TPU is treated as a different PyTorch device.




In [3]:
# imports pytorch
import torch

# imports the torch_xla package
import torch_xla
import torch_xla.core.xla_model as xm

As mentioned above, the PyTorch/XLA package (`torch_xla`) lets PyTorch use TPU devices. The `xla_device()` function returns the TPU's "default" core as a device. This lets PyTorch create tensors on TPUs:

In [4]:
# Creates a tensor of ones on xla:1 (a Cloud TPU core)
dev = xm.xla_device()
t1 = torch.ones(3, 3, device = dev)
print(t1)

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], device='xla:1')


See the documentation at http://pytorch.org/xla/ for a description of all public PyTorch/XLA functions. Here `xm.xla_device()` acquired the first Cloud TPU core ('xla:1'). Other cores can be directly acquired, too:

In [5]:
# Creating a tensor on the second Cloud TPU core
second_dev = xm.xla_device(n=2, devkind='TPU')
t2 = torch.zeros(3, 3, device = second_dev)
print(t2)

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]], device='xla:2')


It is recommended that you use functions like `xm.xla_device()` over directly specifying TPU cores.

Tensors on TPUs can be manipulated like any other PyTorch tensor. The following cell adds two tensors, multiplies one by a scalar, and matrix-multiplies them on a TPU core:

In [6]:
a = torch.randn(2, 2, device = dev)
b = torch.randn(2, 2, device = dev)
print(a + b)
print(b * 2)
print(torch.matmul(a, b))

tensor([[-1.1846, -0.7140],
        [-0.3259, -0.5264]], device='xla:1')
tensor([[-0.9715, -1.2307],
        [-2.1193,  0.7613]], device='xla:1')
tensor([[ 0.4448,  0.3940],
        [ 0.6057, -0.7984]], device='xla:1')


This next cell runs a 1D convolution on a TPU core:

In [7]:
# Creates random filters and inputs to a 1D convolution
filters = torch.randn(33, 16, 3, device = dev)
inputs = torch.randn(20, 16, 50, device = dev)
torch.nn.functional.conv1d(inputs, filters)

tensor([[[ -2.2614,  -7.4375,  -3.0452,  ...,   6.4813,   6.0025,  -3.8181],
         [ -2.1178,  -1.2323,   6.3152,  ...,  -8.1402,   1.5390,  10.5330],
         [  3.7358,  -6.1666,  -5.3654,  ...,   3.9503,   6.6946,  -1.0387],
         ...,
         [ -1.0524,  -7.5402,  -6.6635,  ...,  -5.7106,  -9.5255,   9.1400],
         [-12.9870,   1.4063,  -6.9533,  ...,  10.5729,   1.3097,  -5.2656],
         [  5.0329,   1.4415,   8.1006,  ...,  -3.4235,   3.5638,  -5.9472]],

        [[ -3.0059,   3.8605,   3.6280,  ...,  -8.1614, -13.1281,   5.2417],
         [ -3.7675,  -4.9035,  -1.3131,  ...,   4.4226, -11.7430,  11.3242],
         [-13.1958,   5.3812,   3.2664,  ...,  -4.4664,   5.2152,   2.1421],
         ...,
         [  5.9822,  -2.8872,   8.6605,  ...,  -9.1931,  -6.1449,  -6.7736],
         [  1.4102,   1.8250, -12.2252,  ...,   3.3475,  -7.8704,   1.3273],
         [  5.8575,  -0.6981,   3.5026,  ...,  -3.9181,   2.3322,   0.6250]],

        [[ -7.5734,  -1.9799, -13.0047,  ...
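
The shape of the convolution output above follows from the input and filter shapes. The following is a rough sketch in plain Python, assuming `conv1d`'s defaults (stride 1, no padding, dilation 1, a single group). The helper name `conv1d_output_shape` is hypothetical and is not part of PyTorch or PyTorch/XLA:

```python
def conv1d_output_shape(input_shape, filter_shape, stride=1, padding=0, dilation=1):
    """Returns the (batch, out_channels, length) shape produced by a 1D convolution.

    Assumes a single group, so the filter's channel count must match the input's.
    """
    batch, in_channels, length = input_shape
    out_channels, filter_in_channels, kernel_size = filter_shape
    if in_channels != filter_in_channels:
        raise ValueError("input and filter channel counts must match")
    # Standard output-length formula for a 1D convolution
    out_length = (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1
    return (batch, out_channels, out_length)


# Shapes taken from the cell above: inputs (20, 16, 50), filters (33, 16, 3)
print(conv1d_output_shape((20, 16, 50), (33, 16, 3)))  # (20, 33, 48)
```

With a kernel of size 3 and no padding, each of the 33 filters slides over 48 positions of the length-50 input, which is why the result has shape `(20, 33, 48)`.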

Tensors can also be transferred between CPU and TPU. In the following cell, a tensor on the CPU is copied to a TPU core and then copied back to the CPU. PyTorch copies tensors when transferring them across devices, so `t_cpu` and `t_cpu_again` are different tensors.



In [8]:
# Creates a tensor on the CPU (device='cpu' is unnecessary and only added for clarity)
t_cpu = torch.randn(2, 2, device='cpu')
print(t_cpu)

t_tpu = t_cpu.to(dev)
print(t_tpu)

t_cpu_again = t_tpu.to('cpu')
print(t_cpu_again)

tensor([[-1.3384, -0.6064],
        [ 0.0678,  1.4400]])
tensor([[-1.3384, -0.6064],
        [ 0.0678,  1.4400]], device='xla:1')
tensor([[-1.3384, -0.6064],
        [ 0.0678,  1.4400]])


## Running PyTorch modules and autograd on TPUs

Modules and autograd are fundamental PyTorch components.

In PyTorch, every stateful function is a module. Modules are Python classes augmented with metadata that lets PyTorch understand how to use them in a neural network. For example, linear layers are modules, as are entire networks. Since modules are stateful, they can be placed on devices, too. PyTorch/XLA lets us place them on TPU cores:


In [9]:
# Creates a linear module
fc = torch.nn.Linear(5, 2, bias=True)

# Copies the module to the XLA device (the first Cloud TPU core)
fc = fc.to(dev)

# Creates a random feature tensor
features = torch.randn(3, 5, device=dev, requires_grad=True)

# Runs and prints the module
output = fc(features)
print(output)

tensor([[ 0.6056, -0.4776],
        [-0.3333, -0.4420],
        [-0.7993, -0.9726]], device='xla:1', grad_fn=<AddmmBackward0>)


Autograd is the system PyTorch uses to populate the gradients of weights in a neural network. See [here](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#sphx-glr-beginner-blitz-autograd-tutorial-py) for details about PyTorch's autograd. When a module is run on a TPU core, its gradients are also populated on the same TPU core by autograd. The following cell demonstrates this:

In [10]:
output.backward(torch.ones_like(output))
print(fc.weight.grad)

tensor([[ 1.6606,  1.4688,  1.9482,  0.6864, -1.0761],
        [ 1.6606,  1.4688,  1.9482,  0.6864, -1.0761]], device='xla:1')
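
Both rows of the gradient above are identical. One way to see why, sketched here in plain Python rather than PyTorch: for `output = features @ W.T + b`, the weight gradient is `upstream.T @ features`, so when the upstream gradient is all ones, every row of the weight gradient is the column-wise sum of `features`. The function name `linear_weight_grad` and the sample feature values are illustrative assumptions, not the notebook's actual random data:

```python
def linear_weight_grad(features, upstream):
    """Computes dL/dW = upstream^T @ features for output = features @ W^T + b."""
    n_out = len(upstream[0])
    n_in = len(features[0])
    return [
        [sum(upstream[i][o] * features[i][k] for i in range(len(features)))
         for k in range(n_in)]
        for o in range(n_out)
    ]


# Made-up 3x5 feature batch (the notebook's random values are not shown)
features = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [0.5, -1.0, 0.0, 2.0, 1.0],
    [-1.5, 0.0, 1.0, -2.0, 3.0],
]
# Mirrors torch.ones_like(output) for a 3x2 output
upstream = [[1.0, 1.0] for _ in features]

grad = linear_weight_grad(features, upstream)
print(grad)
```

Each row of `grad` equals the column sums of `features`, here `[0.0, 1.0, 4.0, 4.0, 9.0]`, which matches the repeated-row pattern in the output above.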


## Running PyTorch networks on TPUs

As mentioned above, PyTorch networks are also modules, and so they're run in the same way. The following cell runs a relatively simple PyTorch network from the [PyTorch tutorial docs](https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#sphx-glr-beginner-blitz-neural-networks-tutorial-py) on a TPU core:

In [11]:
import torch.nn as nn
import torch.nn.functional as F

# Simple example network from
# https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#sphx-glr-beginner-blitz-neural-networks-tutorial-py
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, you can specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


# Places network on the default TPU core
net = Net().to(dev)

# Creates random input on the default TPU core
input = torch.randn(1, 1, 32, 32, device=dev)

# Runs network
out = net(input)
print(out)

tensor([[-0.0527, -0.0063, -0.0312,  0.0100,  0.0611,  0.0544,  0.0016, -0.0216,
         -0.0166,  0.0461]], device='xla:1', grad_fn=<AddmmBackward0>)
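
The `16 * 6 * 6` input size of `fc1` comes from tracing the 32x32 input through the network. The following is a small sketch of that arithmetic, assuming the defaults used above (unpadded, stride-1 3x3 convolutions and 2x2 max pooling that rounds odd sizes down). The helper names `conv_out` and `pool_out` are hypothetical:

```python
def conv_out(size, kernel_size):
    """Spatial size after an unpadded, stride-1 convolution."""
    return size - kernel_size + 1


def pool_out(size, window):
    """Spatial size after max pooling whose stride equals its window (rounds down)."""
    return size // window


size = 32
size = pool_out(conv_out(size, 3), 2)  # conv1: 32 -> 30, pool: 30 -> 15
size = pool_out(conv_out(size, 3), 2)  # conv2: 15 -> 13, pool: 13 -> 6
flat_features = 16 * size * size       # conv2 produces 16 channels
print(size, flat_features)  # 6 576
```

This is the same value that `num_flat_features` computes at run time from the tensor's shape.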


As in the previous snippets, running PyTorch on a TPU just requires specifying a TPU core as a device.