### A.9.1 PyTorch computations on GPU devices
Modifying the training loop to run optionally on a GPU is relatively simple and only requires changing three lines of code (see section A.7). Before we make the modifications, it’s crucial to understand the main concept behind GPU computations within PyTorch.
In PyTorch, a device is where computations occur and data resides. The CPU and the GPU are examples of devices. A PyTorch tensor resides in a device, and its operations are executed on the same device.

Let’s see how this works in action. Assuming that you installed a GPU-compatible version of PyTorch (see section A.1.3), we can double-check that our runtime indeed supports GPU computing via the following code:

In [11]:
import torch
print(torch.cuda.is_available())
# This is False on Apple Silicon chips (such as M4, M2, M1): their GPUs aren't supported by CUDA, which is specific to NVIDIA GPUs.
# PyTorch can instead use the Apple GPU via Metal Performance Shaders (MPS), Apple's GPU framework.

False


In [12]:
print(torch.backends.mps.is_available())

True


In [13]:
# Set the device to MPS when it is available; otherwise fall back to the CPU
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
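
If the same code should also run on NVIDIA or CPU-only machines, the device selection can be generalized. The following is a minimal sketch rather than part of the original listing; the helper name `pick_device` is hypothetical, and the order of preference (CUDA, then MPS, then CPU) is an assumption:

```python
import torch


def pick_device() -> torch.device:
    """Return an available device, preferring CUDA, then MPS, then the CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.backends.mps may be missing in older PyTorch builds, so look it up defensively
    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is not None and mps_backend.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()
print(device)
```

On the Apple Silicon machine used here, this should select the same `mps` device as the one-line version above.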

Now, suppose we have two tensors that we can add; this computation will be carried out on the CPU by default:

In [14]:
tensor_1 = torch.tensor([1., 2., 3.])
tensor_2 = torch.tensor([4., 5., 6.])
print(tensor_1 + tensor_2)

tensor([5., 7., 9.])
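
To confirm where these tensors live, we can inspect their `.device` attribute. This is a small sketch that assumes PyTorch's default device has not been changed (for example, via `torch.set_default_device`):

```python
import torch

tensor_1 = torch.tensor([1., 2., 3.])
tensor_2 = torch.tensor([4., 5., 6.])
result = tensor_1 + tensor_2

# Every tensor carries a .device attribute describing where its data resides
print(tensor_1.device)
print(result.device)  # the result lives on the same device as its inputs
```

With default settings, both lines should report `cpu`.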


We can now use the `.to()` method, the same method we use to change a tensor’s datatype (see 2.2.2), to transfer these tensors onto a GPU and perform the addition there.
To move tensors to the GPU, pass either the device string or the `device` object defined earlier: `.to("mps")` or `.to(device)`.

In [15]:
tensor_1 = tensor_1.to("mps")
tensor_2 = tensor_2.to("mps")
print(tensor_1 + tensor_2)

tensor([5., 7., 9.], device='mps:0')


In [16]:
tensor_1 = tensor_1.to(device)
tensor_2 = tensor_2.to(device)
print(tensor_1 + tensor_2)

tensor([5., 7., 9.], device='mps:0')


The resulting tensor now includes the device information, `device='mps:0'`, which means that the tensors reside on the first (index 0) MPS device. On a machine with an NVIDIA GPU, the same code with `"cuda"` in place of `"mps"` would report `device='cuda:0'` instead. If such a machine hosts multiple GPUs, you can specify which GPU you’d like to transfer the tensors to by indicating the device ID in the transfer command. For instance, you can use `.to("cuda:0")`, `.to("cuda:1")`, and so on.

On Apple Silicon, the concept of multiple GPUs does not apply because these machines typically have only one integrated GPU, which is accessible via MPS. This simplifies device selection compared to CUDA-based systems.
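
To see which device strings are valid on a given machine, we can enumerate them. This is an illustrative sketch; the helper name `available_devices` is hypothetical, while `torch.cuda.device_count()` and `torch.backends.mps.is_available()` are existing PyTorch calls:

```python
import torch


def available_devices() -> list:
    """List the device strings that can be passed to .to() on this machine."""
    names = ["cpu"]
    # device_count() returns 0 when no CUDA GPU is present, so this adds nothing on Apple Silicon
    names += [f"cuda:{i}" for i in range(torch.cuda.device_count())]
    if torch.backends.mps.is_available():
        names.append("mps")
    return names


print(available_devices())
```

On an Apple Silicon machine, the list should contain only `cpu` and `mps`; a multi-GPU NVIDIA machine would list one `cuda:<id>` entry per GPU.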

However, all tensors involved in a computation must be on the same device. If one tensor resides on the CPU and the other on the GPU, the computation fails:

In [17]:
tensor_1 = tensor_1.to("cpu")
print(tensor_1 + tensor_2)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, mps:0 and cpu!
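
One way to resolve this error is to move one tensor to the device of the other before combining them. The following is a minimal, self-contained sketch that recreates the tensors and falls back to the CPU if MPS is unavailable:

```python
import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

tensor_1 = torch.tensor([1., 2., 3.])             # created on the CPU
tensor_2 = torch.tensor([4., 5., 6.]).to(device)  # moved to the GPU if one is available

# Align the devices: move tensor_1 to wherever tensor_2 resides
tensor_1 = tensor_1.to(tensor_2.device)
result = tensor_1 + tensor_2

# Copy the result back to the CPU before printing
print(result.cpu())
```

Reading the target from `tensor_2.device` rather than hardcoding `"mps"` keeps the snippet working on machines without a GPU.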