- As of mid-2019, the main PyTorch releases only provide acceleration on GPUs that support CUDA.
### 3.9.1 Managing a tensor's device attribute
- In addition to `dtype`, a PyTorch `Tensor` also has the notion of `device`, which is **where on the computer the tensor data is placed**.

In [1]:
import torch

In [2]:
points_gpu = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]], device='cuda')

- Copy a tensor created on the CPU onto the GPU using the `to` method:

In [3]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_gpu = points.to(device='cuda')

- Doing so returns a new tensor that has the same numerical data, but **stored in the RAM of the GPU**, rather than in regular system RAM.
- **In almost all cases, CPU- and GPU-based tensors expose the same user-facing API**, making it much easier to write code that is agnostic to where, exactly, the heavy number crunching is running.
- If our machine has more than one GPU, we can also choose which GPU the tensor is allocated on by passing a zero-based integer identifying the GPU on the machine:

In [4]:
points_gpu = points.to(device='cuda:0')
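
- Because the API is the same on both devices, the device can be chosen once and reused. The following is a minimal sketch, not taken from the original notes: the CPU fallback and the names `device` and `points_dev` are assumptions made so the code also runs on a machine without a CUDA GPU.

```python
import torch

# Assumption: fall back to the CPU when no CUDA device is visible.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_dev = points.to(device=device)  # same call for either device

# Number of CUDA devices PyTorch can see (0 on a CPU-only machine);
# valid indices for 'cuda:N' are 0 .. n_gpus - 1.
n_gpus = torch.cuda.device_count()
print(points_dev.device, n_gpus)
```

- The rest of the code can then use `points_dev` without caring where it lives.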

In [5]:
points_gpu = 2 * points.to(device='cuda')

- What happens in the computation above:
    - The `points` tensor is copied to the GPU.
    - A new tensor is allocated on the GPU and used to store the result of the multiplication.
    - A handle to that GPU tensor is returned.
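
- The three steps can be checked with a short sketch. The CPU fallback and the variable names are assumptions added for illustration. On a CPU-only machine `to` returns the original tensor instead of copying it, so step 1 is only a real copy when a GPU is present.

```python
import torch

# Assumption: fall back to the CPU when no CUDA device is visible.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

points_dev = points.to(device=device)  # step 1: copy to the target device
result = 2 * points_dev                # steps 2 and 3: new tensor, handle returned

# The result lives on the same device as its input...
print(result.device.type == device.type)
# ...in freshly allocated memory, separate from the input's storage.
print(result.data_ptr() != points_dev.data_ptr())
# The original tensor is untouched and still on the CPU.
print(points.device)
```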

In [6]:
points_gpu = points_gpu + 4

- The addition is still performed on the GPU, and no information flows to the CPU (**unless we print or access the resulting tensor**).
- In order to move the tensor back to the CPU:

In [7]:
points_cpu = points_gpu.to(device='cpu')
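
- The full round trip (multiply, add, move back) can be sketched as one script. The CPU fallback and the `total` variable are assumptions added so the example runs anywhere; calling `item()` is one way of accessing the result from Python.

```python
import torch

# Assumption: fall back to the CPU when no CUDA device is visible.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

points_dev = 2 * points.to(device=device)  # computed on the target device
points_dev = points_dev + 4                # still on the target device

points_cpu = points_dev.to(device='cpu')   # explicit transfer back to the CPU
total = points_cpu.sum().item()            # plain Python float

print(points_cpu.device, total)  # cpu 56.0
```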

- We can also use the shorthand methods `cpu` and `cuda` instead of the `to` method:

In [8]:
points_gpu = points.cuda()  # uses the current CUDA device (index 0 by default)
points_gpu = points.cuda(0)
points_cpu = points_gpu.cpu()

- By using the `to` method, we can change the placement and the data type simultaneously by providing both `device` and `dtype` as arguments.
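
- A minimal sketch of such a combined call follows. The choice of `torch.float64`, the CPU fallback, and the name `points_64` are assumptions for illustration.

```python
import torch

# Assumption: fall back to the CPU when no CUDA device is visible.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])  # float32 on the CPU

# Change placement and data type in a single call.
points_64 = points.to(device=device, dtype=torch.float64)

print(points_64.dtype)        # torch.float64
print(points_64.device.type)  # 'cuda' if a GPU is available, otherwise 'cpu'
print(points.dtype)           # the original tensor is unchanged: torch.float32
```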