# PyTorch Tutorial

**William Yue**

In this Jupyter notebook, my goal is to gain familiarity with PyTorch by following the [online tutorials](https://pytorch.org/tutorials/). Hopefully I will know how it works by the end.

## Table of Contents

1. [Introduction to PyTorch](#Introduction-to-PyTorch)<br />
    1.1 [Tensors](#Tensors)<br />
    1.2 [Datasets & DataLoaders](#Datasets-and-DataLoaders)<br />
    1.3 [Summary of Section 1](#Summary-of-Section-1)
    

## §1 Introduction to PyTorch

### 1.1 Tensors

Let's start by getting `torch` and `numpy` in here.

In [1]:
import torch
import numpy as np

Tensors appear to just be `torch`'s version of a matrix or multi-dimensional array, similar to `numpy`'s ndarrays. The difference is that tensors can run on GPUs or other fast hardware. They are also optimized for automatic differentiation.
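
To make the automatic differentiation point a bit more concrete, here is a minimal sketch. The function `y = sum(x^2)` and the values in `x` are made up for illustration; the only PyTorch pieces used are `requires_grad=True` and `.backward()`.

```python
import torch

# Hypothetical example: differentiate y = sum(x^2) with respect to x.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()

y.backward()    # fills x.grad with dy/dx = 2 * x
print(x.grad)   # tensor([2., 4., 6.])
```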

#### Initializing a Tensor

There are several ways to make a tensor:

In [2]:
data = [[1,2,3],[4,5,6]]
x_data = torch.tensor(data)
print(x_data)

tensor([[1, 2, 3],
        [4, 5, 6]])


In [3]:
np_data = np.arange(6).reshape(2,3)
x_np = torch.tensor(np_data)
print(x_np)

tensor([[0, 1, 2],
        [3, 4, 5]], dtype=torch.int32)
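
One thing worth knowing here: `torch.tensor` copies the data out of the `numpy` array, while `torch.from_numpy` shares memory with it. The sketch below (with made-up variable names `copied` and `shared`) shows the difference by modifying the array after both tensors are created.

```python
import numpy as np
import torch

np_data = np.arange(6).reshape(2, 3)

copied = torch.tensor(np_data)      # independent copy of the data
shared = torch.from_numpy(np_data)  # shares memory with np_data

np_data[0, 0] = 100                 # modify the NumPy array in place

print(copied[0, 0].item())  # 0: the copy is unaffected
print(shared[0, 0].item())  # 100: the shared tensor sees the change
```

The `dtype=torch.int32` in the output above presumably comes from NumPy's default integer type on this machine; as far as I know, other platforms can give `int64` instead.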


In [4]:
x_ones = torch.ones_like(x_data)
print(x_ones)

x_rand = torch.rand_like(x_data, dtype=torch.float)
print(x_rand)

tensor([[1, 1, 1],
        [1, 1, 1]])
tensor([[0.7008, 0.4108, 0.3072],
        [0.0762, 0.3754, 0.9614]])


Note that we have a problem if we don't convert the `dtype` in the `torch.rand_like` function:

In [5]:
x_rand_test = torch.rand_like(x_data)

RuntimeError: "check_uniform_bounds" not implemented for 'Long'

This appears to be because the initial tensor `x_data` has the datatype `long` (64-bit integer, according to the [documentation](https://pytorch.org/docs/stable/tensor_attributes.html)). `torch.rand_like` inherits that datatype unless told otherwise, and uniform sampling from the interval `[0,1)` is not implemented for integer types.
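
Two possible workarounds, sketched under the assumption that we either want floats in `[0,1)` or random integers in some range: override the `dtype` as before, or use `torch.randint_like`. The range `[0, 10)` is an arbitrary choice for illustration.

```python
import torch

x_data = torch.tensor([[1, 2, 3], [4, 5, 6]])  # dtype is torch.int64 (Long)

# Option 1: override the dtype so uniform sampling in [0, 1) is defined.
x_rand = torch.rand_like(x_data, dtype=torch.float)

# Option 2: stay with integers and sample from [low, high) instead.
x_randint = torch.randint_like(x_data, low=0, high=10)

print(x_rand.dtype, x_randint.dtype)  # torch.float32 torch.int64
```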

We can also directly specify the shape for `torch.rand`, `torch.ones`, and `torch.zeros`:

In [6]:
shape=(2,3)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)

print(rand_tensor,ones_tensor,zeros_tensor,sep='\n')

another_ones_tensor = torch.ones(4,5)
print(another_ones_tensor)

tensor([[0.5223, 0.4205, 0.5046],
        [0.7249, 0.7542, 0.8844]])
tensor([[1., 1., 1.],
        [1., 1., 1.]])
tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])


There are several attributes of tensors that we can check:

In [7]:
tensor = torch.rand(3,4)

print(tensor.shape)
print(tensor.dtype)
print(tensor.device)

torch.Size([3, 4])
torch.float32
cpu


#### Operations on Tensors

If you're observant, you'll notice that the tensor above is stored on the CPU! It turns out that by default, all tensors are created with `cpu` as their device. I'm on a Makerspace computer that comes with an NVIDIA GPU that supports CUDA, so we'll want to move the tensor to the GPU when possible. We can first check whether CUDA is available and then move the tensor there using the `.to` method.

In [8]:
if torch.cuda.is_available():
    tensor = tensor.to('cuda')

print(tensor)

tensor([[0.1523, 0.0669, 0.9925, 0.9683],
        [0.2053, 0.0651, 0.3430, 0.8538],
        [0.2522, 0.8512, 0.3672, 0.4854]], device='cuda:0')


Looks better now!
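
A pattern that should also run on machines without a GPU is to pick the device once and reuse it. This is only a sketch; the names `device`, `on_device`, `moved`, and `back_on_cpu` are my own.

```python
import torch

# Pick the GPU when CUDA is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Create a tensor directly on the chosen device...
on_device = torch.rand(3, 4, device=device)

# ...or move an existing CPU tensor over with .to().
moved = torch.rand(3, 4).to(device)

print(on_device.device, moved.device)

# Bring a tensor back to the CPU, e.g. before converting it to NumPy.
back_on_cpu = moved.cpu()
print(back_on_cpu.device)
```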

Tensors can be indexed and sliced much like `numpy` arrays.

In [9]:
tensor = torch.rand(4,4)
print(tensor)

print(tensor[0])
print(tensor[0:1])
print(tensor[:,0])
print(tensor[:,-1])

tensor([[0.9981, 0.7904, 0.4151, 0.4267],
        [0.6463, 0.6392, 0.4745, 0.2393],
        [0.8886, 0.1257, 0.5814, 0.9456],
        [0.7998, 0.0311, 0.5486, 0.9827]])
tensor([0.9981, 0.7904, 0.4151, 0.4267])
tensor([[0.9981, 0.7904, 0.4151, 0.4267]])
tensor([0.9981, 0.6463, 0.8886, 0.7998])
tensor([0.4267, 0.2393, 0.9456, 0.9827])


Note that, as with `numpy` arrays, `tensor[0]` and `tensor[0:1]` have different dimensionalities: the first is a 1-D tensor of shape `(4,)`, while the second is a 2-D tensor of shape `(1, 4)`. We can also do standard concatenation along a given `dim` (not `axis`) using `torch.cat`.

In [10]:
t0 = torch.cat([tensor]*3, dim=0)
t1 = torch.cat([tensor]*3, dim=1)

print(t0,t1,sep='\n ----- \n')

tensor([[0.9981, 0.7904, 0.4151, 0.4267],
        [0.6463, 0.6392, 0.4745, 0.2393],
        [0.8886, 0.1257, 0.5814, 0.9456],
        [0.7998, 0.0311, 0.5486, 0.9827],
        [0.9981, 0.7904, 0.4151, 0.4267],
        [0.6463, 0.6392, 0.4745, 0.2393],
        [0.8886, 0.1257, 0.5814, 0.9456],
        [0.7998, 0.0311, 0.5486, 0.9827],
        [0.9981, 0.7904, 0.4151, 0.4267],
        [0.6463, 0.6392, 0.4745, 0.2393],
        [0.8886, 0.1257, 0.5814, 0.9456],
        [0.7998, 0.0311, 0.5486, 0.9827]])
 ----- 
tensor([[0.9981, 0.7904, 0.4151, 0.4267, 0.9981, 0.7904, 0.4151, 0.4267, 0.9981,
         0.7904, 0.4151, 0.4267],
        [0.6463, 0.6392, 0.4745, 0.2393, 0.6463, 0.6392, 0.4745, 0.2393, 0.6463,
         0.6392, 0.4745, 0.2393],
        [0.8886, 0.1257, 0.5814, 0.9456, 0.8886, 0.1257, 0.5814, 0.9456, 0.8886,
         0.1257, 0.5814, 0.9456],
        [0.7998, 0.0311, 0.5486, 0.9827, 0.7998, 0.0311, 0.5486, 0.9827, 0.7998,
         0.0311, 0.5486, 0.9827]])
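
Printing the shapes instead of the values makes the indexing and concatenation behavior easier to see. The `torch.stack` call at the end is an addition of mine for contrast: it joins tensors along a new dimension rather than an existing one.

```python
import torch

tensor = torch.rand(4, 4)

print(tensor[0].shape)    # torch.Size([4]): an integer index drops the dimension
print(tensor[0:1].shape)  # torch.Size([1, 4]): a slice keeps the dimension

t0 = torch.cat([tensor] * 3, dim=0)
t1 = torch.cat([tensor] * 3, dim=1)
print(t0.shape)           # torch.Size([12, 4])
print(t1.shape)           # torch.Size([4, 12])

# torch.stack joins along a new dimension instead of an existing one.
t2 = torch.stack([tensor] * 3, dim=0)
print(t2.shape)           # torch.Size([3, 4, 4])
```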


We also have standard arithmetic operations. Three ways to do matrix multiplication are shown below: the `@` operator, the `.matmul` method, and `torch.matmul` with an `out=` tensor to write the result into. `y1`, `y2`, and `y3` should all have the same value. Note that `.T` transposes the matrix.

In [11]:
y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)
y3 = torch.rand(tensor.shape)
torch.matmul(tensor, tensor.T, out=y3)

tensor([[1.9753, 1.4494, 1.6310, 1.4699],
        [1.4494, 1.1087, 1.1568, 1.0323],
        [1.6310, 1.1568, 2.0375, 1.9628],
        [1.4699, 1.0323, 1.9628, 1.9074]])
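
Rather than comparing the three results by eye, we can check them with `torch.allclose`. In this sketch I use `torch.empty_like` for the output buffer (my own substitution), since its contents get overwritten anyway.

```python
import torch

tensor = torch.rand(4, 4)

y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)
y3 = torch.empty_like(y1)                # pre-allocated buffer, overwritten below
torch.matmul(tensor, tensor.T, out=y3)

print(torch.allclose(y1, y2), torch.allclose(y1, y3))
```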

If you want to do element-wise multiplication instead, you can use `*` or `mul`.

In [12]:
z1 = tensor * tensor
z2 = tensor.mul(tensor)
z3 = torch.rand(tensor.shape) # note that using torch.rand_like also works
torch.mul(tensor, tensor, out=z3)

tensor([[9.9627e-01, 6.2471e-01, 1.7228e-01, 1.8208e-01],
        [4.1770e-01, 4.0859e-01, 2.2516e-01, 5.7284e-02],
        [7.8954e-01, 1.5800e-02, 3.3798e-01, 8.9415e-01],
        [6.3963e-01, 9.6775e-04, 3.0098e-01, 9.6579e-01]])
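
With random values it is hard to tell the two kinds of multiplication apart, so here is a small sketch on a hand-picked 2x2 matrix (the matrix `a` is made up for illustration) where the results can be checked by hand.

```python
import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

matrix_product = a @ a.T   # rows dotted with rows: [[5, 11], [11, 25]]
elementwise = a * a        # each entry squared:    [[1, 4], [9, 16]]

print(matrix_product)
print(elementwise)
```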

If your tensor has one element (for example if you summed everything in the tensor), you can get that element out using `.item()`.

In [13]:
agg = tensor.sum()
print(agg, type(agg), sep='\n')
agg_item = agg.item()
print(agg_item, type(agg_item), sep='\n')

tensor(9.5331)
<class 'torch.Tensor'>
9.533109664916992
<class 'float'>
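
As a quick sketch of the limits of `.item()`: it only works on single-element tensors, and `.tolist()` is the method to reach for when there are more. The example tensor is made up, and I catch both `RuntimeError` and `ValueError` because I am not sure the exception type is the same across PyTorch versions.

```python
import torch

tensor = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

total = tensor.sum().item()   # single-element tensor -> Python float
print(total)                  # 10.0

try:
    tensor.item()             # more than one element: not allowed
except (RuntimeError, ValueError) as err:
    print("item() failed:", err)

values = tensor.tolist()      # nested Python lists of floats
print(values)                 # [[1.0, 2.0], [3.0, 4.0]]
```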


### 1.2 Datasets and DataLoaders

### 1.3 Summary of Section 1

**1.1 Tensors**
* Tensors are PyTorch's version of matrices, similar to `numpy`'s ndarrays.
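* There are many ways to create them: from a Python list or `numpy` array with `torch.tensor()` (or `torch.from_numpy()` for arrays), from an existing tensor's shape with `torch.ones_like()` and `torch.rand_like()`, and from an explicit shape with `torch.rand()`, `torch.ones()`, and `torch.zeros()`.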
* We can check whether CUDA is available using `torch.cuda.is_available()`. Tensors are created on device `cpu` by default, so we need to move them to the GPU using `.to('cuda')`.
* Tensors can be indexed and sliced like `numpy` arrays. There's normal slicing, `torch.cat` for joining tensors along dimension `dim`, and matrix multiplication or element-wise multiplication with `@` or `*`, respectively (`matmul` and `mul` also work).
* Given a tensor of a single element, we can use `.item()` to extract it.