# PyTorch introduction
The fundamental building block in PyTorch is the *tensor*. A PyTorch Tensor is conceptually identical to a NumPy array, and any computation you might want to perform with NumPy can also be accomplished with PyTorch Tensors.

Unlike NumPy arrays, Tensors can utilize GPUs to accelerate numeric computations.
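
To make the NumPy comparison concrete, here is a minimal sketch that runs the same computation on both sides and moves data between them. The array values and the variable names (`a`, `t`, `np_result`, `torch_result`) are illustrative assumptions, and the device is picked with `torch.cuda.is_available()` so the snippet should also run on a machine without a GPU.

```python
import numpy as np
import torch

# Use the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Illustrative data: a 2x3 array holding the values 0..5.
a = np.arange(6, dtype=np.float32).reshape(2, 3)
t = torch.from_numpy(a).to(device)   # NumPy array -> Tensor on `device`

# The same elementwise and matrix operations exist on both sides.
np_result = (a + 1) @ a.T
torch_result = torch.matmul(t + 1, t.T)

# A Tensor has to be on the CPU before it can be converted to NumPy.
print(np.allclose(np_result, torch_result.cpu().numpy()))
```

Note that `torch.from_numpy` shares memory with the source array as long as the Tensor stays on the CPU, so in-place changes to one are visible in the other.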

In [1]:
import torch
device = torch.device('cpu')
# device = torch.device('cuda')     # Run on GPU

# Create a random matrix
x = torch.randn(2,3, device=device)
print(x)
print(x+1)
print(torch.matmul(x,x.T))

tensor([[ 1.0224,  0.2589, -0.9608],
        [-0.5659,  0.1118,  2.2862]])
tensor([[2.0224, 1.2589, 0.0392],
        [0.4341, 1.1118, 3.2862]])
tensor([[ 2.0354, -2.7462],
        [-2.7462,  5.5594]])


## PyTorch: Autograd
Manually implementing the backward pass for large networks can quickly get complex.
PyTorch's `autograd` package uses *automatic differentiation* to compute the backward pass for us. The forward pass of a network defines a computational graph in which nodes are Tensors and edges are functions that produce output Tensors from input Tensors.

The only thing we need to do is specify `requires_grad=True` when constructing a Tensor.

If `x` is a Tensor with `requires_grad=True`, then after backpropagation `x.grad` will be a Tensor holding the gradient of some scalar value (typically the loss) with respect to `x`.
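
As a small illustration of what ends up in `.grad`, the sketch below differentiates the scalar `s = sum(x_i ** 2)`, whose gradient is `2 * x`. The input values and the name `s` are arbitrary choices made for this example.

```python
import torch

# Illustrative input; requires_grad=True asks autograd to track operations on x.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Forward pass: a scalar built from x. The result remembers how it was computed.
s = (x ** 2).sum()
print(s.requires_grad, s.grad_fn is not None)   # True True

# Backward pass: fills x.grad with ds/dx = 2 * x.
s.backward()
print(x.grad)   # tensor([2., 4., 6.])
```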

In [2]:
w = torch.tensor([0.1], requires_grad=True)
b = torch.tensor([2.0], requires_grad=True)

x = torch.tensor([0.0])
y = torch.tensor([1.0])

# Forward pass: compute predicted y using operations on Tensors. Since w and
# b have requires_grad=True, operations involving these Tensors will cause
# PyTorch to build a computational graph, allowing automatic computation of
# gradients. Since we are no longer implementing the backward pass by hand we
# don't need to keep references to intermediate values.

y_pred = w*x+b
print(f'True label: {y}', f'\nPredicted: {y_pred}')
loss = (y_pred - y).pow(2)

print(f'Loss: {loss.item()}')

loss.backward()

print(f'Gradient b: {b.grad}')
print(f'Gradient w: {w.grad}')

# Manually zero the gradients after running the backward pass
w.grad.zero_()
b.grad.zero_()

True label: tensor([1.]) 
Predicted: tensor([2.], grad_fn=<AddBackward0>)
Loss: 1.0
Gradient b: tensor([2.])
Gradient w: tensor([0.])


tensor([0.])
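
The explicit `zero_()` calls matter because `backward()` adds to whatever is already stored in `.grad` instead of overwriting it. The following sketch shows that accumulation on a one-parameter example; the values and the names `first`, `second`, and `third` are assumptions made for illustration.

```python
import torch

w = torch.tensor([0.1], requires_grad=True)
x = torch.tensor([3.0])

# First backward pass: d(w * x)/dw = x = 3.
(w * x).sum().backward()
first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added on top.
(w * x).sum().backward()
second = w.grad.clone()

# After zeroing in place, the next backward pass starts from a clean slate.
w.grad.zero_()
(w * x).sum().backward()
third = w.grad.clone()

print(first, second, third)   # tensor([3.]) tensor([6.]) tensor([3.])
```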

## PyTorch: New autograd functions
In PyTorch we can easily define our own autograd operator by defining a subclass of `torch.autograd.Function` and implementing the `forward` and `backward` functions.

Example of implementing our own ReLU function:

In [5]:
import torch 

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        x_out = x.clamp(min=0)
        print(f'MyReLU forward {x} -> {x_out}')
        return x_out
    
    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        grad_x = grad_output.clone()
        grad_x[x<0] = 0
        return grad_x

In [6]:
w = torch.tensor([0.1], requires_grad=True)
b = torch.tensor([2.0], requires_grad=True)

x = torch.tensor([0.0])
y = torch.tensor([1.0])

y_pred = MyReLU.apply(w*x+b)
print(f'True label: {y}', f'\nPredicted: {y_pred}')
loss = (y_pred - y).pow(2)

print(f'Loss: {loss.item()}')

loss.backward()

print(f'Gradient b: {b.grad}')
print(f'Gradient w: {w.grad}')

# Manually zero the gradients after running the backward pass
w.grad.zero_()
b.grad.zero_()

MyReLU forward tensor([2.], grad_fn=<AddBackward0>) -> tensor([2.])
True label: tensor([1.]) 
Predicted: tensor([2.], grad_fn=<MyReLUBackward>)
Loss: 1.0
Gradient b: tensor([2.])
Gradient w: tensor([0.])


tensor([0.])
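
One way to sanity-check a hand-written `backward` is `torch.autograd.gradcheck`, which compares it against finite-difference estimates. The sketch below is an assumption-laden illustration: `CheckedReLU` is a hypothetical copy of the ReLU function without the `print` call, the inputs are double precision (as `gradcheck` expects), and the test points are chosen away from 0, where ReLU is not differentiable.

```python
import torch

class CheckedReLU(torch.autograd.Function):
    """Hypothetical ReLU used only for this check (no printing in forward)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)   # keep the input for the backward pass
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        grad_x = grad_output.clone()
        grad_x[x < 0] = 0          # no gradient flows where the input was negative
        return grad_x

# Double-precision inputs, kept away from the kink at 0.
x = torch.tensor([-1.5, -0.3, 0.7, 2.0], dtype=torch.double, requires_grad=True)

# Returns True when analytical and numerical gradients agree, raises otherwise.
ok = torch.autograd.gradcheck(CheckedReLU.apply, (x,), eps=1e-6, atol=1e-4)
print(ok)
```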

## PyTorch: nn
For large neural networks, raw autograd can be a bit too low-level. 

When building neural networks we usually arrange the computation into layers, some of which have learnable parameters that are optimized during training.

In PyTorch the `nn` package defines a set of *Modules*, which are roughly equivalent to neural network layers.
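
To show what "learnable parameters" means for a single Module, this sketch inspects a `torch.nn.Linear` layer. The layer sizes (4 inputs, 3 outputs) and the batch size of 2 are arbitrary values chosen for the example.

```python
import torch

layer = torch.nn.Linear(4, 3)   # maps 4 input features to 3 output features

# A Linear module owns a weight matrix and a bias vector, both tracked by autograd.
shapes = {name: tuple(param.shape) for name, param in layer.named_parameters()}
print(shapes)                    # {'weight': (3, 4), 'bias': (3,)}

# Calling the module applies it to a batch of inputs.
out = layer(torch.randn(2, 4))
print(out.shape)                 # torch.Size([2, 3])
```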

In [8]:
import torch

device = torch.device('cpu')

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H,D_out),
).to(device)

loss_fn = torch.nn.MSELoss(reduction='sum')

In [16]:
learning_rate = 1e-4

y_pred = model(x) 

loss = loss_fn(y_pred, y)
print(loss.item())

model.zero_grad()

loss.backward()

# Update the weights using gradient descent. Each parameter is a Tensor, so
# we can read its gradient from .grad like we did before. The update runs
# inside torch.no_grad() so that it is not tracked by autograd.
with torch.no_grad():
    for param in model.parameters():
        param -= learning_rate * param.grad

415.1668395996094
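
The cell above performs a single gradient-descent step. In practice the step is repeated many times; the sketch below wraps it in a loop. The fixed seed and the step count of 50 are arbitrary assumptions for this illustration, and the exact loss values will differ from the output shown above.

```python
import torch

torch.manual_seed(0)   # fixed seed, only so repeated runs behave the same

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')
learning_rate = 1e-4

losses = []
for step in range(50):   # 50 steps is an arbitrary choice
    # Forward pass and loss.
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    losses.append(loss.item())

    # Clear old gradients, then backpropagate.
    model.zero_grad()
    loss.backward()

    # Plain gradient-descent update, excluded from autograd tracking.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

print(f'first loss: {losses[0]:.2f}, last loss: {losses[-1]:.2f}')
```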


## PyTorch: Custom nn Modules
You can define your own Modules by subclassing `nn.Module` and defining the `forward` pass.

In [None]:
import torch

class TwoLayerNet(torch.nn.Module):
  def __init__(self, D_in, H, D_out):
    """
    In the constructor we instantiate two nn.Linear modules and assign them as
    member variables.
    """
    super().__init__()
    self.linear1 = torch.nn.Linear(D_in, H)
    self.linear2 = torch.nn.Linear(H, D_out)

  def forward(self, x):
    """
    In the forward function we accept a Tensor of input data and we must return
    a Tensor of output data. We can use Modules defined in the constructor as
    well as arbitrary (differentiable) operations on Tensors.
    """
    h_relu = self.linear1(x).clamp(min=0)
    y_pred = self.linear2(h_relu)
    return y_pred
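
Assigning `nn.Linear` modules as attributes is enough for PyTorch to register their parameters with the enclosing Module. The sketch below checks this on a hypothetical, much smaller network (`TinyNet`, with sizes 5, 4, and 2 chosen only to keep the arithmetic easy to follow).

```python
import torch

class TinyNet(torch.nn.Module):
    """Hypothetical two-layer network, structured like the one above."""

    def __init__(self, d_in, h, d_out):
        super().__init__()
        self.linear1 = torch.nn.Linear(d_in, h)
        self.linear2 = torch.nn.Linear(h, d_out)

    def forward(self, x):
        # Linear -> ReLU (via clamp) -> Linear
        return self.linear2(self.linear1(x).clamp(min=0))

net = TinyNet(5, 4, 2)

# Parameters of the submodules are discovered automatically.
names = [name for name, _ in net.named_parameters()]
print(names)

# 5*4 weights + 4 biases + 4*2 weights + 2 biases = 34 values in total.
n_params = sum(p.numel() for p in net.parameters())
print(n_params)   # 34

out = net(torch.randn(3, 5))   # batch of 3 samples
print(out.shape)               # torch.Size([3, 2])
```

This automatic registration is what lets `model.parameters()` be handed directly to an optimizer.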

In [None]:
device = torch.device('cpu')

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Move the model to the same device as the data
model = TwoLayerNet(D_in, H, D_out).to(device)

loss_fn = torch.nn.MSELoss(reduction='sum')

# Define the optimization algorithm to be used (Stochastic Gradient Descent):
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

In [None]:
y_pred = model(x)

# Compute and print loss
loss = loss_fn(y_pred, y)
print(loss.item())

# Zero gradients, perform a backward pass, and update the weights.
optimizer.zero_grad()
loss.backward()
optimizer.step()
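
For plain SGD without momentum or weight decay, `optimizer.step()` should amount to the manual update `param -= lr * param.grad`. The sketch below checks this on two identical copies of a small model. The model size, the data, and the learning rate of `1e-2` are assumptions made for this illustration, not values from the cells above.

```python
import copy
import torch

torch.manual_seed(0)
lr = 1e-2   # illustrative learning rate

model_a = torch.nn.Linear(3, 2)      # updated by torch.optim.SGD
model_b = copy.deepcopy(model_a)     # identical copy, updated by hand
before = [p.detach().clone() for p in model_a.parameters()]

x = torch.randn(8, 3)
y = torch.randn(8, 2)
loss_fn = torch.nn.MSELoss(reduction='sum')

# One step with the optimizer.
optimizer = torch.optim.SGD(model_a.parameters(), lr=lr)
optimizer.zero_grad()
loss_fn(model_a(x), y).backward()
optimizer.step()

# The same step written out manually.
loss_fn(model_b(x), y).backward()
with torch.no_grad():
    for param in model_b.parameters():
        param -= lr * param.grad

# Both copies should hold (numerically) the same parameters afterwards.
same = all(torch.allclose(pa, pb)
           for pa, pb in zip(model_a.parameters(), model_b.parameters()))
moved = any(not torch.equal(pa, p0)
            for pa, p0 in zip(model_a.parameters(), before))
print(same, moved)
```

Other optimizers in `torch.optim` keep the same `zero_grad()` / `backward()` / `step()` pattern and differ only in the update rule applied inside `step()`.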