
In [0]:
import numpy as np
import sys
import torch 
import torchvision
import torch.nn as nn
import torchvision.transforms as transforms

### Check Package Versions

In [63]:
print('__Python VERSION:', sys.version)
print('__PyTorch VERSION:', torch.__version__)
print('__CUDNN VERSION:', torch.backends.cudnn.version())
print('__Number CUDA Devices:', torch.cuda.device_count())

__Python VERSION: 3.6.7 (default, Oct 22 2018, 11:32:17) 
[GCC 8.2.0]
__PyTorch VERSION: 1.0.1.post2
__CUDNN VERSION: 7402
__Number CUDA Devices: 1


### PyTorch
What is PyTorch?

It’s a Python-based scientific computing package targeted at two sets of audiences:

* A replacement for NumPy that uses the power of GPUs
* A deep learning research platform that provides maximum flexibility and speed





### Tensors

Tensors are similar to NumPy’s ndarrays, with the addition that Tensors can also be used on a GPU to accelerate computing.
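Many tensor constructors mirror their NumPy counterparts. The sketch below uses a few standard ones (`torch.zeros`, `torch.ones`, `torch.arange`, `torch.tensor`) and shows how to query a tensor's shape; the variable names are illustrative only.

```python
import numpy as np
import torch

# Constructors analogous to np.zeros / np.ones / np.arange / np.array
zeros = torch.zeros(5, 3)                           # 5x3 tensor filled with 0.0
ones = torch.ones(5, 3)                             # 5x3 tensor filled with 1.0
seq = torch.arange(6).reshape(2, 3)                 # [[0, 1, 2], [3, 4, 5]]
from_list = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # copies the Python data

# .size() and .shape both return a torch.Size (a subclass of tuple)
print(zeros.size())   # torch.Size([5, 3])
print(seq.shape)      # torch.Size([2, 3])

# Element-wise arithmetic and reductions work much as in NumPy
print((ones + 1).sum().item())                                # 30.0
print(np.allclose(seq.numpy(), np.arange(6).reshape(2, 3)))   # True
```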


Construct a 5x3 matrix, uninitialized

In [66]:
x = torch.empty(5, 3)   # uninitialized: values are whatever was already in memory
print(x)

tensor([[1.0646e-35, 0.0000e+00, 4.4842e-44],
        [0.0000e+00,        nan, 0.0000e+00],
        [4.2314e+21, 2.7152e-06, 3.3497e-09],
        [1.0386e+21, 5.4215e-05, 8.3387e-10],
        [1.3733e-05, 1.6502e-07, 2.7558e-09]])


In [67]:
# Addition: add a random 5x3 tensor (any nan/huge values come from the uninitialized x)
y = torch.rand(5, 3)
print(x + y)

tensor([[4.0797e-02, 5.9958e-02, 8.9297e-02],
        [5.0070e-01,        nan, 1.9553e-01],
        [4.2314e+21, 5.2750e-01, 4.6367e-01],
        [1.0386e+21, 8.7858e-01, 7.3029e-01],
        [3.5620e-01, 1.4906e-01, 2.3095e-01]])


In [68]:
# Addition: in-place
y.add_(x)

tensor([[4.0797e-02, 5.9958e-02, 8.9297e-02],
        [5.0070e-01,        nan, 1.9553e-01],
        [4.2314e+21, 5.2750e-01, 4.6367e-01],
        [1.0386e+21, 8.7858e-01, 7.3029e-01],
        [3.5620e-01, 1.4906e-01, 2.3095e-01]])
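The difference between out-of-place and in-place addition can be made explicit. This sketch uses small constant tensors (chosen here for readability, not taken from the cells above) and `data_ptr()` to confirm that `add_` writes into the existing storage, while `+` allocates a new tensor.

```python
import torch

x = torch.ones(2, 2)
y = torch.full((2, 2), 3.0)

# Out-of-place: `+` (or torch.add) allocates a new tensor; y is untouched
z = x + y          # all 4.0
print(y)           # still all 3.0

# In-place: methods whose names end in an underscore mutate their receiver
ptr_before = y.data_ptr()
y.add_(x)          # y is now all 4.0
print(y.data_ptr() == ptr_before)  # True: same storage, modified in place

# torch.add can also write into a preallocated output tensor
result = torch.empty(2, 2)
torch.add(x, y, out=result)        # result = 1.0 + 4.0 = all 5.0
```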

### Numpy Bridge
Converting a torch Tensor to a NumPy array and vice versa is a breeze.

For CPU tensors, the torch Tensor and the NumPy array share their underlying memory, so changing one changes the other.

Converting between NumPy arrays and torch Tensors

In [0]:
# Create a numpy array.
x = np.array([[1, 2], [3, 4]])

# Convert the numpy array to a torch tensor.
y = torch.from_numpy(x)

# Convert the torch tensor to a numpy array.
z = y.numpy()

In [0]:
# Conversion
a = np.array([1, 2, 3])
v = torch.from_numpy(a)         # Convert a numpy array to a Tensor

b = v.numpy()                   # Tensor to numpy
b[1] = -1                       # NumPy array and Tensor share the same memory
assert a[1] == b[1] == v[1].item()  # changing the NumPy array also changes the Tensor
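Memory is shared only by `torch.from_numpy` (and `.numpy()`); `torch.tensor(...)` always copies its input. A minimal comparison, using a float array chosen for illustration:

```python
import numpy as np
import torch

a = np.array([1.0, 2.0, 3.0])

shared = torch.from_numpy(a)   # shares memory with `a`
copied = torch.tensor(a)       # makes an independent copy

a[0] = 100.0                   # mutate the NumPy array in place
print(shared[0].item())        # 100.0: the change is visible through `shared`
print(copied[0].item())        # 1.0: the copy is unaffected
```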

All the Tensors on the CPU except a CharTensor support converting to NumPy and back.

### CUDA Tensors

Tensors can be moved onto the GPU using the `.cuda()` method.

In [0]:
# Move the tensors to the GPU only if CUDA is available

x = torch.rand(3,2)
y = torch.rand(3,2)
if torch.cuda.is_available():
    x = x.cuda()
    y = y.cuda()
    print(x + y)    # a bare `x + y` inside an if block would display nothing

In [77]:
x

tensor([[4.4668e-01, 2.9350e-01],
        [8.7424e-01, 2.2018e-01],
        [4.8721e-04, 5.0994e-01]], device='cuda:0')

In [78]:
y

tensor([[0.6657, 0.0082],
        [0.9423, 0.1514],
        [0.5457, 0.5076]], device='cuda:0')
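A common alternative to `.cuda()` is to pick a `torch.device` once and move tensors with `.to(device)`, so the same code runs with or without a GPU. The sketch below follows that pattern; note that `.numpy()` only works on CPU tensors, so a GPU result has to be copied back with `.cpu()` first.

```python
import torch

# Pick the GPU when one is present, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(3, 2, device=device)   # allocate directly on `device`
y = torch.rand(3, 2).to(device)       # or move an existing tensor
z = x + y                             # both operands must live on the same device

# .numpy() requires a CPU tensor, so copy back first
z_np = z.cpu().numpy()
print(z.device, z_np.shape)
```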

## Autograd: automatic differentiation

Central to all neural networks in PyTorch is autograd, a core torch package for automatic differentiation. 


The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.
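To make "define-by-run" concrete, here is a sketch with data-dependent control flow. The function `f` is hypothetical, written only for this illustration: the number of loop iterations depends on the input value, so autograd records a different graph, and gives a different gradient, for different inputs.

```python
import torch

def f(x):
    # The loop count depends on the value of x, so the recorded graph
    # is rebuilt (and may differ) on every call.
    y = x * 2
    while y.norm() < 100:
        y = y * 2
    return y.sum()

x = torch.ones(3, requires_grad=True)
f(x).backward()
print(x.grad)    # tensor([64., 64., 64.]): x was multiplied by 2 six times

x2 = torch.full((3,), 10.0, requires_grad=True)
f(x2).backward()
print(x2.grad)   # tensor([8., 8., 8.]): only three doublings were needed
```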

Let us see this in simpler terms with some examples.

In [0]:
# Create a tensor and ask autograd to track operations on it
x = torch.ones((2,2), requires_grad=True)

# Do an operation on the tensor
y = x + 2

# Do more operations on y
z = y * y * 3
out = z.mean()

In [0]:
# Gradients
# ---------
# let's backprop now
# For a scalar ``out``, ``out.backward()`` is equivalent to ``out.backward(torch.tensor(1.))``
out.backward()

In [89]:
# Print gradients d(out)/dx

print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
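Where does 4.5 come from? With four elements, out = (1/4) * sum_i 3 * (x_i + 2)^2, so d(out)/dx_i = (3/2) * (x_i + 2), which equals 1.5 * 3 = 4.5 at x_i = 1. The sketch below recomputes the example and compares autograd's result with this hand-derived formula.

```python
import torch

x = torch.ones((2, 2), requires_grad=True)
out = (3 * (x + 2) ** 2).mean()
out.backward()

# Hand derivation: d(out)/dx_i = (3/2) * (x_i + 2)
manual = 1.5 * (x.detach() + 2)
print(torch.allclose(x.grad, manual))  # True
```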


### Basic autograd example 1 

In [0]:
# Create tensors.
x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

In [0]:
# Build a computational graph.
y = w * x + b    # y = 2 * x + 3

In [0]:
# Compute gradients.
y.backward()

In [15]:
# Print out the gradients.
print(x.grad)    # x.grad = 2 
print(w.grad)    # w.grad = 1 
print(b.grad)    # b.grad = 1 

tensor(2.)
tensor(1.)
tensor(1.)


In [4]:
# Detach from the graph before converting to NumPy
y.detach().numpy()

array(5., dtype=float32)
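Calling `.numpy()` directly on a tensor that requires grad raises a `RuntimeError`, which is why the cell above detaches first. A short sketch of the two usual ways to obtain untracked values, `detach()` for a single tensor and the `torch.no_grad()` context for a whole block:

```python
import torch

w = torch.tensor(2., requires_grad=True)
x = torch.tensor(1.)

y = w * x + 3
print(y.requires_grad)            # True: y is part of the graph

y_const = y.detach()              # same value, cut off from the graph
print(y_const.requires_grad)      # False

with torch.no_grad():             # no graph is recorded inside this block
    y_eval = w * x + 3
print(y_eval.requires_grad)       # False
```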

### Basic autograd example 2  

In [16]:
# Create tensors of shape (10, 3) and (10, 2).
x = torch.randn(10, 3)
y = torch.randn(10, 2)
print(x.shape, y.shape)

torch.Size([10, 3]) torch.Size([10, 2])


In [17]:
# Build a fully connected layer.
linear = nn.Linear(3, 2)
print('w: ', linear.weight)
print('b: ', linear.bias)

w:  Parameter containing:
tensor([[-0.5096, -0.4019, -0.4009],
        [ 0.4762,  0.3286, -0.5562]], requires_grad=True)
b:  Parameter containing:
tensor([-0.0573, -0.4369], requires_grad=True)
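`nn.Linear(3, 2)` stores a weight of shape (2, 3) and a bias of shape (2,), and maps each input row x to x W^T + b. A quick check of that equivalence, with a fixed seed chosen only so the sketch is reproducible:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(10, 3)
linear = nn.Linear(3, 2)   # weight: (2, 3), bias: (2,)

# nn.Linear applies y = x W^T + b to every row of x
manual = x @ linear.weight.t() + linear.bias
print(torch.allclose(linear(x), manual, atol=1e-6))  # True
print(linear.weight.shape, linear.bias.shape)
```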


In [24]:
# Sum of squared errors divided by the batch size N: L = sum((xW^T + b - y)^2) / N
loss = torch.sum((linear(x)-y)**2)/y.shape[0]
print('loss: ', loss.detach().numpy())

loss:  2.1739974


In [0]:
loss.backward()

In [35]:
print('w grad: ', linear.weight.grad)
print('b grad: ', linear.bias.grad)

w grad:  tensor([[-0.5087, -0.1274, -0.4962],
        [ 0.7756,  1.2394, -1.5578]])
b grad:  tensor([ 0.1228, -0.2947])


In [50]:
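# Check the gradients against the closed-form expressions
# dL/dW = (2/N) (xW^T + b - y)^T x   and   dL/db = (2/N) sum_rows(xW^T + b - y)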
print('w grad:', (linear(x)-y).transpose(0,1).mm(x)/y.shape[0]*2)
print('b grad:', 2*torch.mean(linear(x)-y, dim=0))

w grad: tensor([[-0.5087, -0.1274, -0.4962],
        [ 0.7756,  1.2394, -1.5578]], grad_fn=<MulBackward0>)
b grad: tensor([ 0.1228, -0.2947], grad_fn=<MulBackward0>)
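Instead of deriving gradients by hand, PyTorch also ships `torch.autograd.gradcheck`, which compares autograd's gradients with finite-difference estimates (it expects double-precision inputs). The sketch below applies it to the same loss as above, written as a hypothetical `loss_fn(W, b)` with random double-precision data, and then repeats the closed-form check using `torch.autograd.grad`.

```python
import torch
from torch.autograd import gradcheck

torch.manual_seed(0)
N = 10
x = torch.randn(N, 3, dtype=torch.double)
y = torch.randn(N, 2, dtype=torch.double)
W = torch.randn(2, 3, dtype=torch.double, requires_grad=True)
b = torch.randn(2, dtype=torch.double, requires_grad=True)

def loss_fn(W, b):
    # Same loss as above: sum of squared errors divided by the batch size N
    return ((x @ W.t() + b - y) ** 2).sum() / N

# Numerical check of autograd's gradients (returns True or raises)
print(gradcheck(loss_fn, (W, b)))

# Closed-form gradients, as in the "check grad" cell above
gW, gb = torch.autograd.grad(loss_fn(W, b), (W, b))
residual = (x @ W.t() + b - y).detach()
print(torch.allclose(gW, 2 * residual.t() @ x / N))   # True
print(torch.allclose(gb, 2 * residual.mean(dim=0)))   # True
```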


### Basic autograd example 3

In [0]:
# Create tensors of shape (10, 3) and (10, 2).
x = torch.randn(10, 3)
y = torch.randn(10, 2)
linear = nn.Linear(3, 2)
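Note that `nn.MSELoss()` is not the same loss as in example 2. With its default `reduction='mean'` it averages over all N x 2 = 20 elements, so it is half of the example 2 loss, which divided only by N. A sketch checking this on random stand-in predictions:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
pred = torch.randn(10, 2)
y = torch.randn(10, 2)

criterion = nn.MSELoss()             # default reduction='mean'
mse = criterion(pred, y)

# 'mean' averages over all 10 * 2 = 20 elements ...
print(torch.allclose(mse, ((pred - y) ** 2).mean()))                  # True
# ... which is half the example 2 loss (divided only by N = 10)
print(torch.allclose(mse, ((pred - y) ** 2).sum() / y.shape[0] / 2))  # True

# reduction='sum' gives the plain sum of squared errors
sse = nn.MSELoss(reduction='sum')(pred, y)
print(torch.allclose(sse, ((pred - y) ** 2).sum()))                   # True
```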

In [52]:
# Build loss function and optimizer.
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(linear.parameters(), lr=0.01)

# Forward pass.
pred = linear(x)

# Compute loss.
loss = criterion(pred, y)
print('loss: ', loss.item())

# Backward pass.
loss.backward()

# Print out the gradients.
print('dL/dw: ', linear.weight.grad)
print('dL/db: ', linear.bias.grad)

# 1-step gradient descent.
optimizer.step()

# You can also perform gradient descent at the low level
# (the update itself must not be tracked by autograd, hence torch.no_grad()):
# with torch.no_grad():
#     linear.weight -= 0.01 * linear.weight.grad
#     linear.bias -= 0.01 * linear.bias.grad

# Print out the loss after 1-step gradient descent.
pred = linear(x)
loss = criterion(pred, y)
print('loss after 1 step optimization: ', loss.item())

loss:  1.4257314205169678
dL/dw:  tensor([[-0.3429, -0.3756,  0.2482],
        [-0.7196, -1.0730, -0.1675]])
dL/db:  tensor([ 0.6538, -0.1833])
loss after 1 step optimization:  1.401167392730713
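Running more than one step needs one extra call: `.backward()` accumulates into `.grad` rather than overwriting it, so the gradients must be cleared each iteration with `optimizer.zero_grad()`. A minimal training-loop sketch on the same kind of random data (seed and step count chosen arbitrarily for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(10, 3)
y = torch.randn(10, 2)
linear = nn.Linear(3, 2)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(linear.parameters(), lr=0.01)

losses = []
for step in range(100):
    optimizer.zero_grad()            # clear gradients left over from the previous step
    loss = criterion(linear(x), y)   # forward pass
    loss.backward()                  # populate .grad for weight and bias
    optimizer.step()                 # w <- w - lr * grad
    losses.append(loss.item())

print(losses[0], losses[-1])         # the loss should decrease over the steps
```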

