## 2023 Navy AI Specialist Training Program Day 01: Tutorial 1 (PyTorch, Neural Network)
### TA: Jihyoung Jang (jihyoung@unist.ac.kr)

# Part 0: Check GPU status
* Why do we put "!" before the command below?

In [1]:
!nvidia-smi

Mon Sep 11 01:42:39 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   43C    P8     9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

# Part 1: Load PyTorch
* PyTorch is a deep learning framework.
* PyTorch is already installed in Google Colab.

In [2]:
import torch

In [3]:
print(torch.__version__)

2.0.1+cu118


# Part 2: Basic Tensor Manipulation

A tensor is PyTorch's basic data structure (i.e., its basic building block). Each tensor is a multi-dimensional array.

In [4]:
t0 = torch.zeros(1, 3)
print(t0)

tensor([[0., 0., 0.]])


In [5]:
t1 = torch.tensor([0, 1, 2, 3, 4])
t2 = torch.tensor([5 ,6 ,7, 8, 9])
print(t1)
print(t2)

tensor([0, 1, 2, 3, 4])
tensor([5, 6, 7, 8, 9])


In [6]:
print(t1.dim())
print(t1.shape)
print(t1.size())

1
torch.Size([5])
torch.Size([5])


In [7]:
t_cat = torch.cat([t1, t2], dim=0)
print(t_cat)

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


In [8]:
print(t_cat.dim())
print(t_cat.shape)
print(t_cat.size())

1
torch.Size([10])
torch.Size([10])


In [9]:
t_reshape1 = t_cat.view(5, 2)
print(t_reshape1)
print(t_reshape1.dim())
print(t_reshape1.shape)

tensor([[0, 1],
        [2, 3],
        [4, 5],
        [6, 7],
        [8, 9]])
2
torch.Size([5, 2])
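
As a small aside on `view`: one dimension can be passed as `-1` so that PyTorch infers it, and the result shares memory with the original tensor instead of copying it. The sketch below assumes a tensor with the same values as `t_cat`; the names `t` and `v` are illustrative and not part of the notebook.

```python
import torch

t = torch.arange(10)   # same values as t_cat above
v = t.view(-1, 2)      # -1 lets PyTorch infer the first dimension (10 / 2 = 5)
print(v.shape)

# view does not copy data: writing through v also changes t
v[0, 0] = 100
print(t[0])
```

If an independent copy is needed, calling `.clone()` on the result is one option.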


### Exercise
Write code that creates the following tensor:
```
tensor([[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]])
```

In [10]:
##### TODO #####
t_reshape2 = t_cat.view(2, 5)
print(t_reshape2)
print(t_reshape2.dim())
print(t_reshape2.shape)

tensor([[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]])
2
torch.Size([2, 5])


In [11]:
t3 = torch.tensor([1, 2, 3])
t3_repeat = t3.repeat(3)

print(t3)
print(t3_repeat)

tensor([1, 2, 3])
tensor([1, 2, 3, 1, 2, 3, 1, 2, 3])


### Exercise
Write code that creates the following tensor:
```
tensor([[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]])
```

In [12]:
##### TODO #####
t3_repeat = t3.repeat(3, 1)
t3_repeat

tensor([[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]])

In [13]:
t3_stack = torch.stack([t3, t3, t3])

print(t3_stack)

tensor([[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]])
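
`repeat`, `stack`, and `cat` can give similar-looking results, so a short comparison may help. In this sketch, `torch.stack` creates a new dimension, while `torch.cat` extends an existing one. The names `stacked`, `catted`, and `same` are illustrative.

```python
import torch

t3 = torch.tensor([1, 2, 3])

stacked = torch.stack([t3, t3, t3])       # new leading dimension -> shape (3, 3)
catted = torch.cat([t3, t3, t3], dim=0)   # existing dimension grows -> shape (9,)
print(stacked.shape, catted.shape)

# Stacking is equivalent to adding a dimension to each tensor and then concatenating.
same = torch.equal(stacked, torch.cat([t3.unsqueeze(0)] * 3, dim=0))
print(same)
```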


In [14]:
t4 = torch.tensor([
                  [1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]
                  ], dtype=torch.float32)
print(t4)
print(t4.dtype)

tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])
torch.float32


In [15]:
t4[0]

tensor([1., 2., 3.])

In [16]:
t4[0, :]

tensor([1., 2., 3.])

In [17]:
t4[:, 0]

tensor([1., 4., 7.])

In [18]:
t4[0:3]

tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])

In [19]:
t4[:]

tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])

In [20]:
t4[:, 2]

tensor([3., 6., 9.])

### Exercise
Write code that creates the following tensor:
```
tensor([[5., 6.],
        [8., 9.]])

```

In [21]:
##### TODO #####
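
One possible solution, offered as a sketch rather than the only answer: slice both dimensions at once, taking rows 1 onward and columns 1 onward. `t4` is redefined here so the block runs on its own, and the name `t4_sub` is illustrative.

```python
import torch

t4 = torch.tensor([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]], dtype=torch.float32)

# rows 1.. and columns 1.. select the bottom-right 2x2 block
t4_sub = t4[1:, 1:]
print(t4_sub)
```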


# Part 3: Tensor on GPU

In [22]:
print(torch.cuda.is_available())

True


In [23]:
a = torch.ones(3)
b = torch.randn(100, 50, 3)

In [24]:
a

tensor([1., 1., 1.])

In [25]:
b

tensor([[[ 0.3015,  0.8100,  1.7689],
         [ 0.3123,  3.7744, -1.1475],
         [ 1.1690, -0.4432,  0.6728],
         ...,
         [ 0.9330,  0.9485, -0.9469],
         [-0.7034,  0.0529,  0.0380],
         [-0.6898, -0.4165,  2.2390]],

        [[ 0.0565, -0.6544, -0.7873],
         [ 0.3363, -0.6204, -0.5109],
         [ 1.1398, -0.3315,  2.2327],
         ...,
         [ 0.1869,  0.6094,  0.7444],
         [ 0.2430,  0.5762,  1.6998],
         [-0.3301,  1.9211, -1.4964]],

        [[-0.3609,  0.5699, -0.1222],
         [ 0.3629,  0.9838,  0.1390],
         [ 0.2538, -2.2331,  0.9392],
         ...,
         [ 0.8425, -0.2181, -0.0249],
         [-0.7116, -1.0874,  0.6799],
         [-0.5124, -1.1516,  0.1588]],

        ...,

        [[-0.8300, -1.0494, -0.2530],
         [-1.1417,  0.3095,  0.8921],
         [-1.2366,  0.6332, -2.1407],
         ...,
         [-1.1204,  0.9501, -1.0690],
         [-0.5497, -0.8638, -0.0920],
         [-1.1678, -2.2471,  1.2417]],

        [[

In [26]:
print(b.size())

torch.Size([100, 50, 3])


In [27]:
print(a.device)
print(b.device)

cpu
cpu


In [28]:
c = a + b

In [29]:
print(c.device)

cpu


In [30]:
# upload a and b to GPU
a = a.to('cuda')
b = b.to('cuda')

In [31]:
print(a.device)
print(b.device)

cuda:0
cuda:0


In [32]:
c = c.to('cpu')

In [33]:
print(c.device)

cpu


In [34]:
e = torch.ones(3).to('cuda')
d = torch.ones(3)

In [35]:
print(e.device)
print(d.device)

cuda:0
cpu


### Short Quiz
Why does the code below raise an error?

In [36]:
f = e + d

RuntimeError: ignored
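
A hint for the quiz, as a minimal sketch: arithmetic between two tensors generally requires both to live on the same device. The block falls back to the CPU when no GPU is available (an assumption added so it runs anywhere), in which case no error is raised.

```python
import torch

# Fall back to the CPU when no GPU is present so the sketch still runs.
device = 'cuda' if torch.cuda.is_available() else 'cpu'

e = torch.ones(3).to(device)
d = torch.ones(3)              # stays on the CPU

try:
    f = e + d                  # raises RuntimeError only when e is on the GPU
except RuntimeError as err:
    print('device mismatch:', err)

# Moving both operands to the same device makes the addition valid.
f = e + d.to(e.device)
print(f, f.device)
```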

# Part 4: Autograd
PyTorch can compute gradients automatically. Calling the `backward()` method on a result tells PyTorch to compute the gradients of that result with respect to the tensors that require them, and it stores each gradient in the tensor's `grad` attribute for later use.

In [37]:
x = torch.tensor([2.], requires_grad=True)
print(x)

tensor([2.], requires_grad=True)


Why is `x.grad` `None` here? What do you think about this result?

In [38]:
print(x.grad)

None


$y = x^2 \times 3 = 3x^2$

$\texttt{x.grad} = \frac{dy}{dx} = \frac{d(3x^2)}{dx} = 6x = 12 \quad \text{at } x = 2$

In [39]:
y = x * x * 3
y.backward()

print(x.grad)

tensor([12.])


In [40]:
z = x * x * 3
z.backward()

print(x.grad)

tensor([24.])
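
The second call prints 24 rather than 12 because `backward()` adds each new gradient to whatever is already stored in `x.grad`. The sketch below replays the two cells above and then resets the gradient in place with `zero_()`. The names `first`, `second`, `third`, and `w` are illustrative.

```python
import torch

x = torch.tensor([2.], requires_grad=True)

y = x * x * 3
y.backward()
first = x.grad.clone()    # 6 * 2 = 12

z = x * x * 3
z.backward()              # added to the stored gradient: 12 + 12 = 24
second = x.grad.clone()

x.grad.zero_()            # reset the accumulated gradient in place
w = x * x * 3
w.backward()
third = x.grad.clone()    # back to 12

print(first, second, third)
```

This accumulation is why training loops usually clear the gradients once per iteration, for example with `optimizer.zero_grad()`.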


### Exercise

In [41]:
x = torch.ones(2, 2, requires_grad=True)
print(x)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


In [42]:
y = x + 2
print(y)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)


In [43]:
z = y * y * 3
print(z)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>)


In [44]:
out = z.mean()
print(out)

tensor(27., grad_fn=<MeanBackward0>)


In [45]:
y.retain_grad()
z.retain_grad()
out.backward()

In [46]:
print(z.grad)

tensor([[0.2500, 0.2500],
        [0.2500, 0.2500]])


In [47]:
print(y.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])


In [48]:
print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
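
The three gradients printed above follow from the chain rule. A worked derivation, where $N = 4$ is the number of elements and the index $i$ runs over them (both symbols are introduced here for illustration):

$\mathrm{out} = \frac{1}{N}\sum_{i} z_i, \quad z_i = 3 y_i^2, \quad y_i = x_i + 2 = 3$

$\frac{\partial\, \mathrm{out}}{\partial z_i} = \frac{1}{N} = 0.25$

$\frac{\partial\, \mathrm{out}}{\partial y_i} = \frac{\partial\, \mathrm{out}}{\partial z_i} \cdot \frac{\partial z_i}{\partial y_i} = \frac{1}{4} \cdot 6 y_i = \frac{18}{4} = 4.5$

$\frac{\partial\, \mathrm{out}}{\partial x_i} = \frac{\partial\, \mathrm{out}}{\partial y_i} \cdot \frac{\partial y_i}{\partial x_i} = 4.5 \cdot 1 = 4.5$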


In [49]:
with torch.no_grad():
  x = torch.ones(2, 2, requires_grad=True)
  y = x + 2
  z = y * y * 3
  out = z.mean()

In [50]:
out

tensor(27.)

### Short Quiz
Why does the code below raise an error?

In [51]:
out.backward()

RuntimeError: ignored
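
A hint for the quiz, as a minimal sketch: operations run inside `torch.no_grad()` are not recorded, so their result has no `grad_fn` and there is no graph for `backward()` to walk. The names `out_no_grad`, `out_tracked`, and `failed` are illustrative.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)

with torch.no_grad():
    out_no_grad = (3 * (x + 2) ** 2).mean()   # computed without recording a graph

out_tracked = (3 * (x + 2) ** 2).mean()       # same computation, recorded

print(out_no_grad.requires_grad, out_no_grad.grad_fn)
print(out_tracked.requires_grad)

failed = False
try:
    out_no_grad.backward()                    # no graph to differentiate through
except RuntimeError as err:
    failed = True
    print('backward failed:', err)
```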

# Part 5: nn.Module
`torch.nn.Module` is the base class for all neural networks in PyTorch. A module can contain other modules, so a complex architecture can be built as a tree of smaller, reusable parts.

In [52]:
import torch.nn as nn

X = torch.tensor([[1., 2., 3.], [4., 5., 6.]])

print(X)
print(X.shape)

tensor([[1., 2., 3.],
        [4., 5., 6.]])
torch.Size([2, 3])


In [53]:
linear_fn = nn.Linear(3, 1)

In [54]:
linear_fn

Linear(in_features=3, out_features=1, bias=True)

In [55]:
Y = linear_fn(X)

print(Y)
print(Y.shape)

tensor([[ 0.0546],
        [-0.6098]], grad_fn=<AddmmBackward0>)
torch.Size([2, 1])
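
To make the layer less of a black box, the sketch below reproduces its output by hand. `nn.Linear(3, 1)` holds a weight of shape `(1, 3)` and a bias of shape `(1,)`, and computes `X @ W.T + b`. The weights are randomly initialized, so the printed values will differ from the output above. The names `Y_manual` and `matches` are illustrative.

```python
import torch
import torch.nn as nn

X = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
linear_fn = nn.Linear(3, 1)

print(linear_fn.weight.shape)   # (out_features, in_features)
print(linear_fn.bias.shape)     # (out_features,)

# Manual version of the layer: matrix product with the transposed weight, plus bias
Y_manual = X @ linear_fn.weight.T + linear_fn.bias
matches = torch.allclose(linear_fn(X), Y_manual)
print(matches)
```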


### How to design our own model?

In [56]:
class Model(nn.Module):
  def __init__(self, input_dim, output_dim, hidden_dim):
    super(Model, self).__init__()
    self.linear_1 = nn.Linear(input_dim, hidden_dim)
    self.linear_2 = nn.Linear(hidden_dim, output_dim)
    self.relu = nn.ReLU()
  def forward(self, x):
    x = self.linear_1(x)
    x = self.relu(x)
    x = self.linear_2(x)
    return x
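
A short usage sketch for the class above. The class is repeated so the block runs on its own, and the dimensions (3 inputs, 4 hidden units, 2 outputs) are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn

class Model(nn.Module):  # same definition as above
    def __init__(self, input_dim, output_dim, hidden_dim):
        super(Model, self).__init__()
        self.linear_1 = nn.Linear(input_dim, hidden_dim)
        self.linear_2 = nn.Linear(hidden_dim, output_dim)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.linear_1(x)
        x = self.relu(x)
        x = self.linear_2(x)
        return x

model = Model(input_dim=3, output_dim=2, hidden_dim=4)  # illustrative sizes
print(model)                                            # shows the module tree

X = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
Y = model(X)                                            # calling the model runs forward()
print(Y.shape)

# Submodule parameters are registered automatically: 3*4 + 4 + 4*2 + 2 = 26
n_params = sum(p.numel() for p in model.parameters())
print(n_params)
```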

# Part 6: MLP

In [57]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from torch.utils.data import DataLoader

import torchvision
import torchvision.transforms as transforms

In [58]:
train_dataset = torchvision.datasets.MNIST(root='./', train=True, transform=transforms.ToTensor(), download=True)
test_dataset = torchvision.datasets.MNIST(root='./', train=False, transform=transforms.ToTensor())

train_loader = DataLoader(dataset=train_dataset, batch_size=128, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=128, shuffle=False)

### Exercise
Write code that builds a simple neural network model with the following architecture:
```
NeuralNet(
  (fc1): Linear(in_features=784, out_features=20, bias=True)
  (fc2): Linear(in_features=20, out_features=20, bias=True)
  (fc3): Linear(in_features=20, out_features=10, bias=True)
  (sigmoid): Sigmoid()
)
```

In [63]:
##### TODO #####
class NeuralNet(nn.Module):
  def __init__(self, input_size, hidden_size, output_size):
    super(NeuralNet, self).__init__()
    self.fc1 = nn.Linear(input_size, hidden_size)
    self.fc2 = nn.Linear(hidden_size, hidden_size)
    self.fc3 = nn.Linear(hidden_size, output_size)
    self.sigmoid = nn.Sigmoid()
  def forward(self, x):
    out = self.fc1(x)
    out = self.sigmoid(out)
    out = self.fc2(out)
    out = self.sigmoid(out)
    out = self.fc3(out)
    return out

In [64]:
# input dim - 784
# hidden dim - 20
# output dim - 10 (why?)

model = NeuralNet(784, 20, 10)
model = model.to('cuda')
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
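
A note on the loss, as a hedged sketch: `nn.CrossEntropyLoss` takes raw scores (logits) and applies log-softmax internally, which is consistent with `forward` returning the output of `fc3` directly. With 10 classes, a model that predicts every class equally has a loss of ln(10) ≈ 2.30, close to the values at the start of the training log. The tensors `logits` and `labels` are made-up data for illustration.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # hypothetical batch: 4 samples, 10 classes
labels = torch.tensor([3, 0, 9, 1])   # hypothetical class indices

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, labels)

# The same value computed in two explicit steps
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(loss, manual))

# All-equal logits give a uniform prediction over 10 classes: loss = ln(10)
uniform_loss = criterion(torch.zeros(4, 10), labels)
print(uniform_loss.item(), math.log(10))
```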

In [65]:
total_step = len(train_loader)

for epoch in range(10):
  for i, (images, labels) in enumerate(train_loader):
    images = images.reshape(-1, 28*28).to('cuda')
    labels = labels.to('cuda')

    outputs = model(images)
    loss = criterion(outputs, labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (i+1) % 100 == 0:
      print(f'Epoch [{epoch+1}/10], Step [{i+1}/{total_step}], Loss: {loss.item():.4f}')

Epoch [1/10], Step [100/469], Loss: 2.3015
Epoch [1/10], Step [200/469], Loss: 2.2992
Epoch [1/10], Step [300/469], Loss: 2.3022
Epoch [1/10], Step [400/469], Loss: 2.2987
Epoch [2/10], Step [100/469], Loss: 2.2856
Epoch [2/10], Step [200/469], Loss: 2.2842
Epoch [2/10], Step [300/469], Loss: 2.2748
Epoch [2/10], Step [400/469], Loss: 2.2577
Epoch [3/10], Step [100/469], Loss: 2.2414
Epoch [3/10], Step [200/469], Loss: 2.2200
Epoch [3/10], Step [300/469], Loss: 2.1815
Epoch [3/10], Step [400/469], Loss: 2.1278
Epoch [4/10], Step [100/469], Loss: 1.9413
Epoch [4/10], Step [200/469], Loss: 1.9613
Epoch [4/10], Step [300/469], Loss: 1.8475
Epoch [4/10], Step [400/469], Loss: 1.8129
Epoch [5/10], Step [100/469], Loss: 1.6476
Epoch [5/10], Step [200/469], Loss: 1.5159
Epoch [5/10], Step [300/469], Loss: 1.5863
Epoch [5/10], Step [400/469], Loss: 1.4348
Epoch [6/10], Step [100/469], Loss: 1.3044
Epoch [6/10], Step [200/469], Loss: 1.2910
Epoch [6/10], Step [300/469], Loss: 1.1682
Epoch [6/10

### Exercise
How can we test the model's performance? Try it yourself!

In [66]:
with torch.no_grad():
  correct = 0
  total = 0
  for images, labels in test_loader:
    images = images.reshape(-1, 28*28).to('cuda')
    labels = labels.to('cuda')
    outputs = model(images)
    _, predicted = torch.max(outputs, 1)  # classification model -> take the top-1 predicted label
    total += labels.size(0)
    correct += (predicted == labels).sum().item()
  print('Accuracy of the network on the 10000 test images: {} %'.format(100 * correct / total))

Accuracy of the network on the 10000 test images: 83.13 %
