# Welcome to the first lab!

This lab is a refresher on Python, PyTorch, NumPy, and the Colab infrastructure that we are going to use during this class.

Rule of thumb #1: try to understand every single line of code in the lab notebook.

Rule of thumb #2: do not hesitate to ask a question about any line of code; this is precisely why we do these labs.

## Google Colab

https://research.google.com/colaboratory/faq.html

Main things to remember:

* It is free to use;

* Colab resources **are not guaranteed** and are not unlimited, and usage limits sometimes fluctuate;

* Colab notebooks are stored in Google Drive, or can be loaded from GitHub. 

In [None]:
# Colab effectively runs on a Linux virtual machine in the cloud
# a leading exclamation mark passes the rest of the line to the shell

! uname -a 

Linux 5cead4802c70 5.4.104+ #1 SMP Sat Jun 5 09:50:34 PDT 2021 x86_64 x86_64 x86_64 GNU/Linux


In [None]:
# Colab gives you write access to part of the virtual file system

! pwd
! echo 'file on colab' > test.txt
! ls -lh
! cat test.txt

/content
total 8.0K
drwxr-xr-x 1 root root 4.0K Sep  1 19:26 sample_data
-rw-r--r-- 1 root root   14 Sep  9 15:21 test.txt
file on colab


### Data munging: local filesystem / google drive / direct download from the internet

more details here: https://colab.research.google.com/notebooks/io.ipynb#scrollTo=vz-jH8T_Uk2c

In [None]:
# uploading from your local filesystem

from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

KeyboardInterrupt: ignored

## Installing external packages

In [None]:
# list currently installed packages

! pip list

Package                       Version
----------------------------- --------------
absl-py                       0.12.0
alabaster                     0.7.12
albumentations                0.1.12
altair                        4.1.0
appdirs                       1.4.4
argcomplete                   1.12.3
argon2-cffi                   21.1.0
arviz                         0.11.2
astor                         0.8.1
astropy                       4.3.1
astunparse                    1.6.3
atari-py                      0.2.9
atomicwrites                  1.4.0
attrs                         21.2.0
audioread                     2.1.9
autograd                      1.3
Babel                         2.9.1
backcall                      0.2.0
beautifulsoup4                4.6.3
bleach                        4.0.0
blis                          0.4.1
bokeh                         2.3.3
Bottleneck                    1.3.2
branca                        0.4.2
bs4                           0.0.1
CacheControl

In [None]:
# install the Hugging Face `datasets` library, which we use for NLP datasets

! pip install datasets

Collecting datasets
  Downloading datasets-1.11.0-py3-none-any.whl (264 kB)
     |████████████████████████████████| 264 kB 5.3 MB/s 
Collecting xxhash
  Downloading xxhash-2.0.2-cp37-cp37m-manylinux2010_x86_64.whl (243 kB)
     |████████████████████████████████| 243 kB 41.3 MB/s 
Collecting huggingface-hub<0.1.0
  Downloading huggingface_hub-0.0.16-py3-none-any.whl (50 kB)
     |████████████████████████████████| 50 kB 6.8 MB/s 
Collecting fsspec>=2021.05.0
  Downloading fsspec-2021.8.1-py3-none-any.whl (119 kB)
     |████████████████████████████████| 119 kB 47.1 MB/s 
Installing collected packages: xxhash, huggingface-hub, fsspec, datasets
Successfully installed datasets-1.11.0 fsspec-2021.8.1 huggingface-hub-0.0.16 xxhash-2.0.2


## Python

Python refresher material is adapted from https://colab.research.google.com/drive/1pfIa4Mfynjsi7F38P8J6lF1qaGlolzzw

For this lab and much of machine learning, we will use Python for its ease of understanding and large library ecosystem. Let's briefly go over Python syntax and features.

Python is an [imperative language](https://en.wikipedia.org/wiki/Imperative_programming) based on [statements](https://en.wikipedia.org/wiki/Statement_(computer_science)). That is, programs in Python consist of lines composed of statements. Among other things, a statement can be:
* a single expression, e.g. `5 + 5`
* an assignment, e.g. `x = 5`
    * variable names can be any length and can consist of uppercase and lowercase letters (A-Z, a-z), digits (0-9), and the underscore character (_), except they cannot start with a digit
* a function call, e.g. `print(x)`
* (comments are not statements: Python ignores everything from `#` to the end of the line)

### Built-in Data Types

* Numbers: there are two important numerical types, `int` and `float`
  * integers: `1`, `-3`, etc.
  * floating-point: `1.0`, `3.14`, etc.
* strings: `'apple'`, `"v"`
* boolean values: `True`, `False`
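
As a quick illustration of the types listed above, the following sketch inspects a few literal values with the built-in `type()` function. The variable names (`values`, `type_names`, `true_div`, `floor_div`) are made up for this example.

```python
# Inspect the built-in type of a few literal values.
values = [1, -3, 3.14, "apple", True]
type_names = [type(v).__name__ for v in values]
print(type_names)

# Dividing two ints with / always yields a float,
# while // performs floor division and keeps the result an int.
true_div = 7 / 2
floor_div = 7 // 2
print(true_div, floor_div)
```

Note that `/` and `//` behave differently on integers, which matters whenever you compute averages or ratios.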

### Functions

Functions are defined with the following syntax:
```
def function_name(arg1, arg2=default2, ...):
    # function body
    return
```
Most of the time, your functions should return a value (possibly multiple values, separated by commas) using the `return` keyword, but this isn't a requirement. If you don't explicitly return something, the function will return the special `None` value by default.

Functions are called by using the function name followed by parentheses.
If you use the function name without parentheses, you are referring to the function itself, as an object.

Python contains many built-in operators and functions. Math operators (e.g. `+`, `-`, `/`, `*`) are written between their operands, while built-in functions such as `print()`, `int()`, `sum()`, or `len()` must be called using parentheses.
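
The points above about default arguments, multiple return values, the implicit `None`, and functions as objects can be sketched as follows. The helper names `min_and_max` and `log_message` are hypothetical and exist only for this illustration.

```python
def min_and_max(numbers, default=None):
    """Return the smallest and largest item, or (default, default) if empty."""
    if not numbers:
        return default, default
    return min(numbers), max(numbers)  # two values are packed into one tuple


def log_message(text):
    print(text)  # no return statement, so a call evaluates to None


lowest, highest = min_and_max([4, 1, 7])  # unpack the returned tuple
nothing = log_message("hello")            # nothing is None
alias = min_and_max                       # no parentheses: the function object itself
print(lowest, highest, nothing, alias([2, 9]))
```

Assigning a function to another name, as with `alias`, does not call it; the call only happens once parentheses are added.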

**(exercise)** Write a function that takes in two numbers and computes their mean.

In [None]:
def mean(a, b):
    # true division (/) keeps the fractional part; floor division (//) would drop it
    mean_result = (a + b) / 2
    return mean_result

In [None]:
mean(1,6)

3.5

### Data Structures

Python also has three built-in data structures that are very useful:

**Lists** are ordered, mutable sequences and are created using square brackets (`[]`) with comma-separated values. We can access list elements using the list name followed by the index of the element in brackets; indices start at 0.

In [None]:
# lists
l = [1, 2, 3]
print(l[0]) # indexing
print(l[1])
print(l[-1]) # negative indexing
print(l[:2]) # slicing

ll = [1, "a", []] # list elements don't need to be the same type

lll = [] # defining empty list, also `list()`
print(lll)
lll.append(1) # add to a list
lll.append(2)
print(lll)
print(len(lll)) # get the length

print(ll + lll) # concatenation of two lists

1
2
3
[1, 2]
[]
[1, 2]
2
[1, 'a', [], 1, 2]


**Dictionaries**, or hash tables, are collections of key-value pairs. We create dictionaries using curly braces (`{}`). We can access the value associated with a particular key by typing the name of the dictionary followed by the key in brackets. The key can be given as a literal (such as a string or a number) or as a variable holding one.


In [None]:
d = {"apple": "a fruit", "banana": "an herb", "monkey": "a mammal"}
print(len(d)) # number of key-value pairs in d
print(d["apple"]) # accessing an element

d['broccoli'] = 'a vegetable' # assign a new key-value pair
del d['apple'] # delete a key-value pair
print("apple" in d) # check membership
key = "banana"
print(key in d) # variables as keys

print(d.keys()) # view of the keys (in insertion order since Python 3.7)
print(d.values()) # view of the values, in the same order as the keys

3
a fruit
False
True
dict_keys(['banana', 'monkey', 'broccoli'])
dict_values(['an herb', 'a mammal', 'a vegetable'])


**Tuples** are also ordered lists of items, but unlike lists they cannot be changed (i.e., they're immutable). Tuples are created using parentheses and their elements can also be accessed using brackets.


In [None]:
t = (1, 2, "cow")
print(t[-1])

cow
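
To make the immutability of tuples concrete, here is a small sketch. The names `mutated`, `extended`, and `grid` are made up for this example.

```python
t = (1, 2, "cow")

try:
    t[0] = 100  # item assignment is not supported on tuples
    mutated = True
except TypeError:
    mutated = False

# Building a new tuple is fine; the original is left untouched.
extended = t + ("horse",)
print(mutated, t, extended)

# Tuples of hashable items are themselves hashable,
# so unlike lists they can be used as dictionary keys.
grid = {(0, 0): "origin", (1, 2): "somewhere else"}
print(grid[(1, 2)])
```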


# PyTorch

The material in this section is borrowed from [here](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html)

**What is PyTorch?**

PyTorch is a Python-based scientific computing package serving two broad purposes:
* A replacement for NumPy to use the power of GPUs and other accelerators.
* An automatic differentiation library that is useful to implement neural networks.

## Tensors

Tensors are a specialized data structure that is very similar to arrays and matrices. In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model’s parameters.

Tensors are similar to NumPy’s ndarrays, except that tensors can run on GPUs or other specialized hardware to accelerate computing. If you’re familiar with ndarrays, you’ll be right at home with the Tensor API. If not, follow along in this quick API walkthrough.

In [None]:
import torch
import numpy as np

### Tensor Initialization

Tensors can be initialized in various ways. Take a look at the following examples:

**Directly from data**

Tensors can be created directly from data. The data type is automatically inferred.

In [None]:
data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)

In [None]:
x_data

tensor([[1, 2],
        [3, 4]])

**From a NumPy array**

Tensors can be created from NumPy arrays (and vice versa).

In [None]:
np_array = np.array(data)
x_np = torch.from_numpy(np_array)

**From another tensor:**

The new tensor retains the properties (shape, datatype) of the argument tensor, unless explicitly overridden.

In [None]:
x_ones = torch.ones_like(x_data) # retains the properties of x_data
print(f"Ones Tensor: \n {x_ones} \n")

x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data
print(f"Random Tensor: \n {x_rand} \n")

Ones Tensor: 
 tensor([[1, 1],
        [1, 1]]) 

Random Tensor: 
 tensor([[0.8700, 0.8070],
        [0.2064, 0.7080]]) 



**With random or constant values:**

`shape` is a tuple of tensor dimensions. In functions such as `torch.rand`, `torch.ones`, and `torch.zeros`, it determines the dimensionality of the output tensor.
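
A minimal sketch of this kind of initialization is shown below. The particular shape `(2, 3)` is an arbitrary choice for illustration.

```python
import torch

shape = (2, 3)                    # 2 rows, 3 columns
rand_tensor = torch.rand(shape)   # uniform samples from the interval [0, 1)
ones_tensor = torch.ones(shape)   # all elements equal to 1
zeros_tensor = torch.zeros(shape) # all elements equal to 0

print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")
```

The random values differ on every run unless a seed is set, for example with `torch.manual_seed`.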

### Tensor Attributes

Tensor attributes describe their shape, datatype, and the device on which they are stored.

In [None]:
tensor = torch.rand(3,4)

print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")

Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


### Tensor Operations

Over 100 tensor operations, including transposing, indexing, slicing, mathematical operations, linear algebra, random sampling, and more are comprehensively described here: https://pytorch.org/docs/stable/index.html

Most of them can be run on the GPU (at typically higher speeds than on a CPU). If you’re using Colab, allocate a GPU by going to **Runtime -> Change runtime type.**

In [None]:
# We move our tensor to the GPU if available
import torch
tensor = torch.rand(3,4)

if torch.cuda.is_available():
  tensor = tensor.to('cuda')

In [None]:
tensor

tensor([[0.0061, 0.6835, 0.5389, 0.4224],
        [0.9294, 0.4265, 0.3622, 0.3288],
        [0.7888, 0.5223, 0.3090, 0.0559]], device='cuda:0')

Try out some of the operations from the list. If you’re familiar with the NumPy API, you’ll find the Tensor API a breeze to use.

**Standard numpy-like indexing and slicing:**

In [None]:
tensor = torch.ones(4, 4)
tensor[:,1] = 0
print(tensor)

tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])


**Joining tensors** You can use `torch.cat` to concatenate a sequence of tensors along a given dimension. See also `torch.stack`, another tensor joining op that is subtly different from `torch.cat`.

In [None]:
t1 = torch.cat([tensor, tensor, tensor], dim=1)
print(t1)

tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.]])
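
The difference between `torch.cat` and `torch.stack` is easiest to see from the resulting shapes. The sketch below uses a fresh 4x4 tensor of ones; the names `catted` and `stacked` are made up for this example.

```python
import torch

tensor = torch.ones(4, 4)

# cat joins along an existing dimension: three (4, 4) tensors give (4, 12)
catted = torch.cat([tensor, tensor, tensor], dim=1)

# stack inserts a new dimension: three (4, 4) tensors give (3, 4, 4)
stacked = torch.stack([tensor, tensor, tensor], dim=0)

print(catted.shape)
print(stacked.shape)
```

In short, `torch.cat` keeps the number of dimensions unchanged, while `torch.stack` adds one.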


**Multiplying tensors**

In [None]:
# This computes the element-wise product
print(f"tensor.mul(tensor) \n {tensor.mul(tensor)} \n")
# Alternative syntax:
print(f"tensor * tensor \n {tensor * tensor}")

tensor.mul(tensor) 
 tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]]) 

tensor * tensor 
 tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])


The following computes the matrix product of two tensors:



In [None]:
print(f"tensor.matmul(tensor.T) \n {tensor.matmul(tensor.T)} \n")
# Alternative syntax:
print(f"tensor @ tensor.T \n {tensor @ tensor.T}")

tensor.matmul(tensor.T) 
 tensor([[3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.]]) 

tensor @ tensor.T 
 tensor([[3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.]])


**In-place operations** Operations that have a `_` suffix are in-place. For example, `x.copy_(y)` and `x.t_()` will change `x`.

In [None]:
print(tensor, "\n")
tensor.add_(5)
print(tensor)

tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]]) 

tensor([[6., 5., 6., 6.],
        [6., 5., 6., 6.],
        [6., 5., 6., 6.],
        [6., 5., 6., 6.]])


**Broadcasting** is a set of rules that allows operations on two tensors whose shapes do not match exactly. We encourage students who feel they lack experience with this topic to read more about broadcasting rules and shape manipulations.

Useful references: 
https://machinelearningknowledge.ai/pytorch-tutorial-for-reshape-squeeze-unsqueeze-flatten-and-view/

https://medium.com/ai%C2%B3-theory-practice-business/broadcasting-in-pytorch-numpy-36bbdef22dff
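
As a small, hedged illustration of broadcasting: shapes are compared from the trailing dimension backwards, and two dimensions are compatible when they are equal or when one of them is 1. The tensors below (`matrix`, `row`, `column`) are made-up examples.

```python
import torch

matrix = torch.ones(3, 4)                      # shape (3, 4)
row = torch.tensor([0., 1., 2., 3.])           # shape (4,)
column = torch.tensor([[10.], [20.], [30.]])   # shape (3, 1)

# (3, 4) + (4,): the row is added to every row of the matrix
matrix_plus_row = matrix + row
# (3, 4) + (3, 1): the column is added to every column of the matrix
matrix_plus_column = matrix + column
# (4,) + (3, 1): both operands are expanded to (3, 4)
outer_sum = row + column

print(matrix_plus_row.shape, matrix_plus_column.shape, outer_sum.shape)

# Incompatible trailing dimensions (4 vs. 3) raise an error.
try:
    matrix + torch.ones(3)
    broadcast_failed = False
except RuntimeError:
    broadcast_failed = True
print(broadcast_failed)
```

When shapes do not line up, `unsqueeze` or `reshape` can be used to add a dimension of size 1 in the right place.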


### Bridge with NumPy

Tensors on the CPU and NumPy arrays can share their underlying memory locations, and changing one will change the other.

### Tensor to NumPy array

In [None]:
t = torch.ones(5)
print(f"t: {t}")
n = t.numpy()
print(f"n: {n}")

t: tensor([1., 1., 1., 1., 1.])
n: [1. 1. 1. 1. 1.]


A change in the tensor is reflected in the NumPy array.

In [None]:
t.add_(1)
print(f"t: {t}")
print(f"n: {n}")

t: tensor([2., 2., 2., 2., 2.])
n: [2. 2. 2. 2. 2.]


### NumPy array to Tensor



In [None]:
n = np.ones(5)
t = torch.from_numpy(n) # t shares memory with n

In [None]:
t_1 = t.to('cuda') # the GPU copy does not share memory with n (requires a GPU runtime)

In [None]:
t_1

tensor([1., 1., 1., 1., 1.], device='cuda:0', dtype=torch.float64)

In [None]:
np.add(n,5, out=n)

array([6., 6., 6., 6., 6.])

In [None]:
t

tensor([6., 6., 6., 6., 6.], dtype=torch.float64)

In [None]:
t_1

tensor([1., 1., 1., 1., 1.], device='cuda:0', dtype=torch.float64)

Changes in the NumPy array are reflected in the CPU tensor, but not in the copy that was moved to the GPU.

In [None]:
np.add(n, 1, out=n)
print(f"t: {t}")
print(f"n: {n}")

t: tensor([7., 7., 7., 7., 7.], dtype=torch.float64)
n: [7. 7. 7. 7. 7.]


## Models in PyTorch

PyTorch provides a submodule for implementing and building machine learning models called `torch.nn` (`nn` for neural networks). `nn` implements many common mathematical transformations as functions that can be chained together to build machine learning modules. Let's go over an example of creating and using logistic regression in PyTorch. 

In [None]:
import torch.nn as nn

class BinaryLogisticRegression(nn.Module):

  def __init__(self, input_dim):
    super(BinaryLogisticRegression, self).__init__()
    self.linear = nn.Linear(input_dim, 1)
    self.sigmoid = nn.Sigmoid()

  def forward(self, input):
    outputs = self.sigmoid(self.linear(input)) # includes the bias term
    return outputs

model = BinaryLogisticRegression(2) # instantiate model, this will implicitly initialize the parameters following default initialization scheme
print(f"Model: {model}")
print(f"Linear weights:\n {model.linear.weight}")
print(f"Linear bias:\n {model.linear.bias}")
print()
print(f"Dummy model outputs: {model(torch.rand(2))}")

Model: BinaryLogisticRegression(
  (linear): Linear(in_features=2, out_features=1, bias=True)
  (sigmoid): Sigmoid()
)
Linear weights:
 Parameter containing:
tensor([[-0.4173, -0.0211]], requires_grad=True)
Linear bias:
 Parameter containing:
tensor([-0.2439], requires_grad=True)

Dummy model outputs: tensor([0.3630], grad_fn=<SigmoidBackward>)
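
In practice a model is usually applied to a batch of examples at once. The sketch below redefines the same model so that it is self-contained and feeds it a made-up batch of 4 random examples; the names `batch`, `probs`, and `num_params` are assumptions for this illustration.

```python
import torch
import torch.nn as nn


class BinaryLogisticRegression(nn.Module):

    def __init__(self, input_dim):
        super().__init__()
        self.linear = nn.Linear(input_dim, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, inputs):
        return self.sigmoid(self.linear(inputs))


model = BinaryLogisticRegression(2)
batch = torch.rand(4, 2)  # 4 examples with 2 features each (random dummy data)
probs = model(batch)      # one probability per example, shape (4, 1)

# 2 weights + 1 bias = 3 trainable parameters
num_params = sum(p.numel() for p in model.parameters())
print(probs.shape, num_params)
```

`nn.Linear` operates on the last dimension of its input, which is why the same model accepts both a single example of shape `(2,)` and a batch of shape `(4, 2)`.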


## A Gentle Introduction To `torch.autograd`

`torch.autograd` is PyTorch’s automatic differentiation engine that powers neural network training. In this section, you will get a conceptual understanding of how autograd helps a neural network train.

### Background

Neural networks (NNs) are a collection of nested functions that are executed on some input data. These functions are defined by parameters (consisting of weights and biases), which in PyTorch are stored in tensors.

Training a NN happens in two steps:

**Forward Propagation:** In forward prop, the NN makes its best guess about the correct output. It runs the input data through each of its functions to make this guess.

**Backward Propagation:** In backprop, the NN adjusts its parameters proportionate to the error in its guess. It does this by traversing backwards from the output, collecting the derivatives of the error with respect to the parameters of the functions (gradients), and optimizing the parameters using gradient descent. For a more detailed walkthrough of backprop, [check out this video from 3Blue1Brown](https://www.youtube.com/watch?v=tIeHLnjs5U8).
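
The two steps can be sketched on a toy problem. This is a minimal illustration, not a full training recipe: the data (`y = 2x`), the single parameter `w`, the learning rate `lr`, and the number of steps are all assumptions chosen for this example.

```python
import torch

# Toy data generated from y = 2 * x; the model should learn w close to 2.
x = torch.tensor([1., 2., 3.])
y = 2 * x

w = torch.tensor(0., requires_grad=True)  # the single model parameter
lr = 0.05                                 # learning rate (step size)
losses = []

for step in range(20):
    pred = w * x                      # forward propagation: make a guess
    loss = ((pred - y) ** 2).mean()   # error of the guess (mean squared error)
    loss.backward()                   # backward propagation: compute d(loss)/dw
    with torch.no_grad():
        w -= lr * w.grad              # gradient descent update
    w.grad.zero_()                    # reset the gradient for the next step
    losses.append(loss.item())

print(w.item(), losses[0], losses[-1])
```

The gradient has to be reset after every update because `backward()` accumulates into `.grad` rather than overwriting it.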

### Differentiation in Autograd

In [None]:
import torch

a = torch.tensor(2., requires_grad=True)
b = torch.tensor(6., requires_grad=True)

We create another tensor Q from a and b.

$Q=3a^{3} - b^{2}$

In [None]:
Q = 3*a**3 - b**2

Let’s assume a and b to be parameters of an NN, and Q to be the error. In NN training, we want gradients of the error w.r.t. parameters, i.e.

$\frac{\partial Q}{\partial a} = 9a^{2}$

$\frac{\partial Q}{\partial b} = -2b$

When we call .backward() on Q, autograd calculates these gradients and stores them in the respective tensors’ .grad attribute.

Because `Q` is a scalar here, we can call `Q.backward()` without any arguments. If `Q` were a vector, we would need to explicitly pass a `gradient` argument to `Q.backward()`. `gradient` is a tensor of the same shape as `Q`, and it represents the gradient of `Q` w.r.t. itself, i.e.

$\frac{\partial Q}{\partial Q} = 1$

Equivalently, we can also aggregate Q into a scalar and call backward implicitly, like Q.sum().backward().



In [None]:
Q.backward()

Gradients are now deposited in a.grad and b.grad

In [None]:
# check if collected gradients are correct
print(9*a**2 == a.grad)
print(-2*b == b.grad)

tensor(True)
tensor(True)


In [None]:
a.grad

tensor(36.)
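
For comparison, here is a sketch of the vector case, where `Q` has more than one element and `backward()` needs an explicit `gradient` argument. The input values are arbitrary, and the name `external_grad` is an assumption for this illustration.

```python
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

Q = 3*a**3 - b**2  # Q is now a vector with two elements

# dQ/dQ = 1 for every element, so the gradient argument is a tensor of ones
external_grad = torch.ones_like(Q)
Q.backward(gradient=external_grad)

print(a.grad)  # equals 9 * a**2
print(b.grad)  # equals -2 * b
```

Calling `Q.sum().backward()` instead would produce the same values in `a.grad` and `b.grad`.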