
Introduction to Torch's tensor library
======================================

All of deep learning is computations on tensors, which are
generalizations of a matrix that can be indexed in more than 2
dimensions. We will see exactly what this means in depth later. First,
let's look at what we can do with tensors.



In [None]:
# Original Author: Robert Guthrie.
# Adapted by Amitabh Chaudhary

import torch
import torch.nn as nn
import torch.nn.functional as F


torch.manual_seed(1)

#### Creating Tensors

Tensors can be created from Python lists with the torch.tensor()
function.




In [None]:
# torch.tensor(data) creates a torch.Tensor object with the given data.
V_data = [1., 2., 3.]
V = torch.tensor(V_data)
print(V)

# Creates a matrix
M_data = [[1., 2., 3.], [4., 5., 6.]]
M = torch.tensor(M_data)
print(M)

# Create a 3D tensor of size 2x2x2.
T_data = [[[1., 2.], [3., 4.]],
          [[5., 6.], [7., 8.]]]
T = torch.tensor(T_data)
print(T)

What is a 3D tensor anyway? Think about it like this. If you have a
vector, indexing into the vector gives you a scalar. If you have a
matrix, indexing into the matrix gives you a vector. If you have a 3D
tensor, then indexing into the tensor gives you a matrix!

A note on terminology:
when I say "tensor" in this tutorial, it refers
to any torch.Tensor object. Vectors and matrices are special cases of
torch.Tensors, with 1 and 2 dimensions respectively. When I am
talking about 3D tensors, I will explicitly use the term "3D tensor".




In [None]:
# Index into V and get a scalar (0 dimensional tensor)
print(V[0])
# Get a Python number from it
print(V[0].item())

# Index into M and get a vector
print(M[0])

# Index into T and get a matrix
print(T[0])

# Index into T to get the last column of the second matrix.
print(T[1,:,-1])
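
To make the "indexing removes one dimension" idea concrete, here is a small sketch that rebuilds the 3D tensor from above and checks the number of dimensions after each indexing step. It relies only on the standard `Tensor.dim()` method.

```python
import torch

# The same 2x2x2 tensor used in the cells above.
T = torch.tensor([[[1., 2.], [3., 4.]],
                  [[5., 6.], [7., 8.]]])

# Each additional index removes one dimension.
print(T.dim())           # 3: a 3D tensor
print(T[0].dim())        # 2: a matrix
print(T[0, 0].dim())     # 1: a vector
print(T[0, 0, 0].dim())  # 0: a scalar (0-dimensional tensor)
```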

You can also create tensors of other data types. To create a tensor of integer types, try
torch.tensor([[1, 2], [3, 4]]) (where all elements in the list are integers).
You can also specify a data type by passing in ``dtype=torch.data_type``.
Check the documentation for more data types, but
Float and Long will be the most common.
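
As a quick sketch of how data types are inferred and overridden, consider the cell below. It assumes PyTorch's default floating-point type has not been changed (for example with `torch.set_default_dtype`), in which case Python floats become 32-bit floats.

```python
import torch

# A list of Python ints gives a 64-bit integer (Long) tensor.
ints = torch.tensor([[1, 2], [3, 4]])
print(ints.dtype)

# A list of Python floats gives a Float tensor
# (float32, assuming the default dtype is unchanged).
floats = torch.tensor([1., 2., 3.])
print(floats.dtype)

# Passing dtype= overrides the inferred type.
as_float = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
print(as_float.dtype)

# An existing tensor can be converted with .to(); torch.long is int64.
as_long = floats.to(torch.long)
print(as_long)
```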




You can create a tensor of the supplied dimensionality, filled with random
values drawn from a standard normal distribution, with torch.randn().




In [None]:
x = torch.randn((3, 4, 5))
print(x)

#### Operations with Tensors


You can operate on tensors in the ways you would expect.



In [None]:
x = torch.tensor([1., 2., 3.])
y = torch.tensor([4., 5., 6.])
z = x + y
print(z)

See the documentation at [pytorch.org/docs/torch.html](https://pytorch.org/docs/torch.html) for a
complete list of the large number of operations available to you. They
extend beyond just mathematical operations.

One helpful operation that we will make use of later is concatenation.




In [None]:
# By default, it concatenates along the first axis (concatenates rows)
x_1 = torch.randn(2, 5)
y_1 = torch.randn(3, 5)
z_1 = torch.cat([x_1, y_1])
print(z_1)

# Concatenate columns:
x_2 = torch.randn(2, 3)
y_2 = torch.randn(2, 5)
# second arg specifies which axis to concat along
z_2 = torch.cat([x_2, y_2], 1)
print(z_2)

# If your tensors are not compatible, torch will complain.  Uncomment to see the error
#torch.cat([x_1, x_2])

To see the shape of the tensor use either the attribute .shape (with no parentheses following it) or the function .size() (with parentheses since this is a function call).

In [None]:
print(z_2.shape) #.shape is an attribute
print(z_2.size()) #.size() is a member function

#### Reshaping Tensors


Use the .view() method to reshape a tensor. This method is used frequently, because many neural network components expect their inputs in a certain shape. You will often need to reshape your tensor before passing it to a component.




In [None]:
x = torch.randn(2, 3, 4)
print(x)
print(x.view(2, 12))  # Reshape to 2 rows, 12 columns
# Same as above.  If one of the dimensions is -1, its size can be inferred
print(x.view(2, -1))

The newer function .reshape() is similar to .view(), and actually more general.  There is a [slight difference](https://jdhao.github.io/2019/07/10/pytorch_view_reshape_transpose_permute/) between the two, which is unimportant at this point.

In [None]:
x = torch.randn(3, 4)
print(x)
print(x.reshape(2, 6))
print(x.view(2, 6))
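
For the curious, here is a minimal sketch of one situation where the two differ. `.view()` never copies data, so it fails on a tensor whose memory layout is not compatible with the requested shape (here, a transposed matrix), while `.reshape()` falls back to making a copy. The variable names are only for illustration.

```python
import torch

x = torch.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
xt = x.T                            # transposed: same memory, non-contiguous
print(xt.is_contiguous())           # False

# reshape() succeeds; it copies the data when a view is not possible.
flat = xt.reshape(6)
print(flat)                         # tensor([0, 3, 1, 4, 2, 5])

# view() raises a RuntimeError because it refuses to copy.
try:
    xt.view(6)
    view_failed = False
except RuntimeError as err:
    view_failed = True
    print("view() failed:", err)
```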

One can create new tensors that retain the shape and datatype of a given tensor.  Two such functions are ones_like() and rand_like().

In [None]:
y = torch.ones_like(x)
print(y)
print(torch.rand_like(x.reshape(-1,6))) #-1 implies "infer the size"

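Multiply matrices together using the @ operator or the matmul() function.  These raise an error if the inner dimensions don't match, that is, if the number of columns of the left matrix differs from the number of rows of the right matrix.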

In [None]:
x = torch.tensor([[1, 2, 3], [0, 0, 1]])
y = torch.tensor([[4, 5]])
z = torch.tensor([[1], [2], [3]])

print(y @ x)
print(y.matmul(x))
print(x.matmul(z))
print(x @ z)
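
The shape rule can be checked directly. The sketch below uses placeholder matrices of ones (not part of the examples above) to show that an (n, m) matrix times an (m, p) matrix gives an (n, p) matrix, and that mismatched inner sizes raise a `RuntimeError`.

```python
import torch

a = torch.ones(2, 3)
b = torch.ones(3, 4)

# (2, 3) @ (3, 4) -> (2, 4)
print((a @ b).shape)

# (3, 4) @ (2, 3): the inner sizes 4 and 2 differ, so this fails.
try:
    b @ a
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("matmul failed:", err)
```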

For matrices of compatible sizes use * or mul() to compute an element-wise product. These follow [broadcasting rules](https://numpy.org/doc/stable/user/basics.broadcasting.html) as in numpy.

In [None]:
x = torch.tensor([[1, 2, 3], [4, 5, 6]])
y = torch.tensor([[1, 1, 3], [0, 0, 1]])
z = torch.tensor([[2],[3]])
u = torch.tensor([1, 3, 0])
print(x * y)
print(y.mul(x))
print(x * z)
print(x.mul(u))
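
One way to picture broadcasting is that the smaller tensor is (virtually) stretched to the shape of the larger one before the element-wise product. The sketch below makes that stretching explicit with `expand()`, purely as an illustration; in practice you just write `x * z`.

```python
import torch

x = torch.tensor([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
z = torch.tensor([[2], [3]])               # shape (2, 1)
u = torch.tensor([1, 3, 0])                # shape (3,)

# z is repeated across columns, u is repeated across rows.
z_full = z.expand(2, 3)   # [[2, 2, 2], [3, 3, 3]]
u_full = u.expand(2, 3)   # [[1, 3, 0], [1, 3, 0]]

print(torch.equal(x * z, x * z_full))   # True
print(torch.equal(x * u, x * u_full))   # True
```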

Use the attribute .T or the function transpose(0, 1) to transpose a matrix.  More generally, transpose(dim1, dim2) swaps any two dimensions of a tensor.

In [None]:
x = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(x.T)
print(x.transpose(0,1))

y = torch.tensor([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9],[10, 11, 12]]])
print(y)
print(y.transpose(0, 1))
print(y.transpose(1, 2))

We often need to remove a dimension of size 1 from a tensor, or add a dimension of size 1 to it.  For these use squeeze() and unsqueeze().

In [None]:
x = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(x, x.shape)
y = x.unsqueeze(1)
print(y, y.shape)
z = y[:,:,1]
print(z, z.shape)
u = z.squeeze(1)
print(u, u.shape)

In [None]:
'''
PROBLEM 4 (a)

Write a function repeat() that takes a tensor of any shape and "repeats"
the vectors in the last dimension.
So if x is tensor([2, 3]), repeat(x) should return
    tensor([[2, 3],
            [2, 3]])

And if x is tensor([[[2, 3],[4,5]],[[7, 8],[9,10]]]),
repeat(x) should return
    tensor([[[[ 2,  3],
              [ 2,  3]],
             [[ 4,  5],
              [ 4,  5]]],
            [[[ 7,  8],
              [ 7,  8]],
             [[ 9, 10],
              [ 9, 10]]]]).
Use only the functions torch.cat() and torch.unsqueeze() and no loops.
'''

def repeat(x):
    ## WRITE YOUR CODE HERE
    pass

def test_repeat():
    x1 = torch.tensor([2, 3])
    x2 = torch.tensor([[2, 3],[1,5]])
    x3 = torch.tensor([[[2, 3],[4,5]],[[7, 8],[9,10]]])
    y1 = torch.tensor([[2, 3], [2, 3]])
    y2 = torch.tensor([[[2, 3],[2, 3]],[[1, 5],[1, 5]]]) 
    y3 = torch.tensor([[[[ 2,  3],[ 2,  3]],[[ 4,  5],[ 4,  5]]],
            [[[ 7,  8],[ 7,  8]],[[ 9, 10],[ 9, 10]]]])
    assert(torch.equal(repeat(x1),y1))
    assert(torch.equal(repeat(x2),y2))    
    assert(torch.equal(repeat(x3),y3))
    print('Passed all tests.')
test_repeat()

Deep learning building blocks: affine maps, softmax, and embeddings
==========================================================================

Deep learning consists of composing linearities with non-linearities in
clever ways. The introduction of non-linearities allows for powerful
models. In this section, we will learn about the affine map, which is a linearity, and softmax, which is a non-linearity.  We'll also learn about a common objective or loss function used with softmax: negative log likelihood loss.


#### Affine Maps


One of the core workhorses of deep learning is the affine map, which is
a function $f(x)$ where
$$
\begin{align}f(x) = Ax + b\end{align}
$$
for a matrix $A$ and vectors $x, b$. The parameters to be
learned here are $A$ and $b$. Often, $b$ is referred to
as the *bias* term.


PyTorch and most other deep learning frameworks do things a little
differently than traditional linear algebra. They map the rows of the
input instead of the columns. That is, the $i$'th row of the
output below is the mapping of the $i$'th row of the input under
$A$, plus the bias term. Look at the example below.




In [None]:
lin = nn.Linear(3, 2)
# The linear layer gets initialized with random parameters
# corresponding to A and b.
print("The weight parameter")
print(lin.weight)
print("The bias parameter")
print(lin.bias)

In [None]:
# Let us change these parameters to simpler numbers
with torch.no_grad():    
    lin.weight.copy_(torch.tensor([[3.0, 4.0, 3.0], 
                                   [3.0, 4.0, 1.0]]))
    lin.bias.copy_(torch.tensor([1.0, 2.5]))
# Let the data be 2 x 3 matrix.  The linear layer will 
# transform each row x to f(x)
data = torch.tensor([[4.0, 6.0, 1.0], [7.0, 1.0, 0.0]])
print(lin(data))
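
To see the row-wise mapping explicitly, the sketch below recomputes the output of the linear layer by hand. It relies on the fact that `nn.Linear(3, 2)` stores its weight as a 2 x 3 matrix (outputs by inputs), so each input row $x$ is mapped to $xA^T + b$. The name `manual` is only for illustration.

```python
import torch
import torch.nn as nn

lin = nn.Linear(3, 2)
with torch.no_grad():
    lin.weight.copy_(torch.tensor([[3.0, 4.0, 3.0],
                                   [3.0, 4.0, 1.0]]))
    lin.bias.copy_(torch.tensor([1.0, 2.5]))

data = torch.tensor([[4.0, 6.0, 1.0], [7.0, 1.0, 0.0]])

with torch.no_grad():
    # Each row of data is multiplied by A^T, then the bias is added.
    manual = data @ lin.weight.T + lin.bias
    print(manual)                             # rows: [40.0, 39.5] and [26.0, 27.5]
    print(torch.allclose(lin(data), manual))  # True
```

For instance, the first output entry is $3 \cdot 4 + 4 \cdot 6 + 3 \cdot 1 + 1 = 40$.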

#### Softmax and Probabilities

The function $\text{Softmax}(x)$ is a non-linearity.  It is usually the last operation done in a network. This is because it takes in a vector of real numbers and returns a probability distribution. Its definition is as follows. Let $x$ be a vector of real numbers (each in $(-\infty, \infty)$). Then the $i$'th component of $\text{Softmax}(x)$ is

$$ \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$
 
It should be clear that the output is a probability distribution: each element is non-negative and the sum over all components is $1$.

You could also think of it as just applying an element-wise exponentiation operator to the input to make everything non-negative and then dividing by the normalization constant.

In [None]:
# Softmax is also in torch.nn.functional
data = torch.tensor([3.0, 4.0, 5.0])
print(F.softmax(data, dim=0))
print(F.softmax(data, dim=0).sum())  # Sums to 1 because it is a distribution!
print(F.log_softmax(data, dim=0))  # there's also log_softmax
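
Two properties follow from the definition and are easy to check with the built-in functions. First, adding the same constant to every component leaves the output unchanged, because the factor $\exp(c)$ cancels between numerator and denominator. Second, `log_softmax` agrees with taking the log of `softmax`. The sketch below checks both; the shift of 100 is an arbitrary choice.

```python
import torch
import torch.nn.functional as F

data = torch.tensor([3.0, 4.0, 5.0])
probs = F.softmax(data, dim=0)

# Shifting every input by the same constant does not change the output.
shifted = F.softmax(data + 100.0, dim=0)
print(torch.allclose(probs, shifted))   # True

# log_softmax matches log(softmax) up to floating-point error.
log_probs = F.log_softmax(data, dim=0)
print(torch.allclose(log_probs, torch.log(probs)))   # True
```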

In [None]:
'''
PROBLEM 4 (b)

Write a function mysoftmax() that takes a vector, and applies
softmax to it.  The returned vector consists of probabilities that 
add to 1.

So mysoftmax(torch.tensor([3.0,4.0, 5.0,-600.0])) returns
    tensor([0.0900, 0.2447, 0.6652, 0.0000]).
'''

def mysoftmax(x):
    ## WRITE YOUR CODE HERE
    pass

def test_mysoftmax():
    x1 = torch.tensor([3.0,4.0,5.0,-600.0])
    x2 = torch.tensor([-33.44,4.44,-5.01,-6.0])
    assert(torch.allclose(mysoftmax(x1), F.softmax(x1,0)))
    assert(torch.allclose(mysoftmax(x2), F.softmax(x2,0)))
    print('Passed all tests.')

test_mysoftmax()    

In [None]:
'''
PROBLEM 4(c)

Extend mysoftmax above to accept tensors of any shape.
Write a function mysoftmaxex() that takes a tensor of any shape, and
a dimension d, and applies softmax along dimension d.  So slices
along dimension d consist of probabilities that add to 1.
E.g., mysoftmaxex(tensor([[3.0,4.0, 2.3],[5.0,-600.0, 2.3]]),0)
returns
    tensor([[0.1192, 1.0000, 0.5000],
            [0.8808, 0.0000, 0.5000]])
and mysoftmaxex(tensor([[3.0,4.0, 2.3],[5.0,-600.0, 2.3]]),1)
returns
    tensor([[0.2373, 0.6449, 0.1178],
            [0.9370, 0.0000, 0.0630]]).
'''

def mysoftmaxex(x, d):
    ## WRITE YOUR CODE HERE.
    ## You will want to use torch.exp() 
    ## https://pytorch.org/docs/stable/generated/torch.exp.html
    ## and torch.sum()
    ## https://pytorch.org/docs/stable/generated/torch.sum.html
    pass

def test_mysoftmaxex():
    x = torch.tensor([[3.0,4.0, 2.3],[5.0,-600.0, 2.3]])
    assert(torch.allclose(mysoftmaxex(x,0), F.softmax(x,0)))
    assert(torch.allclose(mysoftmaxex(x,1), F.softmax(x,1)))
test_mysoftmaxex()

#### Word Embeddings

[Torchtext](https://pytorch.org/text/stable/index.html) is a library within the PyTorch framework that consists of data processing utilities and popular datasets for natural language processing.

[GloVe](https://nlp.stanford.edu/projects/glove/) is a set of dense vector representations, or embeddings, of words.  Torchtext has support for GloVe. (The following code takes several minutes to run the first time, since it downloads the GloVe embeddings.)

In [None]:
from torchtext.vocab import GloVe

glove = GloVe(name='6B')

words = ["hello", "hi", "king", "president"]
vecs = glove.get_vecs_by_tokens(words)

print(vecs.shape)
print('The first 10 values in the embedding for "hello" are',
     vecs[0,:10])

In [None]:
'''
PROBLEM 4(d)

Write code to verify whether, in GloVe, "similar words map into
similar vectors."  Briefly discuss your results.
'''

## WRITE YOUR CODE HERE
