# Week 1

__Goals for this week__

We will talk about the organization of this course, including the project you will be working on during this semester. We will go through the prerequisites of this course, i.e. several topics you should know before we can proceed further with our neural networks study.

__How to solve this lab?__

Just follow this notebook and run all the code cells. Correct answers for some exercises are provided at the bottom.

## Course Information

- __Programming assignments [10 pts]:__ Some labs will have short Python programming assignments. Each will be worth 2 points. You are expected to complete these assignments in one week and submit your code.
- __Project [40 pts]:__ You will work in pairs on a deep learning project throughout the entire semester. You are expected to consult with us about your progress during the labs and present your solution at the end of the semester.
- __Exam [50 pts]__

Please check the [Project](../project/project.ipynb) page _right now_ for more information about how to proceed with your project from now on.

### Feedback

- Please use [Askalot](https://askalot.fiit.stuba.sk/fiit/) for all the questions that might be interesting for other students. You can also use it as a general discussion board for this course.
- Please fill in our [questionnaire](https://forms.gle/r27nBAvnMC7jbjJ58) after each lab. This questionnaire tells us what students struggle with during this course.
- This notebook is a work in progress. If you notice a mistake, notify us, raise an issue or make a pull request on [GitHub](https://github.com/matus-pikuliak/neural_networks_at_fiit).

### Neural Network Seminar

We also have a seminar at FIIT focused on neural networks that covers more advanced topics and student projects. See [our page](https://www.pewe.sk/nngroup/) for more information. It is open to everyone.
 
## Prerequisites

Prerequisites are topics that you should already be familiar with from your previous studies. If you have trouble with any of them, you should review them during this week.

### Python

We will use _Python 3.6_ in our labs. We assume that you have seen Python code before. If you have not, you should learn the basics as soon as possible (e.g. [W3Schools tutorial](https://www.w3schools.com/python/default.asp)). We review the basic concepts in the scripts below. You should understand them fully; otherwise, review your knowledge before you proceed.

In [None]:
# Basic types
5      # integer
5.1    # float
'foo'  # string
True
False  # boolean values
None  # null-like value

# Type conversion
float('5.2')
int(5.2)
str(5.2)

# Basic operations
a = 2 + 5 - 2.5
a += 3
b = 2 ** 3  # exponentiation
print(a, b)
print(5 / 2)
print(5 // 2)  # Notice the difference between these two
'foo' + 'bar'

# F-strings
print(f'1 + 2 = {1 + 2}, a = {a}')
print('1 + 2 = {1 + 2}, a = {a}')  # We need the f at the start for {} to work as expression wrappers.


In [None]:
# Conditions
if a > 4 and b < 3:  # and, or and not are the basic logical operators
    print('a')
elif b > 5:
    print('b')
else:
    print('c')   # Indentation by spaces or tabs tells us where the statement belongs. print('c') is in the else.
print('d')       # But print('d') is outside, it will print every time.

# Loops
while a < 10:
    if b > 3:
        a += 1  # More indentation for code that is "deeper"
    else:
        a += 2
print(f'a = {a}')

# while loops are not considered 'pythonic'; for loops are more common.
# We will return to them later
for char in 'string':
    print(char)
    

In [None]:
# Functions
def example(a, b=1, c=1):  # b and c have default values
    return a*b, a*c  # we return two values at the same time

a, b = example(1, 2, 3)  # and we can assign both values at the same time as well
print(a, b)
print(example(4))
print(example(5, 2))
print(example(5, c=2))  # Notice how the arguments behave
    
# Classes
class A:
    
    def __init__(self, b):  # Constructor
        self.b = b  # Object variable
        
    def add_to_b(self, c):  # self is always the first argument and it references the object itself
        self.b += c
        
    def sub_from_b(self, c):
        self.add_to_b(-c)  # Calling object method
        
a = A(5)
a.add_to_b(1)
print(a.b)
a.sub_from_b(2)
print(a.b)


In [None]:
# Lists
a = [1, 2, 3, 'foo']
len(a)  # Length
a[1]  # Second element
print(a[1:3])  # Second to third element
a.append(4)  # Adding at the end
del a[3]  # Removing at the index
[]  # Empty list
print(a)

# This is why for loops are used more often
for el in a:
    print(el + 1)
    
# We can define lists with list comprehension statements
b = [el + 2 for el in a]
print(b)
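
A list comprehension is shorthand for a loop that appends to a new list, and it can carry an optional `if` filter. The sketch below illustrates this with made-up names (`numbers`, `doubled_loop`, `doubled_comp`, `even_squares`) that are not part of the lab:

```python
numbers = [1, 2, 3, 4]

# Explicit loop that builds a new list
doubled_loop = []
for n in numbers:
    doubled_loop.append(n * 2)

# Equivalent list comprehension
doubled_comp = [n * 2 for n in numbers]

# A comprehension with a filter keeps only elements that pass the condition
even_squares = [n ** 2 for n in numbers if n % 2 == 0]

print(doubled_loop, doubled_comp, even_squares)
```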


In [None]:
# Dictionaries - key-based structures
a = {
    'foo': 'bar',
    5: 'five',
    4: 'four',
    'nested': [1, 2, 3]
}
print(a['foo'])
a['bar'] = 'foo'
if 'nested' in a:  # Does key exist?
    print(a['nested'])
{}  # Empty
del a['foo']  # Remove record
    
print()
print('Keys:')
for key in a:
    print(key)
    
print()
print('Keys and values:')
for key, value in a.items():
    print(key, ':', value)
    
# Dictionaries can also be defined via a comprehension statement
a = {i: i**2 for i in [1, 2, 3, 4]}
print(a)

In [None]:
# Several useful iterators
print('range')
for i in range(5):
    print(i)
    
print()
print('enumerate')
lowercase = ['a', 'b', 'c']
for i, el in enumerate(lowercase):
    print(i, el)
    
print()
print('zip')
uppercase = ['A', 'B', 'C']
for a, b in zip(lowercase, uppercase):
    print(a, b)

### Linear Algebra

Neural network models can be defined using vectors and matrices, i.e. concepts from linear algebra. You should know how basic linear operations work. Some of the concepts were covered during your _Algebra and Discrete Mathematics_ course. Read the provided links to review the necessary topics (note that there are some questions at the end of each page) and solve the exercises in this notebook.

#### Vectors
- [On vectors](https://www.mathsisfun.com/algebra/vectors.html)
- [On dot product](https://www.mathsisfun.com/algebra/vectors-dot-product.html)

In these labs we will use italic for scalars $x$, lowercase bold for vectors $\mathbf{x}$ and uppercase bold for matrices $\mathbf{X}$. Please, keep this notation in mind.

__Exercise 1.1:__ Calculate the following:

$
\begin{align}
\mathbf{a} = \begin{bmatrix}0 \\ 1 \\ 3 \end{bmatrix} \ \
\mathbf{b} = \begin{bmatrix}1 \\ 4 \\ 1 \end{bmatrix}
\end{align} \\
5\mathbf{a} = ?\\
\mathbf{a} + \mathbf{b} = ?\\
\mathbf{a} \cdot \mathbf{b} = ?\\
||\mathbf{a}|| = ?
$

__Exercise 1.2:__ How would you quickly check whether or not two vectors (e.g. $\mathbf{a}$ and $\mathbf{b}$) are orthogonal (perpendicular)?

__Exercise 1.3:__ Is $\mathbf{a}$ longer than $\mathbf{b}$?

#### Matrices
- [On matrices](https://www.mathsisfun.com/algebra/matrix-introduction.html)
- [On matrix multiplication](https://www.mathsisfun.com/algebra/matrix-multiplying.html)

__Exercise 1.4:__ Calculate the following. Vectors are columns by default.

$
\mathbf{C} = \begin{bmatrix}0 & 2 \\ 1 & 2 \end{bmatrix} \ \
\mathbf{d} = \begin{bmatrix}4 \\ 1 \end{bmatrix}\\
\mathbf{C}\mathbf{d} = ?\\
\mathbf{d}^T \mathbf{C} - \mathbf{d}^T = ?\\
\mathbf{C}^T\mathbf{d} = ?\\
\mathbf{C}\mathbf{d}^T = ?
$

__Exercise 1.5:__ Express the result of a general matrix-vector product $\mathbf{Wx}$ as a vector of dot products. Can you do the same with $\mathbf{x}^T\mathbf{W}$?

### NumPy

NumPy is a popular Python library for scientific computation. It provides a convenient way of working with vectors and matrices.


In [1]:
import numpy as np

# Creating vectors
a = np.array([0, 1, 3])
b = np.array([1, 4, 1])

# Basic operations, results for E 1.1
print(5 * a)
print()
print(a + b)
print()
print(np.dot(a, b))
print()
print(np.linalg.norm(a))

[ 0  5 15]

[1 5 4]

7

3.1622776601683795


In [11]:
# Creating matrices
# First way:
W = np.array([
    [1, 2],
    [3, 4]
])

# Second way:
W = np.array([1, 2, 3, 4]).reshape(2, 2)

# There is a difference between a 1-D vector and a column matrix in numpy:
a = np.array([1, 2])  # This is a vector
b = np.array([  # This is a matrix
    [1],
    [2]
])

# First let's see the dimensions of these two
print(a.shape)
print(b.shape)


(2,)
(2, 1)


In [15]:
# Then we can multiply them with W using the @ matrix multiplication operator (np.matmul) or np.dot
print(W @ a)
print(W @ b)

print(np.dot(W, a))
print(np.dot(W, b))


[ 5 11]
[[ 5]
 [11]]
[ 5 11]
[[ 5]
 [11]]


__Exercise 1.6:__ What is the difference between the two results from the previous code cell?

__Exercise 1.7:__ Solve Exercise 1.4 in NumPy. Use the `.T` attribute of an array (`np.ndarray.T`) for transposing.


In [13]:
C = np.array([
    [0, 2], 
    [1, 2]
])
d = np.array([4, 1])

print()
print(np.dot(C, d))
print()
print(np.dot(d.T, C) - d.T)
print()
print(np.dot(C.T, d))
print()
print(np.dot(C, d.T))



[2 6]

[-3  9]

[ 1 10]

[2 6]
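
Note that the cell above stores `d` as a 1-D array, for which `.T` has no effect, so `np.dot(C, d.T)` still returns a result. The following sketch, which assumes we store `d` as an explicit column matrix instead (the names `d_1d`, `d_col` and `product_is_valid` are ours), shows how the shapes then behave as they do on paper:

```python
import numpy as np

C = np.array([
    [0, 2],
    [1, 2]
])

d_1d = np.array([4, 1])      # 1-D array, shape (2,)
d_col = d_1d.reshape(2, 1)   # explicit column vector, shape (2, 1)

# Transposing a 1-D array does nothing
print(d_1d.T.shape)    # (2,)
# Transposing a column vector gives a row vector
print(d_col.T.shape)   # (1, 2)

# With explicit shapes the products keep their orientation
print(C @ d_col)               # column vector, shape (2, 1)
print(d_col.T @ C - d_col.T)   # row vector, shape (1, 2)

# A (2, 2) matrix times a (1, 2) matrix is not a valid product
try:
    C @ d_col.T
    product_is_valid = True
except ValueError:
    product_is_valid = False
print(product_is_valid)
```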


In [None]:
# Indexing, i.e. selecting elements from an array

W = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

W[1, 1]  # Element in the second row and second column
W[0]  # First row
W[[0, 2]]  # First AND third row
W[:, 0]  # First column
W[1, [0, 2]]  # First AND third column of second row
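
The indexing expressions above return values of different shapes. As a small sketch, printing them makes the difference visible; the final line, which slices both axes at once, is our own addition:

```python
import numpy as np

W = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

print(W[1, 1])          # single element: 5
print(W[0])             # first row as a 1-D array: [1 2 3]
print(W[[0, 2]].shape)  # two rows, still a 2-D array: (2, 3)
print(W[:, 0])          # first column as a 1-D array: [1 4 7]
print(W[1, [0, 2]])     # [4 6]
print(W[0:2, 1:3])      # slices on both axes select a sub-matrix
```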

### Derivatives

The final topic we need to cover is derivatives. Almost all neural network training done in practice is currently based on methods that calculate derivatives with respect to (w.r.t.) the parameters of the model. You should remember the basics from your _Calculus_ course (Matematická analýza), but we recommend that you read the following link to refresh your memory:

- [On derivatives](https://www.mathsisfun.com/calculus/derivatives-introduction.html)

You won't need to use derivatives during this course, so you won't need to learn all the [derivative rules](https://www.mathsisfun.com/calculus/derivatives-rules.html). However, we need you to have an intuition about what derivatives are and what their geometric interpretation is. In essence, we need you to understand that a derivative tells us the slope of the tangent at a given point. You should understand what is happening in the GIF below:

![Tangents](images/tangents.gif)
<center><small>License: en:User:Dino, User:Lfahlberg <a href="https://creativecommons.org/licenses/by-sa/3.0">CC BY-SA 3.0</a>, via Wikimedia Commons</small></center>
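
The slope of the tangent can also be approximated numerically. The sketch below is only an illustration: the function $f(x) = x^2$, the helper name `slope` and the step size `h` are our own choices. It estimates the slope with a central difference, and for this function the result should be close to $2x$:

```python
def f(x):
    return x ** 2

def slope(func, x, h=1e-5):
    # Central difference: slope of the line through the points at x - h and x + h
    return (func(x + h) - func(x - h)) / (2 * h)

for x in [-2.0, 0.0, 1.0, 3.0]:
    print(f'x = {x}, slope = {slope(f, x):.4f}')
```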

Partial derivatives are a concept you might not have heard about before. They are used when we differentiate a function of more than one variable. In such a case we can actually differentiate the function in any direction. Read the following link:

- [On partial derivatives](https://www.mathsisfun.com/calculus/derivatives-partial.html)

A function of two variables can be visualized as a 3D graph. In this graph you can pick a point and ask what the slope of the tangent is in any direction. Most commonly you would calculate the slope along the $x$ or $y$ axis: $\frac{df}{dx}$ and $\frac{df}{dy}$.

The vector of derivatives w.r.t. all the parameters is called a _gradient_. Generally, for a function $f$ with an arbitrary number of parameters $\mathbf{x} = [x_1, x_2, \dots, x_N]^T$, the gradient $\triangledown f$ is defined as:

\begin{equation}
\triangledown f(\mathbf{x}) = \frac{df}{d\mathbf{x}} = \begin{bmatrix}\frac{df}{dx_1} \\ \frac{df}{dx_2} \\ \vdots \\ \frac{df}{dx_N} \end{bmatrix}
\end{equation}
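
The definition above can be sketched numerically by perturbing one parameter at a time. This is only an approximation for illustration, not how gradients are computed when training neural networks. The helper `numerical_gradient` and the example function `h_example` are hypothetical names introduced here:

```python
import numpy as np

def numerical_gradient(func, x, h=1e-5):
    """Approximate the gradient of func at x with central differences."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = h  # perturb only the i-th parameter
        grad[i] = (func(x + step) - func(x - step)) / (2 * h)
    return grad

def h_example(x):
    # Hypothetical example function: x_1^2 * x_2 + 3 * x_3
    return x[0] ** 2 * x[1] + 3 * x[2]

point = np.array([1.0, 2.0, 0.5])
# Analytic gradient is [2 * x_1 * x_2, x_1^2, 3], i.e. [4, 1, 3] at this point
print(numerical_gradient(h_example, point))
```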

The gradient is the most important concept from this week's lab. It is a vector quantity that tells us the _direction of steepest ascent_ at each point. This is a very important property, which we will often use in the following weeks. The magnitude of this vector tells us how steep this ascent is, i.e. what the slope of the tangent is in the direction of the gradient.
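
To see the steepest ascent property in numbers, we can take equally small steps in many directions and check which one increases the function the most. The function, the point and the step size in the sketch below are arbitrary choices of ours, made only for illustration:

```python
import numpy as np

def f(x, y):
    # Illustrative function, chosen only for this sketch
    return x ** 2 + 3 * y

# Analytic gradient of f is [2x, 3]; we evaluate it at the point (1, 2)
point = np.array([1.0, 2.0])
gradient = np.array([2 * point[0], 3.0])

# Take a step of the same small length in 360 different directions
step = 1e-3
angles = np.linspace(0, 2 * np.pi, 360, endpoint=False)
directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)
increases = np.array([
    f(*(point + step * d)) - f(*point) for d in directions
])

best_direction = directions[np.argmax(increases)]
gradient_direction = gradient / np.linalg.norm(gradient)

print(best_direction)      # direction with the largest increase
print(gradient_direction)  # normalized gradient, nearly the same direction
# The largest increase per unit step is close to the gradient magnitude
print(increases.max() / step, np.linalg.norm(gradient))
```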

To compare _derivative_ and _gradient_:

- A _derivative_ is a quantity that tells us the rate of change in a given direction.
- A _gradient_ is a quantity that tells us the direction of the steepest rate of change, along with the rate of this change.

Observe the difference between these two concepts in the figure below. All the plots show the same function $F(x,y) = \sin(x) \cos(y)$. In the first two plots we show the derivatives w.r.t. $y$ and $x$ respectively. These are shown as white arrows. Notice that they all point in one direction. On the other hand, in the last plot we show the gradients. If we interpret the derivatives from the two previous plots as vectors, these gradients are in fact their sum.

![Derivatives and gradient](images/derivatives.svg)
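
The relation between the two derivatives and the gradient can be sketched for the same function. The derivative formulas below come from standard differentiation rules for $\sin$ and $\cos$, and the evaluation point is an arbitrary choice:

```python
import numpy as np

def F(x, y):
    return np.sin(x) * np.cos(y)

def dF_dx(x, y):
    return np.cos(x) * np.cos(y)

def dF_dy(x, y):
    return -np.sin(x) * np.sin(y)

x, y = 0.5, 1.0

# Each derivative interpreted as a vector pointing along its own axis
along_x = np.array([dF_dx(x, y), 0.0])
along_y = np.array([0.0, dF_dy(x, y)])

# The gradient is the sum of the two axis-aligned vectors
gradient = along_x + along_y
print(along_x, along_y, gradient)
```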

__Exercise 1.8:__ Calculate the following. All the derivative rules you need to know are $(af)' = af'$, $(f + g)' = f' + g'$ and $(x^k)' = kx^{k-1}$.

$
f(x, y) = x^2 + y^2 + 2x\\
\frac{df}{dx}=?\\
\frac{df}{dy}=?\\
\triangledown f(x, y) = ?\\
\\
g(x_1, x_2, \dots, x_N) = g(\mathbf{x}) = \mathbf{a} \cdot \mathbf{x}\\
\triangledown g(\mathbf{x}) = ?
$

## Correct Answers

__E 1.1:__

$
5\mathbf{a} = \begin{bmatrix}0 \\ 5 \\ 15 \end{bmatrix} \\
\mathbf{a} + \mathbf{b} = \begin{bmatrix}1 \\ 5 \\ 4 \end{bmatrix} \\
\mathbf{a} \cdot \mathbf{b} = 7\\
||\mathbf{a}|| = \sqrt{10} \approx 3.162
$

__E 1.2:__ Check whether $\mathbf{a} \cdot \mathbf{b} = 0$. Note that the angle between a vector and a zero vector is not defined.

__E 1.3:__ No, $||\mathbf{a}|| < ||\mathbf{b}||$
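
Both checks can be sketched in NumPy, using the vectors from the NumPy cell of this lab. The orthogonal pair `u`, `v` is an illustrative example of ours:

```python
import numpy as np

a = np.array([0, 1, 3])
b = np.array([1, 4, 1])  # same values as in the NumPy cell of this lab

# Two non-zero vectors are orthogonal when their dot product is zero
print(np.dot(a, b) == 0)

# Comparing lengths
print(np.linalg.norm(a), np.linalg.norm(b))
print(np.linalg.norm(a) > np.linalg.norm(b))

# An orthogonal pair for comparison (illustrative values)
u = np.array([1, 0, 2])
v = np.array([-2, 5, 1])
print(np.dot(u, v))
```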

__E 1.4:__

$
\mathbf{C}\mathbf{d} = \begin{bmatrix}2 \\ 6 \end{bmatrix} \\
\mathbf{d}^T \mathbf{C} - \mathbf{d}^T = \begin{bmatrix}-3 & 9 \end{bmatrix} \\
\mathbf{C}^T\mathbf{d} = \begin{bmatrix}1 \\ 10 \end{bmatrix} \\
$

The last term $\mathbf{C}\mathbf{d}^T$ is not valid. You cannot multiply two matrices with dimensions $2 \times 2$ and $1 \times 2$.

__E 1.5:__ $\mathbf{Wx}$ can be expressed as a column vector $[\mathbf{w_1} \cdot \mathbf{x}, \mathbf{w_2} \cdot \mathbf{x}, \dots, \mathbf{w_N} \cdot \mathbf{x} ]^T$, where $\mathbf{w_k}$ is the $k$-th row of $\mathbf{W}$. $\mathbf{x}^T\mathbf{W}$ can be expressed as a row vector $[\mathbf{w_{*,1}} \cdot \mathbf{x}, \mathbf{w_{*,2}} \cdot \mathbf{x}, \dots, \mathbf{w_{*,N}} \cdot \mathbf{x} ]$, where $\mathbf{w_{*,k}}$ is the $k$-th _column_ of $\mathbf{W}$.
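
This can be checked in NumPy on a small example. The matrix and vector values below are arbitrary, and `rows` and `cols` are names we introduce for the vectors of dot products:

```python
import numpy as np

W = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
x = np.array([1, 0, 2])

# W @ x: one dot product per row of W
rows = np.array([np.dot(W[k], x) for k in range(W.shape[0])])
print(rows, W @ x)

# x @ W: one dot product per column of W
cols = np.array([np.dot(W[:, k], x) for k in range(W.shape[1])])
print(cols, x @ W)
```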

__E 1.8:__

$
\frac{df}{dx}= 2x + 2\\
\frac{df}{dy}= 2y\\
\triangledown f(x, y)  = \begin{bmatrix}2x + 2 \\ 2y \end{bmatrix} \\
\triangledown g(\mathbf{x})  = \mathbf{a}
$