Adapted from https://github.com/hadrienj/deepLearningBook-Notes

In this notebook we cover some basics of Linear Algebra as seen in the [Deep Learning Book](http://www.deeplearningbook.org/contents/linear_algebra.html), with a focus on using numpy.


### 1. Scalars & Vectors

* **Scalar** is a *single* number, denoted as $x$


* **Vector** is an *array of scalars*, denoted by $\boldsymbol{x}$
    * Thus, a vector has $n$ scalars $x_1, x_2 \cdots x_n$
    * Note that indexing here begins with 1, unlike python (where it begins with 0)

$$
\boldsymbol{x} =\begin{bmatrix}
    x_1 \\
    x_2 \\
    \vdots \\
    x_n
\end{bmatrix}
$$

Let us now look at how we can create an array using `numpy`

In [1]:
import numpy as np

In [4]:
x1 = np.array([[4, 5, 6]])
print(x1.shape)

(1, 3)


In [3]:
# We will represent a vector as a column vector, that is having multiple rows
x = np.array([[4], [5], [8]])
print(f'x: \n{x}\n')

print(f'shape: {x.shape}')

print(f'x[2]: {x[2][0]}')

x: 
[[4]
 [5]
 [8]]

shape: (3, 1)
x[2]: 8
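
The note above about 1-based versus 0-based indexing can be illustrated with a minimal sketch. The names `x_1`, `x_3`, `v` and `v_col` are assumptions introduced only for this example; it also shows that a plain 1D array is neither a row nor a column vector until it is reshaped.

```python
import numpy as np

# Column vector with the same values as in the cell above
x = np.array([[4], [5], [8]])

# Mathematical x_1 (the first element) lives at Python index 0
x_1 = x[0, 0]
# Mathematical x_3 (the last element, n = 3) lives at Python index 2
x_3 = x[2, 0]
print(f'x_1: {x_1}, x_3: {x_3}')

# A 1D array has shape (3,): it is neither a row nor a column vector
v = np.array([4, 5, 8])
print(f'1D shape: {v.shape}')

# reshape(-1, 1) turns the 1D array into a column vector of shape (3, 1)
v_col = v.reshape(-1, 1)
print(f'column shape: {v_col.shape}')
```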


### 2. Matrices & Tensors

* **Matrix** is a 2D array of scalars, denoted by $\boldsymbol{X}$
$$
\boldsymbol{X}=
\begin{bmatrix}
    X_{1,1} & X_{1,2} & \cdots & X_{1,n} \\\\
    X_{2,1} & X_{2,2} & \cdots & X_{2,n} \\\\
    \vdots & \vdots & \ddots & \vdots \\\\
    X_{m,1} & X_{m,2} & \cdots & X_{m,n}
\end{bmatrix}
$$

    - This matrix has $m$ rows and $n$ columns
    - Each individual element such as $X_{1,1}$ is a *scalar*
    - If $m = n$, the matrix is known as **Square** Matrix


* **Tensor** is an array with **more than 2** axes, denoted as **X**
    * Think of a tensor as the generalization of a matrix to more than 2 axes

In [5]:
#Here X is a Matrix
X = np.array([[4,5,7], [10, 11, 13], [56, 80, 90]])
print(f'X: \n{X}\n')
print(f'shape: {X.shape}')

print(f'X[2][1]: {X[2][1]}')

X: 
[[ 4  5  7]
 [10 11 13]
 [56 80 90]]

shape: (3, 3)
X[2][1]: 80


In [6]:
#Here T is a Tensor
T = np.array([[[4, 5, 7], [10, 11, 13]], [[56, 80, 90], [9, 8, 10]]])
print(f'shape: {T.shape}')

print(f'T[1][0][1]: {T[1][0][1]}')

shape: (2, 2, 3)
T[1][0][1]: 80
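
As a rough sketch of how scalars, vectors, matrices and tensors differ only in their number of axes, the example below inspects `ndim` and `shape` for one array of each kind. The variable names `s`, `v` and `M` are assumptions made up for this illustration; `T` reuses the values from the cell above.

```python
import numpy as np

s = np.array(7)                            # scalar: 0 axes
v = np.array([4, 5, 8])                    # vector: 1 axis
M = np.array([[4, 5, 7], [10, 11, 13]])    # matrix: 2 axes
T = np.array([[[4, 5, 7], [10, 11, 13]],
              [[56, 80, 90], [9, 8, 10]]])  # tensor: 3 axes

for name, a in [('s', s), ('v', v), ('M', M), ('T', T)]:
    print(f'{name}: ndim={a.ndim}, shape={a.shape}')

# Indexing the first axis of a 3-axis tensor returns a matrix
print(f'T[1] shape: {T[1].shape}')

# T[1, 0, 1] is the comma-separated form of T[1][0][1]
print(f'T[1, 0, 1]: {T[1, 0, 1]}')
```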


### 3. Transpose
For a 2D matrix, the transpose is defined element by element as follows:
$(A^T)_{i,j} = A_{j, i}$

For a vector, transpose makes the column vector into a row. Thus a column vector can also be represented as $\boldsymbol{x} = [x_1, x_2, x_3]^T$

In [7]:
xt = np.transpose(x)
print(f'x transpose shape: {xt.shape}')

Xt = np.transpose(X)
print(f'Xt transpose shape: {Xt.shape}')

Tt = np.transpose(T, axes=[0, 2, 1])
print(f'Tt transpose shape: {Tt.shape}')

x transpose shape: (1, 3)
Xt transpose shape: (3, 3)
Tt transpose shape: (2, 3, 2)
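
The definition $(A^T)_{i,j} = A_{j,i}$ can be checked directly. The following is a minimal sketch; the non-square matrix `A` and the flag `all_match` are assumptions introduced for illustration, and `.T` is used as numpy's shorthand for `np.transpose` on 2D arrays.

```python
import numpy as np

X = np.array([[4, 5, 7], [10, 11, 13], [56, 80, 90]])
Xt = X.T  # shorthand for np.transpose(X)

# Check (X^T)_{i,j} == X_{j,i} for every pair of indices
all_match = all(Xt[i, j] == X[j, i]
                for i in range(X.shape[1])
                for j in range(X.shape[0]))
print(f'definition holds: {all_match}')

# For a non-square matrix the shape is swapped: (2, 3) becomes (3, 2)
A = np.array([[1, 2, 3], [4, 5, 6]])
print(f'A shape: {A.shape}, A.T shape: {A.T.shape}')

# Transposing twice gives back the original matrix
print(f'(A.T).T equals A: {np.array_equal(A.T.T, A)}')
```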


### 4. Broadcasting

* You can add a scalar to a vector, and numpy will add it to each element in the vector
    
    $(\boldsymbol{x} + a)_i = x_i + a$
    
    
* Similarly, you can add a column vector to a matrix with the same number of rows, and numpy will add the vector to each column of the matrix

In [10]:
print(f'x: \n{x}\n')
print(f'x+3: \n{x + 3}\n')

x: 
[[4]
 [5]
 [8]]

x+3: 
[[ 7]
 [ 8]
 [11]]



In [12]:
print(f'X: \n{X}\n')
print(f'x: \n{x}\n')
print(f'X+x: \n{X + x}\n')

X: 
[[ 4  5  7]
 [10 11 13]
 [56 80 90]]

x: 
[[4]
 [5]
 [8]]

X+x: 
[[ 8  9 11]
 [15 16 18]
 [64 88 98]]
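
To make the broadcasting behaviour more explicit, here is a small sketch under a few assumptions: the row vector `r` and the names `X_plus_r`, `x_expanded` and `same` are hypothetical and not part of the original notebook. It uses `np.broadcast_to` to show the expanded array that numpy conceptually adds, and shows that a row vector is added to each row instead of each column.

```python
import numpy as np

X = np.array([[4, 5, 7], [10, 11, 13], [56, 80, 90]])
x = np.array([[4], [5], [8]])   # column vector, shape (3, 1)
r = np.array([[1, 2, 3]])       # row vector, shape (1, 3)

# A (3, 1) column is stretched across the columns of X
x_expanded = np.broadcast_to(x, X.shape)
print(f'x expanded: \n{x_expanded}\n')

# Adding the expanded array gives the same result as X + x
same = np.array_equal(X + x, X + x_expanded)
print(f'X + x equals X + x_expanded: {same}')

# A (1, 3) row is stretched across the rows of X instead
X_plus_r = X + r
print(f'X + r: \n{X_plus_r}')
```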



### 5. Matrix Multiplication

This is an operation that you will use quite frequently in ML/DL models.
Keep a few things in mind about matrix multiplication:

* $\boldsymbol{C} = \boldsymbol{A} \boldsymbol{B}$ is only defined when the second dimension of $\boldsymbol{A}$ matches the first dimension of $\boldsymbol{B}$


* Further, if  $\boldsymbol{A}$ is of shape (m, n) and $\boldsymbol{B}$ of shape (n, p), then $\boldsymbol{C}$ is of shape (m, p) 


* This operation is concretely defined as $C_{i,j} = \sum_k A_{i, k} B_{k, j}$

    * $C_{i, j}$ is computed by taking the dot product of the $i$-th row of $\boldsymbol{A}$ with the $j$-th column of $\boldsymbol{B}$


* A more useful way to think of matrix multiplication is that each column of $\boldsymbol{C}$ is a **linear combination of columns** of $\boldsymbol{A}$, weighted by the entries of the corresponding column of $\boldsymbol{B}$

<img src="images/mat-mul2.png" width="400" alt="Matrix Multiplication" title="Mat Mul">


<em>Matrix Multiplication. Image Credit: https://www.mpcm.org/visualizing-matrix-multiplication-as-a-linear-combination-eli-benderskys-website/</em>
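
The shape rule and the element-wise definition $C_{i,j} = \sum_k A_{i, k} B_{k, j}$ can be sketched as follows. The small matrices `A` and `B` and the names `c01` and `shape_error` are assumptions chosen for illustration. Note that the code uses 0-based indices, so `C[0, 1]` corresponds to $C_{1,2}$ in the notation above.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])    # shape (2, 3)
B = np.array([[1, 0], [0, 1], [2, 2]])  # shape (3, 2)

# (2, 3) times (3, 2) gives (2, 2); @ is equivalent to np.matmul for 2D arrays
C = A @ B
print(f'C shape: {C.shape}')

# One entry from the definition: row 0 of A dotted with column 1 of B
c01 = sum(A[0, k] * B[k, 1] for k in range(A.shape[1]))
print(f'C[0, 1]: {C[0, 1]}, from the sum: {c01}')

# Mismatched inner dimensions, (2, 3) times (2, 3), raise a ValueError
try:
    A @ A
    shape_error = False
except ValueError:
    shape_error = True
print(f'(2, 3) @ (2, 3) raises ValueError: {shape_error}')
```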


In [13]:
print(f'X: \n{X}\n')

print(f'x: \n{x}\n')

print(np.matmul(X, x))

X: 
[[ 4  5  7]
 [10 11 13]
 [56 80 90]]

x: 
[[4]
 [5]
 [8]]

[[  97]
 [ 199]
 [1344]]
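
The linear-combination view described earlier can be checked on the same `X` and `x`. This is a minimal sketch; the name `combo` is an assumption introduced here. Each column of `X` is scaled by the matching entry of `x`, and the scaled columns are summed.

```python
import numpy as np

X = np.array([[4, 5, 7], [10, 11, 13], [56, 80, 90]])
x = np.array([[4], [5], [8]])

# X[:, [k]] keeps column k as a (3, 1) array; x[k, 0] is its scalar weight
combo = sum(x[k, 0] * X[:, [k]] for k in range(X.shape[1]))
print(f'linear combination of columns: \n{combo}\n')

# The result matches the matrix product
print(f'matches X @ x: {np.array_equal(combo, X @ x)}')
```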


### 6. Element Wise multiplication: Hadamard product

Element-wise multiplication is written $\boldsymbol{A} \odot \boldsymbol{B}$

Notice that numpy uses the `*` operator for this. Be careful not to confuse it with matrix multiplication

In [14]:
# Element-wise (Hadamard) product, which is different from matrix multiplication

Y = np.array([[40, 50, 70], [100, 110, 130], [560, 800, 900]])

print(f'X: \n{X}\n')
print(f'Y: \n{Y}\n')

print(f'X * Y: \n{X * Y}\n')


X: 
[[ 4  5  7]
 [10 11 13]
 [56 80 90]]

Y: 
[[ 40  50  70]
 [100 110 130]
 [560 800 900]]

X * Y: 
[[  160   250   490]
 [ 1000  1210  1690]
 [31360 64000 81000]]
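
To see the difference between the two products side by side, here is a small sketch on 2x2 matrices. The matrices `A` and `B` and the names `hadamard` and `matprod` are assumptions made up for this example.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Hadamard product: multiply matching entries; same as np.multiply(A, B)
hadamard = A * B
print(f'A * B: \n{hadamard}\n')

# Matrix product: rows of A dotted with columns of B
matprod = A @ B
print(f'A @ B: \n{matprod}\n')

# The Hadamard product is commutative; the matrix product generally is not
print(f'A * B == B * A: {np.array_equal(A * B, B * A)}')
print(f'A @ B == B @ A: {np.array_equal(A @ B, B @ A)}')
```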



### 7. Norms

* A norm can be thought of as a measure of the size of a vector. 

  We define the $L^p$ norm as $\Vert \boldsymbol{x}\Vert _p = \left(\sum_i |x_i|^{p}\right)^{\frac{1}{p}}$ 
  
  for $p \ge 1, p \in \mathbb{R}$
  
  
* Norm is a *function* that maps vectors to *non-negative* values. A norm satisfies the following properties:
    * $f(\boldsymbol{x}) = 0 \Rightarrow \boldsymbol{x} = \boldsymbol{0}$
    * $f(\boldsymbol{x} + \boldsymbol{y}) \le f(\boldsymbol{x}) + f(\boldsymbol{y})$ (Triangle inequality)
    * $\forall \ \alpha \in \mathbb{R}, \ f(\alpha \ \boldsymbol{x}) = |\alpha|\ f(\boldsymbol{x})$
  
  
* The $L^2$ norm is called the **Euclidean norm**, often written simply as $\Vert \boldsymbol{x} \Vert$ 
    * We work mostly with the squared $L^2$ norm, which can be computed as $\boldsymbol{x}^T \boldsymbol{x}$
    
    * The squared $L^2$ norm is easier to work with, as its gradient with respect to $\boldsymbol{x}$ is simply $2\boldsymbol{x}$
   
    * In some ML applications it is important to distinguish between elements that are exactly zero and elements that are small but nonzero. The squared $L^2$ norm may not be the right choice there, as it grows very slowly near the origin
    
    
* **$L^1$ norm** is the sum of the absolute values of all elements of a vector

    * Useful when the difference between zero and nonzero elements is essential.


* **Max-Norm**: $L^\infty$: This simplifies to the absolute value of the element with the highest magnitude
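
The $L^p$ definition above can be written out directly and compared with numpy. This is a sketch under stated assumptions: the helper `lp_norm`, the 1D vector `v` and the name `max_norm` are hypothetical and introduced only for this example.

```python
import numpy as np

def lp_norm(v, p):
    """L^p norm computed directly from the definition, for p >= 1."""
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

# A 1D vector with one negative entry, to show the role of the absolute value
v = np.array([4.0, -5.0, 8.0])

for p in (1, 2, 3):
    print(f'p={p}: definition={lp_norm(v, p)}, numpy={np.linalg.norm(v, ord=p)}')

# The max norm is the largest absolute value among the elements
max_norm = np.max(np.abs(v))
print(f'max norm: {max_norm}, numpy: {np.linalg.norm(v, ord=np.inf)}')
```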

In [15]:
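print(f'x: \n{x}\n')

# x has shape (3, 1), so np.linalg.norm applies matrix norms here; for a
# single column they coincide with the vector L2, L1 and max norms
lp2 = np.linalg.norm(x)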
print(f'lp2 {lp2}')

lp1 = np.linalg.norm(x, ord=1)
print(f'lp1 {lp1}')

lp_inf = np.linalg.norm(x, ord=np.inf)
print(f'lp_inf {lp_inf}')

x: 
[[4]
 [5]
 [8]]

lp2 10.246950765959598
lp1 17.0
lp_inf 8.0
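
As a further sketch, the squared $L^2$ norm can be computed as $\boldsymbol{x}^T \boldsymbol{x}$, and two of the norm properties listed above can be checked numerically on one example. The second vector `y`, the scalar `alpha` and the names `sq_l2`, `triangle_ok` and `scaling_ok` are assumptions made up for this illustration; a single numerical check illustrates the properties but does not prove them.

```python
import numpy as np

x = np.array([[4], [5], [8]])
y = np.array([[1], [-2], [3]])
alpha = -3

# x^T x is a (1, 1) array; .item() extracts the scalar
sq_l2 = (x.T @ x).item()
print(f'x^T x: {sq_l2}, norm squared: {np.linalg.norm(x) ** 2}')

# Triangle inequality: ||x + y|| <= ||x|| + ||y||
triangle_ok = np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y)
print(f'triangle inequality holds: {triangle_ok}')

# Absolute homogeneity: ||alpha * x|| == |alpha| * ||x||
scaling_ok = np.isclose(np.linalg.norm(alpha * x), abs(alpha) * np.linalg.norm(x))
print(f'scaling property holds: {scaling_ok}')
```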
