
# Lecture 1 - Introduction

## What is deep learning?
- Finding patterns in data
- Finding the right representation of the data for performing a given task
- Example: learning to predict the category (label) of an image

## Machine Learning:
- The study of computer programs that improve their performance at some **task** with **experience** (data)

## Linear Regression:
Example: weather prediction (supervised learning)
- Task: predict the air temperature (a real number)
- Experience: historical air temperature data (training data)
- Performance measure: deviation of the forecast from the observed temperature (test data)

Given training data $\left\{(x_i, y_i)\right\}^n_{i=1}$ with $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$

Find a model $\hat{y} = f(x)$

Such that $f(x) \approx y$ on test data

- The idea of machine learning is to estimate the function $f$
    - In other words, the aim is to estimate $y$ from $x$
    - $\hat{y} = f(x)$
    
<img src="./images/function_estimation.png">

## Linear Model!

- Given training data $\left\{(x_i, y_i)\right\}^n_{i=1} \sim p$ i.i.d.
    - i.e. each pair $(x_i, y_i)$ is sampled independently from the same underlying data distribution $p$
- Assuming a linear model $\hat{y} = f_{w,b}(x) = w^Tx+b$
    - The prediction is a weighted sum of the inputs (weights $w$) plus a bias $b$
- Find the optimal parameters $w, b$ by minimising the empirical loss

- $\hat{L}(w,b) = \frac{1}{n}\sum^{n}_{i=1}(w^Tx_i + b - y_i)^2$

    - $w^Tx_i + b$ is the model's prediction for $x_i$
    - The ground truth $y_i$ is subtracted to measure how far off each prediction is
    - Each difference is squared, the squares are summed over all $n$ observations, and the sum is divided by $n$ (the mean squared error)
    - Minimising this loss over $w, b$ gives the best-fitting prediction line
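
A minimal sketch of the linear model and its empirical loss, assuming NumPy is available. The function names `empirical_loss` and `fit_linear` and the toy data are hypothetical. Here the loss is minimised in closed form with ordinary least squares (`np.linalg.lstsq`) rather than by gradient descent; both find the same minimiser for this squared loss.

```python
import numpy as np

def empirical_loss(w, b, X, y):
    """Mean squared error: (1/n) * sum_i (w^T x_i + b - y_i)^2."""
    predictions = X @ w + b           # w^T x_i + b for every row x_i of X
    return np.mean((predictions - y) ** 2)

def fit_linear(X, y):
    """Find the w, b that minimise the empirical loss via least squares.

    A column of ones is appended to X so that b is learned as one extra weight.
    """
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    params, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return params[:-1], params[-1]    # (w, b)

# Hypothetical toy data: y = 2x + 1 plus a little noise, with x in R^1 (d = 1)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 0.1, size=50)

w, b = fit_linear(X, y)
print(w, b, empirical_loss(w, b, X, y))
```

The fitted `w` and `b` should come out close to 2 and 1, and the loss close to the noise variance.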


# Base mathematics - Linear algebra

## Scalars
- A single number
    - Denoted by a lower-case italic letter
    - We may say "Let $s \in \mathbb{R}$ be the slope of the line" when defining a real-valued scalar
    - We may say "Let $s \in \mathbb{N}$ be the number of units" when defining a natural-number scalar

## Vectors
- An array of numbers
    - Denoted by a bold lower-case letter, e.g. $\mathbf{x}$
        - Individual elements are written in italic (not bold) with their index as a subscript, e.g. $x_1$
    - The numbers are arranged in order
    - Each element is identified by its index
    - If every element is in $\mathbb{R}$ and the vector has $n$ elements, the vector lies in $\mathbb{R}^n$, the Cartesian product of $\mathbb{R}$ with itself $n$ times

- e.g. $\begin{align}
        \mathbf{x} &= \begin{bmatrix}
           x_1 \\
           x_2 \\
           \vdots \\
           x_n
         \end{bmatrix}
        \end{align}$

### Cartesian Product
- The Cartesian product $A \times B$ of two sets $A$ and $B$ is the new set of all ordered pairs $(a, b)$ with $a \in A$ and $b \in B$
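
A small sketch using Python's standard `itertools.product`; the sets `A` and `B` are arbitrary examples. The size of $A \times B$ is $|A| \cdot |B|$, and the pairs are ordered, so $(1, a)$ and $(a, 1)$ are different.

```python
from itertools import product

A = {1, 2}
B = {"a", "b", "c"}

# A x B: every ordered pair (a, b) with a in A and b in B
cartesian = set(product(A, B))
print(len(cartesian))   # 6 = |A| * |B|

# R^n follows the same idea: product(..., repeat=n) builds n-tuples,
# e.g. all length-3 tuples over {0, 1}
triples = list(product([0, 1], repeat=3))
```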

Vectors can be thought of as identifying points in space, with each element giving the coordinate along a different axis.
To index a particular subset of a vector's elements, define a set containing the indices and write it as a subscript.
- The elements $x_1, x_2, x_3$ can be accessed by defining $S = \left\{1, 2, 3\right\}$ and writing $\mathbf{x}_S$
    - A "$-$" in the subscript denotes the complement of a set
    - $\mathbf{x}_{-1}$ is the vector of all elements of $\mathbf{x}$ except $x_1$
    - $\mathbf{x}_{-S}$ is the vector of all elements of $\mathbf{x}$ except $x_1, x_2, x_3$
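
A sketch of the same indexing in NumPy (assumed available). NumPy counts from 0, whereas the notation above counts from 1, so the index set is shifted by one; the complement $\mathbf{x}_{-S}$ is taken here with `np.delete`.

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])   # a vector in R^5

S = [1, 2, 3]                    # index set in 1-based maths notation
idx = np.array(S) - 1            # shift to NumPy's 0-based indices

x_S = x[idx]                     # x_S    : x_1, x_2, x_3
x_minus_1 = np.delete(x, 0)      # x_{-1} : every element except x_1
x_minus_S = np.delete(x, idx)    # x_{-S} : every element except x_1, x_2, x_3

print(x_S, x_minus_1, x_minus_S)
```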

## Matrices
- A 2-D array of numbers
    - Denoted with bold upper-case letters, e.g. $\mathbf{A}$
    - A real-valued matrix with height $m$ (rows) and width $n$ (columns) is written $\mathbf{A} \in \mathbb{R}^{m \times n}$
    - Elements are written in italic, not bold, with their indices as subscripts
        - e.g. $\textit{A}_{1,1}$
    - e.g. $\begin{align}
        \mathbf{A} &= \begin{bmatrix}
           A_{1,1} & A_{1,2} & A_{1,3} \\
           A_{2,1} & A_{2,2} & A_{2,3} \\
           \vdots & \vdots & \vdots \\
           A_{m,1} & A_{m,2} & A_{m,3}
         \end{bmatrix}
        \end{align}$
    - ":" is used to reference an entire axis, e.g. $\textbf{A}_{i,:}$ refers to the $i$-th (horizontal) row and $\textbf{A}_{:,i}$ refers to the $i$-th (vertical) column
    - The result of a function applied to a matrix can be indexed directly
        - e.g. $\textit{f}(\textbf{A})_{i,j}$ is element $(i, j)$ of the matrix $f(\mathbf{A})$
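
A sketch of matrix shape and axis indexing in NumPy (assumed available). Indices are 0-based, so `A[0, :]` corresponds to $\mathbf{A}_{1,:}$. The function $f$ is arbitrarily chosen here to be the element-wise exponential, just to show that $f(\mathbf{A})$ is computed first and then indexed.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])       # A in R^{2 x 3}: m = 2 rows, n = 3 columns

m, n = A.shape                  # (2, 3)

top_left = A[0, 0]              # A_{1,1}
row_1 = A[0, :]                 # A_{1,:}, the first row    -> [1, 2, 3]
col_1 = A[:, 0]                 # A_{:,1}, the first column -> [1, 4]

# f(A)_{i,j}: apply f to the whole matrix, then index the result
f_A = np.exp(A)                 # f chosen as element-wise exp for illustration
f_A_12 = f_A[0, 1]              # f(A)_{1,2} = exp(A_{1,2}) = exp(2)
```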

## Tensors
- Tensors are arrays with more than two axes.
    - They are denoted similarly to matrices
    - e.g. element $(i, j, k)$ of tensor $\textbf{A}$ is written $A_{i,j,k}$
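
A sketch of a three-axis tensor as a NumPy array (assumed available). The shape `(2, 3, 4)` is arbitrary; `A[1, 2, 3]` is $A_{2,3,4}$ in the 1-based notation above.

```python
import numpy as np

# A tensor with three axes, filled with 0..23 in row-major order
A = np.arange(24).reshape(2, 3, 4)

print(A.ndim)        # 3
print(A[1, 2, 3])    # 23, i.e. A_{2,3,4} in 1-based notation

# ":" still selects an entire axis, e.g. A_{:,1,1}
fibre = A[:, 0, 0]
```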

## Transpose (A matrix operation)
- The mirror image of a matrix across its main diagonal, the line of elements $A_{1,1}, A_{2,2}, \dots$ running down and to the right from the top-left corner (for a square matrix it runs corner to corner)
    - Formally, $(\mathbf{A}^T)_{i,j} = A_{j,i}$

e.g.

$\begin{align}
    \mathbf{A} &= \begin{bmatrix}
        4 & 7 \\
        5 & 8 \\
        6 & 9
    \end{bmatrix}
\end{align}$

Its transpose is:

$\begin{align}
    \mathbf{A}^T &= \begin{bmatrix}
        4 & 5 & 6 \\
        7 & 8 & 9
    \end{bmatrix}
\end{align}$

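- The transpose of $\textbf{A}$ is denoted $\textbf{A}^T$

A vector can be thought of as a matrix with a single column, so its transpose is a matrix with a single row. This lets us write a vector inline as a row and transpose it into a standard column vector:
- $\mathbf{x} = [x_1, x_2, x_3]^T$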
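
A sketch of the transpose example above using NumPy's `.T` attribute (NumPy assumed available). The check loop confirms $(\mathbf{A}^T)_{i,j} = A_{j,i}$ for every element; the last line writes a vector as a $1 \times 3$ row and transposes it into a $3 \times 1$ column.

```python
import numpy as np

A = np.array([[4, 7],
              [5, 8],
              [6, 9]])          # shape (3, 2)

A_T = A.T                       # shape (2, 3)
print(A_T)
# [[4 5 6]
#  [7 8 9]]

# (A^T)_{i,j} = A_{j,i} for every i, j
is_transpose = all(A_T[i, j] == A[j, i]
                   for i in range(A_T.shape[0])
                   for j in range(A_T.shape[1]))

# x = [x_1, x_2, x_3]^T: a 1x3 row turned into a 3x1 column
x = np.array([[1, 2, 3]]).T
```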

## Adding Matrices
Matrices can be added together provided they have the same shape.
e.g. $\textbf{C} = \textbf{A} + \textbf{B}$ where $C_{i,j} = A_{i,j} + B_{i,j}$
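
A sketch of element-wise matrix addition with NumPy (assumed available). Note that NumPy is more permissive than the same-shape rule: it also accepts some operands of different shapes through broadcasting, so the rule is not enforced for you.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[10, 20], [30, 40]])

C = A + B          # C_{i,j} = A_{i,j} + B_{i,j}
print(C)
# [[11 22]
#  [33 44]]
```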

## Multiplying and Adding Scalars to Matrices
- Done by performing the operation on each element of the matrix
    - $\textbf{D} = a \cdot \textbf{B} + c$ where $D_{i,j} = a \cdot B_{i,j} + c$
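
A sketch of scalar multiplication and addition in NumPy (assumed available); the values of `a`, `c` and `B` are arbitrary. Both operations are applied to every element independently.

```python
import numpy as np

B = np.array([[1, 2],
              [3, 4]])
a, c = 2, 1

D = a * B + c      # D_{i,j} = a * B_{i,j} + c
print(D)
# [[3 5]
#  [7 9]]
```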

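Unconventional deep learning notation for matrix addition:
- A matrix and a vector can be added, yielding a new matrix: $\textbf{C} = \textbf{A} + \textbf{b}$ where $C_{i,j} = A_{i,j} + b_j$
- In other words, $\textbf{b}$ is added to each row of $\textbf{A}$; this implicit copying of $\textbf{b}$ into every row is called **broadcasting**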
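
A sketch of this matrix-plus-vector addition using NumPy broadcasting (NumPy assumed available). `b` has one entry per column of `A`, so it is added to every row. The second computation copies `b` into each row explicitly with `np.tile`, which is the step broadcasting saves us from writing.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])       # shape (2, 3)
b = np.array([10, 20, 30])      # shape (3,): one entry per column

C = A + b                       # C_{i,j} = A_{i,j} + b_j
print(C)
# [[11 22 33]
#  [14 25 36]]

# The same result with b copied into every row explicitly
C_explicit = A + np.tile(b, (A.shape[0], 1))
```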


