
# Linear Algebra
Source: AWS Machine Learning Course from Brent Werness.

The Machine Learning Pipeline in Mathematics

In practice, machine learning is a collection of **methods** that extract **rules** or **patterns** from data, rather than having a programmer construct them explicitly.

### Pipeline

Phase1: Data Preprocessing
>This is where you format data in a way algorithms can ingest. Uses linear algebra.
* Collection
* Formatting
* Labeling

Phase2: Feature Engineering & Selection
> This is where you transform data to make it easy for algorithms to understand. You can use linear algebra to perform **multiplication** and **addition** on:
* Vectors
* Matrices
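
As a minimal sketch of the addition and multiplication mentioned above, the snippet below standardizes a small data matrix and then mixes its features with a matrix product. The data values and the weight matrix `W` are made up for illustration and are not from the course.

```python
import numpy as np

# Hypothetical data matrix: 3 samples (rows) x 2 features (columns).
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Center and scale each feature using vector subtraction and scalar division.
mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std

# Combine features with a matrix product: every new feature is a
# weighted sum of the scaled features (W is an assumed example).
W = np.array([[1.0, 1.0],
              [1.0, -1.0]])
X_new = X_scaled @ W

print(X_scaled)
print(X_new)
```

After scaling, each column has mean 0 and standard deviation 1, so features measured in very different units become comparable.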

Phase3: Modeling
>This is where you define the problem in a way the algorithm can optimize. The goal is to learn what is driving the observed events. The problem is expressed through loss functions or through the probability of the data being generated; norms measure how close the predicted values are to the observed ones, and the modeling choices are informed by statistics.
* Geometry
* Probability
* Norms
* Statistics
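
To make the link between norms and loss functions concrete, here is a small sketch that measures how far some predictions are from the observed values. The arrays `y_true` and `y_pred` are invented for illustration.

```python
import numpy as np

# Hypothetical observed values and model predictions.
y_true = np.array([3.0, -1.0, 2.0])
y_pred = np.array([2.5, 0.0, 2.0])

# The residual vector holds the per-sample errors.
residual = y_pred - y_true

# Different norms of the residual give different loss functions.
squared_error = np.linalg.norm(residual, ord=2) ** 2    # sum of squared errors
absolute_error = np.linalg.norm(residual, ord=1)        # sum of absolute errors
worst_error = np.linalg.norm(residual, ord=np.inf)      # largest single error

print(squared_error, absolute_error, worst_error)
```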

Phase4: Optimization
>The goal is to fit your data as well as possible. This is where you iterate until certain conditions are met, and then you choose the best model. It relies on vector calculus (functions and derivatives) and, in practice, on numerical methods such as gradient descent.
* Training phase
* Data evaluation (validation)
* Predictions (real-world)
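
The iterate-until-a-condition-is-met idea can be sketched with plain gradient descent on a least-squares problem. The data, the learning rate, and the stopping tolerance below are all assumptions chosen so that this toy example converges; they are not recommendations.

```python
import numpy as np

# Hypothetical data generated exactly by y = 2*x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Design matrix with a column of ones for the intercept.
X = np.column_stack([x, np.ones_like(x)])

w = np.zeros(2)        # initial guess for [slope, intercept]
learning_rate = 0.05   # assumed step size
tolerance = 1e-8       # assumed stopping condition on the gradient norm

for step in range(10_000):
    residual = X @ w - y
    gradient = 2.0 * X.T @ residual / len(y)   # gradient of the mean squared error
    if np.linalg.norm(gradient) < tolerance:   # stop once the gradient is tiny
        break
    w = w - learning_rate * gradient           # move against the gradient

print(w)
```

The loop stops when the L2 norm of the gradient falls below the tolerance, which is one common choice of stopping condition.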

* Vectors and Linear Spaces
  * Vector representation
  * Norms ($L_1$, $L_2$, $L_\infty$)
  * Inner products
  * Linear independence
  * Orthogonality
  * Hyperplanes
  * Subspaces
* Matrix Theory
  * Basic matrix operations
  * Matrices as linear operators, rank
  * Span, linear dependence
  * Solving systems of linear equations

## Vectors and Matrices
* Column vectors
* Row vectors
* Matrices
> A matrix can represent a collection of data points
  * Addition and the Zero vector
  * Scalar Multiplication
  * Transpose
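
The objects and operations listed above map directly onto NumPy arrays. The following is a minimal sketch with made-up values; storing one data point per row of `M` is a convention assumed here.

```python
import numpy as np

col = np.array([[1.0], [2.0], [3.0]])   # column vector, shape (3, 1)
row = col.T                             # row vector, shape (1, 3)
zero = np.zeros_like(col)               # zero vector of the same shape

# Matrix whose rows are data points (2 points with 3 features each).
M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(col + zero)    # adding the zero vector changes nothing
print(2.0 * col)     # scalar multiplication scales every entry
print(M.T.shape)     # the transpose swaps rows and columns
```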

### Geometry of Column Vectors
* Vectors as Directions
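* Scalar Multiplication
> Stretches or shrinks the vector; a negative scalar also reverses its direction

* Addition as Displacement
> Places the vectors tip to tail: follow the first, then the second

* Subtraction as Mapping
> The difference $\overrightarrow w - \overrightarrow v$ is the displacement that takes $\overrightarrow v$ to $\overrightarrow w$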
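
The geometric readings above can be checked on two small vectors. This is only an illustrative sketch; `v` and `w` are arbitrary example values.

```python
import numpy as np

v = np.array([3.0, 1.0])
w = np.array([1.0, 2.0])

stretched = 2.0 * v    # same direction as v, twice the length
flipped = -1.0 * v     # opposite direction, same length

end_point = v + w      # walk along v, then along w
mapping = w - v        # the displacement that takes v to w

print(stretched, flipped, end_point, mapping)
```

Adding `mapping` back to `v` lands exactly on `w`, which is the sense in which subtraction takes one vector to another.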






### Measures of Magnitude

* Definitions of Norms
> Norms are a measure of distance: the norm of a vector is its length

* Norm Properties
> 1. All distances are non-negative $\Vert \overrightarrow v \Vert \ge 0$
  1. Distances scale with scalar multiplication: $\Vert a \overrightarrow v \Vert = |a| \cdot \Vert \overrightarrow v \Vert$
  1. Triangle Inequality: If I travel from **A** to **B** then **B** to **C**, that is at least as far as going from **A** to **C**. $\Vert \overrightarrow v + \overrightarrow w\Vert \le \Vert \overrightarrow v \Vert + \Vert \overrightarrow w \Vert$
* Types of Norms
  * Euclidean Norm
  >$\Vert \overrightarrow v \Vert_2 = \sqrt { v_1^2 + \cdots + v_n^2} = \sqrt {\sum\limits_{i=1}^{n}v_i^2}$

  * $L_p$-Norm
  > For $p \ge 1$, all the norm properties hold.
  >$\Vert \overrightarrow v \Vert_p = \Big(\sum\limits_{i=1}^{n}|v_i|^p\Big)^{\frac{1}{p}}$, where each term $|v_i|^p \ge 0$

  * $L_1$-Norm
  > Also called the Taxicab metric or Manhattan norm: travel is only allowed along the grid lines, never diagonally.
  > It is the $L_p$-norm for $p=1$:
  > $\Vert \overrightarrow v \Vert_1 = \sum\limits_{i=1}^{n}|v_i|$

  * $L_{\infty}$-Norm
  > The $L_p$-norm for $p\rightarrow \infty$:
  > $\Vert \overrightarrow v \Vert_\infty =  \lim \limits_{p\rightarrow \infty} \Vert \overrightarrow v \Vert_p = \lim \limits_{p\rightarrow \infty}\Big(\sum\limits_{i=1}^{n}|v_i|^p\Big)^{\frac{1}{p}}$

   > As $p$ grows, the sum is dominated by the largest component, so the limit picks out the maximum displacement along any single coordinate. Used for worst-case analysis.

   > $\Vert \overrightarrow v \Vert_\infty =  \max \limits_ i |v_i|$

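  * Geometry of Norms (unit balls in two dimensions)
    * $L_2$-Norm: the unit ball is a circle
    * $L_1$-Norm: the unit ball is a diamond inscribed in that circle
    * $L_\infty$-Norm: the unit ball is a square (side length 2) that encloses the circle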
  * A special case: The $L_0$-Norm
  > Despite the name, this is **not** a norm.
  >$\Vert \overrightarrow v \Vert_0$ = number of non-zero elements of the vector $\overrightarrow v$. 
  >$\lim \limits_{p\rightarrow 0} \Vert \overrightarrow v \Vert_p^p = \Vert \overrightarrow v \Vert_0$, because $|v_i|^p \rightarrow 1$ for every non-zero $v_i$.

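    > For $a \ne 0$, $\Vert a\overrightarrow v \Vert_0 = \Vert \overrightarrow v \Vert_0$, which violates the scaling property $\Vert a \overrightarrow v \Vert = |a| \cdot \Vert \overrightarrow v \Vert$ whenever $|a| \ne 1$ and $\overrightarrow v \ne 0$. So this is not a real norm, but it is used to measure the sparsity of a vector.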
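
The $L_p$ formula above is short enough to write out directly. The sketch below uses a hypothetical helper `lp_norm` (not a NumPy function) to show the value approaching the largest component as $p$ grows, and checks the ordering that matches the nested diamond, circle, and square.

```python
import numpy as np

def lp_norm(v, p):
    """L_p norm of a 1-D vector for p >= 1 (hypothetical helper)."""
    v = np.asarray(v, dtype=float)
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

v = [1, -2, 3]

# As p grows, the value approaches max|v_i| = 3.
for p in (1, 2, 4, 16, 64):
    print(p, lp_norm(v, p))

l1 = lp_norm(v, 1)
l2 = lp_norm(v, 2)
linf = np.max(np.abs(v))

# For any vector, the L-infinity norm <= L2 norm <= L1 norm.
print(linf <= l2 <= l1)
```

For very large `p` the intermediate powers can overflow, so in practice the maximum is computed directly rather than through the limit.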

In [5]:
import numpy as np

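v = [1, 2, 3]                            # example vector
A = [[1, 2, 3], [-1, 0, 1], [1, 1, 1]]   # example matrix, used in the next cell

# For a vector, `ord` selects which norm to compute: 1, 2, or np.inf.
print('L1-norm = {}'.format(np.linalg.norm(v, ord=1)))
print('L2-norm = {}'.format(np.linalg.norm(v, ord=2)))
print('Linf-norm = {}'.format(np.linalg.norm(v, ord=np.inf)))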

L1-norm = 6.0
L2-norm = 3.7416573867739413
Linf-norm = 3.0
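
The three norm properties listed earlier can be spot-checked numerically, and the same check shows why the $L_0$ count fails the scaling property. This is a sketch on random vectors with an assumed seed; it illustrates the properties and does not prove them.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the check is repeatable
v = rng.normal(size=5)
w = rng.normal(size=5)
a = -2.5

for order in (1, 2, np.inf):
    nv = np.linalg.norm(v, ord=order)
    nw = np.linalg.norm(w, ord=order)
    assert nv >= 0                                                      # non-negativity
    assert np.isclose(np.linalg.norm(a * v, ord=order), abs(a) * nv)    # scaling
    assert np.linalg.norm(v + w, ord=order) <= nv + nw + 1e-12          # triangle inequality

# The L0 "norm" counts non-zero entries, so scaling by a != 0 leaves it unchanged.
u = np.array([0.0, 3.0, 0.0, -1.0])
l0 = np.count_nonzero(u)
l0_scaled = np.count_nonzero(a * u)
print(l0, l0_scaled)
```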


In [6]:
# NumPy uses a different convention for matrix norms, so most `ord` values will not do what our vector notation suggests.
# With no `ord` argument, np.linalg.norm(A) returns the Frobenius norm: the L2 norm of the matrix entries treated as one long vector.
print(np.linalg.norm(A))

4.358898943540674
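
To show how the matrix conventions differ, the sketch below evaluates `np.linalg.norm` on the same matrix `A` with several `ord` values. The variable names are illustrative only.

```python
import numpy as np

A = np.array([[1, 2, 3], [-1, 0, 1], [1, 1, 1]])

fro = np.linalg.norm(A)                       # default for 2-D input: Frobenius norm
fro_explicit = np.linalg.norm(A, 'fro')       # same value, requested explicitly
spectral = np.linalg.norm(A, 2)               # largest singular value of A
max_col_sum = np.linalg.norm(A, 1)            # maximum absolute column sum
max_row_sum = np.linalg.norm(A, np.inf)       # maximum absolute row sum
entrywise_l2 = np.linalg.norm(A.ravel(), 2)   # vector L2 norm of the flattened entries

print(fro, spectral, max_col_sum, max_row_sum)
```

Note that `ord=2` on a matrix gives the spectral norm, not the entrywise L2 norm; the entrywise version is the Frobenius norm.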
