# Linear Algebra with Examples Using Numpy

#### Based on Moses Marsh's material

In [None]:
import numpy as np
from numpy.random import randn, randint
import matplotlib.pyplot as plt

plt.style.use("ggplot")

%matplotlib inline

## Linear Algebra and Machine Learning

Linear algebra is **everywhere**: it is **the most important topic in applied mathematics**.

* Calculus in higher dimensional spaces reduces to applied linear algebra.
    * The derivative of a function is best understood as a matrix.
* Solving regression problems: linear, logistic, etc...
    * Reduces to solving either a single, or multiple systems of equations.
* Ranking web pages in order of importance
    * Posed as the problem of finding an eigenvector of the page score matrix, which is computed either by solving a system of equations or by an iterative multiplication algorithm.
* Dimensionality reduction - Principal Component Analysis
    * Again, eigenvalues.
* Movie recommendation
    * Use singular value decomposition (SVD, a matrix factorization) to break the user-movie matrix down into user-feature and movie-feature matrices, keeping only the top $k$ components to identify the best matches.
* Topic modeling
    * Extensive use of SVD and matrix factorization can be found in Natural Language Processing, specifically in topic modeling and semantic analysis.

## Vectors

A **vector** is an array of real numbers

$$\boldsymbol{x} = [x_1, x_2, \ldots, x_n]$$

In [None]:
x = np.array([1, 2, 3, 4])
print(x)
print(x.shape)

In [None]:
x2 = x.reshape(-1, 1)
x2

In [None]:
x2.shape

In [None]:
print(x2.T)
print(x2.T.shape)

In [None]:
# Transposing a 1-dimensional array leaves it unchanged
print(x.T)
print(x.T.shape)

Geometrically, a vector gives the coordinates of its tip when its tail is placed at the origin.

In [None]:
fig, ax = plt.subplots()

def plot_arrow_from_origin(ax, x, color="black"):
    xlim = ax.get_xlim()
    new_xlim = [0, 0]
    new_xlim[0] = np.min([xlim[0], x[0] - 0.5])
    new_xlim[1] = np.max([xlim[1], x[0] + 0.5])
    ax.set_xlim(new_xlim)
    ylim = ax.get_ylim()
    new_ylim = [0, 0]
    new_ylim[0] = np.min([ylim[0], x[1] - 0.5])
    new_ylim[1] = np.max([ylim[1], x[1] + 0.5])
    ax.set_ylim(new_ylim)
    # 'box-forced' was removed from recent matplotlib releases; 'box' is the supported value
    ax.set(adjustable='box', aspect='equal')
    ax.arrow(0, 0, x[0], x[1],
             head_width=0.1, linewidth=3, head_length=0.2,
             fc=color, ec=color)
    
plot_arrow_from_origin(ax, (1, 1))
plot_arrow_from_origin(ax, (0.5, -1))
plot_arrow_from_origin(ax, (-1, 0.75))

Adding a constant to a vector adds the constant to each element


$$a + \boldsymbol{x} = [a + x_1, a + x_2, \ldots, a + x_n]$$

In [None]:
a = 4
print(x)
print(x + a)

Multiplying a vector by a constant multiplies each term by the constant.

$$a \boldsymbol{x} = [ax_1, ax_2, \ldots, ax_n]$$

The other arithmetic operators work the same way in numpy: an operation between an array and a number is applied to each element of the array.

In [None]:
print(a * x)
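As a small illustration of element-wise arithmetic beyond addition and multiplication, the sketch below applies a few other operators to the same `x` and `a`. The particular operators shown are just examples chosen for this illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
a = 4

# Each operator is applied to every element of x in turn.
diff = x - a       # element-wise subtraction
squares = x ** 2   # element-wise exponentiation
mask = x > 2       # comparisons return a boolean array

print(diff)
print(squares)
print(mask)
```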

The **norm** or **length of a vector** $\boldsymbol{x}$ is defined using the Pythagorean theorem

$$||\boldsymbol{x}|| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$$

And a **unit vector** is one for which

$$||\boldsymbol{x}|| = 1 $$


In [None]:
print(x)
print(x**2)
# The norm computed by hand and with numpy's built-in function
print(np.sqrt(np.sum(x**2)))
print(np.linalg.norm(x))
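Any nonzero vector can be turned into a unit vector by dividing it by its norm. The sketch below shows this for the same `x`; the name `u` is introduced here only for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# Dividing by the norm rescales x to length 1 without changing its direction.
u = x / np.linalg.norm(x)

print(u)
print(np.linalg.norm(u))
```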

In [None]:
def unit_vector_from_angle(angle):
    return np.array([np.cos(angle), np.sin(angle)])

fig, axs = plt.subplots(1, 2, figsize=(12, 4))

angles = np.linspace(0, 2*np.pi, num=16)
for angle in angles:
    plot_arrow_from_origin(axs[0], unit_vector_from_angle(angle))
axs[0].set_title("Unit Vectors")
    
scales = np.linspace(0.2, 2, num=16)
for angle, scale in zip(angles, scales):
    plot_arrow_from_origin(axs[1], scale*unit_vector_from_angle(angle))
axs[1].set_title("Vectors of Increasing Length")

### Dot Products and Angles Between Vectors

If we have two vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ of the same length $(n)$, then the _dot product_ is given by

$$\boldsymbol{x} \cdot \boldsymbol{y} = x_1y_1 + x_2y_2 + \cdots + x_ny_n$$

In [None]:
y = np.array([4, 3, 2, 1])
print(x)
print(y)
np.dot(x, y)

The dot product looks like a strange thing, but it is important because it allows you to easily compute the **angle between two vectors**.

The **cosine of the angle between the two vectors** can be computed by the following formula:

$$\cos(\theta) = \frac{\boldsymbol{x} \cdot \boldsymbol{y}}{||\boldsymbol{x}|| \, ||\boldsymbol{y}||}$$



In [None]:
x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])
np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
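To recover the angle itself from its cosine, one option is `np.arccos`. The sketch below is a minimal example for the same two vectors; the `np.clip` call is a precaution added here (not part of the formula) against rounding error pushing the cosine slightly outside $[-1, 1]$.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])

cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

# arccos returns the angle in radians; clip guards against rounding error.
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))

print(theta)              # angle in radians
print(np.degrees(theta))  # the same angle in degrees
```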

In [None]:
fig, axs = plt.subplots(3, 3, figsize=(12, 8))

angles = np.linspace(0, 2*np.pi, num=11)
for ax, angle in zip(axs.flatten(), angles[1:-1]):
    v = unit_vector_from_angle(angle)
    plot_arrow_from_origin(ax, v)
    ax.set_xlim((-2, 2)); ax.set_ylim(-1.5, 1.5)
    plot_arrow_from_origin(ax, (1, 0), color="green")
    
    dot_prod = np.dot((1, 0), v)
    ax.set_title("Dot Product: {:2.2f}".format(dot_prod))
    
plt.tight_layout()

If $\boldsymbol{x} \cdot \boldsymbol{y} = 0$ then $\boldsymbol{x}$ and $\boldsymbol{y}$ are *orthogonal* (aligns with the intuitive notion of perpendicular)

In [None]:
w = np.array([1, 2])
v = np.array([-2, 1])
np.dot(w, v)

A picture shows that these are perpendicular.

In [None]:
fig, ax = plt.subplots()

plot_arrow_from_origin(ax, (1, 2))
plot_arrow_from_origin(ax, (-2, 1))

The square of the norm of a vector is the dot product of the vector with itself:
$$
||\boldsymbol{x}||^2 = \boldsymbol{x} \cdot \boldsymbol{x}
$$

In [None]:
print(np.linalg.norm(x)**2)
print(np.dot(x, x))

The **distance between two vectors** is the norm of the difference.

$$
d(\boldsymbol{x},\boldsymbol{y}) = ||\boldsymbol{x}-\boldsymbol{y}||
$$

In [None]:
np.linalg.norm(x - y)
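Because the squared norm is a dot product, the distance can also be written as $\sqrt{(\boldsymbol{x}-\boldsymbol{y}) \cdot (\boldsymbol{x}-\boldsymbol{y})}$. The sketch below checks that both forms agree for the vectors used above; the variable names are introduced only for this example.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])

diff = x - y
d_norm = np.linalg.norm(diff)         # distance as the norm of the difference
d_dot = np.sqrt(np.dot(diff, diff))   # the same distance via the dot product

print(d_norm)
print(d_dot)
```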

### Linear Combinations of Vectors

If we have two vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ of the same length $(n)$, then

$$\boldsymbol{x} + \boldsymbol{y} = [x_1+y_1, x_2+y_2, \ldots, x_n+y_n]$$

In [None]:
x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])
print(x + y)

Geometrically, this adds vectors with the **parallelogram rule**:

![](parallelogram-rule.gif)

A **linear combination** of a collection of vectors $(\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_m)$ is a vector of the form

$$a_1 \cdot \boldsymbol{x}_1 + a_2 \cdot \boldsymbol{x}_2 + 
\cdots + a_m \cdot \boldsymbol{x}_m$$

That is, **a linear combination is formed out of a collection of vectors by multiplying them by various constants, and then adding the results.**

In [None]:
a1 = 2
x1 = np.array([1, 2, 3, 4])
print("a1*x1 = {}".format(a1*x1))
a2 = 4
x2 = np.array([5, 6, 7, 8])
print("a2*x2 = {}".format(a2*x2))
print("a1*x1 + a2*x2 = {}".format(a1*x1 + a2*x2))

Geometrically, the set of all linear combinations of a collection of vectors is the **space spanned by those vectors**. For two vectors that do not point along the same line, this is a plane.

![](linear-combination-plane.png)

### Linear Independence

In [None]:
# A collection of vectors is linearly independent if ...

![linear_indep.png](attachment:image.png)

## Matrices

An **$n \times p$ matrix** is an array of numbers with $n$ rows and $p$ columns:

$$
X =
  \begin{bmatrix}
    x_{11} & x_{12} & \cdots & x_{1p} \\
    x_{21} & x_{22} & \cdots & x_{2p} \\
    \vdots & \vdots & \ddots & \vdots \\
    x_{n1} & x_{n2} & \cdots & x_{np} 
  \end{bmatrix}
$$

Often in data science, matrices are used as a low-level representation of a data set.  In this context:

- $n$ = number of rows = the number of observations  
- $p$ = number of columns = the number of features

### Matrices in Numpy

In `numpy`, matrices and vectors are both represented as `numpy.array`s.

  - A vector is a **1-dimensional** array.
  - A matrix is a **2-dimensional** array.

For example, the following $2 \times 3$ matrix
$$
X =
  \begin{bmatrix}
    1 & 2 & 3\\
    4 & 5 & 6
  \end{bmatrix}
$$

can be created in numpy as:

In [None]:
L = [[1, 2, 3], [4, 5, 6]]
X = np.array(L)
print(X)

The `shape` attribute now tells us the number of rows and columns

In [None]:
print(X.shape)
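When a matrix represents a data set, rows are observations and columns are features, so it helps to be able to pull out either one. The sketch below uses standard numpy indexing on the same $2 \times 3$ matrix; the variable names are chosen here for illustration.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])

n, p = X.shape          # n observations (rows), p features (columns)
first_row = X[0]        # the first observation, a 1-dimensional array
second_col = X[:, 1]    # the second feature across all observations

print(n, p)
print(first_row)
print(second_col)
```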

### Basic Arithmetic


Let $X$ and $Y$ be matrices **of shape $n \times p$** (i.e. of **the same shape**).

We write $x_{ij}$ and $y_{ij}$, for $i=1, 2, \ldots, n$ and $j=1, 2, \ldots, p$, to denote the entries of $X$ and $Y$.

1. $X+Y$ is the matrix whose $(i,j)^{th}$ entry is $x_{ij} + y_{ij}$.  That is, the **matrix created by adding entry-by-entry**.
2. $X-Y$ is the matrix whose $(i,j)^{th}$ entry is $x_{ij} - y_{ij}$.  That is, the **matrix created by subtracting entry-by-entry**.
3. $aX$, where $a$ is any real number, is the matrix whose $(i,j)^{th}$ entry is $ax_{ij}$. That is, the **matrix created by multiplying every entry by $a$**.

In [None]:
X = np.array([[1, 2, 3], [4, 5, 6]])
print("X = \n{}\n".format(X))

Y = np.array([[7, 8, 9], [10, 11, 12]])
print("Y = \n{}\n".format(Y))

print("X + Y = \n{}".format(X + Y))

In [None]:
X = np.array([[1, 2, 3], [4, 5, 6]])
print("X = \n{}\n".format(X))

Y = np.array([[7, 8, 9], [10, 11, 12]])
print("Y = \n{}\n".format(Y))

print("X - Y = \n{}".format(X - Y))

In [None]:
X = np.array([[1, 2, 3], [4, 5, 6]])
print("X = \n{}\n".format(X))

a = 5
print("5X = \n{}".format(a*X))

### Systems of Linear Equations

Matrices were invented in the context of **systems of linear equations**.

A system of linear equations is a collection of equations like:

\begin{align*}
    a_{11}x_1 + \cdots + a_{1n}x_n &= b_1 \\
    a_{21}x_1 + \cdots + a_{2n}x_n &= b_2 \\
    \vdots \hspace{1in} \vdots \\
    a_{m1}x_1 + \cdots + a_{mn}x_n &= b_m 
\end{align*}

So for example:

\begin{align*}
    2 x_1 - x_2 = 4 \\
    x_1 + 6 x_2 = 0
\end{align*}

Systems of linear equations are **ubiquitous** in applied and computational mathematics.  Seriously, they are everywhere.

**The most important thing for us to be able to compute is solutions to systems of linear equations**.

We can represent the system above as a matrix and a vector:

$$ \
\text{LHS} =   \begin{bmatrix}
    2 & -1 \\
    1 & 6
  \end{bmatrix},
\
\text{RHS} = [4, 0]
$$

We can solve systems of linear equations in `numpy` by using `np.linalg.solve`

In [None]:
M = np.array([[2, -1], [1, 6]])
b = np.array([4, 0])

x1, x2 = np.linalg.solve(M, b)
print("x1 = {:2.2f}".format(x1))
print("x2 = {:2.2f}".format(x2))
# Substitute the solution back into both equations as a check
print("2*x1 - x2 = {:2.2f}".format(2*x1 - x2))
print("x1 + 6*x2 = {:2.2f}".format(x1 + 6*x2))
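The substitution check above can also be done in one step: multiply the coefficient matrix by the solution vector and compare against the right-hand side. The sketch below uses `np.allclose` for the comparison, since floating point results are rarely exactly equal; the name `solution` is introduced here for illustration.

```python
import numpy as np

M = np.array([[2, -1], [1, 6]])
b = np.array([4, 0])

solution = np.linalg.solve(M, b)

# M times the solution should reproduce b, up to floating point error.
residual = np.dot(M, solution) - b
print(residual)
print(np.allclose(np.dot(M, solution), b))
```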

### Rank of a matrix

#### Solutions of a system of equations are closely related to the Rank of a matrix.

In [None]:
# Rank or column rank of a matrix is ...


![rank.png](attachment:image.png)

In [None]:
from numpy import linalg
from numpy.linalg import matrix_rank
matrix_rank(M)

In [None]:
M1 = np.array([[2, -1], [1, 6], [4, 0]])

In [None]:
matrix_rank(M1)
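Taking the usual definition of rank as the number of linearly independent columns, a matrix whose columns are multiples of each other has a lower rank than its shape suggests. The matrices `A` and `B` below are made up for this illustration.

```python
import numpy as np
from numpy.linalg import matrix_rank

# The second column of A is twice the first, so only one column is independent.
A = np.array([[1, 2], [2, 4]])

# Neither column of B is a multiple of the other.
B = np.array([[1, 2], [3, 4]])

rank_A = matrix_rank(A)
rank_B = matrix_rank(B)

print(rank_A)
print(rank_B)
```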

## Column Space and Row Space

![col_row.png](attachment:image.png)

### Matrix Multiplication

**Matrix Multiplication** was invented to describe how solutions to systems of linear equations are related to each other.

Suppose we have two systems of equations, where the right hand side of the second is equal to the solutions to the first.

**First Equation:**

\begin{align*}
    2 x_1 - x_2 = 4 \\
    x_1 + 6 x_2 = 0
\end{align*}

**Second Equation:**

\begin{align*}
    y_1 + y_2 = x_1 \\
    2 y_1 - 3 y_2 = x_2
\end{align*}

Then we can substitute the second equation **into** the first to get a **single** system of equations for $y$:

\begin{align*}
    (2 \times 1 + (-1) \times 2) y_1 + (2 \times 1 + (-1) \times (-3)) y_2 = 4 \\
    ((1) \times (1) + (6) \times (2)) y_1 + ((1) \times (1) + (6) \times (-3) ) y_2 = 0
\end{align*}

The coefficients in this new equation are an example of **matrix multiplication**

$$
\begin{bmatrix}
    2 & -1 \\
    1 & 6
\end{bmatrix}
\begin{bmatrix}
    1 & 1 \\
    2 & -3
\end{bmatrix} = 
\begin{bmatrix}
    (2)(1) + (-1)(2) & (2)(1) + (-1)(-3) \\
    (1)(1) + (6)(2) & (1)(1) + (6)(-3)
\end{bmatrix}
$$
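The substitution above can be checked numerically. The sketch below multiplies the two coefficient matrices, solves the combined system for $y$, and then confirms that the $x$ recovered from the second system solves the first. The names `A`, `B`, and `AB` are labels chosen for this example.

```python
import numpy as np

A = np.array([[2, -1], [1, 6]])    # coefficients of the first system (in x)
B = np.array([[1, 1], [2, -3]])    # coefficients of the second system (x in terms of y)
b = np.array([4, 0])               # right-hand side of the first system

AB = np.dot(A, B)                  # coefficients of the combined system (in y)
y = np.linalg.solve(AB, b)         # solve the combined system for y
x = np.dot(B, y)                   # recover x from the second system

print(AB)
print(np.allclose(np.dot(A, x), b))
```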

In order to multiply two matrices, they must be **conformable**: the number of columns of the first matrix must be the same as the number of rows of the second matrix.

Let $X$ be a matrix with shape $n \times k$ and let $Y$ be a matrix with shape $k \times p$, then the product $XY$ will be a matrix with shape $n \times p$ whose $(i,j)^{th}$ element is given by the dot product of the $i^{th}$ row of $X$ and the $j^{th}$ column of $Y$

$$(XY)_{i,j} = x_{i1}y_{1j} + \cdots + x_{ik}y_{kj}$$

In [None]:
X = np.array([[2, 1, 0], [-1, 2, 3]])
print("X = \n{}\n".format(X))

Y = np.array([[0, -2], [1, 2], [1, 1]])
print("Y = \n{}\n".format(Y))

# Matrix multiplication with np.dot
print("XY = \n{}".format(np.dot(X, Y)))
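To make the row-times-column rule concrete, the sketch below builds the product one entry at a time and compares it with `np.dot`. The explicit loops are only for illustration; in practice `np.dot` is the idiomatic and much faster choice.

```python
import numpy as np

X = np.array([[2, 1, 0], [-1, 2, 3]])
Y = np.array([[0, -2], [1, 2], [1, 1]])

n, k = X.shape
k2, p = Y.shape
assert k == k2, "X and Y must be conformable"

XY = np.zeros((n, p))
for i in range(n):
    for j in range(p):
        # Entry (i, j) is the dot product of row i of X and column j of Y.
        XY[i, j] = np.dot(X[i, :], Y[:, j])

print(XY)
print(np.allclose(XY, np.dot(X, Y)))
```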

#### Note: 

$$XY \neq YX$$

In [None]:
print("XY = \n{}\n".format(np.dot(X, Y)))
print("YX = \n{}".format(np.dot(Y, X)))

Notice that $XY$ and $YX$ **do not even have the same shape**.

If $X$ and $Y$ are square matrices of the same dimension, then both products $XY$ and $YX$ exist, but even in this case there is no guarantee that the two products will be the same.
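A small example of two square matrices whose products differ depending on the order is sketched below. The matrices `A` and `B` are made up for this illustration.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

AB = np.dot(A, B)   # multiplying by B on the right swaps the columns of A
BA = np.dot(B, A)   # multiplying by B on the left swaps the rows of A

print(AB)
print(BA)
print(np.array_equal(AB, BA))
```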

### Matrix Transpose

The **transpose** of an $n \times p$ matrix is a $p \times n$ matrix with rows and columns interchanged

$$
X^T =
  \begin{bmatrix}
    x_{11} & x_{21} & \cdots & x_{n1} \\
    x_{12} & x_{22} & \cdots & x_{n2} \\
    \vdots & \vdots & \ddots & \vdots \\
    x_{1p} & x_{2p} & \cdots & x_{np} 
  \end{bmatrix}
$$

In [None]:
print(X)
print(X.transpose())
print(X.T)
print(X.shape, X.T.shape)

The transpose is hard to interpret in a natural way, but it does come up often in computations.

### Vector in Matrix Form
A **column vector** is a matrix with $n$ rows and 1 column. To distinguish it from a general matrix $X$, it is usually written as a bold lowercase letter, $\boldsymbol{x}$:

$$
\boldsymbol{x} =
  \begin{bmatrix}
    x_{1}\\
    x_{2}\\
    \vdots\\
    x_{n}
  \end{bmatrix}
$$

In numpy, when you create a vector, it will not normally have the second dimension, so we can reshape it:

In [None]:
x = np.array([1, 2, 3, 4])
print("x = {}".format(x))
print(x.shape)

In [None]:
y = x.reshape(4, 1)
print("y = \n{}\n".format(y))
print(y.shape)

A row vector is generally written as the transpose of a column vector:

$$\boldsymbol{x}^T = [x_1, x_2, \ldots, x_n]$$

In [None]:
x_T = y.transpose()
print(x_T)
print(x_T.shape)
print(x)

If we have two vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ of the same length $(n)$, then the _dot product_ is given by matrix multiplication

$$\boldsymbol{x}^T \boldsymbol{y} =   
    \begin{bmatrix} x_1& x_2 & \ldots & x_n \end{bmatrix}
    \begin{bmatrix}
    y_{1}\\
    y_{2}\\
    \vdots\\
    y_{n}
  \end{bmatrix}  =
  x_1y_1 + x_2y_2 + \cdots + x_ny_n$$
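In numpy, carrying out this product with explicit column vectors gives a $1 \times 1$ matrix rather than a bare number. The sketch below shows this with the example vectors used earlier; the names `x_col` and `y_col` are introduced here for illustration.

```python
import numpy as np

x_col = np.array([1, 2, 3, 4]).reshape(-1, 1)   # 4 x 1 column vector
y_col = np.array([4, 3, 2, 1]).reshape(-1, 1)   # 4 x 1 column vector

# (1 x 4) times (4 x 1) gives a 1 x 1 matrix holding the dot product.
product = np.dot(x_col.T, y_col)

print(product)
print(product.shape)
print(product[0, 0])
```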

### The Identity Matrix
The identity matrix $I$ is a square matrix with 1's along the diagonal and zeros elsewhere. Multiplying any conformable matrix by $I$ returns that matrix unchanged.
$$X = XI = IX$$

In [None]:
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
I = np.identity(3)
print("X = \n{}\n".format(X))
print("I = \n{}".format(I))

In [None]:
print("XI = \n{}\n".format(np.dot(X, I)))
print("IX = \n{}".format(np.dot(I, X)))

## Inverse of a Matrix

The inverse of a square $n \times n$ matrix $X$ is an $n \times n$ matrix $X^{-1}$ such that 

$$X^{-1}X = XX^{-1} = I$$

Where $I$ is the identity matrix. 

If such a matrix exists, then $X$ is said to be **invertible** or **nonsingular**, otherwise $X$ is said to be **noninvertible** or **singular**.

In [None]:
X = np.array([[1, 2, 3], [0, 1, 0], [-2, -1, 0]])
Y = np.linalg.inv(X)
# The product should be the identity matrix, up to floating point error
print("XY = \n{}".format(np.dot(X, Y)))

Note that **only square matrices can be invertible**, and even then **many square matrices do not have inverses**.

It is not too difficult to characterize exactly which square matrices have inverses:

**A square matrix $X$ has an inverse if and only if no column is a linear combination of the remaining columns.** This is summarized by saying that the columns are **linearly independent**.
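One way to apply this criterion numerically is to compare the rank of a square matrix with its size. The helper `is_invertible` below is a hypothetical function written for this example, not part of numpy, and it relies on the default tolerance of `matrix_rank`.

```python
import numpy as np
from numpy.linalg import matrix_rank

def is_invertible(A):
    """Return True if A is square and its columns are linearly independent."""
    A = np.asarray(A)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        return False
    return bool(matrix_rank(A) == A.shape[0])

# The invertible matrix from the example above.
X = np.array([[1, 2, 3], [0, 1, 0], [-2, -1, 0]])

# Here the third column equals 2 * (second column) - (first column).
S = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(is_invertible(X))
print(is_invertible(S))
```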

### Properties of Transpose
1. Let $X$ be an $n \times p$ matrix and $a$ a real number, then 
$$(aX)^T = aX^T$$
2. Let $X$ and $Y$ be $n \times p$ matrices, then
$$(X \pm Y)^T = X^T \pm Y^T$$
3. Let $X$ be an $n \times k$ matrix and $Y$ be a $k \times p$ matrix, then
$$(XY)^T = Y^TX^T$$
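These three properties can be spot-checked numerically. The sketch below does so for a few small integer matrices made up for this example; a check on specific matrices illustrates the identities but does not prove them.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])       # 2 x 3
Z = np.array([[0, 1, 0], [2, 0, -1]])      # 2 x 3, same shape as X
Y = np.array([[1, 0], [0, 1], [2, 3]])     # 3 x 2, conformable with X
a = 3

scalar_ok = np.array_equal((a * X).T, a * X.T)
sum_ok = np.array_equal((X + Z).T, X.T + Z.T)
# Note the reversed order of the factors on the right-hand side.
product_ok = np.array_equal(np.dot(X, Y).T, np.dot(Y.T, X.T))

print(scalar_ok)
print(sum_ok)
print(product_ok)
```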

### Additional Properties of Matrices
1. **Commutative Addition:** If $X$ and $Y$ are both $n \times p$ matrices,
then $$X+Y = Y+X$$
2. **Associative Addition:** If $X$, $Y$, and $Z$ are all $n \times p$ matrices,
then $$X+(Y+Z) = (X+Y)+Z$$
3. **Associative Multiplication:** If $X$, $Y$, and $Z$ are all conformable,
then $$X(YZ) = (XY)Z$$
4. **Distributive Law on Left:** If $X$ is of dimension $n \times k$ and $Y$ and $Z$ are of dimension $k \times p$, then $$X(Y+Z) = XY + XZ$$
5. **Distributive Law on Right:** If $X$ is of dimension $p \times n$ and $Y$ and $Z$ are of dimension $k \times p$, then $$(Y+Z)X = YX + ZX$$
6. **Distributive Scalar Multiplication**: If $a$ and $b$ are real numbers, and $X$ is an $n \times p$ matrix,
then $$(a+b)X = aX+bX$$
7. **Distributive Scalar Multiplication:** If $a$ is a real number, and $X$ and $Y$ are both $n \times p$ matrices,
then $$a(X+Y) = aX+aY$$
8. **Associative & Commutative scalar multiplication:** If $a$ is a real number, and $X$ and $Y$ are conformable, then
$$X(aY) = a(XY)$$
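A few of the multiplication properties above can be spot-checked in numpy. The sketch below uses small integer matrices made up for this example, so the comparisons are exact; it illustrates the identities rather than proving them.

```python
import numpy as np

X = np.array([[1, 2], [3, 4]])
Y = np.array([[0, 1], [1, 1]])
Z = np.array([[2, 0], [1, 3]])
a = 3

associative = np.array_equal(np.dot(X, np.dot(Y, Z)), np.dot(np.dot(X, Y), Z))
left_distributive = np.array_equal(np.dot(X, Y + Z), np.dot(X, Y) + np.dot(X, Z))
right_distributive = np.array_equal(np.dot(Y + Z, X), np.dot(Y, X) + np.dot(Z, X))
scalar_moves = np.array_equal(np.dot(X, a * Y), a * np.dot(X, Y))

print(associative)
print(left_distributive)
print(right_distributive)
print(scalar_moves)
```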

### Properties of Inverse
1. If $X$ is invertible, then $X^{-1}$ is invertible and
$$(X^{-1})^{-1} = X$$
2. If $X$ and $Y$ are both $n \times n$ invertible matrices, then $XY$ is invertible and
$$(XY)^{-1} = Y^{-1}X^{-1}$$
3. If $X$ is invertible, then $X^T$ is invertible and
$$(X^T)^{-1} = (X^{-1})^T$$
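The inverse properties can be spot-checked the same way. In the sketch below, `X` is the invertible matrix from the earlier example and `Y` is a second invertible matrix made up for this illustration; `np.allclose` is used because computed inverses carry floating point error.

```python
import numpy as np
from numpy.linalg import inv

X = np.array([[1, 2, 3], [0, 1, 0], [-2, -1, 0]])
Y = np.array([[2, 0, 1], [1, 1, 0], [0, 1, 1]])

double_inverse = np.allclose(inv(inv(X)), X)
# Note the reversed order of the factors on the right-hand side.
product_inverse = np.allclose(inv(np.dot(X, Y)), np.dot(inv(Y), inv(X)))
transpose_inverse = np.allclose(inv(X.T), inv(X).T)

print(double_inverse)
print(product_inverse)
print(transpose_inverse)
```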