# Linear Algebra with examples using Numpy

Chris Overton - adapted from version most recently by Ivan Corneillet  
2016.09.26

Goals:

- Understand addition and multiplication between scalars, vectors, and matrices
- Understand vector norm and square matrix determinant
- Be able to calculate these using numpy
- Understand eigenvalues and eigenvectors
- Understand how to solve systems of linear equations using matrices

## Linear Algebra and Machine Learning

Linear algebra is a language. It is one of the fundamental tools to describe mathematical systems. As such, it appears everywhere.

### Where can it be found in data science?

* Ranking web pages in order of importance
  * Solved as the problem of finding the eigenvector of the page score matrix
* Dimensionality reduction - Principal Component Analysis
* Movie recommendation
  * Use singular value decomposition (SVD) to break down user-movie into user-feature and movie-feature matrices, keeping only the top $k$-ranks to identify the best matches
* Topic modeling
  * Extensive use of SVD and matrix factorization can be found in Natural Language Processing, specifically in topic modeling and semantic analysis
* Almost any multivariate optimization

## Numpy

Numpy is a fast way to perform operations on arrays. Compare adding one to each element of a list with a for loop to doing it with a numpy array:

In [None]:
import numpy as np

In [82]:
x = range(1000000)
y = np.array(x)

In [83]:
type(x)

list

In [84]:
type(y)

numpy.ndarray

In [85]:
def f(x):
    # add one to each element of the list, one element at a time
    for i in range(len(x)):
        x[i] += 1
    return x
        
%time f(x)

def g(y):
    y += 1
    return y

%time g(y)

None

CPU times: user 160 ms, sys: 21.9 ms, total: 181 ms
Wall time: 167 ms
CPU times: user 1.69 ms, sys: 403 µs, total: 2.09 ms
Wall time: 2.21 ms


In this run the numpy array implementation is roughly 75 times faster (about 2 ms versus 167 ms of wall time)!

## Vectors

A vector can be represented by an array of real numbers:

$$\mathbf{x}^T = [x_1, x_2, \ldots, x_n]$$

Geometrically, a vector can specify the coordinates of points in $\mathbb{R}^n$.  It can also specify a translation from one point to another.

<img src = 'assets/vectors.png'></img>

In [None]:
x = np.array([3, -1])

In [None]:
print 'x =', x

### Vector addition

If we have two vectors $\boldsymbol{u}$ and $\boldsymbol{v}$ of the same dimension $n$, then

$$\boldsymbol{u}^T + \boldsymbol{v}^T = [u_1+v_1, u_2+v_2, \ldots, u_n+v_n]$$

There are multiple ways of picturing this. Here's one: If $\boldsymbol{u}$ and $\boldsymbol{v}$ in $\mathbb{R}^2$ are represented as points in the plane, then $\boldsymbol{u} + \boldsymbol{v}$ corresponds to the fourth vertex of the parallelogram whose other vertices are $\boldsymbol{u}$, $\boldsymbol{0}$, and $\boldsymbol{v}$.

<img src = 'assets/vector-addition.png'></img>

In [None]:
u = np.array([1, 3])
v = np.array([5, 1])

print 'u =', u
print 'v =', v

In [None]:
print 'u + v =', u + v 

### Adding a constant to a vector

Adding a constant to a vector adds the constant to each element of the vector:

$$a + \boldsymbol{x}^T = [a + x_1, a + x_2, \ldots, a + x_n]$$

In [None]:
a = 4
x = np.array([1, 3, 4])

print 'a =', a
print 'x =', x

In [None]:
print 'a + x =', a + x

### Length of a vector

The norm (or length) of a vector $\mathbf{x}$ is defined by:

$$||\boldsymbol{x}|| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$$

(Often written just as $|\boldsymbol{x}|$)

In two dimensions, it corresponds to the familiar Pythagorean theorem.

<img src='assets/norm.png'></img>

In [86]:
u = np.array([3, 4])

print np.sqrt(np.sum(u**2)), '(calculated "manually")'
print np.linalg.norm(u), '(using the \'norm\' function)'

5.0 (calculated "manually")
5.0 (using the 'norm' function)


### Scaling a vector

Multiplying a vector by a constant multiplies each term by the constant:

$$\alpha \cdot \boldsymbol{u} = [\alpha u_1, \alpha u_2, \ldots, \alpha u_n]$$

Intuitively, this is the same as stretching the vector by a factor of $\alpha$.

<img src='assets/vector-scaling.png'></img>

($\alpha \cdot \boldsymbol{u}$ can also be denoted more simply without the dot as $\alpha\boldsymbol{u}$)

In [87]:
a = np.array([1, 2, 3])

print 'a =', a

a = [1 2 3]


In [88]:
print '2*a =', 2*a

2*a = [2 4 6]


### Linear combinations of vectors

A _linear combination_ of a collection of vectors $(\boldsymbol{x}_1,
                                                    \boldsymbol{x}_2, \ldots,
                                                    \boldsymbol{x}_m)$
is a vector of the form:

$$a_1 \boldsymbol{x}_1 + a_2 \boldsymbol{x}_2 + \cdots + a_m \boldsymbol{x}_m$$

In [89]:
a1 = 2
x1 = np.array([1, 2, 3, 4])
a2 = 4
x2 = np.array([5, 6, 7, 8])

print 'a1 =', a1
print 'x1 =', x1
print 'a2 =', a2
print 'x2 =', x2

a1 = 2
x1 = [1 2 3 4]
a2 = 4
x2 = [5 6 7 8]


In [90]:
print 'a1*x1 + a2*x2 =', a1*x1 + a2*x2

a1*x1 + a2*x2 = [22 28 34 40]


### Linear subspaces

The set of all linear combinations of a collection of vectors $(\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_m)$ is a ('Euclidean') vector space *spanned* by the $\boldsymbol{x}_i$.

How many dimensions does it have?

Answer: $\le m$, with equality only if the $\boldsymbol{x}_i$ are _linearly independent_, i.e.:

$a_1 \boldsymbol{x}_1 + a_2 \boldsymbol{x}_2 + \cdots + a_m \boldsymbol{x}_m = 0$ implies that all the $a_i = 0$.

Exact linear dependence is the extreme case of 'near' linear dependence, where it becomes numerically difficult to tell which linear combination produced a given vector.
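
One way to check linear independence numerically is to stack the vectors as columns of a matrix and compute its rank, which equals the dimension of their span. The sketch below uses `np.linalg.matrix_rank`; the vectors are made up for illustration, with `x3` deliberately built as a combination of the other two.

```python
import numpy as np

# Three illustrative vectors in R^3; x3 is deliberately x1 + 2*x2
x1 = np.array([1.0, 0.0, 2.0])
x2 = np.array([0.0, 1.0, 1.0])
x3 = x1 + 2 * x2

# Stack the vectors as columns; the rank is the dimension of their span
independent = np.column_stack([x1, x2])
dependent = np.column_stack([x1, x2, x3])

rank_independent = np.linalg.matrix_rank(independent)
rank_dependent = np.linalg.matrix_rank(dependent)

print("rank of (x1, x2):     {}".format(rank_independent))
print("rank of (x1, x2, x3): {}".format(rank_dependent))
```

Adding `x3` does not raise the rank, so the three vectors still span only a two-dimensional subspace.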

### Distance between vectors

The distance between two vectors is the norm of their difference:

$$d(\boldsymbol{u},\boldsymbol{v}) = ||\boldsymbol{u} - \boldsymbol{v}||$$

<img src='assets/distance.png'></img>

In [91]:
u = np.array([7, 1])
v = np.array([3, 2])

print 'u =', u
print 'v =', v

u = [7 1]
v = [3 2]


In [93]:
print (u-v)
print 'd(u, v) =', np.linalg.norm(u - v)

[ 4 -1]
d(u, v) = 4.12310562562


### The dot product (or inner product)

If we have two vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ of the same length $(n)$, then the _dot product_ is given by:

$$\boldsymbol{x} \cdot \boldsymbol{y} = x_1y_1 + x_2y_2 + \cdots + x_ny_n$$

Luckily, this formula calculates a trigonometric function 'for free':

$$\boldsymbol{x} \cdot \boldsymbol{y} = |\boldsymbol{x}| \cdot |\boldsymbol{y}| \cdot \cos(\theta),$$

where $\theta$ is the angle between the vectors.

In [94]:
x = np.array([np.cos(np.pi/3), np.sin(np.pi/3)])
y = np.array([np.cos(np.pi/2), np.sin(np.pi/2)])


print 'x =', x
print 'y =', y

x = [ 0.5        0.8660254]
y = [  6.12323400e-17   1.00000000e+00]


In [95]:
print np.dot(x, y)
print x.dot(y)
print y.dot(x)

0.866025403784
0.866025403784
0.866025403784


Note:  
For vectors of fixed length, the dot product shrinks as the angle between them grows: at $60^\circ$ half of it is gone, and at $90^\circ$ all of it is.  

Remember that numpy's trigonometric functions expect angles in radians!

In [96]:
(np.cos(np.pi/3), np.cos(np.pi/2))

(0.50000000000000011, 6.123233995736766e-17)
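
Rearranging the formula gives the angle itself. The following is a minimal sketch; the helper name `angle_between` is hypothetical (it is not a numpy function) and it assumes both vectors are nonzero.

```python
import numpy as np

def angle_between(x, y):
    """Return the angle in radians between two nonzero vectors."""
    cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    # Clip to guard against round-off pushing the value just outside [-1, 1]
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

x = np.array([np.cos(np.pi / 3), np.sin(np.pi / 3)])  # points at 60 degrees
y = np.array([np.cos(np.pi / 2), np.sin(np.pi / 2)])  # points at 90 degrees

theta = angle_between(x, y)
print("angle in radians: {}".format(theta))
print("angle in degrees: {}".format(np.degrees(theta)))
```

The two vectors point at 60 and 90 degrees, so the angle between them should come out close to 30 degrees ($\pi / 6$ radians).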

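If $\mathbf{x} \cdot \mathbf{y} = 0$ then $\mathbf{x}$ and $\mathbf{y}$ are *orthogonal* (this aligns with the intuitive notion of perpendicular). In floating-point arithmetic the computed dot product may be a tiny nonzero number rather than exactly 0.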

In [97]:
x = np.array([np.cos(np.pi/3), np.sin(np.pi/3)])
y = np.array([np.cos(5*np.pi/6), np.sin(5*np.pi/6)])

np.dot(x, y)

-2.2204460492503131e-16

The norm squared of a vector is just the vector dot product with itself:

$$||x||^2 = x \cdot x$$

In [98]:
print np.linalg.norm(x)**2
print np.dot(x, x)

1.0
1.0


### Cosine Similarity

The _cosine similarity_ of two vectors is the cosine of the angle between them:

$$\cos(\theta) = \frac{\boldsymbol{u} \cdot \boldsymbol{v}}{||\boldsymbol{u}|| \text{ } ||\boldsymbol{v}||}$$

<img src='assets/cosine-similarity.png'></img>

In [99]:
u = np.array([-7, 5])
v = np.array([5, 1])

print np.dot(u,v)/(np.linalg.norm(u)*np.linalg.norm(v))

-0.683941128881


You can think of a column vector as a 'feature', where the rows of the vector are just values taken by the feature for different members of the population.

If both $\boldsymbol{u}$ and $\boldsymbol{v}$ are zero-centered, this calculation $\cos(\theta)$ is the _correlation_ between $\boldsymbol{u}$ and $\boldsymbol{v}$.  

For zero-centered vectors, $\boldsymbol{u} \cdot \boldsymbol{v}$ divided by the number of elements $n$ is the _covariance_ between $\boldsymbol{u}$ and $\boldsymbol{v}$ (divide by $n - 1$ instead for the sample covariance). Only when $\boldsymbol{u}$ and $\boldsymbol{v}$ are also 'standardized' (standard deviation = 1) does this covariance equal the correlation.

In [101]:
u_centered = u - np.mean(u)
print u_centered

v_centered = v - np.mean(v)
print v_centered

print np.dot(u_centered, v_centered)/ \
(np.linalg.norm(u_centered)*np.linalg.norm(v_centered))

[-6.  6.]
[ 2. -2.]
-1.0
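
The link between cosine similarity and correlation can be checked against numpy's own `np.corrcoef`. This is only a sketch: the two feature vectors are invented for illustration, and the covariance shown is the population version (dividing by $n$).

```python
import numpy as np

# Two made-up features measured on five members of a population
u = np.array([2.0, 4.0, 6.0, 8.0, 11.0])
v = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

# Zero-center each feature
u_c = u - u.mean()
v_c = v - v.mean()

# Cosine similarity of the centered vectors
cos_theta = np.dot(u_c, v_c) / (np.linalg.norm(u_c) * np.linalg.norm(v_c))

# Pearson correlation from numpy (off-diagonal entry of the 2x2 matrix)
corr = np.corrcoef(u, v)[0, 1]

# Dot product of the centered vectors divided by n: population covariance
cov = np.dot(u_c, v_c) / len(u)

print("cosine of centered vectors: {}".format(cos_theta))
print("np.corrcoef:                {}".format(corr))
print("covariance (divide by n):   {}".format(cov))
```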


## Matrices

Matrices are two dimensional arrays of numbers which can transform one vector into another one in a _linear_ way. 

A function is linear if both $f(x+y)=f(x)+f(y)$ and $f(ax)=af(x)$ for a constant $a$.
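
Both conditions can be checked numerically for multiplication by a matrix. In this sketch the matrix `A`, the vectors, and the function names `f` and `g` are arbitrary illustrative choices; `g` adds a constant and is included to show a map that fails the test.

```python
import numpy as np

A = np.array([[2.0, -1.0], [0.5, 3.0]])  # arbitrary 2x2 matrix

def f(x):
    """The map x -> Ax."""
    return A.dot(x)

def g(x):
    """The map x -> Ax + 1, which is not linear."""
    return A.dot(x) + 1.0

x = np.array([1.0, 2.0])
y = np.array([-3.0, 0.5])
a = 4.0

# f satisfies both linearity conditions
f_additive = np.allclose(f(x + y), f(x) + f(y))
f_homogeneous = np.allclose(f(a * x), a * f(x))

# g fails additivity: the constant is added once on the left, twice on the right
g_additive = np.allclose(g(x + y), g(x) + g(y))

print("f additive: {}, f homogeneous: {}".format(f_additive, f_homogeneous))
print("g additive: {}".format(g_additive))
```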

An $n \times p$ matrix is an array of numbers with $n$ rows and $p$ columns:

$$
X =
  \begin{bmatrix}
    X_{11} & X_{12} & \cdots & X_{1p} \\
    X_{21} & X_{22} & \cdots & X_{2p} \\
    \vdots & \vdots & \ddots & \vdots \\
    X_{n1} & X_{n2} & \cdots & X_{np} 
  \end{bmatrix}
$$

For example, the following $3 \times 3$ matrix rotates vectors by an angle of $\pi / 6$ about the third axis:

$$
X =
  \begin{bmatrix}
    \cos(\pi / 6) & -\sin(\pi / 6) & 0\\
    \sin(\pi / 6) & \cos(\pi / 6) & 0\\
    0 & 0 & 1
  \end{bmatrix}
$$

<img src='assets/matrix-rotation.png'></img>

We can make a matrix in numpy using a two dimensional array. For instance, we can create $X$ as follows:

In [102]:
X = np.array([[np.cos(np.pi / 6), -np.sin(np.pi / 6), 0], \
              [np.sin(np.pi / 6), np.cos(np.pi / 6), 0], \
              [0, 0, 1]])

print X

[[ 0.8660254 -0.5        0.       ]
 [ 0.5        0.8660254  0.       ]
 [ 0.         0.         1.       ]]


### Shape of a matrix

The shape of a matrix tells us how many rows and columns it has. 

In [103]:
print X.shape

print 'The dimension of the matrix is {}x{} (rows x columns)'.\
format(X.shape[0], X.shape[1]) 

(3, 3)
The dimension of the matrix is 3x3 (rows x columns)


### Elements of a matrix

Let $X_{ij}$ or $(X)_{ij}$ denote the value in the $i$th row and $j$th column of the matrix $X$. We can look at these values in Python as follows (note that numpy indices start at 0, so the first row and the first column both have index 0):

In [104]:
print 'Element at i = 0, j = 0: ', X[0, 0]
print 'Second column: ', X[:, 1]
print 'Third row: ', X[2, :]

Element at i = 0, j = 0:  0.866025403784
Second column:  [-0.5        0.8660254  0.       ]
Third row:  [ 0.  0.  1.]


### Scalar multiplication

Scalar multiplication of a matrix is defined just like scalar multiplication of a vector: multiplying a matrix by a scalar multiplies each element of the matrix by that scalar.

In [105]:
print "2*X =\n", 2*X

2*X =
[[ 1.73205081 -1.          0.        ]
 [ 1.          1.73205081  0.        ]
 [ 0.          0.          2.        ]]


### Matrix addition

Matrix addition is also defined just like addition of vectors. The addition is carried out element-wise, so the two matrices must have the same shape:

<img src='assets/matrix-addition.png'></img>

In [106]:
X = np.array([[np.cos(np.pi / 6), -np.sin(np.pi / 6), 0], [np.sin(np.pi / 6), np.cos(np.pi / 6), 0], [0, 0, 1]])
Y = np.array([[1, 0, 2], [0, 1, -1], [0, 0, 1]])

print "X =\n", X, "\n"
print "Y =\n", Y

X =
[[ 0.8660254 -0.5        0.       ]
 [ 0.5        0.8660254  0.       ]
 [ 0.         0.         1.       ]] 

Y =
[[ 1  0  2]
 [ 0  1 -1]
 [ 0  0  1]]


In [107]:
print "X + Y =\n", X + Y

X + Y =
[[ 1.8660254 -0.5        2.       ]
 [ 0.5        1.8660254 -1.       ]
 [ 0.         0.         2.       ]]


### Multiplying matrices

In order to multiply two matrices, they must be _conformable_: the number of columns of the first matrix must be the same as the number of rows of the second matrix.

Let $X$ be a matrix of dimension $n \times p$ and let $Y$ be a matrix of dimension $p \times q$, then the product $XY$ will be a matrix of dimension $n \times q$ whose $(i,j)^{th}$ element is given by the dot product of the $i^{th}$ row of $X$ and the $j^{th}$ column of $Y$:

$$\sum_{k=1}^p X_{ik}Y_{kj} = X_{i1}Y_{1j} + \cdots + X_{ip}Y_{pj}$$
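
The summation can be written out as explicit loops and compared with `np.dot`. The function `matmul_loops` is a hypothetical helper written only to mirror the formula, and the two small matrices are illustrative; in practice `np.dot` is the far faster choice.

```python
import numpy as np

def matmul_loops(X, Y):
    """Multiply an (n x p) matrix by a (p x q) matrix with explicit loops."""
    n, p = X.shape
    p2, q = Y.shape
    if p != p2:
        raise ValueError("matrices are not conformable")
    out = np.zeros((n, q))
    for i in range(n):
        for j in range(q):
            # (i, j) element: dot product of row i of X and column j of Y
            for k in range(p):
                out[i, j] += X[i, k] * Y[k, j]
    return out

X = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])       # 2 x 3
Y = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]])    # 3 x 2

product = matmul_loops(X, Y)                            # 2 x 2
print(product)
print("matches np.dot: {}".format(np.allclose(product, np.dot(X, Y))))
```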

<img src='assets/matrix-multiplication.png'></img>

In [110]:
X = np.array([1, 0, 1])
R = np.array([[np.cos(np.pi / 6), -np.sin(np.pi / 6), 0], [np.sin(np.pi / 6), np.cos(np.pi / 6), 0], [0, 0, 1]])

print "X =\n", X, "\n"
print "R =\n", R

X =
[1 0 1] 

R =
[[ 0.8660254 -0.5        0.       ]
 [ 0.5        0.8660254  0.       ]
 [ 0.         0.         1.       ]]


In [111]:
X_p = np.dot(R, X)

print X_p

[ 0.8660254  0.5        1.       ]


$X_p$ is at $(\cos(\pi / 6), \sin(\pi / 6), 1)$: the original point rotated by $\pi / 6$ about the third axis.

In [114]:
# Reversing the order multiplies the row vector X by R,
# which does not give the same result as R.dot(X)
X.dot(R)

array([ 0.8660254, -0.5      ,  1.       ])

### Matrix (element-wise) multiplication

In [115]:
# The regular multiply operator (*) is just element-wise multiplication

X = np.array([[np.cos(np.pi / 6), -np.sin(np.pi / 6), 0], [np.sin(np.pi / 6), np.cos(np.pi / 6), 0], [0, 0, 1]])
Y = np.array([[1, 0, 2], [0, 1, -1], [1, 1, 1]])

print "X =\n", X, "\n"
print "Y =\n", Y, "\n"

print "X * Y =\n", X * Y

X =
[[ 0.8660254 -0.5        0.       ]
 [ 0.5        0.8660254  0.       ]
 [ 0.         0.         1.       ]] 

Y =
[[ 1  0  2]
 [ 0  1 -1]
 [ 1  1  1]] 

X * Y =
[[ 0.8660254 -0.         0.       ]
 [ 0.         0.8660254 -0.       ]
 [ 0.         0.         1.       ]]


### Commutativity

A matrix is a square matrix if it has the same number of rows as columns. If $X$ and $Y$ are square matrices of the same size, then we can evaluate both $XY$ and $YX$. However, in general, $XY \neq YX$, although equality can happen in special circumstances.

In [116]:
X = np.array([1, 0, 1])

R = np.array([[np.cos(np.pi / 6), -np.sin(np.pi / 6), 0], [np.sin(np.pi / 6), np.cos(np.pi / 6), 0], [0, 0, 1]])
T = np.array([[1, 0, 2], [0, 1, -1], [0, 0, 1]])

print "X =\n", X, "\n"
print "R =\n", R, "\n"
print "T =\n", T, "\n"

X =
[1 0 1] 

R =
[[ 0.8660254 -0.5        0.       ]
 [ 0.5        0.8660254  0.       ]
 [ 0.         0.         1.       ]] 

T =
[[ 1  0  2]
 [ 0  1 -1]
 [ 0  0  1]] 



In [117]:
print "Skew then rotation:"
print X, ' -> ', T.dot(X), '(first skewed)', R.dot(T.dot(X)), "(then rotated)\n"

print "R * T =\n", R.dot(T)

Skew then rotation:
[1 0 1]  ->  [ 3 -1  1] (first skewed) [ 3.09807621  0.6339746   1.        ] (then rotated)

R * T =
[[ 0.8660254  -0.5         2.23205081]
 [ 0.5         0.8660254   0.1339746 ]
 [ 0.          0.          1.        ]]


In [118]:
print "Rotation then skew:"
print X, ' -> ', R.dot(X), '(first rotated)', T.dot(R.dot(X)), "(then skewed)\n"

print "T * R =\n", T.dot(R)

Rotation then skew:
[1 0 1]  ->  [ 0.8660254  0.5        1.       ] (first rotated) [ 2.8660254 -0.5        1.       ] (then skewed)

T * R =
[[ 0.8660254 -0.5        2.       ]
 [ 0.5        0.8660254 -1.       ]
 [ 0.         0.         1.       ]]


### Additional Properties of Matrices: addition and associativity
1. If $X$ and $Y$ are both $n \times p$ matrices,
then $$X+Y = Y+X$$

2. If $X$, $Y$, and $Z$ are all $n \times p$ matrices,
then $$X+(Y+Z) = (X+Y)+Z$$

3. If $X$, $Y$, and $Z$ are all conformable,
then $$X(YZ) = (XY)Z$$


### Additional Properties of Matrices: distributive rules
4. If $X$ is of dimension $n \times k$ and $Y$ and $Z$ are of dimension $k \times p$, then $$X(Y+Z) = XY + XZ$$

5. If $X$ is of dimension $p \times n$ and $Y$ and $Z$ are of dimension $k \times p$, then $$(Y+Z)X = YX + ZX$$

6. If $a$ and $b$ are real numbers, and $X$ is an $n \times p$ matrix,
then $$(a+b)X = aX+bX$$

### Additional Properties of Matrices: scalar multiplication
7. If $a$ is a real number, and $X$ and $Y$ are both $n \times p$ matrices,
then $$a(X+Y) = aX+aY$$

8. If $a$ is a real number, and $X$ and $Y$ are conformable, then
$$X(aY) = a(XY)$$

### Matrix Transpose

The transpose of an $n \times p$ matrix is a $p \times n$ matrix with rows and columns interchanged, so that $(X^T)_{ij} = X_{ji}$:

$$
X =
  \begin{bmatrix}
    x_{11} & x_{12} & \cdots & x_{1p} \\
    x_{21} & x_{22} & \cdots & x_{2p} \\
    \vdots & \vdots & \ddots & \vdots \\
    x_{n1} & x_{n2} & \cdots & x_{np} 
  \end{bmatrix} 
\implies
X^T =
  \begin{bmatrix}
    x_{11} & x_{21} & \cdots & x_{n1} \\
    x_{12} & x_{22} & \cdots & x_{n2} \\
    \vdots & \vdots & \ddots & \vdots \\
    x_{1p} & x_{2p} & \cdots & x_{np} 
  \end{bmatrix}
$$



In [119]:
X = np.array([[0, 1, 2], [3, 4, 5]])

print "X's shape is", X.shape, "\n"
print "X =\n", X, "\n"

X_T = X.transpose()

print "X_T's shape is", X_T.shape, "\n"
print "X_T =\n", X_T

X's shape is (2, 3) 

X =
[[0 1 2]
 [3 4 5]] 

X_T's shape is (3, 2) 

X_T =
[[0 3]
 [1 4]
 [2 5]]


### Properties of Transpose
1. Let $X$ be an $n \times p$ matrix and $a$ a real number, then
$$(aX)^T = aX^T$$

2. Let $X$ and $Y$ be $n \times p$ matrices, then
$$(X + Y)^T = X^T + Y^T$$

3. Let $X$ be an $n \times k$ matrix and $Y$ be a $k \times p$ matrix, then
$$(XY)^T = Y^TX^T$$
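
The reversal of order in the third property is easy to get wrong, so here is a small check with illustrative integer matrices. Note that the unreversed product $X^T Y^T$ is also defined here but has a different shape altogether.

```python
import numpy as np

X = np.array([[0, 1, 2], [3, 4, 5]])       # 2 x 3
Y = np.array([[1, 0], [2, 1], [0, -1]])    # 3 x 2

lhs = X.dot(Y).T         # (XY)^T, shape 2 x 2
rhs = Y.T.dot(X.T)       # Y^T X^T, shape 2 x 2
wrong = X.T.dot(Y.T)     # X^T Y^T, shape 3 x 3

print("(XY)^T =\n{}".format(lhs))
print("Y^T X^T =\n{}".format(rhs))
print("shape of X^T Y^T: {}".format(wrong.shape))
```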

### Vector in Matrix Form
A column vector is a matrix with $n$ rows and 1 column. To distinguish it from a general matrix $X$, it is denoted by a bold lower-case letter $\boldsymbol{x}$:

$$
\boldsymbol{x} =
  \begin{bmatrix}
    x_{1}\\
    x_{2}\\
    \vdots\\
    x_{n}
  \end{bmatrix}
$$

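In numpy, a vector entered as a one-dimensional array has shape `(n,)`: it has no second dimension, so transposing it has no effect. To get an explicit column or row vector, add the second dimension with `reshape` or `np.newaxis`.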

In [120]:
x = np.array([1,2,3,4])

print 'x =', x
print "x's shape is", x.shape

x = [1 2 3 4]
x's shape is (4,)


In [121]:
x = x.transpose()

print 'x =', x, "\n"
print "x's shape is", x.shape

x = [1 2 3 4] 

x's shape is (4,)


In [122]:
y = x.reshape(4, 1)

print "y =\n", y, "\n"
print "y's shape is", y.shape

y =
[[1]
 [2]
 [3]
 [4]] 

y's shape is (4, 1)


In [123]:
z = x[:, np.newaxis]

print "z =\n", z, "\n"
print "z's shape is", z.shape

z =
[[1]
 [2]
 [3]
 [4]] 

z's shape is (4, 1)


In [124]:
t = z.transpose()

print "t =\n", t, "\n"
print "t's shape is", t.shape

t =
[[1 2 3 4]] 

t's shape is (1, 4)


A row vector is generally written as a transpose:

$$\boldsymbol{x}^T = [x_1, x_2, \ldots, x_n]$$

If we have two vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ of the same length $(n)$, then the _dot product_ is given by matrix multiplication

$$\boldsymbol{x}^T \boldsymbol{y} =   
    \begin{bmatrix} x_1& x_2 & \ldots & x_n \end{bmatrix}
    \begin{bmatrix}
    y_{1}\\
    y_{2}\\
    \vdots\\
    y_{n}
  \end{bmatrix}  =
  x_1y_1 + x_2y_2 + \cdots + x_ny_n$$
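
With explicit column vectors of shape `(n, 1)`, this product can be reproduced in numpy. The sketch below uses illustrative values; it also shows, as a contrast, that multiplying in the other order gives an $n \times n$ matrix rather than a single number.

```python
import numpy as np

x = np.array([1, 2, 3, 4]).reshape(4, 1)   # column vector, shape (4, 1)
y = np.array([5, 6, 7, 8]).reshape(4, 1)   # column vector, shape (4, 1)

inner = x.T.dot(y)    # (1 x 4)(4 x 1) -> 1 x 1 matrix holding the dot product
other = x.dot(y.T)    # (4 x 1)(1 x 4) -> 4 x 4 matrix

print("x^T y = {} with shape {}".format(inner[0, 0], inner.shape))
print("shape of x y^T: {}".format(other.shape))
```

The result of `x.T.dot(y)` is a $1 \times 1$ matrix, so the scalar has to be pulled out with `inner[0, 0]`.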

## Inverse of a Matrix

The inverse of a square $n \times n$ matrix $X$ is an $n \times n$ matrix $X^{-1}$ such that 

$$X^{-1}X = XX^{-1} = I$$

where $I$ is the identity matrix, an $n \times n$ diagonal matrix with 1's along the diagonal. 

If such a matrix exists, then $X$ is said to be _invertible_ or _nonsingular_, otherwise $X$ is said to be _noninvertible_ or _singular_.

In [139]:
# To make zeros easier to spot, round to 4 decimal places

X = np.array([[1, 2, 3], [0, 1, 0], [-2, -1, 0]])
# A singular alternative to try (two identical rows):
#X = np.array([[1, 2, 3], [1,2,3], [0,0,1]])
print "X =\n", X, "\n"

Y = np.linalg.inv(X)
print "Y =\n", np.round(Y,4), "\n"

print "XY =\n", np.round(X.dot(Y))

X =
[[ 1  2  3]
 [ 0  1  0]
 [-2 -1  0]] 

Y =
[[-0.     -0.5    -0.5   ]
 [ 0.      1.      0.    ]
 [ 0.3333 -0.5     0.1667]] 

XY =
[[ 1.  0. -0.]
 [ 0.  1.  0.]
 [ 0. -0.  1.]]


### Properties of Inverse
1. If $X$ is invertible, then $X^{-1}$ is invertible and
$$(X^{-1})^{-1} = X$$
2. If $X$ and $Y$ are both $n \times n$ invertible matrices, then $XY$ is invertible and
$$(XY)^{-1} = Y^{-1}X^{-1}$$
3. If $X$ is invertible, then $X^T$ is invertible and
$$(X^T)^{-1} = (X^{-1})^T$$
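
These three properties can be spot-checked with `np.linalg.inv`. The matrices below are illustrative choices that happen to be invertible; the comparison uses `np.allclose` because computed inverses carry round-off error.

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0], [0.0, 1.0, 0.0], [-2.0, -1.0, 0.0]])
Y = np.array([[2.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.0, 3.0, 1.0]])

inv = np.linalg.inv

# (X^-1)^-1 = X
rule1 = np.allclose(inv(inv(X)), X)
# (XY)^-1 = Y^-1 X^-1  (note the reversed order)
rule2 = np.allclose(inv(X.dot(Y)), inv(Y).dot(inv(X)))
# (X^T)^-1 = (X^-1)^T
rule3 = np.allclose(inv(X.T), inv(X).T)

print("rule 1: {}, rule 2: {}, rule 3: {}".format(rule1, rule2, rule3))
```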

The 'size' of a matrix $X$ is its _determinant_, written $\det(X)$ or $|X|$. This number captures how $X$ multiplies the volume of a unit $n$-dimensional box.  

Mirror-image transformations have a negative determinant.

$X$ is invertible exactly when $|X| \neq 0$, and computing $X^{-1}$ is numerically unstable when $X$ is close to singular, which a very small $|X|$ can signal.
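
As a rough illustration, `np.linalg.det` can be applied to three small hand-picked matrices: a scaling, a reflection, and a matrix whose rows are linearly dependent. The matrices and variable names are assumptions made for this example.

```python
import numpy as np

scale = np.array([[2.0, 0.0], [0.0, 3.0]])       # stretches x by 2 and y by 3
mirror = np.array([[1.0, 0.0], [0.0, -1.0]])     # reflects across the x-axis
singular = np.array([[1.0, 2.0], [2.0, 4.0]])    # second row is twice the first

det_scale = np.linalg.det(scale)        # the unit square's area is multiplied by 6
det_mirror = np.linalg.det(mirror)      # negative: orientation is flipped
det_singular = np.linalg.det(singular)  # zero (up to round-off): not invertible

print("det(scale)    = {}".format(det_scale))
print("det(mirror)   = {}".format(det_mirror))
print("det(singular) = {}".format(det_singular))
```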

### Orthogonal Matrices

Let $X$ be an $n \times n$ matrix such that $X^TX = I$; then $X$ is said to be orthogonal, which implies that $X^T=X^{-1}$.

This is equivalent to saying that the columns of $X$ are all orthogonal to each other (and have unit length).
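
Rotation matrices are a standard example of orthogonal matrices. The sketch below checks this for a rotation by $\pi / 6$ about the third axis, and also checks that multiplying by it leaves the length of a vector unchanged; the test vector is an arbitrary choice.

```python
import numpy as np

theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])

# R^T R = I, so the transpose acts as the inverse
is_orthogonal = np.allclose(R.T.dot(R), np.eye(3))
inverse_is_transpose = np.allclose(np.linalg.inv(R), R.T)

# An orthogonal matrix preserves vector length
x = np.array([1.0, 0.0, 1.0])
norm_before = np.linalg.norm(x)
norm_after = np.linalg.norm(R.dot(x))

print("orthogonal: {}".format(is_orthogonal))
print("inverse equals transpose: {}".format(inverse_is_transpose))
print("norm before: {}, norm after: {}".format(norm_before, norm_after))
```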

## Matrix Equations

A system of equations of the form:
\begin{align*}
    a_{11}x_1 + \cdots + a_{1n}x_n &= b_1 \\
    \vdots \hspace{1in} \vdots \\
    a_{m1}x_1 + \cdots + a_{mn}x_n &= b_m 
\end{align*}
can be written as a matrix equation:
$$
A\mathbf{x} = \mathbf{b}
$$
If $A$ is square ($m = n$) and invertible, this has the unique solution
$$
\mathbf{x} = A^{-1}\mathbf{b}
$$
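
A minimal sketch of solving such a system in numpy, using a made-up $2 \times 2$ example. It applies the formula directly with `np.linalg.inv` and also calls `np.linalg.solve`, which solves $A\mathbf{x} = \mathbf{b}$ without explicitly forming the inverse and is generally the preferred route.

```python
import numpy as np

# Illustrative system:
#     x1 + 2*x2 = 5
#   3*x1 + 4*x2 = 6
A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([5.0, 6.0])

x_inv = np.linalg.inv(A).dot(b)    # direct use of the formula x = A^{-1} b
x_solve = np.linalg.solve(A, b)    # solves Ax = b without forming the inverse

print("via inverse: {}".format(x_inv))
print("via solve:   {}".format(x_solve))
print("check Ax:    {}".format(A.dot(x_solve)))
```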

## Eigenvectors and Eigenvalues

Let $A$ be an $n \times n$ matrix. An _eigenvalue_ of $A$ is a number $\lambda$ for which there exists a nonzero $n \times 1$ vector $\boldsymbol{x}$ such that

$$A \boldsymbol{x} = \lambda \boldsymbol{x}$$

A nonzero vector $\boldsymbol{x}$ satisfying this equation is called an _eigenvector_ associated with $\lambda$.

Eigenvectors and eigenvalues will play a huge role in matrix methods later in the course (PCA, SVD, NMF).

In [136]:
A = np.array([[1, 1], [1, 2]])
lambdas, X = np.linalg.eig(A)

print 'lambdas =', lambdas
print "X =\n", X

lambdas = [ 0.38196601  2.61803399]
X =
[[-0.85065081 -0.52573111]
 [ 0.52573111 -0.85065081]]


In [137]:
first_lambda = lambdas[0]
first_x = X[:, 0]

print 'Ax         =', A.dot(first_x)
print 'lambda * x =', first_lambda * first_x

Ax         = [-0.3249197   0.20081142]
lambda * x = [-0.3249197   0.20081142]


In [138]:
second_lambda = lambdas[1]
second_x = X[:, 1]

print 'Ax         =', A.dot(second_x)
print 'lambda * x =', second_lambda * second_x

Ax         = [-1.37638192 -2.22703273]
lambda * x = [-1.37638192 -2.22703273]
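
For a matrix this small, the eigenvalues can also be worked out by hand. The equation $A \boldsymbol{x} = \lambda \boldsymbol{x}$ has a nonzero solution exactly when $A - \lambda I$ is singular, that is, when its determinant is zero. For the matrix $A$ used above:

$$
\det(A - \lambda I) =
  \det \begin{bmatrix} 1 - \lambda & 1 \\ 1 & 2 - \lambda \end{bmatrix}
  = (1 - \lambda)(2 - \lambda) - 1
  = \lambda^2 - 3\lambda + 1 = 0
$$

$$
\lambda = \frac{3 \pm \sqrt{5}}{2} \approx 0.382 \text{ and } 2.618
$$

These agree with the values returned by `np.linalg.eig`.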


### Stochastic matrices

A stochastic matrix represents the probabilities of going from one state to another. If $P$ is a stochastic matrix, then $P_{ij}$ is interpreted as follows: if a system is in state $j$, it has probability $P_{ij}$ of being in state $i$ on the next iteration. Each column of $P$ therefore sums to 1. If a system has a probability distribution $x$, then after applying $P$ the probability distribution will be $P \cdot x$.

<img src='assets/markov-chain.png'></img>

In [129]:
P = np.array([[.9, .15, .25], [.075, .8, .25], [.025, .05, .5]])

print "P =\n", P

P =
[[ 0.9    0.15   0.25 ]
 [ 0.075  0.8    0.25 ]
 [ 0.025  0.05   0.5  ]]


In [132]:
np.sum(P, axis = 0)

array([ 1.,  1.,  1.])

If we start out with a probability distribution of $x^T=[.5,.25,.25]$, what will the distribution be after the first iteration? How about after 1,000 iterations?

In [133]:
x = np.array([[.5], [.25], [.25]])

P.dot(x)

array([[ 0.55],
       [ 0.3 ],
       [ 0.15]])

In [134]:
for _ in xrange(1001):
    x = P.dot(x)

x

array([[ 0.625 ],
       [ 0.3125],
       [ 0.0625]])

In [135]:
np.sum(x)

1.0000000000000047
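
The distribution that the iteration settles on is unchanged by $P$, so it satisfies $P x = x$: it is an eigenvector of $P$ with eigenvalue 1. As a sketch, it can therefore be recovered directly with `np.linalg.eig` instead of iterating. The variable names below are illustrative, and the rescaling step is needed because eigenvectors are only defined up to a constant factor.

```python
import numpy as np

P = np.array([[.9, .15, .25], [.075, .8, .25], [.025, .05, .5]])

lambdas, vectors = np.linalg.eig(P)

# Pick the eigenvector whose eigenvalue is closest to 1
idx = np.argmin(np.abs(lambdas - 1.0))
stationary = np.real(vectors[:, idx])

# Rescale so the entries sum to 1 and form a probability distribution
stationary = stationary / stationary.sum()

print("eigenvalue closest to 1: {}".format(np.real(lambdas[idx])))
print("stationary distribution: {}".format(stationary))
print("unchanged by P: {}".format(np.allclose(P.dot(stationary), stationary)))
```

This should agree with the result of the repeated multiplication above, up to round-off.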