# Linear Algebra with examples using Numpy

## Linear Algebra and Machine Learning

Linear algebra is a language for describing linear relationships, and a remarkably wide range of problems can be expressed (or approximated) in it. As such, it appears almost everywhere.

Where can it be found within data science?

* Ranking web pages in order of importance
  * Solved as the problem of finding the dominant eigenvector of a matrix describing the links between pages (as in PageRank)
* Dimensionality reduction - Principal Component Analysis
* Movie recommendation
  * Use singular value decomposition (SVD) to break the user-movie rating matrix down into user-feature and movie-feature matrices, keeping only the top $k$ singular values (a rank-$k$ approximation) to identify the best matches
* Topic modeling
  * Extensive use of SVD and matrix factorization can be found in Natural Language Processing, specifically in topic modeling and semantic analysis

In [1]:
import numpy as np

## Numpy

Numpy is a fast way to perform operations on arrays. Compare adding one to each element of a list with a for loop to doing it with a numpy array:

In [30]:
x = range(1000000)  # in Python 2, range returns a list
y = np.array(x)

In [24]:
y

array([     0,      1,      2, ..., 999997, 999998, 999999])

In [25]:
type(x)

list

In [26]:
type(y)

numpy.ndarray

In [31]:
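def f(x):
    # pure Python: add 1 to each element of the list, one element at a time (in place)
    for i, _ in enumerate(x):
        x[i] += 1
    return x

%time f(x)

def g(y):
    # NumPy: add 1 to every element of the array in one vectorized operation (in place)
    y += 1
    return y

%time g(y)

None  # suppress the cell's output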

CPU times: user 156 ms, sys: 20.5 ms, total: 176 ms
Wall time: 181 ms
CPU times: user 1.58 ms, sys: 11 µs, total: 1.59 ms
Wall time: 1.64 ms


In [34]:
y

array([      1,       2,       3, ...,  999998,  999999, 1000000])

In [33]:
x

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...]

In [36]:
a = np.array([1,2])
a += 1
print a

[2 3]


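The NumPy version is roughly 100 times faster in this run (about 1.6 ms versus 176 ms of CPU time), because the loop over the elements runs in compiled code instead of in the Python interpreter.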

In [37]:
vector_1 = np.array([1,2,3])

## Vectors

A vector can be represented by an array of real numbers:

$$\mathbf{x} = [x_1, x_2, \ldots, x_n]$$

Geometrically, a vector can specify the coordinates of points in $\mathbb{R}^n$.  It can also specify a translation from one point to another.

<img src = 'assets/vectors.png'></img>

In [38]:
x = np.array([3, -1])

In [40]:
x

array([ 3, -1])

In [41]:
print 'x =', x

x = [ 3 -1]


### Vector addition

If we have two vectors $\boldsymbol{u}$ and $\boldsymbol{v}$ of the same dimension $n$, then

$$\boldsymbol{u} + \boldsymbol{v} = [u_1+v_1, u_2+v_2, \ldots, u_n+v_n]$$

There are multiple ways of visualizing this. Here is one: if $\boldsymbol{u}$ and $\boldsymbol{v}$ in $\mathbb{R}^2$ are represented as points in the plane, then $\boldsymbol{u} + \boldsymbol{v}$ corresponds to the fourth vertex of the parallelogram whose other vertices are $\boldsymbol{u}$, $\boldsymbol{0}$, and $\boldsymbol{v}$.

<img src = 'assets/vector-addition.png'></img>

In [42]:
u = np.array([1, 3])
v = np.array([5, 1])

print 'u =', u
print 'v =', v

u = [1 3]
v = [5 1]


In [43]:
print 'u + v =', u + v

u + v = [6 4]


### Adding a constant to a vector

Adding a constant to a vector adds the constant to each element of the vector:

$$a + \boldsymbol{x} = [a + x_1, a + x_2, \ldots, a + x_n]$$

In [45]:
a = 4
x = np.array([1, 3, 4])

print 'a =', a
print 'x =', x

a = 4
x = [1 3 4]


In [46]:
print 'a + x =', a + x

a + x = [5 7 8]


### Length of a vector

The norm (or length) of a vector $\boldsymbol{x}$ is defined by:

$$||\boldsymbol{x}|| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$$

In two dimensions, it corresponds to the familiar Pythagorean theorem.

<img src='assets/norm.png'></img>

In [49]:
u = np.array([3, 4])

print u
print u**2
print np.sqrt(np.sum(u**2)), '(calculated "manually")'
print np.linalg.norm(u), '(using the \'norm\' function)'

[3 4]
[ 9 16]
5.0 (calculated "manually")
5.0 (using the 'norm' function)


### Scaling a vector

Multiplying a vector by a constant multiplies each element by the constant:

$$\alpha \cdot \boldsymbol{u} = [\alpha u_1, \alpha u_2, \ldots, \alpha u_n]$$

Intuitively, this stretches (or shrinks) the vector by a factor of $|\alpha|$; if $\alpha$ is negative, it also reverses the vector's direction.

<img src='assets/vector-scaling.png'></img>

($\alpha \cdot \boldsymbol{u}$ can also be written more simply, without the dot, as $\alpha\boldsymbol{u}$.)

In [50]:
a = np.array([1, 2, 3])

print 'a =', a

a = [1 2 3]


In [51]:
print '2*a =', 2*a

2*a = [2 4 6]
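Every element is multiplied by $\alpha$, so the length is multiplied by $|\alpha|$: $||\alpha\boldsymbol{u}|| = |\alpha| \, ||\boldsymbol{u}||$. The short sketch below checks this for a few arbitrarily chosen values of $\alpha$. It includes a negative one, which also flips the vector's direction.

```python
import numpy as np

u = np.array([3.0, 4.0])   # ||u|| = 5

for alpha in [2.0, 0.5, -1.0]:
    scaled = alpha * u
    # the length of alpha*u is |alpha| times the length of u
    print("alpha = {}: alpha*u = {}, norm = {}".format(alpha, scaled, np.linalg.norm(scaled)))
```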


### Linear combinations of vectors

A _linear combination_ of a collection of vectors $(\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_m)$ is a vector of the form:

$$a_1 \boldsymbol{x}_1 + a_2 \boldsymbol{x}_2 + \cdots + a_m \boldsymbol{x}_m$$

In [52]:
a1 = 2
x1 = np.array([1, 2, 3, 4])
a2 = 4
x2 = np.array([5, 6, 7, 8])

print 'a1 =', a1
print 'x1 =', x1
print 'a2 =', a2
print 'x2 =', x2

a1 = 2
x1 = [1 2 3 4]
a2 = 4
x2 = [5 6 7 8]


In [53]:
print 'a1*x1 + a2*x2 =', a1*x1 + a2*x2

a1*x1 + a2*x2 = [22 28 34 40]
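A linear combination can also be written as a matrix-vector product, which the matrix section develops further. Stack $\boldsymbol{x}_1$ and $\boldsymbol{x}_2$ as the columns of a matrix and multiply by the vector of coefficients, and you get the same result. Here is a minimal sketch using NumPy's `np.column_stack`; the names `M` and `coeffs` are ours, for illustration.

```python
import numpy as np

a1, a2 = 2, 4
x1 = np.array([1, 2, 3, 4])
x2 = np.array([5, 6, 7, 8])

# put x1 and x2 side by side as the columns of a 4x2 matrix
M = np.column_stack([x1, x2])
coeffs = np.array([a1, a2])

# M.dot(coeffs) computes a1*x1 + a2*x2 in a single step
print(M.dot(coeffs))  # [22 28 34 40]
```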


### Distance between vectors

The distance between two vectors is the norm of their difference:

$$d(\boldsymbol{u},\boldsymbol{v}) = ||\boldsymbol{u}-\boldsymbol{v}||$$

<img src='assets/distance.png'></img>

In [54]:
u = np.array([7, 1])
v = np.array([3, 2])

print 'u =', u
print 'v =', v

u = [7 1]
v = [3 2]


In [55]:
print 'd(u, v) =', np.linalg.norm(u - v)

d(u, v) = 4.12310562562
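Expanding the definition, the distance is the square root of the sum of the squared element-wise differences. The quick check below uses the same vectors. Here $\boldsymbol{u} - \boldsymbol{v} = [4, -1]$, so $d = \sqrt{16 + 1} = \sqrt{17}$.

```python
import numpy as np

u = np.array([7, 1])
v = np.array([3, 2])

diff = u - v                          # [ 4 -1]
d_manual = np.sqrt(np.sum(diff**2))   # sqrt(16 + 1) = sqrt(17)
d_norm = np.linalg.norm(diff)         # the same value via the norm function

print("{:.6f} {:.6f}".format(d_manual, d_norm))  # 4.123106 4.123106
```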


### The dot product (or inner product)

If we have two vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ of the same length $(n)$, then the _dot product_ is given by:

$$\boldsymbol{x} \cdot \boldsymbol{y} = x_1y_1 + x_2y_2 + \cdots + x_ny_n$$
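The formula is just a sum of element-wise products. The minimal sketch below compares an explicit loop with NumPy's `np.dot`. The helper name `dot_manual` is hypothetical, introduced only for illustration.

```python
import numpy as np

def dot_manual(x, y):
    # hypothetical helper: sum of x_i * y_i, assuming x and y have the same length
    assert len(x) == len(y), "vectors must have the same length"
    total = 0.0
    for xi, yi in zip(x, y):
        total += xi * yi
    return total

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

print("manual: {}".format(dot_manual(x, y)))  # 1*4 + 2*5 + 3*6 = 32.0
print("np.dot: {}".format(np.dot(x, y)))      # 32.0
```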

In [56]:
x = np.array([np.cos(np.pi/3), np.sin(np.pi/3)])
y = np.array([np.cos(np.pi/2), np.sin(np.pi/2)])


print 'x =', x
print 'y =', y

x = [ 0.5        0.8660254]
y = [  6.12323400e-17   1.00000000e+00]


In [57]:
print np.dot(x, y)
print x.dot(y)
print y.dot(x)

0.866025403784
0.866025403784
0.866025403784


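Note that this value equals $\cos(\pi/6)$: both vectors have unit length, and the angle between them is $\pi/2 - \pi/3 = \pi/6$.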

In [None]:
np.cos(np.pi/6)

If $\boldsymbol{x} \cdot \boldsymbol{y} = 0$, then $\boldsymbol{x}$ and $\boldsymbol{y}$ are *orthogonal*, which matches the intuitive notion of being perpendicular.

In [58]:
x = np.array([1,0])
y = np.array([0,1])

np.dot(x, y)

0

In [None]:
x = np.array([np.cos(np.pi/3), np.sin(np.pi/3)])
y = np.array([np.cos(5*np.pi/6), np.sin(5*np.pi/6)])

np.dot(x, y)  # the angle between x and y is 5*pi/6 - pi/3 = pi/2, so this is 0 up to rounding

The squared norm of a vector is the dot product of the vector with itself:

$$||\boldsymbol{x}||^2 = \boldsymbol{x} \cdot \boldsymbol{x}$$

In [59]:
print np.linalg.norm(x)**2
print np.dot(x, x)

1.0
1


### Cosine Similarity

The _cosine similarity_ of the vectors is the cosine of the angle between them:

$$\cos(\theta) = \frac{\boldsymbol{u} \cdot \boldsymbol{v}}{||\boldsymbol{u}|| \, ||\boldsymbol{v}||}$$

<img src='assets/cosine-similarity.png'></img>

In [65]:
u = np.array([-7, 5])
v = np.array([5, 1])

print np.dot(u,v)/(np.linalg.norm(u)*np.linalg.norm(v))

-0.683941128881


If $\boldsymbol{u}$ and $\boldsymbol{v}$ are first centered (each has its mean subtracted), this calculation gives the (Pearson) _correlation_ between $\boldsymbol{u}$ and $\boldsymbol{v}$.

In [66]:
u_centered = u - np.mean(u)
print u_centered

v_centered = v - np.mean(v)
print v_centered

print np.dot(u_centered, v_centered)/(np.linalg.norm(u_centered)*np.linalg.norm(v_centered))

[-6.  6.]
[ 2. -2.]
-1.0
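With only two elements, the correlation of two non-constant vectors is always $\pm 1$. The sketch below therefore uses longer vectors; the data is made up for illustration. It checks the centered cosine similarity against NumPy's `np.corrcoef`, which computes the Pearson correlation directly. The helper `cosine_similarity` is our own name, not a NumPy function.

```python
import numpy as np

def cosine_similarity(u, v):
    # cos(theta) = (u . v) / (||u|| ||v||)
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

u = np.array([2.0, 4.0, 5.0, 9.0])   # illustrative data
v = np.array([1.0, 3.0, 2.0, 6.0])

# center each vector by subtracting its mean, then take the cosine similarity
r_cosine = cosine_similarity(u - u.mean(), v - v.mean())

# Pearson correlation from NumPy (off-diagonal entry of the 2x2 correlation matrix)
r_numpy = np.corrcoef(u, v)[0, 1]

print("{:.6f} {:.6f}".format(r_cosine, r_numpy))
```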


## Matrices

Matrices are two-dimensional arrays of numbers. Multiplying a vector by a matrix transforms it into another vector, and this transformation is linear. A function $f$ is linear if both $f(x+y)=f(x)+f(y)$ and $f(ax)=af(x)$ for any constant $a$.
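The two linearity conditions can be checked numerically for the map $f(\boldsymbol{x}) = A\boldsymbol{x}$. The minimal sketch below uses an arbitrary matrix `A` and arbitrary vectors, chosen only for illustration. It compares results with `np.allclose`, because floating-point results can differ in the last digits.

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [0.5,  3.0]])   # an arbitrary 2x2 matrix, chosen for illustration

def f(x):
    # the linear map defined by multiplying by A
    return A.dot(x)

x = np.array([1.0, 2.0])
y = np.array([-3.0, 0.5])
a = 4.0

additive = np.allclose(f(x + y), f(x) + f(y))   # f(x + y) == f(x) + f(y)
homogeneous = np.allclose(f(a * x), a * f(x))   # f(a x) == a f(x)
print("{} {}".format(additive, homogeneous))    # True True
```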

An $n \times p$ matrix is an array of numbers with $n$ rows and $p$ columns:

$$
X =
  \begin{bmatrix}
    X_{11} & X_{12} & \cdots & X_{1p} \\
    X_{21} & X_{22} & \cdots & X_{2p} \\
    \vdots & \vdots & \ddots & \vdots \\
    X_{n1} & X_{n2} & \cdots & X_{np}
  \end{bmatrix}
$$

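For example, the following $3 \times 3$ matrix rotates points in the plane counterclockwise by $\pi/4$ (45°) about the origin. The points are written in homogeneous coordinates $[x, y, 1]$, which is why the matrix has a third row and column that leave the last coordinate unchanged: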

$$
X =
  \begin{bmatrix}
    \cos(\pi / 4) & -\sin(\pi / 4) & 0\\
    \sin(\pi / 4) & \cos(\pi / 4) & 0\\
    0 & 0 & 1
  \end{bmatrix}
$$

<img src='assets/matrix-rotation.png'></img>
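The same construction works for any angle $\theta$. The minimal sketch below uses a hypothetical helper `rotation_matrix(theta)`, which is our own name and not a NumPy function. It builds the $3 \times 3$ homogeneous rotation matrix and applies it to the point $(1, 0)$, written as $[1, 0, 1]$.

```python
import numpy as np

def rotation_matrix(theta):
    # hypothetical helper: counterclockwise rotation by theta (radians) about
    # the origin, in homogeneous coordinates [x, y, 1]
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

p = np.array([1.0, 0.0, 1.0])   # the point (1, 0)

p_45 = rotation_matrix(np.pi / 4).dot(p)  # approximately [0.7071, 0.7071, 1]
p_90 = rotation_matrix(np.pi / 2).dot(p)  # approximately [0, 1, 1], up to rounding
print(np.round(p_45, 4))
print(np.round(p_90, 4))
```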

We can make a matrix in numpy using a two dimensional array. For instance, we can create $X$ as follows:

In [69]:
y = np.array([1,2,3])
y

array([1, 2, 3])

In [73]:
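X = np.array([[np.cos(np.pi / 4), -np.sin(np.pi / 4), 0],
              [np.sin(np.pi / 4), np.cos(np.pi / 4), 0],
              [0, 0, 1]])

# A 2-D array can also hold a table such as (age, favorite color, height).
# The rows must be passed as ONE list of lists, and because an array has a single
# dtype, mixing numbers and strings converts every entry to a string.
people = np.array([[30, 'green', 60],
                   [23, 'blue', 55],
                   [32, 'yellow', 40]])

print X

[[ 0.70710678 -0.70710678  0.        ]
 [ 0.70710678  0.70710678  0.        ]
 [ 0.          0.          1.        ]]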

In [75]:
a = np.array(['blue', 2])  # one dtype for the whole array: the number 2 becomes the string '2'
a

array(['blue', '2'], 
      dtype='|S4')

### Shape of a matrix

The shape of a matrix tells us how many rows and columns it has.

In [72]:
print X.shape

print 'The dimension of the matrix is {}x{} (rows x columns)'.format(X.shape[0], X.shape[1]) 

(3, 3)
The dimension of the matrix is 3x3 (rows x columns)


### Elements of a matrix

Let $X_{ij}$ (also written $(X)_{ij}$) denote the value in the $i$th row and $j$th column of the matrix $X$. We can look at these values in Python as follows (NumPy indices start at 0, so `X[0, 0]` is $X_{11}$):

In [77]:
X

array([[ 0.70710678, -0.70710678,  0.        ],
       [ 0.70710678,  0.70710678,  0.        ],
       [ 0.        ,  0.        ,  1.        ]])

In [79]:
print 'Element at i = 0, j = 0: ', X[0, 0]
print 'Second column: ', X[:, 1]
print 'Third row: ', X[2, :]

Element at i = 0, j = 0:  0.707106781187
Second column:  [-0.70710678  0.70710678  0.        ]
Third row:  [ 0.  0.  1.]


### Scalar multiplication

Scalar multiplication of a matrix works just as it does for vectors: multiplying a matrix by a scalar multiplies each of its elements by that scalar.

In [80]:
print "2*X =\n", 2*X

2*X =
[[ 1.41421356 -1.41421356  0.        ]
 [ 1.41421356  1.41421356  0.        ]
 [ 0.          0.          2.        ]]


### Matrix addition

Matrix addition is also defined just as it is for vectors: the addition is carried out element-wise, so both matrices must have the same shape:

<img src='assets/matrix-addition.png'></img>

In [81]:
X = np.array([[np.cos(np.pi / 4), -np.sin(np.pi / 4), 0], [np.sin(np.pi / 4), np.cos(np.pi / 4), 0], [0, 0, 1]])
Y = np.array([[1, 0, 2], [0, 1, -1], [0, 0, 1]])

print "X =\n", X, "\n"
print "Y =\n", Y

X =
[[ 0.70710678 -0.70710678  0.        ]
 [ 0.70710678  0.70710678  0.        ]
 [ 0.          0.          1.        ]] 

Y =
[[ 1  0  2]
 [ 0  1 -1]
 [ 0  0  1]]


In [82]:
print "X + Y =\n", X + Y

X + Y =
[[ 1.70710678 -0.70710678  2.        ]
 [ 0.70710678  1.70710678 -1.        ]
 [ 0.          0.          2.        ]]


### Multiplying matrices

To multiply two matrices, they must be _conformable_: the number of columns of the first matrix must equal the number of rows of the second matrix.

Let $X$ be a matrix of dimension $n \times p$ and let $Y$ be a matrix of dimension $p \times q$, then the product $XY$ will be a matrix of dimension $n \times q$ whose $(i,j)^{th}$ element is given by the dot product of the $i^{th}$ row of $X$ and the $j^{th}$ column of $Y$:

$$(XY)_{ij} = \sum_{k=1}^p X_{ik}Y_{kj} = X_{i1}Y_{1j} + \cdots + X_{ip}Y_{pj}$$

<img src='assets/matrix-multiplication.png'></img>
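The formula translates directly into three nested loops. The minimal sketch below uses a hypothetical function name, `matmul_loops`, and checks its result against `np.dot`. It is meant only to illustrate the formula; it is much slower than NumPy's built-in product.

```python
import numpy as np

def matmul_loops(X, Y):
    # hypothetical helper: (XY)_ij = sum over k of X_ik * Y_kj
    n, p = X.shape
    p2, q = Y.shape
    if p != p2:
        raise ValueError("not conformable: {} columns vs {} rows".format(p, p2))
    Z = np.zeros((n, q))
    for i in range(n):
        for j in range(q):
            for k in range(p):
                Z[i, j] += X[i, k] * Y[k, j]
    return Z

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # 2 x 3
Y = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, -1.0]])            # 3 x 2

Z = matmul_loops(X, Y)                 # 2 x 2 result
print(Z)
print(np.allclose(Z, np.dot(X, Y)))    # True
```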

In [98]:
a = np.array([1,2,3])

In [103]:
a.shape

(3,)

In [100]:
b = np.array([[1,2,3]])

In [101]:
b.shape

(1, 3)

In [110]:
X = np.array([[1, 0, 1]])
R = np.array([[np.cos(np.pi / 4), -np.sin(np.pi / 4), 0], [np.sin(np.pi / 4), np.cos(np.pi / 4), 0], [0, 0, 1]])

print "X =\n", X, "\n"
print "R =\n", R

X =
[[1 0 1]] 

R =
[[ 0.70710678 -0.70710678  0.        ]
 [ 0.70710678  0.70710678  0.        ]
 [ 0.          0.          1.        ]]


In [108]:
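# R is 3x3 but X is 1x3: the columns of R (3) don't match the rows of X (1),
# so RX is not defined and NumPy raises an error
X_p = np.dot(R, X)

print X_p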

ValueError: shapes (3,3) and (1,3) not aligned: 3 (dim 1) != 1 (dim 0)

In [111]:
X_p = np.dot(X, R)
print X_p

[[ 0.70710678 -0.70710678  1.        ]]


In [112]:
X.dot(R)

array([[ 0.70710678, -0.70710678,  1.        ]])

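Because $X$ is a row vector, $XR$ computes $(R^TX^T)^T$, which rotates the point $(1, 0)$ *clockwise* by $\pi/4$ to $(\cos(\pi/4), -\sin(\pi/4))$. To rotate it counterclockwise with $R$, write the point as a column vector and compute $RX^T$; then $X'$ is at $(\cos(\pi / 4), \sin(\pi / 4))$: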

In [None]:
# X.T is a 3x1 column vector, so R (3x3) times X.T (3x1) is defined
R.dot(X.T)

### Matrix (element-wise) multiplication

In [115]:
# The regular * operator multiplies element-wise (the Hadamard product), not as a matrix product

X = np.array([[np.cos(np.pi / 4), -np.sin(np.pi / 4), 0], [np.sin(np.pi / 4), np.cos(np.pi / 4), 0], [0, 0, 1]])
Y = np.array([[1, 0, 2], [0, 1, -1], [1, 1, 1]])

print "X =\n", X, "\n"
print "Y =\n", Y, "\n"

print "X * Y =\n", X * Y

X =
[[ 0.70710678 -0.70710678  0.        ]
 [ 0.70710678  0.70710678  0.        ]
 [ 0.          0.          1.        ]] 

Y =
[[ 1  0  2]
 [ 0  1 -1]
 [ 1  1  1]] 

X * Y =
[[ 0.70710678 -0.          0.        ]
 [ 0.          0.70710678 -0.        ]
 [ 0.          0.          1.        ]]


### Commutativity

A matrix is _square_ if it has the same number of rows as columns. If $X$ and $Y$ are square matrices of the same size, then both $XY$ and $YX$ are defined. However, in general $XY \neq YX$, although equality can hold in special cases (for example, when one of them is the identity matrix).

In [116]:
X = np.array([1, 0, 1])

R = np.array([[np.cos(np.pi / 4), -np.sin(np.pi / 4), 0], [np.sin(np.pi / 4), np.cos(np.pi / 4), 0], [0, 0, 1]])
T = np.array([[1, 0, 2], [0, 1, -1], [0, 0, 1]])

print "X =\n", X, "\n"
print "R =\n", R, "\n"
print "T =\n", T, "\n"

X =
[1 0 1] 

R =
[[ 0.70710678 -0.70710678  0.        ]
 [ 0.70710678  0.70710678  0.        ]
 [ 0.          0.          1.        ]] 

T =
[[ 1  0  2]
 [ 0  1 -1]
 [ 0  0  1]] 



In [117]:

R.dot(T)

array([[ 0.70710678, -0.70710678,  2.12132034],
       [ 0.70710678,  0.70710678,  0.70710678],
       [ 0.        ,  0.        ,  1.        ]])

In [118]:
T.dot(R)

array([[ 0.70710678, -0.70710678,  2.        ],
       [ 0.70710678,  0.70710678, -1.        ],
       [ 0.        ,  0.        ,  1.        ]])

In [None]:
print "Translation then rotation:"
print X, ' -> ', T.dot(X), '(first translated)', R.dot(T.dot(X)), "(then rotated)\n"

print "RT =\n", R.dot(T)

In [None]:
print "Rotation then translation:"
print X, ' -> ', R.dot(X), '(first rotated)', T.dot(R.dot(X)), "(then translated)\n"

print "TR =\n", T.dot(R)

### Additional Properties of Matrices
1. If $X$ and $Y$ are both $n \times p$ matrices,
then $$X+Y = Y+X$$

2. If $X$, $Y$, and $Z$ are all $n \times p$ matrices,
then $$X+(Y+Z) = (X+Y)+Z$$

3. If $X$, $Y$, and $Z$ are all conformable,
then $$X(YZ) = (XY)Z$$

4. If $X$ is of dimension $n \times k$ and $Y$ and $Z$ are of dimension $k \times p$, then $$X(Y+Z) = XY + XZ$$

5. If $X$ is of dimension $p \times n$ and $Y$ and $Z$ are of dimension $k \times p$, then $$(Y+Z)X = YX + ZX$$

6. If $a$ and $b$ are real numbers, and $X$ is an $n \times p$ matrix,
then $$(a+b)X = aX+bX$$

7. If $a$ is a real number, and $X$ and $Y$ are both $n \times p$ matrices,
then $$a(X+Y) = aX+aY$$

8. If $a$ is a real number, and $X$ and $Y$ are conformable, then
$$X(aY) = a(XY)$$
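These identities are easy to spot-check numerically. The sketch below uses small random integer matrices; the shapes and the random seed are arbitrary choices. `np.array_equal` can be used for an exact comparison because integer arithmetic has no rounding error. The matrix `W` plays the role of $X$ in property 5.

```python
import numpy as np

rng = np.random.RandomState(0)   # fixed seed so the check is reproducible

n, k, p = 2, 3, 4
X = rng.randint(-5, 6, size=(n, k))
Y = rng.randint(-5, 6, size=(k, p))
Z = rng.randint(-5, 6, size=(k, p))
W = rng.randint(-5, 6, size=(p, n))
a = 3

# property 3: associativity, X(YW) = (XY)W
assoc = np.array_equal(X.dot(Y.dot(W)), X.dot(Y).dot(W))
# property 4: left distributivity, X(Y + Z) = XY + XZ
left_dist = np.array_equal(X.dot(Y + Z), X.dot(Y) + X.dot(Z))
# property 5: right distributivity, (Y + Z)W = YW + ZW
right_dist = np.array_equal((Y + Z).dot(W), Y.dot(W) + Z.dot(W))
# property 8: X(aY) = a(XY)
scalar = np.array_equal(X.dot(a * Y), a * X.dot(Y))

print("{} {} {} {}".format(assoc, left_dist, right_dist, scalar))  # True True True True
```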

### Matrix Transpose

The transpose of an $n \times p$ matrix $X$ is the $p \times n$ matrix $X^T$ with rows and columns interchanged, $(X^T)_{ij} = X_{ji}$:

$$
X^T =
  \begin{bmatrix}
    X_{11} & X_{21} & \cdots & X_{n1} \\
    X_{12} & X_{22} & \cdots & X_{n2} \\
    \vdots & \vdots & \ddots & \vdots \\
    X_{1p} & X_{2p} & \cdots & X_{np}
  \end{bmatrix}
$$



In [119]:
X = np.array([[0, 1, 2], [3, 4, 5]])

print "X's shape is", X.shape, "\n"
print "X =\n", X, "\n"

X_T = X.transpose()

print "X_T's shape is", X_T.shape, "\n"
print "X_T =\n", X_T

X's shape is (2, 3) 

X =
[[0 1 2]
 [3 4 5]] 

X_T's shape is (3, 2) 

X_T =
[[0 3]
 [1 4]
 [2 5]]


### Properties of Transpose
1. Let $X$ be an $n \times p$ matrix and $a$ a real number, then
$$(aX)^T = aX^T$$

2. Let $X$ and $Y$ be $n \times p$ matrices, then
$$(X + Y)^T = X^T + Y^T$$

3. Let $X$ be an $n \times k$ matrix and $Y$ be a $k \times p$ matrix, then
$$(XY)^T = Y^TX^T$$
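Property 3 reverses the order of the factors. The quick check below uses arbitrary matrices whose shapes are chosen so that the unreversed product $X^TY^T$ also exists. That product has a different shape from $(XY)^T$, so the order genuinely matters.

```python
import numpy as np

X = np.array([[0, 1, 2],
              [3, 4, 5]])        # 2 x 3
Y = np.array([[1, -1],
              [2,  0],
              [0,  3]])          # 3 x 2

lhs = X.dot(Y).T                 # (XY)^T, a 2 x 2 matrix
rhs = Y.T.dot(X.T)               # Y^T X^T
print(np.array_equal(lhs, rhs))  # True

# X^T is 3 x 2 and Y^T is 2 x 3, so X^T Y^T is 3 x 3: not even the same shape
print(X.T.dot(Y.T).shape)        # (3, 3)
```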

### Vector in Matrix Form
A column vector is a matrix with $n$ rows and 1 column. To distinguish it from a general matrix $X$, it is denoted by a bold lowercase letter $\boldsymbol{x}$:

$$
\boldsymbol{x} =
  \begin{bmatrix}
    x_{1}\\
    x_{2}\\
    \vdots\\
    x_{n}
  \end{bmatrix}
$$

In NumPy, a vector created from a flat list is one-dimensional: its shape has no second dimension, so it is neither a row nor a column vector, and transposing it has no effect.

In [126]:
x = np.array([1,2,3,4])

print 'x =', x
print "x's shape is", x.shape

x = [1 2 3 4]
x's shape is (4,)


In [121]:
x = x.transpose()

print 'x =', x, "\n"
print "x's shape is", x.shape

x = [1 2 3 4] 

x's shape is (4,)


In [129]:
y = x.reshape(4, 1)

print "y =\n", y, "\n"
print "y's shape is", y.shape

y =
[[1]
 [2]
 [3]
 [4]] 

y's shape is (4, 1)


In [128]:
z = x[:, np.newaxis]

print "z =\n", z, "\n"
print "z's shape is", z.shape

z =
[[1]
 [2]
 [3]
 [4]] 

z's shape is (4, 1)


In [124]:
t = y.transpose()

print "t =\n", t, "\n"
print "t's shape is", t.shape

t =
[[1 2 3 4]] 

t's shape is (1, 4)


A row vector is generally written as the transpose of a column vector:

$$\boldsymbol{x}^T = [x_1, x_2, \ldots, x_n]$$

If we have two vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ of the same length $(n)$, then the _dot product_ is given by the matrix product

$$\boldsymbol{x}^T \boldsymbol{y} =   
    \begin{bmatrix} x_1& x_2 & \ldots & x_n \end{bmatrix}
    \begin{bmatrix}
    y_{1}\\
    y_{2}\\
    \vdots\\
    y_{n}
  \end{bmatrix}  =
  x_1y_1 + x_2y_2 + \cdots + x_ny_n$$
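In NumPy the distinction shows up in the shapes. With explicit column vectors of shape `(n, 1)`, `x.T.dot(y)` is a $1 \times 1$ matrix, while `np.dot` on 1-D arrays returns a plain number. The vectors below are chosen arbitrarily for the sketch.

```python
import numpy as np

x = np.array([1, 2, 3, 4]).reshape(4, 1)   # column vector, shape (4, 1)
y = np.array([2, 0, 1, 1]).reshape(4, 1)   # column vector, shape (4, 1)

xty = x.T.dot(y)       # (1, 4) times (4, 1) gives shape (1, 1)
print(xty)             # [[9]]
print(xty.shape)       # (1, 1)

# the same dot product computed on 1-D arrays is a scalar
print(np.dot(x.ravel(), y.ravel()))   # 9
```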

## Inverse of a Matrix

The inverse of a square $n \times n$ matrix $X$ is an $n \times n$ matrix $X^{-1}$ such that 

$$X^{-1}X = XX^{-1} = I$$

where $I$ is the $n \times n$ identity matrix: a diagonal matrix with 1s on the diagonal and 0s everywhere else.

If such a matrix exists, then $X$ is said to be _invertible_ or _nonsingular_, otherwise $X$ is said to be _noninvertible_ or _singular_.

In [None]:
X = np.array([[1, 2, 3], [0, 1, 0], [-2, -1, 0]])
print "X =\n", X, "\n"

Y = np.linalg.inv(X)
print "Y =\n", Y, "\n"

print "XY =\n", X.dot(Y)  # the identity matrix, up to floating-point rounding

### Properties of Inverse
1. If $X$ is invertible, then $X^{-1}$ is invertible and
$$(X^{-1})^{-1} = X$$
2. If $X$ and $Y$ are both $n \times n$ invertible matrices, then $XY$ is invertible and
$$(XY)^{-1} = Y^{-1}X^{-1}$$
3. If $X$ is invertible, then $X^T$ is invertible and
$$(X^T)^{-1} = (X^{-1})^T$$
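These properties can be verified numerically with `np.linalg.inv`. The inverse is computed in floating point, so the comparison uses `np.allclose`. The two matrices below are arbitrary invertible examples.

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 0.0],
              [-2.0, -1.0, 0.0]])   # det(X) = 6, so X is invertible
Y = np.array([[2.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])     # det(Y) = 2

inv = np.linalg.inv

p1 = np.allclose(inv(inv(X)), X)                      # (X^-1)^-1 = X
p2 = np.allclose(inv(X.dot(Y)), inv(Y).dot(inv(X)))   # (XY)^-1 = Y^-1 X^-1
p3 = np.allclose(inv(X.T), inv(X).T)                  # (X^T)^-1 = (X^-1)^T
print("{} {} {}".format(p1, p2, p3))                  # True True True
```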

### Orthogonal Matrices

Let $X$ be an $n \times n$ matrix such that $X^TX = I$. Then $X$ is said to be _orthogonal_, which implies that $X^T=X^{-1}$.

This is equivalent to saying that the columns of $X$ are all orthogonal to each other (and have unit length).
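The rotation matrix $R$ used earlier is a standard example of an orthogonal matrix, since its columns are perpendicular unit vectors. Here is a quick check:

```python
import numpy as np

theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])

# R^T R should be the identity (up to floating-point rounding)
print(np.allclose(R.T.dot(R), np.eye(3)))       # True
# so the transpose is the inverse
print(np.allclose(R.T, np.linalg.inv(R)))       # True
# each column has unit length
print(np.linalg.norm(R, axis=0))                # approximately [1. 1. 1.]
```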

## Matrix Equations

A system of equations of the form:
\begin{align*}
    a_{11}x_1 + \cdots + a_{1n}x_n &= b_1 \\
    \vdots \hspace{1in} \vdots \\
    a_{m1}x_1 + \cdots + a_{mn}x_n &= b_m 
\end{align*}
can be written as a matrix equation:
$$
A\mathbf{x} = \mathbf{b}
$$
and, when $A$ is square and invertible, has the unique solution
$$
\mathbf{x} = A^{-1}\mathbf{b}
$$
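In practice, `np.linalg.solve` is usually preferred to forming $A^{-1}$ explicitly. It solves the system directly, which is typically faster and more accurate. The sketch below uses a small system chosen for illustration: $2x_1 + x_2 = 5$ and $x_1 + 3x_2 = 10$, whose solution is $x_1 = 1$, $x_2 = 3$.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

x_solve = np.linalg.solve(A, b)        # solves A x = b directly
x_inv = np.linalg.inv(A).dot(b)        # x = A^{-1} b, as in the formula above

print(x_solve)                         # approximately [1. 3.]
print(np.allclose(x_solve, x_inv))     # True
print(np.allclose(A.dot(x_solve), b))  # True: the solution satisfies the system
```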

## Eigenvectors and Eigenvalues

Let $A$ be an $n \times n$ matrix and $\boldsymbol{x}$ be an $n \times 1$ nonzero vector. An _eigenvalue_ of $A$ is a number $\lambda$ such that

$$A \boldsymbol{x} = \lambda \boldsymbol{x}$$

A vector $\boldsymbol{x}$ satisfying this equation is called an eigenvector associated with $\lambda$.

Eigenvectors and eigenvalues will play a huge role in matrix methods later in the course (PCA, SVD, NMF).

In [138]:
A = np.array([[1, 1], [1, 2]])
A

array([[1, 1],
       [1, 2]])

In [139]:
A = np.array([[1, 1], [1, 2]])
lambdas, X = np.linalg.eig(A)

print 'lambdas =', lambdas
print "X =\n", X

lambdas = [ 0.38196601  2.61803399]
X =
[[-0.85065081 -0.52573111]
 [ 0.52573111 -0.85065081]]


In [142]:
first_lambda = lambdas[0]
first_x = X[:, 0]

print 'Ax         =', A.dot(first_x)
print 'lambda * x =', first_lambda * first_x

Ax         = [-0.3249197   0.20081142]
lambda * x = [-0.3249197   0.20081142]


In [143]:
second_lambda = lambdas[1]
second_x = X[:, 1]

print 'Ax         =', A.dot(second_x)
print 'lambda * x =', second_lambda * second_x

Ax         = [-1.37638192 -2.22703273]
lambda * x = [-1.37638192 -2.22703273]


### Stochastic matrices

A stochastic matrix represents the probabilities of moving from one state to another. If $P$ is a stochastic matrix, then $P_{ij}$ has the following interpretation: if a system is in state $j$, it will be in state $i$ on the next iteration with probability $P_{ij}$. Each column of $P$ therefore sums to 1. If the system's current probability distribution over states is $\boldsymbol{x}$, then after applying $P$ the probability distribution will be $P\boldsymbol{x}$.

<img src='assets/markov-chain.png'></img>

In [132]:
P = np.array([[.9, .15, .25], [.075, .8, .25], [.025, .05, .5]])

print "P =\n", P

P =
[[ 0.9    0.15   0.25 ]
 [ 0.075  0.8    0.25 ]
 [ 0.025  0.05   0.5  ]]


In [133]:
np.sum(P, axis = 0)

array([ 1.,  1.,  1.])

If we start with the probability distribution $\boldsymbol{x}^T=[.5,.25,.25]$, what will the distribution be after the first iteration? What about after about 1,000 iterations?

In [134]:
x = np.array([[.5], [.25], [.25]])

P.dot(x)

array([[ 0.55],
       [ 0.3 ],
       [ 0.15]])

In [135]:
for _ in xrange(1001):
    x = P.dot(x)

x

array([[ 0.625 ],
       [ 0.3125],
       [ 0.0625]])

In [136]:
np.sum(x)

1.0000000000000047
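The limit is no accident. A distribution $\boldsymbol{\pi}$ that does not change under $P$ satisfies $P\boldsymbol{\pi} = \boldsymbol{\pi}$, so it is an eigenvector of $P$ with eigenvalue 1; this is the same idea behind ranking web pages mentioned at the start. The sketch below finds that eigenvector with `np.linalg.eig` instead of iterating. Eigenvectors are only defined up to scale, so it rescales the result so that its entries sum to 1.

```python
import numpy as np

P = np.array([[.9, .15, .25],
              [.075, .8, .25],
              [.025, .05, .5]])

lambdas, vecs = np.linalg.eig(P)

# pick the eigenvector whose eigenvalue is (numerically) closest to 1
idx = np.argmin(np.abs(lambdas - 1))
stationary = np.real(vecs[:, idx])

# rescale so the entries sum to 1 and form a probability distribution
stationary = stationary / stationary.sum()
print(stationary)   # approximately [0.625 0.3125 0.0625]
```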