In [1]:
import numpy as np

# Chapter 2: Linear Algebra

A branch of mathematics widely used throughout science and engineering.

Essential for understanding and working with machine learning algorithms.

## 2.1: Scalars, Vectors, Matrices and Tensors

- **Scalars**: A single number.
- **Vectors**: An array of numbers; each number is identified by its index.
- **Matrices**: A 2D array of numbers; each number is identified by two indices.
- **Tensors**: A general array of numbers with a variable number of dimensions (axes).

### Scalars

We'll use NumPy throughout these notebooks.  NumPy defines a set of fundamental scalar types.

The default data type in NumPy is `float64` (available under the alias `float_` before NumPy 2.0).
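As a quick check, here is a minimal sketch (assuming a standard NumPy install; the variable names are illustrative) of which dtype NumPy picks when none is given:

```python
import numpy as np

# Arrays created without an explicit dtype default to float64
zeros = np.zeros(3)
print(zeros.dtype)          # float64

# Python floats are stored as float64 as well
from_floats = np.array([1.5, 2.5])
print(from_floats.dtype)    # float64

# Python ints map to the platform's default integer type,
# so the exact width printed here can differ between systems
from_ints = np.array([1, 2])
print(from_ints.dtype)
```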

In [2]:
"""
Some scalar numbers in Python
"""
x_1 = -0.000498
x_2 = 34

"""
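NumPy defines 24 scalar types.  Note that in cases such as bool_ and int_,
these are substantially different from the native Python counterparts.
np.ScalarType also lists the built-in Python scalar types, so the tuple
printed below has more than 24 entries.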
"""

for t in np.ScalarType:
    print(t)

<class 'int'>
<class 'float'>
<class 'complex'>
<class 'int'>
<class 'bool'>
<class 'bytes'>
<class 'str'>
<class 'memoryview'>
<class 'numpy.int8'>
<class 'numpy.uint8'>
<class 'numpy.float16'>
<class 'numpy.timedelta64'>
<class 'numpy.object_'>
<class 'numpy.int16'>
<class 'numpy.uint16'>
<class 'numpy.float32'>
<class 'numpy.complex64'>
<class 'numpy.bytes_'>
<class 'numpy.int32'>
<class 'numpy.uint32'>
<class 'numpy.float64'>
<class 'numpy.complex128'>
<class 'numpy.str_'>
<class 'numpy.int64'>
<class 'numpy.uint64'>
<class 'numpy.float128'>
<class 'numpy.complex256'>
<class 'numpy.bool_'>
<class 'numpy.void'>
<class 'numpy.longlong'>
<class 'numpy.ulonglong'>
<class 'numpy.datetime64'>


### Vectors

NumPy’s main object is the homogeneous multidimensional array.

We can represent a vector simply as a 1-dimensional array:

$\Large \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$

In [3]:
# Let's represent a vector:
v_1 = np.array( [20, 30, 40, 50] )

print('Representing a vector of values:\n')
print(v_1)

print("---")
print(f'Shape: {v_1.shape}')
print(f'Dimensions: {v_1.ndim}')
print(f'Type: {type(v_1)}')
print(f'Type of values in vector: {v_1.dtype}')

Representing a vector of values:

[20 30 40 50]
---
Shape: (4,)
Dimensions: 1
Type: <class 'numpy.ndarray'>
Type of values in vector: int64


### Matrices

We can continue to use the same data structure (the `ndarray` type) to build matrices, or 2-dimensional  arrays:

$\Large \begin{bmatrix} A_{1,1} && A_{1,2} \\ A_{2,1} && A_{2,2} \end{bmatrix}$

In [4]:
# Let's create a few matrices with NumPy:

# An 8x8 matrix of ones of type int32
m_1 = np.ones((8, 8), dtype=np.int32)

print('Representing a matrix of 1s:\n')
print(m_1)

print("---")
print(f'Shape: {m_1.shape}')
print(f'Dimensions: {m_1.ndim}')
print(f'Type: {type(m_1)}')
print(f'Type of values in matrix: {m_1.dtype}')

Representing a matrix of 1s:

[[1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1]]
---
Shape: (8, 8)
Dimensions: 2
Type: <class 'numpy.ndarray'>
Type of values in matrix: int32


In [5]:
# A 2x6 matrix of random `float64s`
# we can do this with an instance of the default random number generator
rng = np.random.default_rng(1)
m_2 = rng.random((2, 6))

print('Representing a matrix of random float64s:\n')
print(m_2)

print("---")
print(f'Shape: {m_2.shape}')
print(f'Dimensions: {m_2.ndim}')
print(f'Type: {type(m_2)}')
print(f'Type of values in matrix: {m_2.dtype}')

Representing a matrix of random float64s:

[[0.51182162 0.9504637  0.14415961 0.94864945 0.31183145 0.42332645]
 [0.82770259 0.40919914 0.54959369 0.02755911 0.75351311 0.53814331]]
---
Shape: (2, 6)
Dimensions: 2
Type: <class 'numpy.ndarray'>
Type of values in matrix: float64


### Tensors

Given that __vectors__ are just first-order __tensors__, and __matrices__ are second-order __tensors__, we can continue to use the `ndarray` type to represent n-dimensional tensors:

In [6]:
# Let's make a 3-dimensional tensor!
t_1 = rng.random((3, 3, 3))

print('Representing a tensor:\n')
print(t_1)

print("---")
print(f'Shape: {t_1.shape}')
print(f'Dimensions: {t_1.ndim}')
print(f'Type: {type(t_1)}')
print(f'Type of values in tensor: {t_1.dtype}')

Representing a tensor:

[[[0.32973172 0.7884287  0.30319483]
  [0.45349789 0.1340417  0.40311299]
  [0.20345524 0.26231334 0.75036467]]

 [[0.28040876 0.48519097 0.9807372 ]
  [0.96165719 0.72478994 0.54122686]
  [0.2768912  0.16065201 0.96992541]]

 [[0.51606859 0.11586561 0.62348976]
  [0.77668311 0.6130033  0.9172977 ]
  [0.03959288 0.52858926 0.45933588]]]
---
Shape: (3, 3, 3)
Dimensions: 3
Type: <class 'numpy.ndarray'>
Type of values in tensor: float64


### Transpose Operation

> The transpose of a matrix is the mirror image of the matrix across a diagonal line, called the __main diagonal__, running down and to the right, starting from its upper left corner.

The __transpose__ of $A$ is denoted as $A^\top$.

$$\huge (A^\top)_{i,j} = A_{j,i}$$

Notice in the illustration below how the transpose operation mirrors the matrix over the __main diagonal__:

_(GitHub currently doesn't render the transposed matrix correctly, try in your own notebook for better results)_

$$\Large A = \begin{bmatrix} A_{1,1} && A_{1,2}\\ A_{2,1} && A_{2,2} \\ A_{3,1} && A_{3,2} \end{bmatrix} \Rightarrow A^\top = \begin{bmatrix} A_{1,1} && A_{2,1} && A_{3,1}\\ A_{1,2} && A_{2,2} && A_{3,2} \end{bmatrix} $$

If we consider __vectors__ to be special cases of __matrices__ with only a single column, then the __transpose__ of a vector is a matrix with a single row.

When writing a vector out in text (such as in a book), it's often practical to represent it as a transposed row, e.g.:

$$\huge [3.2, 2.1, 9.0, 4.8]^\top$$

The transpose of a __scalar__ is just itself.

With NumPy, it's easy to compute the transpose of `ndarray` types:

### Transpose of a scalar in NumPy:

In [7]:
# Construct a scalar value of type float64
s_2 = np.float64(23.342)

s_2_transpose = s_2.transpose()

print(s_2, s_2.shape)
print(s_2_transpose, s_2_transpose.shape)

# Notice it's the same!

23.342 ()
23.342 ()


### Transpose of a vector in NumPy:

In [8]:
v_1_transpose = v_1.transpose()

print(v_1, v_1.shape)
print(v_1_transpose, v_1_transpose.shape)

# Notice it's the same, as far as NumPy is concerned!

[20 30 40 50] (4,)
[20 30 40 50] (4,)
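A 1-D `ndarray` has no row or column orientation, which is why `transpose()` leaves it unchanged. To treat a vector as a single-column matrix, as in the text above, one option is to reshape it to 2-D first. The sketch below uses the hypothetical names `col` and `row` and rebuilds the same values as `v_1`:

```python
import numpy as np

v = np.array([20, 30, 40, 50])

# Reshape the 1-D vector into an explicit column matrix of shape (4, 1)
col = v.reshape(-1, 1)

# Transposing the column matrix gives a row matrix of shape (1, 4)
row = col.T

print(col.shape)   # (4, 1)
print(row.shape)   # (1, 4)
print(row)         # [[20 30 40 50]]
```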


### Transpose of a matrix in NumPy:

In [9]:
m_3 = np.random.randint(8, size=(2, 6))
m_3_transpose = m_3.transpose()

print("Original Matrix:")
print(m_3, m_3.shape)

print("Transposed Matrix:")
print(m_3_transpose, m_3_transpose.shape)

# Notice how the shape has changed!

Original Matrix:
[[4 7 2 0 7 1]
 [0 7 6 4 6 5]] (2, 6)
Transposed Matrix:
[[4 0]
 [7 7]
 [2 6]
 [0 4]
 [7 6]
 [1 5]] (6, 2)
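To connect the output back to the definition $(A^\top)_{i,j} = A_{j,i}$, the following sketch checks every index pair on a small fixed matrix. The names `M`, `M_t`, and `matches` are illustrative and not part of the notebook above:

```python
import numpy as np

M = np.array([[4, 7, 2],
              [0, 7, 6]])
M_t = M.T  # .T is shorthand for M.transpose()

# Check (M^T)[i, j] == M[j, i] for every index pair
rows, cols = M_t.shape
matches = all(M_t[i, j] == M[j, i]
              for i in range(rows)
              for j in range(cols))

print(matches)      # True
print(M_t.shape)    # (3, 2)
```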


### Matrix Addition - Multiple Matrices

Matrices of the same shape can be added together.

In [10]:
m_4 = np.random.randint(8, size=(2, 3))
m_5 = np.random.randint(8, size=(2, 3))

In [11]:
m_6 = np.add(m_4, m_5)

In [12]:
print(m_4)

[[7 4 6]
 [2 1 7]]


In [13]:
print(m_5)

[[0 7 3]
 [6 0 2]]


In [14]:
print(m_6)

[[ 7 11  9]
 [ 8  1  9]]


### Matrix Addition - Adding Scalars

Scalars can be added to matrices by adding the scalar to each element of the matrix.

In [15]:
m_7 = 10 + m_6

In [16]:
print(m_7)

[[17 21 19]
 [18 11 19]]


### Matrix Multiplication - Multiplying Scalars

Matrices can be multiplied by scalars by multiplying the scalar against each element of the matrix.

In [17]:
m_8 = m_7 * 10

In [18]:
print(m_8)

[[170 210 190]
 [180 110 190]]


### Matrix Broadcasting

We can allow the addition of a matrix and a vector to produce a new matrix, where the vector is added to each row of the matrix.

In [19]:
m_9 = np.random.randint(8, size=(5, 3))
print(m_9)

[[0 4 5]
 [1 2 7]
 [4 3 4]
 [1 6 3]
 [4 1 1]]


In [20]:
# Let's make an array (vector) to broadcast against the m_9 array (matrix):
m_10 = np.array([1, 2, 3])

In [21]:
# Now we'll broadcast the addition of m_10 over m_9:
m_11 = m_9 + m_10
print(m_11)

[[ 1  6  8]
 [ 2  4 10]
 [ 5  5  7]
 [ 2  8  6]
 [ 5  3  4]]


This broadcasting of addition can be written as:

$$\huge C_{i,j} = A_{i,j} + b_j$$
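As a rough illustration of this formula, the sketch below computes $C_{i,j} = A_{i,j} + b_j$ with explicit loops and compares the result with NumPy's broadcast addition. The small arrays here are made up for the example:

```python
import numpy as np

A = np.array([[0, 4, 5],
              [1, 2, 7]])
b = np.array([1, 2, 3])

# Explicit loops implementing C[i, j] = A[i, j] + b[j]
C_loop = np.empty_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        C_loop[i, j] = A[i, j] + b[j]

# Broadcasting produces the same result without loops
C_broadcast = A + b

print(C_loop)
print(np.array_equal(C_loop, C_broadcast))   # True
```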

In NumPy, arrays may be broadcast according to a set of rules:

> When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions and works its way forward. Two dimensions are compatible when
> - they are equal, or
> - one of them is 1
>
> If these conditions are not met, a ValueError: operands could not be broadcast together exception is thrown, indicating that the arrays have incompatible shapes. The size of the resulting array is the size that is not 1 along each axis of the inputs.
>
> (https://numpy.org/doc/stable/user/basics.broadcasting.html#general-broadcasting-rules)

See also: (http://www.astroml.org/book_figures/appendix/fig_broadcast_visual.html)
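The compatibility rules quoted above can be tried out directly. This is a small sketch, assuming NumPy 1.20 or later (where `np.broadcast_shapes` is available); the `failed` flag is a hypothetical helper used only to record the outcome:

```python
import numpy as np

# Compatible: the trailing dimensions are equal (3 and 3)
print(np.broadcast_shapes((5, 3), (3,)))      # (5, 3)

# Compatible: a dimension of size 1 stretches to match the other array
print(np.broadcast_shapes((5, 1), (1, 3)))    # (5, 3)

# Incompatible: trailing dimensions 3 and 5 differ and neither is 1
try:
    np.ones((5, 3)) + np.ones(5)
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)
```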

## 2.2: Multiplying Matrices and Vectors

Matrix multiplication is one of the most important operations involving matrices.

### Matrix Product

For the matrix product:

$$\huge C = AB $$

$A$ must have the same number of columns as $B$ has rows.

If $A$ has a shape $(m, n)$ and $B$ has a shape $(n, p)$, then $C$ is of shape $(m,p)$

The *matrix product* operation is defined as:

$$\huge C_{i,j} = \sum_{k}A_{i,k}B_{k,j}$$

In [22]:
# Matrix Multiplication in NumPy:

A = np.array([[1, 0, 0],
              [0, 1, 1]])

B = np.array([[8, 4],
              [4, 2],
              [1, 0]])

C = np.matmul(A, B)

print(C)

[[8 4]
 [5 2]]
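To make the summation in the definition concrete, here is a deliberately naive sketch that spells out $C_{i,j} = \sum_{k}A_{i,k}B_{k,j}$ with three loops. The helper `matmul_loops` is hypothetical and written only for illustration; `np.matmul` is the practical choice:

```python
import numpy as np

def matmul_loops(A, B):
    """Naive matrix product following C[i, j] = sum_k A[i, k] * B[k, j]."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError("columns of A must match rows of B")
    C = np.zeros((m, p), dtype=np.result_type(A, B))
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1, 0, 0],
              [0, 1, 1]])
B = np.array([[8, 4],
              [4, 2],
              [1, 0]])

C_loops = matmul_loops(A, B)
print(C_loops)
print(np.array_equal(C_loops, np.matmul(A, B)))   # True
```

The result has shape $(2, 2)$, matching the $(m, n) \times (n, p) \rightarrow (m, p)$ rule with $m = 2$, $n = 3$, $p = 2$.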


It's important to make a distinction between the **matrix product**, as defined above, and the **Hadamard product**, which is an **element-wise product**.  The **Hadamard product** is denoted as $A \odot B$.

In [23]:
# Hadamard product in NumPy:

A = np.array([[1, 0, 0],
              [0, 1, 1]])

B = np.array([[2, 2, 2],
              [2, 2, 2]])

C = np.multiply(A, B)

print(C)

[[2 0 0]
 [0 2 2]]
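The two products also have operator forms in NumPy, and they are easy to mix up. In this short sketch (with made-up square matrices so that both products are defined), `*` is element-wise like `np.multiply`, while `@` is the matrix product like `np.matmul`:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

hadamard = A * B    # element-wise, same as np.multiply(A, B)
product = A @ B     # matrix product, same as np.matmul(A, B)

print(hadamard)
print(product)
```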


### Dot Product

The **dot product** of two vectors $x$ and $y$ is the **matrix product** $x^\top y$

In [24]:
# Dot Product in NumPy

# For 2-D arrays it is the same as the matrix product:

A = np.array([[1, 0, 0],
              [0, 1, 1]])

B = np.array([[8, 4],
              [4, 2],
              [1, 0]])

C = np.dot(A, B)

# Alternative syntax
C_alt = A.dot(B)

print(C)

print(C_alt)

[[8 4]
 [5 2]]
[[8 4]
 [5 2]]
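The cell above uses 2-D arrays. For the vector case $x^\top y$ from the definition, a minimal sketch might look like the following; the values and the names `x_col`, `y_col`, and `xty` are chosen only for illustration:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# For 1-D arrays, np.dot returns the scalar sum of element-wise products
print(np.dot(x, y))          # 32

# The same value as the matrix product x^T y with explicit column matrices
x_col = x.reshape(-1, 1)     # shape (3, 1)
y_col = y.reshape(-1, 1)     # shape (3, 1)
xty = x_col.T @ y_col        # shape (1, 1)
print(xty)                   # [[32]]
```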


### Other Useful Properties of Matrix Product Operations

There are many useful properties of matrix product operations.  Here are a few mentioned in the book:

### Distributive:

$$\large A(B+C) = AB + AC$$

In [25]:
# Demonstration of the Distributive property of Matrix Multiplication using NumPy:

A = np.array([[1, 0, 0],
              [0, 1, 1]])

B = np.array([[8, 4],
              [4, 2],
              [1, 0]])

C = np.array([[8, 4],
              [4, 2],
              [1, 0]])

def assert_distributive(A, B, C):
    Y_1 = np.dot(A, B + C)
    Y_2 = np.dot(A, B) + np.dot(A, C)
    if np.array_equal(Y_1, Y_2):
        return True
    return False
    
print(assert_distributive(A, B, C))

True


### Associative:

$$\large A(BC) = (AB)C$$

In [26]:
# Demonstration of the Associative property of Matrix Multiplication in NumPy:

A = np.array([[1, 2],
              [1, 4],
              [1, 8]])

B = np.array([[1, 2, 2],
              [0, 1, 1]])

C = np.array([[8, 4],
              [4, 2],
              [1, 0]])

def assert_associative(A, B, C):
    Y_1 = A.dot(B.dot(C))
    Y_2 = np.dot(A, B).dot(C)
    if np.array_equal(Y_1, Y_2):
        return True
    return False
    
print(assert_associative(A, B, C))

True


### Dot Product of two Vectors  (1-D Tensors) is Commutative

$$\large x^\top y = y^\top x$$

In [27]:
# Demonstration of the commutative property of the dot product using NumPy:

# Here we can look at NumPy's arange to return a vector of evenly-spaced values at a given interval

A = np.arange(0, 20)
B = np.arange(0, 20)

def assert_dot_product_vectors_commutative(A, B):
    Y_1 = A.dot(B)
    Y_2 = B.dot(A)
    if np.array_equal(Y_1, Y_2):
        return True
    return False
    
print("Shape of A: ", A.shape)
print("Shape of B: ", B.shape)
print("A: ", A)
print("B: ", B)

print("C_1 and C_2:")
C_1 = A.dot(B)
C_2 = B.dot(A)
print(C_1)
print(C_2)

print(assert_dot_product_vectors_commutative(A, B))

Shape of A:  (20,)
Shape of B:  (20,)
A:  [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
B:  [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
C_1 and C_2:
2470
2470
True


### The transpose of the product of two matrices is equivalent to the product of their transposes reversed

$$\large (AB)^\top = B^\top A^\top$$

In [28]:
# Demonstration of the transpose-of-a-product property in NumPy:

A = np.array([[1, 0, 0],
              [0, 1, 1]])

B = np.array([[8, 4],
              [4, 2],
              [1, 0]])

def assert_transpose_commutative(A, B):
    Y_1 = np.matmul(A, B).T
    Y_2 = B.T.dot(A.T)
    if np.array_equal(Y_1, Y_2):
        return True
    return False
    
print(assert_transpose_commutative(A, B))
      
Y_1 = A.dot(B).T
Y_2 = B.T.dot(A.T)

print(Y_1)
print(Y_2)

True
[[8 5]
 [4 2]]
[[8 5]
 [4 2]]


## 2.3: Identity and Inverse Matrices

We can express a system of linear equations with matrices using the form

$$\huge Ax = b$$

where:
- $ A \in \mathbb{R}^{m \times n}$ is a known matrix
- $ b \in \mathbb{R}^{m} $ is a known vector, and
- $ x \in \mathbb{R}^{n} $ is a vector of unknown variables to be solved for

Each element $x_{i}$ of $x$ is one of these unknown variables.

Each row of $A$ and each element of $b$ provide another constraint.

We can use **matrix inversion** to analytically solve this equation for many values of $A$.  To describe **matrix inversion**, we first need the **identity matrix**.

For example, the identity matrix $I_{3}$ is shown below.  This probably doesn't display correctly on GitHub, but it should render properly in a local Jupyter notebook.

$$\Large \begin{bmatrix} 1 && 0 && 0 \\ 0 && 1 && 0 \\ 0 && 0 && 1 \end{bmatrix}$$

> In linear algebra, the identity matrix of size n is the n × n square matrix with ones on the main diagonal and zeros elsewhere. [Wikipedia - Identity Matrix](https://en.wikipedia.org/wiki/Identity_matrix)

When a vector is multiplied by an identity matrix, the result is equivalent to the vector. I.e.:

$$\huge \forall x \in \mathbb{R}^{n}, I_{n}x = x$$

This is to say that for the identity matrix $I_n \in \mathbb{R}^{n \times n}$ and every vector $x \in \mathbb{R}^{n}$, multiplying $x$ by $I_n$ returns $x$ unchanged.

In [29]:
# In NumPy, we can easily generate an identity matrix:

print(np.identity(3))

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
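As a small check of $I_{n}x = x$, the sketch below multiplies an arbitrary example vector by `np.identity(3)`. The names `I_3` and `result` are illustrative:

```python
import numpy as np

I_3 = np.identity(3)
x = np.array([3.2, 2.1, 9.0])

# Multiplying by the identity matrix returns the vector unchanged
result = I_3 @ x

print(result)
print(np.allclose(result, x))   # True
```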


In [30]:
# Note that under the hood, np.identity calls np.eye, which provides a matrix with ones on a diagonal
# and zeros elsewhere, but with the ability to offset the diagonal and to produce shapes other than
# the square identity matrix.  Thus, np.identity is a special case of np.eye

print(np.eye(3, 9, 0))
print("^ not an identity matrix!")

[[1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0.]]
^ not an identity matrix!


The **matrix inverse** of a matrix $A$ is denoted as $A^{-1}$.  Thus, 

$$ \huge A^{-1}A = I_{n} $$

That is, when we multiply a **matrix** by its **inverse**, we get the **identity matrix**.
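A minimal numerical illustration, assuming a small invertible matrix picked for the example, uses `np.linalg.inv`:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

A_inv = np.linalg.inv(A)

# Floating-point rounding means the product is only approximately I_2,
# so compare with np.allclose rather than exact equality
print(A_inv @ A)
print(np.allclose(A_inv @ A, np.identity(2)))   # True
```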

The steps to solve the equation $ Ax=b $ are as follows:

1.) $Ax = b$

2.) $A^{-1}Ax = A^{-1}b$

3.) $I_{n}x = A^{-1}b$

4.) $x=A^{-1}b$

All of this depends on $A^{-1}$ existing and being computable.  Because $A^{-1}$ can only be represented with limited precision on a digital computer, computing it explicitly is generally not the best tool for the job when solving systems of equations numerically; algorithms that make use of the value of $b$ can usually obtain more accurate estimates of $x$.
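The steps above can be sketched in NumPy as follows. The system here is made up for illustration, and `np.linalg.solve` is shown as one common alternative that avoids forming $A^{-1}$ explicitly; it is not necessarily the method the book has in mind:

```python
import numpy as np

# Example system: 2*x1 + x2 = 3 and x1 + 3*x2 = 5
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Step 4 from the derivation: x = A^{-1} b
x_inv = np.linalg.inv(A) @ b

# Usually preferred in practice: solve the system without forming A^{-1}
x_solve = np.linalg.solve(A, b)

print(x_solve)
print(np.allclose(x_inv, x_solve))    # True
print(np.allclose(A @ x_solve, b))    # True
```

For this small, well-conditioned system both approaches agree; the difference tends to matter for larger or ill-conditioned matrices.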

## 2.4: Linear Dependence and Span

