# Linear Algebra

## Basics

### Column Operations

Column operations are applied by an operator matrix placed on the right-hand side of the matrix you want to operate on.

Each column of the operator corresponds to the column of the answer that you want to produce.
Each item in that column says how much of each column of the original matrix to take to build that answer column.

For example,

[[1 2] [3 4]] x [[0 1] [1 0]] = [[2 1] [4 3]]

This means that to get the first column of the answer, we take 0 of the first column + 1 of the second column. This essentially puts the second column of the matrix into the first column of the answer.
Likewise, to get the second column of the answer, we take 1 of the first column + 0 of the second column. This puts the first column of the matrix into the second column of the answer.
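The column swap described above can be checked with a few lines of Python. This is only a minimal sketch: `matmul` is a hypothetical helper written for this example (a plain row-times-column product), not one of the functions defined later in these notes.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows (hypothetical helper)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

matrix = [[1, 2], [3, 4]]
swap_columns = [[0, 1], [1, 0]]  # the operator sits on the right-hand side

swapped = matmul(matrix, swap_columns)
print(swapped)  # [[2, 1], [4, 3]]
```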

### Row Operations

Row operations work on a simple principle: the parameters (the operator) go on the left-hand side, and the matrix you want to operate on goes on the right-hand side.

For example:

[1 0 0] [[1 2 3] [4 5 6] [7 8 9]]
means: take 1 x row 1, 0 x row 2 and 0 x row 3, which gives back [1 2 3].

Note that when you multiply a row by a matrix, you get a row back.
For elimination, we can use a whole matrix as the operator instead of a single row.
Every row in the operator matrix decides the corresponding row of the result. Every item in that row decides how much of each row of the target matrix goes into the result.

Take the identity matrix:
[[1 0 0] [0 1 0] [0 0 1]] x [[1 2 3] [4 5 6] [7 8 9]] = [[1 2 3] [4 5 6] [7 8 9]]

To narrate what is happening:
The first row of the operator decides what the first row of the answer is. In this case, we take 1 x the first row of the matrix and add 0 x the second row and 0 x the third row, meaning that we get back the first row for the answer.
In the second row of the operator, we take 0 x first row, 1 x second row, 0 x third row, meaning that we get back the second row of the matrix for the second row of the answer.
In the third row of the operator, we take 0 x first row, 0 x second row, 1 x third row, meaning that we get back the third row of the matrix for the third row of the answer.
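As a small sketch of the same idea with an operator that is not the identity: the operator below keeps rows 1 and 3, and builds row 2 of the answer as -4 x row 1 + 1 x row 2. The helper `matmul` and the choice of -4 are assumptions made for this illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows (hypothetical helper)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

target = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# A single row on the left picks out a combination of the rows of the target
first_row = matmul([[1, 0, 0]], target)
print(first_row)  # [[1, 2, 3]]

# Row 2 of the operator says: take -4 x row 1 and 1 x row 2
operator = [[1, 0, 0], [-4, 1, 0], [0, 0, 1]]
result = matmul(operator, target)
print(result)  # [[1, 2, 3], [0, -3, -6], [7, 8, 9]]
```

The zero in the first position of the second row is exactly what an elimination step aims for.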

## Elimination

Elimination in linear algebra is the basic process of removing variables from an equation until only 1 variable is left. For example, if you have an equation with the variables X, Y and Z, you can use the other equations to knock out X and Y so that only Z is left.

Most texts teach you to use a fixed position as the pivot, but when solving by hand it really doesn't matter. Just do whatever it takes to eliminate the variables. You can subtract from any position as long as it serves the purpose of eliminating a variable.

Moreover, once you have reduced one equation to a single variable, you can use that equation to eliminate that variable from the other equations. For example, an equation of the form 0X + 0Y + 2Z becomes a powerful tool for eliminating Z from the other 2 equations, because its X and Y coefficients are zero and have no effect on them. In the end, you keep eliminating until you have only 1 variable per equation.

### Definitions

A = matrix to be eliminated

b = answer on the righthand side

U = eliminated matrix, also called upper triangular because only the top triangle contains values and the bottom triangle is all zeroes

c = eliminated answer

E = elimination matrix that applies the row operations to the A matrix

Ax = b

after applying the elimination matrices E to both sides, it becomes,

Ux = c
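To make the definitions concrete, here is a minimal sketch with made-up numbers. The matrix `a`, the vector `b`, the two elimination matrices and the helper `matmul` are all assumptions chosen for this illustration. It shows E applied to A giving U and the same E applied to b giving c.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows (hypothetical helper)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

a = [[1, 2, 1], [3, 8, 1], [0, 4, 1]]
b = [[2], [12], [2]]  # b written as a column

e21 = [[1, 0, 0], [-3, 1, 0], [0, 0, 1]]  # row 2 minus 3 x row 1
e32 = [[1, 0, 0], [0, 1, 0], [0, -2, 1]]  # row 3 minus 2 x row 2
e = matmul(e32, e21)                      # the later step sits on the left

u = matmul(e, a)
c = matmul(e, b)
print(u)  # [[1, 2, 1], [0, 2, -2], [0, 0, 5]]
print(c)  # [[2], [6], [-10]]
```

Reading U from the bottom row upwards gives z = -2, then y = 1, then x = 2.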


### Failures in elimination

Intuitively, elimination fails when a single subtraction eliminates more than 1 variable. You end up with an equation with X isolated and one with Y isolated, but none with Z isolated, because Z was accidentally eliminated together with another variable. This happens when the equations are not independent, so that one is just derived from the others. In this case, you really only have the information of two equations, so it is natural that you don't have enough information to solve for all three variables.
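The failure can be seen in a small sketch. In the made-up system below, the third equation is the sum of the first two, so one subtraction wipes out Y and Z at the same time and leaves a row of zeroes. `forward_eliminate` is a hypothetical helper for this example only. It uses exact fractions and does no row exchanges.

```python
from fractions import Fraction

def forward_eliminate(rows):
    """Plain forward elimination; returns the reduced rows and the column
    where a zero pivot was hit (or None if every pivot was non-zero)."""
    rows = [[Fraction(v) for v in row] for row in rows]
    for col in range(len(rows)):
        pivot = rows[col][col]
        if pivot == 0:
            return rows, col
        for r in range(col + 1, len(rows)):
            multiplier = rows[r][col] / pivot
            rows[r] = [x - multiplier * p for x, p in zip(rows[r], rows[col])]
    return rows, None

# Third row = first row + second row, so it carries no new information
dependent = [[1, 1, 1], [1, 2, 3], [2, 3, 4]]
reduced, stuck_at = forward_eliminate(dependent)
print(stuck_at)    # 2
print(reduced[2])  # a row of zeroes: no equation is left to isolate Z
```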

### In Row Form

#### In code

Let's define some utility functions

In [None]:
def mat_invert(mat):
    # Despite the name, this returns the transpose (rows become columns), not the matrix inverse
    inverted_mat = []
    for col_index, _ in enumerate(mat[0]):
        inv_row = []
        for row in mat:
            inv_row.append(row[col_index])
        inverted_mat.append(inv_row)

    return inverted_mat

def mat_add_row(mat):
    summed_mat = []
    for row in mat:
        row_sum = 0
        for column in row:
            row_sum += column
        summed_mat.append(row_sum)

    return summed_mat

def mat_add_col(mat):
    inverted_mat = mat_invert(mat)
    return mat_add_row(inverted_mat)

def mat_mul(mat1, mat2):
    final_mat = []
    for row_index, row in enumerate(mat1):

        new_rows = [] # the new row at row_index of the answer before collapsing
        for col_index, col_val in enumerate(row):
            # Identified the col_val and row location: col_index
            # Note that col_index of operator is the row_index of the target mat
            new_row = []
            for row_val in mat2[col_index]:
                new_row.append(col_val * row_val)
            
            new_rows.append(new_row)
        
        # Add the columns together
        final_row = mat_add_col(new_rows)
        final_mat.append(final_row)
    return final_mat

def gen_identity_mat(mat):
    max_row, max_col = len(mat) - 1, len(mat[0]) - 1
    i_mat = []
    for i_col in range(max_col + 1):
        row = []
        for i_row in range(max_row + 1):
            if i_row == i_col:
                row.append(1)
            else:
                row.append(0)
        i_mat.append(row)
    return i_mat

def subset_mat(mat):
    subset_mat = []
    for row_i, row in enumerate(mat):
        subset_row = []
        for col_i, col in enumerate(row):
            if row_i == 0 or col_i == 0:
                continue
            subset_row.append(col)
        if subset_row:
            subset_mat.append(subset_row)

    return subset_mat

def superset_mat(mat, super_row, super_col):
    superset = []
    for i_col, col_val in enumerate(super_col):
        if i_col == 0:
            superset.append(super_row)
            continue
        row = []
        for i_row, row_val in enumerate(super_row):
            if i_row == 0:
                row.append(col_val)
            else:
                row.append(mat[i_col - 1][i_row - 1])
        superset.append(row)

    return superset

Next, let's add the main code for elimination

In [None]:
# Note that this function accepts b as a column instead of row format
def eliminate_mat(mat, b):
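    # pre-process to make sure the first pivot point is not zero;
    # the same row exchange is applied to b so that each equation keeps its right-hand side
    if mat[0][0] == 0 and mat[1][0] != 0:
        mat[0], mat[1] = mat[1], mat[0]
        b[0], b[1] = b[1], b[0]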

    pivot = mat[0][0]
    curr_mat = mat
    for row_i, row_val in enumerate(mat_invert(mat)[0]):
        if row_i == 0: continue

        param = row_val / pivot
        
        # generate the first operator
        operator = gen_identity_mat(mat)
        operator[row_i][0] = -param

        curr_mat = mat_mul(operator, mat_invert([*mat_invert(curr_mat), b]))
        curr_mat = mat_invert(curr_mat)
        b = curr_mat.pop()
        curr_mat = mat_invert(curr_mat)

    max_row, max_col = len(mat) - 1, len(mat[0]) - 1

    if max_col > 1 and max_row > 1:
        first_row = curr_mat[0]
        first_column = mat_invert(curr_mat)[0]

        mat_subset = subset_mat(curr_mat)
        eliminated_subset, b_subset = eliminate_mat(mat_subset, b[1:])
        return superset_mat(eliminated_subset, first_row, first_column), [b[0], *b_subset]
    else:
        return curr_mat, b

# Note that this function accepts b as a column instead of row format            
def back_substitution(mat, b):
    b_reverse = b[::-1]
    answers = [None] * len(mat)
    for eq_i, eq in enumerate(mat[::-1]):
        lhs_sum = 0
        var_index = None
        for ans_i, ans in enumerate(answers):
            if ans is None and eq[ans_i] != 0:
                var_index = ans_i
            elif ans is None and eq[ans_i] == 0:
                lhs_sum += 0
            else:
                lhs_sum += eq[ans_i] * ans
        new_ans = (b_reverse[eq_i] - lhs_sum) / eq[var_index]
        answers[var_index] = new_ans
    return answers


def solve_mat(mat, b):
    eliminated_mat, eliminated_b = eliminate_mat(mat, b)
    result = back_substitution(eliminated_mat, eliminated_b)
    return result


## Matrix Multiplication

### Multiplication by row times column

To get the entry in row i and column j of the answer, multiply each item of row i of matrix A with the corresponding item of column j of matrix B and add the products up.

![image](https://github.com/yiheinchai/learn/assets/76833604/badd0f17-963f-42c4-bd7c-fe46c392955e)

Notice that because we are multiplying a row of Matrix A with a column of Matrix B, the number of columns of Matrix A (the length of each row) must be the same as the number of rows of Matrix B (the length of each column) for it to work. Moreover, the result has the same number of rows as Matrix A and the same number of columns as Matrix B.

So if we have matrix A with shape m x n, and matrix B with shape n x p, then the result is m x p.
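The shape rule can be checked on non-square matrices. This is a minimal sketch: the function `matmul_checked` and the example matrices are assumptions made for the illustration.

```python
def shape(mat):
    return len(mat), len(mat[0])

def matmul_checked(a, b):
    """Row-times-column product that first checks the inner dimensions."""
    m, n = shape(a)
    n_b, p = shape(b)
    if n != n_b:
        raise ValueError(f"cannot multiply {m} x {n} by {n_b} x {p}")
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(p)] for i in range(m)]

a = [[1, 2, 3], [4, 5, 6]]                   # 2 x 3
b = [[1, 0, 2, 0], [0, 1, 0, 2], [1, 1, 0, 0]]  # 3 x 4

product = matmul_checked(a, b)
print(shape(product))  # (2, 4)
print(product)         # [[4, 5, 2, 4], [10, 11, 8, 10]]
```

Swapping the arguments, `matmul_checked(b, a)`, raises a `ValueError` because a 3 x 4 matrix cannot multiply a 2 x 3 matrix.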

In [None]:
def shape(mat):
    return len(mat), len(mat[0])

def mat_mul_row_col(mat1, mat2):
    rows, _ = shape(mat1)
    _, cols = shape(mat2)  # the answer has as many columns as mat2
    cols_of_mat2 = mat_invert(mat2)  # transpose once so each column can be read as a list
    ans_mat = []
    for i in range(rows):
        ans_row = []
        for j in range(cols):
            row_mat = mat1[i]
            col_mat = cols_of_mat2[j]

            cell_val = 0
            for idx, row_val in enumerate(row_mat):
                cell_val += row_val * col_mat[idx]

            ans_row.append(cell_val)
        ans_mat.append(ans_row)
    return ans_mat


### Multiplication by columns

The operator is on the right-hand side of the matrix. Each column in the operator decides the corresponding column of the answer. Each item in that column tells us what combination of the columns of matrix A to use to generate the corresponding column in the answer.

We can also see multiplication by columns as matrix A multiplying each column of matrix B in turn (following the instructed combinations) to produce the respective column of the answer matrix C.

Because of this, the number of columns in the operator (matrix B) must be the same as the number of columns in the answer. Moreover, the length of each answer column is determined by the length of the columns of matrix A, because we are simply taking combinations of the columns of matrix A.

![image](https://github.com/yiheinchai/learn/assets/76833604/f0b33e22-4854-400a-8c4e-598a80585a20)

In [None]:
def mat_mul_col(mat1, mat2):
    inv_mat1 = mat_invert(mat1)
    inv_mat2 = mat_invert(mat2)

    ans_mat = []

    for col in inv_mat2:
        new_cols = []
        for idx, multiplier in enumerate(col):
            new_cols.append([val * multiplier for val in inv_mat1[idx]])
        ans_col = mat_add_col(new_cols)
        ans_mat.append(ans_col)
    
    return mat_invert(ans_mat)

### Multiplication by rows

When looking at rows, the operator is on the left-hand side of the matrix. Each row in the operator (matrix A) decides the corresponding row of the answer. Each item in that row tells us what combination of the rows of matrix B to use to generate the corresponding row in the answer.

We can also see multiplication by rows as each row of matrix A multiplying the entire matrix B (following the instructed combinations) to produce the respective row of the answer matrix C.

Because of this, the number of rows in the operator (matrix A) must be the same as the number of rows in the answer. Moreover, the length of each answer row is determined by the length of the rows of matrix B, because we are simply taking combinations of the rows of matrix B.

![image](https://github.com/yiheinchai/learn/assets/76833604/a298e452-4ba0-4f2b-a33b-074fd894ac17)

In [None]:
def mat_mul_row(mat1, mat2):
    ans_mat = []
    for row in mat1:
        ans_row = []
        for row_no, multiplier in enumerate(row):
            ans_row.append([val * multiplier for val in mat2[row_no]])
        ans_mat.append(mat_add_col(ans_row))
    return ans_mat

### Multiplication by columns x rows
We can see multiplication as columns x rows. Multiplying a single column by a single row only takes multiples of that column, so every column of the answer is a multiple of that single column. Likewise, every row of the answer is a multiple of the single row. Because of this property, all the row vectors lie on the same line, and all the column vectors also lie on the same line (because they are simply multiples of one another).

![image](https://github.com/yiheinchai/learn/assets/76833604/f1811da7-4bc5-4ecc-ba42-dd5da8569f52)

Applying this single operation to the entire matrix, we simply do (first column x first row) + (second column x second row) etc.

![image](https://github.com/yiheinchai/learn/assets/76833604/9a799fdf-a6c4-4a62-8eb4-49453220f4a9)
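A small numeric sketch of the column x row view follows. The helper names `outer`, `add` and `matmul`, and the two example matrices, are assumptions made for this illustration. Each column x row product is a matrix whose rows are multiples of one row, and adding these pieces gives the usual product.

```python
def outer(col, row):
    """Column times row: every answer row is a multiple of `row`."""
    return [[c * r for r in row] for c in col]

def add(m1, m2):
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

piece_1 = outer([1, 3], b[0])  # first column of a x first row of b
piece_2 = outer([2, 4], b[1])  # second column of a x second row of b
print(piece_1)  # [[5, 6], [15, 18]]  (second row is 3 x the first)
print(piece_2)  # [[14, 16], [28, 32]]

total = add(piece_1, piece_2)
print(total)    # [[19, 22], [43, 50]]
```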

In [None]:
def add_mats(*mats):
    ans_mat = []
    for row_no, row in enumerate(mats[0]):
        ans_row = []
        for col_no, col in enumerate(row):
            cell_val = 0
            for mat in mats:
                cell_val += mat[row_no][col_no]
            ans_row.append(cell_val)
        ans_mat.append(ans_row)

    return ans_mat


def mat_mul_col_row(mat1, mat2):
    ans_mats = []
    for idx, col in enumerate(mat_invert(mat1)):
        row = mat2[idx]
        ans_mats.append(mat_mul([[val] for val in col], [row]))
    
    return add_mats(*ans_mats)


### Multiplication by blocks

We can also multiply matrices using a recursive algorithm by splitting them up into blocks and multiplying the blocks using the same principles as multiplying numbers in matrices.

![image](https://github.com/yiheinchai/learn/assets/76833604/47ad1395-6d83-42a7-b606-389a366ab858)

In [None]:
def split_mat(mat):
    # Split a square matrix with an even number of rows into a 2 x 2 grid of equal blocks
    rows, _ = shape(mat)
    half = rows // 2
    top, bottom = mat[:half], mat[half:]
    return [
        [[row[:half] for row in top], [row[half:] for row in top]],        # blocks 0,0 and 0,1
        [[row[:half] for row in bottom], [row[half:] for row in bottom]],  # blocks 1,0 and 1,1
    ]

def join_block_mat(block_mat):
    joined_mat = []
    for block_row in block_mat:
        for mat_row_idx, mat_row in enumerate(block_row[0]):
            # for each mat in block row, extract the rows and combine them in one row.
            ans_row = []
            for mat in block_row:
                for val in mat[mat_row_idx]:
                    ans_row.append(val)
            joined_mat.append(ans_row)
    return joined_mat  

def mat_mul_blocks(mat1, mat2):
    rows, cols = shape(mat1)

    if rows % 2 == 0 and rows > 2:
        mat1_split = split_mat(mat1)
        mat2_split = split_mat(mat2)
        ans_mat = []
        for row_i, row in enumerate(mat1_split):
            ans_row = []
          
            for col_i, col in enumerate(row):
                row_to_mul = mat1_split[row_i]
                col_to_mul = mat_invert(mat2_split)[col_i]
                added_mats = add_mats(*[mat_mul_blocks(row_item, col_to_mul[idx]) for idx, row_item in enumerate(row_to_mul)])
                ans_row.append(added_mats)
            ans_mat.append(ans_row)
        return join_block_mat(ans_mat)
    else:
        return mat_mul(mat1, mat2)  

## Inverses

### Inverses basics

A matrix is invertible (also called nonsingular) if an inverse exists for it.

A matrix has an inverse if none of the vectors that make it up can be built from the others. For two vectors, this means they are not collinear. Each vector in the matrix then contains new information, with no repeated information. If two vectors carry repeated information, it is not possible to solve for the matrix, and hence it is also not possible to find its inverse.

For example, in the matrix [[1,3] [5,15]], the two vectors [1,3] and [5,15] point in exactly the same direction (collinear). The matrix therefore does not contain enough information for 2 variables, so it is not solvable and does not have an inverse.
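One quick way to confirm this for a 2 x 2 matrix is the determinant, which is zero exactly when the two rows are collinear. The determinant is not used elsewhere in these notes, so treat this as a side check. `det_2x2` and the second matrix are assumptions for the example.

```python
def det_2x2(m):
    """Determinant of a 2 x 2 matrix given as a list of rows."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

collinear = [[1, 3], [5, 15]]    # second row is 5 x the first row
independent = [[1, 3], [5, 16]]  # second row is no longer a multiple

det_collinear = det_2x2(collinear)
det_independent = det_2x2(independent)
print(det_collinear)    # 0, so there is no inverse
print(det_independent)  # 1, so an inverse exists
```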

### Process of finding inverses

The idea of elimination is simple: reduce the unknowns by annihilating variables. Use the equations with annihilated variables to annihilate more variables, until every variable is isolated.

It doesn't matter which line or which position you subtract from. You can annihilate in any direction as long as you reach the goal of a single isolated variable.

Typically,
Start with X variables. Annihilate to get X-1 variables. Use the equation with X-1 variables to annihilate down to X-2 variables in the next equation. Eventually X-n = 1. With just 1 variable, you can find out what it is. Then use that 1 variable to eliminate upwards.

This way, there is no need for back substitution, just eliminate all the way. The benefit of this method is that we can derive a general algorithm that also works for Gauss-Jordan elimination to find the inverse of matrices.



First, we need to modify `eliminate_mat` so that it accepts b as a matrix

In [None]:
def compose_aug_mat(mat, b):
    return [[*row, *b[i]] for i, row in enumerate(mat)]

def decompose_aug_mat(mat_b, b_cols):
    inv_mat = mat_invert(mat_b)
    b = []
    for i in range(b_cols):
        b.append(inv_mat.pop())
 
    mat = mat_invert(inv_mat)
    b = mat_invert(list(reversed(b)))

    return mat, b

def eliminate_mat(mat, b):
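    # pre-process to make sure the first pivot point is not zero;
    # the same row exchange is applied to b so that each equation keeps its right-hand side
    if mat[0][0] == 0 and mat[1][0] != 0:
        mat[0], mat[1] = mat[1], mat[0]
        b[0], b[1] = b[1], b[0]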

    pivot = mat[0][0]
    curr_mat = mat
    curr_b = b
    for row_i, row_val in enumerate(mat_invert(mat)[0]):
        if row_i == 0: continue

        param = row_val / pivot
        
        # generate the first operator
        operator = gen_identity_mat(mat)
        operator[row_i][0] = -param

        eliminated_mat_with_b = mat_mul(operator, compose_aug_mat(curr_mat, curr_b))

        curr_mat, curr_b = decompose_aug_mat(eliminated_mat_with_b, len(b[0]))

    max_row, max_col = len(mat) - 1, len(mat[0]) - 1

    if max_col > 1 and max_row > 1:
        first_row = curr_mat[0]
        first_column = mat_invert(curr_mat)[0]

        mat_subset = subset_mat(curr_mat)
        eliminated_subset, b_subset = eliminate_mat(mat_subset, curr_b[1:])
        return superset_mat(eliminated_subset, first_row, first_column), [curr_b[0], *b_subset]
    else:
        return curr_mat, curr_b

Next we need to define a function that does reverse elimination, as a replacement of back substitution. We notice that if we do a double mirror of the eliminated matrix, we can apply the same recursive elimination algorithm to replace the need for back-substitution.

In [None]:
def double_mirror_mat(mat):
    return mat_invert(mat_invert(mat[::-1])[::-1])

def reverse_eliminate_mat(mat, b):
    # double reverse the matrix
    double_reversed_b = double_mirror_mat(b)
    double_reversed_mat = double_mirror_mat(mat)

    # do gaussian elimination
    eliminated_mat, eliminated_b = eliminate_mat(double_reversed_mat, double_reversed_b)

    # return the matrix in the unmirrored form
    return double_mirror_mat(eliminated_mat), double_mirror_mat(eliminated_b)


Next, notice that the coefficients of the final matrix are not all one. We simply define a function that moves all the coefficients from A, the matrix, to b, the right-hand side.

In [None]:
def clean_coeff_mat(mat, b):
    curr_b = []

    for i, row in enumerate(mat):
        coeff = next((x for x in row if x), None)  # first non-zero entry of the row, i.e. its pivot
        curr_b.append([ num / coeff for num in b[i]])

    return gen_identity_mat(mat), curr_b

Next, let's build the new matrix solver, which supports Gauss-Jordan style elimination.

In [None]:
def solve_mat(mat, b):
    return clean_coeff_mat(*reverse_eliminate_mat(*eliminate_mat(mat, b)))

Lastly, we can very simply build an inverse finder by exploiting the properties of the matrix.

Take the matrix equation Ax = b and make `b` an identity matrix, so that AX = I. This means that X must be the inverse of A, A^-1. Therefore, if we solve for X via Gauss-Jordan elimination, we find the inverse of A.

As we solve for X, we use the matrix [A I] and apply a series of operations via elimination matrices, E. The result will be E [A I].

We know that the left block of the result is `I`. E[A I] = [I ?]
We can rewrite this into blocks as [[EA] [EI]] = [[I] [?]].
From this, it is clear that EA = I. This implies that E must be A inverse to produce I.
If E is A inverse, then EI must produce A inverse too. Therefore, the right-hand side is [I A^-1].

Knowing this property, we can use the new matrix solver, taking `mat` as A and `b` as I. Then, in the result, the `mat` returned will be I, and the `b` returned will be A^-1.

In [None]:
def find_inverse(mat):
    _, inverse_mat = solve_mat(mat, gen_identity_mat(mat))
    return inverse_mat
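The [A I] to [I A^-1] idea can also be sketched in one self-contained function. This is not the recursive implementation from these notes: `gauss_jordan_inverse` is a hypothetical, loop-based variant that uses exact fractions and a simple row exchange when a pivot is zero.

```python
from fractions import Fraction

def gauss_jordan_inverse(a):
    """Reduce [A I] to [I A^-1] and return the right-hand block."""
    n = len(a)
    aug = [
        [Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(a)
    ]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column
        pivot_row = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot_row is None:
            raise ValueError("matrix is singular, no inverse exists")
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]  # make the pivot equal to 1
        for r in range(n):
            if r != col:  # eliminate both below and above the pivot
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

a = [[1, 3], [2, 7]]
inverse = gauss_jordan_inverse(a)
print(inverse == [[7, -3], [-2, 1]])  # True
```

Passing the singular matrix [[1, 3], [5, 15]] raises a `ValueError`, because the second column has no usable pivot after the first elimination step.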

Looking back, the action of doing a double mirror and running Gaussian elimination again is particularly interesting, because the end result of the whole process is that we transform A into I. This means that the sequence of elimination steps that we took is equivalent to A inverse (A^-1). Moreover, we can see that A^-1 x b = c

E Ax = E b

Ix = c

therefore,

E = A^-1

c = A^-1 b

Hence, this is actually another way to find the inverse of a matrix: simply collect all the elimination matrices, multiply them together, and we get the inverse.

<img width="488" alt="image" src="https://github.com/yiheinchai/learn/assets/76833604/bfbe7bf7-c77e-4f4e-9318-fe1a83a99bd2">

## Factorisation into A = LU

#### Finding inverses of matrix products

The inverse of matrix A multiplied by matrix B is B^-1 A^-1, which can be expressed as follows:

AB x B^-1 A^-1 = I

Notice that B^-1 comes before A^-1. Intuitively, we can see why this is the case. Note that when multiplying matrices we can move the parentheses to choose which multiplication to do first.

Therefore, (A)(B)(B^-1)(A^-1) = I can be computed as:
A(B B^-1)A^-1 = I
A I A^-1 = I
A A^-1 = I
I = I

Mentally, when you want to find the inverse of a matrix product, ask yourself: what do I need to multiply the product by to annihilate every matrix in it and produce an identity matrix?

So the inverse of KYC is C^-1 (to annihilate the last term), then Y^-1 to annihilate the second-last term, and K^-1 to annihilate the first term. Hence, the inverse is C^-1 Y^-1 K^-1. There is no need to memorise that it is in the opposite order. Just consider what you need to multiply by to annihilate each term, remembering that you can move the parentheses to multiply in whichever order you want.
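The ordering can be checked numerically. In this sketch, `matmul`, `inverse_2x2` (the standard 2 x 2 inverse formula) and the two example matrices are assumptions made for the illustration.

```python
from fractions import Fraction

def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix using the closed-form formula."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

a = [[1, 2], [3, 5]]
b = [[2, 1], [1, 1]]

inverse_of_product = inverse_2x2(matmul(a, b))
reversed_order = matmul(inverse_2x2(b), inverse_2x2(a))  # B^-1 A^-1
same_order = matmul(inverse_2x2(a), inverse_2x2(b))      # A^-1 B^-1

print(inverse_of_product == reversed_order)  # True
print(inverse_of_product == same_order)      # False
```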



#### Finding inverses of transposed matrix

To find the transpose of the inverse of A, you can simply find the inverse and then transpose it. Or, you can also simply find the transpose of A and find the inverse of the transpose.

In general, it does not matter which order you do the inverse or the transpose.

To prove it, consider:
A A^-1 = I

Applying transpose to both sides,

(A A^-1)^T = I^T

Note that transpose of an identity matrix is still the identity matrix. <a id='transpose_inside'>Moreover, when applying the transpose individually to the terms in the bracket, the order flips.</a>

(A^-1)^T A^T = I

Note that the inverse of A^T is denoted (A^T)^-1. Moreover, (A^-1)^T must also be the inverse of the transpose of the matrix, because when you multiply it by the transpose you get the identity matrix.

Therefore, (A^T)^-1 = (A^-1)^T, showing that you can do inverses and transposes in either order.
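A quick numeric sketch of the same result, for one made-up 2 x 2 matrix. `transpose` and `inverse_2x2` are hypothetical helpers for this example.

```python
from fractions import Fraction

def transpose(m):
    return [list(col) for col in zip(*m)]

def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix using the closed-form formula."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

a = [[1, 2], [3, 5]]

inverse_then_transpose = transpose(inverse_2x2(a))
transpose_then_inverse = inverse_2x2(transpose(a))
print(inverse_then_transpose == transpose_then_inverse)  # True
```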





#### A = L U factorisation

Normally in Gaussian elimination, we apply E to A to get U, where U is the upper triangular form which makes it easy to calculate x to solve the system of linear equations.

E32 E21 Ex A = U

We can collapse all the E elimination matrices into one and find U in a single step like so:

Eall A = U

However, the problem here is that it is not immediately obvious the specific order in which the row operations are conducted by inspecting Eall. This is because the order of operations is reversed. 

Notice that when we are calculating Eall with E21 E13 E12, the latest operation is always on the left-hand side and the first operation is on the right-hand side. This causes problems.

![image](https://github.com/yiheinchai/learn/assets/76833604/d7212662-3b63-4482-9758-6c75d6c6cad5)


Notice that E21 is meant to modify the second row using the rows above it (row 1). E32 is meant to modify the third row using the rows above it (row 2).

Intuitively, E32 modifies the third row by using values of the second row, hence it has a multiplier value in the third row. E21 modifies the values of the second row using values of the first row, hence it has a multiplier value in the second row. When we apply E32 to E21, we end up using the multiplier value in the second row (-2) to modify the third row with another multiplier value (-5). This mixes the two operations and produces an extra 10 in the final E, even though none of the individual operations included a 10. Operations interfering with each other is not ideal, as the final E no longer reflects the operations carried out in each step of Gaussian elimination.

It is important to note that if we did the operations in the reverse direction, they would not interfere with each other, as the operations flow in the direction with no multiplier values. The final matrix then only includes the multiplier values used in Gaussian elimination, and no additional values are introduced. In other words, the multipliers go directly into the final matrix.

But how do we multiply the elimination matrices in the reverse order? A = L U factorisation solves this problem.




![image](https://github.com/yiheinchai/learn/assets/76833604/c5ce3537-fe3f-4831-89d5-fc819531bca8)

EA = U


E-1 E A = E-1 U


A = E-1 U

E = E4 E3 E2


E-1 = E2^-1 E3^-1 E4^-1 = L

A = L U

By using L instead of E, the elimination matrices are multiplied in the right order, such that the multipliers do not interfere with each other and keep their original values in the final L matrix.

In particular, E3^-1 modifies row 3 using row 2 of E4^-1. However, row 2 of E4^-1 does not contain any multipliers, only the identity, so it does not interfere with E3^-1 (the multiplier in E4^-1 is on row 3). Moreover, E2^-1 modifies row 2 using row 1 of E3^-1, which only contains the identity and no multiplier.

Notice that in L, apart from the identity entries, only 2 and 5 are present, both of which are the core multipliers of the row operations in Gaussian elimination. No new values are introduced, so L gives an accurate representation of the row operations that happen in Gaussian elimination.

<img width="697" alt="image" src="https://github.com/yiheinchai/learn/assets/76833604/59afb5db-0b7b-4735-9815-188efff5bd6b">
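The interference can be reproduced in a short sketch. The two elimination matrices below are assumptions reconstructed from the multipliers -2 and -5 mentioned in the text, and `matmul` is a hypothetical helper.

```python
def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

e21 = [[1, 0, 0], [-2, 1, 0], [0, 0, 1]]  # row 2 minus 2 x row 1
e32 = [[1, 0, 0], [0, 1, 0], [0, -5, 1]]  # row 3 minus 5 x row 2

# Inverses simply flip the sign of the multiplier
e21_inv = [[1, 0, 0], [2, 1, 0], [0, 0, 1]]
e32_inv = [[1, 0, 0], [0, 1, 0], [0, 5, 1]]

e = matmul(e32, e21)          # elimination order: the multipliers interfere
l = matmul(e21_inv, e32_inv)  # reverse order: the multipliers drop straight in

print(e)  # [[1, 0, 0], [-2, 1, 0], [10, -5, 1]]
print(l)  # [[1, 0, 0], [2, 1, 0], [0, 5, 1]]
```

The extra 10 only shows up in E. Multiplying `l` by `e` gives the identity, which confirms that L is the inverse of E.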

##### Finding simple inverses

It is very easy to find the inverse of simple elimination matrices. For example,

[[1 0] [5 1]] modifies the second row by adding 5 of the first row to the second row.

Therefore, to reverse the operation (to find the inverse, so that E^-1 E A = A), we simply subtract 5 of the first row from the second row with the following matrix: [[1 0] [-5 1]].

Moreover, the inverses of row exchange matrices are also very easy to find.

For example, [[0 1] [1 0]] exchanges the first row and the second row. To reverse this operation, we simply do the same operation again, which swaps the two rows back.
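Both claims are easy to verify with a minimal sketch (`matmul` is a hypothetical helper for this example):

```python
def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

identity = [[1, 0], [0, 1]]

add_5 = [[1, 0], [5, 1]]        # add 5 x row 1 to row 2
subtract_5 = [[1, 0], [-5, 1]]  # subtract 5 x row 1 from row 2
swap = [[0, 1], [1, 0]]         # exchange row 1 and row 2

undo_add = matmul(subtract_5, add_5)
undo_swap = matmul(swap, swap)
print(undo_add == identity)   # True
print(undo_swap == identity)  # True
```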

### Code for A = LU factorisation

First we need to modify `eliminate_mat` such that it saves the multiplier in each elimination step. Remember that the general formula for an elimination step is,

`target_row - multiplier(pivot row) = eliminated row`

Moreover, we need to tag the multiplier to each particular target row. Every single target cell that is going to be converted into a zero would have a multiplier to do so.

In [None]:
def eliminate_mat_with_factorisation(A, b):
    pivot = A[0][0]
    U = A
    curr_b = b
    curr_L = gen_identity_mat(A)
    for row_i, row_val in enumerate(mat_invert(A)[0]):

        # skip the first row as it is the pivot
        if row_i == 0: continue

        multiplier = row_val / pivot
        
        # generate the E
        E = gen_identity_mat(A)
        E[row_i][0] = -multiplier
        
        L = gen_identity_mat(A)
        L[row_i][0] = multiplier
      
        curr_L = mat_mul(curr_L, L)

        # EA = U
        eliminated_mat_with_b = mat_mul(E, compose_aug_mat(U, curr_b))

        U, curr_b = decompose_aug_mat(eliminated_mat_with_b, len(b[0]))

    max_row, max_col = len(A) - 1, len(A[0]) - 1

    if max_col > 1 and max_row > 1:
        first_row = U[0]
        first_column = mat_invert(U)[0]

        first_row_l = curr_L[0]
        first_column_l = mat_invert(curr_L)[0]

        mat_subset = subset_mat(U)
        eliminated_subset, b_subset, l_subset = eliminate_mat_with_factorisation(mat_subset, curr_b[1:])
        return superset_mat(eliminated_subset, first_row, first_column), [curr_b[0], *b_subset], superset_mat(l_subset, first_row_l, first_column_l)
    else:
        return U, curr_b, curr_L

In [None]:
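def factorise(A):
    # b is not needed for the factorisation, so pass a placeholder column of ones
    # (b must be a list of rows because compose_aug_mat unpacks each b[i])
    U, _, L = eliminate_mat_with_factorisation(A, [[1] for _ in A])
    return L, U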

### Time complexity on matrix multiplications

As per the implementation of elimination shown above, we use a recursive algorithm where we apply the same elimination steps to a matrix of size n-1 over and over again until we obtain a 2 x 2 matrix.

Note that each elimination step has to turn an entire column into zeroes (below the pivot). So if each column has n rows, we need to zero out about n rows. To zero out each row, we multiply every item of the pivot row by the multiplier and then subtract. Counting a multiplication plus a subtraction as 1 operation, and with n items in each row, this means about n x n operations for the elimination step on the first column.

Considering that there are n columns, this gives an estimate of n x n x n, or n^3, operations. However, with every column there are fewer rows and columns left to process, because we only run the elimination steps on a smaller subset of the matrix, as shown below:

![image](https://github.com/yiheinchai/learn/assets/76833604/8674c8ee-e27a-4533-ba2a-cc8bd8a5826e)

So it looks something like:
n^2 + (n-1)^2 + (n-2)^2 + ... + 2^2

To estimate this sum, we can integrate with respect to x from x=2 to x=n like this:

integrate(x^2, from=2, to=n)

Integration adds one to the power and divides by the new power, so the integral of x^2 is 1/3 x^3. Evaluated at the upper limit, this gives roughly 1/3 n^3 operations for large n.
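
As a quick sanity check of this estimate, the sketch below compares the exact sum with n^3 / 3. The function name `elimination_op_count` is made up for this illustration and is not part of the notebook's code.

```python
def elimination_op_count(n):
    """Sum k^2 for k = 2..n: the step-by-step cost estimate described above."""
    return sum(k * k for k in range(2, n + 1))


for n in (10, 100, 1000):
    exact = elimination_op_count(n)
    estimate = n ** 3 / 3
    # the ratio approaches 1 as n grows, so n^3 / 3 is a fair approximation
    print(n, exact, round(estimate), round(exact / estimate, 3))
```

The lower-order terms matter for small n, which is why the ratio only settles near 1 for larger matrices.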

To calculate the time complexity for the right hand side (b): because there is only 1 column, the cost is about 1 x n x n, which is n^2.

## Transposes, Permutations

### Permutations of identity matrices

For a 3 x 3 identity matrix, there are 6 permutations by swapping the positions of the 3 rows round, 3 x 2 x 1 = 6. 

For a 4 x 4 identity matrix, there are 24 permutations by swapping the positions of the rows around.

Note that multiplying any matrix in this group with another matrix in the group always results in a matrix that is also in the group. Each matrix in the group only reorders rows, so applying one after another is still just a reordering of rows. Since the group already contains every possible reordering, the product must be one of its members.

Also note that a permutation which exchanges just two rows is its own inverse, because to reverse an exchange of two rows, we simply do the same exchange again.

A^-1 = A

![image](https://github.com/yiheinchai/learn/assets/76833604/0bc8e124-b696-431f-a781-01275ba996b8)

However, if you are exchanging more than 2 rows at once, for example, changing the positions of all three rows, the inverse becomes more complicated. For example, these two are permutations which are inverses of each other:

![image](https://github.com/yiheinchai/learn/assets/76833604/37b73ea1-2494-407d-8950-fe93a475bafc)

Interestingly, in this case, the inverse of one permutation is also equal to the transpose of that permutation. 

Hence, A^-1 = A^T
Consequently, A^T A = I

Also notice that for single row exchanges, the inverse is also the transpose (such a matrix is symmetric, so it equals its own transpose). So the above equation applies to all permutations in the group.
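
The claims above can be checked by brute force for the 3 x 3 case. This is a minimal sketch with its own small helpers (`permutation_matrices`, `transpose`, `matmul` are names chosen for this example, not the notebook's functions).

```python
from itertools import permutations


def permutation_matrices(n):
    """All n x n permutation matrices: every reordering of the identity's rows."""
    identity = [[1 if r == c else 0 for c in range(n)] for r in range(n)]
    return [[identity[i] for i in order] for order in permutations(range(n))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


group = permutation_matrices(3)
identity = group[0]  # permutations() yields the unchanged order first

# P^T P = I holds for every member, so the transpose is the inverse
all_orthogonal = all(matmul(transpose(p), p) == identity for p in group)
# the product of any two members is again a member
closed = all(matmul(p, q) in group for p in group for q in group)
# only some members are their own inverse (the identity and the single swaps)
self_inverse = sum(1 for p in group if matmul(p, p) == identity)

print(len(group), all_orthogonal, closed, self_inverse)
```

The two permutations that move all three rows are the ones that are not their own inverse, which matches the pair shown in the image above.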

#### Calculating the number of permutations

Permutations are generated by taking the identity matrix and rearranging the rows. Therefore the number of permutations is the number of ways of arranging the rows without repeats.

This is a simple permutations and combinations (P&C) problem: if there are 5 rows and we need to fill 5 slots, then the number of possible arrangements is 5! (factorial) = 120.


Since each permutation matrix simply performs row exchanges, we can add these permutations to our algorithm for Gaussian elimination. Remember that there are some situations where row exchanges are needed, particularly when the pivot is a zero. We mentioned that elimination can be written as A = L U, but that only applies when there are no row exchanges. With row exchanges it becomes P A = L U. If we multiply by the inverse of P on both sides, we get A = P^-1 L U, which is also A = P^T L U.
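
To make P A = L U concrete, here is a small self-contained sketch. It is not the notebook's recursive `eliminate_mat_with_factorisation`; the function `plu` and the example matrix are assumptions made for this illustration, and it assumes a square, invertible input.

```python
from fractions import Fraction


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def plu(A):
    """Return P, L, U with P A = L U, exchanging rows only when a pivot is zero."""
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]
    P = [[int(r == c) for c in range(n)] for r in range(n)]
    for k in range(n):
        # look for a usable pivot at or below row k
        swap = next((r for r in range(k, n) if U[r][k] != 0), None)
        if swap is None:
            raise ValueError("no pivot found: this sketch assumes an invertible matrix")
        if swap != k:
            U[k], U[swap] = U[swap], U[k]
            P[k], P[swap] = P[swap], P[k]
            # multipliers already stored in L travel with their rows
            L[k][:k], L[swap][:k] = L[swap][:k], L[k][:k]
        for r in range(k + 1, n):
            multiplier = U[r][k] / U[k][k]
            L[r][k] = multiplier
            U[r] = [a - multiplier * b for a, b in zip(U[r], U[k])]
    return P, L, U


A = [[0, 1, 1], [1, 2, 1], [2, 7, 9]]  # zero in the first pivot position
P, L, U = plu(A)
print(matmul(P, A) == matmul(L, U))                  # P A = L U
print(matmul(transpose(P), matmul(L, U)) == A)       # A = P^T L U
```

Exact fractions are used so that the equality checks are not disturbed by floating point rounding.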



### Transposes

Transposes are simply the operation of converting the row i into column i and column j into row j. Basically, take column 1, make it row 1, take column 2, make it row 2 etc. OR take row 1, make it col 1, take row 2, make it col 2, etc..

Formally, Aij = (A^T)ji

Moreover, a matrix is said to be symmetric if A^T = A

Additionally, multiplying a matrix by its transpose always outputs a symmetric matrix: A A^T is symmetric. To prove this, take the transpose of the product and check that it gives back the same matrix (a matrix S is symmetric exactly when S^T = S).

(A A^T)^T

Remember that when applying the [transpose inside, we need to flip the order](#transpose_inside),

(A^T)^T A^T = A A^T, which is the same matrix as we started with, hence proving that multiplying a matrix by its transpose gives a symmetric matrix.
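
A small numeric check of this result, using an arbitrary rectangular matrix picked for the example (the helper names are not from the notebook):

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


A = [[1, 2, 3], [4, 5, 6]]          # 2 x 3, not symmetric (not even square)
S = matmul(A, transpose(A))          # 2 x 2 product A A^T

print(S)
print(S == transpose(S))             # symmetric: equal to its own transpose
```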

## Vector spaces

R^2 are all two dimensional vectors that exist. For example [0 2] [7 219] [-323 0] etc.

R^3 are all three dimensional vectors that exist.

R^n are all n dimensional vectors that exist.

A vector space is a space where multiplying any of its vectors by a scalar, or adding any two of its vectors, gives a vector that still stays within the same space. For example, a line that passes through the origin can be a vector space. In this case, any point on the line multiplied by a multiplier still remains on the line. Moreover, adding two vectors from the line together still gives a vector that lies on the line. Hence, the line can be considered a vector space.

These are the 8 rules that vector addition and scalar multiplication must follow for a space to be considered a valid vector space:
1. x + y = y + x
2. x + (y + z) = (x + y) + z
3. There is a unique 'zero vector' such that x + 0 =  x for all x
4. For each x there is a unique vector -x such that x + (-x) = 0
5. 1 times x equals x
6. (c1c2)x = c1(c2 x)
7. c(x + y) = cx + cy
8. (c1 + c2)x = c1x + c2x

Intuitively, in a vector space:
1. Add any two vectors from the vector space, the resultant vector must still lie in the vector space
2. Take any linear combination of vectors from the vector space (which includes multiplying a single vector by any scalar), the resultant vector must still lie in the vector space

Subspaces are vector spaces inside of a vector space. R2 is a vector space. A line is a 1 dimensional subspace in R2. Note that subspaces are vector spaces. In general, anything with the word 'space' can be considered as a vector space.

It must be noted that all vector spaces and subspaces must pass through the origin. This is because multiplying any vector in the subspace by the scalar 0 gives the zero vector, so if the set does not pass through the origin, that result falls outside the set and it cannot be a subspace.

For the vector space R2, there are the following subspaces:
1. R2 itself (2 dimensions)
2. Line in R2 passing through the origin (1 dimension)
3. Zero vector (0 dimension)

Notice that the zero vector on its own is also a vector space, because adding it to itself gives itself, and multiplying it by any multiplier still gives itself, so the result always stays within that zero vector space.
Moreover, notice that the subspaces can be classified by dimension, stepping down one dimension at a time from n to 0.


Additionally, combining (taking the union of) two subspaces, for example a plane (P) and a line (L) both passing through the origin, does not in general give you a new subspace. This is because adding a point on the line and a point on the plane might give a resultant vector that is outside both the line and the plane. However, if the line lies on the plane, then the combination of the line and the plane does give you a subspace. In that case the line is a subspace of the plane, so the combination is just the plane itself, and of course the plane is a subspace.
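
The plane-and-line case can be made concrete with a tiny check. The specific plane (z = 0) and line (the z-axis) are choices made for this sketch only.

```python
def on_plane(v):
    """Plane P through the origin: all vectors with z = 0."""
    return v[2] == 0


def on_line(v):
    """Line L through the origin: the z-axis, which does not lie in P."""
    return v[0] == 0 and v[1] == 0


def in_union(v):
    return on_plane(v) or on_line(v)


p = (1, 0, 0)                                   # a point on the plane
l = (0, 0, 1)                                   # a point on the line
total = tuple(a + b for a, b in zip(p, l))      # (1, 0, 1)

print(in_union(p), in_union(l), in_union(total))
```

Both starting vectors are in the union, but their sum is in neither the plane nor the line, so the union is not closed under addition.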

### Column space

A column space is generated by taking a few column vectors from a vector space (the columns of a matrix). All the linear combinations of those column vectors then form a subspace of their own.
Because the subspace is defined as all linear combinations, multiplying the column vectors by any multiples and adding them together gives a vector that still lies in the subspace, which is why the set of linear combinations qualifies as a subspace.



Given 4 equations and 3 unknowns (A has 4 rows and 3 columns), Ax = b can only be solved for certain b.

![image](https://github.com/yiheinchai/learn/assets/76833604/15da5d47-2acf-4c20-8c35-6e481021ce45)

If b is all zeroes then it can be solved easily, as x is simply 0.
If b is one of the 3 columns, then the solution is simply to take 1 of that column and none of the rest. More generally, if b is any linear combination of the 3 columns then it can be solved, because x is simply the set of multipliers that gives the right combination to reach b. Therefore, the set of all solvable b forms a subspace of its own, as it consists of all linear combinations of the columns of matrix A.

In other words, we can solve Ax = b exactly when b is in the column space of A. Intuitively, the column space contains all vectors A multiplied by any x, that is, every linear combination of the columns of A. Therefore if b is one of the linear combinations of the columns of A, then the equation can be solved.

The column space of matrix A is denoted as C(A)
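
One way to test whether a given b lies in C(A) is to compare the rank of A with the rank of the augmented matrix [A b]: if tagging b on adds a new pivot, b is not a combination of the columns. The sketch below assumes a hypothetical 4 x 3 matrix whose third column is the sum of the first two; the helpers `rank` and `in_column_space` are written for this example.

```python
from fractions import Fraction


def rank(rows):
    """Count pivots found by forward elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    found = 0
    for col in range(len(m[0])):
        if found == len(m):
            break
        pivot_row = next((r for r in range(found, len(m)) if m[r][col] != 0), None)
        if pivot_row is None:
            continue  # no pivot available in this column
        m[found], m[pivot_row] = m[pivot_row], m[found]
        for r in range(found + 1, len(m)):
            factor = m[r][col] / m[found][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[found])]
        found += 1
    return found


def in_column_space(A, b):
    augmented = [row + [bi] for row, bi in zip(A, b)]
    return rank(augmented) == rank(A)


# hypothetical example: column 3 = column 1 + column 2
A = [[1, 1, 2], [2, 1, 3], [3, 1, 4], [4, 1, 5]]

print(in_column_space(A, [1, 2, 3, 4]))   # b is the first column of A
print(in_column_space(A, [0, 0, 0, 0]))   # the zero vector is always reachable
print(in_column_space(A, [1, 0, 0, 0]))   # not a combination of the columns
```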

### Nullspace

The nullspace is the set of all vectors x whose entries, used as multipliers in a linear combination of the columns of A, give the null (zero) vector. In other words, it contains every solution to Ax = 0.

The columnspace lives in the space of the results: with 4 rows, it is a subspace of a 4 dimensional space, spanned by the 3 columns.

The nullspace, on the other hand, lives in the space of the multipliers: with 3 unknowns it is a subspace of a 3 dimensional space, and in this example it is one dimensional (a line).

The columnspace is the space of all vectors that are the RESULT of the linear combinations.

The nullspace is the space of the linear combinations themselves (the multipliers x) that give a 0 result.

![image](https://github.com/yiheinchai/learn/assets/76833604/911a5f01-badb-4c71-acb2-2ee92d863ca9)

Given 4 equations and 3 unknowns (4 rows, 3 columns), with b being the zero vector, which x solve the equation?

If x is all zeroes then it is easily solved, as 0 of every column gives you 0.
More generally, to get b = 0 (zero vector), x needs to hold the multipliers of a linear combination of the columns of A which gives the zero vector. The set of all such x is the null space.

In this case, the null space is c[1 1 -1], where c is any multiplier. Notice that you can give c any value and Ax is still 0. You can plot [1 1 -1] in R^3, which gives a point. Then c[1 1 -1] gives a line, because by changing the value of c you get every point on the line. Hence, in this case, the null space is a line through the origin in R^3. This is illustrated as follows:

![image](https://github.com/yiheinchai/learn/assets/76833604/66d30dd0-44fb-4660-9fa0-34d51883afe9)

The null space of matrix A is denoted as N(A).

In order to prove that the solutions to Ax = 0 always give a subspace, we first need to prove that we can add any two vectors that solve Ax = 0, and the sum is still within the subspace (ie. the sum also fulfils Ax = 0).

Aw = 0 and Av = 0, where w and v are different vectors which solve the equation. We want to show that

A (w + v) = 0

Using the distributive law in matrices, the left hand side is

Aw + Av

Therefore, substituting Aw = 0 and Av = 0, we get 0 + 0 = 0, which is correct.

Next, to prove that the solutions to Ax = 0 always give a subspace, we need to prove that we can multiply a vector that solves Ax = 0 by any multiplier, and the resultant vector is still within the subspace (ie. it also fulfils Ax = 0). Take 12 as an example multiplier.

Aw = 0

A(12w) = 0

As scalars can be moved outside,

12 A(w) = 0

Substituting Aw = 0, we get 12 (0) = 0, which is correct.
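
Both closure properties can be checked numerically. The matrix here is a hypothetical one, chosen so that its nullspace is c[1 1 -1] as in the example above (third column = first column + second column).

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# hypothetical example matrix with nullspace c[1 1 -1]
A = [[1, 1, 2], [2, 1, 3], [3, 1, 4], [4, 1, 5]]
zero = [0, 0, 0, 0]

w = [1, 1, -1]
v = [3, 3, -3]
sum_wv = [a + b for a, b in zip(w, v)]    # w + v
scaled_w = [12 * a for a in w]            # 12 w

print(matvec(A, w) == zero, matvec(A, v) == zero)
print(matvec(A, sum_wv) == zero)          # closed under addition
print(matvec(A, scaled_w) == zero)        # closed under scalar multiplication
```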

#### Non-null spaces?

An interesting point to note here is that all solutions to Ax = 0 form a subspace. But do all solutions for some other b form a subspace too, for example Ax = [1,2,3,4]?

The answer is no, because x = [0, 0, 0] is not a solution, and we remember that all vector spaces must pass through the origin.

There might be many solutions to Ax = [1,2,3,4], and together those solutions can be visualised as a line or a plane. However, that line or plane does not pass through the origin, so it cannot be a subspace. When you add two vectors whose tips lie on a plane that does not pass through the origin, the result ends up outside the plane. This is because vectors start from the origin: if the plane passes through the origin, all of its vectors lie within the plane and so do their sums. If the plane does not pass through the origin, the vectors point from the origin to the plane rather than lying within it, so adding them gives a resultant vector that leaves the plane.

> To clarify why the solutions to Ax = [1,2,3,4] do not form a subspace: [0,0,0] is never a solution, so the set of solutions cannot contain [0,0,0]. This means the potential subspace does not pass through [0,0,0].
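
A numeric illustration of the same point, again using a hypothetical matrix whose third column is the sum of the first two: two different solutions of Ax = b add up to a solution of Ax = 2b, not of Ax = b.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# hypothetical example matrix; b is its first column
A = [[1, 1, 2], [2, 1, 3], [3, 1, 4], [4, 1, 5]]
b = [1, 2, 3, 4]

x1 = [1, 0, 0]                       # take 1 of the first column
x2 = [2, 1, -1]                      # x1 plus the nullspace vector [1, 1, -1]
x_sum = [p + q for p, q in zip(x1, x2)]

print(matvec(A, x1) == b, matvec(A, x2) == b)   # both solve Ax = b
print(matvec(A, x_sum))                         # gives 2b, so the sum is not a solution
print(matvec(A, [0, 0, 0]) == b)                # the zero vector is not a solution
```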

### Overview of columnspace and nullspaces

For columnspaces, we build the space up. We start with the columns, and all of their linear combinations become the columnspace. This is like a bottom-up approach.
For nullspaces, we first define an equation Ax = 0, and we say that all vectors that fulfil this equation become the nullspace. This is like a top-down approach.

## Solving Ax = 0

### Solving for a rectangular matrix

Here, our goal is to solve for the equation Ax = 0, where A is a rectangular matrix

We can apply the same elimination steps to a rectangular matrix (remember that we had previously only applied elimination to square matrices). In this case, we simply convert all cells below the pivot to 0 and then move to the next column.

![image](https://github.com/yiheinchai/learn/assets/76833604/3a389c3d-f7f5-49ff-9a90-9714702a0bcd)

We take note of the number of pivots that we have used. The number of pivots used is called the rank of Matrix A.

The pivot columns are the columns in which we found a pivot (possibly after a row exchange).

Free columns are columns without a pivot: every entry at and below the expected pivot position is already zero, so there is nothing to use as a pivot and no row exchange can help.

![image](https://github.com/yiheinchai/learn/assets/76833604/b3633250-c7c8-4658-b92a-6186ad3c62ac)

After doing elimination to arrive at the echelon form, we can do back substitution to find values of x which solve Ax = 0. Each such x is part of the nullspace. Moreover, any multiple of that vector is also part of the nullspace, and we can keep changing the values of the two free variables to any numbers and solve again to get more and more vectors of the nullspace.

However, there are certain special solutions in the nullspace which you can use to build all other vectors in the nullspace. To get them, you set one of the free variables to 1 and the others to 0, then repeat for each free variable in turn. The nullspace vectors that you get are the special solutions (or elemental vectors). Taking all linear combinations of these special solutions gives you the entire nullspace.

Here, [-2 1 0 0] and [2 0 -2 1] are the two special solutions, and their linear combinations, using a multiplier c and a multiplier d, give the entire nullspace:

![image](https://github.com/yiheinchai/learn/assets/76833604/8d26dda5-a212-4f70-9ada-5fefbc26ce1c)




The number of special solutions available is equal to the number of free columns. The number of free columns is calculated as `total_cols - pivot cols`.

In this case, as there are two free cols, the number of special solutions is 2.
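
The count can be checked with a short sketch. The matrix below is an assumption: it is a 3 x 4 matrix that is consistent with the two special solutions quoted above, since the matrix itself only appears in the images. `count_pivots` is a helper written for this example, not the notebook's elimination code.

```python
from fractions import Fraction


def count_pivots(rows):
    """Forward elimination; returns the number of pivots (the rank)."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        if rank == len(m):
            break
        pivot_row = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot_row is None:
            continue  # free column: nothing usable at or below the pivot position
        m[rank], m[pivot_row] = m[pivot_row], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# assumed example matrix, consistent with the special solutions in the text
A = [[1, 2, 2, 2], [2, 4, 6, 8], [3, 6, 8, 10]]

rank = count_pivots(A)
num_free_cols = len(A[0]) - rank          # total_cols - pivot cols

print(rank, num_free_cols)
print(matvec(A, [-2, 1, 0, 0]), matvec(A, [2, 0, -2, 1]))
```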

The number of pivots is called the rank because it tells us how many columns carry unique information. During elimination, if clearing one variable also clears another one in the same step, that indicates that some columns did not contain unique information. It is only when we use a pivot that we are using unique information to eliminate. Hence, the number of pivots tells us the number of columns in the matrix that carry unique information.

The reason why the number of free columns matches the number of special solutions is that we can think of the nullspace as dead zones. Because your equation has some redundant columns, you don't have enough information to cover the whole 4 dimensional space. With the rank 2 matrix here, the matrix can only account for a 2 dimensional subspace of the 4 dimensional space. The remaining directions cannot be reached by the matrix (it sends them to 0), and they are known as the nullspace.

So the more useless columns you have (free columns), the larger the area that you cannot reach, and the more basis vectors there are in the nullspace.

### Using reduced row echelon form

The reduced echelon form has zeroes above and below the pivots, and all of its pivots are 1. We can achieve the reduced row echelon form by doing more row operations: eliminating upwards and dividing each pivot row by its pivot.

After reducing U into the reduced echelon form, R, we notice an interesting observation. Once the pivot columns are grouped in front of the free columns, R conforms to the format [I F], where I is an identity matrix formed by the pivot columns (remember we had to make the pivots 1 and everything above and below them 0 to get the R form, so we arrive at the identity matrix). F is a matrix made up of the free columns.

[I F]

[0 0]

![image](https://github.com/yiheinchai/learn/assets/76833604/27ed04d2-a901-448b-b02a-0828a60ffe84)


![image](https://github.com/yiheinchai/learn/assets/76833604/bd47e5a3-c5fa-4322-b54f-8b79f6920048)

Now, to find the nullspace vectors, we simply stack the negative of the free columns (-F) on top of an identity matrix; each column of the result is a special solution.

![image](https://github.com/yiheinchai/learn/assets/76833604/9d9e5630-f658-40da-a173-55829e1a0c93)

This method with the reduced row echelon form makes it much easier to find the nullspace and does not require any back substitution.

Combining the special solutions together, we can verify that they indeed give zero by multiplying the reduced matrix, R, with the matrix of solutions for x.

![image](https://github.com/yiheinchai/learn/assets/76833604/3923fda5-c0d7-4737-b3ce-ccbced414394)

We expect the result to be 0, so Xpivot + F (Xfree) = 0, which is the same as Xpivot = -F (Xfree).
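
The block structure can be checked directly: with R = [I F] and the nullspace matrix built as -F stacked on top of I, the product is I(-F) + F(I) = 0. The values of F below are hypothetical, picked only to have something to multiply, and `nullspace_from_F` is a name made up for this sketch.

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def nullspace_from_F(F):
    """Stack -F on top of an identity sized by the number of free columns."""
    num_free = len(F[0])
    top = [[-v for v in row] for row in F]
    bottom = [[int(r == c) for c in range(num_free)] for r in range(num_free)]
    return top + bottom


# hypothetical F block (2 pivot rows, 2 free columns)
F = [[2, -2], [0, 2]]
I = [[1, 0], [0, 1]]
R = [i_row + f_row for i_row, f_row in zip(I, F)]   # R = [I F]

N = nullspace_from_F(F)                              # columns are special solutions
print(N)
print(matmul(R, N))                                  # every entry is 0
```

Note that this ordering assumes the pivot variables come first in x; if the pivot columns were moved to the front, the entries of each special solution have to be moved back to the original column order.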

### Code to find nullspace

First, we need to modify the `eliminate_mat` function to support rectangular matrices. However, in the process of doing so, we need to update the permutation matrix to support rectangular matrices. In particular, we need to use the size of the row to determine the size of the identity matrix which will form the permutation matrix. Hence, we first update the `gen_identity_mat` function.

In [None]:
def gen_identity_mat(mat):
    # the identity is square and sized by the number of rows of mat,
    # so that it can left-multiply rectangular matrices
    size = len(mat)
    return [[1 if row == col else 0 for col in range(size)] for row in range(size)]

Moreover, remember we mentioned that PA = LU solves for elimination with the row exchange step. We now need to implement a new function which generates the permutation matrix that does row exchanges.

In [None]:
def gen_permutation_mat(mat, from_row, to_row):
    I = gen_identity_mat(mat)
    temp_row = I[to_row]
    I[to_row] = I[from_row]
    I[from_row] = temp_row

    return I


Next, we also need to add some logic to find the index of the first value in the column that is not zero, so that we can exchange that row with the pivot row.

In [None]:
def find_non_zero_index(column):
    return next((i for i, x in enumerate(column) if x), None)

Now, we update the `eliminate_mat` function with those changes. Additionally, we need to modify our subset algorithm. In particular, when we encounter a free column, we need to skip the free column and take the next column as the subset while retaining the number of rows.

For example, normally with each iteration we decrease the size of the subset matrix by 1 column and 1 row. However, if we encounter a free column, we need to make sure that the next subset matrix still includes the current row instead of dropping it. This is because that row was supposed to have a variable eliminated, but due to the free column it did not, so we still need to eliminate a variable in that row.

In [None]:
def eliminate_mat_no_aug(mat):
    pivot = mat[0][0]
    curr_mat = mat
    is_free_column = False

    # code to exchange rows if the pivot is zero
    if pivot == 0:
        non_zero_index = find_non_zero_index(mat_invert(curr_mat)[0])

        if non_zero_index is not None:
            swap_rows_permutation = gen_permutation_mat(curr_mat, 0, non_zero_index)
            curr_mat = mat_mul(swap_rows_permutation, curr_mat)
            pivot = curr_mat[0][0]

        else:
            is_free_column = True

    if pivot != 0:
        for row_i, row_val in enumerate(mat_invert(curr_mat)[0]):
            if row_i == 0 or row_val == 0 : continue

            param = row_val / pivot
            # generate the first operator
            operator = gen_identity_mat(mat)
            operator[row_i][0] = -param

            curr_mat = mat_mul(operator, curr_mat)

    max_row, max_col = len(mat) - 1, len(mat[0]) - 1

    if max_col > 1 and max_row > 1 and not is_free_column:
        first_row = curr_mat[0]
        first_column = mat_invert(curr_mat)[0]

        mat_subset = subset_mat(curr_mat)
        eliminated_subset = eliminate_mat_no_aug(mat_subset)

        curr_mat = superset_mat(eliminated_subset, first_row,
                            first_column)
    # only recurse past a free column if there is another column left to work on
    elif is_free_column and max_col > 0:
        first_column = mat_invert(curr_mat)[0]

        # the subset should only skip the free column, do not subtract any rows in subset
        mat_subset = mat_invert(mat_invert(curr_mat)[1:])
        eliminated_subset = eliminate_mat_no_aug(mat_subset)

        curr_mat = mat_invert([first_column, *mat_invert(eliminated_subset)])

    return curr_mat


With the new `eliminate_mat` function, we can implement the rref algorithm. We first need to identify the free columns and the pivot columns. These can be identified easily: in echelon form each successive pivot column has one fewer zero at the bottom than the previous one, so a column that breaks this pattern must be a free column. To do this, we need to define a function which counts how many zeroes an array ends with.

In [None]:
def num_trailing_zeroes(column):
    num_zeroes = 0
    for val in column[::-1]:
        if val == 0:
            num_zeroes += 1
        else:
            break
    return num_zeroes

In [None]:
def mark_pivot_columns(U):
    U_T = mat_invert(U)

    is_pivot_column = []

    # offset value to adjust expected zeroes after encountering pivot columns
    offset_value = 1

    for i, column in enumerate(U_T):
        # check if each column has the right number of trailing zeroes
        num_zeroes = num_trailing_zeroes(column)

        # column 0 should have len(U) - 1 zeroes
        # column 1 should have len(U) - 2 zeroes
        # nth column should have len(U) - (n + 1)
        if num_zeroes == len(U) - i - offset_value:
            is_pivot_column.append(True)
        else:
            is_pivot_column.append(False)
            offset_value -= 1

    return is_pivot_column

Now that we have found the pivot columns, we need to ensure that all the pivot values are 1. This can be done with a simple diagonal operator matrix which takes a fraction (1 / pivot) of each row. Moreover, we need to ensure that all values above the pivots are zero. To do this, we can apply an algorithm similar to basic Gaussian elimination. In fact, it is easier to first rearrange the matrix to group the pivot columns together by using column exchanges.

Here we are going to convert U into the format of [I F], except that I has not been converted yet.

In [1]:
def seperate_pivot_from_free_columns(U, marked_cols):
    U_T = mat_invert(U)

    free_cols = []
    pivot_cols = []
    
    for i, column in enumerate(U_T):
        if marked_cols[i]:
            pivot_cols.append(column)
        else:
            free_cols.append(column)

    return pivot_cols, free_cols

In [None]:
def push_pivot_columns_to_front(U, marked_cols):
    pivot_cols, free_cols = seperate_pivot_from_free_columns(U, marked_cols)

    sorted_U_T = [*pivot_cols, *free_cols]
    sorted_U = mat_invert(sorted_U_T)

    return sorted_U

We should also define a function to throw all zero rows away because they are useless in the elimination process

In [None]:
def throw_zero_rows(U):
    non_zero_U = []
    for row in U:
        if not all((val == 0 for val in row)):
            non_zero_U.append(row)
    return non_zero_U

def add_zero_rows(R, num):
    num_cols = len(R[0])
    for _ in range(num):
        R.append([0] * num_cols)
    return R

We should now implement an operator matrix (the identity with 1 / pivot on the diagonal) to convert the pivot values to 1. This replaces the previously implemented `clean_coeff_mat`.

In [None]:
def clean_coeff_mat(R, b):
    permutation_mat = gen_identity_mat(R)
    for i, row in enumerate(R):
        permutation_mat[i][i] = 1 / row[i]

    cleaned_mat = mat_mul(permutation_mat, compose_aug_mat(R, b))
    return decompose_aug_mat(cleaned_mat, len(b[0]))

We also need to update the `eliminate_mat` function to support rectangular matrices just like `eliminate_mat_no_aug`

In [None]:
def eliminate_mat(mat, b):
    pivot = mat[0][0]
    curr_mat = mat
    curr_b = b
    is_free_column = False

    # code to exchange rows if the pivot is zero
    if pivot == 0:
        non_zero_index = find_non_zero_index(mat_invert(curr_mat)[0])

        if non_zero_index is not None:
            swap_rows_permutation = gen_permutation_mat(curr_mat, 0, non_zero_index)
            swapped_aug_mat = mat_mul(swap_rows_permutation, compose_aug_mat(curr_mat, curr_b))
            curr_mat, curr_b = decompose_aug_mat(swapped_aug_mat, len(b[0]))

            pivot = curr_mat[0][0]

        else:
            is_free_column = True

    if pivot != 0:
        for row_i, row_val in enumerate(mat_invert(curr_mat)[0]):
            if row_i == 0 or row_val == 0 : continue

            param = row_val / pivot
            # generate the first operator
            operator = gen_identity_mat(mat)
            operator[row_i][0] = -param

            eliminated_mat_with_b = mat_mul(operator, compose_aug_mat(curr_mat, curr_b))

            curr_mat, curr_b = decompose_aug_mat(eliminated_mat_with_b, len(b[0]))

    max_row, max_col = len(mat) - 1, len(mat[0]) - 1

    if max_col > 1 and max_row > 1 and not is_free_column:
        first_row = curr_mat[0]
        first_column = mat_invert(curr_mat)[0]

        mat_subset = subset_mat(curr_mat)
        eliminated_subset, b_subset = eliminate_mat(mat_subset, curr_b[1:])

        curr_mat = superset_mat(eliminated_subset, first_row,
                            first_column)
        curr_b = [curr_b[0], *b_subset]

    # only recurse past a free column if there is another column left to work on
    elif is_free_column and max_col > 0:
        first_column = mat_invert(curr_mat)[0]

        # the subset should only skip the free column, do not subtract any rows in subset
        mat_subset = mat_invert(mat_invert(curr_mat)[1:])
        eliminated_subset, b_subset = eliminate_mat(mat_subset, curr_b)

        curr_mat = mat_invert([first_column, *mat_invert(eliminated_subset)])
        curr_b = b_subset

    return curr_mat, curr_b

Now that we have rearranged U, we can do simple Gaussian elimination on it. To do this, we apply the double mirror to the matrix and isolate the free columns as b in the augmented matrix. We then reverse those steps to produce the result.

In [None]:
def rref(A):
    U = eliminate_mat_no_aug(A)
    marked_cols = mark_pivot_columns(U)

    U_no_zero = throw_zero_rows(U)

    no_of_zero_rows = len(U) - len(U_no_zero)
    
    pivot_cols, free_cols = seperate_pivot_from_free_columns(U_no_zero, marked_cols)
    pivot_cols, free_cols = mat_invert(pivot_cols), mat_invert(free_cols)
    pivot_cols, free_cols = double_mirror_mat(pivot_cols), double_mirror_mat(free_cols)

    pivot_cols, free_cols = eliminate_mat(pivot_cols, free_cols)
    I, F = clean_coeff_mat(pivot_cols, free_cols)
    I, F = mat_invert(double_mirror_mat(I)), mat_invert(double_mirror_mat(F))

    R = mat_invert([*I, *F])
    R = add_zero_rows(R, no_of_zero_rows)
    
    return R

Notice that after we apply the `rref` algorithm to A, we can immediately read off the elements of the nullspace, as R is already in the format [I F]. Lastly, we implement a convenience function to get the nullspace special solutions. We know that the nullspace is c[-F I], with -F stacked on top of I.

In [None]:
def find_nullspace_matrix(A):
    U = eliminate_mat_no_aug(A)
    marked_cols = mark_pivot_columns(U)

    U_no_zero = throw_zero_rows(U)
    pivot_cols, free_cols = seperate_pivot_from_free_columns(U_no_zero, marked_cols)
    pivot_cols, free_cols = mat_invert(pivot_cols), mat_invert(free_cols)
    pivot_cols, free_cols = double_mirror_mat(pivot_cols), double_mirror_mat(free_cols)

    pivot_cols, free_cols = eliminate_mat(pivot_cols, free_cols)
    I, F = clean_coeff_mat(pivot_cols, free_cols)
    I, F = mat_invert(double_mirror_mat(I)), mat_invert(double_mirror_mat(F))

    neg_F = [(-val or val for val in col) for col in F]

    return [[*vecF, *I[i]]  for i, vecF in enumerate(neg_F)]

Well, it turns out that this method of finding the special solutions is not super reliable. The most reliable method is to use the R form matrix and substitute the x values of the free columns one by one. To get 1 special solution, set the x value of one free column to 1 and the others to 0, and then solve for the rest of the x variables. To get the next, set the next free column to 1 and the others to 0 and solve again. It can be seen from this algorithm that the number of special solutions is equal to the number of free columns, which is also equal to the number of columns minus the rank (n - r).

We note that the pivot variables form an identity matrix, so solving for the pivot variables is very easy because each row has a single isolated pivot variable with a coefficient of 1. The free variables have different coefficients, but since we are the ones setting their x values we can compute the final result immediately. Therefore, solving this becomes a simple matter.
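
The substitution procedure can be sketched in a self-contained way, independent of the notebook's helper functions. `rref_with_pivots` and `special_solutions` are names invented for this sketch, and the example matrix is an assumption that is consistent with the special solutions quoted earlier. Unlike the [I F] shortcut, it keeps the columns in their original order, so no reordering is needed afterwards.

```python
from fractions import Fraction


def rref_with_pivots(rows):
    """Return the reduced row echelon form and the list of pivot column indices."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots = []
    r = 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # free column
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [v / pivot for v in m[r]]          # make the pivot 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:           # clear above and below the pivot
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots


def special_solutions(rows):
    R, pivots = rref_with_pivots(rows)
    n = len(rows[0])
    free = [c for c in range(n) if c not in pivots]
    solutions = []
    for f in free:
        x = [Fraction(0)] * n
        x[f] = Fraction(1)                        # this free variable is 1, the rest 0
        for row_i, p in enumerate(pivots):
            x[p] = -R[row_i][f]                   # pivot row reads: x_p + R[row][f] * 1 = 0
        solutions.append(x)
    return solutions


# assumed example matrix
A = [[1, 2, 2, 2], [2, 4, 6, 8], [3, 6, 8, 10]]
for solution in special_solutions(A):
    print([int(v) for v in solution])
```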

In [None]:
def find_nullspace_matrix(A):
    U = eliminate_mat_no_aug(A)
    marked_cols = mark_pivot_columns(U)

    U_no_zero = throw_zero_rows(U)
    pivot_cols, free_cols = seperate_pivot_from_free_columns(U_no_zero, marked_cols)
    pivot_cols, free_cols = mat_invert(pivot_cols), mat_invert(free_cols)
    pivot_cols, free_cols = double_mirror_mat(pivot_cols), double_mirror_mat(free_cols)

    pivot_cols, free_cols = eliminate_mat(pivot_cols, free_cols)
    I, F = clean_coeff_mat(pivot_cols, free_cols)
    I, F = double_mirror_mat(I), double_mirror_mat(F)

    neg_F = [[-val or val for val in col] for col in mat_invert(F)]
    
    num_free_cols = len(neg_F)

    neg_F_row = mat_invert(neg_F)

    special_solutions = []

    for i in range(num_free_cols):
        x = [[0]] * num_free_cols
        x[i] = [1]

        F_sol = mat_mul(neg_F_row, x)

        special_solutions.append([*F_sol, *x])

    return special_solutions

#### Summary intuition

Let's summarise what we did to achieve our goal of finding vectors that when multiplied by the matrix will give 0 (nullspace).

This solution is quite similar to the basic matrix equation solver. We first wrote A x = b. Then we made A an identity matrix via elimination, bringing all the coefficients over to b. Bringing the coefficients over still leaves the equation valid because we are doing the same operation on both sides. In the end, the identity matrix simply tells you what each unknown variable is by comparing it to b.

In this case, it is rather similar: we use rref to get our identity matrix. The independent columns are the pivot columns, and these are the columns that we need to find the unknowns for. The free columns are dependent columns and we can set their variables to anything. So, we set the free variables, then multiply the set values into those columns to produce constants. We can then move those constants from the LHS to the RHS, allowing us to simply read off the values of the unknown variables from the RHS. Remember the RHS was originally 0, but now we subtract the constants from both sides (the equation is still valid).

To illustrate this, where P = pivot cols and F = free cols (after elimination, P becomes I), we need to solve for x,

`A x  = 0`

`E A x  = E 0`

`[I F] x = 0`

Split x into the pivot variables `x_pivot` and the free variables, set the free variables to chosen values c, and multiply out (note the operations here are done in algebraic form rather than matrix form),

`[I F] x = 0`

`I x_pivot + F c = 0`

Shift the constants over,

`I x_pivot = -F c`

`x_pivot = -F c`

Hence, each pivot variable in x corresponds to a particular constant on the right hand side, which is the solution.


#### Other thoughts

Perhaps an interesting thought here is that:

We know that a series of Es applied to the pivot cols and free cols gets us to [I F].

`E [PIVOT FREE] = [I F]`

Since E multiplies each block of columns separately,

`[E(PIVOT) E(FREE)] = [I F]`

Therefore, E must be equal to PIVOT inverse. And hence, we can derive that PIVOT^-1 FREE = F, which is an interesting result.

To derive E, we simply tag on another I to the matrix to record the row operations that are conducted.

`E [PIVOT FREE I] = [I F ?]`

`[E(PIVOT) E(FREE) E(I)] = [I F E]`

We can see from this that the right hand block of the answer must be E. Since E = PIVOT^-1, it makes sense that E(FREE) = F.

So, an alternative way to find F is to use another algorithm to find PIVOT^-1, and then multiply it by FREE.

To follow up on this interesting thought, we mentioned previously that elimination or rref eventually reduces an invertible A down to I, and the bulk of the information ends up stored in b.

`E A = I`

Hence, `E = A^-1`.

So likewise, to find A inverse, we simply tag I along to record the row operations.

`E [A I] = [E(A) E(I)] = [I E]`

So now that we have successfully recorded E in the result, what is E? Well, we found previously that `E = A^-1`, so we have successfully found the formula to get the inverse of a matrix.

This serves as a much better explanation than the one given previously.

We can then apply this to find PIVOT^-1. So, `E [PIVOT I] = [I PIVOT^-1]`, where E is the sequence of elimination steps. We know that the pivot columns must form a square and invertible matrix; therefore, this gives rise to a much simpler algorithm using simple elimination to find PIVOT^-1.
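The idea of tagging I along can be sketched in a few lines. This is a minimal, self-contained version that assumes a square invertible input and uses exact fractions; `gauss_jordan_inverse` is an illustrative name and is not the `find_inverse` helper used in these notes.

```python
from fractions import Fraction


def gauss_jordan_inverse(A):
    """Invert a square matrix by reducing [A | I] to [I | A^-1]."""
    n = len(A)
    # Build the augmented matrix [A | I] with exact fractions.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry to use as the pivot.
        pivot_row = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot_row is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        # Scale the pivot row so that the pivot becomes 1.
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        # Clear the pivot column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    # The right half now holds E, which equals A^-1.
    return [row[n:] for row in aug]


A = [[2, 1],
     [5, 3]]
A_inv = gauss_jordan_inverse(A)
print([[int(v) for v in row] for row in A_inv])  # [[3, -1], [-5, 2]]
```

The left half ends as I while the right half records every row operation, so the right half is E = A^-1.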

In [None]:
def find_nullspace_matrix(A):
    U = eliminate_mat_no_aug(A)
    marked_cols = mark_pivot_columns(U)

    U_no_zero = throw_zero_rows(U)
    pivot_cols, free_cols = seperate_pivot_from_free_columns(U_no_zero, marked_cols)

    pivot_inverse = find_inverse(pivot_cols)

    F = mat_mul(pivot_inverse, free_cols)

    # Negate F; `-val or val` keeps a zero entry as 0 instead of producing -0.0
    neg_F = [[-val or val for val in row] for row in F]
    
    num_free_cols = len(F[0])

    special_solutions = []

    for i in range(num_free_cols):
        x = [[0]] * num_free_cols
        x[i] = [1]

        F_sol = mat_mul(neg_F, x)

        special_solutions.append([*F_sol, *x])

    return special_solutions

This code does exactly the same thing as before, but it reuses the existing inverse function instead of rewriting the elimination.

## Solving Ax = b

### Finding solutions to Ax = b

First let's look at the column picture. Previously, we mentioned that Ax = b is solvable as long as b is within the column space of A. This makes perfect sense because the column space is all linear combinations of the columns of A, and x simply holds the multipliers that pick out one particular linear combination. So if b is in the column space of A, then there will definitely be a valid x which provides the right linear combination to produce b.

Next, let's look at the row picture. For matrices with dependent rows, Ax = b is solvable only if the entries of b follow the same linear dependence as the rows of A. In other words, if a combination of rows of A gives the zero row, then the same combination of entries of b must give 0.



Now for the solutions to Ax = b. In square invertible matrices there were no free variables, which is equivalent to setting the free variables to zero and solving the equation with just the pivot variables. We do the same here: set all free variables to zero and solve for the pivot variables. The result is X particular. This works whenever b is in the columnspace of A, because solving normally finds a linear combination of the columns which produces b.

![image](https://github.com/yiheinchai/learn/assets/76833604/47583b10-f96f-464e-80bf-e8f1b17ff977)

However, we also need to solve for the case when b is the zero vector. In other words, this part of the solution involves finding the nullspace of A, which gives the solutions to Ax = 0.

Remember that the nullspace is formed by all linear combinations of the special solutions. Particularly, `c [solution 1] + d [solution 2] + ....`

Therefore, the full solution to Ax = b combines both parts: one particular solution that produces b, plus anything from the nullspace, which produces the zero vector. This can be illustrated as:

`x = Xparticular + c [solution 1] + d [solution 2] + ....`

or if we rewrite the special solutions as a nullspace, Xn

`x = Xp + Xn`

But how do we prove that the two parts, when added together, still fulfil Ax = b? We know that Xp fulfils Ax = b and Xn fulfils Ax = 0, but how do we prove the equation still holds when they are added together? So,

`A(x) = b`

`A(Xp) = b`

`A(Xn) = 0`

Then, we should expect,

`A(Xp + Xn) = b`

`A(Xp) + A(Xn) = b`

`b + 0 = b`

`b = b`

Therefore, this shows that the solution Xp + Xn still fulfills the conditions of Ax = b. Intuitively, we notice that because the nullspace does not contribute anything to b, we can add as much of the nullspace vectors as we want and the solution will still fulfil b. It is only the Xparticular that contributes to b (`A(Xp) = b`).
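To make this concrete, here is a minimal sketch that checks `A(Xp + Xn) = b` numerically. The matrix, right-hand side, particular solution and special solutions below are a hypothetical worked example chosen for illustration, and `mat_vec` and `full_solution` are illustrative helper names.

```python
from fractions import Fraction


def mat_vec(M, x):
    """Multiply matrix M (a list of rows) by the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


A = [[1, 2, 2, 2],
     [2, 4, 6, 8],
     [3, 6, 8, 10]]
b = [1, 5, 6]

x_particular = [-2, 0, Fraction(3, 2), 0]  # free variables set to zero
special_1 = [-2, 1, 0, 0]                  # special solutions of A x = 0
special_2 = [2, 0, -2, 1]


def full_solution(c, d):
    """Return Xp + c * special_1 + d * special_2."""
    return [p + c * s1 + d * s2
            for p, s1, s2 in zip(x_particular, special_1, special_2)]


# Any choice of c and d still solves A x = b.
for c, d in [(0, 0), (1, 0), (-3, 7)]:
    print(mat_vec(A, full_solution(c, d)) == b)  # True each time
```

The nullspace part contributes nothing to b, so changing c and d moves x around without changing A x.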

![image](https://github.com/yiheinchai/learn/assets/76833604/44b25111-d00e-4714-a614-2bf68d931308)

Trying to plot the solutions to Ax = b, we first notice that the nullspace forms a plane (if there are two special solutions), since its two basis vectors span a plane. Xparticular is simply a fixed vector. Therefore, when we add Xparticular to the plane, we are translating the plane by that vector. The translated plane no longer passes through the origin, so it cannot be considered a subspace, because a subspace must pass through the origin, as explained a long time ago.

![image](https://github.com/yiheinchai/learn/assets/76833604/94cf39b4-e706-45be-9b55-93d4fd8d5cef)

### Relationships between R and N

Take a m x n matrix, particularly, with m rows and n columns.

There are different scenarios here, which require visualisation of the elimination process. 

First, if we have a matrix that is very tall and very thin, where m >> n, we have very few columns and a lot of rows. When we do the elimination process, we go down the columns and rows in step for the pivots, and we eventually run out of columns while we still have extra rows left. If the columns are independent, every column gets a pivot, so the rank is limited by the bottleneck, which is the number of columns, n. Hence, in such a matrix, rank = n. This case is called 'Full column rank'. The extra rows of A cannot be independent, so elimination turns them into all-zero rows. This puts a constraint on b, as mentioned previously: the entries of b in the extra rows must follow the same linear combination (so that they also turn to zero) for the equation to be solvable. If they do not, an extra row reads 0 = non-zero and there is no solution; we would need more than the given columns and x variables in order to solve it.

> Because there are no free columns in this situation, the nullspace contains only the zero vector. Therefore, the full solution to Ax = b is just Xparticular, if that solution exists (n rows are independent and the rest are dependent, so the independent rows and columns reduce to a square matrix). However, if there are more independent equations than there are variables, then there is no solution. So in this case, there can be 1 (Xparticular) or 0 solutions.

![image](https://github.com/yiheinchai/learn/assets/76833604/773238f0-10d6-46e7-82cd-52d20a31ec28)

Second, if we have a matrix that is very wide and very short, where m << n, we have very few rows and many columns. When we do the elimination process, we go down columns and rows in step for the pivots, and we eventually run out of rows while we still have extra columns left. At this point we are no longer able to continue eliminating. If the rows are independent, the rank is limited by the bottleneck, which is the number of rows, m. Hence, in such a matrix, rank = m. Because we have a lot of extra columns, we have a large number of free columns. So free_cols = n - r = n - m. This case is called 'Full row rank'. In this situation, there is always a solution for every b, with no constraint on b (there are no zero rows to impose one).

> Because there are many free columns, and each free column produces one special solution, we get a nullspace whose dimension equals the number of free columns. In other words, the dimension of the nullspace = n - r = n - m. Therefore, the solution contains both Xparticular and the nullspace, and the solution set has the dimension of the nullspace, contained within R^n.

![image](https://github.com/yiheinchai/learn/assets/76833604/dc1af164-3bd3-4cc1-93c7-ed4ba9f10147)

Third, if we have a square matrix that is invertible, with n = m, then all the rows and columns are independent, and we can be sure that there will be no zero rows and no free columns. The rank is the same as the size of the matrix, where n = m = r. There are no free columns, so the nullspace contains only the zero vector, and the full solution to Ax = b is just Xparticular. Hence, there is exactly one solution for every b.

![image](https://github.com/yiheinchai/learn/assets/76833604/6db0bc02-5deb-436b-82bc-d0c8223c3dcd)

Fourth, if we have a matrix that has both dependent rows and dependent columns (r < m and r < n), we get a combination of the first two cases. We get rows at the bottom which are all zeroes, and we get some free columns. The zero rows put a constraint on b, so there may be no solution. When a solution exists, the nullspace has dimension n - r, as in the second case, so the solution contains both Xparticular and the nullspace, and the solution set has the dimension of the nullspace, contained within R^n.


In summary,

![image](https://github.com/yiheinchai/learn/assets/76833604/e76e88dd-5827-4268-bf78-eebd9e96491a)

In intuitive summary, exactly one solution exists for every b only when the matrix is square with all rows independent. In other words, after rref, it should still be a square matrix with no all-zero rows. rref effectively identifies dependent rows, so if by the end of rref there are still no all-zero rows then all the rows are independent.

For a unique solution, there need to be as many independent equations as unknown variables.

Too many equations put more constraints on the variables than they can satisfy. Hence, no solution.

The saving grace is that if there are too many equations, but the extra equations are dependent, then we can effectively throw those equations away (they do not provide us with extra information). We can identify dependent rows via rref. Then we should get the right number of equations to form a square matrix and solve accordingly.

Too few equations put too few constraints on the variables, resulting in solutions that span a shifted space (nullspace + X particular). In this case, there are an infinite number of solutions.
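The four cases can be condensed into a small lookup keyed on the shape and the rank. This is a minimal sketch; `count_solutions` is an illustrative name, and the rank r is assumed to be known already (for example from counting pivots after elimination).

```python
def count_solutions(m, n, r):
    """Possible numbers of solutions to A x = b for an m x n matrix of rank r."""
    if r > min(m, n):
        raise ValueError("rank cannot exceed the number of rows or columns")
    if r == m and r == n:
        return "1"            # square and invertible
    if r == n:
        return "0 or 1"       # full column rank (r = n < m)
    if r == m:
        return "infinite"     # full row rank (r = m < n)
    return "0 or infinite"    # r < m and r < n


print(count_solutions(3, 3, 3))  # 1
print(count_solutions(5, 2, 2))  # 0 or 1
print(count_solutions(2, 5, 2))  # infinite
print(count_solutions(3, 4, 2))  # 0 or infinite
```

Whether the "0" or the other option applies in the mixed cases depends on whether b satisfies the constraints from the zero rows.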

<img width="597" alt="image" src="https://github.com/yiheinchai/learn/assets/76833604/4c18ed6d-2fd3-49b9-b3c4-141baa9f9fc3">

## Dependence, Basis and Dimension

### Independence

Intuitively, independence of a vector means that that vector cannot be derived from some linear combination of the other vectors. That vector provides unique information, which cannot be derived from existing vectors.

A dependent vector looks like this:

`[vector A] = c[vector B] + d[vector C]`

In this case, vector A can be derived from a linear combination of vectors B and C.

Moving all terms to one side (absorbing the negatives into the constants),

`b[vector A] + c[vector B] + d[vector C] = 0`

Here the multipliers are not all zero (b is non-zero). Therefore, for the vectors to be independent, we need, for every choice of b, c and d that is not all zero,

`b[vector A] + c[vector B] + d[vector C] != 0`

Formally, the definition of independence is,

> **_Definition:_** Vectors x1, x2, ..., xn are independent if no combination gives the zero vector except the zero combination (where all multiplier coefficients are zero)

Notably, the zero vector cannot be one of a set of independent vectors. If the zero vector is among the rows of a matrix, then the rows of that matrix cannot be independent. This is because we can put a multiplier of 0 on every other vector and any non-zero multiplier on the zero vector, and the combination is still the zero vector. Because not all the multipliers are zero, this is a non-zero combination that gives the zero vector, so by the definition above a matrix that contains the zero vector does not have independence.

Interestingly, this links to our operations in rref. We noticed that when we have dependent rows, rref produces rows that are all zeroes. This fits our explanation that all-zero rows demonstrate a lack of independence.

![image](https://github.com/yiheinchai/learn/assets/76833604/7712af6a-d0ba-45c5-9f50-5bb73fe9fbf1)

Another interesting thing to note is that because rref takes linear combinations of the rows, the dependent rows end up as all-zero rows. Because we are not taking linear combinations of the columns, this phenomenon does not occur for the columns.

It is only when we take linear combinations of the columns (for example, by transposing the matrix and eliminating) that the dependent columns likewise turn into all zeroes.

Another important way of identifying dependence is looking at the number of vectors and the number of dimensions in which the vectors are in (the number of unknowns).

![image](https://github.com/yiheinchai/learn/assets/76833604/cd460004-a03e-49b9-8a95-8c66db18cd8a)

In this case, we have three vectors in a 2 dimensions space. We already know that linear combinations of two independent vectors is already sufficient to span R^2, therefore, the additional 3rd vector must be dependent, because some linear combination of the first two vectors can give the third vector.

Moreover, because we have more unknowns than we have equations (m < n, or in other words, more columns than rows), we should expect to have free variables (free columns).

![image](https://github.com/yiheinchai/learn/assets/76833604/9a21634f-a103-44be-acf1-54ba7680a9f8)

Because we have free columns, we expect to have a non-trivial nullspace. Any matrix whose nullspace contains more than the zero vector will have dependent columns.

> Interesting to note that if we transpose this matrix and eliminate, it will have no free columns and hence no nullspace (other than the zero vector).

The reason why having a nullspace (beyond the zero vector) means there are dependent columns is that the nullspace holds the solutions to Ax = 0. Intuitively, a non-zero vector in the nullspace is a linear combination of the column vectors, with multipliers that are not all zero, that produces the zero vector. Going by our previous definition of dependence, this means that the column vectors in the matrix are not independent. It is possible to produce the zero vector via such a linear combination only if at least one of the vectors can be derived from the others.

To summarise, when rank < n, there are free variables, hence the nullspace has basis vectors, hence the column vectors in the matrix are not independent.
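The rank test for independence can be sketched as follows. This is a minimal illustration that counts pivots with exact fractions; `rank` and `columns_are_independent` are illustrative names rather than helpers from these notes, and the example matrix is hypothetical.

```python
from fractions import Fraction


def rank(A):
    """Count the pivots found by row elimination."""
    rows = [[Fraction(v) for v in row] for row in A]
    num_rows, num_cols = len(rows), len(rows[0])
    pivots = 0
    for col in range(num_cols):
        # Look for a usable pivot in this column, below the pivots already found.
        pivot_row = next((r for r in range(pivots, num_rows) if rows[r][col] != 0), None)
        if pivot_row is None:
            continue  # no pivot here, so this is a free column
        rows[pivots], rows[pivot_row] = rows[pivot_row], rows[pivots]
        # Eliminate the entries below the pivot.
        for r in range(pivots + 1, num_rows):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [a - factor * p for a, p in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots


def columns_are_independent(A):
    """Columns are independent exactly when the rank equals the number of columns."""
    return rank(A) == len(A[0])


# Three vectors in R^2 stored as columns: the third is the sum of the first two.
A = [[1, 2, 3],
     [2, 1, 3]]
A_transposed = [list(col) for col in zip(*A)]

print(rank(A))                                # 2
print(columns_are_independent(A))             # False
print(columns_are_independent(A_transposed))  # True
```

The transposed matrix has rank 2 with only 2 columns, so it has no free columns, while the original has one free column and therefore dependent columns.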


### Span
> **_Definition:_** Vectors V1, ..., Vl span a space means that the space consists of all combinations of those vectors

Intuitively, when all linear combinations of the vectors give rise to the space, we can say that the vectors span the space.

### Basis
> **_Definition:_** Basis for a space is a sequence of vectors v1, v2, ..., vd with 2 properties:
> 1) They are independent
> 2) They span the space



Intuitively, a basis is a set of independent vectors whose linear combinations produce a given space. The point about them being independent is important, because it means that the basis is the MINIMAL amount of vector information required to construct the vectorspace. Dependent vectors do not provide any additional information, as they can be derived from the others, and therefore cannot be part of the basis.

Note that there can be multiple bases for a given space, as long as the vectors within each basis are independent and span the space.

Moreover, to focus more on the word span: for a basis to span a three-dimensional space, it must have 3 vectors. Having 2 vectors will only span a plane. Therefore, number of vectors in basis = dimension of spanned space.

Going back to the point that there can be multiple bases for a given space, we note now that every valid basis has the same number of vectors, which is equal to the dimension of the space that they span.

Formally,
> For the space R^n, n vectors give a basis if the n x n matrix with those columns is invertible (all independent)

> Every basis for the space has the same number of vectors

> `Rank(A) = # pivot columns = dimension of C(A) = # vectors in basis`

> `dimension of C(A) = r`

## Four fundamental subspaces



### Overview

For a matrix A, of shape m x n,

| Space name       | Space symbol | Enclosing space | Size of basis     | Dimension |
|------------------|--------------|-----------------|-------------------|-----------|
| Columnspace      | C(A)         | R^m             | r = # pivot cols  | r         |
| Nullspace        | N(A)         | R^n             | n-r = # free cols | n-r       |
| Rowspace         | C(A^T)       | R^n             | r                 | r         |
| Nullspace of A^T | N(A^T)       | R^m             | m-r               | m-r       |


For the enclosing space, the size of the vector indicates the dimension of the space that the vector lives in. For example, [1,2,3,4] is a point in 4-dimensional space. a[1,2,3,4] + b[0,0,1,0] is a plane in 4-dimensional space. The number of independent vectors indicates the dimension of the actual space.

Therefore, for the columnspace, the dimension of the enclosing space is the size of a column vector, which is the number of rows, m. Hence the enclosing space is R^m.

For the nullspace, the size of each solution x is equal to the number of variables in x. The number of variables in x is equal to the number of columns in the matrix, hence the enclosing space is R^n.

For the rowspace, the size of the row is the number of columns, hence R^n.

For the nullspace of A^T, the size of x is equal to the size of the columns which is the number of rows, hence R^m.

An interesting point to note is that the dimensions of the rowspace and the columnspace are both r (rank). Therefore, when doing elimination, we find the rank of the matrix, and that rank defines the dimension of both the columnspace and the rowspace. In other words, if we notice that a matrix only has r independent rows, then it must also have r independent columns (for any m x n matrix).

Moreover, it is interesting that the dimensions add up: `dim C(A^T) + dim N(A) = n`, and `dim C(A) + dim N(A^T) = m`.
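The table can be restated as a tiny helper that returns the four dimensions from m, n and r. This is only a bookkeeping sketch; `subspace_dimensions` is an illustrative name and the shape used in the example is hypothetical.

```python
def subspace_dimensions(m, n, r):
    """Dimensions of the four fundamental subspaces of an m x n matrix of rank r."""
    return {
        "C(A)": r,        # columnspace, inside R^m
        "N(A)": n - r,    # nullspace, inside R^n
        "C(A^T)": r,      # rowspace, inside R^n
        "N(A^T)": m - r,  # nullspace of A^T, inside R^m
    }


dims = subspace_dimensions(m=3, n=4, r=2)
print(dims)                             # {'C(A)': 2, 'N(A)': 2, 'C(A^T)': 2, 'N(A^T)': 1}
print(dims["C(A^T)"] + dims["N(A)"])    # 4, the number of columns n
print(dims["C(A)"] + dims["N(A^T)"])    # 3, the number of rows m
```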

### Finding the rowspace without redoing elimination

During elimination, we simply took linear combinations of the different rows again and again. Recall that the rowspace is defined as all the linear combinations of the rows. Therefore, the rows produced by the elimination steps (which only take linear combinations, and can be undone) are themselves linear combinations of the original rows, and they span the same rowspace. In particular, rref turns every dependent row into an all-zero row, so at the end of rref all the non-zero rows must be independent. Therefore the non-zero rows form a basis for the rowspace. Each non-zero row has a pivot and counts towards the rank, so the basis for the rowspace is the first r rows of R, and the rank is the dimension of the rowspace. Moreover, the basis of the rowspace from rref is the best basis, meaning it is the most simplified and easy to use (mainly 1s and 0s).

Also note that the process of elimination can cause the columnspace to change. Because we did operations by row, the new columns formed are not necessarily linear combinations of the previous columns, hence the columnspace can change.

Hence,

`C(R) != C(A)`

### Nullspace of A^T (Left nullspace)

We can simplify the nullspace of A^T to something more familiar and understandable.

`A^T y = 0`

Applying transpose to both sides,

`(A^T y)^T = 0^T`

Using the reverse-order rule for the transpose of a product, `(AB)^T = B^T A^T`,

`y^T (A^T)^T = 0^T`

`y^T A = 0^T`

![image](https://github.com/yiheinchai/learn/assets/76833604/85fc05ff-2e45-4f55-a12d-dec7029b5ed2)

Solving for y^T is simple: we do rref to convert A into R. Remember that there must be dependent rows for this nullspace to contain more than the zero vector, and therefore rref will result in entire rows of zeroes at the bottom.

Remember that the zero row at the bottom is derived from the series of elimination steps. Each elimination step takes a linear combination of rows. Therefore, by recording the elimination steps, we can find out which linear combinations of the rows of A result in the zero row.

To record the elimination steps, we simply add on the identity matrix at the end, just like what we did for Gauss-Jordan elimination. When we arrive at R, I will have been transformed to E, the elimination matrix that converts A to R. The rows of E that line up with the zero rows of R are the linear combinations of the rows of A that give 0.

![image](https://github.com/yiheinchai/learn/assets/76833604/c570523a-db7d-4402-ad7a-f9f0991872ac)

![image](https://github.com/yiheinchai/learn/assets/76833604/00dbd934-a831-495e-ab32-08c9bb68c57f)




It is also interesting to observe that the basis of the nullspace of A^T is the set of rows of the elimination matrix that give the zero rows. So the more zero rows there are, the larger the basis. Remember we said that the number of vectors in the basis is equal to the dimension? Therefore, the more zero rows there are, the larger the dimension of the nullspace of A^T.

Moreover, the number of zero rows is the number of rows minus the number of pivots. Essentially, all the non-pivot rows become zero rows, and the matching rows of E become the basis vectors, which set the dimension.

Therefore, if we refer to the table above, the dimension of the nullspace of A^T is m-r.

Again, the dimensions of the nullspace of A^T and the columnspace add to m.
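The recording trick can be sketched end to end: reduce `[A | I]` and read off the rows of E that sit next to the zero rows of R. This is a minimal version with exact fractions; `left_nullspace` is an illustrative name, not a helper from these notes, and the example matrix is hypothetical (its third row equals its first).

```python
from fractions import Fraction


def left_nullspace(A):
    """Basis of N(A^T): the rows of E, from E [A | I] = [R | E], beside the zero rows of R."""
    m, n = len(A), len(A[0])
    # Augment A with the identity so that every row operation is recorded.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(m)]
           for i, row in enumerate(A)]
    pivot_row = 0
    for col in range(n):
        if pivot_row == m:
            break
        found = next((r for r in range(pivot_row, m) if aug[r][col] != 0), None)
        if found is None:
            continue  # free column, no pivot here
        aug[pivot_row], aug[found] = aug[found], aug[pivot_row]
        pivot = aug[pivot_row][col]
        aug[pivot_row] = [v / pivot for v in aug[pivot_row]]
        # Clear this column in every other row (reduced row echelon form).
        for r in range(m):
            if r != pivot_row and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[pivot_row])]
        pivot_row += 1
    # Rows from pivot_row onwards are zero in R; their E parts form the basis.
    return [row[n:] for row in aug[pivot_row:]]


A = [[1, 2, 3, 1],
     [1, 1, 2, 1],
     [1, 2, 3, 1]]
basis = left_nullspace(A)
print([[int(v) for v in y] for y in basis])  # [[-1, 0, 1]]
```

Here y = [-1, 0, 1] says that row 3 minus row 1 gives the zero row, and the basis has m - r = 3 - 2 = 1 vector.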

## Matrix spaces, Rank 1, Small world graphs

### Matrix spaces

Instead of considering groups of vectors forming a vectorspace, we can now also consider groups of matrices forming a vectorspace. We define the matrix vectorspace as `M`.

Matrix spaces follow the same rules as vectorspaces: you should be able to add two matrices in a matrixspace and still remain in the matrixspace, and you should be able to multiply a matrix by a constant and still remain in the matrixspace.

For example, all 3x3 matrices form a vectorspace, with 9 dimensions. 9 different values to vary.

All 3x3 upper triangular matrices form a subspace, with 6 dimensions. 6 different values to vary.

All 3x3 symmetric matrices form another subspace, with 6 dimensions. 6 different values to vary.

All matrices that are both upper triangular and symmetric form a smaller subspace of diagonal matrices. This has 3 dimensions, with 3 different values to vary. This is the case of `S n U`, symmetric matrixspace intersects upper triangular matrixspace.

The smallest subspace is just the zero matrix on its own: a single matrix, of 0 dimensions, with no values to vary. (The identity matrix on its own is not a subspace, but all multiples of it form a 1-dimensional subspace.)

Now, how do we prove that those sets of matrices are indeed vectorspaces? For example, for upper triangular matrices, we can add two together and we still get an upper triangular matrix. We can multiply one by a constant, and we still get an upper triangular matrix, so that fulfils the conditions for it to be a vectorspace. The same goes for the others.

As another keen reminder, since M has 9 dimensions, it requires 9 basis matrices to span that vectorspace.

Lastly, we can consider `S u U`, symmetric matrixspace union upper triangular matrixspace. The two matrixspaces point in different directions, so taking the union does not produce a matrixspace, as it does not fulfil the rules: adding a symmetric matrix to an upper triangular matrix can give a matrix that is neither symmetric nor upper triangular. Instead we take the sum `S + U`, written as `c[S] + d[U]`, a linear combination of any matrix from S and any matrix from U. This sum can produce any 3x3 matrix, so it is the entire space M, with a dimension of 9 as it has 9 values to vary.

We notice that the dimensions of these subspaces add up to the same constant. Dimension of S is 6 while dimension of U is also 6. Both add up to 12. Dimension of S intersect U is 3, while dimension of S + U is 9, both of which add up to 12 too.

Formally, 

`dim(S) = 6`

`dim(U) = 6`

`dim(S) + dim(U) = 12`

`dim(S n U) = 3`

`dim(S + U) = 9`

`dim(S n U) + dim(S + U) = 12`

Hence,

`dim(S) + dim(U) = dim(S n U) + dim(S + U)` 
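The same count works for any size, not just 3x3. The sketch below assumes the standard counts for n x n matrices (entries on or above the diagonal for both symmetric and upper triangular matrices, diagonal entries for the intersection, all entries for the sum); the function names are illustrative.

```python
def dim_symmetric(n):
    """Free entries of an n x n symmetric matrix: on or above the diagonal."""
    return n * (n + 1) // 2


def dim_upper_triangular(n):
    """Free entries of an n x n upper triangular matrix."""
    return n * (n + 1) // 2


def dim_intersection(n):
    """S n U is the diagonal matrices, with n free entries."""
    return n


def dim_sum(n):
    """S + U is every n x n matrix, with n * n free entries."""
    return n * n


for n in range(1, 6):
    left = dim_symmetric(n) + dim_upper_triangular(n)
    right = dim_intersection(n) + dim_sum(n)
    print(n, left, right)  # the two totals match for every n; for n = 3 both are 12
```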


![image](https://github.com/yiheinchai/learn/assets/76833604/6eefb629-f2ea-4d94-bb92-3155e53ee393)


### Rank 1 matrices

We notice that rank 1 matrices only have 1 independent row and hence 1 independent column. In other words, the first row of the rank 1 matrix contains all information about the matrix. Technically speaking, all other rows can be thrown away, because the other rows are simply multiples of the first row.

Therefore, all the information contained within a rank 1 matrix is = first row + multipliers for each row.

With this insight, we can rewrite rank 1 matrices as a column (containing all the multipliers) multiplied by the first row. Doing the simple matrix multiplication of column x row will give you back the original rank 1 matrix.

Formally, this is represented as, where A is rank 1 matrix, u is multiplier column, v is the first row vector (vectors are vertical so we use transpose to show that we mean a row):

`A = u v^T`

Moreover, a rank 4 matrix can be represented as the sum of 4 rank 1 matrices. Rank 1 matrices are building blocks.
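A minimal sketch of column x row, using hypothetical vectors: `outer` builds a rank 1 matrix from u and v, and `mat_add` stacks two rank 1 pieces into a rank 2 matrix. Both helper names are illustrative.

```python
def outer(u, v):
    """Column u times row v^T: every row is a multiple of v."""
    return [[ui * vj for vj in v] for ui in u]


def mat_add(A, B):
    """Entry-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


# Rank 1: the second row is 2 x the first row.
A = outer([1, 2], [1, 4, 5])
print(A)  # [[1, 4, 5], [2, 8, 10]]

# Two rank 1 building blocks added together give a rank 2 matrix.
B = mat_add(outer([1, 0], [1, 0, 2]), outer([0, 1], [0, 1, 3]))
print(B)  # [[1, 0, 2], [0, 1, 3]]
```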

![image](https://github.com/yiheinchai/learn/assets/76833604/771f253a-ef0f-4ae0-a5bc-74cb440db698)

### Rank matrix spaces

Given that M = all 5 x 17 matrices, the question is: do all the rank 4 matrices in M form a new matrix space, a subspace of the space of all 5 x 17 matrices?

The answer is no. When you add two rank 4 matrices together you can get a rank 5 matrix, which puts you outside of the set. Hence, the rank 4 matrices cannot be considered a subspace.



Now consider `M = all 4 x 1 matrices (v)`, which forms a matrixspace. `S = all v whose components add up to 0`, that is, `v1 + v2 + v3 + v4 = 0`.

Can we consider S to be a new matrix subspace?

Yes, because if we add two matrices in S together, the components of the sum still add up to zero because `0 + 0 = 0`. And if we multiply by a constant, the components still add up to zero because `3324(0) = 0`.

So how is this related to the nullspace? We last remembered the nullspace as the solutions to Ax = 0. We can relate this finding to that. If `A = [1,1,1,1]`, then `Av = v1 + v2 + v3 + v4 = 0`. Therefore, S is the nullspace of the matrix A.

Such a space has a dimension of 3. This is because [1,1,1,1] is a rank 1 matrix with 4 columns, meaning it has 3 free columns, and therefore 3 special solutions to `Ax = 0`, hence the basis of the nullspace has 3 vectors. In other words, the nullspace of A is 3-dimensional, enclosed in a 4-dimensional space (the size of the vector is 4).

Recall that:

`dim(N(A)) = n - r = 4 - 1 = 3`
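A quick check of this count: the three special solutions of `A = [1,1,1,1]` (one per free column, written out by hand here as an assumption) all land in the nullspace, and so does any combination of them. `apply_row` and `combine` are illustrative helper names.

```python
A = [1, 1, 1, 1]

# One special solution per free column: set that free variable to 1,
# the others to 0, and solve for the pivot variable in the first position.
basis = [[-1, 1, 0, 0],
         [-1, 0, 1, 0],
         [-1, 0, 0, 1]]


def apply_row(row, v):
    """Multiply the single-row matrix by the column vector v."""
    return sum(a * x for a, x in zip(row, v))


def combine(coeffs, vectors):
    """Linear combination of the given vectors."""
    return [sum(c * vec[i] for c, vec in zip(coeffs, vectors))
            for i in range(len(vectors[0]))]


print([apply_row(A, s) for s in basis])  # [0, 0, 0]

v = combine([2, -5, 7], basis)
print(v)                # [-4, 2, -5, 7]
print(apply_row(A, v))  # 0
```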

Considering the 4 fundamental subspaces, we find that,

Rowspace = 1 dimensional = only 1 row and rank 1

Columnspace = 1 dimensional = rank 1

Nullspace = 3 dimensional = 3 free columns = n - r = 3 vectors in the basis of the nullspace

![image](https://github.com/yiheinchai/learn/assets/76833604/6a560c98-7f62-4f7b-82dd-7916a21be712)

Nullspace of A^T = 0 dimensional = only the zero vector = only 1 row, hence 1 pivot row and no zero rows = no non-zero combination of the row gives the zero row (because it's all just 1)

Again, to add the dimensions,

`dim(C(A^T)) = 1`

`dim(N(A)) = 3`

`dim(C(A^T)) + dim(N(A)) = 4 = # num columns (n)`

`dim(C(A)) = 1`

`dim(N(A^T)) = 0`

`dim(C(A)) + dim(N(A^T)) = 1 = # num rows (m)`

#### Rank factorisation

A rank 2 matrix can be factorised as the sum of two rank 1 matrices. To do this, we first conduct A = LU factorisation, or A = CR factorisation, where R is the rref form and C is the inverse of the elimination matrix used to achieve rref form. Then, we can simply apply the principle of columns x rows to achieve rank factorisation.

We first need to update the rref function to record the steps taken to achieve reduced-row echelon form.

In [None]:
def rref(A):
    UE = eliminate_mat(A, gen_identity_mat(A))
    U = UE[:len(A)]

    # TO BE CONTINUED

    marked_cols = mark_pivot_columns(U)

    U_no_zero = throw_zero_rows(U)

    no_of_zero_rows = len(U) - len(U_no_zero)
    
    pivot_cols, free_cols = seperate_pivot_from_free_columns(U_no_zero, marked_cols)
    pivot_cols, free_cols = mat_invert(pivot_cols), mat_invert(free_cols)
    pivot_cols, free_cols = double_mirror_mat(pivot_cols), double_mirror_mat(free_cols)

    pivot_cols, free_cols = eliminate_mat(pivot_cols, free_cols)
    I, F = clean_coeff_mat(pivot_cols, free_cols)
    I, F = mat_invert(double_mirror_mat(I)), mat_invert(double_mirror_mat(F))

    R = mat_invert([*I, *F])
    R = add_zero_rows(R, no_of_zero_rows)
    
    return R

def rank_factorise():
    pass

## Graphs, Networks, Incidence matrices

### Incidence matrices

![image](https://github.com/yiheinchai/learn/assets/76833604/e1cffae6-5c05-46de-ad8d-2a33e4e807be)

A graph can be represented by a matrix, specifically an incidence matrix, by having its columns be the nodes and its rows be the edges. Every row then contains information about the direction of the edge. For example, if an edge goes from node 1 to node 2, then the node 1 entry of that row will be -1 and the node 2 entry will be +1; all other entries in that row will be zero.

Looking by rows, you can see for each edge which 2 nodes it connects.

Looking by columns, you can see for each node how many edges it has and the direction of each edge.

![image](https://github.com/yiheinchai/learn/assets/76833604/5d30477f-3395-485d-8fda-04f475828819)

Particularly interesting to note is that edges the form a loop will be formed by dependent edge vectors (rows). This is because if there is a loop, it simply means that the destination reached via two of the vectors can be equally reached using a single vector (think Hess's law). Therefore, the third vector is simply a linear combination of the first and second vector and hence, the third vector is dependent on the first and the second vector.

We can also observe that all the independent vectors leads to open ends, no loops, it leads to novel locations and destinations. Recall that the number of independent vectors is the same as the rank of the matrix. In contrast, the dependent vectors are vectors which closes and formed the loop. In otherwords, if we stack many independent vectors, we will result in a tree-structure with no loops.

In general, it can be noticed that every single dependent vector will result in the formation of the loop, and the loop is enclosed by the other vectors that it is a linear combination of. In general, we can see that the number of loops will be equal to the number of dependent vectors, ie. more dependent vectors means more loops. We also recall that we can derive the number of independent vectors by finding the rank of the matrix.

To find the rank of the matrix, we recall that rank = number of independent rows = number of edges that can be kept without forming a loop. Therefore, the rank can be found visually by counting the minimum number of edges needed to link all the nodes without forming a loop. The easiest way to do this is to link them in a straight line. As in the lamppost distance problems, linking n nodes takes n - 1 edges. Therefore, for a connected graph with n nodes, the rank of the incidence matrix is n - 1.

More mathematically, to find the rank, we can ground one node by fixing its potential at zero, which removes that node's column. The remaining n - 1 columns are independent, so the rank is n - 1 and we can use them to solve for the other unknowns.
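As a rough check of the `n - 1` claim, the sketch below counts pivots by Gaussian elimination with exact fractions. The matrix `A` is an assumption (edges 1 to 2, 2 to 3, 1 to 3, 1 to 4, 3 to 4 on 4 nodes), and the `rank` helper is written here for illustration only.

```python
from fractions import Fraction

# Assumed incidence matrix: rows are edges, columns are nodes.
A = [
    [-1, 1, 0, 0],
    [0, -1, 1, 0],
    [-1, 0, 1, 0],
    [-1, 0, 0, 1],
    [0, 0, -1, 1],
]


def rank(matrix):
    """Count the pivots found by Gaussian elimination."""
    rows = [[Fraction(value) for value in row] for row in matrix]
    pivots = 0
    for col in range(len(rows[0])):
        # Look for a nonzero entry in this column at or below the pivot row.
        pivot_row = next(
            (r for r in range(pivots, len(rows)) if rows[r][col] != 0), None
        )
        if pivot_row is None:
            continue  # free column, no pivot here
        rows[pivots], rows[pivot_row] = rows[pivot_row], rows[pivots]
        for r in range(pivots + 1, len(rows)):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [x - factor * p for x, p in zip(rows[r], rows[pivots])]
        pivots += 1
        if pivots == len(rows):
            break
    return pivots


num_nodes = len(A[0])
print(rank(A), num_nodes - 1)  # 3 3

# Grounding node 4: drop its column. The remaining columns are independent.
grounded = [row[:-1] for row in A]
print(rank(grounded), len(grounded[0]))  # 3 3
```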




### Potential difference

We notice that we can multiply the incidence matrix by the vector of unknowns x1, x2, x3, x4. In this case the unknowns are the potentials at the nodes.

![image](https://github.com/yiheinchai/learn/assets/76833604/88ca6e3c-f573-4bbc-aca4-63c01c1c8458)

For example, looking by columns, x1 appears only with negative signs, meaning that every edge at node 1 points away from it. Electrons flow from negative to positive, so electrons are likely to flow away from node 1.

This can be represented in a simpler form:

![image](https://github.com/yiheinchai/learn/assets/76833604/f84b4a73-1c96-4480-b8e7-32ab336413c5)

Now, looking by rows, each row gives the potential difference across one edge. If the potential difference is non-zero there is a flow of electrons along that edge; otherwise there is none.

It is then interesting to think about which potentials we can put on the nodes so that all the potential differences are zero. This is the same as finding the nullspace of the matrix A. By inspection, if all the nodes have the same potential, then all their potential differences must be zero! So the nullspace is `c[1,1,1,1]`, where c is any multiplier and the vector of ones gives every node the same potential.

Here, we can observe that the nullspace is a subspace of R^4 and has dimension 1.

![image](https://github.com/yiheinchai/learn/assets/76833604/0e28d078-b92e-45d8-876d-836ea0dde20d)


Overall, we see that Ax gives the potential differences, and we have found the nullspace: the values of x that make every potential difference zero.
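A small sketch of `Ax` as potential differences, again assuming the 4-node, 5-edge incidence matrix with edges 1 to 2, 2 to 3, 1 to 3, 1 to 4, 3 to 4. The potentials used are made-up numbers, and `potential_differences` is a hypothetical helper name.

```python
# Assumed incidence matrix: rows are edges, columns are nodes.
A = [
    [-1, 1, 0, 0],
    [0, -1, 1, 0],
    [-1, 0, 1, 0],
    [-1, 0, 0, 1],
    [0, 0, -1, 1],
]


def potential_differences(matrix, potentials):
    """Compute Ax: for each edge, (potential at end) - (potential at start)."""
    return [sum(a * x for a, x in zip(row, potentials)) for row in matrix]


# Any constant vector c[1, 1, 1, 1] is in the nullspace: no differences at all.
print(potential_differences(A, [7, 7, 7, 7]))  # [0, 0, 0, 0, 0]

# Made-up, unequal potentials give non-zero differences on every edge.
print(potential_differences(A, [0, 1, 3, 6]))  # [1, 2, 3, 6, 3]
```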

![image](https://github.com/yiheinchai/learn/assets/76833604/1218123d-4568-482a-985d-84c690e7ffcc)


### Current

The next piece of the puzzle is current. Current is the rate of flow of electrons. We have found that the potential difference between two nodes contributes to the flow of electrons: generally, the greater the potential difference (the steeper the slope), the faster the electrons flow. However, there is another factor at play: conductivity.

Conductivity is how willing the wire is to conduct electrons. If the wire is extremely conductive, it amplifies the effect of the potential difference, making electrons flow even faster. We can see conductivity as a multiplier on the potential difference. Conductivity is the inverse of resistance, following Ohm's law `I = V (1/R)`.

So to get current (I), we simply multiply the potential difference between the two nodes, with the conductivity of each edge.

Now, to put the focus on the edges instead of the nodes, we take A^T. Each column of A^T represents an edge. Each edge carries a current, which is what gets multiplied with that column. This is represented by y: each entry of y is the current on one edge (the potential difference on that edge scaled by its conductivity). As there are 5 edges, there are 5 unknowns in y.

![image](https://github.com/yiheinchai/learn/assets/76833604/a5f030c4-8ac7-4002-81c8-c9da91c16686)

Similarly to what we did before, we can find the edge currents that result in no net flow of electrons at any node, in other words, the currents for which no node starts accumulating electrons. We need the inflow and the outflow of electrons at every node to be equal.

As we are setting A^T y = 0, this is the same as finding the nullspace of A^T.

Multiplying the y unknowns in A^T, 



![image](https://github.com/yiheinchai/learn/assets/76833604/3c69f7bb-9b8d-4513-9535-44bec7bac4ae)

Looking at the row picture, we can see that the net accumulation of electrons at each node is calculated from the flows into and out of the node. For example, node 1 has only outgoing edges, so if every current were positive we would expect a constant loss of electrons at node 1: it would always be in deficit.

However, recall that we want each node to have zero net gain or loss of electrons. This is resolved by letting the current on some edge be negative, meaning that the flow on that edge runs against the edge's arrow. This balances the inflow and outflow of electrons at the node.

Again, we can further simplify the matrix to remove the zeroes.

![image](https://github.com/yiheinchai/learn/assets/76833604/8caed583-6040-40f7-a1e7-5fb6ff9d2c7e)

How many independent solutions for the currents do we expect that leave no accumulation of electrons at the nodes? In other words, what is the dimension of the nullspace of A^T?

Recall that the rank of the matrix is 3 (the row rank equals the column rank). A has 5 rows, which are the 5 columns of A^T, so 5 - 3 = 2 of them are free. Hence, the dimension of the nullspace of A^T must be 2, and its basis contains exactly 2 vectors.

To find the basis vectors we don't have to use the old method of setting values of the free columns. We can do it more intuitively here. For example, set y1 to 1; then y2 must be 1, as y1 = y2 based on the equations. We can then set y3 to -1 to complete the loop, and y4 and y5 can both be zero.

Revising slightly on linear combinations, each nullspace basis vector tells us a linear combination of the edges (rows of A) that gives zero. That also tells us which rows are dependent.

Intuitively, one of the solutions is a loop in the circuit. This follows from the previous explanation: any edge that closes a loop is a dependent row, and that dependence is exactly a vector in the nullspace. Therefore, we can generate a nullspace basis simply by choosing vectors that trace out basic loops. Note that we only use the innermost loops. An outer loop can be generated by adding inner loops together, so it adds nothing new to the basis.
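The loop idea can be checked numerically. The sketch below assumes the 4-node, 5-edge incidence matrix with edges 1 to 2, 2 to 3, 1 to 3, 1 to 4, 3 to 4; under that assumption the two inner loops use edges (1, 2, 3) and (3, 5, 4). The helper `net_flow_at_nodes` is a made-up name for `A^T y`.

```python
# Assumed incidence matrix: rows are edges, columns are nodes.
A = [
    [-1, 1, 0, 0],
    [0, -1, 1, 0],
    [-1, 0, 1, 0],
    [-1, 0, 0, 1],
    [0, 0, -1, 1],
]


def net_flow_at_nodes(matrix, currents):
    """Compute A^T y: the net current arriving at each node."""
    num_edges, num_nodes = len(matrix), len(matrix[0])
    return [
        sum(matrix[edge][node] * currents[edge] for edge in range(num_edges))
        for node in range(num_nodes)
    ]


loop_123 = [1, 1, -1, 0, 0]  # along edges 1 and 2, back against edge 3
loop_354 = [0, 0, 1, -1, 1]  # along edges 3 and 5, back against edge 4

print(net_flow_at_nodes(A, loop_123))  # [0, 0, 0, 0]
print(net_flow_at_nodes(A, loop_354))  # [0, 0, 0, 0]

# The outer loop is the sum of the two inner loops, so it is not a new basis vector.
outer_loop = [p + q for p, q in zip(loop_123, loop_354)]
print(outer_loop)                        # [1, 1, 0, -1, 1]
print(net_flow_at_nodes(A, outer_loop))  # [0, 0, 0, 0]
```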

![image](https://github.com/yiheinchai/learn/assets/76833604/498acc82-3b19-44f1-ad59-8f47ebeeebc0)

Again, the independent rows correspond to edges that form a tree, the minimal set of edges required to connect the nodes together, whereas the dependent rows correspond to edges that close loops.

![image](https://github.com/yiheinchai/learn/assets/76833604/805c6233-bdd6-4075-9103-8889fe15300a)

Lastly, we find an interesting result by linking the semantics of graph networks to what we found in matrices.

Previously, we mentioned that `dim(N(A^T))` is equal to the number of free rows (m - r), which is equal to the number of dependent rows, which is equal to the number of loops. Hence, we can say,

`dim(N(A^T)) = m - r`

`m = # num of edges`

`r = # num of nodes - 1`

`dim(N(A^T)) = # loops`

Substituting m and r inside,

`dim(N(A^T)) = # num of edges - (# num of nodes - 1)`

Lastly, substituting dim(N(A^T)),

`# num of loops = # num of edges - (# num of nodes - 1)`

Rearranging the terms in the format of **Euler's formula**

`# num of nodes - # num of edges + # num of loops = 1`
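The counting argument can be written as a tiny function. This is only a sketch of the arithmetic above, assuming a connected graph so that `r = nodes - 1`; the function name `independent_loops` and the example graphs are made up.

```python
def independent_loops(num_nodes, num_edges):
    """dim N(A^T) = m - r, with m = edges and r = nodes - 1 (connected graph)."""
    rank = num_nodes - 1
    return num_edges - rank


# (nodes, edges) for a few small connected graphs.
examples = {
    "4 nodes, 5 edges": (4, 5),
    "triangle": (3, 3),
    "path of 4 nodes (a tree)": (4, 3),
}

for name, (nodes, edges) in examples.items():
    loops = independent_loops(nodes, edges)
    # Euler's formula: nodes - edges + loops is always 1.
    print(name, "loops:", loops, "nodes - edges + loops:", nodes - edges + loops)
```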

![image](https://github.com/yiheinchai/learn/assets/76833604/c2bd96c4-22dc-404a-b76f-26ae6f803b0c)

### Kirchhoff's Current Law

Kirchhoff's current law is `A^T y = f`. We will explain this by building it up.

First, we mentioned that we can find the potential difference via Ax. Potential difference is denoted as e. Recall that we had solved for the node values that make the potential difference 0.

Next, we mentioned that we can find the current by multiplying the potential difference by the conductivity of the edges. This is denoted as `y = C e`, where y is the current and C is the conductivity. Recall that we had solved for the currents y that leave no net electron accumulation at the nodes (`A^T y = 0`).

We can then find the net flow at the nodes by multiplying A^T by the current. This combines the directionality of the edges (A^T) with the current on each edge, giving the net flow f in or out of each node.

Remember that we need to use A^T to put the edges in the columns, so that each current multiplies its own edge. If we don't transpose, the entries would multiply the nodes instead.

![image](https://github.com/yiheinchai/learn/assets/76833604/5c7c132b-9767-4936-a457-c7acdfe7058d)


This is all expressed as,

`A^T C Ax = f`

where,

`A^T = directionality of edges`

`C = conductivity of edges`

`Ax = potential difference of edges`

`f = net flow in circuit`

Lastly, and interestingly, the equation contains `A^T C A`. Like the `A^T A` mentioned in earlier chapters, this is a symmetric matrix, as long as C is symmetric (here C is diagonal, with one conductivity per edge).
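The whole chain `x -> e = Ax -> y = Ce -> f = A^T y` can be traced with small numbers. Everything numeric here is an assumption for illustration: the incidence matrix (edges 1 to 2, 2 to 3, 1 to 3, 1 to 4, 3 to 4), the conductivities `c` and the potentials `x`.

```python
# Assumed incidence matrix: rows are edges, columns are nodes.
A = [
    [-1, 1, 0, 0],
    [0, -1, 1, 0],
    [-1, 0, 1, 0],
    [-1, 0, 0, 1],
    [0, 0, -1, 1],
]
c = [1, 2, 3, 4, 5]  # hypothetical conductivity of each edge (diagonal of C)
x = [0, 1, 3, 6]     # hypothetical potential at each node


def times(matrix, vector):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


A_T = [list(column) for column in zip(*A)]

e = times(A, x)                        # potential difference on each edge
y = [ci * ei for ci, ei in zip(c, e)]  # current on each edge, y = C e
f = times(A_T, y)                      # net flow at each node, f = A^T y

# The same result in one step, using K = A^T C A.
nodes = range(len(A[0]))
K = [[sum(row[i] * ci * row[j] for row, ci in zip(A, c)) for j in nodes] for i in nodes]

print(f)            # [-34, -3, -2, 39]
print(times(K, x))  # [-34, -3, -2, 39]
print(K == [list(column) for column in zip(*K)])  # True: K is symmetric
```

The entries of `f` sum to zero, which matches the idea that whatever leaves one node must arrive at another.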

## Orthogonal vectors and subspaces

### Orthogonal vectors

Vectors are orthogonal if they are at right angles to each other.

To check if two vectors are orthogonal, take the dot product of the two vectors and check if the result is equal to zero. The dot product can be written as a matrix multiplication, row x column, so we take the transpose of the first vector to get the row. In notation, `x^T y = 0`.

![image](https://github.com/yiheinchai/learn/assets/76833604/c7c0c7c0-98ac-417c-9341-512602c3bb4b)

Next, to prove that two vectors are indeed orthogonal when their dot product is zero, we can use Pythagoras' theorem. Pythagoras' theorem states that for a right-angled triangle, `x^2 + y^2 = z^2`, where x and y are the lengths of the two shorter sides and z is the length of the hypotenuse. We can apply this theorem to vectors by taking the lengths of the vectors. If the lengths of two vectors and of their sum satisfy this formula, then the two vectors and their sum form a right-angled triangle, and hence the two vectors are orthogonal.

But first we need to find the length of a vector to be able to use Pythagoras' theorem. To find the length of a vector, we can imagine that a vector is made up of its components. For example, a vector in 2-d space is made up of an x-axis component and a y-axis component. Adding both components produces the vector. Now, we can imagine the vector as the hypotenuse, the x component as the base and the y component as the opposite side. Applying Pythagoras' theorem to find the length of the vector, `x^2 + y^2 = (length of vector)^2`. Notice that adding up the squares of the components is the same as taking the dot product of the vector with itself: `[x y]^T [x y] = (length)^2`. Formally, for a vector `a`, this can be expressed as `a^T a = ||a||^2`.

Now, we can check Pythagoras' theorem for orthogonal vectors.

Given two vectors, `a = [1 2 3]` and `b = [2 -1 0]`, we want to show that they are orthogonal using Pythagoras' theorem. The hypotenuse vector is the sum of the two vectors, `c = [3 1 3]`. We need to show that the squared length of a plus the squared length of b equals the squared length of c. We write the lengths as `||a||`, `||b||` and `||c||`.

`||a||^2 = a^T a = 1 + 4 + 9 = 14`

`||b||^2 = b^T b = 4 + 1 + 0 = 5`

`||c||^2 = c^T c = 9 + 1 + 9 = 19`

Indeed, `||a||^2 + ||b||^2 = ||c||^2` holds, since 14 + 5 = 19. Therefore, as the lengths of a, b and c satisfy Pythagoras' theorem, we can conclude that a and b are orthogonal.

Since c is just the addition of a and b, we can rewrite it as,

`||a||^2 + ||b||^2 = ||a + b||^2`
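The same check can be run in code, using the vectors from the example above. The helper `dot` is a plain-Python dot product written for this sketch.

```python
def dot(u, v):
    """Dot product u^T v of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


a = [1, 2, 3]
b = [2, -1, 0]
c = [ai + bi for ai, bi in zip(a, b)]  # hypotenuse, c = a + b

print(c)                                # [3, 1, 3]
print(dot(a, b))                        # 0, so a and b are orthogonal
print(dot(a, a), dot(b, b), dot(c, c))  # 14 5 19
print(dot(a, a) + dot(b, b) == dot(c, c))  # True
```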

![image](https://github.com/yiheinchai/learn/assets/76833604/db60e975-1bc0-4b50-99a4-a639ad3cd8da)

![image](https://github.com/yiheinchai/learn/assets/76833604/2f6a41f9-34b9-4f44-a870-f158a2500b1c)

So what is the relationship between Pythagoras' theorem and the dot product? Why are both methods able to show that two vectors are orthogonal? The dot product test is a shortcut derived from the Pythagoras method. To show this, we rewrite the squared lengths of the vectors as matrix multiplications via `||a||^2 = a^T a`.

`a^T a + b^T b = (a + b)^T (a + b)`

Our end goal is to turn this equation into the form `a^T b = 0`. If we can, then we have shown that a zero dot product of a and b is equivalent to the Pythagoras condition, and hence it likewise proves orthogonality.

Shifting all terms to the left-hand side to make the right-hand side zero (every term here is a scalar, so we can subtract as usual),

`a^T a + b^T b - (a + b)^T (a + b) = 0`

We can distribute the transpose over the sum: adding two rows gives the same entries as adding two columns and then transposing, so `(a + b)^T = a^T + b^T`.

`a^T a + b^T b - (a^T + b^T) (a + b) = 0`

Expanding `(a^T + b^T) (a + b)` using the distributive property of matrix multiplication,

`a^T a + b^T b - a^T a - a^T b - b^T a - b^T b = 0`

Simplifying,

`- a^T b - b^T a = 0`

`a^T b + b^T a = 0`

Notice that, `a^T b = b^T a` because the dot product (represented as matrix multiplication) creates the same scalar value. Hence,

`2a^T b = 0`

Dividing both sides by 2,

`a^T b = 0`

Therefore, `a^T b = 0` is equivalent to the Pythagoras condition, which means that two vectors with zero dot product are orthogonal.
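The derivation shows that the gap `||a + b||^2 - ||a||^2 - ||b||^2` is always `2 a^T b`, so it vanishes exactly when the dot product does. A quick numeric sketch, with a made-up non-orthogonal vector for contrast and a hypothetical helper name `pythagoras_gap`:

```python
def dot(u, v):
    """Dot product u^T v of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def pythagoras_gap(a, b):
    """||a + b||^2 - ||a||^2 - ||b||^2, which should equal 2 a^T b."""
    total = [ai + bi for ai, bi in zip(a, b)]
    return dot(total, total) - dot(a, a) - dot(b, b)


a = [1, 2, 3]
orthogonal_b = [2, -1, 0]
other_b = [1, 1, 1]  # made-up vector that is not orthogonal to a

print(pythagoras_gap(a, orthogonal_b), 2 * dot(a, orthogonal_b))  # 0 0
print(pythagoras_gap(a, other_b), 2 * dot(a, other_b))            # 12 12
```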

![image](https://github.com/yiheinchai/learn/assets/76833604/957c5e05-6f6f-4fbb-a284-37b4ab35a55e)

> In addition, notice that the zero vector is always orthogonal to any vector because the dot product also produces a zero.

### Orthogonal subspaces

> **_Definition:_** Subspace S is orthogonal to subspace T means that every vector in S is orthogonal to every vector in T.

It was mentioned previously that the rowspace is orthogonal to the nullspace.

![image](https://github.com/yiheinchai/learn/assets/76833604/b27695bd-4f42-43aa-9844-1e297645b254)

In order to prove that the rowspace and nullspace fulfil the definition of orthogonal subspaces, we first look back at the nullspace definition.

Recall that the nullspace is the set of solutions to Ax = 0, where x is a vector in the nullspace.

The linear combinations of the rows of A produce the rowspace.

Notice that x is a column vector. Therefore, we can see the matrix multiplication Ax as the rows of A times the column x. Row times column is exactly the dot product. Because Ax = 0, every dot product between a row of A and x is zero, hence every row of A is orthogonal to the nullspace vector x.

However, this only proves that the rows of the matrix are orthogonal to one particular nullspace vector. How do we prove that the two entire subspaces are orthogonal?

We know that each row of the matrix gives zero, for example `R1 x = 0` and `R2 x = 0`.

Other vectors in the rowspace are simply linear combinations of the basis vectors. 

`(3R1 + 2R2) x`
`= 3R1 x + 2R2 x` here we are applying the distributive property of matrix multiplication

Substituting, `R1 x = 0` and `R2 x = 0`, 

`= 3(0) + 2(0) = 0`

Hence, the dot product is still 0 for any linear combination of the rows, that is, for any vector in the rowspace.

The same holds for any multiple of the nullspace vector.

`R1 x = 0`

`R1 (2387x) = 2387 R1 x = 2387(0) = 0`, here we used the property that scalars can be moved out of a matrix multiplication.
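A numeric sketch of the argument, using a made-up 2 x 3 matrix whose nullspace is spanned by `[-2, 1, 0]`. The combination `3R1 + 2R2` and the multiplier 2387 mirror the ones used in the text.

```python
def dot(u, v):
    """Dot product u^T v of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Made-up matrix with rows R1 and R2.
R1 = [1, 2, 3]
R2 = [2, 4, 7]
x = [-2, 1, 0]  # a nullspace vector: R1 x = 0 and R2 x = 0

print(dot(R1, x), dot(R2, x))  # 0 0

# A vector in the rowspace: 3 R1 + 2 R2.
combination = [3 * r1 + 2 * r2 for r1, r2 in zip(R1, R2)]
print(combination, dot(combination, x))  # [7, 14, 23] 0

# A different vector in the nullspace: 2387 x.
scaled_x = [2387 * xi for xi in x]
print(dot(R1, scaled_x), dot(combination, scaled_x))  # 0 0
```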

Another further definition,

> **_Definition:_** Nullspace and rowspace are orthogonal complements in R^n. The nullspace contains all vectors perpendicular to the rowspace.

This means that for a matrix with n columns (so its rows and its nullspace vectors live in R^n), the dimensions of the nullspace and rowspace add up to n. In other words, not only are the nullspace and rowspace orthogonal, the n dimensions of R^n are split between them, hence the term complements in R^n.

Another way of understanding this is that the rowspace and the nullspace together span all of R^n: combinations of their vectors can produce any vector in R^n.

### Invertibility of symmetric matrices

Recall that multiplying a matrix by its transpose, `A^T A`, results in a symmetric matrix. But what about its invertibility?

When multiplying two matrices, the rank of the product is never larger than the rank of either factor.

For `A^T A`, we are multiplying `n x m` with `m x n` to produce an `n x n` matrix. For this resultant matrix to be invertible, it must have rank n. Therefore, if the original matrix has rank less than n, the final matrix cannot be invertible. In other words, if the original matrix does not have all independent columns, then the final matrix cannot be invertible.

> **_Definition:_** `A^T A` is invertible exactly if A has independent columns.

`A^T A` is important because it helps to find the best approximation for `Ax = b`.

`Ax = b`

`A^T A x' = A^T b`, we expect that by multiplying both sides by A^T the equation becomes solvable, and its solution x' is the best approximation to a solution of `Ax = b`.
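A small sketch of both points: `A^T A` is invertible when the columns are independent, and then `A^T A x' = A^T b` can be solved. The matrices `A`, `B` and the vector `b` are made-up examples, and the helpers `gram` and `det2` are written just for this 2-column case.

```python
from fractions import Fraction


def gram(matrix):
    """Return A^T A for a matrix stored as a list of rows."""
    columns = list(zip(*matrix))
    return [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


A = [[1, 1], [1, 2], [1, 3]]  # independent columns
B = [[1, 2], [1, 2], [1, 2]]  # second column = 2 x first column
b = [1, 2, 2]                 # not in the column space of A

print(det2(gram(A)))  # 6, so A^T A is invertible
print(det2(gram(B)))  # 0, so B^T B is not invertible

# Solve A^T A x' = A^T b with Cramer's rule (fine for a 2 x 2 system).
G = gram(A)
rhs = [sum(ci * bi for ci, bi in zip(column, b)) for column in zip(*A)]  # A^T b
determinant = det2(G)
x_best = [
    Fraction(rhs[0] * G[1][1] - G[0][1] * rhs[1], determinant),
    Fraction(G[0][0] * rhs[1] - rhs[0] * G[1][0], determinant),
]
print(x_best)  # [Fraction(2, 3), Fraction(1, 2)]

# The leftover error b - A x' is orthogonal to every column of A.
error = [bi - sum(a * xi for a, xi in zip(row, x_best)) for row, bi in zip(A, b)]
print([sum(ci * ei for ci, ei in zip(column, error)) for column in zip(*A)])
```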




## Projections onto subspaces

### Approximated solutions to Ax = b

Recall,

![image](https://github.com/yiheinchai/learn/assets/76833604/773238f0-10d6-46e7-82cd-52d20a31ec28)

![image](https://github.com/yiheinchai/learn/assets/76833604/e76e88dd-5827-4268-bf78-eebd9e96491a)



When n < m, we do not have enough columns but we have lots of rows. This means that we have fewer levers to pull in order to fulfil the many constraints. Unless some of the constraints are repeated, there is no way of fulfilling all of them with the limited multipliers for linear combinations that we have. In this case of Ax = b, b is likely not to fall within the columnspace of A, and therefore it is impossible to find a linear combination of the columns of A (via x) that produces b.

However, a solution to this is to find the best approximate solution. Which linear combination of the columns of A gets us closest to b?

The point in the columnspace of A that is closest to b is the point p for which the difference b - p is orthogonal to the columnspace. Imagine a point floating above a slanted plane: the shortest way onto the plane is to walk along a vector directly perpendicular to the plane. In other words, to find the closest approximate solution of Ax = b, we need to project b onto the columnspace of A.

#### Solving by elimination

To start with a simpler example, think of a 2 dimensional space with a point b, `[2 2]`, and a line. The goal is to project the point onto the line (a). We want to find the vector on the line which goes to the projected point. First, we can find a vector that is perpendicular to the line. This is simply the nullspace of the 1 x 2 matrix whose row is a, so we solve Ax = 0. Given `a = [1 2]` (the multiples of [1 2] give the line), the solution is `[-2 1]`. Hence, we know that the projection vector, `e = d[-2 1]`, is orthogonal to the line.

So we know the line (e) that is orthogonal to `a`. However, we need to know which vector on the line e takes us between b and its projection on `a`. We call this vector the projection vector, `e`, because when you subtract it from point b, the result is the closest point on the line to b. Seen another way, we want the vector on the line `a` that reaches the projection; this vector is a multiple of `a` via the multiplier x.

We can reach the projection using x[1 2] and d[-2 1], where x and d are multipliers. x[1 2] is a vector along the line `a` travelling from the zero vector to the projection, while d[-2 1] is the orthogonal vector between the projection and b. We know that b ([2 2]) minus the projection vector `e` = d[-2 1] (`p = b - e` or `e = b - p`) gives the projection, which is x[1 2]. Therefore, we can write this as an equation.

`[2 2] - d[-2 1] = x[1 2]`

Rewriting by shifting the unknowns to one side, we get two equations (one per component) from which to solve for the multipliers x and d that produce b.

`p + e = b`

`x[1 2] + d[-2 1] = [2 2]`

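This can be seen as a simple Ax = b question, where we solve for x and d using the two equations. The columns of the matrix are the two directions `[1 2]` and `[-2 1]`.

`[[1 -2] [2 1]] [x d] = [2 2]`

Writing M for this matrix, elimination (or the inverse of M) isolates the unknowns:

`M [x d] = b`

`[x d] = M^-1 b`

where x and d are unknowns.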

Solving, we get x = 1.2, d = -0.4.

Hence, these are the multipliers for the two lines, and the projection is `p = 1.2[1 2] = [1.2 2.4]`.
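The elimination can be spelled out step by step. This sketch uses the numbers from the example (`a = [1 2]`, perpendicular direction `[-2 1]`, `b = [2 2]`) and exact fractions; the variable names are chosen here for illustration.

```python
from fractions import Fraction

a = [1, 2]   # direction of the line
n = [-2, 1]  # direction perpendicular to the line
b = [2, 2]   # point to project

# The unknowns x and d satisfy x*a + d*n = b, that is:
#   a[0]*x + n[0]*d = b[0]
#   a[1]*x + n[1]*d = b[1]
# Eliminate x from the second equation using the first.
multiplier = Fraction(a[1], a[0])
d = (b[1] - multiplier * b[0]) / (n[1] - multiplier * n[0])
# Back-substitute into the first equation.
x = (b[0] - n[0] * d) / a[0]

p = [x * ai for ai in a]  # projection of b onto the line
e = [d * ni for ni in n]  # projection vector between p and b

print(float(x), float(d))       # 1.2 -0.4
print([float(pi) for pi in p])  # [1.2, 2.4]
print([pi + ei for pi, ei in zip(p, e)] == b)  # True: p + e = b
```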

![image](https://github.com/yiheinchai/learn/assets/76833604/2c8f7d90-0bbc-4aed-87ee-4ea523fe94c9)

#### Solving by algebra

![image](https://github.com/yiheinchai/learn/assets/76833604/b015978a-2cff-482d-ae68-2e5cbc2680b7)

The projection vector e, which runs between p and b, can be formed as a linear combination of the projection p and b: `e = b - p`. Think Hess's law.



So far, we have shown the relation of the three vectors, in that they form a loop, as shown by e = b - p (e is not independent). Remember that we know neither e nor p, so we have two unknowns. Naturally, we need another equation to solve this.

Another equation comes from the property that e is orthogonal to the line, and hence to both a and p. Therefore, the dot product of a with e must be 0 (proven earlier). We also know that p is simply a multiple of a, represented by `xa` where x is the multiplier on a.

`a^T e = 0`

`p = x a`

`a^T (b - x a) = 0`

`a^T b - a^T x a = 0`

`a^T b = a^T x a`

We can rearrange x as it is a scalar multiplier,

`a^T b = x a^T a`

And then isolate x by dividing both sides,

`x = a^T b / a^T a`

Therefore, to find the projection p, where `p = xa` or `p = ax` (either order works because x is a scalar multiplier),

`p = a (a^T b / a^T a)`

To find the projection matrix P that takes any b to its projection p, we just isolate out the b,

`P = a a^T / a^T a`

And hence,

`p = P b`

`p = a (a^T / a^T a) b = a (a^T b / a^T a)`
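The formulas can be checked against the earlier 2D example (`a = [1 2]`, `b = [2 2]`), where elimination gave x = 1.2. The sketch uses exact fractions and a hand-written `dot` helper.

```python
from fractions import Fraction


def dot(u, v):
    """Dot product u^T v of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


a = [1, 2]
b = [2, 2]

x = Fraction(dot(a, b), dot(a, a))  # x = a^T b / a^T a
p = [x * ai for ai in a]            # p = a x

# Projection matrix P = a a^T / a^T a (column times row, divided by a scalar).
P = [[Fraction(ai * aj, dot(a, a)) for aj in a] for ai in a]
Pb = [dot(row, b) for row in P]     # p = P b

e = [bi - pi for bi, pi in zip(b, p)]  # e = b - p

print(float(x))                  # 1.2
print([float(pi) for pi in p])   # [1.2, 2.4]
print(Pb == p)                   # True
print(dot(a, e))                 # 0, so e is orthogonal to a
```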


![image](https://github.com/yiheinchai/learn/assets/76833604/cc4fb1c4-ce9e-437d-bcb3-187f0bfc1f5c)

#### Properties of the projection matrix (vector)

The column space of P is the space formed by the linear combinations of the columns of P. Notice that to produce p, we multiply P b. Hence, we can see b as the multipliers that combine the columns of P to give p. Every such combination lands on the line through a, and p is one of them. Therefore, C(P) = line through a.

Furthermore, to see whether P is symmetric, notice that the denominator of P, `a^T a`, is simply a dot product and therefore a scalar. The numerator, `a a^T`, is a matrix multiplied by its transpose, and we learnt before that this produces a symmetric matrix. Multiplying a symmetric matrix by a scalar still gives a symmetric matrix. Hence, the projection matrix P is symmetric.

Moreover, the projection matrix P projects any point onto the line. If the point already lies on the line, the result is the same point, so projecting a second time changes nothing. Hence, `P^2 = P`.

P is a rank 1 matrix, because it is a column times a row (`a` times `a^T`), which is the form of every rank 1 matrix, where the column is the basis of the column space.
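These properties can be verified on a concrete projection matrix. The vector `a = [1, 2, 2]` is a made-up example, and `matmul` is a small helper written for this sketch.

```python
from fractions import Fraction

a = [1, 2, 2]                        # made-up direction of the line
a_dot_a = sum(ai * ai for ai in a)   # a^T a = 9

# P = a a^T / a^T a
P = [[Fraction(ai * aj, a_dot_a) for aj in a] for ai in a]


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


P_transposed = [list(column) for column in zip(*P)]
is_symmetric = P_transposed == P     # P^T = P
is_idempotent = matmul(P, P) == P    # P^2 = P

# Rank 1: every column of P is a multiple of a, so C(P) is the line through a.
columns_on_line = all(
    [P[i][j] for i in range(len(a))] == [Fraction(a[j], a_dot_a) * ai for ai in a]
    for j in range(len(a))
)

print(is_symmetric, is_idempotent, columns_on_line)  # True True True
```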

Formally, we can conclude,

> `C(P) = line through a`

> `rank(P) = 1`

> `P^T = P`

> `P^2 = P`



### 3D Projections

![image](https://github.com/yiheinchai/learn/assets/76833604/1c1b54f0-721b-4069-b858-a12af172e34d)

Here, we want to project the vector b onto the plane, which is the column space of A and is spanned by two basis vectors, a1 and a2.

Following the previous logic, we know that the projection p is some linear combination of the basis vectors, because the projection lies on the plane.

Therefore, `p = A x^`, where x^ is a vector holding the multipliers for the columns of A that produce the vector p on the plane.

Our goal is to find p. 

Likewise, we know that the projection vector, e = b - p.

We also know that e is orthogonal to the plane, so it is orthogonal to both a1 and a2, which are the rows of A^T.

`A^T e = 0`

`A^T (b - Ax^) = 0`

`A^T b - A^T A x^ = 0`

`A^T b = A^T A x^`

`x^ = (A^T A)^-1 A^T b`

`p = A (A^T A)^-1 A^T b`

`P = A (A^T A)^-1 A^T`, which is rather similar to the `a a^T / a^T a` we got previously, except we use matrices this time.
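A sketch of the matrix formulas with exact fractions. The plane (columns of `A`) and the vector `b` are made-up numbers, and the helpers `transpose`, `matmul` and `inverse_2x2` are written here just for this small case, where `A^T A` is 2 x 2.

```python
from fractions import Fraction


def transpose(M):
    return [list(column) for column in zip(*M)]


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def inverse_2x2(M):
    """Inverse of a 2 x 2 integer matrix, assuming its determinant is non-zero."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [
        [Fraction(M[1][1], det), Fraction(-M[0][1], det)],
        [Fraction(-M[1][0], det), Fraction(M[0][0], det)],
    ]


A = [[1, 0], [1, 1], [1, 2]]  # columns a1 = [1 1 1] and a2 = [0 1 2]
b = [[6], [0], [0]]           # column vector to project

At = transpose(A)
AtA_inv = inverse_2x2(matmul(At, A))

x_hat = matmul(AtA_inv, matmul(At, b))  # x^ = (A^T A)^-1 A^T b
P = matmul(A, matmul(AtA_inv, At))      # P = A (A^T A)^-1 A^T
p = matmul(P, b)                        # p = P b
e = [[bi[0] - pi[0]] for bi, pi in zip(b, p)]

print(x_hat == [[5], [-3]])       # True
print(p == [[5], [2], [-1]])      # True
print(matmul(At, e) == [[0], [0]])  # True: e is orthogonal to a1 and a2
```

Note the order of the factors: the inverse sits between `A` and `A^T`, and it cannot be split into `A^-1 (A^T)^-1` because `A` is not square.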

![image](https://github.com/yiheinchai/learn/assets/76833604/f5651db4-d121-4673-9953-e994d302c88b)