<h2>Lab 2: LU Factorization</h2>
<b>Demo Date: </b> Sept. 22 <br>
<b>Due Date: </b> Sept. 25

In this lab you will implement two versions of the LU factorization algorithm: the one presented in the pseudocode of the textbook and another that uses NumPy matrix operations. We will then compare the performance of the two implementations on artificial problems. Here we assume that the linear system has a unique solution and that pivoting isn't needed (we will study pivoting in our Tuesday lecture).

In class we discussed how the matrix $A$ of a linear system $Ax = b$ can be decomposed into a lower triangular matrix $L$ and an upper triangular matrix $U$, i.e., $A = LU$. The decomposition allows us to write the original system as $LUx = b$. We then define $y = Ux$ and solve the system $Ly = b$ with an algorithm called forward-substitution. The solution $y$ is then used to find the solution to the original problem, by solving the system $Ux = y$ with the back-substitution algorithm.
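
As a minimal sketch of this two-stage solve, the snippet below uses a small $2 \times 2$ system that is not part of the lab: the arrays `L_demo`, `U_demo`, and `b_demo` are made up for illustration. It relies on `scipy.linalg.solve_triangular` as a stand-in for the substitution routines you will write yourself further down.

```python
import numpy as np
from scipy.linalg import solve_triangular

# Illustrative factors (assumed for this sketch, not taken from the lab)
L_demo = np.array([[1.0, 0.0], [2.0, 1.0]])   # unit lower triangular
U_demo = np.array([[2.0, 1.0], [0.0, 3.0]])   # upper triangular
A_demo = L_demo @ U_demo                      # the matrix being "factored"
b_demo = np.array([3.0, 9.0])

# Stage 1: forward-substitution solves L y = b
y = solve_triangular(L_demo, b_demo, lower=True)
# Stage 2: back-substitution solves U x = y
x = solve_triangular(U_demo, y, lower=False)

# x should also satisfy the original system A x = b
print(np.allclose(A_demo @ x, b_demo))  # True
```

Once $L$ and $U$ are known, each new right-hand side costs only two triangular solves, which is the main reason to factor $A$ instead of eliminating from scratch every time.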

In class we studied the back-substitution algorithm, which is very similar to the forward-substitution algorithm. Back-substitution solves systems whose matrix A is an upper triangular matrix, while forward-substitution solves systems whose matrix A is a lower triangular matrix. 
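
In component form, the two algorithms can be sketched as follows. The notation is an assumption made here and may differ from the textbook's: indices are 1-based, and $l_{ij}$ and $u_{ij}$ denote the entries of $L$ and $U$.

\begin{align*}
y_i &= \frac{1}{l_{ii}} \Big( b_i - \sum_{j=1}^{i-1} l_{ij}\, y_j \Big), & i &= 1, 2, \dots, n \\
x_i &= \frac{1}{u_{ii}} \Big( y_i - \sum_{j=i+1}^{n} u_{ij}\, x_j \Big), & i &= n, n-1, \dots, 1
\end{align*}

Forward-substitution runs from the first row down because row $i$ of a lower triangular system only involves $y_1, \dots, y_i$. Back-substitution runs from the last row up for the mirrored reason. Both need nonzero diagonal entries.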

Before moving forward, please take a look at the pseudocode of the forward and back-substitution algorithms in the textbook (see Algorithm 2.1 on page 64 and Algorithm 2.2 on page 65). If you understand the forward and back-substitution algorithms, then please go ahead and study the pseudocode of the LU-factorization (see Algorithm 2.3 on page 68 of the textbook). 

Let's now implement these three algorithms to solve the system used as an example in class.

\begin{align*}
Ax = \begin{bmatrix}
1 & 2 & 2 \\
4 & 4 & 2 \\
4 & 6 & 4 \\
\end{bmatrix}
\begin{bmatrix}
x_1 \\
x_2 \\
x_3 \\
\end{bmatrix} = 
\begin{bmatrix}
3 \\
6 \\
10 \\
\end{bmatrix} = b
\end{align*}

In [None]:
import numpy as np
import time
import copy
import scipy.linalg

A = np.array([[1, 2, 2], [4, 4, 2], [4, 6, 4]])
b = np.array([3, 6, 10]).reshape(3, 1)

Finish the implementation of the algorithms below. The implementation of these algorithms should follow the pseudocode of the textbook. 

The output should be $x = [-1, 3, -1]^T$.

In [None]:
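def forward_substituion(L, b):
  # the size comes from L itself, not from the global A
  n = len(L)
  # work on a flat float copy of b so the caller's array is left untouched
  b = np.array(b, dtype=float).reshape(-1)
  x = np.zeros(n)
  for j in range(0, n):
    if L[j][j] == 0:
      raise ValueError('singular matrix')
    x[j] = b[j] / L[j][j]

    # update only the rows below row j
    for i in range(j + 1, n):
      b[i] = b[i] - L[i][j] * x[j]
  return x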
    

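def back_substituion(U, b):
  # the size comes from U itself, not from the global A
  n = len(U)
  # work on a flat float copy of b so the caller's array is left untouched
  b = np.array(b, dtype=float).reshape(-1)
  x = np.zeros(n)
  for j in range(n - 1, -1, -1):
    if U[j][j] == 0:
      raise ValueError('singular matrix')
    x[j] = b[j] / U[j][j]

    # update only the rows above row j
    for i in range(0, j):
      b[i] = b[i] - U[i][j] * x[j]
  return x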
    

def lu_factor_v1(A):
  # work on a float copy of A
  A = A + 0.0
  n = len(A)
  # M stores the multipliers, i.e. the strictly lower part of L
  M = np.zeros(shape=(n, n))

  # eliminate columns 0 to n - 2
  for k in range(0, n - 1):
    if A[k][k] == 0:
      break # zero pivot: cannot continue without pivoting
    for i in range(k + 1, n):
      M[i][k] = A[i][k] / A[k][k]
    for j in range(k + 1, n):
      for i in range(k + 1, n):
        A[i][j] = A[i][j] - M[i][k] * A[k][j]

  # L has ones on its diagonal
  np.fill_diagonal(M, 1)

  # U is the upper triangle of the reduced A; the entries below it are set to 0
  return M, np.triu(A)


n = len(b)
A1 = copy.deepcopy(A)
b1 = copy.deepcopy(b)

L, U = lu_factor_v1(A1)
#print(L)
#print(U)
print(b1)
y = forward_substituion(L, b1)
#print(y) 
x = back_substituion(U, y)     

print('x: ', x)

[[ 3]
 [ 6]
 [10]]
x:  [-1.  3. -1.]
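
A quick way to check a computed solution is to look at the residual $Ax - b$ and to compare against a reference solver. The sketch below hard-codes the solution printed above and uses `np.linalg.solve` as the reference. Note that `np.linalg.solve` is a different routine (it pivots internally), so it only confirms the answer, not the factors.

```python
import numpy as np

A = np.array([[1, 2, 2], [4, 4, 2], [4, 6, 4]], dtype=float)
b = np.array([3, 6, 10], dtype=float)
x = np.array([-1.0, 3.0, -1.0])   # solution reported above

# the residual should be (numerically) zero for a correct solution
residual = A @ x - b
# reference solution from NumPy's general-purpose solver
x_ref = np.linalg.solve(A, b)

print(np.allclose(residual, 0.0))  # True
print(np.allclose(x, x_ref))       # True
```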


Next, we will write a vectorized implementation of the LU factorization. For that you will modify your previous implementation. The only for-loop you will keep in the vectorized implementation is the outer loop of the non-vectorized implementation, the one that iterates over the first $n-1$ columns of $A$. You should rely on NumPy functions to rewrite the rest of the code.
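
To illustrate the idea, here is a sketch of a single elimination step in which the two inner loops are replaced by one outer-product (rank-1) update. The helper name `eliminate_column` and the array `A_demo` are hypothetical and used only for this illustration. The sketch assumes a nonzero pivot and does no pivoting.

```python
import numpy as np

def eliminate_column(A, k):
    """Hypothetical helper: one elimination step on column k (no pivoting)."""
    A = A.astype(float)  # astype returns a float copy, so the input is unchanged
    # multipliers for the rows below the pivot
    m = A[k + 1:, k] / A[k, k]
    # rank-1 update of the trailing submatrix replaces the two inner loops
    A[k + 1:, k + 1:] -= np.outer(m, A[k, k + 1:])
    # the entries below the pivot are now eliminated
    A[k + 1:, k] = 0.0
    return m, A

A_demo = np.array([[1, 2, 2], [4, 4, 2], [4, 6, 4]])
m, A_next = eliminate_column(A_demo, 0)
print(m)       # multipliers of the first step
print(A_next)  # matrix after eliminating the first column
```

Calling a step like this once per column, for the first $n-1$ columns, and storing each `m` below the diagonal of $L$ gives the full factorization.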

In [None]:
def lu_factor_v2(A):
  '''
  n = len(A)
  A = A + 0.0
  M = np.eye(n, k = 0)
  for k in range(0, n - 1):
    if A[k][k] == 0:
      break
    M[k+1:n, k] = A[k+1:n, k] / A[k][k]
    # set A[k+1:n, k] = 0
    A[k+1:n, k] = 0
    # reshape M and A into Matrix
    M_reshape = M[k+1:n, k].reshape(n - 1 - k, 1)
    A_reshape = A[k, k+1:n].reshape(1, n - 1 - k)
    # update A[k+1:n, k+1:n]
    A[k+1:n, k+1:n] = A[k+1:n, k+1:n] - np.dot(M_reshape, A_reshape)
  return M, A
  '''
  
  # Another way to do this part: build the elementary elimination matrices explicitly
  # work on a float copy of A
  U = A + 0.0
  n = len(U)
  I = np.eye(n)
  # L_minus accumulates M_{n-1} * ... * M_1, which is the inverse of L
  L_minus = np.eye(n)
  L = np.eye(n)
  # iterate from the first column to the column before the last column
  for k in range(0, n - 1):
    # the pivot must be read from the current U, not from the original A
    if U[k][k] == 0:
      break
    # e_k: the kth row of the identity matrix, as a 1 x n row vector
    e_k = I[k:k+1, :]
    # m: column k of U divided by the pivot, as an n x 1 column vector
    m = (U[:, k] / U[k][k]).reshape(n, 1)
    # zero the entries on and above the diagonal, so m_i = u_ik / u_kk only for i > k
    m[0:(k + 1), 0] = 0
    # M_k = I - m * e_k eliminates column k below the diagonal
    M = I - np.dot(m, e_k)
    # L_k = I + m * e_k is the inverse of M_k
    l = I + np.dot(m, e_k)
    # L_minus = M_k * ... * M_1
    L_minus = np.dot(M, L_minus)
    # M_{n-1} * ... * M_1 * A = U
    U = np.dot(M, U)
    # L = L_1 * L_2 * ... * L_{n-1}
    L = np.dot(L, l)

  return L_minus, L, U

  


L_minus, L, U = lu_factor_v2(copy.deepcopy(A))
print(L_minus)
print(L)
print(U)
y = forward_substituion(L, copy.deepcopy(b))
#print(y)   
x = back_substituion(U, y)
print('x: ', x)

[[ 1.   0.   0. ]
 [-4.   1.   0. ]
 [-2.  -0.5  1. ]]
[[1.  0.  0. ]
 [4.  1.  0. ]
 [4.  0.5 1. ]]
[[ 1.  2.  2.]
 [ 0. -4. -6.]
 [ 0.  0. -1.]]
x:  [-1.  3. -1.]


In the following snippet we will compare the running times of the vectorized and non-vectorized implementations by performing the LU factorization on larger $200 \times 200$ matrices.

In [None]:
running_time_vectorized = []
running_time_non_vectorized = []

for _ in range(10):
    test_A = np.tril(np.random.rand(200, 200))
    
    A = copy.deepcopy(test_A)
    start = time.time()
    L, U = lu_factor_v1(A)
    end = time.time()
    running_time_non_vectorized.append(end - start)
    
    A = copy.deepcopy(test_A)
    start = time.time()
    # lu_factor_v2 returns three values: the inverse of L, then L and U
    _, L, U = lu_factor_v2(A)
    end = time.time()
    running_time_vectorized.append(end - start)

print('Non-Vectorized: %.4f seconds' % np.average(running_time_non_vectorized))
print('Vectorized: %.4f seconds' % np.average(running_time_vectorized))

Non-Vectorized: 3.2248 seconds
Vectorized: 0.0123 seconds
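
Timings of this kind vary from run to run and from machine to machine. As a sketch of a slightly more robust measurement, the helper below (the name `best_time` is hypothetical) uses `time.perf_counter`, which is intended for measuring short durations, and keeps the best of several repetitions. A plain matrix product stands in for the factorization routines here so that the block runs on its own.

```python
import time
import numpy as np

def best_time(func, *args, repeats=5):
    """Hypothetical helper: smallest wall-clock time (seconds) over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)

rng = np.random.default_rng(0)        # fixed seed so the test matrix is reproducible
test_matrix = rng.random((200, 200))

# stand-in workload; replace np.dot with the routine you want to time
elapsed = best_time(np.dot, test_matrix, test_matrix)
print('Best of 5: %.6f seconds' % elapsed)
```

Taking the minimum rather than the average reduces the influence of occasional slow runs caused by other processes.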
