# Solving $ A x = b $ via Blocked LU Factorization

In this notebook, you will implement a blocked LU factorization, solve a system with a unit lower triangular matrix, and solve a system with an upper triangular matrix.  The notebook culminates in a routine that combines these three steps to solve $ A x = b $ in a computationally efficient way.

<font color=red> Be sure to make a copy!!!! </font>

<h2>Preliminaries</h2>

Here is a list of laff routines that you might want to use in this notebook:
<ul>
<li> <code>laff.trsv!( uplo, trans, diag, A, b )</code> Solves $Ax = b$ where $x$ and $b$ are vectors.
<li> <code>laff.trsm!( uplo, trans, diag, A, B )</code> Solves $AX = B$ where $X$ and $B$ are matrices.
<li> <code>laff.gemm!( alpha, A, B, beta, C )</code> $C := \alpha A B + \beta C$
</ul>

And last but not least, __*copy and paste your method from 6.3 Solving A x b via LU factorization and triangular solves into the box below.*__ We'll be using it during this notebook. Recall that it overwrites $A$ with $L$ in the strictly lower triangular part and $U$ in the upper triangular part.
<font color=red> Make sure you call the routine LU_unb_var5! </font>



In [19]:
include("../flame.jl")
include("../laff/laff.jl")

function LU_unb_var5!(A)

    ATL, ATR, 
    ABL, ABR  = flame.part_2x2(A, 
                               0, 0, "TL")

    while size(ATL, 1) < size(A, 1)

        A00,  a01,     A02,  
        a10t, alpha11, a12t, 
        A20,  a21,     A22   = flame.repart_2x2_to_3x3(ATL, ATR, 
                                                       ABL, ABR, 
                                                       1, 1, "BR")

        #------------------------------------------------------------#

        laff.invscal!( alpha11, a21 )        #  a21 := a21 / alpha11
        laff.ger!( -1.0, a21, a12t, A22 )    #  A22 := A22 - a21 * a12t

        #------------------------------------------------------------#

        ATL, ATR, 
        ABL, ABR  = flame.cont_with_3x3_to_2x2(A00,  a01,     A02,  
                                               a10t, alpha11, a12t, 
                                               A20,  a21,     A22,  
                                               "TL")

    end

    flame.merge_2x2!(ATL, ATR, 
                     ABL, ABR, A)

end



LU_unb_var5! (generic function with 1 method)

<h2> Now, let's create a matrix $ A $ and right-hand side $ b $</h2>

In [20]:
using LinearAlgebra

# Create matrix A from lower and upper triangular matrices L and U
m = 200

# Are these diagonals important?
L = LowerTriangular(rand(m, m))
U = UpperTriangular(rand(m, m))
# Populate the diagonals of `L` and `U` with 1.0:
for i in 1:m; L[i, i] = 1.0; U[i, i] = 1.0; end

A = L * U

# Create a large, random solution vector x
x = rand(m)

# Store the original value of x
xold = copy(x)

# Create the right-hand side vector b so that A x = b
b = A * x

# Later, we are also going to solve A x = b2.  Here we create that b2:
x2 = rand(m)
b2 = A * x2;

<h2> Implement the blocked LU factorization routine from 6.4.1 </h2>

Here is the algorithm:

<img src="https://studio.edx.org/c4x/UTAustinX/UT.5.01x/asset/LU_blk_var5.png" alt="Blocked LU factorization algorithm" width=50%>
    
<font color=red> Important: if you make a mistake, rerun ALL cells above the cell in which you were working before you rerun the one in which you are working. </font>

Create the routine
<code> LU_blk_var5!( A, nb_alg ) </code>
with the <a href="https://studio.edx.org/c4x/UTAustinX/UT.5.01x/asset/index.html">Spark webpage</a> for the algorithm given above.
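
For reference, the four updates in the loop body follow from partitioning $ A = L U $ into blocks. This is a sketch of the standard derivation, with block names chosen to match the Spark-generated code:

$$
\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
=
\begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix}
\begin{pmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{pmatrix}
=
\begin{pmatrix} L_{11} U_{11} & L_{11} U_{12} \\ L_{21} U_{11} & L_{21} U_{12} + L_{22} U_{22} \end{pmatrix}
$$

Matching blocks gives, in order: factor $ A_{11} \rightarrow L_{11} U_{11} $ (unblocked LU), then $ A_{12} := L_{11}^{-1} A_{12} = U_{12} $, then $ A_{21} := A_{21} U_{11}^{-1} = L_{21} $, and finally $ A_{22} := A_{22} - A_{21} A_{12} $, which leaves $ L_{22} U_{22} $ to be factored in later iterations.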

In [21]:
include("../flame.jl")
include("../laff/laff.jl")

function LU_blk_var5!(A, nb_alg)

    ATL, ATR, 
    ABL, ABR  = flame.part_2x2(A, 
                               0, 0, "TL")

    while size(ATL, 1) < size(A, 1)

        block_size = min(size(ABR, 1), nb_alg)

        A00, A01, A02, 
        A10, A11, A12, 
        A20, A21, A22  = flame.repart_2x2_to_3x3(ATL, ATR, 
                                                 ABL, ABR, 
                                                 block_size, block_size, "BR")

        #------------------------------------------------------------#

        LU_unb_var5!( A11 )
        laff.trsm!( "Lower triangular", "No transpose", "Unit diagonal", A11, A12 )
        laff.trsm!( "Upper triangular", "Transpose", "Nonunit diagonal", A11, A21 )
        laff.gemm!( -1.0, A21, A12, 1.0, A22 )

        #------------------------------------------------------------#

        ATL, ATR, 
        ABL, ABR  = flame.cont_with_3x3_to_2x2(A00, A01, A02, 
                                               A10, A11, A12, 
                                               A20, A21, A22, 
                                               "TL")

    end

    flame.merge_2x2!(ATL, ATR, 
                     ABL, ABR, A)

end



LU_blk_var5! (generic function with 1 method)

<h3> Test the routine </h3>

Note that the code you generated using Spark has two input parameters, <code>A</code> and <code>nb_alg</code>. The parameter <code>nb_alg</code> is the block size used by the blocked LU factorization. We set it arbitrarily to 20 for now and store it in the variable <code>nb</code>.
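
To see how the block size interacts with the matrix size, here is a minimal sketch that mimics the <code>min(size(ABR, 1), nb_alg)</code> logic of the loop. The helper <code>block_sizes</code> is hypothetical and is not part of <code>flame.jl</code>; it only lists the sizes of the diagonal blocks that would be processed.

```julia
# Hypothetical helper (not part of flame.jl): list the diagonal block sizes
# a blocked algorithm visits for an m x m matrix and block size nb_alg.
function block_sizes(m, nb_alg)
    sizes = Int[]
    done = 0                          # rows already factored (size of ATL)
    while done < m
        b = min(m - done, nb_alg)     # last block may be smaller than nb_alg
        push!(sizes, b)
        done += b
    end
    return sizes
end

println(block_sizes(50, 20))          # prints [20, 20, 10]
```

With <code>m = 200</code> and <code>nb = 20</code>, as in this notebook, every block has size 20 and the loop runs 10 times.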

<br>

<font color=red> Important: if you make a mistake, rerun ALL cells above the cell in which you were working before you rerun the one in which you are working. </font>

In [22]:
# Since we're just messing around with blocked algorithms,
# we set the block size totally arbitrarily
nb = 20

# recreate matrix A
A = L * U

# recreate the right-hand side
b = A * xold

# apply blocked LU to matrix A
# remember nb holds our block size
LU_blk_var5!( A, nb )

AssertionError: AssertionError: laff.trsm!: size mismatch between B and A: (180, 20), (20, 20)
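
The assertion above comes from the update of <code>A21</code>. As listed in the preliminaries, <code>laff.trsm!</code> solves $ A X = B $ with the triangular matrix on the left, but the update $ A_{21} := A_{21} U_{11}^{-1} $ is a right-side solve, $ X U_{11} = A_{21} $. Transposing both sides turns it into the left-side problem $ U_{11}^T X^T = A_{21}^T $, so the second argument pair must be $ U_{11} $ (transposed) and $ A_{21}^T $, not the 180-by-20 block itself. The sketch below illustrates the equivalence using only Julia's <code>LinearAlgebra</code> on small made-up blocks; whether <code>laff.trsm!</code> accepts <code>transpose(A21)</code> as its last argument is an assumption that is not verified here.

```julia
using LinearAlgebra

# Hypothetical small blocks standing in for U11 (upper triangular) and A21.
U11 = UpperTriangular([2.0 1.0; 0.0 4.0])
A21 = [2.0 5.0; 4.0 6.0; 6.0 7.0]          # 3 x 2: more rows than U11

# Right-side solve: find X such that X * U11 == A21.
X_right = A21 / U11

# The same X via a left-side solve on the transposed system:
# transpose(U11) * transpose(X) == transpose(A21).
Xt = transpose(U11) \ Matrix(transpose(A21))
X_left = Matrix(transpose(Xt))

println(maximum(abs.(X_right - X_left)))
```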

Compare the overwritten matrix, $ A $, to the original matrices, $ L $ and $ U $.  The upper triangular part of $ A $ should equal $ U $ and the strictly lower triangular part of $ A $ should equal the strictly lower triangular part of $ L $. If this is the case, the largest absolute entry of $ A - \tilde L - U $, where $ \tilde L $ denotes the strictly lower triangular part of $ L $, should be close to zero.

<font color=red> Important: if you make a mistake, rerun ALL cells above the cell in which you were working before you rerun the one in which you are working. </font>


In [None]:
# Compare A to the original L and U matrices
println( "Maximum absolute value of (A - L - U) after factorization" )
println( maximum( abs.( A - tril(Matrix(L), -1) - U ) ) )  # The "-1" excludes the diagonal of L
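
An equivalent check is to unpack the factors from the overwritten matrix and multiply them back together. The sketch below does this on a small hypothetical matrix. The routine <code>lu_nopivot!</code> is a plain-Julia stand-in written for this example (it is not <code>LU_unb_var5!</code>, although it performs the same two updates per column), so that the block runs without <code>flame.jl</code> or <code>laff.jl</code>.

```julia
using LinearAlgebra

# Stand-in for the notebook's routines: right-looking LU without pivoting,
# overwriting A with L (strictly lower part) and U (upper part).
function lu_nopivot!(A)
    n = size(A, 1)
    for k in 1:n-1
        A[k+1:n, k] ./= A[k, k]                           # a21 := a21 / alpha11
        A[k+1:n, k+1:n] .-= A[k+1:n, k] * A[k, k+1:n]'    # A22 := A22 - a21 * a12t
    end
    return A
end

A0 = [4.0 3.0 2.0; 8.0 8.0 5.0; 4.0 7.0 9.0]
F  = lu_nopivot!(copy(A0))

Lf = UnitLowerTriangular(F)   # strictly lower part of F plus an implicit unit diagonal
Uf = UpperTriangular(F)       # upper part of F, including the diagonal

println(norm(Lf * Uf - A0))   # should be (close to) zero
```

The wrappers <code>UnitLowerTriangular</code> and <code>UpperTriangular</code> do not copy data; they only tell Julia which part of the packed matrix to read.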

<h2> Implement the routine Solve from 6.3.4 using the blocked LU instead of the regular LU </h2>

(if you have not yet visited Unit 6.3.4, do so now!)

This time, we do NOT use Spark!  What we need to do is write a routine that, when given a matrix $ A $ and right-hand side vector $ b $, solves $ A x = b $, overwriting $ A $ with the LU factorization and overwriting $ b $ with the solution vector $ x $:

<ul>
<li>
$ A \rightarrow L U $, overwriting $ A $ with $ L $ and $ U $. Use the Blocked version.
</li>
<li>
Solve $ L z = b $, overwriting $ b $ with $ z $.
</li>
<li>
Solve $ U x = z $, where $ z $ is stored in vector $ b $ and $ x $ overwrites $ b $.
</li>
</ul>
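
The reasoning behind these three steps, written out as a short derivation:

$$
A x = b, \quad A = L U
\;\;\Longrightarrow\;\;
L (U x) = b
\;\;\Longrightarrow\;\;
L z = b \;\text{ with }\; z = U x ,
\quad\text{then}\quad
U x = z .
$$

Both remaining systems are triangular, so each is solved by substitution without any further factorization.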

<font color=red> Important: if you make a mistake, rerun ALL cells above the cell in which you were working before you rerun the one in which you are working. </font>

Create the routine
<code> Solve( A, b ) </code>

In [None]:
function Solve( A, b )

    # insert appropriate calls to routines you have written here!
    # remember the variable nb holds our block size
    LU_blk_var5!( A, nb )
    laff.trsv!( "Lower triangular", "No transpose", "Unit diagonal", A, b )
    laff.trsv!( "Upper triangular", "No transpose", "Nonunit diagonal", A, b )

end

<h3> Test Solve </h3>

<font color=red> Important: if you make a mistake, rerun ALL cells above the cell in which you were working before you rerun the one in which you are working. </font>

In [None]:
# just to be sure, let's start over.  We'll recreate A and b, run all the routines, and
# then compare the updated b to the original vector x.

A = L * U
b = A * x

Solve( A, b )

println( "2-Norm of updated b - original x" )
println( norm(b - x) )


In exact arithmetic, <code> b - x </code> would be the zero vector; in floating point arithmetic, its two-norm, $||b - x||_2$, should be close to zero...

<h3> What if a new right-hand side comes along? </h3>

What if we are presented with a new right-hand side, call it $ b_2 $, with which we want to solve $ A x = b_2 $, overwriting $ b_2 $ with the solution?  (We created such a $ b_2 $ at the top of this notebook.)

<font color=red> Important: if you make a mistake, rerun ALL cells above the cell in which you were working before you rerun the one in which you are working. </font>

Notice that you can take the matrix $ A $ that was modified by <code>Solve</code> and use it with the appropriate calls to <code>laff.trsv!</code>:

In [None]:
# insert appropriate calls here.

laff.trsv!( "Lower triangular", "No transpose", "Unit diagonal", A, b2 )
laff.trsv!( "Upper triangular", "No transpose", "Nonunit diagonal", A, b2 )

println( "2-Norm of updated b2 - original x2" )
println( norm(b2 - x2) )

$||x_2 - b_2||_2$ should be close to zero...
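
Reusing the factors pays off because, for an $ n \times n $ matrix, the factorization costs roughly $ \frac{2}{3} n^3 $ flops while the two triangular solves together cost roughly $ 2 n^2 $. The sketch below illustrates the reuse with Julia's <code>LinearAlgebra</code> only; the packed matrix <code>F</code> and the helper <code>solve_with_factor!</code> are made up for this example and stand in for the overwritten $ A $ and the two <code>laff.trsv!</code> calls.

```julia
using LinearAlgebra

# Packed factors of a hypothetical 3 x 3 matrix: L below the diagonal
# (unit diagonal implied), U on and above the diagonal.
F  = [4.0 3.0 2.0; 2.0 2.0 1.0; 1.0 2.0 5.0]
A0 = UnitLowerTriangular(F) * UpperTriangular(F)   # the matrix that was factored

# Hypothetical helper: solve A0 x = rhs using only the stored factors,
# overwriting rhs with the solution.
function solve_with_factor!(F, rhs)
    ldiv!(UnitLowerTriangular(F), rhs)   # rhs := L \ rhs  (forward substitution)
    ldiv!(UpperTriangular(F), rhs)       # rhs := U \ rhs  (back substitution)
    return rhs
end

x1 = [1.0, 2.0, 3.0]
x2 = [-1.0, 0.5, 2.0]
println(norm(solve_with_factor!(F, A0 * x1) - x1))
println(norm(solve_with_factor!(F, A0 * x2) - x2))
```

Neither call modifies <code>F</code>, so any number of right-hand sides can be handled after a single factorization.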


<h2> <font color=red> Important: you should not refactor $ A $!!!! </font> </h2>