# Lower Triangular Matrix Vector Multiply Routines

This notebook walks you through how to implement $ y := L x + y $ where $ L $ is lower triangular.

## Getting started

We will use some functions that are part of our laff library (of which this function will become a part) as well as some routines from the FLAME API (Application Programming Interface) that allow us to write code that closely resembles how we typeset algorithms using the FLAME notation. These functions are imported with `include` and `using` statements.

## The `Tmvmult_ln_unb_var1!( L, x, y )` routine

This routine, given lower triangular $ L \in \mathbb{R}^{n \times n} $, $ x \in \mathbb{R}^n $, and $ y \in \mathbb{R}^n $, computes $ y := L x + y $.  The "_ln_" in the name of the routine indicates this is the "lower triangular, no transpose" matrix-vector multiplication.  

The specific laff functions we will use are 
<ul>
<li> <code> laff.dots!( x, y, alpha ) </code> which computes $ \alpha := x^T y + \alpha $.  </li>
</ul>

Use the <a href="https://studio.edx.org/c4x/UTAustinX/UT.5.01x/asset/index.html"> Spark webpage</a> to generate a code skeleton.  (Make sure you adjust the name of the routine.)
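As a point of reference, the update that the dot-product-based variant performs can be sketched with plain indexing instead of the FLAME API. This is only an illustrative sketch: the function name `tmvmult_ln_dot_sketch!` and the small test data `Ldemo`, `xdemo`, `ydemo` are made up here and are not part of laff or flame. Row `i` of the loop corresponds to $ \psi_1 := l_{10}^T x_0 + \lambda_{11} \chi_1 + \psi_1 $.

```julia
using LinearAlgebra  # for dot

# Hypothetical helper (not part of laff or flame): computes y := L x + y
# one row at a time, touching only the lower triangular part of L.
function tmvmult_ln_dot_sketch!(L, x, y)
    n = length(y)
    for i in 1:n
        # psi1 := l10t * x0 + lambda11 * chi1 + psi1
        # (for i == 1 the slices are empty and dot returns zero)
        y[i] += dot(L[i, 1:i-1], x[1:i-1]) + L[i, i] * x[i]
    end
    return y
end

# The 9.0 entries sit strictly above the diagonal and must be ignored.
Ldemo = [1.0 9.0 9.0;
         2.0 3.0 9.0;
         4.0 5.0 6.0]
xdemo = [1.0, 1.0, 1.0]
ydemo = zeros(3)

tmvmult_ln_dot_sketch!(Ldemo, xdemo, ydemo)
println(ydemo)   # [1.0, 5.0, 15.0]
```

Each entry of the result is the sum of the corresponding row of the lower triangular part, which makes the example easy to check by hand.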

In [7]:
include("../flame.jl")
using .flame
include("../laff/laff.jl")
using .laff

function Tmvmult_ln_unb_var1!(L, x, y)

    LTL, LTR, 
    LBL, LBR  = flame.part_2x2(L, 
                               0, 0, "TL")

    xT, 
    xB  = flame.part_2x1(x, 
                         0, "TOP")

    yT, 
    yB  = flame.part_2x1(y, 
                         0, "TOP")

    while size(LTL, 1) < size(L, 1)

        L00,  l01,      L02,  
        l10t, lambda11, l12t, 
        L20,  l21,      L22   = flame.repart_2x2_to_3x3(LTL, LTR, 
                                                        LBL, LBR, 
                                                        1, 1, "BR")

        x0,   
        chi1, 
        x2    = flame.repart_2x1_to_3x1(xT, 
                                        xB, 
                                        1, "BOTTOM")

        y0,   
        psi1, 
        y2    = flame.repart_2x1_to_3x1(yT, 
                                        yB, 
                                        1, "BOTTOM")

        #------------------------------------------------------------#

        laff.dots!( l10t, x0, psi1 )
        laff.dots!( lambda11, chi1, psi1 )

        #------------------------------------------------------------#

        LTL, LTR, 
        LBL, LBR  = flame.cont_with_3x3_to_2x2(L00,  l01,      L02,  
                                               l10t, lambda11, l12t, 
                                               L20,  l21,      L22,  
                                               "TL")

        xT, 
        xB  = flame.cont_with_3x1_to_2x1(x0,   
                                         chi1, 
                                         x2,   
                                         "TOP")

        yT, 
        yB  = flame.cont_with_3x1_to_2x1(y0,   
                                         psi1, 
                                         y2,   
                                         "TOP")

    end
    flame.merge_2x1!(yT, 
                    yB, y)
end




Tmvmult_ln_unb_var1! (generic function with 1 method)

## Testing

Let's quickly test the routine by creating a 4 x 4 matrix and related vectors, and then performing the computation.

In [8]:
L = rand(4, 4)
x = rand(4)
y = rand(4)
yold = rand(4)

# Notice that L is not lower triangular.  We will only use the lower triangular part.

println( "L before =" )
L

L before =


4×4 Array{Float64,2}:
 0.64307    0.0202959  0.09173   0.792   
 0.937238   0.982946   0.828569  0.262435
 0.304826   0.622455   0.986724  0.732755
 0.0743717  0.604782   0.758972  0.318948

In [9]:
println( "x before =" )
x

x before =


4-element Array{Float64,1}:
 0.5537696552408193
 0.5783011266839857
 0.3396707667901766
 0.4767183319696071

In [10]:
println( "y before =" )
y

y before =


4-element Array{Float64,1}:
 0.2429989489402955 
 0.6513041638725159 
 0.6126286624214028 
 0.20988220354656728

In [11]:
laff.copy!( y, yold )   # save the original vector y

Tmvmult_ln_unb_var1!( L, x, y )

println( "y after =" )
y

y after =


4-element Array{Float64,1}:
 0.5991118585293383
 1.738756964951383 
 1.4765600041535165
 1.0106622002214223

In [13]:
using LinearAlgebra
println( "y - ( LowerTriangular( L ) * x + yold ) = " )   # LowerTriangular( L ) uses only the lower triangular part of L
y - ( LowerTriangular( L ) * x + yold ) 

y - ( LowerTriangular( L ) * x + yold ) = 


4-element Array{Float64,1}:
 0.0                  
 0.0                  
 0.0                  
 2.220446049250313e-16

Bingo, it seems to work!  (Notice that we are doing floating point computations, which means that due to rounding you may not get an exact "0".)
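Because of this rounding, a comparison with a tolerance is usually more robust than looking for exact zeros. The following sketch, which uses only Julia's standard `LinearAlgebra` functions, computes the same quantity in two different orders and compares the results with `isapprox`. The names `A`, `v`, `w0`, `w`, and `wref` are hypothetical and chosen so that they do not clash with the variables used in the notebook.

```julia
using LinearAlgebra  # for LowerTriangular and norm

# Hypothetical small test data
A  = rand(4, 4)
v  = rand(4)
w0 = rand(4)

# Reference result, computed the same way as in the check above
wref = LowerTriangular(A) * v + w0

# The same quantity accumulated column by column, i.e. in a different order
w = copy(w0)
for j in 1:4, i in j:4
    w[i] += A[i, j] * v[j]
end

println("norm of difference  = ", norm(w - wref))   # tiny, but not necessarily exactly 0
println("approximately equal? ", isapprox(w, wref))
```

By default `isapprox` uses a relative tolerance based on the machine epsilon of the element type, so differences of the size shown in the output above pass the check.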

## Watch your code in action!

Copy the code and paste it into the box on <a href="http://edx-org-utaustinx.s3.amazonaws.com/UT501x/PictureFlame/PictureFLAME.html"> PictureFLAME </a>, a webpage where you can watch your routine in action.

Disclaimer: we implemented a VERY simple interpreter.  If you do something wrong, we cannot guarantee the results.  But if you do it right, you are in for a treat.

If you want to reset the problem, just click in the box into which you pasted the code and hit "next" again.

## The `Tmvmult_ln_unb_var2!( L, x, y )` routine

This routine, given lower triangular $ L \in \mathbb{R}^{n \times n} $, $ x \in \mathbb{R}^n $, and $ y \in \mathbb{R}^n $, computes $ y := L x + y $.  The "_ln_" in the name of the routine indicates this is the "lower triangular, no transpose" matrix-vector multiplication.  

The specific laff functions we will use are 
<ul>
<li> <code> laff.axpy!( alpha, x, y ) </code> which computes $ y := \alpha x +  y  $.  </li>
</ul>

Use the <a href="https://studio.edx.org/c4x/UTAustinX/UT.5.01x/asset/index.html"> Spark webpage</a> to generate a code skeleton.  (Make sure you adjust the name of the routine.)
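As a point of reference, the update that the axpy-based variant performs can be sketched with plain indexing instead of the FLAME API. This is only an illustrative sketch: the function name `tmvmult_ln_axpy_sketch!` and the test data `Ldemo2`, `xdemo2`, `ydemo2` are made up here and are not part of laff or flame. Column `j` of the loop corresponds to $ \psi_1 := \chi_1 \lambda_{11} + \psi_1 $ followed by $ y_2 := \chi_1 l_{21} + y_2 $.

```julia
# Hypothetical helper (not part of laff or flame): computes y := L x + y
# one column at a time, touching only the lower triangular part of L.
function tmvmult_ln_axpy_sketch!(L, x, y)
    n = length(y)
    for j in 1:n
        # i == j updates psi1 with chi1 * lambda11;
        # i >  j updates y2   with chi1 * l21
        for i in j:n
            y[i] += x[j] * L[i, j]
        end
    end
    return y
end

# The 9.0 entries sit strictly above the diagonal and must be ignored.
Ldemo2 = [1.0 9.0 9.0;
          2.0 3.0 9.0;
          4.0 5.0 6.0]
xdemo2 = [1.0, 2.0, 3.0]
ydemo2 = zeros(3)

tmvmult_ln_axpy_sketch!(Ldemo2, xdemo2, ydemo2)
println(ydemo2)   # [1.0, 8.0, 32.0]
```

Since Julia stores matrices in column-major order, the inner loop walks down a column of `L` through contiguous memory, which is one reason a column-oriented formulation can be attractive.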

In [19]:
include("../flame.jl")
using .flame
include("../laff/laff.jl")
using .laff

function Tmvmult_ln_unb_var2!(L, x, y)

    LTL, LTR,
    LBL, LBR  = flame.part_2x2(L,
                               0, 0, "TL")

    xT,
    xB  = flame.part_2x1(x,
                         0, "TOP")

    yT, 
    yB  = flame.part_2x1(y, 
                         0, "TOP")

    while size(LTL, 1) < size(L, 1)

        L00,  l01,      L02,  
        l10t, lambda11, l12t, 
        L20,  l21,      L22   = flame.repart_2x2_to_3x3(LTL, LTR, 
                                                        LBL, LBR, 
                                                        1, 1, "BR")

        x0,   
        chi1, 
        x2    = flame.repart_2x1_to_3x1(xT, 
                                        xB, 
                                        1, "BOTTOM")

        y0, 
        psi1,
        y2    = flame.repart_2x1_to_3x1(yT,
                                        yB, 
                                        1, "BOTTOM")

        #------------------------------------------------------------#

        laff.axpy!( chi1, lambda11, psi1 )
        laff.axpy!( chi1, l21, y2 )

        #------------------------------------------------------------#

        LTL, LTR, 
        LBL, LBR  = flame.cont_with_3x3_to_2x2(L00,  l01,      L02,  
                                               l10t, lambda11, l12t, 
                                               L20,  l21,      L22,  
                                               "TL")

        xT, 
        xB  = flame.cont_with_3x1_to_2x1(x0,   
                                         chi1, 
                                         x2,   
                                         "TOP")

        yT, 
        yB  = flame.cont_with_3x1_to_2x1(y0,   
                                         psi1, 
                                         y2,   
                                         "TOP")

    end
    flame.merge_2x1!(yT,
                     yB, y)
end




Tmvmult_ln_unb_var2! (generic function with 1 method)

## Testing

Let's quickly test the routine by creating a 4 x 4 matrix and related vectors, and then performing the computation.

In [20]:
L = rand(4, 4)
x = rand(4)
y = rand(4)
yold = rand(4)

# Notice that L is not lower triangular.  We will only use the lower triangular part.

println( "L before =" )
L

L before =


4×4 Array{Float64,2}:
 0.149784  0.601658   0.27546   0.0341998
 0.506955  0.0410626  0.631249  0.738038 
 0.683994  0.270742   0.545561  0.914777 
 0.37186   0.627505   0.397284  0.195869 

In [21]:
println( "x before =" )
x

x before =


4-element Array{Float64,1}:
 0.8790709040739446
 0.730044289502376 
 0.9727124823392397
 0.9886277129468488

In [22]:
println( "y before =" )
y

y before =


4-element Array{Float64,1}:
 0.7278705361427134 
 0.5195457574718487 
 0.7718167252116068 
 0.24255603757329025

In [23]:
laff.copy!( y, yold )   # save the original vector y

Tmvmult_ln_unb_var2!( L, x, y )

println( "y after =" )
y

y after =


4-element Array{Float64,1}:
 0.8595412431601117
 0.9951728079045491
 2.101423692692483 
 1.6076375804526157

In [24]:
using LinearAlgebra
println( "y - ( LowerTriangular( L ) * x + yold ) = " )   # LowerTriangular( L ) uses only the lower triangular part of L
y - ( LowerTriangular( L ) * x + yold ) 

y - ( LowerTriangular( L ) * x + yold ) = 


4-element Array{Float64,1}:
  0.0                   
  1.1102230246251565e-16
 -4.440892098500626e-16 
  2.220446049250313e-16 

Bingo, it seems to work!  (Notice that we are doing floating point computations, which means that due to rounding you may not get an exact "0".)

## Watch your code in action!

Copy the code and paste it into the box on <a href="http://edx-org-utaustinx.s3.amazonaws.com/UT501x/PictureFlame/PictureFLAME.html"> PictureFLAME </a>, a webpage where you can watch your routine in action.

Disclaimer: we implemented a VERY simple interpreter.  If you do something wrong, we cannot guarantee the results.  But if you do it right, you are in for a treat.

If you want to reset the problem, just click in the box into which you pasted the code and hit "next" again.