# Matrix-vector multiplication with the transpose of a matrix

This notebook walks you through how to implement $ y := A^T x + y $ <i> without explicitly transposing the matrix</i>.
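
Both algorithms in this notebook rest on two ways of reading $ A^T x + y $ that only ever access $ A $ itself. Assuming the partitioning below, where $ a_j $ is column $ j $ of $ A $ and $ \widetilde a_i^T $ is row $ i $ (this notation is introduced here for illustration), entry $ j $ of $ A^T x $ is the dot product of column $ j $ with $ x $, and the whole vector $ A^T x $ is a sum of scaled rows of $ A $:

$$
A = \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix}
  = \begin{pmatrix} \widetilde a_0^T \\ \vdots \\ \widetilde a_{m-1}^T \end{pmatrix},
\qquad
x = \begin{pmatrix} \chi_0 \\ \vdots \\ \chi_{m-1} \end{pmatrix},
\qquad
y = \begin{pmatrix} \psi_0 \\ \vdots \\ \psi_{n-1} \end{pmatrix}
$$

$$
\psi_j := a_j^T x + \psi_j \quad (j = 0, \ldots, n-1),
\qquad
y := \chi_0 \widetilde a_0 + \chi_1 \widetilde a_1 + \cdots + \chi_{m-1} \widetilde a_{m-1} + y .
$$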

## Getting started

We will use some functions from our laff library (of which this routine will become a part), as well as some routines from the FLAME API (Application Programming Interface), which let us write code that closely resembles how we typeset algorithms in FLAME notation.  These are loaded with `include` and `using` statements.

## Algorithm (via dot products)

<img src="https://studio.edx.org/c4x/UTAustinX/UT.5.01x/asset/431Mvmult_n_unb_var1.png" alt="Matrix-vector multiplication via dot products algorithm" width="48%"><img src="https://studio.edx.org/c4x/UTAustinX/UT.5.01x/asset/431Mvmult_t_unb_var1.png" alt="Transpose matrix-vector multiplication via dot products algorithm" width="48%">


Above, you find two algorithms.  The one on the left computes $ y := A x + y $ via dot products and the one on the right computes $ y := A^T x + y $ via dot products.  You are to implement the one on the right.
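
For comparison, here is a minimal sketch of the right-hand algorithm in plain Julia, without the FLAME API. The name `mvmult_t_dots!` is hypothetical, and `dot` from Julia's `LinearAlgebra` standard library stands in for `laff.dots!`. Each column of $ A $ contributes exactly one entry of $ y $, so no transposed copy of $ A $ is ever formed.

```julia
using LinearAlgebra

# Hypothetical helper: y := A^T x + y, one dot product per column of A.
function mvmult_t_dots!(A::AbstractMatrix, x::AbstractVector, y::AbstractVector)
    m, n = size(A)
    length(x) == m || throw(DimensionMismatch("length(x) must equal size(A, 1)"))
    length(y) == n || throw(DimensionMismatch("length(y) must equal size(A, 2)"))
    for j in 1:n
        # psi_j := a_j^T x + psi_j, where a_j is column j of A (a view, not a copy)
        y[j] += dot(view(A, :, j), x)
    end
    return y
end

A = [1.0 2.0 3.0; 4.0 5.0 6.0]   # 2 x 3
x = [1.0, 1.0]                   # 2 entries, like a column of A
y = [0.0, 0.0, 1.0]              # 3 entries, like a row of A
mvmult_t_dots!(A, x, y)          # y is now [5.0, 7.0, 10.0]
```

Julia stores matrices by column, so in this sketch each dot product reads a contiguous column of `A`.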

## The `Mvmult_t_unb_var1!( A, x, y )` routine

This routine, given $ A \in \mathbb{R}^{m \times n} $, $ x \in \mathbb{R}^m $, and $ y \in \mathbb{R}^n $, computes $ y := A^T x + y $.  The `_t_` in the routine's name indicates that this is the "transpose" matrix-vector multiplication.
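
The shapes matter: $ x $ has as many entries as $ A $ has rows, and $ y $ as many as $ A $ has columns. The sketch below uses the same 3 x 4 sizes as the tests later in this notebook. It also shows why the reference check `transpose( A ) * x` does not copy $ A $ either: in Julia 1.0 and later, `transpose` from the `LinearAlgebra` standard library returns a lazy `Transpose` wrapper around the original matrix.

```julia
using LinearAlgebra

A = rand(3, 4)   # m = 3 rows, n = 4 columns
x = rand(3)      # x has m entries
y = rand(4)      # y has n entries

At = transpose(A)                  # lazy wrapper around A; no data is copied
println(At isa Transpose)          # true
println(parent(At) === A)          # true: At refers to the same storage as A
println(size(At * x) == size(y))   # true: A^T x has n entries, like y
```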

The specific laff function we will use is
<ul>
<li> <code> laff.dots!( x, y, alpha ) </code> which computes $ \alpha := x^T y + \alpha $.  </li>
</ul>
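
To make the semantics of `laff.dots!` concrete, here is a minimal sketch of a hypothetical stand-in, `dots_sketch!` (not the laff source). It assumes `alpha` is passed as a mutable one-element array: a plain `Float64` could not be updated in place by the callee. Passing a one-element `view` into a vector shows how such an update lands directly in that vector's storage.

```julia
# Hypothetical stand-in for laff.dots!: alpha := x^T y + alpha.
# alpha must be a one-element array so that the update is visible to the caller.
function dots_sketch!(x::AbstractArray, y::AbstractArray, alpha::AbstractArray)
    length(x) == length(y) || throw(DimensionMismatch("x and y must have the same length"))
    length(alpha) == 1 || throw(ArgumentError("alpha must hold exactly one element"))
    s = zero(eltype(alpha))
    for k in 1:length(x)      # linear indexing, so a column stored as m x 1 also works
        s += x[k] * y[k]
    end
    alpha[1] += s
    return alpha
end

y = [10.0, 20.0, 30.0]
psi1 = view(y, 2:2)     # one-element view into y, playing the role of psi1
a1 = [1.0, 2.0]
x = [3.0, 4.0]
dots_sketch!(a1, x, psi1)   # y is now [10.0, 31.0, 30.0]
```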

Use the <a href="https://studio.edx.org/c4x/UTAustinX/UT.5.01x/asset/index.html"> Spark webpage</a> to generate a code skeleton.  (Make sure you adjust the name of the routine.)

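```julia
include("../flame.jl")
using .flame
include("../laff/laff.jl")
using .laff

# y := A^T x + y via dot products (transpose, unblocked, variant 1).
# A is traversed column by column; each column a1 updates one entry psi1 of y.
function Mvmult_t_unb_var1!(A, x, y)

    # Partition A = ( AL | AR ) with AL having 0 columns,
    # and y = ( yT / yB ) with yT having 0 entries.
    AL, AR = flame.part_1x2(A,
                            0, "LEFT")

    yT,
    yB  = flame.part_2x1(y,
                         0, "TOP")

    while size(AL, 2) < size(A, 2)

        # Expose the next column a1 of A and the matching entry psi1 of y.
        A0, a1, A2 = flame.repart_1x2_to_1x3(AL, AR,
                                             1, "RIGHT")

        y0,
        psi1,
        y2    = flame.repart_2x1_to_3x1(yT,
                                        yB,
                                        1, "BOTTOM")

        #------------------------------------------------------------#

        # psi1 := a1^T x + psi1
        laff.dots!( a1, x, psi1 )

        #------------------------------------------------------------#

        # Move the boundaries past a1 and psi1.
        AL, AR = flame.cont_with_1x3_to_1x2(A0, a1, A2,
                                            "LEFT")

        yT,
        yB  = flame.cont_with_3x1_to_2x1(y0,
                                         psi1,
                                         y2,
                                         "TOP")
    end

    # Merge yT and yB back into y.
    flame.merge_2x1!(yT,
                    yB, y)

end
```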

## Testing

Let's quickly test the routine by creating a 3 x 4 matrix and compatible vectors, performing the computation, and comparing the result with Julia's built-in matrix-vector product.

```julia
A = rand(3, 4)
x = rand(3)
y = rand(4)
yold = rand(4)

println( "A before =" )
A
```

```
A before =
3×4 Array{Float64,2}:
 0.428349  0.576898  0.241769  0.812738
 0.279553  0.526557  0.517578  0.203456
 0.379209  0.777828  0.388709  0.671916
```

```julia
println( "x before =" )
x
```

```
x before =
3-element Array{Float64,1}:
 0.6166481016643255
 0.6178046752381527
 0.9599587629066502
```

```julia
println( "y before =")
y
```

```
y before =
4-element Array{Float64,1}:
 0.568637154684936
 0.6134924923630478
 0.07494891370843648
 0.9348374807900308
```

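```julia
laff.copy!(y, yold)            # yold := y (laff.copy! copies its first argument into its second)
Mvmult_t_unb_var1!( A, x, y )  # y := A^T x + y, overwriting y

println( "y after =" )
y
```

```
y after =
4-element Array{Float64,1}:
 2.9712617490405413
 4.896699376293401
 2.6009273048419477
 4.750478457076714
```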

```julia
println( "y - ( transpose( A ) * x + yold ) = " )
y - ( transpose( A ) * x + yold )
```

```
y - ( transpose( A ) * x + yold ) = 
4-element Array{Float64,1}:
 -4.440892098500626e-16
  0.0
  0.0
  0.0
```

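Bingo, it seems to work!  (Notice that we are doing floating-point computations, so due to rounding you may not get an exact "0".)  Because `Mvmult_t_unb_var1!` overwrites `y`, running the cell again adds $ A^T x $ to `y` once more; this is why `laff.copy!` saves the current `y` into `yold` right before each call.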
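
A common way to make this check robust is to compare with a tolerance rather than expect exact zeros. The sketch below uses only Julia's standard library: it forms $ A^T x + y $ once with dot products and once with the built-in product, then compares the two with `isapprox` (also written `≈`), whose default relative tolerance absorbs rounding differences. The variable names `y_dots` and `y_ref` are illustrative.

```julia
using LinearAlgebra

# Rounding makes exact comparisons fragile:
println(0.1 + 0.2 == 0.3)   # false
println(0.1 + 0.2 ≈ 0.3)    # true

A = rand(3, 4)
x = rand(3)
yold = rand(4)

# A^T x + yold computed column by column with dot products ...
y_dots = copy(yold)
for j in 1:size(A, 2)
    y_dots[j] += dot(view(A, :, j), x)
end

# ... and with the built-in product, which may add terms in a different order.
y_ref = transpose(A) * x + yold

println(maximum(abs.(y_dots - y_ref)))   # tiny, possibly exactly 0.0
println(isapprox(y_dots, y_ref))         # expected: true
```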

## Watch your code in action!

Paste the code into the box on <a href="http://edx-org-utaustinx.s3.amazonaws.com/UT501x/PictureFlame/PictureFLAME.html"> PictureFLAME </a>, a webpage where you can watch your routine in action.

Disclaimer: we implemented a VERY simple interpreter.  If you do something wrong, we cannot guarantee the results.  But if you do it right, you are in for a treat.

If you want to reset the problem, just click in the box into which you pasted the code and hit "next" again.

## Algorithm (via <code>axpy</code>s)

<img src="https://studio.edx.org/c4x/UTAustinX/UT.5.01x/asset/431Mvmult_n_unb_var2.png" alt="Matrix-vector multiplication via axpys algorithm" width="55%"><img src="https://studio.edx.org/c4x/UTAustinX/UT.5.01x/asset/431Mvmult_t_unb_var2.png" alt="Transpose matrix-vector multiplication via axpys algorithm" width="45%">

Above, you find two algorithms.  The one on the left computes $ y := A x + y $ via <code>axpy</code>s and the one on the right computes $ y := A^T x + y $ via <code>axpy</code>s.  You are to implement the one on the right.
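
Outside the FLAME API, the right-hand algorithm can be sketched with plain Julia loops. This version, `mvmult_t_axpys!` (a hypothetical name), walks over the rows of $ A $ and calls `axpy!` from Julia's `LinearAlgebra` standard library, which overwrites its last argument `Y` with `a*X + Y`. Each row of $ A $, scaled by the matching entry of $ x $, updates all of $ y $.

```julia
using LinearAlgebra

# Hypothetical helper: y := A^T x + y, one axpy per row of A.
function mvmult_t_axpys!(A::AbstractMatrix, x::AbstractVector, y::AbstractVector)
    m, n = size(A)
    length(x) == m || throw(DimensionMismatch("length(x) must equal size(A, 1)"))
    length(y) == n || throw(DimensionMismatch("length(y) must equal size(A, 2)"))
    for i in 1:m
        # y := chi_i * (row i of A)^T + y; view(A, i, :) is a length-n vector, not a copy
        axpy!(x[i], view(A, i, :), y)
    end
    return y
end

A = [1.0 2.0 3.0; 4.0 5.0 6.0]   # 2 x 3
x = [1.0, 1.0]
y = [0.0, 0.0, 1.0]
mvmult_t_axpys!(A, x, y)         # y is now [5.0, 7.0, 10.0]
```

This gives the same result as the dot-product variant, but each `axpy!` reads a row of `A`, whose entries are spread out in Julia's column-major storage.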

## The `Mvmult_t_unb_var2!( A, x, y )` routine

This routine, given $ A \in \mathbb{R}^{m \times n} $, $ x \in \mathbb{R}^m $, and $ y \in \mathbb{R}^n $, computes $ y := A^T x + y $.  The `_t_` in the routine's name indicates that this is the "transpose" matrix-vector multiplication.

The specific laff function we will use is
<ul>
<li> <code> laff.axpy!( alpha, x, y ) </code> which computes $ y := \alpha x + y $.  </li>
</ul>
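
Similarly, a minimal sketch of a hypothetical `axpy_sketch!` (again not the laff source) shows what $ y := \alpha x + y $ means for the arguments `Mvmult_t_unb_var2!` passes: `chi1` comes out of a repartitioning and `a1t` is a $ 1 \times n $ row. How laff itself handles these shapes is an assumption here; this sketch accepts either a scalar or a one-element array for `alpha`, and uses linear indexing so that a row can update a column vector of the same length.

```julia
# Hypothetical stand-in for laff.axpy!: y := alpha * x + y.
function axpy_sketch!(alpha, x::AbstractArray, y::AbstractArray)
    length(x) == length(y) || throw(DimensionMismatch("x and y must have the same length"))
    # Accept a plain scalar or a one-element array such as chi1 (assumption).
    a = alpha isa AbstractArray ? alpha[1] : alpha
    for k in 1:length(x)
        y[k] += a * x[k]   # linear indexing: a 1 x n row lines up with an n-vector
    end
    return y
end

A = [1.0 2.0 3.0; 4.0 5.0 6.0]
a1t = view(A, 2:2, :)    # 1 x 3 row of A, playing the role of a1t
chi1 = [2.0]             # one-element array, playing the role of chi1
y = [1.0, 1.0, 1.0]
axpy_sketch!(chi1, a1t, y)   # y is now [9.0, 11.0, 13.0]
```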

Use the <a href="https://studio.edx.org/c4x/UTAustinX/UT.5.01x/asset/index.html"> Spark webpage</a> to generate a code skeleton.  (Make sure you adjust the name of the routine.)

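```julia
include("../flame.jl")
using .flame
include("../laff/laff.jl")
using .laff

# y := A^T x + y via axpys (transpose, unblocked, variant 2).
# A is traversed row by row: row a1t, scaled by chi1, is added to all of y.
function Mvmult_t_unb_var2!(A, x, y)

    # Partition A = ( AT / AB ) and x = ( xT / xB ), with AT and xT empty.
    AT,
    AB  = flame.part_2x1(A,
                         0, "TOP")

    xT,
    xB  = flame.part_2x1(x,
                         0, "TOP")

    while size(AT, 1) < size(A, 1)

        # Expose the next row a1t of A and the matching entry chi1 of x.
        A0,
        a1t,
        A2   = flame.repart_2x1_to_3x1(AT,
                                       AB,
                                       1, "BOTTOM")

        x0,
        chi1,
        x2    = flame.repart_2x1_to_3x1(xT,
                                        xB,
                                        1, "BOTTOM")

        #------------------------------------------------------------#

        # y := chi1 * (a1t)^T + y
        laff.axpy!( chi1, a1t, y )

        #------------------------------------------------------------#

        # Move the boundaries past a1t and chi1.
        AT,
        AB  = flame.cont_with_3x1_to_2x1(A0,
                                         a1t,
                                         A2,
                                         "TOP")

        xT,
        xB  = flame.cont_with_3x1_to_2x1(x0,
                                         chi1,
                                         x2,
                                         "TOP")
    end

    # y is updated in place by laff.axpy! and is never partitioned,
    # so no merge step is needed here.
end
```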

## Testing

Let's quickly test the routine by creating a 3 x 4 matrix and compatible vectors, performing the computation, and comparing the result with Julia's built-in matrix-vector product.

```julia
A = rand(3, 4)
x = rand(3)
y = rand(4)
yold = rand(4)

println( "A before =" )
A
```

```
A before =
3×4 Array{Float64,2}:
 0.315182  0.315952  0.726766  0.00640795
 0.300639  0.704135  0.323766  0.621724
 0.300431  0.367013  0.734243  0.832233
```

```julia
println( "x before =" )
x
```

```
x before =
3-element Array{Float64,1}:
 0.016019278572811047
 0.39423730740394225
 0.47264136277092295
```

```julia
println( "y before =")
y
```

```
y before =
4-element Array{Float64,1}:
 0.5242355247506634
 0.38937519478006055
 0.18171669893496922
 0.30401077019270417
```

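```julia
laff.copy!(y, yold)            # yold := y (laff.copy! copies its first argument into its second)
Mvmult_t_unb_var2!( A, x, y )  # y := A^T x + y, overwriting y

println( "y after =" )
y
```

```
y after =
4-element Array{Float64,1}:
 0.7898036860730834
 0.8454982748056697
 0.6680334561870325
 0.9425678710610826
```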

```julia
println( "y - ( transpose( A ) * x + yold ) = " )
y - ( transpose( A ) * x + yold )
```

```
y - ( transpose( A ) * x + yold ) = 
4-element Array{Float64,1}:
 -1.1102230246251565e-16
  0.0
  0.0
  0.0
```

Bingo, it seems to work!  (Notice that we are doing floating-point computations, so due to rounding you may not get an exact "0".)

## Watch your code in action!

Paste the code into the box on <a href="http://edx-org-utaustinx.s3.amazonaws.com/UT501x/PictureFlame/PictureFLAME.html"> PictureFLAME </a>, a webpage where you can watch your routine in action.

Disclaimer: we implemented a VERY simple interpreter.  If you do something wrong, we cannot guarantee the results.  But if you do it right, you are in for a treat.

If you want to reset the problem, just click in the box into which you pasted the code and hit "next" again.