# Lots of loops

This notebook illustrates the different ways in which loops for matrix-matrix multiplication can be ordered.  Let's start by creating some matrices.

In [2]:
m = 4
n = 3
k = 5

println("C = ")
C = rand(m, n)

C = 


4×3 Array{Float64,2}:
 0.261545   0.254755   0.492339
 0.552558   0.231981   0.409594
 0.150717   0.299299   0.301166
 0.0406529  0.0498646  0.553314

In [4]:
Cold = copy(C) # independent copy of C, so later updates to C leave Cold unchanged

4×3 Array{Float64,2}:
 0.261545   0.254755   0.492339
 0.552558   0.231981   0.409594
 0.150717   0.299299   0.301166
 0.0406529  0.0498646  0.553314

In [5]:
println("A = ")
A = rand(m, k)

A = 


4×5 Array{Float64,2}:
 0.326325  0.838951  0.698483  0.313721  0.933725
 0.083477  0.886588  0.230835  0.498924  0.444282
 0.990166  0.502168  0.721407  0.817125  0.780143
 0.602018  0.446371  0.573852  0.211951  0.860292

In [6]:
println("B = ")
B = rand(k, n)

B = 


5×3 Array{Float64,2}:
 0.926506    0.48537    0.548547
 0.00952294  0.0168649  0.9625  
 0.190341    0.153879   0.924508
 0.341832    0.0149595  0.094451
 0.0837492   0.334056   0.181855

## The basic algorithm

Given $ A \in \mathbb{R}^{m \times k} $, $ B \in \mathbb{R}^{k \times n} $, and $ C \in \mathbb{R}^{m \times n} $, we will consider $ C := A B + C $.

Recall that the $ i,j $ element of $ A B $ is computed as the dot product of the $ i $th row of $ A $ with the $ j $th column of $ B $:

$$ \sum_{p=0}^{k-1} \alpha_{i,p} \beta_{p,j} $$

and here, by adding to $ C $, we get

$$ \gamma_{i,j} := \sum_{p=0}^{k-1} \alpha_{i,p} \beta_{p,j} + \gamma_{i,j}. $$

Here $ \alpha_{i,p} $, $ \beta_{p,j} $, and $ \gamma_{i,j} $ denote elements of $ A $, $ B $, and $ C $. The formulas index from 0, while the Julia code below indexes from 1.

Now, we have to loop over all elements of $ C $. The code, without the FLAMEpy API, becomes:

In [10]:
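# Compute C := A * B + C in place, using three nested loops.
function MMmult_lots_of_loops!( A, B, C )

    m, n = size( C )    # C is m × n
    m, k = size( A )    # A is m × k, so B must be k × n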
    
    # i,j,p
    for i in 1:m                     
        for j in 1:n                    
            for p in 1:k                    
                C[ i,j ] = A[ i,p ] * B[ p, j ] + C[ i,j ]
            end
        end
    end
                
#     # i,p,j
#     for i in 1:m                     
#         for p in 1:k
#             for j in 1:n
#                 C[ i,j ] = A[ i,p ] * B[ p, j ] + C[ i,j ]
#             end
#         end
#     end
                
#     # j,i,p                  
#     for j in 1:n                    
#         for i in 1:m 
#             for p in 1:k
#                 C[ i,j ] = A[ i,p ] * B[ p, j ] + C[ i,j ]
#             end
#         end
#     end

#     # j,p,i
#     for j in 1:n                    
#         for p in 1:k
#             for i in 1:m 
#                 C[ i,j ] = A[ i,p ] * B[ p, j ] + C[ i,j ]
#             end
#         end
#     end

#     # p,i,j
#     for p in 1:k
#         for i in 1:m
#             for j in 1:n
#                 C[ i,j ] = A[ i,p ] * B[ p, j ] + C[ i,j ]
#             end
#         end
#     end

#     # p,j,i
#     for p in 1:k
#         for j in 1:n
#             for i in 1:m
#                 C[ i,j ] = A[ i,p ] * B[ p, j ] + C[ i,j ]
#             end
#         end
#     end
    
end

MMmult_lots_of_loops! (generic function with 1 method)
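
In the `i,j,p` order, the innermost `p` loop accumulates the dot product of row `i` of `A` with column `j` of `B` and adds it to `C[ i,j ]`. A minimal sketch of that view follows, assuming Julia 1.x with the standard-library `LinearAlgebra.dot`; the name `MMmult_dot!` is hypothetical and not part of the original notebook.

```julia
using LinearAlgebra   # standard library (Julia 1.x); provides dot

# Hypothetical helper (not from the notebook): C := A * B + C,
# with the p loop replaced by an explicit dot product.
function MMmult_dot!( A, B, C )
    m, n = size( C )
    for i in 1:m
        for j in 1:n
            # row i of A dotted with column j of B, added to C[ i,j ]
            C[ i,j ] += dot( view( A, i, : ), view( B, :, j ) )
        end
    end
    return C
end
```

After restoring `C = copy(Cold)`, calling `MMmult_dot!( A, B, C )` should agree with `Cold + A * B` up to roundoff.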

In [11]:
C = copy(Cold)            # restore C

MMmult_lots_of_loops!( A, B, C )

println("C - ( Cold + A * B )" )
C - ( Cold + A * B )

C - ( Cold + A * B )


4×3 Array{Float64,2}:
 0.0           1.11022e-16  4.44089e-16
 1.11022e-16   0.0          0.0        
 0.0           0.0          0.0        
 0.0          -1.11022e-16  0.0        

Now, go back and systematically move the loops around, so that in the end you have tried all six orders of the loops: three choices for the first (outermost) loop, two choices for the second loop, and one choice for the third loop, for a total of $ 3! $ (3 factorial) choices. Check that you get the right answer regardless of the order.

(We suggest you just change the box in which the routine is defined and comment out variations that you've already tested.  Be careful with indentation.)
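
Instead of inspecting the difference matrix by eye for each reordering, the check can be automated with `≈` (`isapprox`), which tolerates roundoff of the size printed above. The following is only a sketch: `check_variant` and `MMmult_pji!` are hypothetical names introduced here, and the `p,j,i` order is shown as one example of a variant to test.

```julia
# Hypothetical example variant (not from the notebook): the p,j,i loop order.
function MMmult_pji!( A, B, C )
    m, n = size( C )
    m, k = size( A )
    for p in 1:k
        for j in 1:n
            for i in 1:m
                C[ i,j ] = A[ i,p ] * B[ p,j ] + C[ i,j ]
            end
        end
    end
    return C
end

# Hypothetical harness: run a variant on a copy of C0 and compare the
# result with the built-in product, up to floating-point roundoff.
function check_variant( mmult!, A, B, C0 )
    C = copy( C0 )              # leave the caller's matrix untouched
    mmult!( A, B, C )
    return C ≈ C0 + A * B
end
```

With the matrices defined earlier, `check_variant( MMmult_pji!, A, B, Cold )` should return `true`. Passing the routine as an argument means the same harness can be reused for each of the six orders.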

## Why $ C := A B + C $ rather than $ C := A B $?

Notice that we could instead have written a routine that computes $ C := A B $, as given below.

In [14]:
function MMmult_C_eq_AB!( A, B, C )

    m, n = size( C )
    m, k = size( A )
    
    for i in 1:m                     
        for j in 1:n  
            C[ i,j ] = 0.0    # start this entry from zero, so the p loop yields C := A B
            for p in 1:k                    
                C[ i,j ] = A[ i,p ] * B[ p, j ] + C[ i,j ]
            end
        end
    end
end

MMmult_C_eq_AB! (generic function with 1 method)

In [15]:
C = copy(Cold)             # restore C

MMmult_C_eq_AB!( A, B, C )

println( "C - ( A * B )" )
C - ( A * B )

C - ( A * B )


4×3 Array{Float64,2}:
  0.0           0.0           2.22045e-16
  0.0           0.0          -2.22045e-16
 -2.22045e-16   1.11022e-16   4.44089e-16
  0.0          -1.11022e-16  -2.22045e-16

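Now, start changing the order of the loops. You will notice that it is not quite as simple, because the initialization $ \gamma_{i,j} := 0 $ must happen before the first update of each element. However, if you have a routine for computing $ C := A B + C $, you can always initialize $ C = 0 $ (the zero matrix) and then use that routine to compute $ C := A B $: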

In [16]:
C = fill(0.0, size(Cold))         # initialize C = 0 

MMmult_lots_of_loops!( A, B, C )

println( "C - ( A * B )" )
C - ( A * B )

C - ( A * B )


4×3 Array{Float64,2}:
  0.0           0.0           2.22045e-16
  0.0           0.0          -2.22045e-16
 -2.22045e-16   1.11022e-16   4.44089e-16
  0.0          -1.11022e-16  -2.22045e-16
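
The zero-then-accumulate idea can also be folded into a single routine. The sketch below, under the hypothetical name `MMmult_C_eq_AB_pij!` (not part of the original notebook), clears `C` once with `fill!` and then accumulates in the `p,i,j` order. Leaving `C[ i,j ] = 0.0` inside the `i,j` loops would not work here, since each entry would be reset on every pass of the outer `p` loop.

```julia
# Hypothetical variant (not from the notebook): C := A * B with p outermost.
function MMmult_C_eq_AB_pij!( A, B, C )
    m, n = size( C )
    m, k = size( A )

    fill!( C, 0.0 )    # zero every entry once, before any accumulation

    for p in 1:k
        for i in 1:m
            for j in 1:n
                C[ i,j ] = A[ i,p ] * B[ p,j ] + C[ i,j ]
            end
        end
    end
    return C
end
```

Because the zeroing is a separate pass, the three loops below it can be permuted freely, just as in `MMmult_lots_of_loops!`.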