System information (for reproducibility):

In [1]:
versioninfo()

Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.5.0)
  CPU: 12 × Apple M2 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 8 on 8 virtual cores
Environment:
  JULIA_NUM_THREADS = 8
  JULIA_EDITOR = code


Load packages:

In [2]:
using Pkg

Pkg.activate(pwd())
Pkg.instantiate()
Pkg.status()

[32m[1m  Activating[22m[39m project at `~/Documents/github.com/ucla-biostat-257/2023spring/slides/10-trisys`


[32m[1mStatus[22m[39m `~/Documents/github.com/ucla-biostat-257/2023spring/slides/10-trisys/Project.toml`
 [90m [6e4b80f9] [39mBenchmarkTools v1.3.2
 [90m [37e2e46d] [39mLinearAlgebra
 [90m [9a3f8284] [39mRandom


For the next couple of lectures, we consider computer algorithms for solving linear equations $\mathbf{A} \mathbf{x} = \mathbf{b}$, a ubiquitous task in statistics. 

Key idea: turn the original problem into an **easy** one, e.g., a triangular system.

## Lower triangular system

To solve $\mathbf{A} \mathbf{x} = \mathbf{b}$, where $\mathbf{A} \in \mathbb{R}^{n \times n}$ is **lower triangular**:

$$
\begin{pmatrix}
    a_{11} & 0 & \cdots & 0 \\
    a_{21} & a_{22} & \cdots & 0 \\
    \vdots & \vdots & \ddots & \vdots \\
    a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\begin{pmatrix}
x_1 \\ x_2 \\ \vdots \\ x_n
\end{pmatrix} = \begin{pmatrix}
b_1 \\ b_2 \\ \vdots \\ b_n
\end{pmatrix}.
$$

* **Forward substitution**: 
$$
\begin{eqnarray*}
    x_1 &=& b_1 / a_{11} \\
    x_2 &=& (b_2 - a_{21} x_1) / a_{22} \\
    x_3 &=& (b_3 - a_{31} x_1 - a_{32} x_2) / a_{33} \\
    &\vdots& \\
    x_n &=& (b_n - a_{n1} x_1 - a_{n2} x_2 - \cdots - a_{n,n-1} x_{n-1}) / a_{nn}.
\end{eqnarray*}
$$

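* Computing $x_i$ takes $i-1$ multiplications, $i-1$ subtractions, and one division, i.e., $2i-1$ flops, so the total is $1 + 3 + 5 + \cdots + (2n-1) = n^2$ flops. 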

* $\mathbf{A}$ can be accessed by row ($ij$ loop) or column ($ji$ loop).
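A minimal Julia sketch of forward substitution in both loop orders. The function names `forwardsub_row` and `forwardsub_col!` are hypothetical, made up for illustration. The column version overwrites `b` with the solution, similar in spirit to BLAS `trsv`; since Julia stores arrays in column-major order, its inner loop walks down a column of `A` contiguously. Only the lower triangle of `A` is read, so a full random matrix can be passed in.

```julia
using LinearAlgebra, Random

# Row-oriented (ij loop): x_i = (b_i - sum_{j<i} a_ij x_j) / a_ii.
# Returns a new vector; b is left untouched. Reads only the lower triangle of A.
function forwardsub_row(A::AbstractMatrix, b::AbstractVector)
    n = length(b)
    x = float.(b)                    # new floating-point copy of b
    for i in 1:n
        for j in 1:(i - 1)
            x[i] -= A[i, j] * x[j]   # subtract contributions of already-solved x_j
        end
        x[i] /= A[i, i]
    end
    return x
end

# Column-oriented (ji loop): overwrites b with x. Reads only the lower triangle of A.
function forwardsub_col!(A::AbstractMatrix, b::AbstractVector{<:AbstractFloat})
    n = length(b)
    for j in 1:n
        b[j] /= A[j, j]              # x_j is final once column j is reached
        for i in (j + 1):n
            b[i] -= A[i, j] * b[j]   # remove x_j's contribution from equations below
        end
    end
    return b
end

Random.seed!(257)
n = 5
A = randn(n, n)
b = randn(n)
x_ref = LowerTriangular(A) \ b
@show forwardsub_row(A, b) ≈ x_ref, forwardsub_col!(A, copy(b)) ≈ x_ref
```

Both versions perform the same $n^2$ flops; they differ only in the order in which `A` is traversed.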

## Upper triangular system

To solve $\mathbf{A} \mathbf{x} = \mathbf{b}$, where $\mathbf{A} \in \mathbb{R}^{n \times n}$ is **upper triangular**:
$$
\begin{pmatrix}
    a_{11} & \cdots & a_{1,n-1} & a_{1n} \\
    \vdots & \ddots & \vdots & \vdots \\
    0 & \cdots & a_{n-1,n-1} & a_{n-1,n} \\
    0 & \cdots & 0 & a_{nn}
\end{pmatrix}
\begin{pmatrix}
x_1 \\ \vdots \\ x_{n-1} \\ x_n
\end{pmatrix} = \begin{pmatrix}
b_1 \\ \vdots \\ b_{n-1} \\ b_n
\end{pmatrix}.
$$

* **Back substitution** 
$$
\begin{eqnarray*}
    x_n &=& b_n / a_{nn} \\
    x_{n-1} &=& (b_{n-1} - a_{n-1,n} x_n) / a_{n-1,n-1} \\
    x_{n-2} &=& (b_{n-2} - a_{n-2,n-1} x_{n-1} - a_{n-2,n} x_n) / a_{n-2,n-2} \\
    &\vdots& \\
    x_1 &=& (b_1 - a_{12} x_2 - a_{13} x_3 - \cdots - a_{1,n} x_{n}) / a_{11}.
\end{eqnarray*}
$$

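* $n^2$ flops, by the same count as forward substitution ($x_i$ needs $n-i$ multiplications, $n-i$ subtractions, and one division).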

* $\mathbf{A}$ can be accessed by row ($ij$ loop) or column ($ji$ loop).
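Back substitution can be sketched the same way. Below is only the column-oriented ($ji$) version, with $j$ running from $n$ down to 1; `backsub_col!` is a hypothetical name, it overwrites `b` with the solution, and it reads only the upper triangle of `A`.

```julia
using LinearAlgebra, Random

# Column-oriented (ji loop) back substitution; overwrites b with x.
function backsub_col!(A::AbstractMatrix, b::AbstractVector{<:AbstractFloat})
    n = length(b)
    for j in n:-1:1
        b[j] /= A[j, j]              # x_j is now final
        for i in 1:(j - 1)
            b[i] -= A[i, j] * b[j]   # remove x_j's contribution from rows above
        end
    end
    return b
end

Random.seed!(257)
n = 5
A = randn(n, n)
b = randn(n)
x = backsub_col!(A, copy(b))
@show x ≈ UpperTriangular(A) \ b
```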

## Implementation

* BLAS level 2 function: [?trsv](http://www.netlib.org/lapack/explore-html/d6/d96/dtrsv_8f.html) (triangular solve with one right-hand side).

* BLAS level 3 function: [?trsm](http://www.netlib.org/lapack/explore-html/de/da7/dtrsm_8f.html) (triangular solve with a matrix, i.e., multiple right-hand sides).

* Julia  
    - The left division operator `\` solves linear equations or least squares problems.  
    - If `A` is a triangular matrix, `A \ b` uses forward or back substitution.  
    - Alternatively, we can call the BLAS wrapper functions directly: [trsv!](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.BLAS.trsv!), [trsv](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.BLAS.trsv), [trsm!](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.BLAS.trsm!), [trsm](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.BLAS.trsm).
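A short sketch of the multiple right-hand-side case and of the in-place wrapper, assuming the documented signatures `BLAS.trsm(side, ul, tA, dA, alpha, A, B)` and `BLAS.trsv!(ul, tA, dA, A, b)`. The sizes `n = 5`, `k = 3` are arbitrary choices for illustration.

```julia
using LinearAlgebra, Random

Random.seed!(257)
n, k = 5, 3
A = randn(n, n)
B = randn(n, k)          # k right-hand sides, one per column

# `\` with a triangular wrapper and a matrix right-hand side
X1 = LowerTriangular(A) \ B

# BLAS level 3: with side = 'L', returns the solution X of op(A) * X = alpha * B
X2 = BLAS.trsm('L', 'L', 'N', 'N', 1.0, A, B)

# BLAS level 2, in place: trsv! overwrites its last argument with the solution
b = B[:, 1]              # slicing copies, so B itself is not modified
BLAS.trsv!('L', 'N', 'N', A, b)

@show X1 ≈ X2, b ≈ X1[:, 1]
```

The `!` variants avoid allocating a new output vector or matrix, which matters inside loops.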

In [3]:
using LinearAlgebra, Random

Random.seed!(257) # seed
n = 5
A = randn(n, n)
b = randn(n)
# a random matrix
A

5×5 Matrix{Float64}:
  0.679063   1.52556    0.234923  -0.111974  -1.10328
  1.24568   -1.69501   -1.12138   -0.16883    1.88349
 -1.21007   -0.245347   0.222238  -0.501868   0.488434
 -0.817491  -0.131288  -1.15676    0.86352    0.0769591
  1.04395   -1.06533    0.708506  -1.32251   -2.28938

In [4]:
Al = LowerTriangular(A) # wraps A without copying its data

5×5 LowerTriangular{Float64, Matrix{Float64}}:
  0.679063    ⋅          ⋅          ⋅         ⋅ 
  1.24568   -1.69501     ⋅          ⋅         ⋅ 
 -1.21007   -0.245347   0.222238    ⋅         ⋅ 
 -0.817491  -0.131288  -1.15676    0.86352    ⋅ 
  1.04395   -1.06533    0.708506  -1.32251  -2.28938

In [5]:
dump(Al)

LowerTriangular{Float64, Matrix{Float64}}
  data: Array{Float64}((5, 5)) [0.6790633442371218 1.5255628642992316 … -0.11197407057583378 -1.1032824444790374; 1.2456776800889142 -1.6950136862944665 … -0.16883028457206983 1.8834907119704818; … ; -0.8174908512677062 -0.1312875922954256 … 0.8635202236283535 0.07695906622703877; 1.0439509789805828 -1.0653277558175929 … -1.3225143450097687 -2.289382892165633]


In [6]:
Al.data

5×5 Matrix{Float64}:
  0.679063   1.52556    0.234923  -0.111974  -1.10328
  1.24568   -1.69501   -1.12138   -0.16883    1.88349
 -1.21007   -0.245347   0.222238  -0.501868   0.488434
 -0.817491  -0.131288  -1.15676    0.86352    0.0769591
  1.04395   -1.06533    0.708506  -1.32251   -2.28938

In [7]:
# same data
pointer(Al.data), pointer(A)

(Ptr{Float64} @0x000000011014eb40, Ptr{Float64} @0x000000011014eb40)

In [8]:
Al \ b # dispatched to BLAS function for triangular solve

5-element Vector{Float64}:
 -0.6752595578784236
 -0.3076650040919988
 -4.102671130071105
 -5.384670255453669
  1.4844958985110646

In [9]:
# or use BLAS wrapper directly
BLAS.trsv('L', 'N', 'N', A, b)

5-element Vector{Float64}:
 -0.6752595578784236
 -0.3076650040919988
 -4.102671130071105
 -5.384670255453669
  1.4844958985110646

In [10]:
?BLAS.trsv

```
trsv(ul, tA, dA, A, b)
```

Return the solution to `A*x = b` or one of the other two variants determined by `tA` and `ul`. `dA` determines if the diagonal values are read or are assumed to be all ones.
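To make the three flag arguments concrete, a small sketch comparing each `trsv` variant with the equivalent `\` on a Julia triangular wrapper (the particular flag combinations are picked for illustration):

```julia
using LinearAlgebra, Random

Random.seed!(257)
n = 5
A = randn(n, n)
b = randn(n)

# tA = 'T': solve transpose(L) * x = b, where L is the lower triangle of A
x_t = BLAS.trsv('L', 'T', 'N', A, b)
@show x_t ≈ transpose(LowerTriangular(A)) \ b

# dA = 'U': treat the diagonal as all ones; the stored diagonal of A is not read
x_u = BLAS.trsv('L', 'N', 'U', A, b)
@show x_u ≈ UnitLowerTriangular(A) \ b

# ul = 'U': use the upper triangle of A instead
x_up = BLAS.trsv('U', 'N', 'N', A, b)
@show x_up ≈ UpperTriangular(A) \ b
```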


* Other BLAS functions for triangular matrices, e.g., triangular matrix-vector and triangular matrix-matrix multiplication: [trmv](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.BLAS.trmv), [trmv!](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.BLAS.trmv!), [trmm](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.BLAS.trmm), [trmm!](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.BLAS.trmm!)
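A brief sketch of the multiplication counterparts, assuming the documented signatures `BLAS.trmv(ul, tA, dA, A, b)` (returns `op(A) * b`) and `BLAS.trmm(side, ul, tA, dA, alpha, A, B)` (returns `alpha * op(A) * B` for `side = 'L'`). The last line checks that multiplying by a triangle and then solving with it recovers `b`.

```julia
using LinearAlgebra, Random

Random.seed!(257)
n = 5
A = randn(n, n)
B = randn(n, 2)
b = B[:, 1]

# Triangular matrix-vector product using the lower triangle of A
y = BLAS.trmv('L', 'N', 'N', A, b)
@show y ≈ LowerTriangular(A) * b

# Triangular matrix-matrix product: 1.0 * L * B
Y = BLAS.trmm('L', 'L', 'N', 'N', 1.0, A, B)
@show Y ≈ LowerTriangular(A) * B

# Solve with the same triangle: should recover b up to rounding
@show BLAS.trsv('L', 'N', 'N', A, y) ≈ b
```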

## Some algebraic facts about triangular matrices (HW1)

* The eigenvalues of a triangular matrix $\mathbf{A}$ are its diagonal entries: $\lambda_i = a_{ii}$. 

* The determinant is the product of the diagonal entries: $\det(\mathbf{A}) = \prod_i a_{ii}$.

* The product of two upper (lower) triangular matrices is upper (lower) triangular.

* The inverse of an upper (lower) triangular matrix is upper (lower) triangular.

* The product of two unit upper (lower) triangular matrices is unit upper (lower) triangular.

* The inverse of a unit upper (lower) triangular matrix is unit upper (lower) triangular.
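A quick numerical sanity check (not a proof) of these facts for the lower triangular case could look like the following. It works on dense `Matrix` copies so that nothing relies on Julia's triangular wrapper types; the helper `uppermass` and the tolerance `1e-8` are choices made for this sketch only.

```julia
using LinearAlgebra, Random

Random.seed!(257)
n = 5
# Dense copies of lower triangular and unit lower triangular matrices
L1 = Matrix(LowerTriangular(randn(n, n)))
L2 = Matrix(LowerTriangular(randn(n, n)))
U1 = Matrix(UnitLowerTriangular(randn(n, n)))
U2 = Matrix(UnitLowerTriangular(randn(n, n)))

# Relative size of the strictly upper part (zero for a lower triangular matrix)
uppermass(M) = norm(triu(M, 1)) / norm(M)
tol = 1e-8

# Each a_ii is an eigenvalue: L1 - a_ii * I has a zero on the diagonal, so it is singular
eig_ok = all(minimum(svdvals(L1 - L1[i, i] * I)) < tol for i in 1:n)

# Determinant equals the product of the diagonal
det_ok = det(L1) ≈ prod(diag(L1))

# Products and inverses of lower triangular matrices stay lower triangular
prod_ok = uppermass(L1 * L2) < tol
inv_ok = uppermass(inv(L1)) < tol

# Unit lower triangular: product and inverse keep an all-ones diagonal
P, Ui = U1 * U2, inv(U1)
unit_ok = uppermass(P) < tol && diag(P) ≈ ones(n) &&
          uppermass(Ui) < tol && diag(Ui) ≈ ones(n)

@show eig_ok, det_ok, prod_ok, inv_ok, unit_ok
```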

## Julia types for triangular matrices

[LowerTriangular](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.LowerTriangular), UnitLowerTriangular, 
[UpperTriangular](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.UpperTriangular), UnitUpperTriangular.  

In [11]:
A

5×5 Matrix{Float64}:
  0.679063   1.52556    0.234923  -0.111974  -1.10328
  1.24568   -1.69501   -1.12138   -0.16883    1.88349
 -1.21007   -0.245347   0.222238  -0.501868   0.488434
 -0.817491  -0.131288  -1.15676    0.86352    0.0769591
  1.04395   -1.06533    0.708506  -1.32251   -2.28938

In [12]:
LowerTriangular(A)

5×5 LowerTriangular{Float64, Matrix{Float64}}:
  0.679063    ⋅          ⋅          ⋅         ⋅ 
  1.24568   -1.69501     ⋅          ⋅         ⋅ 
 -1.21007   -0.245347   0.222238    ⋅         ⋅ 
 -0.817491  -0.131288  -1.15676    0.86352    ⋅ 
  1.04395   -1.06533    0.708506  -1.32251  -2.28938

In [13]:
LinearAlgebra.UnitLowerTriangular(A)

5×5 UnitLowerTriangular{Float64, Matrix{Float64}}:
  1.0         ⋅          ⋅          ⋅        ⋅ 
  1.24568    1.0         ⋅          ⋅        ⋅ 
 -1.21007   -0.245347   1.0         ⋅        ⋅ 
 -0.817491  -0.131288  -1.15676    1.0       ⋅ 
  1.04395   -1.06533    0.708506  -1.32251  1.0
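The upper triangular types work the same way. A small sketch: each wrapper below shares `A`'s storage, reads only its own triangle (the unit versions ignore the stored diagonal), and `\` dispatches to the matching triangular solve. The residual norms are checked against dense copies of the same triangles.

```julia
using LinearAlgebra, Random

Random.seed!(257)
n = 5
A = randn(n, n)
b = randn(n)

U  = UpperTriangular(A)
Uu = UnitUpperTriangular(A)    # diagonal treated as ones
Lu = UnitLowerTriangular(A)    # diagonal treated as ones

xU, xUu, xLu = U \ b, Uu \ b, Lu \ b

# Residuals computed with dense copies of the same triangles; should be tiny
@show norm(Matrix(U) * xU - b), norm(Matrix(Uu) * xUu - b), norm(Matrix(Lu) * xLu - b)
```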

In [14]:
using BenchmarkTools, LinearAlgebra, Random

Random.seed!(257) # seed
A = randn(1000, 1000);

In [15]:
# if we don't tell Julia it's triangular: O(n^3) complexity
# tril(A) returns a dense Matrix with zeros above the diagonal, same as Matlab
@benchmark eigvals(tril($A))

BenchmarkTools.Trial: 205 samples with 1 evaluation.
 Range (min … max):  23.478 ms … 27.739 ms    ┊ GC (min … max): 0.00% … 6.13%
 Time  (median):     24.380 ms                ┊ GC (median):    0.00%
 Time  (mean ± σ):   24.445 ms ± 801.560 μs   ┊ GC (mean ± σ):  1.21% ± 1.70%

In [16]:
# if we tell Julia it's triangular: O(n) complexity
@benchmark eigvals(LowerTriangular($A))

BenchmarkTools.Trial: 10000 samples with 197 evaluations.
 Range (min … max):  435.492 ns … 13.128 μs  ┊ GC (min … max):  0.00% … 93.29%
 Time  (median):     772.421 ns              ┊ GC (median):     0.00%
 Time  (mean ± σ):     1.005 μs ±  1.218 μs  ┊ GC (mean ± σ):  22.75% ± 16.39%

In [17]:
# if we don't tell Julia it's triangular: O(n^3) complexity
# tril(A) returns a dense Matrix with zeros above the diagonal, same as Matlab
@benchmark det(tril($A))

BenchmarkTools.Trial: 5721 samples with 1 evaluation.
 Range (min … max):  690.041 μs …   2.335 ms  ┊ GC (min … max):  0.00% … 49.64%
 Time  (median):     762.541 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   873.063 μs ± 247.996 μs  ┊ GC (mean ± σ):  12.33% ± 16.96%

In [18]:
# if we tell Julia it's triangular: O(n) complexity
@benchmark det(LowerTriangular($A))

BenchmarkTools.Trial: 10000 samples with 155 evaluations.
 Range (min … max):  670.968 ns … 16.614 μs  ┊ GC (min … max):  0.00% … 91.58%
 Time  (median):     989.787 ns              ┊ GC (median):     0.00%
 Time  (mean ± σ):     1.245 μs ±  1.504 μs  ┊ GC (mean ± σ):  20.12% ± 14.70%

In [19]:
@benchmark det(LowerTriangular($A))

BenchmarkTools.Trial: 10000 samples with 155 evaluations.
 Range (min … max):  670.703 ns … 26.863 μs  ┊ GC (min … max):  0.00% … 95.01%
 Time  (median):     982.258 ns              ┊ GC (median):     0.00%
 Time  (mean ± σ):     1.229 μs ±  1.508 μs  ┊ GC (mean ± σ):  19.96% ± 14.70%