System information (for reproducibility):

In [1]:
versioninfo()

Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.5.0)
  CPU: 12 × Apple M2 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 8 on 8 virtual cores
Environment:
  JULIA_NUM_THREADS = 8
  JULIA_EDITOR = code


Load packages:

In [2]:
using Pkg

Pkg.activate(pwd())
Pkg.instantiate()
Pkg.status()

  Activating project at `~/Documents/github.com/ucla-biostat-257/2023spring/slides/19-easylineq`

Status `~/Documents/github.com/ucla-biostat-257/2023spring/slides/19-easylineq/Project.toml`
  [6e4b80f9] BenchmarkTools v1.3.2
  [42fd0dbc] IterativeSolvers v0.9.2
  [b51810bb] MatrixDepot v1.0.10
  [b8865327] UnicodePlots v3.5.2
  [efce3f68] WoodburyMatrices v0.5.5
  [37e2e46d] LinearAlgebra
  [9a3f8284] Random
  [2f01184e] SparseArrays


# Introduction

Consider $\mathbf{A} \mathbf{x} = \mathbf{b}$, $\mathbf{A} \in \mathbb{R}^{n \times n}$, or the matrix inverse $\mathbf{A}^{-1}$ if you really need it. $\mathbf{A}$ can be huge. Keep massive data in mind: 1000 Genomes Project, Netflix, Google PageRank, finance, spatial statistics, ... We should be alert to the many easy linear systems.

Don't blindly use `A \ b` and `inv` in Julia or the `solve` function in R. **Don't waste computing resources through bad choices of algorithms!**

## Diagonal matrix

Diagonal $\mathbf{A}$: $n$ flops. Use Julia's `Diagonal` type.
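
Solving with a diagonal $\mathbf{A}$ is just elementwise division, one flop per entry. A minimal sketch that works directly on the vector of diagonal entries; the helper `diag_solve` is hypothetical and not part of `LinearAlgebra`:

```julia
using LinearAlgebra, Random

# Hypothetical helper: solve D x = b where D = Diagonal(d).
# One division per entry (n flops); the n×n matrix is never formed.
function diag_solve(d::AbstractVector, b::AbstractVector)
    length(d) == length(b) || throw(DimensionMismatch("d and b must have equal length"))
    return b ./ d
end

Random.seed!(123)
dvec  = randn(1000)     # diagonal entries
rhs   = randn(1000)
xdiag = diag_solve(dvec, rhs)
@show norm(xdiag - Diagonal(dvec) \ rhs)   # compare with Julia's Diagonal method
```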

In [3]:
using BenchmarkTools, LinearAlgebra, Random

# generate random data
Random.seed!(123)
n = 1000
A = diagm(0 => randn(n)) # a diagonal matrix stored as Matrix{Float64}
b = randn(n);

In [4]:
# should give link to the source code
@which A \ b

In [5]:
# check `istril(A)` and `istriu(A)` (O(n^2)), then call `Diagonal(A) \ b` (O(n))
@benchmark $A \ $b

BenchmarkTools.Trial: 9346 samples with 1 evaluation.
 Range (min … max):  489.208 μs …  2.729 ms  ┊ GC (min … max): 0.00% … 80.04%
 Time  (median):     534.417 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   533.725 μs ± 27.909 μs  ┊ GC (mean ± σ):  0.04% ±  0.83%

In [6]:
# O(n) computation, no extra array allocation
@benchmark Diagonal($A) \ $b

BenchmarkTools.Trial: 10000 samples with 21 evaluations.
 Range (min … max):  962.286 ns … 74.079 μs  ┊ GC (min … max):  0.00% … 96.61%
 Time  (median):       1.673 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):     2.165 μs ±  5.510 μs  ┊ GC (mean ± σ):  21.56% ±  8.22%

## Bidiagonal, tridiagonal, and banded matrices

Bidiagonal, tridiagonal, or banded $\mathbf{A}$: band LU, band Cholesky, ..., roughly $O(n)$ flops for a fixed bandwidth.  
* Use Julia's [`Bidiagonal`](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.Bidiagonal), [`Tridiagonal`](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.Tridiagonal), and [`SymTridiagonal`](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.SymTridiagonal) types.
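
To see where the $O(n)$ count comes from, here is a minimal sketch of the Thomas algorithm, i.e. Gaussian elimination specialized to a tridiagonal matrix. It does **no pivoting**, so it assumes no zero pivot arises (guaranteed, for example, when $\mathbf{A}$ is strictly diagonally dominant). The name `thomas_solve` is ours; Julia's `Tridiagonal` and `SymTridiagonal` solvers use their own factorizations, and this is not their implementation.

```julia
using LinearAlgebra, Random

# Hypothetical helper: Thomas algorithm for a tridiagonal system.
# dl = sub-diagonal (n-1), d = diagonal (n), du = super-diagonal (n-1).
# No pivoting: assumes every pivot is nonzero (e.g. strict diagonal dominance).
function thomas_solve(dl::AbstractVector, d::AbstractVector, du::AbstractVector, b::AbstractVector)
    n = length(d)
    T = float(promote_type(eltype(dl), eltype(d), eltype(du), eltype(b)))
    cmod = zeros(T, max(n - 1, 0))   # modified super-diagonal
    x    = zeros(T, n)               # modified rhs, overwritten by the solution
    # forward sweep: eliminate the sub-diagonal, O(n) flops
    piv = d[1]
    n > 1 && (cmod[1] = du[1] / piv)
    x[1] = b[1] / piv
    for i in 2:n
        piv = d[i] - dl[i-1] * cmod[i-1]
        i < n && (cmod[i] = du[i] / piv)
        x[i] = (b[i] - dl[i-1] * x[i-1]) / piv
    end
    # back substitution, O(n) flops
    for i in n-1:-1:1
        x[i] -= cmod[i] * x[i+1]
    end
    return x
end

Random.seed!(123)
nt  = 1000
dlt = rand(nt - 1); dut = rand(nt - 1)   # off-diagonals in [0, 1)
dt  = 3 .+ rand(nt)                      # diagonal ≥ 3 > |dl| + |du|: strictly dominant
bt  = randn(nt)
xt  = thomas_solve(dlt, dt, dut, bt)
@show norm(xt - Tridiagonal(dlt, dt, dut) \ bt)
```

For a symmetric tridiagonal matrix, pass the off-diagonal twice, `thomas_solve(ev, dv, ev, b)`, still subject to the no-pivoting caveat.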

In [7]:
Random.seed!(123) 

n  = 1000
dv = randn(n)
ev = randn(n - 1)
b  = randn(n) # rhs
# symmetric tridiagonal matrix
A  = SymTridiagonal(dv, ev)

1000×1000 SymTridiagonal{Float64, Vector{Float64}}:
 0.808288   0.598212    ⋅         ⋅        …    ⋅           ⋅          ⋅ 
 0.598212  -1.12207   -1.00753    ⋅             ⋅           ⋅          ⋅ 
  ⋅        -1.00753   -1.10464   1.97923        ⋅           ⋅          ⋅ 
  ⋅          ⋅         1.97923  -0.416993       ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅       -0.34622        ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅        …    ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅        …    ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           ⋅          ⋅ 
  ⋅          ⋅          ⋅         ⋅             ⋅           

In [8]:
# convert to a full matrix
Afull = Matrix(A)

# LU decomposition (2/3) n^3 flops!
@benchmark $Afull \ $b

BenchmarkTools.Trial: 1271 samples with 1 evaluation.
 Range (min … max):  3.739 ms …   8.754 ms  ┊ GC (min … max): 0.00% … 8.88%
 Time  (median):     3.787 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.934 ms ± 346.327 μs  ┊ GC (mean ± σ):  3.35% ± 6.40%

In [9]:
# specialized algorithm for tridiagonal matrix, O(n) flops
@benchmark $A \ $b

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  10.917 μs …  1.530 ms  ┊ GC (min … max): 0.00% … 98.20%
 Time  (median):     11.917 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   12.654 μs ± 24.384 μs  ┊ GC (mean ± σ):  3.30% ±  1.70%

## Triangular matrix

Triangular $\mathbf{A}$: $n^2$ flops to solve a linear system by forward or back substitution.
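
The $n^2$ count comes from substitution: once $x_j$ is known, removing its contribution from the remaining $n - j$ equations costs about $2(n - j)$ flops, for $\approx n^2$ in total. A minimal column-oriented sketch for a lower-triangular matrix stored densely; the name `forward_subst` is hypothetical, and the diagonal shift in the example is only there to keep the random test matrix well conditioned:

```julia
using LinearAlgebra, Random

# Hypothetical helper: forward substitution for L x = b, L lower triangular.
# Column-oriented loop (matches Julia's column-major storage); ~n^2 flops.
# Only the lower triangle of L is read.
function forward_subst(L::AbstractMatrix, b::AbstractVector)
    n = size(L, 1)
    x = float.(b)                     # copy of b, overwritten by the solution
    for j in 1:n
        x[j] /= L[j, j]               # x[j] is now final
        for i in j+1:n
            x[i] -= L[i, j] * x[j]    # remove x[j]'s contribution from later rows
        end
    end
    return x
end

Random.seed!(123)
nl  = 200
Ltr = tril(randn(nl, nl)) + nl * I    # shifted diagonal: well-conditioned example
bl  = randn(nl)
xl  = forward_subst(Ltr, bl)
@show norm(xl - LowerTriangular(Ltr) \ bl)
```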

In [10]:
Random.seed!(123)

n = 1000
A = tril(randn(n, n)) # a lower-triangular matrix stored as Matrix{Float64}
b = randn(n)

# check istril() then triangular solve
@benchmark $A \ $b

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  320.250 μs … 543.584 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     322.042 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   324.638 μs ±  14.096 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [11]:
# triangular solve directly; save the cost of istril()
@benchmark LowerTriangular($A) \ $b

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  91.458 μs … 319.000 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     92.500 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   94.141 μs ±  13.168 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

## Block diagonal matrix

Block diagonal: suppose $n = \sum_b n_b$. Solving a linear system costs $O\big((\sum_b n_b)^3\big)$ flops if the block-diagonal structure is ignored, versus $O(\sum_b n_b^3)$ flops if each block is solved separately.

Julia has a [`blockdiag`](https://docs.julialang.org/en/v1/stdlib/SparseArrays/#SparseArrays.blockdiag) function that generates a **sparse** matrix. **Anyone interested in writing a `BlockDiagonal.jl` package?**
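
The $\sum_b n_b^3$ count comes from solving each block independently. A minimal sketch that keeps the blocks as separate dense matrices; the helper `blockdiag_solve` and its list-of-blocks input are assumptions for illustration, not an existing package API:

```julia
using LinearAlgebra, Random

# Hypothetical helper: solve a block-diagonal system given its diagonal blocks.
# Each block costs O(n_b^3), so the total is O(sum_b n_b^3).
function blockdiag_solve(blocks::AbstractVector{<:AbstractMatrix}, b::AbstractVector)
    sum(B -> size(B, 1), blocks) == length(b) ||
        throw(DimensionMismatch("block sizes must add up to length(b)"))
    x = similar(b, float(eltype(b)))
    offset = 0
    for Bk in blocks
        idx = offset+1:offset+size(Bk, 1)
        x[idx] = Bk \ b[idx]           # independent dense solve for this block
        offset += size(Bk, 1)
    end
    return x
end

Random.seed!(123)
blocks = [randn(k, k) + k * I for k in (100, 50, 150)]   # well-conditioned dense blocks
bb   = randn(sum(B -> size(B, 1), blocks))
xb   = blockdiag_solve(blocks, bb)
Abig = cat(blocks...; dims = (1, 2))   # dense block-diagonal matrix, only for checking
@show norm(xb - Abig \ bb)
```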

In [12]:
using SparseArrays

Random.seed!(123)

B  = 10 # number of blocks
ni = 100
A  = blockdiag([sprandn(ni, ni, 0.01) for b in 1:B]...)

1000×1000 SparseMatrixCSC{Float64, Int64} with 975 stored entries:
⢶⡷⢬⡮⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⣬⣿⣽⣿⠃⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠈⠀⠁⠀⢻⣿⢿⣿⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⡿⣽⣳⡾⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠁⠁⢸⣬⣷⣟⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⢺⣿⣛⣲⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠈⠈⠁⠈⣝⢾⢻⣟⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣯⣫⣗⣮⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⢐⣟⣟⣷⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣶⢫⣿⣽⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢾⠾⢿⣛⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣼⣺⠿⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢈⣉⡾⣟⣷⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣿⡞⣾⢿⠀⠀⢀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣎⣜⣿⣭⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢟⣭⡟⣇⣀⠀⣀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠐⣼⢾⣋⡿⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢿⣺⣯⡽⡀⢀⠀⡀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠘⣿⣷⢿⡵
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⠾⡛⠼⡯

In [13]:
using UnicodePlots
spy(A)

(UnicodePlots `spy` plot of `A`: the nonzeros lie in the 10 diagonal blocks; legend: `> 0`, `< 0`)

## Kronecker product

Use
$$
\begin{eqnarray*}
    (\mathbf{A} \otimes \mathbf{B})^{-1} &=& \mathbf{A}^{-1} \otimes \mathbf{B}^{-1} \\
    (\mathbf{C}^T \otimes \mathbf{A}) \text{vec}(\mathbf{B}) &=& \text{vec}(\mathbf{A} \mathbf{B} \mathbf{C}) \\
    \text{det}(\mathbf{A} \otimes \mathbf{B}) &=& [\text{det}(\mathbf{A})]^p [\text{det}(\mathbf{B})]^m, \quad \mathbf{A} \in \mathbb{R}^{m \times m}, \mathbf{B} \in \mathbb{R}^{p \times p}
\end{eqnarray*}    
$$
to avoid forming, and computing with, the potentially huge Kronecker product $\mathbf{A} \otimes \mathbf{B}$.

**Anyone interested in writing a package?**
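
A quick numerical check of the vec identity and the determinant identity on small matrices; the sizes and variable names are arbitrary choices for illustration. The determinant identity is applied on the log scale, which needs only an $m \times m$ and a $p \times p$ Cholesky instead of an $mp \times mp$ one:

```julia
using LinearAlgebra, Random

Random.seed!(123)
mk, pk = 4, 3
X = randn(mk, mk); Ak = X' * X + I     # mk×mk positive definite
Y = randn(pk, pk); Bk = Y' * Y + I     # pk×pk positive definite

# (Cᵀ ⊗ A) vec(B) = vec(A B C), here with A m×m, B m×p, C p×p
Am, Bm, Cm = randn(mk, mk), randn(mk, pk), randn(pk, pk)
lhs     = kron(transpose(Cm), Am) * vec(Bm)   # forms an (mp)×(mp) matrix
rhs_vec = vec(Am * Bm * Cm)                   # only small matrix products

# log det(A ⊗ B) = p * log det(A) + m * log det(B)
ld_kron   = logdet(cholesky(Symmetric(kron(Ak, Bk))))
ld_factor = pk * logdet(cholesky(Symmetric(Ak))) + mk * logdet(cholesky(Symmetric(Bk)))
@show norm(lhs - rhs_vec), ld_kron - ld_factor
```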

In [14]:
using MatrixDepot, LinearAlgebra

A = matrixdepot("lehmer", 50) # a pd matrix

[ Info: verify download of index files...
[ Info: reading database
[ Info: adding metadata...
[ Info: adding svd data...
[ Info: writing database
[ Info: used remote sites are sparse.tamu.edu with MAT index and math.nist.gov with HTML index


50×50 Matrix{Float64}:
 1.0        0.5        0.333333   0.25       …  0.0208333  0.0204082  0.02
 0.5        1.0        0.666667   0.5           0.0416667  0.0408163  0.04
 0.333333   0.666667   1.0        0.75          0.0625     0.0612245  0.06
 0.25       0.5        0.75       1.0           0.0833333  0.0816327  0.08
 0.2        0.4        0.6        0.8           0.104167   0.102041   0.1
 0.166667   0.333333   0.5        0.666667   …  0.125      0.122449   0.12
 0.142857   0.285714   0.428571   0.571429      0.145833   0.142857   0.14
 0.125      0.25       0.375      0.5           0.166667   0.163265   0.16
 0.111111   0.222222   0.333333   0.444444      0.1875     0.183673   0.18
 0.1        0.2        0.3        0.4           0.208333   0.204082   0.2
 0.0909091  0.181818   0.272727   0.363636   …  0.229167   0.22449    0.22
 0.0833333  0.166667   0.25       0.333333      0.25       0.244898   0.24
 0.0769231  0.153846   0.230769   0.307692      0.270833   0.265306   0.26
 ⋮  

In [15]:
B = matrixdepot("oscillate", 100) # pd matrix

100×100 Matrix{Float64}:
  0.539265      0.403989     -0.00217105   …   4.34736e-15  -2.06948e-14
  0.403989      0.531479      0.00497288      -4.47758e-15   2.13147e-14
 -0.00217105    0.00497288    0.660112         1.55726e-14  -7.41304e-14
  9.57703e-5   -0.000206496   0.0413618       -1.6975e-13    8.08067e-13
  7.50775e-6    0.000158391  -0.00736855       1.77959e-13  -8.47144e-13
  0.000158924  -8.79789e-6   -0.00523002   …  -2.20985e-13   1.05196e-12
 -1.92377e-5    2.14476e-5   -0.000180051      7.96194e-12  -3.79014e-11
  9.91107e-6   -1.07804e-5    0.000129376     -4.44381e-12   2.1154e-11
 -1.50826e-5    1.42813e-5    0.000145861      6.83341e-12  -3.25292e-11
  6.37565e-6   -5.92068e-6   -7.21201e-5      -4.05359e-12   1.92964e-11
  1.49406e-6   -1.03353e-6   -7.66895e-5   …   2.01848e-11  -9.60863e-11
 -8.76766e-7    8.08713e-7    1.27876e-5      -8.7208e-12    4.15138e-11
  3.24596e-6   -3.30136e-6    2.49045e-6       3.11066e-11  -1.48077e-10
  ⋮                        

In [16]:
M = kron(A, B)
c = ones(size(M, 2)) # rhs
# Method 1: form Kronecker product and Cholesky solve
x1 = cholesky(Symmetric(M)) \ c;

In [17]:
# Method 2: use (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
m, p = size(A, 1), size(B, 1)
x2 = vec(transpose(cholesky(Symmetric(A)) \ 
    transpose(cholesky(Symmetric(B)) \ reshape(c, p, m))));

In [18]:
# relative error
norm(x1 - x2) / norm(x1)

1.548434035267782e-7

In [19]:
using BenchmarkTools

# Method 1: form Kronecker and Cholesky solve
@benchmark cholesky(Symmetric(kron($A, $B))) \ $c

BenchmarkTools.Trial: 28 samples with 1 evaluation.
 Range (min … max):  170.260 ms … 264.623 ms  ┊ GC (min … max): 3.33% … 30.62%
 Time  (median):     174.065 ms               ┊ GC (median):    3.24%
 Time  (mean ± σ):   180.987 ms ±  23.241 ms  ┊ GC (mean ± σ):  6.39% ±  7.86%

In [20]:
# Method 2: use (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
@benchmark vec(transpose(cholesky(Symmetric($A)) \ 
    transpose(cholesky(Symmetric($B)) \ reshape($c, $p, $m))))

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  66.833 μs …  12.594 ms  ┊ GC (min … max):  0.00% … 99.18%
 Time  (median):     75.416 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   87.913 μs ± 376.408 μs  ┊ GC (mean ± σ):  12.36% ±  2.97%

## Sparse matrix

Sparsity: sparse matrix decompositions or iterative methods.  
* Sparsity is the most easily recognized structure. Familiarize yourself with the sparse matrix computation tools in Julia, Matlab, R (`Matrix` package), MKL (sparse BLAS), ... as much as possible.
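
Iterative methods such as conjugate gradient (CG) for symmetric positive definite $\mathbf{A}$ use $\mathbf{A}$ only through matrix-vector products, so each iteration costs $O(\text{nnz}(\mathbf{A}))$ for sparse $\mathbf{A}$. A bare-bones, unpreconditioned sketch; the name `cg_sketch`, the tolerance, and the test matrix are illustrative assumptions, and a library routine such as `IterativeSolvers.cg` offers more (e.g. preconditioning):

```julia
using LinearAlgebra, SparseArrays, Random

# Hypothetical helper: textbook conjugate gradient for SPD A, no preconditioner.
# A is only used through A * p, so any type supporting `*` with a vector works.
function cg_sketch(A, b::AbstractVector; rtol = 1e-10, maxiter = length(b))
    x  = zeros(float(eltype(b)), length(b))
    r  = float.(b)               # residual b - A x for x = 0
    p  = copy(r)                 # search direction
    rs = dot(r, r)
    for _ in 1:maxiter
        sqrt(rs) ≤ rtol * norm(b) && break
        Ap = A * p
        α  = rs / dot(p, Ap)     # step length along p
        x .+= α .* p
        r .-= α .* Ap
        rs_new = dot(r, r)
        p .= r .+ (rs_new / rs) .* p   # next A-conjugate search direction
        rs = rs_new
    end
    return x
end

Random.seed!(123)
Sp   = sprandn(500, 500, 0.01)
Aspd = Sp' * Sp + 10I            # sparse symmetric positive definite test matrix
bs   = randn(500)
xs   = cg_sketch(Aspd, bs)
@show norm(Aspd * xs - bs) / norm(bs)
```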

In [21]:
using MatrixDepot

Random.seed!(123)

# a 7701-by-7701 sparse pd matrix
A = matrixdepot("wathen", 50)
# randomly generated rhs
b = randn(size(A, 1))
Afull = Matrix(A)
count(!iszero, A) / length(A) # density: fraction of nonzero entries

0.001994776158751544

In [22]:
using UnicodePlots
spy(A)

(UnicodePlots `spy` plot of `A`: nonzeros form a band along the diagonal; legend: `> 0`, `< 0`)

### Matrix-vector multiplication
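
Julia's `SparseMatrixCSC` stores, column by column, the row indices (`rowvals`) and values (`nonzeros`) of the stored entries, and `nzrange(A, j)` gives the positions belonging to column `j`. A minimal sketch of a sparse matrix-vector product built on these accessors (the name `csc_matvec` is hypothetical) shows why the cost is $O(\text{nnz})$ rather than $O(n^2)$:

```julia
using SparseArrays, Random

# Hypothetical helper: y = A * x for A in compressed sparse column (CSC) format.
# Only stored entries are visited: O(nnz(A)) flops.
function csc_matvec(A::SparseMatrixCSC, x::AbstractVector)
    size(A, 2) == length(x) || throw(DimensionMismatch("size(A, 2) must equal length(x)"))
    y = zeros(promote_type(eltype(A), eltype(x)), size(A, 1))
    rows, vals = rowvals(A), nonzeros(A)
    for j in 1:size(A, 2)              # column j is scaled by x[j]
        xj = x[j]
        for k in nzrange(A, j)         # stored entries of column j
            y[rows[k]] += vals[k] * xj
        end
    end
    return y
end

Random.seed!(123)
Asp = sprandn(1000, 1000, 0.002)
xv  = randn(1000)
yv  = csc_matvec(Asp, xv)
@show nnz(Asp), maximum(abs, yv - Asp * xv)
```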

In [23]:
# dense matrix-vector multiplication
@benchmark $Afull * $b

BenchmarkTools.Trial: 645 samples with 1 evaluation.
 Range (min … max):  7.641 ms …  7.926 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     7.732 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.748 ms ± 54.508 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [24]:
# sparse matrix-vector multiplication
@benchmark $A * $b

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  62.416 μs …  12.228 ms  ┊ GC (min … max): 0.00% … 99.27%
 Time  (median):     67.041 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   69.873 μs ± 121.673 μs  ┊ GC (mean ± σ):  1.74% ±  0.99%

### Solve linear equation

In [25]:
# solve via dense Cholesky
xchol = cholesky(Symmetric(Afull)) \ b
@benchmark cholesky($(Symmetric(Afull))) \ $b

BenchmarkTools.Trial: 10 samples with 1 evaluation.
 Range (min … max):  493.011 ms … 509.650 ms  ┊ GC (min … max): 0.04% … 2.48%
 Time  (median):     500.659 ms               ┊ GC (median):    0.67%
 Time  (mean ± σ):   501.096 ms ±   6.445 ms  ┊ GC (mean ± σ):  1.13% ± 1.22%

In [26]:
# solve via sparse Cholesky
xcholsp = cholesky(Symmetric(A)) \ b
@show norm(xchol - xcholsp)
@benchmark cholesky($(Symmetric(A))) \ $b

norm(xchol - xcholsp) = 5.062481022858237e-15


BenchmarkTools.Trial: 749 samples with 1 evaluation.
 Range (min … max):  6.165 ms … 17.770 ms  ┊ GC (min … max): 0.00% … 1.85%
 Time  (median):     6.463 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.676 ms ±  1.162 ms  ┊ GC (mean ± σ):  0.12% ± 0.43%

In [27]:
# sparse solve via conjugate gradient
using IterativeSolvers

xcg = cg(A, b)
@show norm(xcg - xchol)
@benchmark cg($A, $b)

norm(xcg - xchol) = 1.418530254611637e-7


BenchmarkTools.Trial: 250 samples with 1 evaluation.
 Range (min … max):  19.719 ms …  24.666 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.935 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   20.051 ms ± 533.024 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

## Easy plus low rank

Easy plus low rank: $\mathbf{U}, \mathbf{V} \in \mathbb{R}^{n \times r}$, $r \ll n$. Woodbury formula and matrix determinant lemma:
\begin{eqnarray*}
    (\mathbf{A} + \mathbf{U} \mathbf{V}^T)^{-1} &=& \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{U} (\mathbf{I}_r + \mathbf{V}^T \mathbf{A}^{-1} \mathbf{U})^{-1} \mathbf{V}^T \mathbf{A}^{-1} \\
    \text{det}(\mathbf{A} + \mathbf{U} \mathbf{V}^T) &=& \text{det}(\mathbf{A}) \text{det}(\mathbf{I}_r + \mathbf{V}^T \mathbf{A}^{-1} \mathbf{U}).
\end{eqnarray*}

* Keep HW3 (multivariate density) and HW4 (PageRank) problems in mind.  

* The [`WoodburyMatrices.jl`](https://github.com/timholy/WoodburyMatrices.jl) package can be useful.
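
A minimal sketch of solving $(\mathbf{A} + \mathbf{U}\mathbf{V}^T)\mathbf{x} = \mathbf{b}$ with the Woodbury formula when $\mathbf{A}$ is diagonal: only solves with $\mathbf{A}$ and one $r \times r$ system are needed, $O(nr^2 + r^3)$ flops in total. The helper name `woodbury_solve` is hypothetical, and the test problem assumes $\mathbf{U} = \mathbf{B}\mathbf{D}$, $\mathbf{V} = \mathbf{B}$ with positive diagonal $\mathbf{A}$ and $\mathbf{D}$, so that $\mathbf{A} + \mathbf{U}\mathbf{V}^T = \mathbf{A} + \mathbf{B}\mathbf{D}\mathbf{B}^T$ is positive definite:

```julia
using LinearAlgebra, Random

# Hypothetical helper: x = (A + U V') \ b via the Woodbury formula, A diagonal.
# Never forms the n×n matrix; cost O(n r^2 + r^3).
function woodbury_solve(A::Diagonal, U::AbstractMatrix, V::AbstractMatrix, b::AbstractVector)
    Ainv_b = A \ b                    # A^{-1} b, n flops
    Ainv_U = A \ U                    # A^{-1} U, n r flops
    S = I + V' * Ainv_U               # r×r matrix I_r + V' A^{-1} U
    return Ainv_b - Ainv_U * (S \ (V' * Ainv_b))
end

Random.seed!(123)
nw, rw = 1000, 5
Aw = Diagonal(1 .+ rand(nw))          # positive diagonal
Bw = randn(nw, rw)
Dw = Diagonal(rand(rw))               # positive diagonal
Uw, Vw = Bw * Dw, Bw                  # so U V' = B D B'
bw = randn(nw)
xw = woodbury_solve(Aw, Uw, Vw, bw)
@show norm(xw - (Matrix(Aw) + Uw * Vw') \ bw)
```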

In [28]:
using BenchmarkTools, Random, WoodburyMatrices

Random.seed!(123)
n = 1000
r = 5

A = Diagonal(rand(n))
B = randn(n, r)
D = Diagonal(rand(r))
b = randn(n)
# Woodbury structure: W = A + B * D * B'
W = SymWoodbury(A, B, D)
Wfull = Matrix(W) # stored as a Matrix{Float64}

1000×1000 Matrix{Float64}:
  4.59798    -0.281279   -2.22717   …   2.15423     0.381752   -0.563599
 -0.281279    3.46848     1.39267       1.68392    -2.2848      0.956956
 -2.22717     1.39267     4.81456       0.879207   -0.516142    1.95953
  0.995152    0.835961   -0.198421      1.22304    -0.534476   -0.158006
 -4.18483     0.83762     2.07489      -3.41192    -2.40336     0.12795
  1.2769      0.450253    1.94231   …   3.05973     1.06062     1.58223
  1.2693     -0.993521   -0.522544     -0.180006   -0.0885834   0.433461
  0.263329    1.33478     1.84111       2.02565    -0.281757    1.85271
  1.56504    -1.2659     -1.79125      -0.0459542   0.927798   -0.648945
 -1.7482      0.478207    2.26841      -1.72566    -2.54041     1.36365
  0.0224748   1.0181      0.580171  …   0.640478   -0.998439   -0.171846
 -2.50815     1.01533     1.43186      -0.950055   -0.619325    0.162349
 -0.192541   -2.14724    -2.73878      -2.82316     0.861588   -1.74536
  ⋮                           

In [29]:
# compare storage (bytes)
Base.summarysize(W), Base.summarysize(Wfull)

(64720, 8000040)

### Solve linear equation

In [30]:
# solve via Cholesky
@benchmark cholesky($(Symmetric(Wfull))) \ $b

BenchmarkTools.Trial: 1823 samples with 1 evaluation.
 Range (min … max):  2.509 ms … 32.652 ms  ┊ GC (min … max): 0.00% …  0.00%
 Time  (median):     2.555 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.743 ms ±  1.372 ms  ┊ GC (mean ± σ):  5.67% ± 10.11%

In [31]:
# solve using Woodbury formula
@benchmark $W \ reshape($b, n, 1) # hack: pass b as an n×1 matrix; need to file an issue

BenchmarkTools.Trial: 10000 samples with 6 evaluations.
 Range (min … max):  5.674 μs …  3.233 ms  ┊ GC (min … max):  0.00% … 99.64%
 Time  (median):     7.625 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   9.099 μs ± 63.002 μs  ┊ GC (mean ± σ):  13.83% ±  1.99%

### Matrix-vector multiplication

In [32]:
# multiplication without using Woodbury structure
@benchmark $Wfull * $b

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  137.042 μs …  10.114 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     183.750 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   192.307 μs ± 208.956 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

In [33]:
# multiplication using Woodbury structure
@benchmark $W * $b

BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.120 μs …  2.410 ms  ┊ GC (min … max): 0.00% … 99.82%
 Time  (median):     2.588 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.880 μs ± 24.079 μs  ┊ GC (mean ± σ):  8.35% ±  1.00%

### Determinant

In [34]:
# determinant without using Woodbury structure
@benchmark det($Wfull)

BenchmarkTools.Trial: 1337 samples with 1 evaluation.
 Range (min … max):  3.504 ms … 13.603 ms  ┊ GC (min … max): 0.00% … 72.46%
 Time  (median):     3.570 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.740 ms ±  1.242 ms  ┊ GC (mean ± σ):  4.32% ±  9.32%

In [35]:
# determinant using Woodbury structure
@benchmark det($W)

BenchmarkTools.Trial: 10000 samples with 197 evaluations.
 Range (min … max):  455.797 ns … 104.728 μs  ┊ GC (min … max): 0.00% … 99.44%
 Time  (median):     489.005 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   507.681 ns ±   1.043 μs  ┊ GC (mean ± σ):  2.05% ±  0.99%

## Easy plus border

Easy plus border: For $\mathbf{A}$ pd and $\mathbf{V}$ full row rank,
$$
	\begin{pmatrix}
	\mathbf{A} & \mathbf{V}^T \\
	\mathbf{V} & \mathbf{0}
	\end{pmatrix}^{-1} = \begin{pmatrix}
	\mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \\
	(\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & - (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1}
	\end{pmatrix}.
$$
**Anyone interested in writing a package?**
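A minimal sketch of how the block-inverse formula is used to solve $\begin{pmatrix} \mathbf{A} & \mathbf{V}^T \\ \mathbf{V} & \mathbf{0} \end{pmatrix} \begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix} = \begin{pmatrix} \mathbf{b} \\ \mathbf{c} \end{pmatrix}$. Multiplying the inverse by the right-hand side gives $\mathbf{y} = (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} (\mathbf{V} \mathbf{A}^{-1} \mathbf{b} - \mathbf{c})$ and $\mathbf{x} = \mathbf{A}^{-1} (\mathbf{b} - \mathbf{V}^T \mathbf{y})$, so only an $m \times m$ system is ever factored. This assumes $\mathbf{A}$ is diagonal (solves cost $O(n)$) and $\mathbf{V}$ has $m \ll n$ rows; `bordered_solve` is a hypothetical name.

```julia
using LinearAlgebra, Random

# Solve [A Vᵀ; V 0] [x; y] = [b; c] via the block-inverse formula.
# Assumes A is pd and cheap to solve with, and V (m×n) has full row rank.
function bordered_solve(A, V, b, c)
    AinvVt = A \ V'                       # n×m: m solves with A
    S = cholesky(Symmetric(V * AinvVt))   # m×m Schur complement V A⁻¹ Vᵀ (pd)
    y = S \ (V * (A \ b) - c)
    x = A \ (b - V' * y)
    return x, y
end

Random.seed!(257)
n, m = 1000, 3
A = Diagonal(rand(n) .+ 1)
V = randn(m, n)
b, c = randn(n), randn(m)
x, y = bordered_solve(A, V, b, c)

# Dense reference solve of the (n+m)×(n+m) system, only for checking
K = [Matrix(A) V'; V zeros(m, m)]
xy_dense = K \ [b; c]
```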

## Orthogonal matrix

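Orthogonal $\mathbf{A}$: $n^2$ flops **at most**. Why? Because $\mathbf{A}^{-1} = \mathbf{A}^T$, the solution is just the matrix-vector product $\mathbf{x} = \mathbf{A}^T \mathbf{b}$. Permutation matrices, Householder matrices, Jacobi rotations, ... take even fewer flops.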
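A small sketch of three cases, with dense matrices built only to check the answers: a general orthogonal $\mathbf{Q}$ (taken from a QR factorization), a permutation matrix, and a Householder reflector $\mathbf{H} = \mathbf{I} - 2\mathbf{v}\mathbf{v}^T/(\mathbf{v}^T\mathbf{v})$. The permutation solve is pure data movement, and the Householder solve is $O(n)$ because $\mathbf{H}^{-1} = \mathbf{H}$ and $\mathbf{H}$ never needs to be formed.

```julia
using LinearAlgebra, Random

Random.seed!(257)
n = 500
b = randn(n)

# General orthogonal Q: Q⁻¹ = Qᵀ, so x = Qᵀb is one matrix-vector product
Q = Matrix(qr(randn(n, n)).Q)
x_orth = Q' * b

# Permutation P = I[p, :]: (Px)[i] = x[p[i]], so Px = b means x[p] = b (no flops)
p = randperm(n)
P = Matrix{Float64}(I, n, n)[p, :]
x_perm = similar(b)
x_perm[p] = b

# Householder H = I - 2vvᵀ/(vᵀv) is symmetric and orthogonal, so H⁻¹ = H;
# applying it costs O(n) flops without forming H
v = randn(n)
x_house = b - (2 * dot(v, b) / dot(v, v)) * v
H = I - 2 * (v * v') / dot(v, v)   # dense copy, only for checking
```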

## Toeplitz matrix

Toeplitz systems (constant diagonals):
$$
	\mathbf{T} = \begin{pmatrix}
	r_0 & r_1 & r_2 & r_3 \\
	r_{-1} & r_0 & r_1 & r_2 \\
	r_{-2} & r_{-1} & r_0 & r_1 \\
	r_{-3} & r_{-2} & r_{-1} & r_0
	\end{pmatrix}.
$$
$\mathbf{T} \mathbf{x} = \mathbf{b}$, where $\mathbf{T}$ is pd and Toeplitz, can be solved in $O(n^2)$ flops. Durbin algorithm (Yule-Walker equation), Levinson algorithm (general $\mathbf{b}$), Trench algorithm (inverse). These matrices occur in auto-regressive models and econometrics.

* [`ToeplitzMatrices.jl`](https://github.com/JuliaMatrices/ToeplitzMatrices.jl) package can be useful.
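To make the $O(n^2)$ claim concrete, here is a hand-rolled sketch of the Levinson recursion, assuming $\mathbf{T}$ is symmetric ($r_{-k} = r_k$) and given by its first column `r` (so `r[1]` $= r_0$). It keeps a forward vector $\mathbf{f}$ with $\mathbf{T}_k \mathbf{f} = \mathbf{e}_1$ on the leading $k \times k$ block; by symmetry the backward vector ($\mathbf{T}_k \mathbf{g} = \mathbf{e}_k$) is $\mathbf{f}$ reversed, and both are extended by one entry per step with $O(k)$ work. It does no pivoting, is written for clarity rather than speed, and is not the ToeplitzMatrices.jl implementation; `levinson` is a hypothetical name.

```julia
using LinearAlgebra, Random

# Solve T x = b with T[i, j] = r[abs(i - j) + 1] (symmetric Toeplitz) by Levinson recursion.
# Every leading principal submatrix must be nonsingular (guaranteed when T is pd).
function levinson(r::Vector{Float64}, b::Vector{Float64})
    n = length(b)
    length(r) == n || throw(DimensionMismatch("r and b must have the same length"))
    f = zeros(n)      # forward vector: T_k f = e_1 on the leading k×k block
    fnew = zeros(n)
    x = zeros(n)      # solution of the leading k×k system
    f[1] = 1 / r[1]
    x[1] = b[1] / r[1]
    for k in 1:n-1
        # ε: last row of T_{k+1} times [f; 0]
        ε = 0.0
        for i in 1:k
            ε += r[k + 2 - i] * f[i]
        end
        # new forward vector: ([f; 0] - ε [0; reverse(f)]) / (1 - ε^2)
        for i in 1:k+1
            fi = i <= k ? f[i] : 0.0
            gi = i >= 2 ? f[k + 2 - i] : 0.0
            fnew[i] = (fi - ε * gi) / (1 - ε^2)
        end
        copyto!(f, 1, fnew, 1, k + 1)
        # εx: last row of T_{k+1} times [x; 0]; fix the last residual entry
        # with the new backward vector reverse(f)
        εx = 0.0
        for i in 1:k
            εx += r[k + 2 - i] * x[i]
        end
        for i in 1:k+1
            x[i] += (b[k + 1] - εx) * f[k + 2 - i]
        end
    end
    return x
end

Random.seed!(257)
n = 200
r = [4.0; [1 / (k + 1)^2 for k in 1:n-1]]       # strictly diagonally dominant, hence pd
b = randn(n)
x = levinson(r, b)
T = [r[abs(i - j) + 1] for i in 1:n, j in 1:n]  # dense copy, only for checking
```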

## Circulant matrix

Circulant systems: Toeplitz matrix with wraparound
$$
	C(\mathbf{z}) = \begin{pmatrix}
	z_0 & z_4 & z_3 & z_2 & z_1 \\
	z_1 & z_0 & z_4 & z_3 & z_2 \\
	z_2 & z_1 & z_0 & z_4 & z_3 \\
	z_3 & z_2 & z_1 & z_0 & z_4 \\
	z_4 & z_3 & z_2 & z_1 & z_0
	\end{pmatrix},
$$
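Circulant matrices are diagonalized by the discrete Fourier transform, so $C(\mathbf{z}) \mathbf{x} = \mathbf{b}$ can be solved in $O(n \log n)$ flops by FFT-type algorithms. Related fast transforms include the DCT (discrete cosine transform) and DST (discrete sine transform).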
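A dependency-free sketch of the FFT approach. Since $C(\mathbf{z})\mathbf{x}$ is the circular convolution of $\mathbf{z}$ and $\mathbf{x}$, the DFT turns the system into $n$ scalar divisions: $\hat{\mathbf{x}} = \hat{\mathbf{b}} ./ \hat{\mathbf{z}}$, where $\hat{\mathbf{z}}$ holds the eigenvalues of $C(\mathbf{z})$. In practice one would call an FFT library such as FFTW.jl; to stay self-contained, this uses a textbook recursive radix-2 Cooley–Tukey FFT, which assumes $n$ is a power of 2. `fft_radix2`, `ifft_radix2`, and `circulant_solve` are hypothetical names.

```julia
using LinearAlgebra, Random

# Textbook recursive radix-2 DFT: y[k] = Σ_j x[j] exp(-2πi (j-1)(k-1)/n)
function fft_radix2(x::AbstractVector{<:Complex})
    n = length(x)
    n == 1 && return collect(x)
    ispow2(n) || throw(ArgumentError("length must be a power of 2"))
    even = fft_radix2(x[1:2:end])
    odd = fft_radix2(x[2:2:end])
    t = [cis(-2π * (k - 1) / n) for k in 1:n÷2] .* odd   # twiddle factors
    return [even .+ t; even .- t]
end

# Inverse DFT via conjugation: ifft(y) = conj(fft(conj(y))) / n
ifft_radix2(y) = conj.(fft_radix2(conj.(y))) ./ length(y)

# Solve C(z) x = b, where C(z) has first column z: C[i, j] = z[mod(i - j, n) + 1]
function circulant_solve(z::Vector{Float64}, b::Vector{Float64})
    zhat = fft_radix2(complex.(z))   # eigenvalues of C(z)
    bhat = fft_radix2(complex.(b))
    return real.(ifft_radix2(bhat ./ zhat))
end

Random.seed!(257)
n = 64
z = rand(n); z[1] += n   # strictly diagonally dominant, hence nonsingular
b = randn(n)
x = circulant_solve(z, b)
C = [z[mod(i - j, n) + 1] for i in 1:n, j in 1:n]   # dense copy, only for checking
```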

## Vandermonde matrix

Vandermonde matrices arise in interpolation and approximation problems:
$$
	\mathbf{V}(x_0,\ldots,x_n) = \begin{pmatrix}
	1 & 1 & \cdots & 1 \\
	x_0 & x_1 & \cdots & x_n \\
	\vdots & \vdots & & \vdots \\
	x_0^n & x_1^n & \cdots & x_n^n
	\end{pmatrix}.
$$
$\mathbf{V} \mathbf{x} = \mathbf{b}$ or $\mathbf{V}^T \mathbf{x} = \mathbf{b}$ can be solved in $O(n^2)$ flops.
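One $O(n^2)$ route for the transposed system, following the classical Björck–Pereyra approach: $\mathbf{V}^T \mathbf{a} = \mathbf{f}$ asks for the monomial coefficients $\mathbf{a}$ of the interpolating polynomial with $p(x_i) = f_i$, so compute Newton divided differences and then expand the Newton form into the monomial basis by nested multiplication. A sketch assuming distinct nodes; `vandermonde_solve_transpose` is a hypothetical name, and the system $\mathbf{V}\mathbf{x} = \mathbf{b}$ is not covered here.

```julia
using LinearAlgebra, Random

# Solve Vᵀ a = f for distinct nodes x: a holds the monomial coefficients of the
# polynomial p of degree ≤ N-1 with p(x[i]) = f[i]. O(N^2) flops.
function vandermonde_solve_transpose(x::Vector{Float64}, f::Vector{Float64})
    N = length(x)
    a = copy(f)
    # Stage 1: Newton divided differences; afterwards a[i] = f[x_1, ..., x_i]
    for k in 1:N-1
        for i in N:-1:k+1
            a[i] = (a[i] - a[i-1]) / (x[i] - x[i-k])
        end
    end
    # Stage 2: expand the Newton form into monomial coefficients
    for k in N-1:-1:1
        for i in k:N-1
            a[i] -= x[k] * a[i+1]
        end
    end
    return a
end

Random.seed!(257)
x = collect(range(-1.0, 1.0; length=8))   # distinct interpolation nodes
f = randn(8)
a = vandermonde_solve_transpose(x, f)
N = length(x)
V = [x[i]^(j - 1) for j in 1:N, i in 1:N]  # row j holds the (j-1)th powers, as in the display
```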

## Cauchy-like matrix

Cauchy-like matrices:
$$
	\Omega \mathbf{A} - \mathbf{A} \Lambda = \mathbf{R} \mathbf{S}^T,
$$
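where $\Omega = \text{diag}(\omega_1,\ldots,\omega_n)$, $\Lambda = \text{diag}(\lambda_1,\ldots, \lambda_n)$, and $\mathbf{R}, \mathbf{S} \in \mathbb{R}^{n \times r}$ with small $r$ (the displacement rank). LU and QR take $O(n^2)$ flops (for fixed $r$) instead of $O(n^3)$.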
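When $\omega_i \ne \lambda_j$, the displacement equation determines every entry: $a_{ij} = \mathbf{r}_i^T \mathbf{s}_j / (\omega_i - \lambda_j)$, with $\mathbf{r}_i^T$, $\mathbf{s}_j^T$ the rows of $\mathbf{R}$, $\mathbf{S}$. Gaussian elimination can therefore run on the generators instead of on $\mathbf{A}$: after one step the Schur complement is again Cauchy-like, and its generators follow from rank-one updates, so the $n \times n$ Schur complements are never formed. This is the idea behind the Gohberg–Kailath–Olshevsky (GKO) algorithm. Below is a minimal unpivoted sketch, assuming every leading principal minor is nonzero (e.g., $\mathbf{A}$ pd); `cauchylike_lu` is a hypothetical name.

```julia
using LinearAlgebra

# Unpivoted LU of a Cauchy-like A with Ω A - A Λ = R Sᵀ, computed from generators only.
function cauchylike_lu(ω::Vector{Float64}, λ::Vector{Float64},
                       R::Matrix{Float64}, S::Matrix{Float64})
    n = length(ω)
    R, S = copy(R), copy(S)   # generators of the current Schur complement
    L = Matrix{Float64}(I, n, n)
    U = zeros(n, n)
    for k in 1:n
        # row k of U and column k of L, read off from the generators
        for j in k:n
            U[k, j] = dot(R[k, :], S[j, :]) / (ω[k] - λ[j])
        end
        d = U[k, k]   # pivot
        for i in k+1:n
            L[i, k] = dot(R[i, :], S[k, :]) / (ω[i] - λ[k]) / d
        end
        # rank-one generator updates give the generators of the next Schur complement
        for i in k+1:n
            R[i, :] .-= L[i, k] .* R[k, :]
        end
        for j in k+1:n
            S[j, :] .-= (U[k, j] / d) .* S[k, :]
        end
    end
    return L, U
end

n = 8
ω = collect(1.0:n)           # Ω = diag(ω)
λ = -collect(1.0:n)          # Λ = diag(λ), so ω_i - λ_j = i + j ≠ 0
R = [ones(n) 0.1 .* (1:n)]   # n×2 generators (r = 2)
S = copy(R)                  # this choice makes A symmetric pd
A = [dot(R[i, :], S[j, :]) / (ω[i] - λ[j]) for i in 1:n, j in 1:n]   # dense copy, only for checking
L, U = cauchylike_lu(ω, λ, R, S)
```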

## Structured-rank matrix

Structured-rank problems: semiseparable matrices (LU and QR take $O(n)$ flops), quasiseparable matrices, ...
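A concrete special case, as a sketch: the Kac–Murdock–Szegő matrix $a_{ij} = \rho^{|i-j|}$ with $0 < |\rho| < 1$ (the correlation matrix of a stationary AR(1) process) is symmetric semiseparable, since its lower triangle $a_{ij} = \rho^{i} \rho^{-j}$, $i \ge j$, has rank one. Its inverse is the tridiagonal matrix $\frac{1}{1-\rho^2} \, \text{tridiag}\big(-\rho, \; (1, 1+\rho^2, \ldots, 1+\rho^2, 1), \; -\rho\big)$, so $\mathbf{A}\mathbf{x} = \mathbf{b}$ reduces to an $O(n)$ tridiagonal matrix-vector product. General semiseparable solvers work directly from the generators instead; `kms_solve` is a hypothetical name.

```julia
using LinearAlgebra, Random

# Solve A x = b for the KMS matrix A[i, j] = ρ^|i - j| via its tridiagonal inverse, O(n).
# Assumes n ≥ 2 and 0 < |ρ| < 1.
function kms_solve(ρ::Float64, b::Vector{Float64})
    n = length(b)
    dv = fill(1 + ρ^2, n)   # diagonal of (1 - ρ^2) A⁻¹
    dv[1] = dv[n] = 1.0
    ev = fill(-ρ, n - 1)    # off-diagonal of (1 - ρ^2) A⁻¹
    return (SymTridiagonal(dv, ev) * b) ./ (1 - ρ^2)
end

Random.seed!(257)
n, ρ = 500, 0.6
b = randn(n)
x = kms_solve(ρ, b)
A = [ρ^abs(i - j) for i in 1:n, j in 1:n]   # dense copy, only for checking
```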