System information (for reproducibility):

In [9]:
versioninfo()

Julia Version 1.11.5
Commit 760b2e5b739 (2025-04-14 06:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin24.0.0)
  CPU: 12 × Apple M2 Max
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, apple-m2)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
  JULIA_NUM_THREADS = 8
  JULIA_EDITOR = code


Load packages:

In [10]:
using Pkg

Pkg.activate(pwd())
Pkg.instantiate()
Pkg.status()

Status `~/Documents/github.com/ucla-biostat-257/2025spring/slides/15-linreg/Project.toml`
  [6e4b80f9] BenchmarkTools v1.6.0
  [7522ee7d] SweepOperator v0.3.4
  [37e2e46d] LinearAlgebra v1.11.0


  Activating project at `~/Documents/github.com/ucla-biostat-257/2025spring/slides/15-linreg`


## Comparing methods for linear regression

Methods for solving linear regression $\widehat \beta = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$:

| Method            | Flops                  | Remarks                 | Software | Stability   |
| :---------------: | :--------------------: | :---------------------: | :------: | :---------: |
| Sweep             | $np^2 + p^3$           | $(X^TX)^{-1}$ available | SAS      | less stable |
| Cholesky          | $np^2 + p^3/3$         |                         |          | less stable |
| QR by Householder | $2np^2 - (2/3)p^3$     |                         | R        | stable      |
| QR by MGS         | $2np^2$                | $Q_1$ available         |          | stable      | 
| SVD               | $4n^2p + 8np^2 + 9p^3$ | $X = UDV^T$             |          | most stable |
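
To make the table concrete, the leading-order counts can be evaluated for a given problem size. The sketch below uses $n = 1000$, $p = 300$, the size benchmarked later in this section. The `flops` named tuple and its field names are illustrative and not part of any package. These counts ignore lower-order terms and memory traffic, so measured timings need not follow them closely.

```julia
# Leading-order flop counts from the table, as functions of n and p.
# The container and field names are illustrative assumptions.
flops = (
    sweep       = (n, p) -> n * p^2 + p^3,
    cholesky    = (n, p) -> n * p^2 + p^3 / 3,
    householder = (n, p) -> 2n * p^2 - (2 / 3) * p^3,
    mgs         = (n, p) -> 2n * p^2,
    svd         = (n, p) -> 4n^2 * p + 8n * p^2 + 9p^3,
)

n, p = 1000, 300
for (name, f) in pairs(flops)
    # report each count in millions of flops
    println(rpad(string(name), 12), round(f(n, p) / 1e6; digits = 1), " Mflop")
end
```

By these counts, Cholesky needs roughly 85% of the sweep flops and about 60% of the Householder QR flops at this size.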

Remarks:

1. When $n \gg p$, sweep and Cholesky need about half the flops of QR ($np^2$ versus $2np^2$) and less space.
2. Sweep and Cholesky are based on the **Gram matrix** $\mathbf{X}^T \mathbf{X}$, which can be updated incrementally as data arrive. They can therefore handle data sets with huge $n$ and moderate $p$ that do not fit into memory.
3. QR methods are more stable and produce numerically more accurate solutions.
4. Although sweep is slower than Cholesky, it yields $(\mathbf{X}^T \mathbf{X})^{-1}$ as a by-product, and hence standard errors and related quantities.
5. MGS appears slower than Householder, but it yields $\mathbf{Q}_1$ explicitly.
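
Remark 2 can be illustrated with a minimal sketch that accumulates $\mathbf{X}^T \mathbf{X}$ and $\mathbf{X}^T \mathbf{y}$ one block of rows at a time, so that only $p \times p$ and $p \times 1$ arrays stay in memory. The function name `gram_accumulate` and the in-memory `chunks` generator are assumptions for illustration. In practice the chunks would come from disk or a stream.

```julia
using LinearAlgebra, Random

# Accumulate X'X and X'y over chunks of rows (hypothetical helper).
function gram_accumulate(chunks, p::Integer)
    XtX = zeros(p, p)
    Xty = zeros(p)
    for (Xc, yc) in chunks
        mul!(XtX, Xc', Xc, 1.0, 1.0)  # XtX += Xc'Xc
        mul!(Xty, Xc', yc, 1.0, 1.0)  # Xty += Xc'yc
    end
    return XtX, Xty
end

Random.seed!(257)
n, p, chunk = 1_000, 5, 100
X = randn(n, p)
y = randn(n)

# Stand-in for a data source that delivers 100 rows at a time.
chunks = ((X[i:i+chunk-1, :], y[i:i+chunk-1]) for i in 1:chunk:n)

XtX, Xty = gram_accumulate(chunks, p)
beta_stream = cholesky(Symmetric(XtX)) \ Xty  # normal equations on the accumulated Gram matrix
beta_full = X \ y                             # QR on the full design, for comparison

@show norm(beta_stream - beta_full)
```

The `Symmetric` wrapper tells `cholesky` to read only one triangle, so small asymmetries from the accumulation do not matter.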

> **There is simply no such thing as a universal 'gold standard' when it comes to algorithms.**

## Benchmark

In [11]:
using SweepOperator, BenchmarkTools, LinearAlgebra

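# Cholesky on the Gram matrix: solve the normal equations X'X β = X'y
linreg_cholesky(y::Vector, X::Matrix) = cholesky!(X'X) \ (X'y)

# Backslash on a rectangular X solves least squares via pivoted QR
linreg_qr(y::Vector, X::Matrix) = X \ y

# Sweep the augmented Gram matrix [X y]'[X y] on its first p diagonal entries
function linreg_sweep(y::Vector, X::Matrix)
    p = size(X, 2)
    xy = [X y]
    tableau = xy'xy
    sweep!(tableau, 1:p)
    return tableau[1:p, end] # last column now holds the coefficients
end

# Thin SVD X = U D V': coefficients are V * (U'y ./ singular values)
function linreg_svd(y::Vector, X::Matrix)
    xsvd = svd(X)
    return xsvd.V * ((xsvd.U'y) ./ xsvd.S)
end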

linreg_svd (generic function with 1 method)
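
Remark 4 above notes that sweeping also yields standard errors. The sketch below shows how, using a hand-written textbook sweep (`sweep_naive!`, a hypothetical helper that works on the full matrix and does no pivot checks) rather than the SweepOperator implementation. Sweeping the tableau $[\mathbf{X} \ \mathbf{y}]^T [\mathbf{X} \ \mathbf{y}]$ on its first $p$ diagonal entries leaves $-(\mathbf{X}^T \mathbf{X})^{-1}$ in the leading block, $\widehat \beta$ in the last column, and the residual sum of squares in the corner.

```julia
using LinearAlgebra, Random

# Textbook sweep of A on pivot k (illustrative, not the SweepOperator code).
function sweep_naive!(A::Matrix{Float64}, k::Int)
    d = A[k, k]
    m = size(A, 1)
    # update every entry outside row k and column k
    for j in 1:m, i in 1:m
        if i != k && j != k
            A[i, j] -= A[i, k] * A[k, j] / d
        end
    end
    # scale row k and column k by the pivot
    for i in 1:m
        if i != k
            A[i, k] /= d
            A[k, i] /= d
        end
    end
    A[k, k] = -1 / d
    return A
end

Random.seed!(257)
n, p = 50, 3
X = randn(n, p)
y = randn(n)

xy = [X y]
T = xy'xy                      # tableau [X'X X'y; y'X y'y]
for k in 1:p
    sweep_naive!(T, k)
end

beta = T[1:p, end]             # least squares coefficients
sse = T[end, end]              # residual sum of squares
sigma2 = sse / (n - p)         # unbiased estimate of the error variance
se = sqrt.(-sigma2 .* diag(T)[1:p])  # T[1:p, 1:p] holds -(X'X)^{-1}

@show beta se
```

The degrees of freedom $n - p$ assume that all $p$ columns of $\mathbf{X}$, including any intercept column, are swept.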

In [12]:
using Random

Random.seed!(123) # seed

n, p = 10, 3
X = randn(n, p)
y = randn(n)

# check that all four methods give the same answer
@show linreg_cholesky(y, X)
@show linreg_qr(y, X)
@show linreg_sweep(y, X)
@show linreg_svd(y, X);

linreg_cholesky(y, X) = [-0.07196570434574735, -0.13575699455859386, -0.18820199689456507]
linreg_qr(y, X) = [-0.07196570434574738, -0.1357569945585939, -0.18820199689456502]
linreg_sweep(y, X) = [-0.07196570434574734, -0.1357569945585939, -0.188201996894565]
linreg_svd(y, X) = [-0.07196570434574735, -0.13575699455859377, -0.1882019968945651]
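
The four methods agree to roughly 15 digits here because a small Gaussian design is well conditioned. Remark 3 matters when $\mathbf{X}$ is ill conditioned: forming the Gram matrix squares the condition number, so the Gram-based methods may lose many more digits than QR. The sketch below uses a Läuchli-type design chosen for illustration (it is not from the benchmark) with a known exact solution, and prints the error of each method.

```julia
using LinearAlgebra

delta = 1e-7
X = [1.0 1.0; delta 0.0; 0.0 delta]  # nearly collinear columns, cond(X) ≈ √2 / delta
beta_true = [1.0, 1.0]
y = X * beta_true                    # consistent system, so beta_true is the exact solution

beta_chol = cholesky(Symmetric(X'X)) \ (X'y)  # normal equations: works with cond(X)^2
beta_qr = X \ y                               # pivoted QR: works with cond(X)

@show cond(X)
@show norm(beta_chol - beta_true)
@show norm(beta_qr - beta_true)
```

With a smaller `delta` (around `1e-8`), $\mathbf{X}^T \mathbf{X}$ becomes numerically singular in double precision and the Cholesky factorization can be expected to fail, while QR still applies.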


In [13]:
n, p = 1000, 300
X = randn(n, p)
y = randn(n)

@benchmark linreg_cholesky(y, X)

BenchmarkTools.Trial: 6327 samples with 1 evaluation per sample.
 Range (min … max):  665.625 μs …   4.052 ms  ┊ GC (min … max): 0.00% … 80.71%
 Time  (median):     750.000 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   789.258 μs ± 160.343 μs  ┊ GC (mean ± σ):  4.17% ±  9.70%

In [14]:
@benchmark linreg_sweep(y, X)

BenchmarkTools.Trial: 1027 samples with 1 evaluation per sample.
 Range (min … max):  4.493 ms …   8.864 ms  ┊ GC (min … max): 0.00% … 45.70%
 Time  (median):     4.678 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.869 ms ± 413.134 μs  ┊ GC (mean ± σ):  4.02% ±  6.51%

In [15]:
@benchmark linreg_qr(y, X)

BenchmarkTools.Trial: 360 samples with 1 evaluation per sample.
 Range (min … max):  12.866 ms … 45.692 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     13.332 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   13.901 ms ±  2.718 ms  ┊ GC (mean ± σ):  1.43% ± 3.46%

In [16]:
@benchmark linreg_svd(y, X)

BenchmarkTools.Trial: 191 samples with 1 evaluation per sample.
 Range (min … max):  24.808 ms …  30.799 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     25.986 ms               ┊ GC (median):    1.52%
 Time  (mean ± σ):   26.180 ms ± 851.641 μs  ┊ GC (mean ± σ):  1.92% ± 2.00%