System information (for reproducibility):

In [1]:
versioninfo()

Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.5.0)
  CPU: 8 × Apple M1 Pro
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 6 on 6 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 6


In [1]:
using Pkg

Pkg.activate(pwd())
Pkg.instantiate()
Pkg.status()

[32m[1m  Activating[22m[39m project at `~/Documents/UCLA_files/course_work/BIS_M257/hw/biostat-257-2023-spring/hw3`


[32m[1mStatus[22m[39m `~/Documents/UCLA_files/course_work/BIS_M257/hw/biostat-257-2023-spring/hw3/Project.toml`
 [90m [6e4b80f9] [39mBenchmarkTools v1.3.2
 [90m [31c24e10] [39mDistributions v0.25.88
 [90m [37e2e46d] [39mLinearAlgebra
 [90m [9a3f8284] [39mRandom


Load packages:

In [1]:
using LinearAlgebra, Random
using BenchmarkTools, Distributions

Consider a linear mixed effects model
$$
    \mathbf{Y}_i = \mathbf{X}_i \boldsymbol{\beta} + \mathbf{Z}_i \boldsymbol{\gamma} + \boldsymbol{\epsilon}_i, \quad i=1,\ldots,n,
$$
where   
- $\mathbf{Y}_i \in \mathbb{R}^{n_i}$ is the response vector of $i$-th individual,  
- $\mathbf{X}_i \in \mathbb{R}^{n_i \times p}$ is the fixed effect predictor matrix of $i$-th individual,  
- $\mathbf{Z}_i \in \mathbb{R}^{n_i \times q}$ is the random effect predictor matrix of $i$-th individual,  
- $\boldsymbol{\epsilon}_i \in \mathbb{R}^{n_i}$ are multivariate normal $N(\mathbf{0}_{n_i},\sigma^2 \mathbf{I}_{n_i})$,  
- $\boldsymbol{\beta} \in \mathbb{R}^p$ are fixed effects, and  
- $\boldsymbol{\gamma} \in \mathbb{R}^q$ are random effects assumed to be $N(\mathbf{0}_q, \boldsymbol{\Sigma}_{q \times q}$) independent of $\boldsymbol{\epsilon}_i$.

## Q1 Formula (10 pts)

Write down the log-likelihood of the $i$-th datum $(\mathbf{y}_i, \mathbf{X}_i, \mathbf{Z}_i)$ given parameters $(\boldsymbol{\beta}, \boldsymbol{\Sigma}, \sigma^2)$. 

### Q1 Solution

According to the notations, we have: $\mathbf{y}_i \sim \mathcal{N}(\mathbf{X}_i\boldsymbol{\beta},\ \mathbf{Z}_i\boldsymbol{\Sigma}_{q\times q}\mathbf{Z}_i^T+\sigma^2\mathbf{I}_{n_i})$.

For a random vector $\mathbf{x}$ following a Multivariate Normal Distribution $\mathcal{N}(\boldsymbol{\mu},\ \boldsymbol{\Sigma})$, the probability density function is given by: $$f(\mathbf{x})=\frac{1}{\sqrt{(2\pi)^k\det(\boldsymbol{\Sigma})}}\exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$$

So, we can plug in the parameters and get:
$$
\begin{aligned}
\mathcal{L}(\boldsymbol{\beta},\boldsymbol{\Sigma}, \sigma^2) &= \log\mathbf{P}(y_i|\boldsymbol{\beta},\boldsymbol{\Sigma}, \sigma^2)\\
&= \log\left(\frac{1}{\sqrt{(2\pi)^{n_i}\det(\mathbf{Z}_i\boldsymbol{\Sigma}_{q\times q}\mathbf{Z}_i^T+\sigma^2\mathbf{I}_{n_i})}}\exp\left(-\frac{1}{2}(\mathbf{y}_i-\mathbf{X}_i\boldsymbol{\beta})^T(\mathbf{Z}_i\boldsymbol{\Sigma}_{q\times q}\mathbf{Z}_i^T+\sigma^2\mathbf{I}_{n_i})^{-1}(\mathbf{y}_i-\mathbf{X}_i\boldsymbol{\beta})\right)\right)\\
&= -\frac{1}{2}\log\left((2\pi)^{n_i}\det(\mathbf{Z}_i\boldsymbol{\Sigma}_{q\times q}\mathbf{Z}_i^T+\sigma^2\mathbf{I}_{n_i})\right)
-\frac{1}{2}(\mathbf{y}_i-\mathbf{X}_i\boldsymbol{\beta})^T(\mathbf{Z}_i\boldsymbol{\Sigma}_{q\times q}\mathbf{Z}_i^T+\sigma^2\mathbf{I}_{n_i})^{-1}(\mathbf{y}_i-\mathbf{X}_i\boldsymbol{\beta}) \\
&= -\frac{1}{2}\log\left(\det(\mathbf{Z}_i\boldsymbol{\Sigma}_{q\times q}\mathbf{Z}_i^T+\sigma^2\mathbf{I}_{n_i})\right) 
-\frac{1}{2}(\mathbf{y}_i-\mathbf{X}_i\boldsymbol{\beta})^T(\mathbf{Z}_i\boldsymbol{\Sigma}_{q\times q}\mathbf{Z}_i^T+\sigma^2\mathbf{I}_{n_i})^{-1}(\mathbf{y}_i-\mathbf{X}_i\boldsymbol{\beta})
-\frac{n_i}{2}\log\left(2\pi\right) \\
\end{aligned}
$$

## Q2 Start-up code

Use the following template to define a type `LmmObs` that holds an LMM datum $(\mathbf{y}_i, \mathbf{X}_i, \mathbf{Z}_i)$. 

In [2]:
# define a type that holds LMM datum
struct LmmObs{T <: AbstractFloat}
    # data
    y :: Vector{T}
    X :: Matrix{T}
    Z :: Matrix{T}
    # working arrays
    # whatever intermediate vectors/arrays you may want to pre-allocate
    storage_p  :: Vector{T}
    storage_q  :: Vector{T}
    storage_q2 :: Vector{T}
    xtx        :: Matrix{T}
    ztx        :: Matrix{T}
    ztz        :: Matrix{T}
    xty        :: Vector{T}
    zty        :: Vector{T}
    storage_qq :: Matrix{T}
end

# constructor
function LmmObs(
        y::Vector{T}, 
        X::Matrix{T}, 
        Z::Matrix{T}
        ) where T <: AbstractFloat
    storage_p  = Vector{T}(undef, size(X, 2))
    storage_q  = Vector{T}(undef, size(Z, 2))
    storage_q2 = Vector{T}(undef, size(Z, 2))
    xtx        = transpose(X) * X
    ztx        = transpose(Z) * X
    ztz        = transpose(Z) * Z
    xty        = transpose(X) * y
    zty        = transpose(Z) * y
    storage_qq = similar(ztz)
    LmmObs(y, X, Z, storage_p, storage_q, storage_q2, 
           xtx, ztx, ztz, xty, zty, storage_qq)
end

LmmObs

Write a function, with interface   
```julia
logl!(obs, β, L, σ²)
```
that evaluates the log-likelihood of the $i$-th datum. Here `L` is the lower triangular Cholesky factor from the Cholesky decomposition `Σ=LL'`. Make your code efficient in the $n_i \gg q$ case. Think the intensive longitudinal measurement setting.  

**Hint**: This function shouldn't be very long. Mine, obeying 92-character rule, is 30 lines. If you find yourself writing very long code, you're on the wrong track. Think about algorithm (flop count) first then use BLAS functions to reduce memory allocations.

### Q2 Solution

In [136]:
function logl!(
    obs :: LmmObs{T}, 
    β   :: Vector{T}, 
    L   :: Matrix{T}, 
    σ²  :: T) where T <: AbstractFloat

n, p, q = size(obs.X, 1), size(obs.X, 2), size(obs.Z, 2)    
# compute and return the log-likelihood

l1, l2, l3, l = 0.0, 0.0, 0.0, 0.0
# part 1 determinant
mul!(obs.storage_qq, UpperTriangular(L'), obs.ztz)
rmul!(obs.storage_qq, LowerTriangular(L))
obs.storage_qq .+= Diagonal(ones(q)) .* σ² # one allocation 
cholesky!(Hermitian(obs.storage_qq))

l1 = prod(diag(obs.storage_qq))^2          # one allocation 

# part 2 inversion
obs.storage_qq .= inv(UpperTriangular(obs.storage_qq)) # one allocation
mul!(obs.storage_qq, LowerTriangular(obs.storage_qq'), 
                     UpperTriangular(obs.storage_qq))

mul!(obs.storage_p, obs.xtx, β)
mul!(obs.storage_q, obs.ztx, β)
obs.storage_q .= obs.zty .- obs.storage_q
mul!(obs.storage_q, UpperTriangular(L'), obs.storage_q)
copyto!(obs.storage_q2, obs.storage_q)
mul!(obs.storage_q, obs.storage_qq, obs.storage_q)

l2 = obs.y' * obs.y - 2 * obs.xty' * β + β' * obs.storage_p
l3 = obs.storage_q2' * obs.storage_q

l = -0.5 * log(l1) - 0.5 * (l2 - l3) / σ² - 0.5 * n * log(2π)

return l

end

logl! (generic function with 1 method)

In [137]:
# test code 1
logl!(obs, β, L, σ²)

-4325.35618025044

In [138]:
#test code 2
@benchmark logl!($obs, $β, $L, $σ²)

BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range [90m([39m[36m[1mmin[22m[39m … [35mmax[39m[90m):  [39m[36m[1m1.981 μs[22m[39m … [35m 16.250 μs[39m  [90m┊[39m GC [90m([39mmin … max[90m): [39m0.00% … 0.00%
 Time  [90m([39m[34m[1mmedian[22m[39m[90m):     [39m[34m[1m2.736 μs               [22m[39m[90m┊[39m GC [90m([39mmedian[90m):    [39m0.00%
 Time  [90m([39m[32m[1mmean[22m[39m ± [32mσ[39m[90m):   [39m[32m[1m2.535 μs[22m[39m ± [32m495.370 ns[39m  [90m┊[39m GC [90m([39mmean ± σ[90m):  [39m0.00% ± 0.00%

  [39m▅[39m█[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [32m [39m[39m [39m [39m [39m [39m [39m [34m [39m[39m [39m [39m [39m [39m▁[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m 
  [39m█[39m█[39m▃[39m▁[39m▁[39m▂[

## Q3 Correctness (15 pts)

Compare your result (both accuracy and timing) to the [Distributions.jl](https://juliastats.org/Distributions.jl/stable/multivariate/#Distributions.AbstractMvNormal) package using following data.

In [24]:
Random.seed!(257)
# dimension
n, p, q = 2000, 5, 3
# predictors
X  = [ones(n) randn(n, p - 1)]
Z  = [ones(n) randn(n, q - 1)]
# parameter values
β  = [2.0; -1.0; rand(p - 2)]
σ² = 1.5
Σ  = fill(0.1, q, q) + 0.9I
# generate y
y  = X * β + Z * rand(MvNormal(Σ)) + sqrt(σ²) * randn(n)

# form an LmmObs object
obs = LmmObs(y, X, Z)

LmmObs{Float64}([-1.450910909560209, 1.5185224894450862, 5.265021705624027, 4.485272594164557, 0.694969966642933, 1.7723256696372405, 1.1065838446466518, 3.729166811829607, 4.288899999400642, 2.8241842645202406  …  4.058027151891634, 1.0909724390970443, 0.026692243086209766, -0.8927757653299448, 6.94725248926293, 3.5193020855673436, 4.914007299083773, 2.1610206566690797, 1.857389542694909, 6.513818951020866], [1.0 0.6790633442371218 … 0.5400611947971554 -0.632040682052606; 1.0 1.2456776800889142 … -0.4818455756130373 0.6467830314674976; … ; 1.0 0.0733124748775436 … 0.6125080259511859 0.4181258283983667; 1.0 -1.336609049786048 … -0.18567490803712938 1.0745977099307227], [1.0 -1.0193326822839996 -0.15855601254314888; 1.0 1.7462667837699666 -0.4584376230657152; … ; 1.0 1.4843185594903878 0.42458303115266854; 1.0 0.3791714762820068 0.25150666970865837], [5.0e-324, 5.0e-324, 5.0e-324, 5.0e-324, 0.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [2000.0 -16.870943820386742 … -4.678756487678475 -33.0139

This is the standard way to evaluate log-density of a multivariate normal, using the Distributions.jl package. Let's evaluate the log-likelihood of this datum.

In [4]:
μ  = X * β
Ω  = Z * Σ * transpose(Z) +  σ² * I
mvn = MvNormal(μ, Symmetric(Ω)) # MVN(μ, Σ)
logpdf(mvn, y)

-3256.179335805832

Check that your answer matches that from Distributions.jl

In [70]:
L = Matrix(cholesky(Σ).L)
logl!(obs, β, L, σ²)

-4325.35618025044

**You will lose all 15 + 30 + 30 = 75 points** if the following statement throws `AssertionError`.

In [41]:
@assert logl!(obs, β, Matrix(cholesky(Σ).L), σ²) ≈ logpdf(mvn, y)

## Q4 Efficiency (30 pts)

Benchmarking your code and compare to the Distributions.jl function `logpdf`.

In [103]:
# benchmark the `logpdf` function in Distribution.jl
bm1 = @benchmark logpdf($mvn, $y)

BenchmarkTools.Trial: 8448 samples with 1 evaluation.
 Range [90m([39m[36m[1mmin[22m[39m … [35mmax[39m[90m):  [39m[36m[1m558.875 μs[22m[39m … [35m807.541 μs[39m  [90m┊[39m GC [90m([39mmin … max[90m): [39m0.00% … 0.00%
 Time  [90m([39m[34m[1mmedian[22m[39m[90m):     [39m[34m[1m577.958 μs               [22m[39m[90m┊[39m GC [90m([39mmedian[90m):    [39m0.00%
 Time  [90m([39m[32m[1mmean[22m[39m ± [32mσ[39m[90m):   [39m[32m[1m585.918 μs[22m[39m ± [32m 22.740 μs[39m  [90m┊[39m GC [90m([39mmean ± σ[90m):  [39m0.00% ± 0.00%

  [39m [39m [39m▁[39m▃[39m▄[39m▆[39m▇[39m█[39m█[39m█[34m▇[39m[39m▇[39m▆[39m▆[32m▅[39m[39m▅[39m▄[39m▄[39m▃[39m▃[39m▃[39m▃[39m▂[39m▂[39m▃[39m▁[39m▁[39m▁[39m▁[39m▁[39m [39m [39m [39m [39m▁[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m▁[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m▃
  [39m▅[39m▅[39m█[39

In [104]:
# benchmark your implementation
L = Matrix(cholesky(Σ).L)
bm2 = @benchmark logl!($obs, $β, $L, $σ²)

BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range [90m([39m[36m[1mmin[22m[39m … [35mmax[39m[90m):  [39m[36m[1m1.986 μs[22m[39m … [35m 32.519 μs[39m  [90m┊[39m GC [90m([39mmin … max[90m): [39m0.00% … 0.00%
 Time  [90m([39m[34m[1mmedian[22m[39m[90m):     [39m[34m[1m2.032 μs               [22m[39m[90m┊[39m GC [90m([39mmedian[90m):    [39m0.00%
 Time  [90m([39m[32m[1mmean[22m[39m ± [32mσ[39m[90m):   [39m[32m[1m2.131 μs[22m[39m ± [32m440.235 ns[39m  [90m┊[39m GC [90m([39mmean ± σ[90m):  [39m0.00% ± 0.00%

  [39m▃[39m█[34m█[39m[39m▅[39m▃[39m▁[39m [39m▂[32m▃[39m[39m▂[39m▁[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m▁[39m▃[39m▃[39m▁[39m [39m [39m [39m [39m [39m [39m▂
  [39m█[39m█[34m█[39m[39m█[39m█[3

The points you will get is
$$
\frac{x}{1000} \times 30,
$$
where $x$ is the speedup of your program against the standard method.

In [105]:
# this is the points you'll get
clamp(median(bm1).time / median(bm2).time / 1000 * 30, 0, 30)

8.53144497293751

**Hint**: Apparently I am using 1000 as denominator because I expect your code to be at least $1000 \times$ faster than the standard method.

## Q5 Memory (30 pts)

You want to avoid memory allocation in the "hot" function `logl!`. You will lose 1 point for each `1 KiB = 1024 bytes` memory allocation. In other words, the points you get for this question is

In [106]:
clamp(30 - median(bm2).memory / 1024, 0, 30)

29.71875

**Hint**: I am able to reduce the memory allocation to 0 bytes.

## Q6 Misc (15 pts)

Coding style, Git workflow, etc. For reproducibity, make sure we (TA and myself) can run your Jupyter Notebook. That is how we grade Q4 and Q5. If we cannot run it, you will get zero points.