In [1]:
# load necessary packages; make sure install them first
using BenchmarkTools, CSV, DataFrames, DelimitedFiles, Distributions
using Ipopt, LinearAlgebra, MathOptInterface, MixedModels, NLopt
using PrettyTables, Random, RCall

const MOI = MathOptInterface

MathOptInterface

### **Q1. (Optional, 30 bonus pts) Derivatives**

1. Prove the following derivatives:

- $\nabla_\boldsymbol{\beta} \ell_i (\boldsymbol{\beta}, \mathbf{L}, \sigma^2) = \mathbf{X_i}^{T} \mathbf{\Omega_i}^{-1}\mathbf{r_i}$,
- $\nabla_{\sigma^2} \ell_i (\boldsymbol{\beta}, \mathbf{L}, \sigma^2) = -\frac{1}{2} tr(\mathbf{\Omega_i}^{-1}) + \frac{1}{2}\mathbf{r_i^{T}\Omega_i^{-2}r_i}$,

### **Q2. (20 pts) Objective and gradient evaluator for a single datum**

We expand the code from HW3 to evaluate both objective and gradient. I provide my code for HW3 below as a starting point. You do not have to use this code. If your come up faster code, that's even better.

#### **Expansion of** $\nabla_\boldsymbol{\beta} \ell_i (\boldsymbol{\beta}, \mathbf{L}, \sigma^2) = \mathbf{X_i}^{T} \mathbf{\Omega_i}^{-1}\mathbf{r_i}$: 

We can use the Sherman Woodbury formula and a Cholesky decomposition to simplify $\Omega_i = (\sigma^2I + Z_iLL^{T}Z_i^{T})^{-1}$, resulting in: $\Omega_i^{-1} = \frac{1}{\sigma^2}I - \frac{1}{\sigma^2}Z_iL(AA^{T})^{-1}L^{T}Z_i^{T}$

$(AA^{T})^{-1}$ is the result of the Cholesky decomposition. A is a lower triangular matrix, and A' is an upper triangular matrix. However, in the code below, we extract the upper triangular matrix and store it as 'V' (not explicitly stored in the code as V, but I will write it as such in the math below so the results in the code are more clear). 

\begin{align}
X_i^{T} \Omega_i r_i &= X_i^{T} (\sigma^2I + Z_iLL^{T}Z_i^{T})^{-1}(y_i - X_i\beta) \\
&= X_i^{T}\Big[\frac{1}{\sigma^2}I - \frac{1}{\sigma^2}Z_iL(V^{T}V)^{-1}L^{T}Z_i^{T}\Big](y_i - X_i\beta) \\
&= \frac{1}{\sigma^2}\Big[X_i^{T}y_i - X_i^{T}Z_iLV^{-1}(V^{T})^{-1}L^{T}Z_i^{T}y_i - X_i^{T}X_i\beta + X_i^{T}Z_iV^{-1}(V^{T})^{-1}L^{T}Z_i^{T}X_i\beta\Big] \\
&= \frac{1}{\sigma^2} \Big[X_i^{T}y_i - X_i{T}X_i\beta - X_i^{T}Z_iLV^{-1}(V^{T})^{-1}(Z_i^{T}y_i - Z_i^{T}X_i\beta)\Big]
\end{align}

#### **Expansion of** $\nabla_{\sigma^2} \ell_i (\boldsymbol{\beta}, \mathbf{L}, \sigma^2) = -\frac{1}{2} tr(\mathbf{\Omega_i}^{-1}) + \frac{1}{2}\mathbf{r_i^{T}\Omega_i^{-2}r_i}$ :

Starting with the first term, $-\frac{1}{2} tr({\Omega_i}^{-1}):$

\begin{align}
-\frac{1}{2} tr({\Omega_i}^{-1}) &= tr\Big(\frac{1}{\sigma^2}I - \frac{1}{\sigma^2}Z_iL(V^{T}V)^{-1}L^{T}Z_i^{T}\Big) = \frac{1}{2\sigma^2}\Big[tr(I) - tr(Z_iL(V^{T}V)^{-1}L^{T}Z_i^{T})\Big] = -\frac{1}{2\sigma^2}\Big[n - tr(V^{T}V)^{-1}L^{T}Z_i^{T}Z_iL)\Big]
\end{align}

Moving onto the second term, $\frac{1}{2}r_i^{T}\Omega_i^{-2}r_i$:

\begin{align}
\frac{1}{2}r_i^{T}\Omega_i^{-2}r_i &= \frac{1}{2} (y_i - X_i\beta)^{T}\Big[\frac{1}{\sigma^2}I - \frac{1}{\sigma^2} Z_iL(V^{T}V)^{-1}L^{T}Z^{T}\Big]\Big[\frac{1}{\sigma^2}I - \frac{1}{\sigma^2} Z_iL(V^{T}V)^{-1}L^{T}Z^{T}\Big](y_i - X_i\beta) \\
&= \frac{1}{2\sigma^4} \Big[(y_i - X_i\beta) - Z_iL(V^{T}V)^{-1}L^{T}(Z_i^{T}y_i - Z_i^{T}X_i\beta)\Big]^{T} \Big[(y_i - X_i\beta) - Z_iL(V^{T}V)^{-1}L^{T}(Z_i^{T}y_i - Z_i^{T}X_i\beta)\Big] \\
&\Rightarrow C = Z_iL(V^{T}V)^{-1}L^{T}(Z_i^{T}y_i - Z_i^{T}X_i\beta) \\
&= \frac{1}{2\sigma^4}\Big[(y_i - X_i\beta) - C\Big]^{T} \Big[(y_i - X_i\beta) - C\Big] \\
&= \frac{1}{2\sigma^4}\Big[(y_i - X_i\beta)^{T}(y_i - X_i\beta) - 2(y_i - X_i\beta)^{T}C + C^{T}C\Big]
\end{align}

Combining the two terms we get:
\begin{align}
-\frac{1}{2} tr({\Omega_i}^{-1}) + \frac{1}{2}r_i^{T}\Omega_i^{-2}r_i 
&= \frac{1}{\sigma^2}\Big[n - tr(V^{T}V)^{-1}L^{T}Z_i^{T}Z_iL)\Big] + \frac{1}{2\sigma^4}\Big[(y_i - X_i\beta)^{T}(y_i - X_i\beta) - 2(y_i - X_i\beta)^{T}C + C^{T}C\Big]
\end{align}

In [111]:
# define a type that holds an LMM datum
struct LmmObs{T <: AbstractFloat}
    # data
    y          :: Vector{T}
    X          :: Matrix{T}
    Z          :: Matrix{T}
    # arrays for holding gradient
    ∇β         :: Vector{T}
    ∇σ²        :: Vector{T}
    ∇Σ         :: Matrix{T} 
    ∇L         :: Matrix{T} 
    # working arrays
    # TODO: whatever intermediate arrays you may want to pre-allocate
    yty        :: T
    xty        :: Vector{T}
    zty        :: Vector{T}
    storage_p  :: Vector{T}
    storage_q  :: Vector{T}
    storage_q2 :: Vector{T}
    storage_q3 :: Vector{T}
    storage_q4 :: Vector{T}
    xtx        :: Matrix{T}
    ztx        :: Matrix{T}
    ztz        :: Matrix{T}
    xtz        :: Matrix{T} # added by me
    storage_qq :: Matrix{T}
    storage_qq2:: Matrix{T} # added by me
    storage_qq3:: Matrix{T} # added by me
    storage_qq4:: Matrix{T} # added by me
    storage_qq5:: Matrix{T} # added by me
end

"""
    LmmObs(y::Vector, X::Matrix, Z::Matrix)

Create an LMM datum of type `LmmObs`.
"""
function LmmObs(
        y::Vector{T}, 
        X::Matrix{T}, 
        Z::Matrix{T}
    ) where T <: AbstractFloat
    n, p, q    = size(X, 1), size(X, 2), size(Z, 2)    
    ∇β         = Vector{T}(undef, p)
    ∇σ²        = Vector{T}(undef, 1)
    ∇Σ         = Matrix{T}(undef, q, q) 
    ∇L         = Matrix{T}(undef, q, q)    
    yty        = abs2(norm(y))
    xty        = transpose(X) * y
    zty        = transpose(Z) * y    
    storage_p  = Vector{T}(undef, p)
    storage_q  = Vector{T}(undef, q)
    storage_q2 = Vector{T}(undef, q)
    storage_q3 = Vector{T}(undef, q)
    storage_q4 = Vector{T}(undef, q)
    xtx        = transpose(X) * X
    ztx        = transpose(Z) * X
    ztz        = transpose(Z) * Z
    xtz        = transpose(X) * Z # added by me
    storage_qq = similar(ztz)
    storage_qq2= similar(ztz) # added by me
    storage_qq3= similar(ztz) # added by me
    storage_qq4= similar(ztz) # added by me
    storage_qq5= similar(ztz) # added by me
    LmmObs(y, X, Z, ∇β, ∇σ², ∇Σ, ∇L, 
        yty, xty, zty, storage_p, storage_q,
        storage_q2, storage_q3, storage_q4, xtx, ztx, ztz, xtz, storage_qq, storage_qq2, storage_qq3, storage_qq4,
        storage_qq5)
end

"""
    logl!(obs::LmmObs, β, L, σ², needgrad=false)

Evaluate the log-likelihood of a single LMM datum at parameter values `β`, `L`, 
and `σ²`. If `needgrad==true`, then `obs.∇β`, `obs.∇Σ`, and `obs.σ² are filled 
with the corresponding gradient.
"""
function logl!(
        obs      :: LmmObs{T}, 
        β        :: Vector{T}, 
        L        :: Matrix{T}, 
        σ²       :: T,
        needgrad :: Bool = true
    ) where T <: AbstractFloat
    n, p, q = size(obs.X, 1), size(obs.X, 2), size(obs.Z, 2)
    ####################
    # Evaluate objective
    ####################    
    # form the q-by-q matrix: M = σ² * I + Lt Zt Z L
    copy!(obs.storage_qq, obs.ztz)
    BLAS.trmm!('L', 'L', 'T', 'N', T(1), L, obs.storage_qq) 
    BLAS.trmm!('R', 'L', 'N', 'N', T(1), L, obs.storage_qq) 
    @inbounds for j in 1:q
        obs.storage_qq[j, j] += σ²
    end
    # cholesky on M = σ² * I + Lt Zt Z L
    LAPACK.potrf!('U', obs.storage_qq) # extract A' = V from cholesky on M 
    # storage_q = (Mchol.U') \ (Lt * (Zt * res))
    BLAS.gemv!('N', T(-1), obs.ztx, β, T(1), copy!(obs.storage_q, obs.zty)) # z'y - z'xβ
    BLAS.trmv!('L', 'T', 'N', L, obs.storage_q)    # L'(z'y - z'xβ)
    BLAS.trsv!('U', 'T', 'N', obs.storage_qq, obs.storage_q) # V'^{-1} L'(z'y - z'xβ)
    # l2 norm of residual vector
    copy!(obs.storage_p, obs.xty)
    rtr  = obs.yty +
        dot(β, BLAS.gemv!('N', T(1), obs.xtx, β, T(-2), obs.storage_p))
    # assemble pieces
    logl::T = n * log(2π) + (n - q) * log(σ²) # constant term
    @inbounds for j in 1:q
        logl += 2log(obs.storage_qq[j, j])
    end
    qf    = abs2(norm(obs.storage_q)) # quadratic form term
    logl += (rtr - qf) / σ² 
    logl /= -2
    ###################
    # Evaluate gradient
    ###################    
    if needgrad
        # TODO: fill ∇β, ∇L, ∇σ² by gradients
        #sleep(1e-3) # pretend this step takes 1ms
        
        ####### gradient wrt β #######
        
        ### term 1 xty - xtxβ ###
        
        copy!(obs.∇β, obs.xty) # ∇β now contains xty
        BLAS.gemv!('N', T(-1), obs.xtx, β, T(1), obs.∇β) # overwriting ∇β with x'y - x'x β
        
        ### term 2 xtzL(V'V)^{-1}L'(zty - ztxβ) ###
        
        copy!(obs.storage_q2, obs.storage_q)
        BLAS.trsv!('U', 'N', 'N', obs.storage_qq, obs.storage_q2) 
        # cholesky extracted for M was upper 
        # this gets us V^{-1}V'^{-1} L'(zty - ztxβ)
        BLAS.trmv!('L', 'N', 'N', L, obs.storage_q2)
        # this gets us L*(V)^{-1}V'^{-1} L'(zty - ztxβ)
        
        ### combine the two terms ###
        
        BLAS.gemv!('N', T(-1)/σ², obs.xtz, obs.storage_q2, T(1)/σ², obs.∇β)
        # subtracting terms 1 and 2 and dividing by σ²
        
        ####### gradient wrt σ² #######
        
        ### term 1 ###
        
        copy!(obs.storage_qq2, obs.ztz)
        BLAS.trmm!('R', 'L', 'N', 'N', T(1), L, obs.storage_qq2) 
        # ztzL
        BLAS.trmm!('L', 'L', 'T', 'N', T(1), L, obs.storage_qq2)
        # L'ztzL
        LAPACK.potrs!('U', obs.storage_qq, obs.storage_qq2)
        # (V'V)^{-1} L'ztzL
        obs.∇σ²[1] = (-n + tr(obs.storage_qq2)) / (2*σ²)
       
        ### term 2 ###
        
        mul!(obs.storage_q3, obs.ztz, obs.storage_q2) 
        # ztz*L*V^{-1}(V)'^{-1} L'(zty - ztxβ)
        
        ### combine the two terms ###
        obs.∇σ²[1] +=  (rtr - 2*qf + dot(obs.storage_q3, obs.storage_q2)) / (2*σ²*σ²) 
        
        ####### gradient wrt L #######
    
        #### term 1: -z'omega^{-1}zL = -ztzL + ztzL(V'V)^{-1} L'ztzL #### !!! CORRECT !!! 
        
        mul!(obs.storage_qq3, obs.ztz, L) 
        # ztzL
        copy!(obs.storage_qq4, obs.storage_qq3) 
        # need to copy bc gemm! won't work properly if A and C are the same
        BLAS.gemm!('N', 'N', T(1/σ²), obs.storage_qq3, obs.storage_qq2, T(-1/σ²), obs.storage_qq4)
        # 1/σ²*ztzL*(V'V)^{-1} L'ztzL - 1/σ²*ztzL 
        # // note: (V'V)^{-1} L'ztzL computed previously, stored in obs.storage_qq2
        
        #### term 2:  ####
        copy!(obs.storage_q4, obs.zty) #GOOD
        BLAS.gemv!('N', T(-1), obs.ztx, β, T(1), obs.storage_q4) #zty - ztxβ #GOOD
        BLAS.axpy!(T(-1), obs.storage_q3, obs.storage_q4)
        #zty - ztxβ - ztzL(V'V)^{-1}L'(ztz-ztxβ)
        copy!(obs.storage_qq5, obs.ztz) 
        BLAS.gemm!('N', 'T', T(1/σ²^2), obs.storage_q4, obs.storage_q4, T(0), obs.storage_qq5) #looks good up to here
        mul!(obs.∇L, obs.storage_qq5, L)
        
         #### combine terms  ####
        
        BLAS.axpy!(T(1), obs.storage_qq4, obs.∇L)
        
        
        # need to check sigmas
        
    end    
    ###################
    # Return
    ###################        
    return logl 
end

logl!

In [112]:
Random.seed!(257)
# dimension
n, p, q = 2000, 5, 3
# predictors
X  = [ones(n) randn(n, p - 1)]
Z  = [ones(n) randn(n, q - 1)]
# parameter values
β  = [2.0; -1.0; rand(p - 2)]
σ² = 1.5
Σ  = fill(0.1, q, q) + 0.9I # compound symmetry 
L  = Matrix(cholesky(Symmetric(Σ)).L)
# generate y
y  = X * β + Z * rand(MvNormal(Σ)) + sqrt(σ²) * randn(n)

# form the LmmObs object
obs = LmmObs(y, X, Z);

In [113]:
logl!(obs, β, L, σ², true)

-3256.179335805826

In [114]:
obs.∇L

3×3 Matrix{Float64}:
 -0.970913    0.0301459  -0.299679
 -0.0138436  -0.97012     0.281913
 -0.156986    0.38892     1.15936

In [87]:
S = -Z'*inv(σ²*I + Z*L*L'*Z')*Z*L # first term for gradient wrt L

3×3 Matrix{Float64}:
 -0.999236     0.100354    0.0916096
 -7.04315e-5  -1.00426     0.0916357
 -7.14965e-5  -4.0826e-5  -1.00844

In [88]:
B = Z'*inv(σ²*I + Z*L*L'*Z')*(y-X*β)*transpose((y-X*β))*inv(σ²*I + Z*L*L'*Z')*Z*L # second term for gradient wrt L

3×3 Matrix{Float64}:
  0.0283232  -0.0702077  -0.391289
 -0.0137731   0.0341409   0.190278
 -0.156915    0.388961    2.1678

In [85]:
S+B

3×3 Matrix{Float64}:
 -0.970913    0.0301459  -0.299679
 -0.0138436  -0.97012     0.281913
 -0.156986    0.38892     1.15936

In [130]:
obs.∇L

3×3 Matrix{Float64}:
 -0.970913    0.0301459  -0.299679
 -0.0138436  -0.97012     0.281913
 -0.156986    0.38892     1.15936

### **2.1  Correctness**

In [123]:
@show logl = logl!(obs, β, L, σ², true)
@show obs.∇β
@show obs.∇σ²
#@show obs.∇Σ;

logl = logl!(obs, β, L, σ², true) = -3256.179335805826
obs.∇β = [0.2669810805714521, 41.61418337067322, -34.34664962312688, 36.108985107075306, 27.913948208793148]
obs.∇σ² = [1.6283715138411026]


In [124]:
@assert abs(logl - (-3256.1793358058258)) < 1e-4
@assert norm(obs.∇β - [0.26698108057144054, 41.61418337067327, 
        -34.34664962312689, 36.10898510707527, 27.913948208793144]) < 1e-4
# @assert norm(obs.∇Σ - 
#     [-0.9464482950697888 0.057792444809492895 -0.30244127639188767; 
#         0.057792444809492895 -1.00087164917123 0.2845116557144694; 
#         -0.30244127639188767 0.2845116557144694 1.170040927259726]) < 1e-4
@assert abs(obs.∇σ²[1] - (1.6283715138412163)) < 1e-4

### **2.2  Efficiency**

Benchmark for evaluating the objective function only. This is what we did in HW3.

In [99]:
@benchmark logl!($obs, $β, $L, $σ², false)

BenchmarkTools.Trial: 10000 samples with 81 evaluations.
 Range [90m([39m[36m[1mmin[22m[39m … [35mmax[39m[90m):  [39m[36m[1m800.346 ns[22m[39m … [35m  3.211 μs[39m  [90m┊[39m GC [90m([39mmin … max[90m): [39m0.00% … 0.00%
 Time  [90m([39m[34m[1mmedian[22m[39m[90m):     [39m[34m[1m807.963 ns               [22m[39m[90m┊[39m GC [90m([39mmedian[90m):    [39m0.00%
 Time  [90m([39m[32m[1mmean[22m[39m ± [32mσ[39m[90m):   [39m[32m[1m858.929 ns[22m[39m ± [32m139.396 ns[39m  [90m┊[39m GC [90m([39mmean ± σ[90m):  [39m0.00% ± 0.00%

  [39m█[34m [39m[39m▆[39m▁[39m [32m [39m[39m [39m▁[39m [39m [39m [39m [39m [39m [39m [39m▃[39m▂[39m▁[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m▁
  [39m█[34m▇[39m[

Benchmark for objective + gradient evaluation.

In [100]:
bm_objgrad = @benchmark logl!($obs, $β, $L, $σ², true)

BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range [90m([39m[36m[1mmin[22m[39m … [35mmax[39m[90m):  [39m[36m[1m2.333 μs[22m[39m … [35m 17.856 μs[39m  [90m┊[39m GC [90m([39mmin … max[90m): [39m0.00% … 0.00%
 Time  [90m([39m[34m[1mmedian[22m[39m[90m):     [39m[34m[1m2.379 μs               [22m[39m[90m┊[39m GC [90m([39mmedian[90m):    [39m0.00%
 Time  [90m([39m[32m[1mmean[22m[39m ± [32mσ[39m[90m):   [39m[32m[1m2.548 μs[22m[39m ± [32m509.477 ns[39m  [90m┊[39m GC [90m([39mmean ± σ[90m):  [39m0.00% ± 0.00%

  [39m█[34m█[39m[39m▆[39m▅[39m▄[39m [32m▂[39m[39m▁[39m [39m [39m▁[39m▁[39m [39m [39m▃[39m [39m [39m▂[39m [39m [39m [39m▁[39m [39m [39m▁[39m [39m [39m [39m [39m▁[39m [39m▂[39m [39m [39m [39m▁[39m [39m [39m [39m [39m [39m▁[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m▁[39m [39m [39m▂
  [39m█[34m█[39m[39m█[39m█[39m█[3

In [101]:
#  The points you will get are
clamp(10 / (median(bm_objgrad).time / 1e3) * 10, 0, 10)

10.0

### **Q3. LmmModel type**

We create a `LmmModel` type to hold all data points and model parameters. Log-likelihood/gradient of a `LmmModel` object is simply the sum of log-likelihood/gradient of individual data points.

In [126]:
# define a type that holds LMM model (data + parameters)
struct LmmModel{T <: AbstractFloat} <: MOI.AbstractNLPEvaluator
    # data
    data :: Vector{LmmObs{T}}
    # parameters
    β    :: Vector{T}
    L    :: Matrix{T}
    σ²   :: Vector{T}    
    # arrays for holding gradient
    ∇β   :: Vector{T}
    ∇σ²  :: Vector{T}
    ∇L   :: Matrix{T}
    # TODO: add whatever intermediate arrays you may want to pre-allocate
    xty  :: Vector{T}
    ztr2 :: Vector{T}
    xtx  :: Matrix{T}
    ztz2 :: Matrix{T}
end

"""
    LmmModel(data::Vector{LmmObs})

Create an LMM model that contains data and parameters.
"""
function LmmModel(obsvec::Vector{LmmObs{T}}) where T <: AbstractFloat
    # dims
    p    = size(obsvec[1].X, 2)
    q    = size(obsvec[1].Z, 2)
    # parameters
    β    = Vector{T}(undef, p)
    L    = Matrix{T}(undef, q, q)
    σ²   = Vector{T}(undef, 1)    
    # gradients
    ∇β   = similar(β)    
    ∇σ²  = similar(σ²)
    ∇L   = similar(L)
    # intermediate arrays
    xty  = Vector{T}(undef, p)
    ztr2 = Vector{T}(undef, abs2(q))
    xtx  = Matrix{T}(undef, p, p)
    ztz2 = Matrix{T}(undef, abs2(q), abs2(q))
    LmmModel(obsvec, β, L, σ², ∇β, ∇σ², ∇L, xty, ztr2, xtx, ztz2)
end

"""
    logl!(m::LmmModel, needgrad=false)

Evaluate the log-likelihood of an LMM model at parameter values `m.β`, `m.L`, 
and `m.σ²`. If `needgrad==true`, then `m.∇β`, `m.∇Σ`, and `m.σ² are filled 
with the corresponding gradient.
"""
function logl!(m::LmmModel{T}, needgrad::Bool = false) where T <: AbstractFloat
    logl = zero(T)
    if needgrad
        fill!(m.∇β , 0)
        fill!(m.∇L , 0)
        fill!(m.∇σ², 0)        
    end
    @inbounds for i in 1:length(m.data)
        obs = m.data[i]
        logl += logl!(obs, m.β, m.L, m.σ²[1], needgrad)
        if needgrad
            BLAS.axpy!(T(1), obs.∇β, m.∇β)
            BLAS.axpy!(T(1), obs.∇Σ, m.∇L)
            m.∇σ²[1] += obs.∇σ²[1]
        end
    end
    logl
end

logl!

### **Q4. (20 pts) Test data**

Let's generate a fake longitudinal data set to test our algorithm.

In [127]:
Random.seed!(257)

# dimension
m      = 1000 # number of individuals
ns     = rand(1500:2000, m) # numbers of observations per individual
p      = 5 # number of fixed effects, including intercept
q      = 3 # number of random effects, including intercept
obsvec = Vector{LmmObs{Float64}}(undef, m)
# true parameter values
βtrue  = [0.1; 6.5; -3.5; 1.0; 5; zeros(p - 5)]
σ²true = 1.5
σtrue  = sqrt(σ²true)
Σtrue  = Matrix(Diagonal([2.0; 1.2; 1.0; zeros(q - 3)]))
Ltrue  = Matrix(cholesky(Symmetric(Σtrue), Val(true), check=false).L)
# generate data
for i in 1:m
    # first column intercept, remaining entries iid std normal
    X = Matrix{Float64}(undef, ns[i], p)
    X[:, 1] .= 1
    @views Distributions.rand!(Normal(), X[:, 2:p])
    # first column intercept, remaining entries iid std normal
    Z = Matrix{Float64}(undef, ns[i], q)
    Z[:, 1] .= 1
    @views Distributions.rand!(Normal(), Z[:, 2:q])
    # generate y
    y = X * βtrue .+ Z * (Ltrue * randn(q)) .+ σtrue * randn(ns[i])
    # form a LmmObs instance
    obsvec[i] = LmmObs(y, X, Z)
end
# form a LmmModel instance
lmm = LmmModel(obsvec);


For later comparison with other software, we save the data into a text file lmm_data.csv. **Do not put this file in Git.** It takes 245.4MB storage.

In [41]:
(isfile("lmm_data.csv") && filesize("lmm_data.csv") == 245369936) || 
open("lmm_data.csv", "w") do io
    p = size(lmm.data[1].X, 2)
    q = size(lmm.data[1].Z, 2)
    # print header
    print(io, "ID,Y,")
    for j in 1:(p-1)
        print(io, "X" * string(j) * ",")
    end
    for j in 1:(q-1)
        print(io, "Z" * string(j) * (j < q-1 ? "," : "\n"))
    end
    # print data
    for i in eachindex(lmm.data)
        obs = lmm.data[i]
        for j in 1:length(obs.y)
            # id
            print(io, i, ",")
            # Y
            print(io, obs.y[j], ",")
            # X data
            for k in 2:p
                print(io, obs.X[j, k], ",")
            end
            # Z data
            for k in 2:q-1
                print(io, obs.Z[j, k], ",")
            end
            print(io, obs.Z[j, q], "\n")
        end
    end
end

### **4.1  Correctness**

Evaluate log-likelihood and gradient of whole data set at the true parameter values.

In [129]:
lmm.∇L

3×3 Matrix{Float64}:
 NaN             -2.73681e191  NaN
   2.16124e256  NaN            NaN
 NaN            NaN            NaN

In [128]:
copy!(lmm.β, βtrue)
copy!(lmm.L, Ltrue)
lmm.σ²[1] = σ²true
@show obj = logl!(lmm, true)
@show lmm.∇β
@show lmm.∇σ²
@show lmm.∇L;

obj = logl!(lmm, true) = -2.84006843836997e6
lmm.∇β = [41.06591670742247, 445.75120353972505, 157.01339922492545, -335.099773607337, -895.6257448385876]
lmm.∇σ² = [-489.53617303824456]
lmm.∇L = [NaN -2.736812490423738e191 NaN; 2.1612424389152652e256 NaN NaN; NaN NaN NaN]


In [122]:
@assert abs(obj - (-2.840068438369969e6)) < 1e-4
@assert norm(lmm.∇β - [41.0659167074073, 445.75120353972426, 
        157.0133992249258, -335.09977360733626, -895.6257448385899]) < 1e-4
@assert norm(lmm.∇L - [-3.3982575935824837 31.32103842086001 26.73645089732865; 
        40.43528672997116 61.86377650461202 -75.37427770754684; 
        37.811051468724486 -82.56838431216435 -56.45992542754974]) < 1e-4
@assert abs(lmm.∇σ²[1] - (-489.5361730382465)) < 1e-4

LoadError: AssertionError: norm(lmm.∇L - [-3.3982575935824837 31.32103842086001 26.73645089732865; 40.43528672997116 61.86377650461202 -75.37427770754684; 37.811051468724486 -82.56838431216435 -56.45992542754974]) < 0.0001

In [131]:
bm_model = @benchmark logl!($lmm, true)

BenchmarkTools.Trial: 1418 samples with 1 evaluation.
 Range [90m([39m[36m[1mmin[22m[39m … [35mmax[39m[90m):  [39m[36m[1m2.611 ms[22m[39m … [35m 10.742 ms[39m  [90m┊[39m GC [90m([39mmin … max[90m): [39m0.00% … 0.00%
 Time  [90m([39m[34m[1mmedian[22m[39m[90m):     [39m[34m[1m3.366 ms               [22m[39m[90m┊[39m GC [90m([39mmedian[90m):    [39m0.00%
 Time  [90m([39m[32m[1mmean[22m[39m ± [32mσ[39m[90m):   [39m[32m[1m3.494 ms[22m[39m ± [32m761.772 μs[39m  [90m┊[39m GC [90m([39mmean ± σ[90m):  [39m0.00% ± 0.00%

  [39m [39m [39m▁[39m▂[39m█[39m▂[39m▂[39m▂[39m▁[39m▁[39m▄[34m█[39m[39m▄[32m▂[39m[39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m [39m 
  [39m▇[39m▆[39m█[39m█[39m█[39m█[39

In [132]:
clamp(10 / (median(bm_model).time / 1e6) * 10, 0, 10)

10.0

In [133]:
clamp(10 - median(bm_model).memory / 100, 0, 10)

10.0