# Introduction

MixedModelsBLB.jl is a Julia package for fitting linear mixed models and performing inference on them with the Bag of Little Bootstraps (BLB) method. Its advantages include: (1) it can run on extremely large data sets that do not fit into memory; (2) testing whether a random-effect standard deviation lies on the boundary of the parameter space (i.e., equals zero) poses no problem, because the inference is based on the bootstrap rather than on a normal approximation.

# Manual

## Installation

In the Julia REPL, pressing ```]``` enters package mode, where

```
(v1.3) pkg> add https://github.com/xinkai-zhou/MixedModelsBLB.jl
```

installs this package and its dependencies.

```julia
# machine information for this tutorial
versioninfo()
```

```
Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.6.0)
  CPU: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, broadwell)
```


```julia
using MixedModelsBLB
```

## Running BLB on smaller data

If the dataset is small enough to fit in memory, users can simply call ```blb_full_data``` on the data object (of type ```DataFrame```). The model is specified through the familiar formula interface. Other important arguments include
- ```subset_size```, the desired number of clusters in each subset;
- ```n_subsets```, the total number of BLB subsets;
- ```n_boots```, the number of bootstrap iterations.
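To make the roles of these three arguments concrete, the following is a minimal sketch of how one BLB subset can be drawn at the cluster level, assuming the standard BLB weighting scheme in which each bootstrap sample resamples the subset back up to the full number of clusters. The function `blb_subset_weights` is hypothetical and is not part of MixedModelsBLB; it uses only the `Random` standard library. The dimensions in the example call (18 clusters, as in ```sleepstudy```) are illustrative.

```julia
using Random

# Hypothetical helper, NOT part of MixedModelsBLB: draws the cluster-level
# resampling scheme of one BLB subset, assuming standard BLB weighting.
#   N           total number of clusters in the full data
#   subset_size number of distinct clusters drawn (without replacement)
#   n_boots     number of bootstrap iterations run on this subset
function blb_subset_weights(rng::AbstractRNG, N::Int, subset_size::Int, n_boots::Int)
    subset_size <= N || throw(ArgumentError("subset_size must not exceed N"))
    # Distinct cluster indices that make up this subset.
    subset = randperm(rng, N)[1:subset_size]
    # Column k holds Multinomial(N; 1/subset_size, ..., 1/subset_size) counts:
    # each bootstrap sample inflates the subset back to the full N clusters.
    weights = zeros(Int, subset_size, n_boots)
    for k in 1:n_boots, _ in 1:N
        weights[rand(rng, 1:subset_size), k] += 1
    end
    return subset, weights
end

rng = MersenneTwister(2020)
subset, weights = blb_subset_weights(rng, 18, 10, 1000)
```

In this scheme each bootstrap fit only involves ```subset_size``` distinct clusters, with their counts acting as weights, so the size of the data entering one fit is governed by ```subset_size``` rather than by the total number of clusters.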

We will use the ```sleepstudy``` data to illustrate the usage.

```julia
using DataFrames, MixedModels, RData, StatsBase, Random
# read the sleepstudy data from the dat.rda file in the MixedModels test directory
datf = joinpath(dirname(pathof(MixedModels)), "..", "test", "dat.rda")
const dat = Dict(Symbol(k) => v for (k, v) in load(datf));
dsleepstudy = dat[:sleepstudy]
```

```julia
β̂, Σ̂, τ̂ = blb_full_data(
    dsleepstudy,
    @formula(Y ~ U + (1 | G));
    subset_size = 10,
    n_subsets = 10,
    n_boots = 1000
)
```

By default, the package uses an optimization algorithm that requires gradients. Alternatively, one can switch to a gradient-free algorithm by passing, for example, ```solver = NLopt.NLoptSolver(algorithm=:LN_BOBYQA, maxeval=10000)```.

For documentation of the ```blb_full_data``` function, type ```?blb_full_data``` in the Julia REPL.

Once we have the BLB estimates, the final point estimates are obtained by taking two averages: first across all bootstrap iterations within each subset, and then across all subsets. The confidence intervals are obtained by averaging the percentile confidence intervals across all subsets. The function ```summary``` extracts the point estimates and confidence intervals.
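The two-stage averaging described above can be sketched for a single scalar parameter as follows. This illustrates the combination rule only, assuming the per-subset bootstrap estimates are already available as plain vectors; `blb_combine` is a hypothetical name and is not the package's ```summary``` implementation.

```julia
using Statistics

# Hypothetical helper, NOT the package's `summary`: combine BLB output for
# one scalar parameter. `boot[j]` holds the n_boots estimates from subset j.
function blb_combine(boot::Vector{Vector{Float64}}; level::Real = 0.95)
    α = 1 - level
    # Point estimate: average within each subset, then across subsets.
    est = mean(mean.(boot))
    # Percentile CI within each subset, then average the endpoints across subsets.
    lo = mean(quantile(b, α / 2) for b in boot)
    hi = mean(quantile(b, 1 - α / 2) for b in boot)
    return est, (lo, hi)
end
```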

```julia
summary(β̂)
```

## Running BLB on large data

If the data file is too big to fit in memory, users can pass the file path as an argument, and ```blb_full_data``` will use ```JuliaDB``` to load only small subsets into memory while running the BLB procedure. To demonstrate the usage, we will use ```sleepstudy.csv``` as a toy example.
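The package uses JuliaDB for this out-of-core step. As a rough, dependency-free illustration of the same idea, the sketch below swaps in a plain two-pass reader built on Julia's standard library: the first pass collects the distinct cluster IDs, and the second pass keeps only the rows of ```subset_size``` randomly chosen clusters. The function `read_cluster_subset` is hypothetical, and it assumes a simple comma-separated file with a header row and no quoted fields containing commas.

```julia
using Random

# Hypothetical two-pass reader, NOT the package's JuliaDB-based loader.
# Assumes a plain comma-separated file with a header row and no quoted
# fields containing commas. Only the distinct cluster IDs and the rows of
# the sampled clusters are ever held in memory.
function read_cluster_subset(rng::AbstractRNG, path::AbstractString,
                             id_name::AbstractString, subset_size::Int)
    # Pass 1: read the header and collect the distinct cluster IDs.
    header, ids = open(path) do io
        hdr = split(readline(io), ',')
        col = findfirst(==(id_name), hdr)
        col === nothing && throw(ArgumentError("column $id_name not found"))
        seen = Set{String}()
        for line in eachline(io)
            push!(seen, String(split(line, ',')[col]))
        end
        hdr, seen
    end
    col = findfirst(==(id_name), header)
    subset_size <= length(ids) ||
        throw(ArgumentError("subset_size exceeds the number of clusters"))
    # Sample the clusters for this subset (sorted first so a seeded rng is reproducible).
    keep = Set(shuffle(rng, sort!(collect(ids)))[1:subset_size])
    # Pass 2: stream the file again and keep only rows of the sampled clusters.
    rows = Vector{Vector{SubString{String}}}()
    for (i, line) in enumerate(eachline(path))
        i == 1 && continue                      # skip the header row
        fields = split(line, ',')
        fields[col] in keep && push!(rows, fields)
    end
    return header, rows
end
```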

```julia
β̂, Σ̂, τ̂ = blb_full_data(
    "sleepstudy.csv",
    @formula(Y ~ U + (1 | G));
    id_name = "id",
    subset_size = 10,
    n_subsets = 10,
    n_boots = 1000
)
```

# API 

```@docs
blb_one_subset
```

```@docs
blb_full_data
```