# Gaussian Process Regression with `GpABC`

In [13]:
using GpABC, Distributions, Plots


## Setup

Define the latent function that we are going to approximate:

In [14]:
f(x) = x ^ 2 + 10 * sin(x) 

f (generic function with 1 method)

Set up some training and test data. Uniform random noise is added to the training observations to make the task a little harder.

In [15]:
n = 30
training_x = sort(rand(Uniform(-10, 10), n))
training_y = f.(training_x)
training_y += 20 * (rand(n) .- 0.5) # add some noise
test_x = collect(range(minimum(training_x), stop=maximum(training_x), length=1000));



## Known hyperparameters

The package is built around a type `GPModel`, which encapsulates all the information
required for training the Gaussian Process and performing the regression. In the simplest
scenario the user would instantiate this type with some training data and labels, provide
the hyperparameters and run the regression. `SquaredExponentialIsoKernel` will be used by default. 

Assume we already know the hyperparameters (signal standard deviation, length scale, and noise standard deviation):

- $\sigma_f = 37.08$
- $l = 1.0$
- $\sigma_n = 6.58$

In [16]:
hypers = [37.08, 1.0, 6.58];

Run the regression and plot the results:

In [17]:
gpm = GPModel(training_x, training_y)
set_hyperparameters(gpm, hypers)
test_y, test_var = gp_regression(test_x, gpm)

plot(test_x, test_y, ribbon=1.96 * sqrt.(test_var), c=:red, linewidth=2, label="Approximation")
plot!(test_x, f.(test_x), c=:blue, linewidth=2, label="True function")
scatter!(training_x, training_y, m=:star4, label="Noisy training data")
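
For intuition, the mean and variance that a GP regression returns can be sketched in plain Julia using the standard posterior formulas for a zero-mean GP with an SE ISO kernel. This is not GpABC's internal code: the helper names `se_iso` and `gp_posterior` are hypothetical, and the variance here is that of the latent function, without observation noise, which may differ from what `gp_regression` reports.

```julia
using LinearAlgebra

# Squared-exponential isotropic kernel between two vectors of 1-D inputs.
se_iso(x1, x2, sigma_f, l) = sigma_f^2 .* exp.(-(x1 .- x2') .^ 2 ./ (2 * l^2))

# Posterior mean and pointwise variance of the latent function at `xs`
# (hypothetical helper, standard textbook formulas).
function gp_posterior(x, y, xs, sigma_f, l, sigma_n)
    K = se_iso(x, x, sigma_f, l) + sigma_n^2 * I   # noisy training covariance
    Ks = se_iso(xs, x, sigma_f, l)                 # test-vs-training covariance
    C = cholesky(Symmetric(K))
    mu = Ks * (C \ y)                              # posterior mean
    V = C.L \ Ks'                                  # used for diag(Ks * inv(K) * Ks')
    s2 = sigma_f^2 .- vec(sum(V .^ 2, dims=1))     # posterior variance
    return mu, s2
end

# Noise-free toy data from the latent function used in this notebook.
x = collect(-3.0:1.0:3.0)
y = x .^ 2 .+ 10 .* sin.(x)
m, v = gp_posterior(x, y, x, 37.08, 1.0, 6.58)
```

The ribbon in the plots above corresponds to `1.96 * sqrt.(v)`, an approximate 95% interval around the mean.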


## Training the hyperparameters

Normally, kernel hyperparameters are not known in advance. In that case, use the `gp_train` function to find the maximum likelihood estimate (MLE) of the hyperparameters. By default, box-constrained
[Conjugate Gradient](http://julianlsolvers.github.io/Optim.jl/stable/algo/cg/) optimisation is used, provided the gradient
with respect to the hyperparameters is implemented for the kernel function. If no gradient
implementation is provided, the [Nelder-Mead](http://julianlsolvers.github.io/Optim.jl/stable/algo/nelder_mead/) optimiser is used instead.
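
As a rough illustration of the objective behind such an estimate, the sketch below evaluates the negative log marginal likelihood of a zero-mean GP with an SE ISO kernel in plain Julia. The function name `neg_log_marginal_likelihood` is hypothetical, and parameterising by log hyperparameters is an assumption borrowed from the custom kernel code later in this notebook; GpABC's own objective may differ in detail.

```julia
using LinearAlgebra

# Negative log marginal likelihood of a zero-mean GP with an SE ISO kernel.
# `log_theta` holds log(sigma_f), log(l), log(sigma_n) (assumed ordering).
function neg_log_marginal_likelihood(log_theta, x, y)
    sigma_f, l, sigma_n = exp.(log_theta)
    K = sigma_f^2 .* exp.(-(x .- x') .^ 2 ./ (2 * l^2)) + sigma_n^2 * I
    C = cholesky(Symmetric(K))
    alpha = C \ y
    return 0.5 * dot(y, alpha) + 0.5 * logdet(C) + 0.5 * length(y) * log(2pi)
end

# Noise-free toy data from the latent function used in this notebook.
x = collect(range(-10, stop=10, length=30))
y = x .^ 2 .+ 10 .* sin.(x)

nll_good = neg_log_marginal_likelihood(log.([37.08, 1.0, 6.58]), x, y)
nll_bad = neg_log_marginal_likelihood(log.([1.0, 1.0, 1.0]), x, y)
```

An optimiser then searches for the `log_theta` that minimises this quantity; here the hyperparameters from the previous section give a much lower value than the arbitrary all-ones guess.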


In [18]:
gp_train(gpm)


Re-run the regression with the optimised hyperparameters and plot the results:

In [19]:
test_y, test_var = gp_regression(test_x, gpm)

plot(test_x, test_y, ribbon=1.96 * sqrt.(test_var), c=:red, linewidth=2, label="Approximation")
plot!(test_x, f.(test_x), c=:blue, linewidth=2, label="True function")
scatter!(training_x, training_y, m=:star4, label="Noisy training data")


## Advanced usage of `gp_train`

The optimiser, the hyperparameter bounds and the log level can all be overridden through keyword arguments:

In [20]:
import Optim.SimulatedAnnealing
gpm = GPModel(training_x, training_y)
gp_train(gpm; optimiser=SimulatedAnnealing(), 
    hp_lower=exp.([-10.0, -1.0, -10.0]), 
    hp_upper=exp.([10.0, 2.0, 10.0]), 
    log_level=1)


## Creating a custom kernel

Suppose we want to implement our own kernel function that adds a periodic element to the standard SE ISO kernel:
$$
k(x, x') = \sigma_f^2 \exp\left(-\frac{(x - x')^2}{2l^2}\right) + \exp(-2\sin^2(\sigma_g\pi(x - x')))
$$

This kernel introduces a new hyperparameter, $\sigma_g$, in addition to the standard hyperparameters of $\sigma_f$ and $l$.
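
To make the formula concrete, here is a minimal scalar version written directly from the equation above. The function name `se_iso_periodic` is hypothetical and independent of GpABC; it only serves to check two properties of the kernel.

```julia
# SE ISO kernel plus a periodic term, for scalar inputs (hypothetical helper).
se_iso_periodic(x, y, sigma_f, l, sigma_g) =
    sigma_f^2 * exp(-(x - y)^2 / (2 * l^2)) + exp(-2 * sin(sigma_g * pi * (x - y))^2)

# At zero distance both exponentials equal 1, so k(x, x) = sigma_f^2 + 1.
k_same = se_iso_periodic(1.5, 1.5, 2.0, 1.0, 0.5)

# The periodic term repeats every 1 / sigma_g, so at distance 2 with
# sigma_g = 0.5 it is back at its maximum and only the SE term has decayed.
k_period = se_iso_periodic(0.0, 2.0, 2.0, 1.0, 0.5)
```

The value at zero distance, $\sigma_f^2 + 1$, is the same quantity that `covariance_diagonal` returns further down in this notebook.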

In [21]:
import GpABC.covariance, GpABC.get_hyperparameters_size, 
GpABC.covariance_grad, GpABC.covariance_training, GpABC.covariance_diagonal

mutable struct SeIsoPeriodicKernelCache
    last_theta::AbstractArray{Float64, 1}
    D2::AbstractArray{Float64, 2} 
    D::AbstractArray{Float64, 2}
    se_part::AbstractArray{Float64, 2} 
    periodic_part::AbstractArray{Float64, 2}
end
  
SeIsoPeriodicKernelCache() = SeIsoPeriodicKernelCache(zeros(0), zeros(0, 0), zeros(0, 0), zeros(0, 0), zeros(0, 0))

struct SeIsoPeriodicKernel <: AbstractGPKernel
    cache::SeIsoPeriodicKernelCache
end
SeIsoPeriodicKernel() = SeIsoPeriodicKernel(SeIsoPeriodicKernelCache())

function get_hyperparameters_size(ker::SeIsoPeriodicKernel, training_data::AbstractArray{Float64, 2})
    3
end

function covariance(ker::SeIsoPeriodicKernel, log_theta::AbstractArray{Float64, 1}, 
        x1::AbstractArray{Float64, 2}, x2::AbstractArray{Float64, 2})
    # log_theta holds the log hyperparameters: [log(sigma_f), log(l), log(sigma_g)]
    D2 = scaled_squared_distance([log_theta[2]], x1, x2)
    n = size(x1, 1)
    m = size(x2, 1)
    # pairwise differences x1[i] - x2[j]; assumes one-dimensional inputs (single column)
    D = repeat(x1, 1, m) - repeat(x2', n, 1)
    sigma_f = exp(log_theta[1] * 2)  # note: this is sigma_f^2
    sigma_g = exp(log_theta[3])
    K = sigma_f .* exp.(-D2 ./ 2) .+ exp.(-2 .* sin.(pi * sigma_g .* D) .^ 2)
end




Note that defining the kernel gradient with respect to the hyperparameters is optional, and we are skipping it here. This means that gradient-based optimisation cannot be used for GP training, so the Nelder-Mead algorithm will be used instead.

In [22]:
gpm = GPModel(training_x, training_y, SeIsoPeriodicKernel())
gp_train(gpm; log_level=1)
test_y, test_var = gp_regression(test_x, gpm)

plot(test_x, test_y, ribbon=1.96 * sqrt.(test_var), c=:red, linewidth=2, label="Approximation")
plot!(test_x, f.(test_x), c=:blue, linewidth=2, label="True function")
scatter!(training_x, training_y, m=:star4, label="Noisy training data")


Now, let's implement the gradient for the new kernel, as well as short-circuit functions for computing the covariance from cached results (`covariance_training`) and for computing only the diagonal of the covariance matrix (`covariance_diagonal`).

In [23]:
function update_cache(cache::SeIsoPeriodicKernelCache, log_theta::AbstractArray{Float64, 1}, x::AbstractArray{Float64, 2})
    sigma_f = exp(log_theta[1] * 2)
    sigma_g = exp(log_theta[3])
    D2 = scaled_squared_distance([log_theta[2]], x, x)
    n = size(x, 1)
    D = repeat(x, 1, n) - repeat(x', n, 1)
    cache.last_theta = copy(log_theta)
    cache.se_part = sigma_f .* exp.(-D2 ./ 2)
    cache.periodic_part = exp.(-2 .* sin.(pi * sigma_g .* D) .^ 2)
    cache.D2 = D2
    cache.D = D
    0
end

function covariance_grad(ker::SeIsoPeriodicKernel, log_theta::AbstractArray{Float64, 1}, 
        x::AbstractArray{Float64, 2}, R::AbstractArray{Float64, 2}) 
    cache = ker.cache
    if log_theta != cache.last_theta 
        update_cache(ker.cache, log_theta, x)
    end
    KR = cache.se_part .* R
    d1 = 2 * sum(KR)
    d2 = KR[:]' * cache.D2[:]
    sigma_g = exp(log_theta[3])
    periodic_arg = sigma_g .* pi .* cache.D
    d3 = sum(R' .* cache.periodic_part .* -2 .* sin.(2 .* periodic_arg) .* periodic_arg)
    return [d1, d2, d3]
end

function covariance_training(ker::SeIsoPeriodicKernel,
        log_theta::AbstractArray{Float64, 1}, x::AbstractArray{Float64, 2}) 
    if log_theta != ker.cache.last_theta
        update_cache(ker.cache, log_theta, x)
    end
    return ker.cache.periodic_part + ker.cache.se_part 
end
  
function covariance_diagonal(ker::SeIsoPeriodicKernel, log_theta::AbstractArray{Float64, 1}, x::AbstractArray{Float64, 2})
    fill(exp(log_theta[1] * 2) + 1, (size(x, 1), 1))
end
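
A hand-written gradient is easy to get wrong, so it is worth comparing it against finite differences. The sketch below does this for the periodic term only, as a function of $\log \sigma_g$ and a scalar distance `d`. The helper names `periodic_part` and `periodic_part_grad` are hypothetical and are not part of GpABC.

```julia
# Periodic part of the kernel as a function of log(sigma_g) and the distance d.
periodic_part(log_sigma_g, d) = exp(-2 * sin(exp(log_sigma_g) * pi * d)^2)

# Analytic derivative with respect to log(sigma_g), using
# d/da sin(a)^2 = sin(2a) and da/dlog(sigma_g) = a, where a = sigma_g * pi * d.
function periodic_part_grad(log_sigma_g, d)
    a = exp(log_sigma_g) * pi * d
    return periodic_part(log_sigma_g, d) * -2 * sin(2 * a) * a
end

# Central finite difference at an arbitrary test point.
log_sigma_g, d, h = 0.3, 0.7, 1e-6
numeric = (periodic_part(log_sigma_g + h, d) - periodic_part(log_sigma_g - h, d)) / (2h)
analytic = periodic_part_grad(log_sigma_g, d)
```

The analytic expression has the same form as the `d3` term in `covariance_grad`, before weighting by `R` and summing over all pairs of points.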


Now, when we train the GP and run the regression, we can see that the Conjugate Gradient optimiser is used.

In [24]:
gpm = GPModel(training_x, training_y, SeIsoPeriodicKernel())
gp_train(gpm; log_level=1)
test_y, test_var = gp_regression(test_x, gpm)

plot(test_x, test_y, ribbon=1.96 * sqrt.(test_var), c=:red, linewidth=2, label="Approximation")
plot!(test_x, f.(test_x), c=:blue, linewidth=2, label="True function")
scatter!(training_x, training_y, m=:star4, label="Noisy training data")
