Structurally efficient multi-output linearly coregionalized Gaussian Processes: it's tricky, tricky, tricky, tricky, tricky.



Do you like to apply Bayesian nonparametric methods to your regressions? Are you frequently tempted by the flexibility that kernel-based learning provides? Do you have trouble getting structured kernel interpolation or various training-conditional inducing point approaches to work in a non-stationary multi-output setting?

If so, this package is for you.

runlmc is a Python 3.5+ package designed to extend the structural efficiencies from Scalable inference for structured Gaussian process models (Saatçi 2012) and Thoughts on Massively Scalable Gaussian Processes (Wilson et al. 2015) to the non-stationary setting of linearly coregionalized multi-output regression. For the single-output setting, MATLAB implementations are available here.
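To make the "structural efficiency" concrete: a stationary kernel evaluated on a regular 1-D grid yields a symmetric Toeplitz covariance matrix, and a Toeplitz matrix-vector product costs O(n log n) via circulant embedding and the FFT instead of O(n^2). A minimal NumPy sketch of that trick (illustrative only, not runlmc's internals):

```python
import numpy as np

def toeplitz_matvec(c, r, x):
    """O(n log n) product of a Toeplitz matrix with x via circulant embedding.

    c is the first column, r the first row (with r[0] == c[0]).
    """
    n = len(c)
    # Embed T in a 2n x 2n circulant matrix whose first column is
    # [c_0, ..., c_{n-1}, 0, r_{n-1}, ..., r_1]; a circulant matvec is
    # just an elementwise product in Fourier space.
    circ = np.concatenate([c, [0.0], r[:0:-1]])
    v = np.concatenate([x, np.zeros(n)])
    return np.fft.ifft(np.fft.fft(circ) * np.fft.fft(v))[:n].real

# A stationary kernel on a regular grid has exactly this structure:
grid = np.linspace(0, 1, 128)
c = np.exp(-0.5 * (grid - grid[0]) ** 2 / 0.1 ** 2)  # RBF first column
Kx = toeplitz_matvec(c, c, np.random.randn(128))     # K @ x, K never formed
```

Fast matvecs like this are the building block the Krylov-based inference below relies on.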

In other words, this provides a matrix-free implementation of multi-output GPs for certain covariances. As far as I know, this is also the only matrix-free implementation for single-output GPs in Python.
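"Matrix-free" means inference only ever needs matrix-vector products with the covariance, so linear solves can go through Krylov methods such as conjugate gradients rather than a Cholesky factorization of a materialized kernel matrix. An illustrative SciPy sketch (an RBF kernel streamed in row blocks; this is not runlmc code):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def make_matvec(X, lengthscale, noise_var, block=64):
    """Matvec for (K + noise_var * I) with an RBF kernel, streaming K in
    row blocks so the full n x n matrix is never held in memory."""
    def matvec(v):
        out = np.empty_like(v)
        for i in range(0, len(X), block):
            d = X[i:i + block, None] - X[None, :]
            out[i:i + block] = np.exp(-0.5 * (d / lengthscale) ** 2) @ v
        return out + noise_var * v
    return matvec

rng = np.random.default_rng(0)
n = 200
X = np.sort(rng.random(n))
y = np.sin(2 * np.pi * X) + 0.05 * rng.standard_normal(n)

A = LinearOperator((n, n), matvec=make_matvec(X, 0.2, 1e-2))
alpha, info = cg(A, y)  # alpha = (K + sigma^2 I)^{-1} y, no Cholesky needed
```

In runlmc the matvec itself is also fast thanks to the kernel structure; here a dense block matvec stands in for it to keep the sketch short.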

Usage Notes

  • Zero-mean only for now.
  • Check out the latest documentation
  • Check out the Dev Stuff section below for installation requirements.
  • Arbitrary input dimensions are allowed, but the number of active dimensions in each kernel is still capped at two (though a model can have multiple different kernels depending on different subsets of the dimensions).

A note on GPy

GPy is a much more general GP library that was a strong influence on the development of this one. I've tried to stay as faithful as possible to its structure.

I've re-used a lot of the GPy code. The main issue with simply adding my methods to GPy is that the API used to interact between GPy's kern, likelihood, and inference packages centers around the dL_dK object, a matrix derivative of the likelihood with respect to covariance. The materialization of this matrix is the very thing my algorithm tries to avoid for performance.
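For illustration, even the trace term in the marginal-likelihood gradient, tr(K^-1 dK/dtheta), can be estimated with Hutchinson probes using only solves and matvecs; this is the kind of trick that makes materializing a dL_dK matrix unnecessary. A sketch with a hypothetical helper (not GPy's or runlmc's API):

```python
import numpy as np

def hutchinson_trace(solve, dK_matvec, n, num_probes=30, seed=0):
    """Estimate tr(K^{-1} dK) using only a solver for K and matvecs with dK.

    solve(b) should return K^{-1} b; dK_matvec(v) returns (dK/dtheta) v.
    The n x n matrix K^{-1} dK (the role dL_dK plays in GPy's API) is
    never formed.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)      # Rademacher probe
        total += z @ solve(dK_matvec(z))         # z^T K^{-1} dK z
    return total / num_probes
```

With a fast matvec and a Krylov solver plugged in for `solve`, each probe costs only a few matvecs.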

If this approach proves quantifiably successful, integration with GPy would be a reasonable next step.

Examples and Benchmarks


import numpy as np
# runlmc imports (RBF, FunctionalKernel, LMC) omitted here;
# see examples/ for the full import lines.

n_per_output = [65, 100]
nout = len(n_per_output)
xss = list(map(np.random.rand, n_per_output))
yss = [f(2 * np.pi * xs) + np.random.randn(len(xs)) * 0.05
       for f, xs in zip([np.sin, np.cos], xss)]
ks = [RBF(name='rbf{}'.format(i)) for i in range(nout)]
ranks = [1]  # one latent function per kernel
fk = FunctionalKernel(D=len(xss), lmc_kernels=ks, lmc_ranks=ranks)
lmc = LMC(xss, yss, functional_kernel=fk)
# ... plotting code


# ... more plotting code


For runnable code, check examples/.

Running the Examples and Benchmarks

Make sure that the directory root is in the PYTHONPATH when running the benchmarks. E.g., from the directory root:

PYTHONPATH=.. jupyter notebook examples/example.ipynb
cd benchmarks/fx2007 && ./ # will take a while!

Dev Stuff

All below invocations should be done from the repo root.

| Command | Purpose |
| --- | --- |
| `./` | Check style with pylint, ignoring TODOs and locally-disabled warnings. |
| `./` | Regenerate docs (index will be in `doc/_generated/_build/index.html`). |
| `nosetests` | Run unit tests. |
| `./` | Create an arxiv-friendly tarball of the paper sources. |
| `python install` | Install minimal runtime requirements for runlmc. |
| `./` | Run performance benchmarks. |

To develop, requirements additionally include:

sphinx sphinx_rtd_theme matplotlib codecov pylint parameterized pandas contexttimer GPy asv

To build the paper, the packages epstool and epstopdf are also required.


Roadmap

  1. Make standard_tester stale-tolerant: currently data and code can't be fetched from GitHub without version inconsistency.
  2. Make grad-grid benchmark only generate pdf files directly, get rid of epstool,epstopdf deps.
  3. Make all benchmarks accept --validate (and add a --validate test for representation-cmp; the inv path should be tested).
  4. Automatically trigger ./ on commit, somehow
  5. Automatically find min_grad_ratio parameter / get rid of it.
  6. Preconditioning
    • Cache Krylov solutions over iterations?
    • Cutajar 2016 iterative inversion approach?
    • T.Chan preconditioning for specialized on-grid case (needs development of partial grid)
  7. TODO(test) - document everything that's missing documentation along the way.
  8. Current prediction generates the full covariance matrix, then throws everything but the diagonal away. Can we do better?
  9. Compare to MTGP, CGP
  10. Minor perf improvements: what helps?
    • Cython; numba.
    • In-place multiplication where possible
    • square matrix optimizations
    • TODO(sparse-derivatives)
    • bicubic interpolation: invert order of xs/ys for locality gains (i.e., interpolate x first then y)
  11. TODO(sum-fast) low-rank dense multiplications give SumKernel speedups?
  12. multidimensional inputs and ARD.
  13. TODO(prior). Compare to spike and slab, also try MedGP (e.g., three-parameter beta) - add tests for priored versions of classes, some tests in parameterization/ (priors should be value-cached, try to use an external package)
  14. HalfLaplace should be a Prior, add vectorized priors (remembering the shape)
  15. Migrate to asv, separate tests/ folder (then no autodoc hack to skip test_* modules; pure-python benchmarks enable validation of weather/ and fx2007 benchmarks on travis-ci but then need to be decoupled from MATLAB implementations)
  16. mean functions
  17. product kernels (multiple factors)
  18. active dimension optimization
  19. Consider other approximate inverse algorithms: see Thm 2.4 of Agarwal, Allen-Zhu, Bullins, Hazan, Ma 2016
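Roadmap item 8 (the full covariance is built and then everything but the diagonal is discarded) has a natural matrix-free formulation: since var_j = k**_jj - k*_j^T K^{-1} k*_j, each test point needs only one solve and one dot product, and the off-diagonal test covariances are never formed. A hypothetical sketch (not runlmc's prediction API):

```python
import numpy as np

def predictive_variance_diag(solve_K, k_star_cols, k_star_star_diag):
    """Per-test-point predictive variances without the full test covariance.

    solve_K(b) returns K^{-1} b; k_star_cols[j] is the cross-covariance
    column for test point j; k_star_star_diag[j] is its prior variance.
    Computes var_j = kss_j - ks_j^T K^{-1} ks_j one column at a time.
    """
    return np.array([kss - ks @ solve_K(ks)
                     for ks, kss in zip(k_star_cols, k_star_star_diag)])
```

With a Krylov solver in place of `solve_K`, this keeps prediction memory linear in the number of training points.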