VISH#3
…nducing variables
…m Titsias (2009) for interdomain inducing variables
…_funk_hecke, wrap just the scipy integration routine
…ximately the likelihood variance at data points. ChordMatern is still broken in this regard.
| v = integrate.quad(integrand, -1.0, 1.0)[0] | ||
| return v * omega_d / C_1 | ||
| @tf.custom_gradient |
Could you write down in math what the gradients look like? That said, I don't think we need the gradients of Funk-Hecke wrt the lengthscale and variance. For the lengthscale you can simply scale the inputs X before evaluating the spherical harmonics, and the variance can be accounted for by multiplying the eigenvalues by it.
The gradient wrt the variance is just the funk hecke integral.
The gradient wrt the lengthscale is the Funk-Hecke integral with the shape function replaced by its derivative wrt the lengthscale, i.e. $v \int_{-1}^{1} \frac{\partial s(t, l)}{\partial l} f(t) \, dt$, where $f(t)$ is the product of the Gegenbauer polynomial and the weighting function wrt which the Gegenbauer polynomials are orthogonal, and $v$ is the variance.
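To make the two gradients concrete, here is a minimal numerical sketch. It assumes a hypothetical shape function s(t, l) = exp((t - 1) / l^2) on the 2-sphere (so the Gegenbauer parameter is alpha = 1/2 and the weight is 1), and it leaves out the normalisation constants that do not depend on l or v (omega_d / C_1 in the diff). All function names are illustrative and are not taken from the PR.

```python
import numpy as np
from scipy import integrate
from scipy.special import eval_gegenbauer


def shape(t, lengthscale):
    # Hypothetical shape function s(t, l) = exp((t - 1) / l^2); not the PR's kernel.
    return np.exp((t - 1.0) / lengthscale**2)


def dshape_dl(t, lengthscale):
    # Analytic derivative of the shape function above wrt the lengthscale.
    return shape(t, lengthscale) * 2.0 * (1.0 - t) / lengthscale**3


def f(t, n, alpha):
    # Gegenbauer polynomial times the weight it is orthogonal under.
    return eval_gegenbauer(n, alpha, t) * (1.0 - t**2) ** (alpha - 0.5)


def funk_hecke(fn, n, alpha, lengthscale, variance=1.0):
    # variance * int_{-1}^{1} fn(t, l) f(t) dt, without normalisation constants.
    integrand = lambda t: fn(t, lengthscale) * f(t, n, alpha)
    return variance * integrate.quad(integrand, -1.0, 1.0)[0]


n, alpha, l, v = 2, 0.5, 0.7, 1.3
grad_variance = funk_hecke(shape, n, alpha, l)            # d/dv: the integral itself
grad_lengthscale = funk_hecke(dshape_dl, n, alpha, l, v)  # d/dl: s replaced by ds/dl
```

A central finite difference of `funk_hecke(shape, ...)` in the lengthscale is a cheap way to check the second quantity before wiring anything like it into a custom gradient.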
| values.append(tf.reshape(v, shape=[-1])) | ||
| return tf.concat(values, axis=0) | ||
| def verify_eigenvalue_cache(self): |
Why do you need this functionality? Minor: a more descriptive name might be clear_eigenvalue_cache_onchange.
To avoid recomputing the Funk-Hecke integrals. The eigenvalues method is called every time the Kuu matrix is required, which means every time a prediction is required. Without caching the eigenvalues we would do a lot of redundant computation whilst optimising the acquisition function, since the posterior variance has to be recomputed at every step.
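The caching idea can be sketched independently of the kernel classes. The snippet below assumes the eigenvalues depend only on the lengthscale and variance; the class and function names are hypothetical and the eigenvalue function is a placeholder, not the PR's implementation.

```python
class EigenvalueCache:
    """Recompute eigenvalues only when the kernel hyperparameters change."""

    def __init__(self, compute_fn):
        self._compute_fn = compute_fn  # e.g. the expensive Funk-Hecke quadrature
        self._key = None
        self._values = None

    def get(self, lengthscale, variance):
        key = (float(lengthscale), float(variance))
        if key != self._key:  # hyperparameters changed, so the cache is stale
            self._values = self._compute_fn(*key)
            self._key = key
        return self._values


calls = []


def expensive_eigenvalues(lengthscale, variance):
    calls.append((lengthscale, variance))  # record each real computation
    return [variance * lengthscale**n for n in range(3)]  # placeholder values


cache = EigenvalueCache(expensive_eigenvalues)
cache.get(0.5, 1.0)
cache.get(0.5, 1.0)  # served from the cache
cache.get(0.6, 1.0)  # lengthscale changed, so this recomputes
```

During acquisition optimisation the hyperparameters are fixed, so every prediction after the first would hit the cache; during hyperparameter training the key changes at each step and the cache gives no saving.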
| def K_diag(self, X: TensorType) -> tf.Tensor: | ||
| """ Approximate the true kernel by an inner product between feature functions. """ | ||
| # TODO: Refactor in terms of truncated Mercer decomposition. |
So that the posterior variance at the data locations goes down to the likelihood variance. This is necessary for the acquisition function to work properly. The variance of the unwarped GP is a combination of the posterior mean and posterior variance of the warped GP. If the posterior variance doesn't get sufficiently small, the acquisition function will keep selecting the peak of the mean function. (You can see this for yourself if you run the experiment but change line 38 of the config file vish.yaml from "spherical_matern" to "chord_matern".)
I appreciate that this issue is specific to using VISH for PI, so maybe you don't want it changed in this repo? If so, we could fork it, or just work in a different branch? We could also just keep using the SphericalMatern kernel, which is now working fine.
| from gspheres.vish.mixed_variables import MixedFeatures | ||
| def map_to_sphere(X: np.ndarray, bias: Union[int, float]) -> np.ndarray: |
minor: I would place this in a utils.py.
| @@ -0,0 +1,200 @@ | |||
| import tensorflow as tf | |||
minor: rename file to covariances.py?
| @@ -0,0 +1,66 @@ | |||
| import gpflow.kernels | |||
Are you still using this? If not, best to remove the file from the PR.
Not using it anymore; will remove.
| @@ -0,0 +1,94 @@ | |||
| import numpy as np | |||
I would refactor some of this code and have it as an integration test too in tests.
Co-authored-by: Vincent Dutordoir <dutordoirv@gmail.com>
I added a model for SGPR in …. I've also added a script to compare the posterior variances at data points between a TruncatedChordMatern kernel and a ChordMatern kernel. The difference between the two is that the prior variance is computed using a truncated Mercer decomposition for the TruncatedChordMatern, and using the ordinary Matern equations for the ChordMatern. When the likelihood variance is allowed to be optimised, the posterior variance at data points is below the likelihood variance for all kernels. (I've added a pytest that ensures this in …)
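As a rough illustration of how a truncated Mercer prior variance differs from a closed-form one, the sketch below expands a hypothetical unit-variance zonal kernel on the 2-sphere in Legendre polynomials and sums only the first few levels at t = 1. The kernel, the lengthscale of 0.7, and the function names are assumptions made for this example; it is not the TruncatedChordMatern code.

```python
import numpy as np
from scipy import integrate
from scipy.special import eval_legendre


def shape(t, lengthscale=0.7):
    # Hypothetical unit-variance zonal kernel on S^2, with t = <x, x'>.
    return np.exp((t - 1.0) / lengthscale**2)


def legendre_coefficient(n):
    # c_n in the expansion s(t) = sum_n c_n P_n(t).
    integral = integrate.quad(lambda t: shape(t) * eval_legendre(n, t), -1.0, 1.0)[0]
    return 0.5 * (2 * n + 1) * integral


def truncated_prior_variance(num_levels):
    # k(x, x) corresponds to t = 1, where P_n(1) = 1 for every n.
    return sum(legendre_coefficient(n) for n in range(num_levels))


exact = shape(1.0)  # closed-form prior variance
truncated = [truncated_prior_variance(m) for m in (2, 5, 30)]
```

For this kernel the coefficients are positive, so the truncated prior variance approaches the closed-form value from below as levels are added. A kernel whose K_diag uses the closed form while its features are truncated therefore mixes two slightly different priors.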