# Sequential Gaussian Simulation

## Loading input data

The first step is to load all the data needed for the interpolation, which we computed in the previous chapter. The inputs are:

- Domain property data: the XYZ positions and the values we want to interpolate, together with the label of the domain each sample belongs to.

- The discretized geomodel: the GeMpy model with the segmented lithologies. We use it to know which values have to be interpolated where.

- Experimental variogram: used to compute the analytical variogram. Importing the analytical variogram directly is not supported at the moment.

- Grid: the XYZ coordinates of the points to be interpolated. The GeMpy model itself is only a 3D array. At some point it would make sense to pass the GeMpy object that contains all the model properties instead.

All of this data can be passed to the SGS object through either the init method or the set methods.
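
As background for the variogram input, the sketch below shows what an analytical variogram built from several exponential and Gaussian structures can look like. It is a generic illustration under stated assumptions: the function names, the factor of 3 in the exponents (the "practical range" convention) and all parameter values are hypothetical and are not taken from GeMpy's SGS class.

```python
import numpy as np

def exponential_model(h, sill, range_):
    """Exponential variogram; reaches about 95% of the sill at h = range_."""
    return sill * (1.0 - np.exp(-3.0 * h / range_))

def gaussian_model(h, sill, range_):
    """Gaussian variogram; smoother than the exponential near the origin."""
    return sill * (1.0 - np.exp(-3.0 * (h / range_) ** 2))

def nested_variogram(h, exp_params, gauss_params):
    """Sum of exponential and Gaussian structures, each given as (sill, range)."""
    h = np.asarray(h, dtype=float)
    gamma = np.zeros_like(h)
    for sill, range_ in exp_params:
        gamma += exponential_model(h, sill, range_)
    for sill, range_ in gauss_params:
        gamma += gaussian_model(h, sill, range_)
    return gamma

# Hypothetical parameters, chosen only for illustration
lags = np.linspace(0.0, 2000.0, 5)
gamma = nested_variogram(lags,
                         exp_params=[(0.6, 300.0)],
                         gauss_params=[(0.4, 800.0)])
```

At lag zero the model is zero, and at large lags it approaches the sum of the sills (1.0 with the values above).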

In [1]:
import sys, os
sys.path.append("../../..")
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
os.environ['MKL_THREADING_LAYER'] = 'GNU'

# Embedding matplotlib figures in the notebooks
%matplotlib inline

# Importing auxiliary libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pn
import gempy as gp
from gempy import coKriging
from gempy import GridClass
import pymc3 as pm
from numpy import random




In [2]:
# Read data
data = pn.read_pickle("C:/Users/Jan/Desktop/CSIRO/CSIRO/Domained_data.pkl")
geomodel = np.load("C:/Users/Jan/Desktop/CSIRO/CSIRO/3Dmodel.npy")
exp_variogram = pn.read_pickle("C:/Users/Jan/Desktop/CSIRO/CSIRO/experimental_variogram.p")


grid = GridClass
grid.create_regular_grid_3d(grid,[422050, 423090, 8429400, 8432100, -500, 332],
                            [50, 50, 50]);
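
For reference, a regular grid with this extent and resolution can be reproduced with plain NumPy. This is a sketch that assumes the grid stores cell centres and flattens them in C order (x varying slowest). That assumption agrees with the coordinates of grid point 5000 printed in the kriging section, but `regular_grid_3d` is a hypothetical helper and not GeMpy's implementation.

```python
import numpy as np

def regular_grid_3d(extent, resolution):
    """Return an (nx*ny*nz, 3) array with the XYZ coordinates of cell centres."""
    x_min, x_max, y_min, y_max, z_min, z_max = extent
    nx, ny, nz = resolution
    # Cell sizes along each axis
    dx = (x_max - x_min) / nx
    dy = (y_max - y_min) / ny
    dz = (z_max - z_min) / nz
    # Cell centres sit half a cell inside the extent
    x = np.linspace(x_min + dx / 2, x_max - dx / 2, nx)
    y = np.linspace(y_min + dy / 2, y_max - dy / 2, ny)
    z = np.linspace(z_min + dz / 2, z_max - dz / 2, nz)
    xx, yy, zz = np.meshgrid(x, y, z, indexing="ij")
    return np.column_stack([xx.ravel(), yy.ravel(), zz.ravel()])

grid_xyz = regular_grid_3d([422050, 423090, 8429400, 8432100, -500, 332],
                           [50, 50, 50])
```

With 50 cells per axis the array has 125000 rows, one per grid point.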

In [3]:
sgs.plot_cross_covariance??

Object `sgs.plot_cross_covariance` not found.


In [4]:
# Compute analytical variograms
sgs = gp.coKriging.SGS(
    exp_var=exp_variogram, n_exp=5
)  # properties=['Al_ppm-Al_ppm', 'Ca_ppm-Al_ppm', 'Al_ppm-Ca_ppm', 'Ca_ppm-Ca_ppm']

sgs.set_data(data)
sgs.set_lithology('opx')
sgs.set_geomodel(geomodel)

sgs.choose_lithology_elements(elem=[
    'Al_ppm',
    'Ca_ppm',
    'Co_ppm',
    'Fe_ppm',
    'Mg_ppm',
])

sgs.set_grid_to_inter(grid) #originally select_segemented_grid
model = sgs.fit_cross_cov(n_exp=5)
if True:
    with model:
        start = pm.find_MAP()  # Find starting value by optimization
        step = pm.Metropolis()
        #db = pm.backends.SQLite('SQtry.sqlite')
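        # Pass the MAP estimate as the starting point (`start`); `init` selects an initialization method instead
        trace = pm.sample(2000, step, start=start, progressbar=True, njobs=1)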
sgs.set_trace(trace)

if False:
    sgs.plot_cross_covariance(iter_plot=1800)

INFO (theano.gof.compilelock): Waiting for existing lock by unknown process (I am process '15828')
INFO (theano.gof.compilelock): To manually release the lock, delete C:\Users\Jan\AppData\Local\Theano\compiledir_Windows-10-10.0.17134-SP0-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.4-64\lock_dir
logp = 1,384.8, ||grad|| = 0.67627: 100%|█████████████████████████████████████████| 2824/2824 [00:17<00:00, 159.90it/s]
Sequential sampling (2 chains in 1 job)
CompoundStep
>Metropolis: [weights_interval__]
>Metropolis: [range]
>Metropolis: [sill]
>Metropolis: [sigma_log__]
100%|█████████████████████████████████████████████████████████████████████████████| 2500/2500 [00:15<00:00, 159.13it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 2500/2500 [00:14<00:00, 171.53it/s]
The gelman-rubin statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.

In [5]:
#sgs.plot_cross_covariance(iter_plot=1800)

In [6]:
print(trace['range'].shape)

(4000, 7)


In [7]:
print(sgs.n_exp)
print(sgs.n_gauss)

5
2


In [8]:
sgs.nuggets.values

array([0.        , 0.        , 0.38043723, 0.74659659, 0.74573928,
       0.        , 0.        , 0.4226653 , 0.69461683, 0.70364318,
       0.38044899, 0.4226633 , 0.        , 0.37218562, 0.44200531,
       0.7465846 , 0.69461813, 0.3722525 , 0.        , 0.330775  ,
       0.74572969, 0.70364356, 0.44212019, 0.33085093, 0.        ])

## Kriging at one point

For testing, we perform ordinary cokriging at a single point.

In [9]:
# We compile the Euclidean distance function
SED_f = gp.coKriging.theano_sed()
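
The same kind of pairwise distance matrix can be sketched in plain NumPy, which is handy for checking the compiled function on small inputs. `pairwise_euclidean` is a hypothetical helper. Whether `theano_sed` returns distances or squared distances is not stated here, so treat the square root as an assumption.

```python
import numpy as np

def pairwise_euclidean(a, b):
    """Distances between every row of a (n, 3) and every row of b (m, 3)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Broadcasting gives an (n, m, 3) array of coordinate differences
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

toy_dist = pairwise_euclidean([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]],
                              [[0.0, 0.0, 0.0]])
```

The result has one row per input point and one column per target point.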

In [10]:
# We extract the XYZ coordinates from the input data
df = sgs.data_to_inter[['X', 'Y', 'Z']]

In [11]:
# We select the point 5000 of our grid
selected_cluster_grid = grid.values[5000] # originally sgs.grid_to_inter
selected_cluster_grid

array([ 4.221020e+05,  8.429427e+06, -4.916800e+02], dtype=float32)

In [12]:
# We compute the distance from all our input data to that point
dist = SED_f(df, [selected_cluster_grid])

# Known problem: the selection select_A is completely wrong, no close values are selected

# Grow the search radius until more than 50 input points fall within it
for r in range(100, 5000, 100):
    select_A = (dist < r).any(axis=1)
    if select_A.sum() > 50:
        break
        
select_b = np.zeros(grid.values.shape[0], dtype=bool)  # originally sgs.grid_to_inter.shape
select_b[5000] = True  # same grid point as selected_cluster_grid above (was 4000)
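
An alternative to growing the radius is to pick the k nearest points directly. The sketch below does this with `np.argpartition` on synthetic coordinates. It is not what the notebook does, and `k_nearest_mask` and the toy data are hypothetical.

```python
import numpy as np

def k_nearest_mask(points, target, k=50):
    """Boolean mask that is True for the k points closest to target."""
    points = np.asarray(points, dtype=float)
    dist = np.linalg.norm(points - np.asarray(target, dtype=float), axis=1)
    k = min(k, len(points))
    # argpartition places the k smallest distances first (in no particular order)
    nearest = np.argpartition(dist, k - 1)[:k]
    mask = np.zeros(len(points), dtype=bool)
    mask[nearest] = True
    return mask

rng = np.random.default_rng(42)
toy_points = rng.uniform(0.0, 1000.0, size=(500, 3))  # synthetic XYZ data
toy_target = np.array([500.0, 500.0, 500.0])
toy_mask = k_nearest_mask(toy_points, toy_target, k=50)
```

This always returns exactly k points, whatever the data density around the target.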


In [13]:
np.set_printoptions(threshold=np.inf)

The function solve_kriging takes two boolean arrays: the first selects the input data we want to use and the second the grid points we want to compute. The idea is to be able to compute more than one point at the same time.

To capture the uncertainty of the cross-covariance, we sample randomly from the Bayesian inference trace. This may occasionally produce invalid values, which are corrected recursively by checking whether the kriging result is valid.
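
The idea of drawing from the trace and retrying on invalid results can be sketched as follows. This is an illustration only. The trace is a synthetic stand-in, the validity check is a placeholder predicate rather than the check on the kriging result used by solve_kriging, and a bounded loop replaces the recursion.

```python
import numpy as np

def draw_from_trace(trace, rng):
    """Pick one posterior sample, using the same index for every parameter."""
    n_samples = len(next(iter(trace.values())))
    i = rng.integers(n_samples)
    return {name: values[i] for name, values in trace.items()}

def draw_valid(trace, is_valid, rng, max_tries=100):
    """Redraw until is_valid accepts the sample or max_tries is reached."""
    for _ in range(max_tries):
        params = draw_from_trace(trace, rng)
        if is_valid(params):
            return params
    raise RuntimeError("no valid posterior draw found")

rng = np.random.default_rng(0)
# Synthetic stand-in with the same shape as trace['range'] in this notebook
toy_trace = {
    "range": rng.uniform(-100.0, 1000.0, size=(4000, 7)),
    "sill": rng.uniform(0.5, 2.0, size=(4000, 7)),
}
params = draw_valid(toy_trace, lambda p: np.all(p["range"] > 0), rng)
```

Using one index for all parameters keeps each draw a consistent joint sample from the posterior.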

At the moment only Ordinary kriging is implemented. Ideally, we should aim for universal kriging.
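
For orientation, the ordinary kriging system for a single variable can be sketched in NumPy. This is a simplified illustration, not the cokriging solver in the SGS class. The exponential covariance, its parameters and the toy data are assumptions.

```python
import numpy as np

def exponential_covariance(h, sill=1.0, range_=300.0):
    """Covariance that decays from the sill at h = 0 towards zero."""
    return sill * np.exp(-3.0 * h / range_)

def ordinary_kriging(points, values, target, cov=exponential_covariance):
    """Solve the ordinary kriging system for one target location."""
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(points)

    d_pp = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    d_pt = np.linalg.norm(points - target, axis=-1)

    # Left-hand side: data covariances bordered by the unbiasedness constraint
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = cov(d_pp)
    lhs[n, n] = 0.0
    # Right-hand side: data-to-target covariances plus the constraint value 1
    rhs = np.ones(n + 1)
    rhs[:n] = cov(d_pt)

    solution = np.linalg.solve(lhs, rhs)
    weights, lagrange = solution[:n], solution[n]
    mean = weights @ values
    variance = cov(0.0) - weights @ rhs[:n] - lagrange
    return weights, mean, variance

toy_points = np.array([[0.0, 0.0, 0.0], [100.0, 0.0, 0.0],
                       [0.0, 150.0, 0.0], [50.0, 50.0, 80.0]])
toy_values = np.array([10.0, 12.0, 9.0, 11.0])
weights, mean, variance = ordinary_kriging(toy_points, toy_values,
                                           target=[40.0, 40.0, 20.0])
```

The last row of the system forces the weights to sum to one, which is what makes the estimate unbiased when the mean is unknown but constant.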


Checking that ordinary kriging works. The weights for each property must sum to one:

In [14]:
h = sgs.solve_kriging(select_A, select_b) #originally j instead of select_b
h[-2][:-5].sum(axis=0)

(635, 635)
(635, 5)
0.0


NameError: name 'quit' is not defined

In [None]:
print(cov_h)

In [None]:
%debug

Checking the kriging mean is within a range of possible values given the cross-variogram fitting.

In [None]:
l1 = [sgs.solve_kriging(select_A, select_b)[0][0] for i in range(100)]
l2 = [sgs.solve_kriging(select_A, select_b)[0][1] for i in range(100)]

The figure below shows the kriging mean for aluminum and calcium.

In [None]:
plt.plot(l1, '.')
plt.plot(l2, '.')

For comparison, the values of the elements used as input in the interpolation:

In [None]:
sgs.data_to_inter[select_A][['Al_ppm', 'Ca_ppm']].hist()
sgs.data_to_inter[select_A][['Al_ppm', 'Ca_ppm']].mean()

Checking the range of values of the random interpolation at the given point. The uncertainty comes from the fitting and the kriging variance.

In [None]:
sol = [sgs.solve_kriging(select_A, select_b) for i in range(10)]
gh = []
for i in range(10):
    gh = np.append(gh, np.random.normal(sol[i][0], sol[i][1]))
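
Note that `np.random.normal` takes the standard deviation, not the variance, as its second argument. If the second value returned by solve_kriging is a kriging variance (an assumption here, since the text does not say), the draw would need a square root. The sketch below shows this with hypothetical means and variances.

```python
import numpy as np

def simulate_value(mean, variance, rng):
    """Draw from a normal distribution given a mean and a variance."""
    # Clip tiny negative variances that can appear through round-off
    std = np.sqrt(np.maximum(variance, 0.0))
    return rng.normal(mean, std)

rng = np.random.default_rng(1)
# Hypothetical kriging means and variances for two properties
toy_mean = [1200.0, 300.0]
toy_variance = [400.0, 25.0]
draws = np.array([simulate_value(toy_mean, toy_variance, rng)
                  for _ in range(1000)])
```

Each row of `draws` holds one simulated value per property.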

In [None]:
gh

In [None]:
plt.hist(gh[::2], range=(-30000, 30000))
plt.hist(gh[1::2], range=(-30000, 30000), alpha=0.5);