# PyDaddy with vector data

(This notebook assumes that you have gone through the [Getting Started](./1_getting_started.ipynb) notebook.)

`pydaddy` also works with (2-dimensional) vector data. For a 2-D timeseries $(x(t), y(t))$, `pydaddy` attempts to fit the following model:

$$
\frac{dx}{dt} = f_1(x, y) + g_{11}(x, y) \cdot \eta_1(t) + g_{12}(x, y) \cdot \eta_2(t) \\
\frac{dy}{dt} = f_2(x, y) + g_{21}(x, y) \cdot \eta_1(t) + g_{22}(x, y) \cdot \eta_2(t)
$$

Here, $f_1$ and $f_2$ are the drift functions, $g_{11}$ and $g_{22}$ are the diffusion terms, and $g_{12}$ and $g_{21}$ are the cross-diffusion terms. This equation can also be written in vector form as:

$$
\frac{d \mathbf{x}}{dt} = \mathbf{f}(\mathbf{x}) + \mathbf{g}(\mathbf{x}) \boldsymbol \eta
$$

where $\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}$,
$\mathbf{f} = \begin{bmatrix} f_1 \\ f_2 \end{bmatrix}$,
$\mathbf{g} = \begin{bmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{bmatrix}$ and
$\boldsymbol \eta = \begin{bmatrix} \eta_1 \\ \eta_2 \end{bmatrix}$.
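
To make the model concrete, here is a minimal sketch of how a 2-D time series obeying this equation could be simulated with the Euler-Maruyama scheme. It uses only NumPy, not PyDaddy. The drift `f`, the diffusion matrix `g`, the step size `dt` and the helper `simulate_sde` are illustrative assumptions, not part of the PyDaddy API.

```python
import numpy as np

def f(x):
    # Hypothetical drift: linear relaxation towards the origin.
    return -x

def g(x):
    # Hypothetical diffusion matrix: constant and diagonal (no cross-diffusion).
    return np.array([[0.5, 0.0],
                     [0.0, 0.5]])

def simulate_sde(f, g, x0, dt, n_steps, seed=0):
    """Euler-Maruyama integration of dx/dt = f(x) + g(x) eta."""
    rng = np.random.default_rng(seed)
    x = np.empty((n_steps, 2))
    x[0] = x0
    for i in range(1, n_steps):
        # Over one step dt, the integrated white noise has standard deviation sqrt(dt).
        dW = rng.normal(0.0, np.sqrt(dt), size=2)
        x[i] = x[i - 1] + f(x[i - 1]) * dt + g(x[i - 1]) @ dW
    return x

traj = simulate_sde(f, g, x0=[1.0, -1.0], dt=0.01, n_steps=5000)
x_series, y_series = traj[:, 0], traj[:, 1]
print(traj.shape)
```

The two columns of `traj` play the roles of $x(t)$ and $y(t)$ in the equations above.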

PyDaddy can estimate the drift function directly. For diffusion, PyDaddy estimates 
$\mathbf G = \begin{bmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{bmatrix} = \mathbf g \mathbf g^T$.
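
Since $\mathbf G = \mathbf g \mathbf g^T$, the matrix $\mathbf G$ is symmetric by construction, so $G_{12} = G_{21}$ even when $g_{12} \neq g_{21}$. A quick NumPy check, using made-up values for $\mathbf g$ at a single point $(x, y)$:

```python
import numpy as np

# Hypothetical diffusion matrix g evaluated at one point (x, y); values are made up.
g = np.array([[0.8, 0.3],
              [0.1, 0.5]])

# The quantity estimated for diffusion is G = g g^T.
G = g @ g.T

print(G)
# G is symmetric even though g is not.
print(np.allclose(G, G.T))
```

Note that $\mathbf g$ itself cannot be uniquely recovered from $\mathbf G$: multiplying $\mathbf g$ on the right by any orthogonal matrix leaves $\mathbf G$ unchanged.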

In [None]:
# Execute this cell to set up PyDaddy in your Colab environment.
%pip install git+https://github.com/tee-lab/PyDaddy.git

In [None]:
import pydaddy

## Initializing the `pydaddy` object

As in the scalar analysis, we first need to initialize a `pydaddy` object. In this case, `data` will be a two-element list, with one array per component.

In [None]:
data, t = pydaddy.load_sample_dataset('model-data-vector-ternary')
ddsde = pydaddy.Characterize(data, t, bins=20)
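
For your own data, the sketch below shows one way to arrange two measured components into a two-element list of this kind. The arrays here are synthetic stand-ins and all names (`x`, `y`, `dt`) are placeholders. The example also assumes that `t` may be passed as an array of timestamps, as with the sample dataset; the PyDaddy call itself is left as a comment.

```python
import numpy as np

# Synthetic stand-ins for two measured components sampled at the same times.
n = 1000
dt = 0.01  # hypothetical sampling interval
t = np.arange(n) * dt
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=n)) * np.sqrt(dt)
y = np.cumsum(rng.normal(size=n)) * np.sqrt(dt)

# Two-element list: one 1-D array per component, each the same length as t.
data = [x, y]
print(len(data), data[0].shape, data[1].shape, t.shape)

# The list can then be passed exactly as in the cell above (not run here):
# ddsde = pydaddy.Characterize(data, t, bins=20)
```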

## Recovering functional forms for drift and diffusion

There are five functions to recover, each of two variables: two drift functions ($F_1$ and $F_2$), two diffusion functions ($G_{11}$ and $G_{22}$) and a cross-diffusion term ($G_{12} = G_{21}$). As with the 1D example, these can be fit by calling `ddsde.fit()`.

In [None]:
F1 = ddsde.fit('F1', order=3, tune=True)
print(F1)

In [None]:
F2 = ddsde.fit('F2', order=3, tune=True)
print(F2)

In [None]:
G11 = ddsde.fit('G11', order=3, tune=True)
print(G11)

In [None]:
G22 = ddsde.fit('G22', order=3, tune=True)
print(G22)

In [None]:
G21 = ddsde.fit('G21', order=3, tune=True)
print(G21)

The coefficients in $G_{21}$ are negligible, i.e. $G_{21}$ is effectively 0.
We can force `pydaddy` to ignore small coefficients by setting an appropriate sparsity threshold manually instead of letting it choose a threshold automatically (see [Advanced Function Fitting](./3_advanced_function_fitting.ipynb) for further details).

In [None]:
G21 = ddsde.fit('G21', order=3, threshold=0.1)
print(G21)

**Note:** Since $G_{21}$ and $G_{12}$ are identical, fitting one will automatically assign the value for the other.
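
To illustrate what a sparsity threshold does, here is a minimal NumPy sketch of a thresholded polynomial least-squares fit on synthetic one-variable data. This is an illustration under stated assumptions, not PyDaddy's implementation: the helper `fit_sparse`, the iteration count and the data are all hypothetical.

```python
import numpy as np

def fit_sparse(library, target, threshold, n_iter=10):
    """Least squares with repeated hard-thresholding of small coefficients."""
    coeffs = np.linalg.lstsq(library, target, rcond=None)[0]
    for _ in range(n_iter):
        keep = np.abs(coeffs) >= threshold
        coeffs[~keep] = 0.0
        if not keep.any():
            break
        # Refit using only the terms that survived the threshold.
        coeffs[keep] = np.linalg.lstsq(library[:, keep], target, rcond=None)[0]
    return coeffs

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=2000)
# Synthetic target: only the constant and quadratic terms are truly present.
target = 1.0 - 2.0 * x**2 + rng.normal(0.0, 0.01, size=x.size)

# Polynomial library up to order 3: columns are 1, x, x^2, x^3.
library = np.column_stack([x**k for k in range(4)])

dense = np.linalg.lstsq(library, target, rcond=None)[0]
sparse = fit_sparse(library, target, threshold=0.1)
print("dense :", dense)
print("sparse:", sparse)
```

The dense fit assigns small but non-zero coefficients to the spurious terms $x$ and $x^3$; with the threshold applied, those coefficients are set exactly to zero, much as the manual threshold above removes the negligible terms in $G_{21}$.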

## Interactive plots for drift and diffusion

As with the 1D example, we can get interactive plots of drift and diffusion functions using `ddsde.drift()` and `ddsde.diffusion()`. For 2D, there is also the `ddsde.cross_diffusion()` function to get the cross-diffusion plot.

In [None]:
ddsde.drift()

In [None]:
ddsde.diffusion()

In [None]:
ddsde.cross_diffusion()

## Diagnostics

As mentioned in the [Getting Started](./1_getting_started.ipynb) notebook, `pydaddy` allows us to check whether the underlying assumptions for fitting a drift-diffusion model are met. In the 2D case, the `noise_diagnostics()` function creates the following plots:
- The distribution of the noise, along with the correlation matrix as an inset. The residual distribution should be an isotropic Gaussian distribution, and the correlation matrix should be the identity.
- Autocorrelation of the components of residuals $\eta_x$, $\eta_y$. These should be uncorrelated, i.e. the autocorrelation times should be close to 0.
- QQ plots of the marginals $\eta_x$ and $\eta_y$ against theoretical Gaussian distributions of the same mean and variance. Ideally (i.e. if the residuals are Gaussian distributed), all points of these plots should fall on a straight line of slope 1.

In [None]:
ddsde.noise_diagnostics()
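
The three checks listed above can also be approximated by hand. The sketch below applies them to synthetic residuals drawn from an isotropic Gaussian, which stand in for the residuals of a fitted model; it relies only on NumPy and the standard library, and the helpers `autocorr` and `qq_slope` are hypothetical names, not PyDaddy functions.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
# Synthetic residuals (eta_x, eta_y): stand-ins for the residuals of a fitted model.
eta = rng.normal(size=(5000, 2))

# 1. Correlation matrix of the two components: should be close to the identity.
corr = np.corrcoef(eta, rowvar=False)

# 2. Autocorrelation of each component at lag 1: should be close to 0.
def autocorr(series, lag):
    centered = series - series.mean()
    return np.dot(centered[:-lag], centered[lag:]) / np.dot(centered, centered)

acf_lag1 = [autocorr(eta[:, k], lag=1) for k in range(2)]

# 3. QQ comparison of each standardized marginal against a standard Gaussian.
def qq_slope(series):
    n = len(series)
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    sample = np.sort((series - series.mean()) / series.std())
    # Slope of the best-fit line through the QQ points.
    return np.polyfit(theoretical, sample, 1)[0]

slopes = [qq_slope(eta[:, k]) for k in range(2)]

print("correlation matrix:\n", corr)
print("lag-1 autocorrelation:", acf_lag1)
print("QQ slopes:", slopes)
```

For residuals that satisfy the model assumptions, the off-diagonal correlations and lag-1 autocorrelations should be near 0 and the QQ slopes near 1.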

The `model_diagnostics()` function checks if the model is self-consistent.

To do this, a simulated time series, with the same length and sampling time as the original time series, is generated by integrating the discovered SDE. The drift and diffusion functions are now re-estimated from this simulated time series, with the same fitting parameters as the original fit. If the model is self-consistent, the re-estimated drift and diffusion functions should match the original drift and diffusion.

(Note: This might take a few minutes to complete)

In [None]:
ddsde.model_diagnostics(oversample=5)
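
The self-consistency loop described above (fit, simulate, refit, compare) can be sketched without PyDaddy. The example below assumes a linear drift and a constant diffusion matrix so that both can be estimated with plain least squares, which is far simpler than PyDaddy's own estimation. The helpers `simulate` and `estimate` and all parameter values are hypothetical.

```python
import numpy as np

def simulate(A, g, x0, dt, n, rng):
    """Euler-Maruyama simulation of dx/dt = A x + g eta."""
    x = np.empty((n, 2))
    x[0] = x0
    for i in range(1, n):
        dW = rng.normal(0.0, np.sqrt(dt), size=2)
        x[i] = x[i - 1] + (A @ x[i - 1]) * dt + g @ dW
    return x

def estimate(x, dt):
    """Estimate a linear drift matrix A and a constant diffusion G = g g^T."""
    dx = np.diff(x, axis=0)
    # Least-squares solution of x[:-1] @ A.T = dx / dt.
    A_hat = np.linalg.lstsq(x[:-1], dx / dt, rcond=None)[0].T
    resid = dx - (x[:-1] @ A_hat.T) * dt
    G_hat = resid.T @ resid / (len(resid) * dt)
    return A_hat, G_hat

rng = np.random.default_rng(0)
dt, n = 0.01, 50000
A_true = np.array([[-1.0, 0.0], [0.0, -2.0]])
g_true = 0.5 * np.eye(2)

# "Original" data and first fit.
data = simulate(A_true, g_true, [0.0, 0.0], dt, n, rng)
A_hat, G_hat = estimate(data, dt)

# Simulate from the fitted model (same length and sampling time), then refit.
g_hat = np.linalg.cholesky(G_hat)  # one valid choice of g with g g^T = G_hat
resim = simulate(A_hat, g_hat, data[0], dt, n, rng)
A_re, G_re = estimate(resim, dt)

print("max drift mismatch    :", np.abs(A_hat - A_re).max())
print("max diffusion mismatch:", np.abs(G_hat - G_re).max())
```

If the fitted model is self-consistent, the re-estimated drift and diffusion should agree with the first fit up to sampling error, so both mismatches should be small.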