# Equation discovery for Turing patterns

In [None]:
import sys
data_path = './'
if 'google.colab' in sys.modules:
    from google.colab import drive
    drive.mount('/content/drive')
    data_path = '/content/drive/My Drive/biophysics_summer_school_2025/data_tutorial_3/'
    sys.path.append('/content/drive/My Drive/biophysics_summer_school_2025')

import numpy as np
from tqdm import tqdm
import matplotlib.pyplot as plt
from utils import get_basis, greedy_basis_search, evaluate_basis, get_latex_model, get_model_from_labels, add_superfluous_functions

## 1. Simulating the Turing patterns in 1D

The goal of this tutorial is to extend the SFI approach to PDE models in order to learn the equations governing Turing patterns. We will study the ASDM model in 1D, with stochastic noise (because why not):

$$\frac{\partial a}{\partial t} = d \Delta a + a^2s - a + \sqrt{2D} \xi_{a,t}, \\ \frac{\partial s}{\partial t} = \Delta s + \mu (1-a^2s) + \sqrt{2D} \xi_{s,t}.$$

Where $\xi$ is a Gaussian white noise with $\mathbb{E}[\xi_{u,t}\xi_{v,s}] = \delta_{uv}\delta(t-s)$. **<font color='red'>Copy and paste the code of tutorial 1 to simulate the above equations in 1D and make sure it works. Simulate with $L=20.0, N=128, T=30, d=0.05, \mu=1.4$.</font>**
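As a starting point, here is a minimal Euler-Maruyama sketch of the equations above. It is not the code of tutorial 1: the periodic boundary conditions, the time step `dt`, the noise strength `D`, the initial condition and the helper names `laplacian_1d` and `simulate_asdm` are all assumptions made for illustration. The noise is assumed to be white in space as well as in time, which gives the discrete amplitude $\sqrt{2D\,\Delta t/\Delta x}$. The trajectory is stored with shape `(Nt, 2, N)`, with components ordered as $(a, s)$.

```python
import numpy as np

def laplacian_1d(u, dx):
    """Second-order finite-difference Laplacian along the last axis (periodic, assumed)."""
    return (np.roll(u, 1, axis=-1) - 2.0 * u + np.roll(u, -1, axis=-1)) / dx**2

def asdm_force(X, dx, force_args):
    """Deterministic ASDM drift. X has shape (..., 2, N) with components (a, s)."""
    d, mu = force_args
    a, s = X[..., 0, :], X[..., 1, :]
    F_a = d * laplacian_1d(a, dx) + a**2 * s - a
    F_s = laplacian_1d(s, dx) + mu * (1.0 - a**2 * s)
    return np.stack([F_a, F_s], axis=-2)

def simulate_asdm(L=20.0, N=128, T=30.0, dt=2e-3, d=0.05, mu=1.4, D=1e-4, x0=None, seed=0):
    """Euler-Maruyama integration; dt, D and the initial condition are illustrative choices."""
    rng = np.random.default_rng(seed)
    dx = L / N
    n_steps = int(round(T / dt))
    X = np.empty((n_steps + 1, 2, N))
    if x0 is None:
        # Small random perturbation of the homogeneous fixed point a = s = 1
        x0 = 1.0 + 0.01 * rng.standard_normal((2, N))
    X[0] = x0
    # Assumption: space-time white noise, hence the 1/dx in the discrete variance
    noise_amp = np.sqrt(2.0 * D * dt / dx)
    for k in range(n_steps):
        drift = asdm_force(X[k], dx, (d, mu))
        X[k + 1] = X[k] + dt * drift + noise_amp * rng.standard_normal((2, N))
    return X, dx

# Short run to check that the code executes; use the default T=30.0 for the exercise
X_demo, dx_demo = simulate_asdm(T=0.1)
```

The explicit scheme is only stable when `dt` is small compared to `dx**2 / 2`, which is why `dt` is much smaller here than the grid spacing would suggest.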

In [None]:
# TODO

## 2. Extending the SFI approach to Stochastic Partial Differential Equation (SPDEs) learning

The routine `get_basis` has been designed to also produce bases of functions over fields $\mathbf{\phi}(\mathbf{x},t)$. In particular, these functions take a field, stored as an array of shape (Nt, d, N_1, ..., N_n), and can compute any operator on it: Laplacians, gradients, or mixed terms involving products of gradients and fields.

```python
basis = get_basis(field_dim=2, dx=0.1, n=1, degree=3)
```
will create a basis for a two-component field (here the substrate and activator concentrations), with grid spacing $dx=0.1$ for gradients and Laplacians, $n=1$ space variable (here we will work in 1D), and polynomial terms of degree up to three in the field. It will also return mixed terms that could account for potential convective effects.
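To make the kind of terms involved concrete, the sketch below evaluates a polynomial term, a Laplacian and a mixed gradient term by hand on a toy field of shape `(Nt, d, Nx)`. It does not reproduce the internals of `get_basis`: the centered finite differences, the periodic boundaries, the helper names and the dictionary labels are assumptions made for illustration.

```python
import numpy as np

def gradient(u, dx):
    """Centered first derivative along the last (space) axis, periodic boundaries assumed."""
    return (np.roll(u, -1, axis=-1) - np.roll(u, 1, axis=-1)) / (2.0 * dx)

def laplacian(u, dx):
    """Second-order Laplacian along the last (space) axis, periodic boundaries assumed."""
    return (np.roll(u, -1, axis=-1) - 2.0 * u + np.roll(u, 1, axis=-1)) / dx**2

# Toy two-component field of shape (Nt, d, Nx) on a periodic grid of length 2*pi
Nt, Nx = 4, 256
dx = 2.0 * np.pi / Nx
x = np.arange(Nx) * dx
X = np.empty((Nt, 2, Nx))
X[:, 0, :] = np.sin(x)
X[:, 1, :] = np.cos(2.0 * x)

u0, u1 = X[:, 0, :], X[:, 1, :]
terms = {
    "u0^2 u1": u0**2 * u1,                # polynomial term of degree three
    "lap u0": laplacian(u0, dx),          # diffusion-like term
    "u1 grad u0": u1 * gradient(u0, dx),  # mixed term, as in convective effects
}
```

Each entry has shape `(Nt, Nx)`; a full basis function would additionally specify which component of the force the term contributes to.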


The SFI approach described in tutorial 2 extends straightforwardly to SPDEs for fields $\mathbf{\phi}(\mathbf{x},t) \in \mathbb{R}^d$ over variables $\mathbf{x}$ of dimension $n$: only the *averaging* operator changes, i.e. we now have:

$$
\langle \cdot \rangle = \frac{1}{T L^n}\sum_{t,x} \cdot \, \Delta t \, \Delta x^n \text{ instead of } \langle \cdot \rangle = \frac{1}{T}\sum_{t} \cdot \, \Delta t
$$

The log-likelihood then reads

$$
\mathcal{L} = -\frac{T L^n}{4} \langle \left( \frac{\Delta \mathbf{x}_t}{\Delta t} - \hat{\mathbf{F}}(\mathbf{x}_t) \right)^T \hat{\mathbf{D}}^{-1} \left( \frac{\Delta \mathbf{x}_t}{\Delta t} - \hat{\mathbf{F}}(\mathbf{x}_t) \right) \rangle  \text{ instead of } \mathcal{L} = \frac{-T}{4} \langle \left( \frac{\Delta \mathbf{x}_t}{\Delta t} - \hat{\mathbf{F}}(\mathbf{x}_t) \right)^T \hat{\mathbf{D}}^{-1} \left( \frac{\Delta \mathbf{x}_t}{\Delta t} - \hat{\mathbf{F}}(\mathbf{x}_t) \right) \rangle
$$
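The space-time average and the log-likelihood above can be sketched in a few lines of NumPy for $n=1$. This is not the correction of tutorial 2: the hand-written three-function basis, the helper names and the assumption of a constant scalar diffusion $D$ (so that $\hat{\mathbf{D}}^{-1}$ factors out of the projection) are illustrative choices. Increments use the Itô convention, with basis functions evaluated at the start of each time step.

```python
import numpy as np

def lap(u, dx):
    """Periodic second-order Laplacian along the last (space) axis."""
    return (np.roll(u, -1, axis=-1) - 2.0 * u + np.roll(u, 1, axis=-1)) / dx**2

def unit(component, values, d=2):
    """Place `values` in one force component and zeros in the others."""
    out = np.zeros(values.shape[:-1] + (d,) + values.shape[-1:])
    out[..., component, :] = values
    return out

def toy_basis(dx):
    """Hypothetical basis: each function maps (..., 2, Nx) to (..., 2, Nx)."""
    return {
        "e0 u0": lambda X: unit(0, X[..., 0, :]),
        "e0 lap u0": lambda X: unit(0, lap(X[..., 0, :], dx)),
        "e1 u0 u1": lambda X: unit(1, X[..., 0, :] * X[..., 1, :]),
    }

def average(Q, dt, dx, T, L):
    """<Q> = 1/(T L) * sum over t and x of Q * dt * dx, for Q of shape (Nt, Nx)."""
    return Q.sum() * dt * dx / (T * L)

def infer_coefficients(X, dt, dx, basis):
    """Solve G c = b with G_ij = <b_i . b_j> and b_i = <b_i . dX/dt>."""
    T, L = dt * (X.shape[0] - 1), dx * X.shape[-1]
    dXdt = (X[1:] - X[:-1]) / dt
    B = [f(X[:-1]) for f in basis.values()]
    G = np.array([[average((bi * bj).sum(axis=1), dt, dx, T, L) for bj in B] for bi in B])
    b = np.array([average((bi * dXdt).sum(axis=1), dt, dx, T, L) for bi in B])
    return np.linalg.solve(G, b)

def log_likelihood(X, dt, dx, basis, coeffs, D):
    """-(T L / 4) <|dX/dt - F|^2 / D> for a constant scalar D (assumption)."""
    T, L = dt * (X.shape[0] - 1), dx * X.shape[-1]
    dXdt = (X[1:] - X[:-1]) / dt
    F = sum(c * f(X[:-1]) for c, f in zip(coeffs, basis.values()))
    res2 = ((dXdt - F) ** 2).sum(axis=1)
    return -T * L / (4.0 * D) * average(res2, dt, dx, T, L)
```

Note that the prefactor $1/(T L^n)$ cancels between the two sides of the linear system, so it matters only for the log-likelihood and the scores derived from it.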

**<font color='red'>Extend the functions implemented in the correction of tutorial 2 to account for the extra dimensions, and compare in a scatter plot the inferred force with the true force evaluated on the trajectory (like in tutorial 2).</font>**

In [None]:
# TODO

In [None]:
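# Build the candidate basis and project the observed dynamics onto it
basis = get_basis(field_dim=2, dx=dx, n=1, degree=3)
coeffs = infer_force_coefficients(X, (T, L), basis)

# Evaluate the inferred and the true force along the trajectory
F_inf = reconstruct_force_field(X, coeffs, basis)
F_true = asdm_force(X, dx, force_args)

# Scatter plot for the first field component; points on the dashed diagonal agree perfectly
plt.figure()
plt.scatter(F_inf[:,0,:].flatten(), F_true[:,0,:].flatten())
x = np.linspace(F_true[:,0,:].min(), F_true[:,0,:].max(), 20)
plt.plot(x, x, 'k--')
plt.xlabel('inferred force')
plt.ylabel('true force')
plt.show()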

## 3. Final model inference

**<font color='red'>Using the previous tutorial, write a function that computes the PASTIS and the AIC scores. Then run the function `greedy_basis_search` to search for the best model. Compare with the true model, and play with the parameters and the score used to see when it breaks down.</font>**
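As a reminder of how the two scores penalize model size, here is a small sketch that assumes the conventions $\mathrm{AIC} = \hat{\mathcal{L}} - n_B$ and $\mathrm{PASTIS} = \hat{\mathcal{L}} - n_B \ln(n_0/p)$, where higher is better, $n_B$ is the number of active basis functions and $n_0$ the size of the full library. These forms and the helper name `penalized_scores` are assumptions here; check them against the definitions used in tutorial 2 before relying on them.

```python
import math

def penalized_scores(log_likelihood, n_b, n0, p=0.001):
    """Return (PASTIS, AIC) under the assumed conventions stated above."""
    aic = log_likelihood - n_b
    pastis = log_likelihood - n_b * math.log(n0 / p)
    return pastis, aic

# Example: 3 active functions selected among a library of 30
pastis_demo, aic_demo = penalized_scores(10.0, n_b=3, n0=30, p=0.001)
```

With these conventions, each extra basis function costs $\ln(n_0/p)$ in PASTIS instead of 1 in AIC, so PASTIS is far more conservative when the library is large or $p$ is small.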

In [None]:
def compute_scores(X, args, coeffs, basis_functions, n0, p=0.001):
    """
    Compute model selection criteria (PASTIS and AIC) for the inferred force model.

    Parameters:
        X : ndarray of shape (Nt, d, Nx)
            Observed trajectory data.
        args : tuple (T, L)
            Total simulation time and spatial length.
        coeffs : ndarray of shape (nB,)
            Coefficients of the inferred model.
        basis_functions : dict
            Dictionary of active basis functions.
        n0 : int
            Total number of candidate basis functions.
        p : float
            Prior inclusion probability (for PASTIS).

    Returns:
        PASTIS : float
        AIC : float
            Information criteria scores.
    """
    T, L = args
    dt = T/X.shape[0]

    # TODO

    return PASTIS, AIC

In [None]:
sfi_engine = (infer_force_coefficients, compute_scores, compute_mean_diffusion)
fbasis, fcoeffs, _ = greedy_basis_search(X, (T, L), basis, sfi_engine, method='PASTIS', p=0.001, max_moves=60, ntrials=10)

In [None]:
from IPython.display import Math
Math(get_latex_model(fbasis, fcoeffs, scale_factors=None))

In [None]:
true_labels = [
    "$e_{0} u_{0}^{1}$",
    "$e_{0} u_{0}^{2} u_{1}^{1}$",
    "$e_{0} \\Delta u_{0}$",
    "$e_{1}$",
    "$e_{1} u_{0}^{2} u_{1}^{1}$",
    "$e_{1} \\Delta u_{1}$",
]

true_model = get_model_from_labels(true_labels, field_dim=2, dx=dx)
set(true_model.keys()) == set(fbasis.keys())

## 4. Challenge: learn equations on a real reaction diffusion advection system

Now we work with a real pattern-forming system, found in the *C. elegans* zygote. Two proteins, PAR2 and PAR6, diffuse on the membrane of the zygote and are advected by a flow. They associate with and dissociate from the membrane at intrinsic rates, and they are subject to antagonistic dissociation: a high concentration of PAR2 favors the dissociation of PAR6, and vice versa. We denote by $P$ the PAR2 protein, by $A$ the PAR6 protein, and by $v$ the flow. The membrane is one-dimensional, so the model is 1D:

$$
\frac{\partial A}{\partial t} = D_A \Delta A - \partial_x (vA) + k_{\mathrm{eff},A}A - k_{\mathrm{AP}}PA^2\\
\frac{\partial P}{\partial t} = D_P \Delta P - \partial_x (vP) + k_{\mathrm{eff},P}P - k_{\mathrm{AP}}AP^2
$$

The flow $v$ is also a one-dimensional field, which in practice we don't necessarily want to fit. We provide a first notebook cell to preprocess the data. The matrix `X` then contains the data in the same format as in the previous tutorials. **<font color='red'>The goal is to infer these equations from real measurements of this patterning system. Try!</font>**

In [None]:
tPAR6  = np.loadtxt('./data_tutorial_3/PAR6.txt',dtype='double')
X_PAR = np.loadtxt('./data_tutorial_3/X_flow.txt',dtype='double')
T_PAR = np.loadtxt('./data_tutorial_3/T_flow.txt',dtype='double')
tPAR2  = np.loadtxt('./data_tutorial_3/PAR2.txt',dtype='double')
tV = np.loadtxt('./data_tutorial_3/Flow.txt', dtype='double')

print(" velocity shape ", tV.shape)
print(" tPAR6 shape ", tPAR6.shape, " tPAR2 shape ", tPAR2.shape)
print(" x_shape ", X_PAR.shape, " t_shape ", T_PAR.shape)
dx = X_PAR[1] - X_PAR[0]  # in micrometers
L = dx*(X_PAR.shape[0]-1)
dt = T_PAR[1] - T_PAR[0]  # in seconds
T = dt*(T_PAR.shape[0]-1)

print(" dx is ", dx, " dt is ", dt)

PAR6 = tPAR6.T
PAR2 = tPAR2.T
V = tV.T

print(" PAR2 shape ", PAR2.shape, " PAR6 shape ", PAR6.shape, " velocity shape ", V.shape)

PAR2un,PAR2sn,PAR2vn = np.linalg.svd(PAR2, full_matrices = False)
PAR6un,PAR6sn,PAR6vn = np.linalg.svd(PAR6, full_matrices = False)
Vun,Vsn,Vvn = np.linalg.svd(V, full_matrices = False)

# Low-rank reconstruction of each field from its truncated SVD (only the leading singular modes are kept)
dim = 1
dim_v = 1
rPAR2 = (PAR2un[:,:dim].dot(np.diag(PAR2sn[:dim]).dot(PAR2vn[:dim,:]))).reshape(PAR2.shape[0],PAR2.shape[1])
rPAR6 = (PAR6un[:,:dim].dot(np.diag(PAR6sn[:dim]).dot(PAR6vn[:dim,:]))).reshape(PAR6.shape[0],PAR6.shape[1])
rV    = (Vun[:,:dim_v].dot(np.diag(Vsn[:dim_v]).dot(Vvn[:dim_v,:]))).reshape(V.shape[0],V.shape[1])
# Stack the fields as (PAR2, PAR6, flow) and move time to the first axis: shape (Nt, 3, Nx)
X = np.moveaxis(np.array([rPAR2, rPAR6, rV]),-1, 0)

In [None]:
fig, ax = plt.subplots(figsize=(12,6), nrows = 2, ncols = 3)
ax = ax.flatten()

ax[0].set_title('PAR2 concentration')
c = ax[0].pcolor(T_PAR, X_PAR, rPAR2, cmap='plasma')
fig.colorbar(c, ax=ax[0])

ax[0].set_xlabel('time')
ax[0].set_ylabel('curvilinear position')

ax[1].set_title('PAR6 concentration')
c = ax[1].pcolor(T_PAR, X_PAR, rPAR6, cmap='plasma')
fig.colorbar(c, ax=ax[1])
ax[1].set_xlabel('time')
ax[1].set_ylabel('curvilinear position')


ax[2].set_title('Flow')
c = ax[2].pcolor(T_PAR, X_PAR, rV, cmap='plasma')
fig.colorbar(c, ax=ax[2])
ax[2].set_xlabel('time')
ax[2].set_ylabel('curvilinear position')

tot = len(T_PAR)
for i in np.linspace(1, tot-1,3).astype(int):
    ax[3].plot(X_PAR, rPAR2[:,i])
ax[3].set_ylabel('PAR2 concentration')
ax[3].set_xlabel('curvilinear position')

tot = len(T_PAR)
for i in np.linspace(1, tot-1,3).astype(int):
    ax[4].plot(X_PAR, rPAR6[:,i])
ax[4].set_ylabel('PAR6 concentration')
ax[4].set_xlabel('curvilinear position')

tot = len(T_PAR)
for i in np.linspace(1, tot-1,3).astype(int):
    ax[5].plot(X_PAR, rV[:,i])
ax[5].set_ylabel('flow velocity (micrometers/s)')
ax[5].set_xlabel('curvilinear position')
plt.tight_layout()

In order to fit only the concentration fields, we built a `nofit` flag into the `get_basis` function to explicitly remove any force function that would lie in the velocity dimension (the last dimension of the field $(P, A, v)$). We also need to modify the projection methods (the routines that compute the estimated diffusion as well as the force coefficients) so that we only compute quantities affecting the $P,A$ dimensions.
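The idea of restricting the projection can be sketched as follows: the basis functions read the full field $(P, A, v)$, but only the increments of the fitted component enter the linear system, so the flow is used as an input and never as a target. This is not the `nofit` implementation of `get_basis`: the scalar hand-written basis for the $A$ equation, the periodic finite differences and the helper names are assumptions made for illustration.

```python
import numpy as np

def ddx(u, dx):
    """Centered first derivative along the last axis (periodic boundaries assumed)."""
    return (np.roll(u, -1, axis=-1) - np.roll(u, 1, axis=-1)) / (2.0 * dx)

def lap(u, dx):
    """Second-order Laplacian along the last axis (periodic boundaries assumed)."""
    return (np.roll(u, -1, axis=-1) - 2.0 * u + np.roll(u, 1, axis=-1)) / dx**2

def par_toy_basis(dx):
    """Hypothetical candidate terms for the A equation; X has components (P, A, v)."""
    return {
        "lap A": lambda X: lap(X[..., 1, :], dx),
        "-dx(v A)": lambda X: -ddx(X[..., 2, :] * X[..., 1, :], dx),
        "A": lambda X: X[..., 1, :],
        "P A^2": lambda X: X[..., 0, :] * X[..., 1, :] ** 2,
    }

def fit_component(X, dt, basis, component):
    """Least-squares projection using the increments of a single fitted component."""
    dXdt = (X[1:, component, :] - X[:-1, component, :]) / dt
    # Basis functions see the whole field, including the flow v
    B = np.stack([f(X[:-1]) for f in basis.values()])  # shape (nB, Nt-1, Nx)
    G = np.einsum("itx,jtx->ij", B, B)
    b = np.einsum("itx,tx->i", B, dXdt)
    return np.linalg.solve(G, b)
```

In this parametrization the coefficient of `"P A^2"` plays the role of $-k_{\mathrm{AP}}$ and the coefficient of `"-dx(v A)"` should be close to 1 if the measured flow fully accounts for advection.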

In [None]:
# TODO

## 5. Going beyond
You can now try a number of other things, either to improve the method or to simply change the model selection approach. We can try to implement LASSO regression to infer the force field. The lasso in this case is simply trying to solve the following problem:

$$
\min_{\hat{\mathbf{F}}} \left[ \| \mathbf{G} \hat{\mathbf{F}} - \hat{\mathbf{b}} \|_2^2 + \alpha \|\hat{\mathbf{F}}\|_1 \right]
$$

The `lasso_path` function of the sklearn package solves this problem for a range of values of $\alpha$, which lets us see which entries of $\hat{\mathbf{F}}$ are set to zero. **<font color='red'>Plot the lasso path for this problem, and check the model inferred at some $\alpha$ of your choice.</font>**
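The call pattern can be tried first on synthetic data. In the sketch below, `G` and `b` are random stand-ins (not the output of `compute_G_b`) built from a sparse coefficient vector, and the grid of `alphas` is an arbitrary choice. Note that sklearn minimizes $\frac{1}{2 n_{\mathrm{samples}}}\|\mathbf{b} - \mathbf{G}\hat{\mathbf{F}}\|_2^2 + \alpha\|\hat{\mathbf{F}}\|_1$, so its $\alpha$ differs from the one in the formula above by a normalization.

```python
import numpy as np
from sklearn.linear_model import lasso_path

# Synthetic stand-ins for G and b: only two of the six coefficients are non-zero
rng = np.random.default_rng(0)
n_samples, n_basis = 200, 6
G = rng.standard_normal((n_samples, n_basis))
F_true = np.array([2.0, 0.0, 0.0, -1.5, 0.0, 0.0])
b = G @ F_true + 0.01 * rng.standard_normal(n_samples)

# coefs has shape (n_basis, n_alphas); the returned alphas are sorted in decreasing order
alphas, coefs, _ = lasso_path(G, b, alphas=np.logspace(-3, 1, 30))

# Inspect which entries survive at the grid value closest to alpha = 0.1
k = np.argmin(np.abs(alphas - 0.1))
support = np.flatnonzero(coefs[:, k])
print(f"alpha = {alphas[k]:.3f}, active basis functions: {support}")
```

The path itself can then be plotted with, for instance, `plt.semilogx(alphas, coefs.T)`, one curve per basis function.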

In [None]:
from sklearn.linear_model import lasso_path

def compute_G_b(X, args, basis_functions):

    T, L = args
    dt = T/X.shape[0]

    # TODO

    return G, b

In [None]:
# TODO