# Exercise 13: SINDy 

**General Instructions:**

- Collaboration between students during the problem-solving phase on a discussion basis is OK
- However, each student must write and submit their own code
- Code sharing is strictly prohibited
- We will run checks for shared code, general plagiarism and AI-generated solutions
- Any fraud attempt will lead to an automatic fail of the entire course
- Do not use any additional packages except for those provided in the task templates
- Please use Julia Version 1.10.x to ensure compatibility
- Please only write between the `#--- YOUR CODE STARTS HERE ---#` and `#--- YOUR CODE ENDS HERE ---#` comments
- Please do not add, delete, or overwrite any cells other than the solution cells (**Tip:** If you use a JupyterHub IDE, you should not be able to add or delete cells or write in the non-solution cells by default)


In this exercise, we will identify a partial differential equation (PDE) from data. This means that we use the SINDy algorithm with a dictionary containing terms that typically appear in PDEs. For a detailed introduction, see the following article: https://www.pnas.org/doi/10.1073/pnas.1517384113

Note: To simplify the notation, we are going to use the following abbreviations: 

$$x_t = \frac{\partial x}{\partial t}, \quad x_{ss} = \frac{\partial^2 x}{\partial s^2}, \quad \ldots $$ 

The goal is to identify a PDE of the form 

$$
x_t = f(x, x_s, x_{ss}, x^2, x x_s, x_s x_{ss}, \ldots),
\qquad
\text{(PDE)}
$$ 

where the highest order of the spatial derivatives is 2, i.e., $x_{ss}$.

**Use the given data matrix $\mathbf X\in\mathbb{R}^{N \times n}$ for the identification, where $N=26$ is the number of time steps, and $n=51$ is the number of points in space.**
We import the matrix as `X` below:

In [None]:
import Random
using MAT
using LinearAlgebra
file = matopen("pde.mat")
X = read(file, "pde") |> transpose  # we flatten the matrix later on, so orientation does not really matter
                                    # `transpose` makes it adhere to the article
close(file)

dt = 0.005
ds = 0.1

![](pde_visualization.png)

## Exercise a)

a) To set up the dictionary $\mathbf \Psi(x)$, we first need to calculate the partial derivatives of the state $x$ with respect to time and space, i.e., $x_t$, $x_s$ and $x_{ss}$.  
Compute these numerically, using central differences. Do not use an extra package for this, but rather do it “manually”, with discretization sizes `dt` and `ds`. 
Store the derivative values in appropriate matrices $\mathbf X_t$, $\mathbf X_s$ and $\mathbf X_{ss}$.
Also note that you need only compute the derivatives for interior points.
E.g., the matrix $\mathbf X_t$ will have size ${(N-2) \times (n-2)}$. - (1.5 points)
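
The central-difference stencils can be illustrated on a one-dimensional toy signal. The following is only a sketch and not the solution for the data matrix: the grid `s`, the signal `u`, the step `h`, and the helper names `central_first` and `central_second` are assumptions made for this illustration.

```julia
# Toy illustration of central differences on a 1-D signal (hypothetical names).
# First derivative:  u'(s_i)  ≈ (u[i+1] - u[i-1]) / (2h)
# Second derivative: u''(s_i) ≈ (u[i+1] - 2u[i] + u[i-1]) / h^2
central_first(u, h) = (u[3:end] .- u[1:end-2]) ./ (2h)
central_second(u, h) = (u[3:end] .- 2 .* u[2:end-1] .+ u[1:end-2]) ./ h^2

h = 0.01
s = 0:h:1                      # toy grid, unrelated to the exercise data
u = sin.(s)

du = central_first(u, h)       # defined at interior points only
ddu = central_second(u, h)

# Both results have two entries fewer than `u` (the boundary points are dropped).
@show length(u) length(du) length(ddu)
# Compare against the analytical derivatives at the interior points.
@show maximum(abs.(du .- cos.(s[2:end-1])))
@show maximum(abs.(ddu .+ sin.(s[2:end-1])))
```

For a matrix, the same stencils would be applied along the time dimension (with `dt`) or the space dimension (with `ds`), and the result cropped so that only interior points remain in both dimensions.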

In [None]:
Xt = nothing
Xs = nothing
Xss = nothing

#--- YOUR CODE STARTS HERE ---#

#--- YOUR CODE ENDS HERE ---#

In [None]:
@assert size(Xt) == (24, 49)
@assert size(Xs) == (24, 49)
@assert size(Xss) == (24, 49)


## Exercise b)

b) Flatten the data matrices `X`, `Xt`, etc. into long vectors of length $(N-2)(n-2)$.
For `X`, remove boundary points beforehand. - (1 point)
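
As a small illustration of cropping and flattening, consider the toy matrix below. The names `A`, `A_inner` and `a_flat` and the values are arbitrary and only serve as an example.

```julia
# Toy illustration of cropping and flattening (values are arbitrary).
A = reshape(1:20, 4, 5)            # 4×5 toy matrix, filled column by column

A_inner = A[2:end-1, 2:end-1]      # drop first/last row and column -> 2×3
a_flat = vec(A_inner)              # column-major flattening -> length 6

@show size(A_inner)
@show a_flat
```

`vec` stacks the columns of the matrix. The particular ordering is not important as long as every matrix is flattened in the same way, so that entry `i` of each vector refers to the same point in time and space.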

In [None]:
X_flat = nothing
Xt_flat = nothing
Xs_flat = nothing
Xss_flat = nothing

#--- YOUR CODE STARTS HERE ---#

#--- YOUR CODE ENDS HERE ---#

In [None]:
@assert size(X_flat) == (1176,)
@assert size(Xt_flat) == (1176,)
@assert size(Xs_flat) == (1176,)
@assert size(Xss_flat) == (1176,)


## Exercise c)

c) Create a feature matrix $\mathbf \Psi$ of possible right-hand side terms using all terms up to order two.
That is, the column-layout is as follows:
$$ \mathbf \Psi = [
    \mathbf 1, 
    \mathbf X, 
    \mathbf X_s, 
    \mathbf X_{ss}, 
    \mathbf X^2, 
    \mathbf X \cdot \mathbf X_s, 
    \mathbf X \cdot \mathbf X_{ss}, 
    \mathbf X_s^2, 
    \mathbf X_s \cdot \mathbf X_{ss}, 
    \mathbf X_{ss}^2]
    .
$$ 
(1.5 points)
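
To illustrate how such a column layout can be assembled, here is a sketch of a smaller library built from only two base features. The helper `toy_library` and the vectors `a` and `b` are hypothetical and not part of the exercise.

```julia
# Toy library with two base features (hypothetical helper, arbitrary data).
function toy_library(a::AbstractVector, b::AbstractVector)
    @assert length(a) == length(b)
    # Each argument of `hcat` becomes one column; products are taken elementwise.
    return hcat(ones(length(a)), a, b, a .^ 2, a .* b, b .^ 2)
end

a = [1.0, 2.0, 3.0]
b = [0.5, -1.0, 2.0]
L = toy_library(a, b)

@show size(L)      # one row per sample, one column per candidate term
@show L[2, :]      # candidate terms evaluated for the second sample
```

Note that all products are elementwise (broadcasted), since each row of the library corresponds to exactly one sample.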

In [None]:
function Psi(X_flat, Xs_flat, Xss_flat)
    #--- YOUR CODE STARTS HERE ---#
    
    #--- YOUR CODE ENDS HERE ---#
    return psi
end;

In [None]:
psi = Psi(X_flat, Xs_flat, Xss_flat)
@assert size(psi) == (1176, 10)

## Exercise d)

d) Use the thresholding algorithm from the lecture to identify the PDE that created the data set $\mathbf X$. - (4 points)

First, assume that $\mathbf Y$ is a column vector with target data and $\mathbf Z$ is a basis matrix of suitable size.
Further, let $\mathbf w$ be the coefficient vector that we want to find, and $λ$ a real regularization parameter.
With LASSO we can solve
$$
\min_{\mathbf w \in ℝ^q} 
\left(
\underbrace{\| \mathbf Y - \mathbf Z \mathbf w \|_2^2}_{\mathcal L_{\text{reg}}(\mathbf w)} 
+
\underbrace{λ \|\mathbf w\|_1}_{\mathcal L_{\text{sparse}}(\mathbf w)}
\right)
.
$$
In this notebook, we tackle LASSO by means of **coordinate descent**.
That is, we cyclically sweep over the parameters (i.e., the entries of $\mathbf w$) and optimize the combined loss function with respect to one parameter at a time whilst keeping all other weights fixed.
The optimal $w_j$ can be found by ensuring that $0$ is contained in the subdifferential of the loss function with respect to $w_j$.

It can be shown that
$$
\frac{\partial \mathcal L_{\text{reg}}}{\partial w_j} = 
2
\left(
-~
\underbrace{
\sum_{i=1}^{N_{\mathbf Y}}
\left(
    z_{i,j} 
        \left(
            y_i - \sum_{k\ne j} z_{i,k} w_k
        \right)
\right)
}_{ρ_j}
+
w_j
\underbrace{
\sum_{i=1}^{N_{\mathbf Y}}
   z_{i,j}^2
}_{φ_j}
\right)
=
2(-ρ_j + w_j φ_j)
,
$$
and
$$
\frac{\partial \mathcal L_{\text{sparse}}}{\partial w_j}
=
\begin{cases}
    -λ, &\text{if $w_j < 0$,}\\
    [-λ, λ], &\text{if $w_j = 0$,}\\
    λ, &\text{if $w_j > 0$.}
\end{cases}
$$
From this, three cases for finding the optimal value $w_j$ are derived:
$$
w_j =
\begin{cases}
    \frac{ ρ_j + \frac{λ}{2} }{φ_j} & \text{if $ρ_j < - \frac{λ}{2}$,}\\
    0 & \text{ if $ρ_j \in \left[-\frac{λ}{2}, \frac{λ}{2}\right]$,}\\
    \frac{ ρ_j - \frac{λ}{2} }{φ_j} & \text{if $ρ_j > \frac{λ}{2}$.}
\end{cases}
$$
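
The case distinction above is a soft-thresholding rule. A minimal sketch for plain scalar inputs could look as follows. The helper name `soft_threshold_update` is hypothetical, and computing $ρ_j$ and $φ_j$ from data is deliberately left out.

```julia
# Scalar update rule from the case distinction above (hypothetical helper name).
# `rho` and `phi` are plain numbers here; computing them from data is not shown.
function soft_threshold_update(rho::Real, phi::Real, λ::Real)
    if rho < -λ / 2
        return (rho + λ / 2) / phi
    elseif rho > λ / 2
        return (rho - λ / 2) / phi
    else
        return 0.0                 # |rho| <= λ/2: the coefficient is set to exactly zero
    end
end

@show soft_threshold_update(0.3, 2.0, 0.2)    # rho above λ/2: shrunk towards zero
@show soft_threshold_update(0.05, 2.0, 0.2)   # rho inside [-λ/2, λ/2]: zero
@show soft_threshold_update(-0.3, 2.0, 0.2)   # rho below -λ/2: shrunk towards zero
```

The middle case is what makes the solution sparse: coefficients whose $ρ_j$ is small relative to $λ$ are set to exactly zero rather than merely becoming small.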

### Task 1 (0.5 + 0.5 points)

Complete the following functions to
* compute the values $ρ_j$ and $φ_j$ given vector $\mathbf Y$ and matrix $\mathbf Z$ and index $j$.
* return the optimal value $w_j$, if we additionally provide $λ>0$.

In [None]:
function calc_rho_phi(Z::AbstractMatrix, Y::AbstractVector, w::AbstractVector, j::Integer)
    rho_j = phi_j = nothing

    #--- YOUR CODE STARTS HERE ---#
    
    #--- YOUR CODE ENDS HERE ---#
    return rho_j, phi_j
end;

In [None]:
let
    Random.seed!(1234)
    
    Z = rand(3, 2)
    w = rand(2)
    Y = rand(3)
    
    rho, phi = calc_rho_phi(Z, Y, w, 1)
    @assert rho isa Real
    @assert phi isa Real
    
    @assert isapprox(rho, 0.24372423119974812; rtol=1e-4)
    @assert isapprox(phi, 0.4554981037149115; rtol=1e-4)
    
end

In [None]:
function opt_wj(Z::AbstractMatrix, Y::AbstractVector, w::AbstractVector, j::Integer, λ::Real)
    w_j_opt = nothing
    
    #--- YOUR CODE STARTS HERE ---#
    
    #--- YOUR CODE ENDS HERE ---#
    
    return w_j_opt
end;

In [None]:
let
    Random.seed!(1234)
    
    Z = rand(3, 2)
    w = rand(2)
    Y = rand(3)
    w_j_opt = opt_wj(Z, Y, w, 1, 0.1)
    @assert w_j_opt > 0
    
end

### Task 2 (1 point)
Implement LASSO with Coordinate Descent by completing the cell below.

In iteration $k$, perform coordinate descent for all entries.
Stop if $\|\mathbf w^{(k)} - \mathbf w^{(k-1)}\|_2 \leq ε_{\text{rel}} \|\mathbf w^{(k-1)}\|_2$ or if the maximum number of iterations is reached.
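
The structure of such a loop with a relative stopping criterion can be sketched independently of LASSO. In the following, `iterate_until_converged!` and the toy update are hypothetical. The update merely stands in for one full sweep over all coordinates.

```julia
using LinearAlgebra: norm

# Generic loop skeleton with a relative stopping criterion (hypothetical names).
# `update!` stands in for one full sweep and must modify `w` in place.
function iterate_until_converged!(update!, w::AbstractVector; tol_rel = 1e-5, max_iter = 100_000)
    for k in 1:max_iter
        w_prev = copy(w)              # snapshot of the previous iterate
        update!(w)
        if norm(w - w_prev) <= tol_rel * norm(w_prev)
            return w, k               # converged after k sweeps
        end
    end
    return w, max_iter                # iteration budget exhausted
end

# Toy update: the contraction w <- 0.5 * w + 1 has the fixed point w = 2.
w_toy = [0.0, 10.0]
w_toy, sweeps = iterate_until_converged!(w -> (w .= 0.5 .* w .+ 1), w_toy; tol_rel = 1e-8)
@show w_toy sweeps
```

The `copy` matters: without it, `w_prev` would alias `w`, the difference would always be zero, and the loop would stop after the first sweep.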

In [None]:
function LASSO!(
        ## modify `w` in place
        w::AbstractVector,
        ## do not modify these:
        Z::AbstractMatrix, Y::AbstractVector, λ::Real;
        ## parameters:
        tol_rel = 1e-5,
        max_iter = 100_000
)
    #--- YOUR CODE STARTS HERE ---#
    
    #--- YOUR CODE ENDS HERE ---#
    
    return w
end;

In [None]:
let
    ## quickly generate a testset for polynomial fitting
    f(x) = sum( x.^2 )
    X = -2 .+ 4 .* rand(2, 10)
    q = 6
    Z = [
        ones(10);; X[1, :];; X[2, :];; X[1, :] .* X[2, :];; X[1,:] .^2;; X[2, :].^2 ]
    Y = mapreduce(f, vcat, eachcol(X))
    ## initialize parameter vector
    w = zeros(q)  # avoid `undef` memory, which may contain NaN or huge values
    ## call LASSO!
    LASSO!(w, Z, Y, 1e-3; tol_rel = 1e-6)
    
    ## unless the algorithm did something completely weird, the parameters should not have large magnitude:
    @assert all( abs(wj) <= 2 for wj in w)
    
end

### Task 3 (1 point)
Finally, implement the alternating thresholding scheme.
To this end, apply `LASSO!` to those entries of $\mathbf w$ indexed by `I_nz`, the **n**on-**z**ero entries.
For initialization, use the ridge regression solution
$$
\mathbf w_0 = \left(\mathbf Z^\intercal \mathbf Z + λ \mathbf I\right)^{-1} \mathbf Z^\intercal \mathbf Y.
$$
In iteration $\ell$, first update `I_nz`. 
An entry of $\mathbf w^{(\ell)}$ is deemed to be zero, if its absolute value is below the threshold parameter `th`.
In addition to updating `I_nz`, also set the corresponding entries of $\mathbf w^{(\ell)}$ to exactly `0`.  
Then call `LASSO!` with stopping parameters `max_iter_lasso` and `tol_rel_lasso`.
Beware! We have implemented `LASSO!` to modify the parameter vector in-place.
Be sure to use views or perform copies accordingly!

Again, implement a relative stopping criterion and respect the keyword-argument `max_iter`.
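
Two ingredients of this task can be illustrated on toy data: solving the ridge system, and the difference between views and copies. All names (`Z_toy`, `Y_toy`, `λ_toy`, `double!`, `w_parent`, `idx`) are hypothetical and the numbers are arbitrary. This is a sketch of the mechanics only, not of the thresholding scheme itself.

```julia
using LinearAlgebra: I

# Toy data (arbitrary numbers, unrelated to the exercise data set).
Z_toy = [1.0 0.0; 1.0 1.0; 1.0 2.0]
Y_toy = [1.0, 3.0, 5.0]            # generated by y = 1 + 2z
λ_toy = 1e-8

# Ridge regression: solve (ZᵀZ + λI) w = ZᵀY with `\` instead of forming an inverse.
w_ridge = (Z_toy' * Z_toy + λ_toy * I) \ (Z_toy' * Y_toy)
@show w_ridge                      # close to [1, 2] for this small λ

# Views share memory with the parent array; indexing with a vector makes a copy.
double!(v) = (v .*= 2; v)          # hypothetical in-place function
w_parent = [1.0, 0.0, 3.0]
idx = findall(!iszero, w_parent)   # indices of the non-zero entries: [1, 3]

double!(w_parent[idx])             # acts on a copy, `w_parent` is unchanged
@assert w_parent == [1.0, 0.0, 3.0]
double!(view(w_parent, idx))       # acts on the parent through the view
@assert w_parent == [2.0, 0.0, 6.0]
```

An in-place function only affects the original vector when it receives a view (or the vector itself). When a copy is passed instead, the result has to be written back explicitly.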

In [None]:
function alternating_thresholding_lasso(
        Z::AbstractMatrix, Y::AbstractVector, λ::Real;
        max_iter = 1000,
        tol_rel = 1e-4,
        th = 5e-3,
        max_iter_lasso = max_iter,
        tol_rel_lasso = 1e-3
)
    @info """
    Parameters are
      max_iter = $(max_iter),
      tol_rel  = $(tol_rel),
      th       = $(th),
      max_iter_lasso = $(max_iter_lasso),
      tol_rel_lasso = $(tol_rel_lasso)"""
    
    w = nothing
    #--- YOUR CODE STARTS HERE ---#
    
    #--- YOUR CODE ENDS HERE ---#
    return w
end

In [None]:
let
    ## quickly generate a testset for polynomial fitting
    f(x) = sum( x.^2 )
    X = -2 .+ 4 .* rand(2, 10)
    q = 6
    Z = [
        ones(10);; X[1, :];; X[2, :];; X[1, :] .* X[2, :];; X[1,:] .^2;; X[2, :].^2 ]
    Y = mapreduce(f, vcat, eachcol(X))
    ## call algorithm
    w = alternating_thresholding_lasso(
        Z, Y, 0.001;
        max_iter = 1000,
        tol_rel = 1e-3,
        th = 5e-3,
        max_iter_lasso = 1000, # note the reduced number of lasso iterations
        tol_rel_lasso = 1e-2   # and the coarser criterion
    )
    
    ## unless the algorithm did something completely weird, the parameters should not have large magnitude:
    @assert all( abs(wj) <= 2 for wj in w)
    
end

### Task 4 (1 point)
Determine which vectors or matrices should be used to identify the equations governing $\text{(PDE)}$.
Do any reshaping that is still required.
Adhere to the column order used in previous exercises.
Call `alternating_thresholding_lasso` with $λ = 10^{-3}$ and parameters
```
max_iter = 1000,
tol_rel = 1e-3,
th = 5e-3,
max_iter_lasso = 1000,
tol_rel_lasso = 1e-3
```
Define `w` as the optimal sparse coefficient vector.
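
Once a sparse coefficient vector is available, it can be read as an equation by pairing its entries with the term names in column order. The helper `describe_rhs` below is a hypothetical convenience function, and the coefficients in `w_demo` are made up for demonstration only. They are not the result of this exercise.

```julia
# Term names in the column order of the feature matrix from Exercise c).
term_names = ["1", "x", "x_s", "x_ss", "x^2", "x*x_s", "x*x_ss", "x_s^2", "x_s*x_ss", "x_ss^2"]

# Hypothetical helper: turn a sparse coefficient vector into a readable right-hand side.
function describe_rhs(w::AbstractVector, names::AbstractVector{<:AbstractString})
    @assert length(w) == length(names)
    terms = ["$(round(wj; digits = 3)) * $(name)" for (wj, name) in zip(w, names) if !iszero(wj)]
    return isempty(terms) ? "x_t = 0" : "x_t = " * join(terms, " + ")
end

# Made-up coefficients for demonstration only; they are NOT the result of the exercise.
w_demo = [0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, -0.25, 0.0, 0.0]
println(describe_rhs(w_demo, term_names))
```

Only the entries that survive thresholding appear in the printed equation, which makes it easier to compare the result with known PDEs.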

In [None]:
lambda = w = Y = Z = nothing
#--- YOUR CODE STARTS HERE ---#

#--- YOUR CODE ENDS HERE ---#

In [None]:
@assert length(w) == 10
@assert lambda ≈ 0.001
@assert Y isa AbstractVector && length(Y) == size(Z, 1)
@assert size(Z, 2) == 10

## Exercise e)

e) What is the name of the equation that you have found?
Assign one of the following values to `pde`:

1. The Kuramoto-Sivashinsky equation 
2. The Keller-Segel model for chemotaxis 
3. The heat equation 
4. The Burgers equation 
5. The Navier-Stokes equations 
6. The wave equation 
7. The Poisson equation 
8. The Schrödinger equation 

(2 points)

In [None]:
pde = nothing
#--- YOUR CODE STARTS HERE ---#

#--- YOUR CODE ENDS HERE ---#

In [None]:
# Public test
@assert isa(pde, Number)
@assert pde >= 1 && pde <= 8
