# Solving ODEs

For linear ODEs whose coefficients do not depend on time, if we have an eigendecomposition of the system matrix, we can compute the solution.
That's all fine and good but many interesting problems are nonlinear or depend on time.
Moreover, while the eigendecomposition is useful as a conceptual tool and for evaluating how well solvers work on solvable test cases, it is difficult to compute, especially for large systems.
We can't expect to solve general ODEs exactly because the objects of interest to us are trajectories or functions, and it takes infinitely many numbers to describe a function with complete exactness.
The best we can do is approximate.
Here we'll look at a very effective family of methods for approximating solutions of ODEs.
First, we'll motivate things with a thought experiment.

In [None]:
import json
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from IPython.display import HTML

## Approximation

I made up some functions, sampled their values at a bunch of points in an interval, and saved the results to a JSON file.
The code below loads in the sample times and data points for each function and plots them.

In [None]:
with open("collocation_data.json", "r") as input_file:
    data = json.load(input_file)
print(data.keys())

In [None]:
times = np.array(data["times"])
periodic_signal = np.array(data["periodic_signal"])
decaying_signal = np.array(data["decaying_signal"])
piecewise_signal = np.array(data["piecewise_signal"])

In [None]:
fig, ax = plt.subplots()
ax.scatter(times, periodic_signal, 0.5)
ax.set_xlabel("time")
ax.set_title("Periodic signal");

In [None]:
fig, ax = plt.subplots()
ax.scatter(times, decaying_signal, 0.5)
ax.set_xlabel("time")
ax.set_title("Decaying signal");

In [None]:
fig, ax = plt.subplots()
ax.scatter(times, piecewise_signal, 0.5)
ax.set_xlabel("time")
ax.set_title("Piecewise signal");

The thought experiment for you is: suppose that you were given lots of functions that roughly look like each of the types above.
How could you approximate them in the most economical way?
In other words, suppose the time series you observe consists of a set of samples from some periodic or smooth decaying or piecewise function $u$.
I tell you some number $K$.
It'll most likely be much smaller than the number of data points.
You get to pick a set of $K$ basis functions $\{\phi_1, \ldots, \phi_K\}$ and coefficients $\{u_1, \ldots, u_K\}$ so that
$$u(t) \approx u_1\phi_1(t) + \ldots + u_K\phi_K(t).$$
How would you pick the basis functions and coefficients?
Would you use the same basis functions for each type of signal shown above?
Can you think of a couple different common families of functions that are used to approximate other functions?
Is there more than one sense in which one function can be close to another?

Suppose I told you how many basis functions you could use and you found that the mismatch with the sample values was too high.
I agree to let you add 1 more basis function.
What would you choose?

If you know of a concrete strategy for approximating one or more of the above signals and think you can code it up using numpy and scipy, go ahead and do it below.

## Collocation

Our task is to approximate a function, but hopefully you know something about that function beforehand.
With this knowledge, you can choose a *basis* set, a family $\{\phi_1, \ldots, \phi_K\}$ of functions such that the function you're looking for can be approximated as a linear combination:
$$u(t) \approx u_1\phi_1(t) + \ldots + u_K\phi_K(t).$$
It then falls to us to find a way to pick the coefficients $\{u_1, \ldots, u_K\}$.
For example, we might require that the approximation exactly interpolate a set of data points $\{(t_1, u(t_1)), \ldots, (t_N, u(t_N))\}$.
Or we might hope to approximate the exact solution in a mean-square sense.
**The same principles apply when you're trying to fit a curve through observational data as when you're trying to solve an ODE.**
Pick a basis, then find the coefficients.
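
As a minimal sketch of "pick a basis, then find the coefficients", the snippet below fits a made-up periodic signal in the mean-square sense using a small Fourier basis.
The synthetic signal, the choice of basis, and the helper name `fourier_basis` are all assumptions for illustration; this is not the data from `collocation_data.json`, and other bases may suit other signals better.

```python
import numpy as np

def fourier_basis(t, K, period):
    """Return an array of shape (len(t), K) whose columns are
    1, cos(wt), sin(wt), cos(2wt), sin(2wt), ... truncated to K columns."""
    omega = 2 * np.pi / period
    columns = [np.ones_like(t)]
    k = 1
    while len(columns) < K:
        columns.append(np.cos(k * omega * t))
        if len(columns) < K:
            columns.append(np.sin(k * omega * t))
        k += 1
    return np.column_stack(columns)

# Synthetic samples of a periodic function (illustrative assumption).
t = np.linspace(0.0, 2.0, 401)
u = 1.0 + 0.5 * np.sin(2 * np.pi * t) - 0.25 * np.cos(4 * np.pi * t)

# Least-squares fit: minimize the mean-square mismatch at the sample times.
basis = fourier_basis(t, K=5, period=1.0)
coefficients, *_ = np.linalg.lstsq(basis, u, rcond=None)
rms_misfit = np.sqrt(np.mean((basis @ coefficients - u) ** 2))
```

Here the signal happens to lie exactly in the span of the basis, so the misfit is at the level of rounding error; for real data you would watch how the misfit changes as `K` grows.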

There is a broadly-applicable and very effective family of schemes for discretizing ODEs called Runge-Kutta methods.
I could spend a long time on these if I wanted.
Instead, I'm going to show you a restricted family of schemes called **collocation** methods.
They're easy to describe, they are pretty hard to break, and they're going to lead us nicely into the methods that we'll use for spatial discretization later.
They take a little more work in some respects.

Suppose we want to solve an ODE on some time interval $[0, T]$.
First, we break up the interval into smaller sub-intervals along the knot points $\{t_0, t_1, \ldots, t_{N - 1}, t_N\}$.
I'm always going to use $N$ for the number of sub-intervals, so there are $N + 1$ knot points, and $n$ to denote the index of a knot point.
Moreover, I'll write
$$\delta t_n = t_{n + 1} - t_n$$
for the length of the $n$th sub-interval.
**The notation can get really annoying.
Do not hesitate to ask for clarification.**

I'll start by showing you the simplest collocation method before we go on to the general case.
The simplest collocation method assumes that, in each sub-interval, the approximate solution is a linear function, and we have to determine the coefficients.
**The idea of collocation is to pick a finite set of points and make the ODE exact at those points.**
This will give us a system of equations to solve for the coefficients in the basis expansion.
Hopefully, as we take more and more points, the approximations will grow closer and closer to the true solution.

Before we get down to the details, there's a problem that we'll encounter here in the simplest possible form, but which will recur repeatedly throughout this class.
What are the degrees of freedom for our approximation, how do we pack them into an array, and what geometric entities are they associated with?
For example, if we assume that the approximate solution is linear in each sub-interval, we could use two degrees of freedom for each sub-interval, which is certainly sufficient.
So if there are $N$ sub-intervals we have $2N$ degrees of freedom.

But in fact this choice of degrees of freedom isn't very efficient -- we can get by with much less.
We'll assume for all of the collocation methods that we define (degree-1 or higher) that the approximate solution is a continuous function.
In the degree-1 case, we can assume that there is a single degree-of-freedom $\hat{u}_n$ for each knot point $t_n$.
We can then define the approximation by linear interpolation within each cell.
If $t$ lies in the interval $[t_n, t_{n + 1}]$, then we can define the re-mapped time
$$\tau = (t - t_n) / \delta t_n$$
as the fractional duration of $t$ through the interval $[t_n, t_{n + 1}]$; the nice part about working with this variable is that $\tau$ is always in the interval $[0, 1]$.
We then define $u$ within this interval as
$$u(t_n + \tau\,\delta t_n) = (1 - \tau)\hat u_n + \tau\hat u_{n + 1}$$
and piece together these linear segments to define $u$ over the entire interval $[0, T]$.
This choice only requires $N + 1$ degrees of freedom.
If this feels a little silly or basic now, it'll make more sense when we get to higher-order collocation.
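
Here is a small sketch of how the re-mapped time $\tau$ and the nodal values fit together when evaluating the approximation at a single time.
The function name `evaluate_piecewise_linear` and the sample knots and values are made up for illustration.

```python
import numpy as np

def evaluate_piecewise_linear(t, knots, u_hat):
    """Evaluate the continuous piecewise-linear function that takes the
    value u_hat[n] at knots[n], at a time t in [knots[0], knots[-1]]."""
    # Index n of the sub-interval [t_n, t_{n+1}] that contains t.
    n = np.searchsorted(knots, t, side="right") - 1
    n = min(max(n, 0), len(knots) - 2)
    dt_n = knots[n + 1] - knots[n]
    tau = (t - knots[n]) / dt_n  # re-mapped time, always in [0, 1]
    return (1 - tau) * u_hat[n] + tau * u_hat[n + 1]

knots = np.array([0.0, 0.5, 1.5, 2.0])  # 3 sub-intervals, 4 knot points
u_hat = np.array([1.0, 0.0, 2.0, 2.0])  # one degree of freedom per knot
value = evaluate_piecewise_linear(1.0, knots, u_hat)
```

For this particular use, `np.interp(t, knots, u_hat)` gives the same result; writing it out by hand just makes the role of $\tau$ explicit.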

First, we were given some initial conditions, so we can state right away that
$$\hat u_0 = u|_{t = 0}.$$
Now let's suppose that we've determined the coefficients $\hat u_0, \ldots, \hat u_n$, and we now want to compute $\hat u_{n + 1}$.
The collocation condition is that the differential equation $\dot u = f(u, t)$ is exact at the time $t_n + \tau_*\delta t_n$ for some fixed $\tau_*$ in $[0, 1]$.
Since $u$ is linear on this sub-interval, we know right away what its derivative is there:
$$\frac{d}{dt} u(t_n + \tau_*\delta t_n) = \frac{\hat u_{n + 1} - \hat u_n}{\delta t_n}.$$
So in order to make the ODE exact at this point, we need that
$$\frac{\hat u_{n + 1} - \hat u_n}{\delta t_n} = f\Big((1 - \tau_*)\hat u_n + \tau_*\hat u_{n + 1}, t_n + \tau_*\delta t_n\Big) \tag{1}$$
This is an implicit equation for the new value $\hat u_{n + 1}$.
If $f$ is linear, then it's a linear system; otherwise, it's nonlinear.
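
To make the "implicit equation" concrete, here is one possible way to solve equation (1) for a scalar nonlinear $f$: a plain fixed-point iteration.
This is only a sketch under the assumption that the timestep is small enough for the iteration to converge; Newton's method is the more common choice in practice.
The name `collocation_step` and the test problem $\dot u = -u^2$ are mine, not part of the exercises.

```python
def collocation_step(f, u_n, t_n, dt, tau, tolerance=1e-12, max_iterations=100):
    """Approximately solve equation (1) for the new value by fixed-point iteration."""
    u_new = u_n  # initial guess: the solution does not change over the step
    for _ in range(max_iterations):
        # Value of the linear approximation at the collocation point.
        u_star = (1 - tau) * u_n + tau * u_new
        u_next = u_n + dt * f(u_star, t_n + tau * dt)
        if abs(u_next - u_new) <= tolerance:
            return u_next
        u_new = u_next
    raise RuntimeError("fixed-point iteration did not converge")

def f(u, t):
    return -u**2  # du/dt = -u^2 with u(0) = 1 has exact solution 1 / (1 + t)

dt, tau = 0.1, 0.5
u_1 = collocation_step(f, 1.0, 0.0, dt, tau)

# Residual of equation (1); it should be close to zero.
residual = (u_1 - 1.0) / dt - f((1 - tau) * 1.0 + tau * u_1, tau * dt)
```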

How do we pick the collocation point $\tau_*$?
We can answer this question by looking at the simplest equation possible:
$$\dot u = -\lambda u.$$
We'll then consider some special cases.
For a scalar linear equation, the collocation equation becomes
$$\frac{\hat u_{n + 1} - \hat u_n}{\delta t_n} = -\lambda\Big((1 - \tau_*)\,\hat u_n + \tau_*\,\hat u_{n + 1}\Big)$$
which we can rearrange (dropping the subscript on $\delta t_n$ to lighten the notation) to get
$$\hat u_{n + 1} = \frac{1 - (1 - \tau_*)\cdot\delta t\cdot\lambda}{1 + \tau_*\cdot\delta t\cdot\lambda}\hat u_n.$$
So far so good, but we still don't know how to pick $\tau_*$.
We'll consider three possible values -- $\tau_* = 0$, $\tau_* = 1$, and $\tau_* = 1/2$.
For $\tau_* = 0$, we get the *forward* method:
$$\hat u_{n + 1} = (1 - \delta t\cdot\lambda)\hat u_n.$$
For $\tau_* = 1$, we get the *backward* method:
$$\hat u_{n + 1} = (1 + \delta t\cdot\lambda)^{-1}\hat u_n$$
and finally for $\tau_* = 1/2$, we get the *midpoint* method:
$$\hat u_{n + 1} = \frac{1 - \frac{1}{2}\delta t\cdot\lambda}{1 + \frac{1}{2}\delta t\cdot\lambda}\hat u_n.$$
If we want to get anywhere from here, we have to specialize what $\lambda$ is.
In the coding exercise, you'll look at two cases: $\lambda$ real and positive, and $\lambda$ purely imaginary.
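
Before coding up the full method, it can help to just evaluate the factor multiplying $\hat u_n$ in the update formula above and compare its magnitude with that of the exact factor $\exp(-\lambda\,\delta t)$.
The sketch below does this for a few values of $\lambda$; the function name `amplification_factor` and the particular timestep are illustrative choices.

```python
import numpy as np

def amplification_factor(lam, dt, tau):
    """Factor R such that u_hat[n + 1] = R * u_hat[n] for du/dt = -lam * u."""
    return (1 - (1 - tau) * dt * lam) / (1 + tau * dt * lam)

dt = 0.1
for lam in [1.0, 18.0, 0.5j]:
    exact = np.exp(-lam * dt)
    for tau, name in zip([0.0, 0.5, 1.0], ["forward", "midpoint", "backward"]):
        R = amplification_factor(lam, dt, tau)
        print(f"lam = {lam}, {name}: |R| = {abs(R):.4f}, exact = {abs(exact):.4f}")
```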

## Linear collocation for scalar problems

Write a routine to compute the degree-1 collocation approximation to the scalar linear ODE
$$\dot u = -\alpha u, \quad u(0) = u_0.$$
The function signature is shown below; fill in the body.
Remember that you can pass the argument `dtype=complex` when creating a numpy array in order to specify that it consists of complex numbers.

A note on Greek letters.
You can put a Greek letter in Python code in a Jupyter notebook by typing a backslash, spelling out its name in English, and then hitting the Tab key.
So for example to type an alpha, you would start typing `\alpha` and then hit Tab.

In [None]:
def linear_collocation(
    u_0: complex,
    α: complex,
    T: float,
    N: int,
    τ: float,
) -> np.ndarray:
    ...

#### Real $\alpha$

First try an initial condition $u_0 = 1$, a decay constant $\alpha = 1$, a final time $T = 20.0$ and 200 steps.
See what happens when you use the forward method, the backward method, and the midpoint method, and compare them to the exact solution $\exp(-\alpha t)$.

Useful functions that you'll probably need:
* [np.linspace](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html)
* [np.exp](https://numpy.org/doc/stable/reference/generated/numpy.exp.html)

Remember that many numpy functions (like `exp`) can take arrays as arguments; they'll return an array of the corresponding shape.

In [None]:
α = 1.0
T = 20.0
N = 200
u_0 = 1.0

...

Try all three methods again, but with a decay constant of $\alpha = 18$.
Then try it for even higher values.
Hint: the backward and midpoint methods should look ok, the forward method... less so.

In [None]:
α = 18.0

...

#### Complex $\alpha$

Try $\alpha = i/2$ where $i$ is the imaginary unit, $i^2 = -1$.
You can get the imaginary unit in Python with `1j`.
(Electrical engineers use $j$ for the imaginary unit instead of $i$; blame them for the confusion.)
Since the solution is complex, there's a little more going on here.
You can get the real and imaginary parts of an array `u` with `u.real` and `u.imag`.
Plot the results however you wish.
How do the forward, backward, and midpoint methods behave differently from the exact solution?
Try doubling `N` but keeping `T` the same.

In [None]:
α = 0.5 * 1j

...

#### Convergence rates

Now we'll look at the convergence rate for each of the methods as we take more and more steps.
Take $\alpha = 1$, $T = 20$ again, and compute the results of each method for $N = 20$, $N = 40$, and so on up to $N = 2^{10} \cdot 20$.
For each choice of the number of steps, compute the root mean-squared deviation from the true solution $\exp(-\alpha t)$.
Plot the results on a log scale in both variables.
Then compute the constants $C$ and $p$ in
$$\text{error} \approx C\cdot\delta t^p.$$
You can do this by running a linear fit on the logarithms of both the errors and timestep sizes.
The exponent $p$ is the *order of convergence*.
What is it for each method?
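
The reason a linear fit works is that taking logarithms of $\text{error} \approx C\cdot\delta t^p$ gives $\log(\text{error}) \approx p\log\delta t + \log C$, a straight line with slope $p$.
As a sanity check of the fitting procedure itself, the sketch below uses synthetic errors that follow the power law exactly (made-up values of $C$ and $p$, not results from any of the methods) and recovers them.

```python
import numpy as np

# Synthetic errors obeying error = C * dt^p exactly (illustrative only).
C_true, p_true = 3.0, 2.0
dts = 1.0 / 2.0 ** np.arange(1, 8)
errors = C_true * dts ** p_true

# A degree-1 fit of log(error) against log(dt) has slope p and intercept log(C).
p_fit, log_C_fit = np.polyfit(np.log(dts), np.log(errors), 1)
C_fit = np.exp(log_C_fit)
```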

Helpful functions:
* [np.logspace](https://numpy.org/doc/stable/reference/generated/numpy.logspace.html)
* [Axes.set_xscale("log")](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_xscale.html), likewise for `y`
* [np.polyfit](https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html)

In [None]:
u_0 = 1.0
α = 1.0
T = 20.0

exponent = 10
Ns = 20 * np.logspace(start=0, stop=exponent, num=exponent + 1, base=2, dtype=int)
dts = T / Ns

def compute_errors(τ: float, Ns: np.ndarray) -> np.ndarray:
    errors = np.zeros_like(Ns, dtype=float)
    for index, N in enumerate(Ns):
        ...
    return errors

for τ, name in zip([0.0, 0.5, 1.0], ["forward", "midpoint", "backward"]):
    slope, intercept = np.polyfit(np.log(dts), np.log(compute_errors(τ, Ns)), 1)
    print(f"{name}: error ~= {np.exp(intercept):.1g} * dt^{slope:.1f}")

## Linear collocation for vector problems

We can do collocation for multidimensional problems as well.
But we have to be careful about what we mean by "division".
In the multidimensional setting, division becomes solving a linear system.
Suppose that $f(u, t) = -Au$ for some matrix $A$.
Then the collocation equation becomes
$$\hat u_{n + 1} = \left(I + \tau_*\cdot\delta t\cdot A\right)^{-1}\left(I - (1 - \tau_*)\cdot\delta t\cdot A\right)\hat u_n.$$
The same methods are defined as before: $\tau_* = 0$ is the forward method, $\tau_* = 1$ is the backward method, and $\tau_* = 1/2$ is the midpoint method.

**Exercise:** Suppose that $A$ is diagonalizable.
Can you write out more explicitly what the degree-1 collocation method does in terms of the eigenvalue decomposition of $A$?


Now write a routine to compute the degree-1 collocation approximation to the system of ODEs
$$\dot u = -A\, u, \quad u(0) = u_0.$$
Note: computing the inverse of a matrix is **forbidden**.
I'll explain why.
You're better off using `numpy.linalg.solve` repeatedly.
We can discuss better ways in class.
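
To show what "solve instead of invert" looks like, here is a sketch of a single step only; it is not the full routine.
The function name `linear_collocation_step` and the small test matrix are assumptions for illustration.

```python
import numpy as np

def linear_collocation_step(u_n, A, dt, tau):
    """One degree-1 collocation step for du/dt = -A u, without forming an inverse."""
    identity = np.eye(len(u_n))
    right_hand_side = (identity - (1 - tau) * dt * A) @ u_n
    # Solve (I + tau * dt * A) u_new = right_hand_side for u_new.
    return np.linalg.solve(identity + tau * dt * A, right_hand_side)

A = np.array([[2.0, -1.0], [-1.0, 2.0]])
u_0 = np.array([1.0, 0.0])
u_1 = linear_collocation_step(u_0, A, dt=0.1, tau=0.5)
```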

When you wrote the routine to do scalar problems with collocation, the shape of the returned array was pretty obvious -- it's a 1D array with `num_steps + 1` entries.
For multidimensional problems, we have a choice.
Suppose that `num_states` is the size of the ODE system, i.e. the number of variables or the dimension of $u_0$.
The returned array has to be 2D, but does it have shape `(num_steps + 1, num_states)` or does it have shape `(num_states, num_steps + 1)`?
In principle, it could be either.
I'm going to make the decision for you that the shape should be `(num_steps + 1, num_states)`.
We'll talk about why this is better in class.

Useful functions:
* np.eye
* np.linalg.solve

In [None]:
def linear_collocation(u_0, A, final_time, num_steps, τ):
    ...

#### Dissipative problems

Try the forward, backward, and midpoint methods on a diffusion-type problem like we saw for the probability density of random walks.
You can copy over code to generate the matrices for those problems.
Find a reasonable ending time and timestep through the method of your choice.
You can do this either by finding the eigenvalue decomposition of the rate matrix, or you can be a sensible person and just do it through trial and error.

In [None]:
def make_rate_matrix(holding_times, probabilities):
    ...

In [None]:
holding_times = ...
probabilities = ...
A = make_rate_matrix(holding_times, probabilities)
u_0 = ...

In [None]:
timescale = 50.0
num_steps = 32

u_forward = linear_collocation(u_0, A, timescale, num_steps, 0.0)
u_backward = linear_collocation(u_0, A, timescale, num_steps, 1.0)
u_midpoint = linear_collocation(u_0, A, timescale, num_steps, 0.5)

Plot the total probability using each method.
(Hint: you can use `np.sum` along a particular axis.)
Mathematically, we know that the total probability should always add up to 1.
Is this reflected in all our numerical methods?

Plot the final value of each solution.

Now we'll try a simple random walk on, say, 64 sites.
Make the probability of jumping up or down equal to 0.5 in both directions.
At the endpoints, the random walker will wrap around, so from state 63 the walker is equally likely to step down to 62 or wrap around to 0, and likewise from state 0 it is equally likely to step up to 1 or wrap around to 63.
Make the holding times all equal to 1 again, make the matrix of jump probabilities, and make the rate matrix.

In [None]:
num_states = 64

holding_times = ...
probabilities = ...

A = make_rate_matrix(holding_times, probabilities)

Make the initial probability density equal to $1/32$ for sites 0 through 31, and equal to 0 for sites 32 through 63.
(Check that it sums to 1!)

In [None]:
u_0 = ...

We'll use an absurdly large timestep here.

In [None]:
timescale = 50.0
num_steps = 4

Run the simulation and plot the result at time index 1, so not the initial condition but one right after.
What do you notice?

In [None]:
u_backward = linear_collocation(u_0, A, timescale, num_steps, 1.0)
u_midpoint = linear_collocation(u_0, A, timescale, num_steps, 0.5)

In [None]:
fig, ax = plt.subplots()
ax.plot(u_backward[1], label="backward")
ax.plot(u_midpoint[1], label="midpoint")
ax.legend();

What do you think I'm trying to show you here?

#### Conservative problems

Write some code to form the system matrix for the coupled oscillator problem.
You can copy over code from the previous notebook.
Remember that if $D$ is the incidence matrix and $\Omega$ is the diagonal matrix of frequencies,
$$A = \left[\begin{matrix}0 & D\Omega \\ -\Omega D^* & 0\end{matrix}\right].$$
(The zeros in the upper left and lower right block don't have the same size -- we don't necessarily have the same number of nodes as edges.)

In [None]:
def make_incidence_matrix(springs):
    ...

def make_oscillator_matrix(frequencies, springs):
    ...

Solve the oscillator problem using the initial condition of your choice for a periodic, linear chain of oscillators with 128 nodes and constant frequencies.
Try it with the forward, backward, and midpoint methods.
Pick a final time long enough for something interesting to happen and choose a timestep.
This is, once again, trial and error.
There's some code to make a movie below.
Make sure to watch what happens for the midpoint method.
If you have time, compare to the backward and forward methods.

In [None]:
num_weights = 128
springs = ...
frequencies = ...
A = make_oscillator_matrix(frequencies, springs)
final_time = ...
num_steps = ...
z_0 = ...

In [None]:
zs_forward = linear_collocation(z_0, A, final_time, num_steps, 0.0)
zs_backward = linear_collocation(z_0, A, final_time, num_steps, 1.0)
zs_midpoint = linear_collocation(z_0, A, final_time, num_steps, 0.5)

In [None]:
qs_forward = zs_forward[:, :num_weights]
qs_backward = zs_backward[:, :num_weights]
qs_midpoint = zs_midpoint[:, :num_weights]

In [None]:
%%capture
fig, ax = plt.subplots()
x = np.arange(num_weights)
ax.set_ylim((-1, +1))
points = ax.scatter(x, qs_midpoint[0])

def animate(q):
    points.set_offsets(np.column_stack((x, q)))
animation = FuncAnimation(fig, animate, qs_midpoint, interval=1e3/30)

In [None]:
HTML(animation.to_html5_video())

Assuming that you saved the oscillator matrix in a variable named `A`, the function below will compute the energy of the time series `zs`.
Compute the energies for the forward, backward, and midpoint methods and plot them.
How do they compare?
You might need to use [set_ylim](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_ylim.html) because of one of these schemes.

In [None]:
def energies(zs):
    return -0.5 * np.array([np.dot(z, A @ (A @ z)) for z in zs])

In [None]:
ts = np.linspace(0, final_time, num_steps + 1)
fig, ax = plt.subplots()
energies_midpoint = energies(zs_midpoint)
ax.plot(ts, energies_midpoint, label="midpoint")
ax.set_ylim((0, 2 * energies_midpoint.max()))
ax.plot(ts, energies(zs_forward), label="forward")
ax.plot(ts, energies(zs_backward), label="backward")
ax.set_xlabel("time")
ax.set_ylabel("energy")
ax.legend();

If you have time, try all this again with a smaller timestep.
How does the energy drift of each method change when you use a different timestep?

### Stability

The demonstrations above show the different orders of convergence for the forward, backward, and midpoint schemes.
You'll also have seen that the forward scheme can become wildly oscillatory if you take too long a timestep.
The backward and midpoint schemes don't give a very accurate answer if you use a very long timestep, but they at least remain bounded.

The ODE that we're trying to solve is a dynamical system, but you can think of the numerical method as defining a dynamical system too, just in discrete time instead of continuous.
The method is said to be *stable* for a given problem (i.e. matrix $A$) and timestep if the corresponding discrete dynamical system is stable, i.e. the trajectories $\hat u_n$ remain bounded for large $n$.
Note that stability depends on the timestep and the problem that you're trying to solve.
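
One way to probe this numerically for a linear problem is to form the matrix that maps $\hat u_n$ to $\hat u_{n + 1}$ and look at the magnitudes of its eigenvalues.
The sketch below does this for a single undamped oscillator, a test matrix I made up for illustration.
A spectral radius greater than 1 means that some trajectories grow without bound; a spectral radius of at most 1 gives bounded trajectories when the step matrix is diagonalizable, which I'm assuming here.

```python
import numpy as np

def step_matrix(A, dt, tau):
    """Matrix M with u_hat[n + 1] = M @ u_hat[n] for du/dt = -A u."""
    identity = np.eye(A.shape[0])
    return np.linalg.solve(identity + tau * dt * A, identity - (1 - tau) * dt * A)

def spectral_radius(M):
    """Largest magnitude among the eigenvalues of M."""
    return np.abs(np.linalg.eigvals(M)).max()

# A single undamped oscillator: the eigenvalues of A are +i and -i.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
dt = 0.1
radii = {
    name: spectral_radius(step_matrix(A, dt, tau))
    for tau, name in zip([0.0, 0.5, 1.0], ["forward", "midpoint", "backward"])
}
```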

**Exercise:** Show that, for real $\lambda > 0$, the forward method is stable if $\delta t < \lambda^{-1}$.
Show that the backward and midpoint methods are stable for any $\delta t$ as long as the real part of $\lambda$ is positive.
*Hint*: Apply the formulas above for the different schemes to write down an expression for $\hat u_n$ in terms of $\hat u_0$ for each method.
Then use the fact that, for any complex number $z$, $|z|^n$ is bounded for all $n$ if and only if $|z| \le 1$.

**Exercise:** What are the consequences if we're solving a system of ODEs instead?