# This was the assessment for Inversion and Optimisation in 2022 <a class="tocSkip"></a>
It covers the material corresponding to **this** (2023) year's lectures 1-5 and 9-11. Note that in 2022 this assessment also covered lecture 12 (data assimilation) in a separate part that is not included here; that lecture will be unassessed this year. As a result, this year the part corresponding to lectures 1-5 and 9-11 may be slightly longer. Separately, there is also an assessment based on lectures 6-8 in week 2.

## Section A - Row Echelon Form, Nullspace and Minimum Norm Solution
Consider the following matrix

$$
\underline{\mathbf A} =
\begin{pmatrix}
  2  & -1 & 0 & 0 & 0 \\
  -4 & 3 & 2 & 0 & 0 \\
  7 & -4 & 1 & 2 & 4 \\
  5 & -3 & -2 & -1 & -2 \\
\end{pmatrix}
$$

* Work out its Row Echelon Form
* Using this result, determine the rank of the matrix and the dimension of its nullspace. Which of the following terms applies to this matrix (or a linear system of equations based on it)? Give all terms that apply and explain why.
  - under-determined
  - equi-determined
  - over-determined
  - mixed-determined
  - full-rank
  - singular
  - rank-deficient
* Give a (linearly-independent) basis for the nullspace of the matrix
* You are given the exact solution $\boldsymbol{x}=(0,0,0,0,31)$ for the linear system $\underline{\mathbf A}\boldsymbol{x} = \boldsymbol{b}$ with right-hand side vector $\boldsymbol{b}=(0,  0, 124, -62)$. Using the nullspace vectors, find the minimum norm (norm of $\boldsymbol{x}$) solution to the same equation (with the same right-hand side). Use a different method to find the same minimum-norm solution for this case (you may use any scipy routine for this) and check that the answer is the same. Explain why this method also provides the same answer.
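
As a rough numerical cross-check of the hand calculation, the sketch below computes the rank and an orthonormal nullspace basis with `numpy.linalg.matrix_rank` and `scipy.linalg.null_space`, removes the nullspace component from the given solution, and compares the result with the pseudo-inverse solution. The variable names (`x_given`, `x_min`, `x_pinv`) are illustrative assumptions, and this is only one of several possible routes; it does not replace the explanation asked for above.

```python
import numpy as np
from scipy.linalg import null_space, pinv

A = np.array([[ 2, -1,  0,  0,  0],
              [-4,  3,  2,  0,  0],
              [ 7, -4,  1,  2,  4],
              [ 5, -3, -2, -1, -2]], dtype=float)
b = np.array([0, 0, 124, -62], dtype=float)
x_given = np.array([0, 0, 0, 0, 31], dtype=float)

rank = np.linalg.matrix_rank(A)
N = null_space(A)  # columns form an orthonormal basis of the nullspace

# subtract the projection of the known solution onto the nullspace
x_min = x_given - N @ (N.T @ x_given)

# independent check: the pseudo-inverse applied to b
x_pinv = pinv(A) @ b

print("rank:", rank, "nullspace dimension:", N.shape[1])
print("x_min :", x_min)
print("x_pinv:", x_pinv)
```

Because the columns of `N` are orthonormal, `N @ (N.T @ x_given)` is the orthogonal projection onto the nullspace, so no extra linear solve is needed in this sketch.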

## Section B - Krylov Subspaces

Consider the linear system $\underline{\mathbf A}{\bf x}={\bf b}$ based on the matrix

$$
\underline{\mathbf A} = \begin{pmatrix}
2 & 0 & 1 \\ 0 & 4 & 0 \\ 1 & 0 &2
\end{pmatrix}
$$

* Using an initial guess of ${\bf x_0} = (0,0,0)$, give a (linearly independent) basis for the Krylov subspaces $\mathcal{D}_0, \mathcal{D}_1,$ and $\mathcal{D}_2$ for the following cases:
   - a right-hand side vector ${\bf b} = (1, 0, 0)$
   - a right-hand side vector ${\bf b} = (1, 1, 0)$
   - a right-hand side vector ${\bf b} = (0, 1, 0)$
* Based on the previous answer, predict how many iterations the Conjugate Gradient algorithm and the GMRES algorithm will take to solve the system $\underline{\mathbf A}{\bf x}={\bf b}$ with initial guess ${\bf x_0} = (0,0,0)$ for each of the cases.
* For the case ${\bf b} = (1, 0, 0)$ compute the iterative approximations ${\bf x}^{i}$ for $i=1, \dots, n$, where $n$ is the number of iterations you have predicted in the previous question, using the Conjugate Gradient algorithm. Work these out yourself (in code or by hand); do not use an existing CG implementation here.
* For the case ${\bf b} = (1, 0, 0)$ compute the iterative approximations ${\bf x}^{i}$ for $i=1, \dots, n$, where $n$ is the number of iterations you have predicted, using the GMRES algorithm. Work these out yourself (in code or by hand); do not use an existing GMRES implementation here.
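
One way to sanity-check a hand-derived Krylov basis is to stack the vectors ${\bf r}_0, \underline{\mathbf A}{\bf r}_0, \underline{\mathbf A}^2{\bf r}_0$ as columns and look at the rank of the resulting matrix. The sketch below does this for the three right-hand sides; with ${\bf x_0} = (0,0,0)$ the initial residual equals ${\bf b}$. The helper names `krylov_matrix` and `krylov_dimension` are hypothetical and not taken from the lecture notes.

```python
import numpy as np

A = np.array([[2., 0., 1.],
              [0., 4., 0.],
              [1., 0., 2.]])

def krylov_matrix(A, r0, k):
    """Return the matrix with columns r0, A r0, ..., A^(k-1) r0."""
    cols = [np.asarray(r0, dtype=float)]
    for _ in range(k - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

def krylov_dimension(A, r0):
    """Largest dimension reached by the Krylov subspaces generated by r0."""
    K = krylov_matrix(A, r0, A.shape[0])
    return int(np.linalg.matrix_rank(K))

# with x0 = 0 the initial residual r0 is just b
dims = {b: krylov_dimension(A, b) for b in [(1, 0, 0), (1, 1, 0), (0, 1, 0)]}
print(dims)
```

The columns of `krylov_matrix` are generally not orthogonal, so this only checks dimensions; it does not give the orthonormal basis that an Arnoldi or Lanczos process would build.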

## Section C - Nonlinear Methods

In the lectures we have seen the Steepest or Gradient Descent algorithm for minimisation of a quadratic function $f$:

$$
  {\bf x}^{(i+1)} = {\bf x}^{(i)} - \alpha^{(i)} f'({\bf x}^{(i)})
$$

where $\alpha$ controls the step size, and we made the choice:

$$
  \alpha^{(i)} = \frac{{\bf r}^{(i)}\cdot {\bf r}^{(i)}}
  {{\bf r}^{(i)} \cdot\underline{\mathbf A} {\bf r}^{(i)}}
$$

with $\underline{\mathbf A}$ being the (constant) Hessian matrix of $f$, and ${\bf r}^{(i)}=-f'({\bf x}^{(i)})$.

For the minimisation of more general, nonlinear functions $f$ we need to consider a different formula for $\alpha$. One option is the Barzilai-Borwein formula:

$$
  \alpha^{(i)} = \frac{| \left({\bf x}^{(i)} - {\bf x}^{(i-1)}\right)\cdot \left(f'({\bf x}^{(i)}) - f'({\bf x}^{(i-1)})\right)|}
  {\|f'({\bf x}^{(i)}) - f'({\bf x}^{(i-1)})\|^2}
$$

Implement the Steepest Descent method with this choice for the step size and test it on the so-called Rosenbrock function

$$
  f(x,y) = 100 (y-x^2)^2 + (1-x)^2
$$

Plot the convergence trajectory for a number of different initial guesses. Describe and try to explain what you observe. Compare the convergence with that of Newton's method; for this you may use any of the code in the lecture notes, and no line search or trust region method is needed.

Note that the Barzilai-Borwein formula depends on the last two iterations. In the very first iteration you can just use a fixed value of $\alpha$ instead, say $\alpha^{(0)}=0.01$.
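
To illustrate how the Barzilai-Borwein formula uses the last two iterates, here is a minimal sketch on a small quadratic test problem rather than on the Rosenbrock function. The matrix `A`, vector `b`, the helper `bb_step` and the stopping tolerance are all assumptions made for this illustration only.

```python
import numpy as np

def bb_step(x, x_old, g, g_old):
    """Barzilai-Borwein step size from the last two iterates and gradients."""
    s = x - x_old
    y = g - g_old
    return abs(s @ y) / (y @ y)

# hypothetical test problem: f(x) = 1/2 x.A x - b.x, with gradient A x - b
A = np.array([[3., 1.],
              [1., 2.]])
b = np.array([1., 1.])

def grad(x):
    return A @ x - b

x_old = np.zeros(2)
g_old = grad(x_old)
x = x_old - 0.01 * g_old          # very first iteration: fixed alpha
for i in range(200):
    g = grad(x)
    if np.linalg.norm(g) < 1e-12:  # stop before s and y become pure round-off
        break
    alpha = bb_step(x, x_old, g, g_old)
    x_old, g_old = x, g
    x = x - alpha * g

print("iterations:", i, "x =", x)
```

For a nonlinear function only `grad` changes; the early exit matters there too, since the denominator of the formula vanishes once two successive gradients coincide.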

Hint: for the Rosenbrock function itself and its derivatives, you can use the implementation in scipy.optimize:

In [None]:
import scipy.optimize as sop
xy = [0,1]
print("f(x, y) =", sop.rosen(xy))
print("f'(x, y) =", sop.rosen_der(xy))
print("f''(x, y) =", sop.rosen_hess(xy))

## Section D - Image Smoothing

In lecture 3 we saw how a discrete Laplace operator can be used to smoothen/blur an image. We solve the following linear system:

$$
  \left[\underline{\mathbf I} + m \underline{\mathbf A}\right]
  \boldsymbol{u}_{\text{smooth}} = \boldsymbol{u}_{\text{orig}}
$$

where $\boldsymbol{u}_{\text{orig}}$ is the original image and $\boldsymbol{u}_{\text{smooth}}$ the smoothed image we solve for. $\underline{\mathbf I}$ is the identity matrix, $\underline{\mathbf A}$ is the discrete Laplace operator, and $m$ is a positive constant. The images $\boldsymbol{u}_{\text{orig}}$ and $\boldsymbol{u}_{\text{smooth}}$ are stored as flattened vectors, where an image of $N_y\times N_x$ pixels is stored as a single vector of length $n=N_yN_x$.
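
The flattened storage can be illustrated on a tiny array. The sketch below assumes NumPy's default row-major ordering, in which pixel $(i, j)$ ends up at position $k = iN_x + j$ of the vector; the $3\times 4$ array is a made-up stand-in for an image.

```python
import numpy as np

Ny, Nx = 3, 4                                  # hypothetical tiny "image"
img = np.arange(Ny * Nx, dtype=float).reshape(Ny, Nx)

u = img.flatten()                              # row-major (C order) vector of length Ny*Nx

i, j = 2, 1                                    # row and column of one pixel
k = i * Nx + j                                 # its position in the flattened vector
print(u[k] == img[i, j])

img_back = u.reshape(Ny, Nx)                   # undo the flattening for plotting
```

With this ordering, horizontal neighbours are 1 apart in the vector and vertical neighbours are `Nx` apart.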

To read in the original image, you may use the following code:

In [None]:
import matplotlib.pyplot as plt
# reads in an image of Ny x Nx x 4
# where the last dimension represents three RGB colour channels and an alpha channel
img = plt.imread('london_road.png')

# convert to black and white, by averaging over the RGB channels (dropping the alpha channel)
img_bw = img[:,:,:3].sum(axis=-1)/3

# flatten into vector of length Nx*Ny
u_orig = img_bw.flatten()

print(img_bw.shape, u_orig.shape)

which we can display using

In [None]:
fig, ax = plt.subplots(1, figsize=(17,10))
ax.imshow(u_orig.reshape(img_bw.shape), cmap='gray');

The discrete Laplace operator matrix can be obtained from the following piece of code:

In [None]:
import numpy as np
import scipy.sparse as sp

def Laplace(Ny, Nx):
    """ Assembles a discrete Laplace operator with Neumann boundary conditions
    into a sparse matrix.
    """
    # construct matrix from 5 (off-)diagonals
    # we provide the diagonals as one 5 x n array
    # the actual off-diagonals should of course be shorter
    # but dia_matrix cuts them off for us
    n = Nx*Ny
    offsets = [-Nx, -1, 0, 1, Nx]
    diags = -np.ones((5,n))
    diags[2] = -diags[2]*4  # main diagonal should be positive and 4 times the off-diagonals
    A = sp.dia_matrix((diags, offsets), shape=(n,n)).tocsr()
    
    # grid point in the right-most column, should not be connected to
    # the grid point in the first column on the next row
    for i in range(1,Ny):
        A[i*Nx-1, i*Nx] = 0
        A[i*Nx, i*Nx-1] = 0
        
    # for homogeneous Neumann boundary conditions all we have to do is
    # make sure that the diagonal is set such that the row sum is zero
    # This replaces some of the 4 values on the diagonal with the actual number
    # of connected grid points on the boundary:
    A.setdiag(A.diagonal() - np.array(A.sum(axis=1)).flatten())
    
    return A

A = Laplace(img_bw.shape[0], img_bw.shape[1])

* What are the properties of the matrix $\left[\underline{\mathbf I} + m \underline{\mathbf A}\right]$? Based on your answer choose an appropriate iterative solver and produce a smoothened image with $m=10$ using a solver from https://docs.scipy.org/doc/scipy/reference/sparse.linalg.html

We would now like to consider a spatially varying smoothing filter, where the amount of smoothing in each grid point is controlled by the entries in the _vector_ $\bf m$ of length $n=N_yN_x$. We can do this by solving
$$
  \left[\underline{\mathbf I} + \operatorname{diag}(\bf m) \underline{\mathbf A}\right]
  \boldsymbol{u}_{\text{smooth}} = \boldsymbol{u}_{\text{orig}}
$$
where $\operatorname{diag}(\bf m)$ is the diagonal matrix with the entries of $\bf m$ on its main diagonal. Its role is to scale each row of $\underline{\mathbf A}$ with the corresponding value of $\bf m$.
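
The row-scaling effect of $\operatorname{diag}({\bf m})$ can be seen on a small example. The sketch below uses a made-up $3\times 3$ matrix `A_small` and vector `m_small` (both assumptions for illustration) and compares the sparse product with an explicit row-wise scaling of the dense matrix.

```python
import numpy as np
import scipy.sparse as sp

# hypothetical small stand-ins for the Laplace matrix and the vector m
A_small = sp.csr_matrix(np.array([[ 1., -1.,  0.],
                                  [-1.,  2., -1.],
                                  [ 0., -1.,  1.]]))
m_small = np.array([0.5, 2.0, 10.0])

# I + diag(m) A, assembled entirely with sparse matrices
M = sp.identity(3, format='csr') + sp.diags(m_small) @ A_small

# the same thing densely: row i of A is multiplied by m[i]
M_dense = np.eye(3) + m_small[:, None] * A_small.toarray()

print(M.toarray())
print(np.allclose(M.toarray(), M_dense))
```

Multiplying by the diagonal matrix from the left scales rows; multiplying from the right would scale columns instead.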

An example for a spatially varying $\bf m$ is set up in the code below:

In [None]:
# set up x and y coordinate vectors for the grid
Ny, Nx = img_bw.shape
x, y = np.meshgrid(np.linspace(0,1,Nx), np.linspace(0,1,Ny))
x = x.flatten()
y = y.flatten()

# blur everywhere except for an area around (0.25,0.6)
m = (1-np.exp(-((y-0.6)**4 + (x-0.25)**4)*500))*10

# plot the values of the m vector
plt.imshow(m.reshape(img_bw.shape))
plt.colorbar();

* Assemble the matrix $\left[\underline{\mathbf I} + \operatorname{diag}(\bf m) \underline{\mathbf A}\right]$ (hint: you may find [scipy.sparse.diags](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.diags.html) useful) and investigate its properties. Solve the linear system using an iterative solver from scipy.sparse.linalg and show the resulting image.

In the file 'london_road_tilt.png' you'll find the result of the same spatially varying smoothing process but based on an unknown vector $\bf m$. We would like to find out what vector $\bf m$ has been used to produce that image. We can do this by formulating the following PDE-constrained optimisation problem:

$$
  \text{minimise } f({\bf u}, {\bf m})\;\;
  \text{subject to }g({\bf u}, {\bf m}) = 0
$$

where

$$
  f({\bf u}, {\bf m}) = \| {\bf u} - {\bf u}_{\text{target}} \|^2
$$

is the functional that measures the difference between the image 'london_road_tilt.png' stored as the vector ${\bf u}_{\text{target}}$ and an image $\bf u$ that satisfies the PDE constraint

$$
  g({\bf u}, {\bf m}) \equiv \left[\underline{\mathbf I} + \operatorname{diag}({\bf m}) \underline{\mathbf A}\right] {\bf u} - {\bf u}_{\text{orig}} = {\bf 0}
$$

where ${\bf u}_{\text{orig}}$ represents the original image 'london_road.png'.

* Implement the reduced functional $\hat f({\bf m})$ associated with this PDE-constrained optimisation problem and implement its derivative
$$
  \frac{\partial \hat f({\bf m})}{\partial {\bf m}}
$$
Make sure you appropriately test this derivative!

**Hint:** one of the derivatives that you might need is given by
    
$$
  \frac{\partial g({\bf u}, {\bf m})}{\partial{\bf m}}
  =\operatorname{diag}(\underline{\mathbf A}{\bf u})
$$
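
One common way to test a derivative is a Taylor remainder convergence test: the remainder $|f({\bf x} + h\,\delta{\bf x}) - f({\bf x}) - h\, f'({\bf x})\cdot\delta{\bf x}|$ should shrink like $h^2$ when the gradient is correct. The sketch below applies this to the Rosenbrock function from scipy.optimize as a stand-in for the reduced functional; the helper `taylor_remainders`, the point `x`, the direction `dx` and the step sizes are all assumptions chosen for illustration.

```python
import numpy as np
import scipy.optimize as sop

def taylor_remainders(f, df, x, dx, h0=1e-2, n=5):
    """Second-order Taylor remainders for step sizes h0, h0/2, ..., h0/2^(n-1)."""
    hs = h0 / 2.0**np.arange(n)
    f0 = f(x)
    slope = df(x) @ dx             # directional derivative predicted by the gradient
    res = np.array([abs(f(x + h * dx) - f0 - h * slope) for h in hs])
    return hs, res

x = np.array([0.0, 1.0])
dx = np.array([1.0, 0.5])          # arbitrary perturbation direction
hs, res = taylor_remainders(sop.rosen, sop.rosen_der, x, dx)

# halving h should divide the remainder by about 4, i.e. an observed order near 2
orders = np.log2(res[:-1] / res[1:])
print("remainders:", res)
print("observed orders:", orders)
```

An observed order close to 1 instead of 2 would indicate that the gradient is inconsistent with the functional. For very small `h` the remainder is dominated by round-off, so the step sizes should not be made arbitrarily small.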

* Using only the reduced functional $\hat f({\bf m})$ and its derivative, describe an optimisation algorithm that we might use to solve the PDE-constrained optimisation problem and motivate your choice. You do not need to perform this optimisation here! In trying out such an algorithm we find poor convergence and very noisy solutions. Describe why this might be the case and what we could do to improve the situation. We also find a number of other images that have had the same smoothing applied, based on the same unknown vector $\bf m$. Describe how we might use these to improve the accuracy of the inversion problem.