# Question 1 <a class="tocSkip"></a>

Ahmed, Brian, Cheng, Dave, and Evelyn went to eat at an Italian restaurant:

* Ahmed ordered the lasagna, a salad, a coke, and a fanta for £30.90

* Brian ordered the lasagna and a salad for £26.50

* Cheng ordered the lasagna and a coke for £20.20

* Dave ordered a salad, and a fanta for £10.70

* Evelyn ordered the lasagna, two salads and a fanta for £37.20

To work out the prices for each individual item we formulate the following linear system of equations

$$
\begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
1 & 2 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
    x_1 \\ x_2 \\ x_3 \\ x_4
\end{pmatrix}
=
\begin{pmatrix}
    30.9 \\ 26.5 \\ 20.2 \\ 10.7 \\ 37.2
\end{pmatrix},
$$

where $x_1, x_2, x_3,$ and $x_4$ are the prices for the lasagna, the salad, a coke and a fanta respectively.

**1.1** Using Gaussian Elimination, work out the Reduced Row Echelon Form of the matrix on the left-hand side of the system of equations, and determine its null space and rank.
Which of the following terms apply to this system:
* equi-determined
* over-determined
* under-determined
* mixed-determined
* full-rank
* singular
* rank-deficient

Name all terms (may be more than one) that apply.

## Answer <a class="tocSkip"></a>
The Reduced Row Echelon Form (RREF) is worked out in the code below:

In [3]:
import numpy as np
from pprint import pprint

In [4]:
A = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 2, 0, 1]])

In [5]:
A0 = A.copy()  # keep a copy of the original matrix
# eliminate nonzero entries below pivot in column 0
A[1] = A[1] - A[0]
A[2] = A[2] - A[0]
A[4] = A[4] - A[0]
pprint(A)

array([[ 1,  1,  1,  1],
       [ 0,  0, -1, -1],
       [ 0, -1,  0, -1],
       [ 0,  1,  0,  1],
       [ 0,  1, -1,  0]])


In [6]:
# swap rows 1 and 3
v = A[1].copy()
A[1] = A[3].copy()
A[3] = v
pprint(A)

array([[ 1,  1,  1,  1],
       [ 0,  1,  0,  1],
       [ 0, -1,  0, -1],
       [ 0,  0, -1, -1],
       [ 0,  1, -1,  0]])


In [7]:
# eliminate nonzero entries below pivot in column 1
A[2] += A[1]
A[4] -= A[1]
pprint(A)

array([[ 1,  1,  1,  1],
       [ 0,  1,  0,  1],
       [ 0,  0,  0,  0],
       [ 0,  0, -1, -1],
       [ 0,  0, -1, -1]])


In [8]:
# swap rows 2 and 3
v = A[2].copy()
A[2] = A[3].copy()
A[3] = v
pprint(A)

array([[ 1,  1,  1,  1],
       [ 0,  1,  0,  1],
       [ 0,  0, -1, -1],
       [ 0,  0,  0,  0],
       [ 0,  0, -1, -1]])


In [9]:
# eliminate nonzero entries below pivot in column 2
A[4] = A[4] - A[2]
pprint(A)

array([[ 1,  1,  1,  1],
       [ 0,  1,  0,  1],
       [ 0,  0, -1, -1],
       [ 0,  0,  0,  0],
       [ 0,  0,  0,  0]])


In [10]:
# we are now in REF, and continue by eliminating entries above the pivots
# to achieve RREF - first column 1
A[0] = A[0] - A[1]
pprint(A)

array([[ 1,  0,  1,  0],
       [ 0,  1,  0,  1],
       [ 0,  0, -1, -1],
       [ 0,  0,  0,  0],
       [ 0,  0,  0,  0]])


In [11]:
# same for column 2
A[0] = A[0] + A[2]
pprint(A)

array([[ 1,  0,  0, -1],
       [ 0,  1,  0,  1],
       [ 0,  0, -1, -1],
       [ 0,  0,  0,  0],
       [ 0,  0,  0,  0]])


In [12]:
# we can't eliminate the nonzero entries in column 3, as it has no pivot
# all that is left to do is to ensure the leading pivot in row 2 is 1 as well
# which we achieve by multiplying it by -1
A[2] *= -1
pprint(A)

array([[ 1,  0,  0, -1],
       [ 0,  1,  0,  1],
       [ 0,  0,  1,  1],
       [ 0,  0,  0,  0],
       [ 0,  0,  0,  0]])
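
As a cross-check on the hand elimination above, the same steps can be automated. The sketch below defines a hypothetical helper `rref` (it is not a NumPy function and not part of the lecture notes) that works in floating point and uses partial pivoting, so its row swaps differ from the ones chosen by hand. Since the RREF of a matrix is unique, it should still arrive at the same final matrix.

```python
import numpy as np


def rref(M, tol=1e-12):
    """Return the reduced row echelon form of M and its pivot columns.

    Hypothetical helper for illustration only; entries smaller than
    `tol` in magnitude are treated as zero.
    """
    R = np.array(M, dtype=float)
    n_rows, n_cols = R.shape
    pivots = []
    row = 0
    for col in range(n_cols):
        if row >= n_rows:
            break
        # choose the largest entry at or below the current row as pivot
        p = row + np.argmax(np.abs(R[row:, col]))
        if abs(R[p, col]) < tol:
            continue  # no pivot in this column
        R[[row, p]] = R[[p, row]]   # swap the pivot row into place
        R[row] /= R[row, col]       # scale the pivot to 1
        for r in range(n_rows):     # eliminate above and below the pivot
            if r != row:
                R[r] -= R[r, col] * R[row]
        pivots.append(col)
        row += 1
    return R, pivots


A = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 2, 0, 1]])

R, pivots = rref(A)
print(R)
print("pivot columns:", pivots)
```

The number of pivot columns returned is the rank, and the columns without a pivot correspond to free variables.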


The rank of the matrix can be read off immediately from the RREF, which clearly has 3 independent rows, and thus the number of independent rows and the rank of the original matrix is also 3. Using the rank-nullity theorem we then also know that the dimension of the nullspace must be: #columns - 3 = 1.
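
The rank and the nullspace dimension can also be checked numerically. A minimal sketch, assuming NumPy's SVD-based `np.linalg.matrix_rank` with its default tolerance is acceptable for a small integer matrix like this one:

```python
import numpy as np

A = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 2, 0, 1]])

rank = np.linalg.matrix_rank(A)
# rank-nullity theorem: nullity = number of columns - rank
nullity = A.shape[1] - rank
print("rank:", rank, "nullity:", nullity)
```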

Note that I asked for the nullspace itself, and not just its dimension. Since it is a one-dimensional space, we just need to find one null vector $\bf n$, and the nullspace will be formed by any scalar multiples of it. This vector is quite easily found from the RREF (which has the same null vectors):

$$
\begin{pmatrix}
  1 & 0 & 0 & -1 \\
  0 & 1 & 0 & 1 \\
  0 & 0 & 1 & 1 \\
  0 & 0 & 0 & 0 \\
  0 & 0 & 0 & 0
\end{pmatrix}
\begin{pmatrix}
  n_0 \\ n_1 \\ n_2 \\n_3
\end{pmatrix}
 =
\begin{pmatrix}
  0 \\ 0 \\ 0 \\ 0 \\ 0
\end{pmatrix}
$$ 

You could for instance start by setting its first coefficient $n_0=1$ (the first coefficient is either zero, which leads to all other coefficients being zero to satisfy the first three equations, or nonzero but since we can multiply any nullvector by an arbitrary scalar, we might as well choose its first coefficient to be 1). The other coefficients then immediately follow from satisfying the first three row equations. And thus we get a nullspace

$$
  N({\rm\underline A}) = \text{span}\big(
  \begin{pmatrix}
     1 \\ -1 \\ -1 \\ 1
  \end{pmatrix}\big)
$$
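
The null vector can be verified directly against the original matrix. The sketch below also compares it with `scipy.linalg.null_space`, which returns an orthonormal basis, so the two are only expected to agree up to sign and normalisation.

```python
import numpy as np
import scipy.linalg as sl

A = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 2, 0, 1]])

n = np.array([1, -1, -1, 1])
print(A @ n)  # every row equation should give zero

N = sl.null_space(A)  # one orthonormal column per nullspace dimension
print(N.shape)

# |cos(angle)| between n and the computed basis vector; 1 means parallel
cosine = abs(N[:, 0] @ n) / np.linalg.norm(n)
print(np.isclose(cosine, 1.0))
```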

Based on the definitions in the lecture notes, the matrix system is:

* over-determined
* mixed-determined
* rank-deficient

Note that by these definitions the system is *not* under-determined. The definition of under- vs. over-determined here is based purely on the number of equations (rows) vs. the number of unknowns (columns), regardless of whether these equations are independent. Of course, in this case we know that we only have three independent equations, so you could think of it as being equivalent to an under-determined system that has only three equations. The reason we nevertheless stick to a definition that strictly calls the original system over-determined is that we rely on the right-hand side being exactly consistent, which in practice, in large systems, we never achieve (due to round-off or measurement error); this makes it different from the under-determined case. The system is however rank-deficient (rank smaller than the number of columns or rows) or, equivalently, mixed-determined.
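
The dependence on a consistent right-hand side can be illustrated by comparing the rank of the matrix with the rank of the augmented matrix: the system has a solution only if the two agree. This is a sketch; the perturbation of 0.01 on Ahmed's bill is an arbitrary value chosen for illustration, and the ranks rely on the default tolerance of `np.linalg.matrix_rank`.

```python
import numpy as np

A = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 2, 0, 1]])
b = np.array([30.9, 26.5, 20.2, 10.7, 37.2])

rank_A = np.linalg.matrix_rank(A)
# augmented matrix [A | b]: same rank as A means b lies in the column space
rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))

# a small "measurement error" in one bill (illustrative value)
b_noisy = b.copy()
b_noisy[0] += 0.01
rank_Ab_noisy = np.linalg.matrix_rank(np.column_stack([A, b_noisy]))

print(rank_A, rank_Ab, rank_Ab_noisy)
```

With the exact bills the two ranks agree, whereas the perturbed right-hand side raises the rank of the augmented matrix, so that system no longer has an exact solution.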

**1.2** Use the generalized inverse to give a solution for this system (you may use scipy here). Is it unique? Why is this solution not realistic?

## Answer <a class="tocSkip"></a>

The solution from the generalized inverse is given in the code below. It is not a unique solution to the system of equations as we know that the system has a non-trivial nullspace. The solution we get is not realistic as one of the prices comes out negative.

In [14]:
import scipy.linalg as sl
A = A0  # get back original matrix
b = [30.9, 26.5, 20.2, 10.7, 37.2]
sl.pinv(A) @ b

array([15.625, 10.875,  4.575, -0.175])
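
The non-uniqueness can be made concrete by adding multiples of the null vector $\bf n$ from question 1.1 to this solution. A minimal sketch, with the values of `alpha` chosen arbitrarily for illustration:

```python
import numpy as np
import scipy.linalg as sl

A = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 2, 0, 1]])
b = np.array([30.9, 26.5, 20.2, 10.7, 37.2])

x_p = sl.pinv(A) @ b                  # generalized-inverse solution
n = np.array([1.0, -1.0, -1.0, 1.0])  # null vector from question 1.1

for alpha in (0.0, 1.0, -1.0):        # arbitrary illustrative multiples
    x = x_p + alpha * n
    # each x satisfies the system; the norm is smallest for alpha = 0
    print(alpha, x, np.allclose(A @ x, b), np.linalg.norm(x))
```

Every shifted vector still satisfies all five equations, and the generalized-inverse solution is the one with the shortest length because it is orthogonal to $\bf n$.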

**1.3** Suppose we know that half a year ago the prices were £17.5, £8.0, £2.0, £2.0 for the lasagna, the salad, the coke and the fanta respectively. Reformulate the system of equations so that we can find the solution that is closest to these previous prices.

## Answer <a class="tocSkip"></a>

There are a number of approaches possible here:

### Use the generalized inverse on a reformulated system <a class="tocSkip"></a>
We know that when there are multiple solutions, the generalized inverse gives us the solution with the shortest length of the solution vector. If we want the solution that is closest to some desired vector $\hat{\bf x}$ (based on some a priori knowledge), we can use this fact by solving for the _difference_ with that desired vector:

$$
  \delta {\bf x} = {\bf x} - \hat{\bf x}
$$

We have

$$
  {\rm\underline A} \delta {\bf x} = {\rm\underline A} {\bf x} - {\rm\underline A} {\bf\hat x}
  ={\bf b} - {\rm\underline A} {\bf\hat x} \tag{*}
$$

where we can directly calculate the right-hand side of the last equation. Thus we have a new linear system based on the same matrix, but with a different right-hand side, which we can solve for $\delta{\bf x}$. As it is based on the same matrix, with the same nullspace, the solutions are again not unique, but this time if we apply the generalized inverse we get the solution with minimal norm of $\delta{\bf x}$, and thus the solution $\bf x$ that is closest to $\hat{\bf x}$, the prices from half a year ago. In code:

In [15]:
xhat = [17.5, 8.0, 2.0, 2.0]
# solve A deltax = A (x - xhat) = b - A@xhat
deltax = sl.pinv(A) @ (b - A@xhat)
x = xhat + deltax
pprint(x)

array([18. ,  8.5,  2.2,  2.2])


Equally valid approaches are:

### Use the nullspace <a class="tocSkip"></a>
We can use the nullspace vector $\bf n$ we obtained in question 1.1. Any multiple (say with a scalar $\alpha$) can be added to the solution we obtained in question 1.2. By computing the norm of the difference between this ${\bf x} + \alpha {\bf n}$ and the previous prices $\hat{\bf x}$, we get a quadratic expression in $\alpha$ which we can minimize.
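
A sketch of this approach: setting the derivative of $\|{\bf x} + \alpha{\bf n} - \hat{\bf x}\|^2$ with respect to $\alpha$ to zero gives $\alpha = {\bf n}\cdot(\hat{\bf x} - {\bf x}) / ({\bf n}\cdot{\bf n})$. The variable names below are chosen for illustration.

```python
import numpy as np
import scipy.linalg as sl

A = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 2, 0, 1]])
b = np.array([30.9, 26.5, 20.2, 10.7, 37.2])
xhat = np.array([17.5, 8.0, 2.0, 2.0])

x_p = sl.pinv(A) @ b                  # solution from question 1.2
n = np.array([1.0, -1.0, -1.0, 1.0])  # null vector from question 1.1

# minimiser of the quadratic ||x_p + alpha*n - xhat||^2 in alpha
alpha = n @ (xhat - x_p) / (n @ n)
x = x_p + alpha * n
print(alpha, x)
```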

### Solve a constrained optimisation problem <a class="tocSkip"></a>
The solution needs to satisfy ${\rm\underline A}{\bf x} = {\bf b}$ but we simultaneously want to minimize $\|{\bf\hat x}-{\bf x}\|^2$. This can be seen as a constrained minimisation problem for which we can use the Lagrangian

$$
  L({\bf x}, {\boldsymbol\lambda}) = \|{\bf\hat x}-{\bf x}\|^2 + \boldsymbol\lambda^T \left({\rm\underline A}{\bf x} - {\bf b}\right)
$$

Note that to satisfy the original equations ${\rm\underline A}{\bf x} = {\bf b}$ you need five Lagrange multipliers (i.e. $\boldsymbol\lambda$ is a length 5 vector) for the 5 equations (constraints). However, if you work out the equations to find the stationary point of the Lagrangian, you will notice that you again end up with a singular system. What does work is to replace the constraint ${\rm\underline A}{\bf x} = {\bf b}$ by the three nonzero equations of the RREF of the system. This does require, however, that you have also applied the corresponding row operations to the right-hand side vector (which wasn't asked for in question 1.1). If you then solve the constrained optimisation problem, you in fact get the same result as if you applied the least squares solution approach to the RREF of the "reformulated" system in equation (\*) above.
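
The stationary point satisfies $2({\bf x} - \hat{\bf x}) + {\rm\underline C}^T\boldsymbol\lambda = 0$ together with ${\rm\underline C}{\bf x} = {\bf d}$, where ${\rm\underline C}$, ${\bf d}$ denote whichever constraints are used. The sketch below first shows that using all five equations gives a singular matrix. It then swaps in three linearly independent rows of the original system (Brian, Cheng and Dave) instead of the RREF rows; this is an assumption made to avoid having to row-reduce the right-hand side, and it describes the same solution set because the remaining two equations are sums of these three. The helper `kkt_matrix` is hypothetical.

```python
import numpy as np

A = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 2, 0, 1]], dtype=float)
b = np.array([30.9, 26.5, 20.2, 10.7, 37.2])
xhat = np.array([17.5, 8.0, 2.0, 2.0])


def kkt_matrix(C):
    """Block matrix [[2I, C^T], [C, 0]] of the stationarity conditions."""
    m, k = C.shape
    return np.block([[2 * np.eye(k), C.T],
                     [C, np.zeros((m, m))]])


# all five constraints: 9x9 system that is rank deficient
K5 = kkt_matrix(A)
print(K5.shape, np.linalg.matrix_rank(K5))

# three independent rows instead (assumption: rows 1, 2, 3 of A)
rows = [1, 2, 3]
C, d = A[rows], b[rows]
K3 = kkt_matrix(C)
rhs = np.concatenate([2 * xhat, d])
sol = np.linalg.solve(K3, rhs)
x, lam = sol[:4], sol[4:]
print(x)
```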

You could also use a constrained optimisation solver such as SLSQP to obtain the solution.
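
A minimal sketch with `scipy.optimize.minimize` and `method="SLSQP"`. It assumes that only three linearly independent equations (here Brian's, Cheng's and Dave's) are passed as equality constraints, since handing the solver all five dependent constraints for four unknowns is likely to make it fail; the starting guess `x0=xhat` is also a choice made for illustration.

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 2, 0, 1]], dtype=float)
b = np.array([30.9, 26.5, 20.2, 10.7, 37.2])
xhat = np.array([17.5, 8.0, 2.0, 2.0])

# assumption: keep three independent equations as constraints
rows = [1, 2, 3]
C, d = A[rows], b[rows]

res = minimize(
    fun=lambda x: np.sum((x - xhat) ** 2),  # squared distance to old prices
    x0=xhat,
    jac=lambda x: 2 * (x - xhat),
    method="SLSQP",
    constraints=[{"type": "eq",
                  "fun": lambda x: C @ x - d,
                  "jac": lambda x: C}],
)
print(res.success, res.x)
print(A @ res.x - b)  # residual of all five original equations
```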

### Using damped least squares <a class="tocSkip"></a>
This is minimizing

$$
f = \|{\rm\underline A}{\bf x} - {\bf b}\|_2^2 + \mu \|{\bf x}-{\bf\hat x}\|_2^2
$$

where you need to try a few values of $\mu$ to find one that is small enough that ${\rm\underline A}{\bf x} = {\bf b}$ is still satisfied relatively accurately.
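
One way to minimise $f$ is to stack $\sqrt{\mu}\,{\rm\underline I}{\bf x} = \sqrt{\mu}\,\hat{\bf x}$ underneath the original equations and solve the stacked system in the least-squares sense. The sketch below does this for a few values of $\mu$; the helper name `damped_lstsq` and the particular values tried are assumptions for illustration.

```python
import numpy as np

A = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 2, 0, 1]], dtype=float)
b = np.array([30.9, 26.5, 20.2, 10.7, 37.2])
xhat = np.array([17.5, 8.0, 2.0, 2.0])


def damped_lstsq(A, b, xhat, mu):
    """Minimise ||A x - b||^2 + mu * ||x - xhat||^2 (hypothetical helper)."""
    k = A.shape[1]
    A_aug = np.vstack([A, np.sqrt(mu) * np.eye(k)])
    b_aug = np.concatenate([b, np.sqrt(mu) * xhat])
    x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x


for mu in (1.0, 1e-2, 1e-4, 1e-6):  # illustrative values
    x = damped_lstsq(A, b, xhat, mu)
    # residual norm of the original equations shrinks as mu decreases
    print(mu, x, np.linalg.norm(A @ x - b))
```

As $\mu$ decreases, the residual of the original equations shrinks and the solution approaches the one found with the generalized inverse on the reformulated system.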

A number of people tried an approach where they simply added ${\bf x} = {\bf\hat x}$ or ${\rm\underline I}{\bf x} = {\bf\hat x}$ as four additional equations to the five equations of the system ${\rm\underline A}{\bf x}={\bf b}$. This results in an over-determined system that has no solution, but is not rank-deficient. Although it is true that least squares would give the solution that most closely satisfies these nine equations, that does *not* give you what I asked for. The least squares solution for these nine equations in fact minimizes $\|{\rm\underline A}{\bf x} - {\bf b}\|_2^2 + \|{\bf x}-{\bf\hat x}\|_2^2$, as that is the residual of these nine equations, but what we want is a solution that exactly satisfies ${\rm\underline A}{\bf x}={\bf b}$, in other words $\|{\rm\underline A}{\bf x} - {\bf b}\|_2^2=0$, *and* minimizes $\|{\bf x}-{\bf\hat x}\|_2^2$.
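
The difference is easy to see numerically. The following sketch solves the stacked nine-equation system with `np.linalg.lstsq` and evaluates how well the original five equations are satisfied; the names `A9`, `b9` and `x9` are chosen for illustration.

```python
import numpy as np

A = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 2, 0, 1]], dtype=float)
b = np.array([30.9, 26.5, 20.2, 10.7, 37.2])
xhat = np.array([17.5, 8.0, 2.0, 2.0])

# nine equations: the original five plus x = xhat
A9 = np.vstack([A, np.eye(4)])
b9 = np.concatenate([b, xhat])
print(np.linalg.matrix_rank(A9))  # full column rank, so a unique answer

x9, *_ = np.linalg.lstsq(A9, b9, rcond=None)
print(x9)
print(A @ x9 - b)  # the bills are no longer reproduced exactly
```

The stacked system has a unique least-squares solution, but its residual on the original equations is nonzero, so it is a compromise between the bills and the old prices, not an exact solution of the bills.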