<table>
 <tr align=left><td><img align=left src="./images/CC-BY.png">
 <td>Text provided under a Creative Commons Attribution license, CC-BY. All code is made available under the FSF-approved MIT license. (c) Kyle T. Mandli</td>
</table>

Note:  This material largely follows the text "Numerical Linear Algebra" by Trefethen and Bau (SIAM, 1997) and is meant as a guide and supplement to the material presented there.

In [1]:
%matplotlib inline
import numpy
import matplotlib.pyplot as plt

# Conditioning and Stability

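Conditioning describes how the solution of a mathematical problem responds to perturbations of its data, independent of any algorithm (an analytical property).  Stability describes how an algorithm for solving the problem behaves when it is perturbed, say through its input or through rounding errors.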

## Conditioning and Condition Numbers

A **well-conditioned** problem is one where a small perturbation to the original problem leads to only small changes in the solution.

Formally we can think of a function $f$ which maps $x$ to $y$

$$
    f(x) = y ~~~~\text{or}~~~~ f: X \rightarrow Y.
$$

Let $x \in X$ and perturb it by a small $\epsilon$.  We then ask how much the result $y$ changes:

$$
    ||f(x) - f(x + \epsilon)|| \leq C ||x - (x+\epsilon)||
$$

for some constant $C$, possibly dependent on $\epsilon$, depending on the type of conditioning we are considering.

### Absolute Condition Number

If we let $\delta x$ be a small perturbation to the input (the $\epsilon$ above) and $\delta f = f(x + \delta x) - f(x)$ be the resulting change in the output, the **absolute condition number** $\hat{\kappa}$ can be defined as
$$
    \hat{\kappa} = \sup_{\delta x} \frac{||\delta f||}{||\delta x||}
$$
for most problems (assuming $\delta f$ and $\delta x$ are both infinitesimal).  

When $f$ is differentiable we can evaluate the condition number via the Jacobian as we did with Lipschitz constants in the homework (note that the Lipschitz constant is really a form of condition number).  Recall that the derivative of a vector-valued function can be expressed as a Jacobian $J(x)$ where
$$
    [J(x)]_{ij} = \frac{\partial f_i}{\partial x_j}(x).
$$

This allows us to write the infinitesimal $\delta f$ as
$$
    \delta f \approx J(x) \delta x
$$
with equality in the limit $||\delta x|| \rightarrow 0$.  Then we can write the condition number as
$$
    \hat{\kappa} = ||J(x)||
$$
where the norm is the one induced by the spaces $X$ and $Y$.
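
As a rough numerical check of $\hat{\kappa} = ||J(x)||$, the sketch below compares the 2-norm of an analytic Jacobian with the largest ratio $||\delta f|| / ||\delta x||$ found over a sample of small perturbations.  The map `f`, the point `x`, and the step size `h` are assumptions chosen purely for illustration and do not come from the text.

```python
import numpy

def f(x):
    # Hypothetical example map f: R^2 -> R^2, chosen only for illustration
    return numpy.array([x[0] * x[1], x[0] + x[1]])

def jacobian(x):
    # Analytic Jacobian of the example map above
    return numpy.array([[x[1], x[0]],
                        [1.0, 1.0]])

x = numpy.array([1.0, 2.0])
kappa_hat = numpy.linalg.norm(jacobian(x), ord=2)

# Sample small perturbations in many directions and record ||delta f|| / ||delta x||
h = 1e-6
ratios = []
for theta in numpy.linspace(0.0, 2.0 * numpy.pi, 721):
    delta_x = h * numpy.array([numpy.cos(theta), numpy.sin(theta)])
    delta_f = f(x + delta_x) - f(x)
    ratios.append(numpy.linalg.norm(delta_f) / numpy.linalg.norm(delta_x))
sampled_sup = max(ratios)

print("||J(x)||_2      =", kappa_hat)
print("sampled supremum =", sampled_sup)
```

The sampled supremum can only approach $||J(x)||$ from below (up to higher-order terms in `h`), since it looks at finitely many directions.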

### Relative Condition Number

The **relative condition number** is defined similarly, measuring relative rather than absolute changes, in the same way that relative error differs from absolute error as defined previously.  With the same caveats as before it can be defined as
$$
    \kappa = \sup_{\delta x} \left( \frac{\frac{||\delta f||}{||f(x)||}}{\frac{||\delta x||}{||x||}} \right).
$$

Again if $f$ is differentiable we can use the Jacobian $J(x)$ to evaluate the relative condition number as
$$
    \kappa = \frac{||J(x)||}{||f(x)|| ~/ ~||x||}.
$$

#### Examples
Calculate the relative condition numbers of the following problems.

$\sqrt{x}$ for $x > 0$. 

$$\begin{aligned}
    f(x) &= \sqrt{x}, ~~~~ J(x) = f'(x) = \frac{1}{2 \sqrt{x}} \\
    \kappa &= \frac{||J(x)||}{||f(x)|| / ||x||} = \frac{1}{2 \sqrt{x}} \frac{x}{\sqrt{x}} = \frac{1}{2}
\end{aligned}$$
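
The value $\kappa = 1/2$ can be checked numerically with a simple finite-difference estimate.  The helper `relative_condition` and the relative step `h` below are hypothetical names introduced for this sketch only.

```python
import numpy

def relative_condition(f, x, h=1e-6):
    # Finite-difference estimate of the relative condition number of a
    # scalar function f at x, using a relative perturbation of size h
    delta_x = h * x
    delta_f = f(x + delta_x) - f(x)
    return abs(delta_f / f(x)) / abs(delta_x / x)

estimates = [relative_condition(numpy.sqrt, x) for x in [1e-3, 1.0, 1e3]]
for x, kappa in zip([1e-3, 1.0, 1e3], estimates):
    print(x, kappa)
```

The estimate should be close to $1/2$ regardless of the magnitude of $x$, reflecting that the square root is well-conditioned everywhere on $x > 0$.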

Obtain the scalar $f(x) = x_1 - x_2$ from the vector $x = (x_1, x_2)^T \in \mathbb R^2$ using the $\ell_\infty$ norm.

$$\begin{aligned}
    f(x) &= x_1 - x_2, ~~~~ J(x) = \left [ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}\right ] = [1, -1] \\
    \kappa &= \frac{||J(x)||_\infty}{||f(x)||_\infty / ||x||_\infty} = \frac{2 \max_{i=1,2} |x_i|}{|x_1 - x_2|}
\end{aligned}$$
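
The formula above shows that subtraction becomes ill-conditioned when $x_1 \approx x_2$.  A minimal sketch of this, with the function name `kappa_subtraction` and the two test vectors chosen here as illustrative assumptions:

```python
import numpy

def kappa_subtraction(x):
    # kappa = ||J||_inf ||x||_inf / |f(x)| with J = [1, -1], so ||J||_inf = 2
    return 2.0 * numpy.max(numpy.abs(x)) / abs(x[0] - x[1])

well_separated = numpy.array([1.0, -1.0])
nearly_equal = numpy.array([1.0, 1.0 - 1e-8])

print(kappa_subtraction(well_separated))
print(kappa_subtraction(nearly_equal))
```

The second value is many orders of magnitude larger than the first, which is the conditioning view of cancellation error.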


Matrix-vector multiplication $Ax$ only perturbing $x$.

$$\begin{aligned}
    \kappa &= \sup_{\delta x} \left ( \frac{||A (x+\delta x) - A x||}{||Ax||} \frac{||x||}{||\delta x||}\right ) \\
    &= \sup_{\delta x} \frac{ ||A \delta x||}{||\delta x||} \frac{||x||}{||Ax||} \\
    &= ||A|| \frac{||x||}{||A x||}
\end{aligned}$$

If $A$ has an inverse, then $x = A^{-1} (A x)$ implies $||x|| \leq ||A^{-1}||~||A x||$, so we can use
$$
    \frac{||x||}{||A x||} \leq ||A^{-1}||
$$
and therefore
$$
    \kappa \leq ||A|| ||A^{-1}||.
$$
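
A small numerical sketch of this bound follows.  The matrix `A` and vector `x` are arbitrary values assumed for illustration.

```python
import numpy

A = numpy.array([[1.0, 2.0],
                 [3.0, 4.0]])
x = numpy.array([1.0, -1.0])

norm_A = numpy.linalg.norm(A, ord=2)
norm_A_inv = numpy.linalg.norm(numpy.linalg.inv(A), ord=2)

# Condition number of computing A x for this particular x
kappa_x = norm_A * numpy.linalg.norm(x) / numpy.linalg.norm(A @ x)

# Upper bound that holds for every x
bound = norm_A * norm_A_inv

print("kappa for this x =", kappa_x)
print("||A|| ||A^-1||   =", bound)
```

Note that the condition number of the matrix-vector product depends on $x$, while the bound $||A||~||A^{-1}||$ does not.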

### Condition Number of a Matrix

The condition number of a matrix is defined by the product
$$
    \kappa(A) = ||A||~||A^{-1}||.
$$
where here we are thinking about the matrix rather than a problem.  If $\kappa$ is small then $A$ is said to be **well-conditioned**.  If $A$ is singular we assign $\kappa(A) = \infty$ as the matrix's condition number.

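When we are considering the $\ell_2$ norm we have $||A||_2 = \sqrt{\rho(A^\ast A)}$ and $||A^{-1}||_2 = \sqrt{\rho((A^\ast A)^{-1})}$, so we can write the condition number as
$$
    \kappa(A) = \sqrt{\rho(A^\ast A)} \sqrt{\rho((A^\ast A)^{-1})} = \frac{\sqrt{\max |\lambda|}}{\sqrt{\min |\lambda|}}
$$
where $\lambda$ ranges over the eigenvalues of $A^\ast A$.  Equivalently, $\kappa(A)$ is the ratio of the largest to the smallest singular value of $A$.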
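
The sketch below computes the $\ell_2$ condition number of a small matrix in several equivalent ways: from the definition $||A||~||A^{-1}||$, from the extreme eigenvalues of $A^\ast A$, from the singular values, and with `numpy.linalg.cond`.  The matrix `A` is an arbitrary example assumed here.

```python
import numpy

A = numpy.array([[1.0, 2.0],
                 [3.0, 4.0]])

# Definition: product of the norms of A and its inverse
kappa_def = numpy.linalg.norm(A, ord=2) * numpy.linalg.norm(numpy.linalg.inv(A), ord=2)

# Extreme eigenvalues of the Hermitian matrix A^* A
eigenvalues = numpy.linalg.eigvalsh(A.conj().T @ A)
kappa_eig = numpy.sqrt(eigenvalues.max() / eigenvalues.min())

# Ratio of the largest to the smallest singular value
sigma = numpy.linalg.svd(A, compute_uv=False)
kappa_svd = sigma[0] / sigma[-1]

# Built-in routine
kappa_numpy = numpy.linalg.cond(A, 2)

print(kappa_def, kappa_eig, kappa_svd, kappa_numpy)
```

In practice forming $A^{-1}$ or $A^\ast A$ explicitly is usually avoided for ill-conditioned matrices; the singular-value route is generally the more reliable of the three.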

### Condition Number of a System of Equations

Another way to think about the conditioning of a problem we have looked at before is to treat the matrix $A$ itself as an input to the problem.  Consider then the system of equations $Ax = b$ where we perturb $A$ by $\delta A$ while holding $b$ fixed, which changes the solution $x$ by $\delta x$, resulting in
$$
    (A + \delta A)(x + \delta x) = b.
$$

Since the unperturbed problem satisfies $Ax = b$ exactly and the product of the infinitesimals $\delta A \delta x$ is negligible, the above simplifies to
$$
\begin{aligned}
    (A + \delta A)(x + \delta x) &= b \\
    Ax + \delta Ax + A \delta x + \delta A \delta x &= b \\
    \delta Ax + A \delta x & = 0
\end{aligned}
$$

Solving for $\delta x$ leads to 
$$
    \delta x = -A^{-1} \delta A x
$$
implying

$$
    ||\delta x|| \leq ||A^{-1}|| ~ ||\delta A|| ~ ||x||
$$

and therefore
$$
    \frac{\frac{||\delta x||}{||x||}}{\frac{||\delta A||}{||A||}} \leq ||A^{-1}||~||A|| = \kappa(A).
$$

We can then state the following regarding the condition number of a system of equations:

**Theorem:**  Let $b$ be fixed and consider the problem of computing $x$ in $Ax = b$ where $A$ is square and non-singular.  The condition number of this problem with respect to perturbations in $A$ is the condition number of the matrix $\kappa(A)$.
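
To see the theorem at work, the sketch below perturbs a mildly ill-conditioned matrix by small random $\delta A$ and records the ratio of the relative change in $x$ to the relative change in $A$.  The matrix, right-hand side, perturbation size, sample count, and random seed are all assumptions made for this illustration.

```python
import numpy

A = numpy.array([[1.0, 1.0],
                 [1.0, 1.0001]])
b = numpy.array([2.0, 2.0001])
x = numpy.linalg.solve(A, b)

kappa = numpy.linalg.cond(A, 2)
norm_A = numpy.linalg.norm(A, ord=2)

rng = numpy.random.default_rng(0)
worst = 0.0
for _ in range(200):
    # Small random perturbation of the matrix, b held fixed
    delta_A = 1e-9 * rng.standard_normal(A.shape)
    delta_x = numpy.linalg.solve(A + delta_A, b) - x
    ratio = (numpy.linalg.norm(delta_x) / numpy.linalg.norm(x)) \
          / (numpy.linalg.norm(delta_A, ord=2) / norm_A)
    worst = max(worst, ratio)

print("largest observed ratio =", worst)
print("kappa(A)               =", kappa)
```

The observed ratio should stay below $\kappa(A)$ up to second-order terms, while random sampling will generally not hit the worst-case perturbation exactly.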

## Stability

We now return to the fact that we are interested not only in whether a mathematical problem is well-conditioned but also in how we might solve it on a finite precision machine.  In some sense conditioning describes how well we can solve a problem in exact arithmetic and stability how well an algorithm can solve the problem in finite precision arithmetic.

### Accuracy and Stability

As we have defined before we will consider **absolute error** as
$$
    ||\hat{f}(x) - f(x)||
$$
where $\hat{f}(x)$ is the approximation to the true solution $f(x)$.  Similarly we can define **relative error** as
$$
    \frac{||\hat{f}(x) - f(x)||}{||f(x)||}.
$$
In the ideal case we would like the relative error to be $\mathcal{O}(\epsilon_{\text{machine}})$.

Looking back at our definitions of conditioning it is clear that it is unrealistic to expect a poorly conditioned problem to be accurately computed.  We therefore temper our notion of stability with the notion of conditioning and say that an algorithm $\hat{f}$ for a problem $f$ is **stable** if for each $x \in X$

$$
    \frac{||\hat{f}(x) - f(\hat{x})||}{||f(\hat{x})||} = \mathcal{O}(\epsilon_{\text{machine}})
$$

for some $\hat{x}$ with 

$$
    \frac{||\hat{x} - x||}{||x||} = \mathcal{O}(\epsilon_{\text{machine}}).
$$

In other words
> A stable algorithm gives nearly the right answer to nearly the right question.

#### Backwards Stability

A stronger notion of stability can also be defined which is satisfied by many approaches in numerical linear algebra.  We say that an algorithm $\hat{f}$ is **backward stable** if for $x \in X$ we have

$$
    \hat{f}(x) = f(\hat{x})
$$

for some $\hat{x}$ with

$$
    \frac{||\hat{x} - x||}{||x||} = \mathcal{O}(\epsilon_{\text{machine}}).
$$

In other words
> A backward stable algorithm gives exactly the right answer to nearly the right question.

An important aspect of the above statement is that backward stability alone cannot guarantee an accurate result.  If the condition number $\kappa(x)$ is small, however, we would expect a backward stable algorithm to give us an accurate result.  This is reflected in the following theorem.

**Theorem:**  Suppose a backward stable algorithm is applied to solve a problem $f: X \rightarrow Y$ with condition number $\kappa$ on a finite precision machine, then the relative errors satisfy
$$
    \frac{||\hat{f}(x) - f(x)||}{||f(x)||} = \mathcal{O}(\kappa(x) ~ \epsilon_{\text{machine}}).
$$

**Proof:**  Backward stability gives $\hat{f}(x) = f(\hat{x})$ for some $\hat{x}$ with $||\hat{x} - x|| / ||x|| = \mathcal{O}(\epsilon_{\text{machine}})$.  By the definition of the condition number of a problem we can then write
$$
    \frac{||\hat{f}(x) - f(x)||}{||f(x)||} = \frac{||f(\hat{x}) - f(x)||}{||f(x)||} \leq (\kappa(x) + o(1))\frac{||\hat{x} - x||}{||x||}
$$
where $o(1) \rightarrow 0$ as $\epsilon_{\text{machine}} \rightarrow 0$.  Combining this with the bound on $||\hat{x} - x|| / ||x||$ from backward stability we arrive at the statement of the theorem.
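
As a small illustration of an error of size $\kappa ~ \epsilon_{\text{machine}}$, consider the subtraction $x_1 - x_2$ from the earlier example carried out in single precision.  Rounding the inputs is a small relative perturbation of the data, and the double precision result is treated here as the "exact" reference.  The particular values of `x1` and `x2` are assumptions chosen for the sketch.

```python
import numpy

x1, x2 = 1.0001234, 1.0

# Reference value in double precision, treated here as "exact"
exact = x1 - x2

# The same subtraction carried out in single precision
computed = numpy.float32(x1) - numpy.float32(x2)

eps32 = numpy.finfo(numpy.float32).eps
kappa = 2.0 * max(abs(x1), abs(x2)) / abs(exact)
rel_error = abs(float(computed) - exact) / abs(exact)

print("kappa * eps    =", kappa * eps32)
print("relative error =", rel_error)
```

The relative error is far larger than single precision machine epsilon, yet it remains within $\kappa ~ \epsilon_{\text{machine}}$: the algorithm did nothing wrong, the problem is simply ill-conditioned at this input.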

The proof above demonstrates **backward error analysis**, in other words using the condition number of the problem and stability of the algorithm to determine the error.  A perhaps more obvious approach to determine eventual accuracy would be to consider the accrual of error at each step of an algorithm given slightly perturbed input.  This approach is known as **forward error analysis**.

### Stability of $Ax = b$ using Householder Triangularization

As an example let's consider the conditioning of, and an algorithm for, solving $Ax = b$.  Here we will use a $QR$ factorization approach to solve $Ax = b$ given by a Householder triangularization.  First off let's discuss the $QR$ factorization itself.

**Theorem:** Let the $QR$ factorization $A = QR$ of a matrix $A \in \mathbb C^{m \times n}$ be computed using a Householder triangularization approach on a finite precision machine, then

$$
    \hat{Q}\hat{R} = A + \delta A ~~~~~ \frac{||\delta A||}{||A||} = \mathcal{O}(\epsilon_{\text{machine}})
$$

for some $\delta A \in \mathbb C^{m \times n}$ where $\hat{Q}$ and $\hat{R}$ are the finite arithmetic versions of $Q$ and $R$.  Householder triangularization is therefore backward stable.
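
The following sketch mirrors the kind of experiment used in Trefethen and Bau to illustrate this theorem: build $A$ from known factors, recompute the factors, and compare the errors in $\hat{Q}$ and $\hat{R}$ individually with the error in the product $\hat{Q}\hat{R}$.  It assumes that `numpy.linalg.qr` (which calls LAPACK) is Householder-based; the matrix size, seed, and sign normalization are choices made here for illustration.

```python
import numpy

rng = numpy.random.default_rng(0)
m = 50

# Known factors: an orthogonal Q and an upper-triangular R with positive diagonal
Q, _ = numpy.linalg.qr(rng.standard_normal((m, m)))
R = numpy.triu(rng.standard_normal((m, m)))
R[numpy.diag_indices(m)] = numpy.abs(numpy.diag(R))
A = Q @ R

# Computed factors of A
Q_hat, R_hat = numpy.linalg.qr(A)

# QR is unique only up to signs; flip signs so that diag(R_hat) > 0 as well
signs = numpy.sign(numpy.diag(R_hat))
Q_hat = Q_hat * signs
R_hat = signs[:, None] * R_hat

forward_error_Q = numpy.linalg.norm(Q_hat - Q, ord=2)
forward_error_R = numpy.linalg.norm(R_hat - R, ord=2) / numpy.linalg.norm(R, ord=2)
backward_error = numpy.linalg.norm(A - Q_hat @ R_hat, ord=2) / numpy.linalg.norm(A, ord=2)

print("||Q_hat - Q||               =", forward_error_Q)
print("||R_hat - R|| / ||R||       =", forward_error_R)
print("||A - Q_hat R_hat|| / ||A|| =", backward_error)
```

For a random triangular $R$ like this one the individual factors are typically recovered with errors many orders of magnitude above machine precision, while the product $\hat{Q}\hat{R}$ reproduces $A$ to roughly machine precision, which is what backward stability promises.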

#### Solving $Ax = b$ with $QR$ Factorization

So Householder triangularization is backward stable, but we also know that this does not guarantee accuracy if the problem itself is ill-conditioned.  Is backward stability of the factorization enough to guarantee good results if we use it as a step in solving $Ax = b$, for instance?  It turns out that the accuracy of the product $\hat{Q}\hat{R}$ is enough to make the larger algorithm backward stable as well.

Consider the steps to solving $A x = b$ using $QR$ factorization:
1. Compute the $QR$ factorization of $A$.
1. Multiply the vector $b$ by $Q^\ast$ so that $y = Q^\ast b$.
1. Solve the triangular system $R x = y$ using back substitution, i.e. $x = R^{-1} y$.
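
The three steps above can be sketched as follows.  The helper names `back_substitution` and `solve_qr` and the test system are hypothetical and introduced only for this example; `numpy.linalg.qr` is assumed to supply the Householder-based factorization.

```python
import numpy

def back_substitution(R, y):
    # Solve the upper-triangular system R x = y, working from the last row up
    n = R.shape[0]
    x = numpy.zeros(n, dtype=numpy.result_type(R, y))
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

def solve_qr(A, b):
    Q, R = numpy.linalg.qr(A)        # Step 1: factor A = Q R
    y = Q.conj().T @ b               # Step 2: y = Q^* b
    return back_substitution(R, y)   # Step 3: solve R x = y

A = numpy.array([[2.0, 1.0, 1.0],
                 [1.0, 3.0, 2.0],
                 [1.0, 0.0, 0.0]])
b = numpy.array([4.0, 5.0, 6.0])
x = solve_qr(A, b)

print(x)
print(numpy.linalg.norm(A @ x - b))
```

Note that $R^{-1}$ is never formed explicitly; back substitution applies it one row at a time.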

We know that step (1) is backward stable.  What about step (2), the matrix-vector multiplication?  We can write the estimate of the backward stability as

$$
    (\hat{Q} + \delta Q) \hat{y} = b ~~~~ \text{with}~~~~ ||\delta Q|| = \mathcal{O}(\epsilon_{\text{machine}})
$$

where we have moved $\hat{Q}$ to the left-hand side, using the fact that it is unitary so that its adjoint is its inverse.  Since this relation holds exactly, the matrix-vector multiplication is also backward stable: the computed $\hat{y}$ is exactly what we would obtain by multiplying $b$ by the inverse of a slightly perturbed matrix.

Step (3) is back substitution (that is, applying $R^{-1}$ without forming it explicitly).  Writing the backward stability estimate we have

$$
    (\hat{R} + \delta R) \hat{x} = \hat{y} ~~~~\text{with}~~~~ \frac{||\delta R||}{||\hat{R}||} = \mathcal{O}(\epsilon_{\text{machine}})
$$

demonstrating that the result $\hat{x}$ is the exact solution to a slight perturbation of the triangular system.

These results lead to the following two theorems:

**Theorem:**  Using $QR$ factorization to solve $Ax=b$ as described above is backward stable, satisfying
$$
    (A + \Delta A) \hat{x} = b, ~~~~ \frac{||\Delta A||}{||A||} = \mathcal{O}(\epsilon_{\text{machine}})
$$
for some $\Delta A \in \mathbb C^{m \times m}$ (here $A$ is square and non-singular).

**Theorem:** The solution $\hat{x}$ computed by the above algorithm satisfies
$$
    \frac{||\hat{x} - x||}{||x||} = \mathcal{O}(\kappa(A) ~ \epsilon_{\text{machine}}).
$$
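
A rough numerical illustration of these two results uses Hilbert matrices, whose condition numbers grow rapidly with size.  The helpers `hilbert` and `solve_qr`, the choice of sizes, and the choice of a solution of all ones are assumptions of this sketch.  Note also that $b$ is itself computed in floating point, so `x_true` is only the exact solution up to a perturbation of the same order.

```python
import numpy

def hilbert(n):
    # Hilbert matrix H_ij = 1 / (i + j - 1) with 1-based indices
    i = numpy.arange(1, n + 1)
    return 1.0 / (i[:, None] + i[None, :] - 1.0)

def solve_qr(A, b):
    # QR factorization, multiplication by Q^*, then back substitution
    Q, R = numpy.linalg.qr(A)
    y = Q.conj().T @ b
    x = numpy.zeros_like(y)
    for i in range(len(y) - 1, -1, -1):
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

eps = numpy.finfo(float).eps
results = []
for n in [4, 8, 12]:
    A = hilbert(n)
    x_true = numpy.ones(n)
    b = A @ x_true
    x_hat = solve_qr(A, b)

    kappa = numpy.linalg.cond(A, 2)
    rel_error = numpy.linalg.norm(x_hat - x_true) / numpy.linalg.norm(x_true)
    # Relative residual, a computable measure of the backward error
    residual = numpy.linalg.norm(A @ x_hat - b) \
             / (numpy.linalg.norm(A, ord=2) * numpy.linalg.norm(x_hat))
    results.append((n, kappa, rel_error, residual))
    print(n, kappa * eps, rel_error, residual)
```

The relative residual should stay near machine precision for every size, consistent with backward stability, while the relative error in $\hat{x}$ grows roughly in step with the condition number of the matrix times $\epsilon_{\text{machine}}$.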