<center>
    <img src="http://sct.inf.utfsm.cl/wp-content/uploads/2020/04/logo_di.png" style="width:60%">
    <h1> INF-285 - Computación Científica </h1>
    <h2> A small but detailed example of GMRes </h2>
    <h2> <a href="#acknowledgements"> [S]cientific [C]omputing [T]eam </a> </h2>
    <h2> Version: 1.00</h2>
</center>

<div id='toc' />

## Table of Contents
* [Preliminary: Short review of useful topics for this Jupyter Notebook](#preliminary)
    * [Matrix vector product](#matvec)
    * [A least-square problem](#prels)
    * [Translating a linear system of equations to a least-square problem](#fromlstolsp)
    * [How this is connected with GMRes](#connectiongmres)
    * [What does GMRes do? What are its advantages and disadvantages?](#questionsprelim)
* [The Small Example](#smallexample)
* [The Krylov sub-space](#krylovsubspace)
* [Arnoldi Iteration for the computation of the upper Hessenberg form](#arnoldi)
* [Looking at the vectors obtained](#lookingatvectors)
    * [First case: Krylov sub-space](#plotfirstcase)
    * [Second case: Looking at the vectors using the orthonormal parametrization of x](#plotsecondcase)
    * [Final case: Solving the small least-square problems](#plotfinalcase)
* [Colorful version of GMRes](#colorfulgmres)
    * [Matrix A0](#ma0)
    * [Matrix A1](#ma1)
    * [Matrix A2](#ma2)
    * [Matrix A3](#ma3)
    * [With a widget but losing the colors..., nevertheless it is useful for looking at different values of m](#uncolorfulgmres)
* [Acknowledgements](#acknowledgements)

In [25]:
import numpy as np # type: ignore
import matplotlib.pyplot as plt # type: ignore
from ipywidgets import interact # type: ignore
import ipywidgets as widgets # type: ignore 

import matplotlib # type: ignore
FS = 14
matplotlib.rc('xtick', labelsize=FS)
matplotlib.rc('ytick', labelsize=FS)
plt.rcParams.update({
    'font.size': FS,
    'text.usetex': True,
    'font.family': 'sans-serif',
    'font.sans-serif': 'Helvetica',
    'text.latex.preamble': r'\usepackage{amsfonts}\usepackage{amsmath}'
})

from colorama import Fore, Back, Style
# https://pypi.org/project/colorama/
# conda install -c anaconda colorama
# Fore: BLACK, RED, GREEN, YELLOW, BLUE, MAGENTA, CYAN, WHITE, RESET.
# Back: BLACK, RED, GREEN, YELLOW, BLUE, MAGENTA, CYAN, WHITE, RESET.
# Style: DIM, NORMAL, BRIGHT, RESET_ALL
textBold = lambda x: Style.BRIGHT+x+Style.RESET_ALL
textBoldH = lambda x: Style.BRIGHT+Back.YELLOW+x+Style.RESET_ALL
textBoldI = lambda x: Style.BRIGHT+Back.GREEN+Fore.BLACK+x+Style.RESET_ALL
textBoldR = lambda x: Style.BRIGHT+Back.RED+Fore.BLACK+x+Style.RESET_ALL

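def colorful_GMRes(A,b,m=3,threshold=1e-12):
	# Problem size: A is assumed to be a square n x n matrix
	n = A.shape[0]
	# "threshold" is the tolerance used to detect a break-down (the default 1e-12 is an assumed value)
	# Checking that it all makes sense:
	if m>n:
		raise UserWarning('ERROR: "m" must be less than or equal to "n"')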

	# Storing the initial residual norm
	nb=np.linalg.norm(b)
	# Pre-allocating the memory needed for the matrices Q and H
	Q = np.zeros((n,np.min([m+1,n])))
	H = np.zeros((np.min([m+1,n]),np.min([m,n])))
	flag_last_columns = False
	flag_break = False

	# Computing q1
	Q[:,0] = b / nb
	# Assuming we execute "m" iterations, where "m<n"
	for k in np.arange(m):
		print(textBoldI('Processing column '),textBold('k ='),textBold(str(k)))
		##############################
		# Arnoldi iteration
		##############################
		# Build the initial LHS of "A@q_k=\sum_{i=1}^{k+1} h_{i,k}*q_i"
		if k<n-1:
			# THIS IS THE COMMON CASE
			y = np.dot(A, Q[:,k])
			for j in np.arange(np.min([k+1,n])):
				H[j,k] = np.dot(Q[:,j], y)
				y = y - H[j,k]*Q[:,j]
			H[k+1,k] = np.linalg.norm(y)
			# We check if H[k+1,k] is not null, so we can build q_{k+1} 
			if (np.abs(H[k+1,k]) > threshold):
				# No break-down -> we can get a new orthonormal vector
				Q[:,k+1] = y/H[k+1,k]
			else:
				# This is a 'good' break down!
				flag_break=True
		else: 
			# When we get k=n-1, i.e. we need to process the last column,
			# the procedure only needs to compute the coefficients.
			y = np.dot(A, Q[:,k])
			for j in np.arange(np.min([k+1,n])):
				H[j,k] = np.dot(Q[:,j], y)
				y = y - H[j,k]*Q[:,j]
			flag_last_columns=True
		##############################
	
		############################################################
		# Finding the approximation or 'exact' solution.
		############################################################
		if flag_last_columns:
			# Do you remember why we have "e_1" here?
			e1 = np.zeros(n)        
			e1[0]=1
			H_tilde = H	
			# Solving the 'SMALL' "SQUARE" linear system of equations. 
			ck = np.linalg.solve(H_tilde, nb*e1)
			xk = np.dot(Q[:,0:(k+1)], ck)
		elif flag_break:
			# Early break_down (which is good!)
			# Do you remember why we have "e_1" here?
			e1 = np.zeros(k+1)        
			e1[0]=1
			H_tilde=H[0:(k+1),0:(k+1)]
			ck = np.linalg.solve(H_tilde, nb*e1)
			xk = np.dot(Q[:,0:(k+1)], ck)
			print(' ',textBoldH('Reduced problem solved:'))
			print('  ',textBold('H_tilde :\n'),H_tilde)
			print('  ',textBold('||b||*e_1 :'),nb*e1)
			print('  ',textBold('ck:'),ck)
			print('  ',textBoldH('||nb*e1-H_tilde@ck||\t='),np.linalg.norm(nb*e1-H_tilde@ck))
			print('  ',textBold('xk found:'),xk)
			print('  ',textBoldH('||b-A@xk||\t\t='),np.linalg.norm(b-A@xk))
			print(textBoldR('####################################################################################'))
			print(textBoldI('GMRes finished in only '),textBold('%d'%(k+1)),textBoldI('iterations!!!'))
			print(textBoldR('####################################################################################'))
			break
		else:
			# THIS IS THE COMMON CASE
			# Do you remember why we have "e_1" here?
			e1 = np.zeros((k+1)+1)        
			e1[0]=1
			H_tilde=H[0:(k+1)+1,0:k+1]
			# Solving the 'SMALL' least square problem. 
			ck = np.linalg.lstsq(H_tilde, nb*e1,rcond=None)[0] 
			xk = np.dot(Q[:,0:(k+1)], ck)
		print(' ',textBoldH('Reduced problem solved:'))
		print('  ',textBold('H_tilde :\n'),H_tilde)
		print('  ',textBold('||b||*e_1 :'),nb*e1)
		print('  ',textBold('ck:'),ck)
		print('  ',textBoldH('||nb*e1-H_tilde@ck||\t='),np.linalg.norm(nb*e1-H_tilde@ck))
		print('  ',textBold('xk found:'),xk)
		print('  ',textBoldH('||b-A@xk||\t\t='),np.linalg.norm(b-A@xk))
		if flag_last_columns:
			print(textBoldR('####################################################################################'))
			print(textBoldI('GMRes finished in '),textBold('%d'%(k+1)),textBoldI('iterations.'))
			print(textBoldR('####################################################################################'))
	############################################################
 
	# For comparison we include the np.linalg.solve approximation
	print(textBoldH('\nGMRes approximation\t:'),xk)
	print(textBoldH('np.linalg.solve\t\t:'),np.linalg.solve(A,b))

<div id='preliminary' />

# Preliminary: Short review of useful topics for this Jupyter Notebook
[Back to TOC](#toc)

<div id='matvec' />

## Matrix vector product
[Back to TOC](#toc)

Consider that the matrix $A$ belongs to $\mathbb{R}^{m\times m}$ and $\mathbf{x}\in\mathbb{R}^m$.

A matrix-vector product $A\,\mathbf{x}$ can be understood as a linear combination of the columns of $A$, say $\mathbf{a}_i$ for $i\in\{1,2,3,\dots,m\}$, and the coefficients $x_i$, which are the components of $\mathbf{x}$.
Thus,
$$
\begin{align*}
    A\,\mathbf{x} 
    &=
    \begin{bmatrix}
        \mathbf{a}_1, & \mathbf{a}_2, & \mathbf{a}_3, & \dots & \mathbf{a}_m
    \end{bmatrix}\,\mathbf{x}\\
    &=
    \begin{bmatrix}
        \mathbf{a}_1, & \mathbf{a}_2, & \mathbf{a}_3, & \dots & \mathbf{a}_m
    \end{bmatrix}\,
    \begin{bmatrix}
        x_1\\
        x_2\\
        \vdots\\
        x_m
    \end{bmatrix}\\
    &=
    x_1\,\mathbf{a}_1+x_2\,\mathbf{a}_2+x_3\,\mathbf{a}_3 +\dots+ x_m\,\mathbf{a}_m\\
    &=
    \sum_{i=1}^m x_i\,\mathbf{a}_i.
\end{align*}
$$
This means that whenever we see $A\,\mathbf{x}$ we get a vector that has the form $\sum_{i=1}^m x_i\,\mathbf{a}_i$.
This will be useful for this notebook.
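As a quick numerical check of this identity, the following minimal sketch compares the product `A_demo @ x_demo` with the explicit linear combination of the columns. The names `A_demo`, `x_demo`, and `combo`, as well as the values stored in `x_demo`, are illustrative assumptions and not part of the original notebook.

```python
import numpy as np

# Illustrative 3 x 3 matrix and vector (assumed values, for demonstration only)
A_demo = np.array([[1., 2., 3.],
                   [3., 2., 1.],
                   [1., 1., -1.]])
x_demo = np.array([2., -1., 0.5])

# Linear combination of the columns a_i of A_demo weighted by the components x_i
combo = sum(x_demo[i] * A_demo[:, i] for i in range(A_demo.shape[1]))

print('A_demo @ x_demo :', A_demo @ x_demo)
print('sum x_i * a_i   :', combo)
```

Both printed vectors should agree up to round-off.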

<div id='prels' />

## A least-square problem
[Back to TOC](#toc)

Consider that the matrix $A$ belongs to $\mathbb{R}^{m\times n}$, $\mathbf{x}\in\mathbb{R}^n$ and $\mathbf{b}\in\mathbb{R}^m$.

A least-square problem can be written in several forms.
For instance, we traditionally define the residual vector $\mathbf{r}=\mathbf{b}-A\,\mathbf{x}$ and write,
$$
\begin{align*}
    \overline{\mathbf{x}} 
        &= \argmin_{\mathbf{x}\in\mathbb{R}^n} \left\| \mathbf{r} \right\|_2^2\\
        &= \argmin_{\mathbf{x}\in\mathbb{R}^n} \left\| \mathbf{b}-A\,\mathbf{x} \right\|_2^2\\
        &= \argmin_{\mathbf{x}\in\mathbb{R}^n} \left\| \mathbf{b}-\sum_{i=1}^n x_i\,\mathbf{a}_i \right\|_2^2.
\end{align*}
$$
In particular, the last form says that we want to find the coefficients $x_i$ of the linear combination of the columns of the matrix $A$, i.e. $\mathbf{a}_i$, that approximates $\mathbf{b}$ such that we minimize the 2-norm of the residual.
This will be useful again for this notebook.
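A minimal sketch of such a problem, assuming a small overdetermined data set invented for illustration (`A_ls` and `b_ls` are hypothetical names and values), can be solved with `np.linalg.lstsq`. At the minimizer the residual is orthogonal to every column of the matrix, which the last line checks numerically.

```python
import numpy as np

# Hypothetical overdetermined problem: 4 equations and 2 unknowns
A_ls = np.array([[1., 0.],
                 [1., 1.],
                 [1., 2.],
                 [1., 3.]])
b_ls = np.array([1., 2., 2., 4.])

# Coefficients x_i of the combination of columns that best approximates b_ls
x_bar, *_ = np.linalg.lstsq(A_ls, b_ls, rcond=None)
r_ls = b_ls - A_ls @ x_bar

print('x_bar            :', x_bar)
print('||r||_2          :', np.linalg.norm(r_ls))
print('A^T r (approx. 0):', A_ls.T @ r_ls)
```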

<div id='fromlstolsp' />

## Translating a linear system of equations to a least-square problem
[Back to TOC](#toc)

Consider that the matrix $A$ belongs to $\mathbb{R}^{m\times m}$ and is nonsingular, $\mathbf{x}\in\mathbb{R}^m$, and $\mathbf{b}\in\mathbb{R}^m$.

First, we will make a connection between a linear system of equations and a least-square problem.
For instance, we can still define $\mathbf{r}=\mathbf{b}-A\,\mathbf{x}$ for a linear system of equations.
In this case, all the dimensions match, i.e. all the vectors involved are of dimension $m$ and the matrix $A$ is of dimension $m\times m$.
This implies that by solving the linear system of equations $A\,\mathbf{x}=\mathbf{b}$ we can make the residual vector equal to the zero vector, i.e. $\mathbf{0}$, just by replacing $\mathbf{x}$ by $A^{-1}\,\mathbf{b}$ in the residual vector above.
Recall that $A^{-1}$ exists (we usually don't want to compute it, but this does not mean it does not exist!), so when we replace it we get,
$$
\begin{align*}
 \mathbf{r}
    &=\mathbf{b}-A\,\mathbf{x}\\
    &=\mathbf{b}-A\,A^{-1}\,\mathbf{b}\\
    &=\mathbf{b}-I\,\mathbf{b}\\
    &=\mathbf{b}-\mathbf{b}\\
    &=\mathbf{0}.
\end{align*}
$$

Now, let's suppose we **parametrize** or **restrict** the vector space where **we want** to _find_ an **approximation** of the _solution_ $\mathbf{x}$.
For instance, a possible **restriction** for the approximation of $\mathbf{x}$ is the vector sub-space $V\subset \mathbb{R}^m$, where $V=\mathrm{span}\left(\mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3\right)$ and the vectors $\mathbf{v}_j$ for $j\in\{1,2,3\}$ are linearly independent.
This implies that the dimension of $V$ is 3.

So, **how do we find the _"best"_ approximation in $V$?** 
Recall that here the best approximation is understood in the **least-square** sense.
Thus, we need to minimize the following expression,
$$
\begin{align*}
    \left.\mathbf{x}^{\textrm{best}}\right|_{V}
        &= \argmin_{\mathbf{x}\in V} \left\| \mathbf{b}-A\,\mathbf{x} \right\|_2^2,
\end{align*}
$$
where $\left.\mathbf{x}^{\textrm{best}}\right|_{V}$ denotes that we are restricting the vector $\mathbf{x}^{\textrm{best}}$ to $V$.
The challenge here seems to be the _computation_ of the **minimization** problem but **restricting** the domain where $\mathbf{x}$ belongs.
Recall that in this problem $\mathbf{x}$ is a vector with $m$ components, but when we restrict it to the vector sub-space $V$ the components ($x_i$) are not just independent real numbers: there is a dependency between them.
In summary, solving this problem that way is _cumbersome_.

A better way to solve the previous minimization problem is by means of a **parametrization** of the vector sub-space $V$.
In particular, we can ensure that $\mathbf{x}$ belongs to $V$ by writing it as,
$$
\begin{align*}
    \left.\mathbf{x}\right|_{V} 
        &= c_1\,\mathbf{v}_1+c_2\,\mathbf{v}_2+c_3\,\mathbf{v}_3\\
        &=
        \underbrace{\begin{pmatrix}
           \mathbf{v}_1, & \mathbf{v}_2, & \mathbf{v}_3 
        \end{pmatrix}}_{\displaystyle{\textrm{This is an $m \times 3$ matrix}}}
        \underbrace{\begin{pmatrix}
           c_1\\
           c_2\\
           c_3
        \end{pmatrix}
        }_{\displaystyle{\textrm{This is a vector}}}
\end{align*}
$$
where $c_j\in \mathbb{R}$ are unknown coefficients to be determined.
Thus, using this, we can translate the previous minimization problem as follows,
$$
\begin{align}
    \min_{\mathbf{x}\in V} \left\| \mathbf{b}-A\,\mathbf{x} \right\|_2^2
    &=
    \min_{\mathbf{c}\in \mathbb{R}^3} \left\| \mathbf{b}-A\,\begin{pmatrix}
            \mathbf{v}_1, & \mathbf{v}_2, & \mathbf{v}_3 
            \end{pmatrix}\,\mathbf{c} \right\|_2^2\\
    &=
    \min_{\mathbf{c}\in \mathbb{R}^3} \left\| \mathbf{b}-A\,\left(c_1\,\mathbf{v}_1+c_2\,\mathbf{v}_2+c_3\,\mathbf{v}_3\right)\right\|_2^2\nonumber\\
    &=
    \min_{\mathbf{c}\in \mathbb{R}^3} \left\| \mathbf{b}
        -c_1\,\underbrace{\left(A\,\mathbf{v}_1\right)}_{\displaystyle{\mathbf{w}_1}}
        -c_2\,\underbrace{\left(A\,\mathbf{v}_2\right)}_{\displaystyle{\mathbf{w}_2}}
        -c_3\,\underbrace{\left(A\,\mathbf{v}_3\right)}_{\displaystyle{\mathbf{w}_3}}
        \right\|_2^2
\end{align}
$$
Some observations,
1. Equation (1) shows how to translate a **restricted** minimization problem to a **traditional** least square problem, in this case for the unknown vector $\mathbf{c}$, the RHS $\mathbf{b}$, and the matrix $A\,\begin{pmatrix} \mathbf{v}_1, & \mathbf{v}_2, & \mathbf{v}_3 \end{pmatrix}\in\mathbb{R}^{m \times 3}$. Note that the _matrix_ of the least-square problem is the product of two matrices.
2. Equation (2) shows that the minimization problem can be written as finding the coefficients $c_j$ of a linear combination of the vectors $\mathbf{w}_j$ that minimizes the corresponding residual.
3. At the beginning of this sub-section we used $\argmin$ and at the end we used $\min$. The main difference is that $\argmin$ returns the vector where the minimum is attained, while $\min$ returns the value of the squared norm at the minimum. This is the reason why in Equation (1) we were able to use an equality: the minimum value is the same. If we had used $\argmin$ we would have had an inconsistency, since one side returns a vector $\mathbf{x}\in V$ and the other returns a vector $\mathbf{c}\in\mathbb{R}^3$.
4. An additional step is needed to find $\left.\mathbf{x}^{\textrm{best}}\right|_{V}$, which is obtained as follows,
$$
\begin{align*}
    \overline{\mathbf{c}}
        &=
        \argmin_{\mathbf{c}\in \mathbb{R}^3} \left\| \mathbf{b}
        -c_1\,\mathbf{w}_1
        -c_2\,\mathbf{w}_2
        -c_3\,\mathbf{w}_3
        \right\|_2^2,\\
    \left.\mathbf{x}^{\textrm{best}}\right|_{V}
        &=
        \begin{pmatrix}
            \mathbf{v}_1, & \mathbf{v}_2, & \mathbf{v}_3 
        \end{pmatrix}\,\overline{\mathbf{c}} = \overline{c}_1\,\mathbf{v}_1+\overline{c}_2\,\mathbf{v}_2+\overline{c}_3\,\mathbf{v}_3.
\end{align*}
$$
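The whole procedure of this sub-section can be sketched numerically as follows. The matrix `A4`, the right-hand side `b4`, and the basis stored in the columns of `V4` are hypothetical values chosen only for illustration; any nonsingular matrix and any three linearly independent vectors would do.

```python
import numpy as np

# Hypothetical nonsingular 4 x 4 matrix and right-hand side (illustrative values)
A4 = np.array([[ 2., -1.,  0.,  0.],
               [-1.,  2., -1.,  0.],
               [ 0., -1.,  2., -1.],
               [ 0.,  0., -1.,  2.]])
b4 = np.ones(4)

# Columns v_1, v_2, v_3: linearly independent, so dim(V) = 3
V4 = np.array([[1., 1., 1.],
               [0., 1., 1.],
               [0., 0., 1.],
               [0., 0., 0.]])

W4 = A4 @ V4                                     # columns w_j = A v_j
c_bar, *_ = np.linalg.lstsq(W4, b4, rcond=None)  # traditional least-square problem for c
x_best = V4 @ c_bar                              # map the coefficients back to V

print('c_bar       :', c_bar)
print('x_best in V :', x_best)
print('||b - A x|| :', np.linalg.norm(b4 - A4 @ x_best))
```

Note that the least-square problem solved here has only 3 unknowns, regardless of the dimension $m$ of the original system. The residual is not zero in this sketch because the exact solution of the system does not belong to the chosen $V$.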

<div id='connectiongmres' />

## How this is connected with GMRes
[Back to TOC](#toc)

1. The **restricted** sub-space will be the Krylov sub-space $\mathcal{K}_k=\text{span}\left(\mathbf{b}, A\,\mathbf{b}, A^2\,\mathbf{b}, A^3\,\mathbf{b}, \dots, A^{k-1}\,\mathbf{b}\right)$.
2. There will be a least-square problem that we need to solve, but thanks to the partial reduction of the matrix $A$ to the upper Hessenberg form $A\,Q_k=Q_{k+1}\,\widetilde{H}_k$, we will be solving a very small least-square problem! **This is one of the great features of GMRes!**
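The two points above can be illustrated with the $3\times 3$ system analyzed later in this notebook. The following is only a sketch under stated assumptions: the orthonormal bases are obtained with `np.linalg.qr` applied to the Krylov matrix as a shortcut (the notebook itself uses the Arnoldi iteration), and `reduced_problem_sketch` is a hypothetical helper name. It compares the residual norm of the full problem, $\|\mathbf{b}-A\,Q_k\,\mathbf{c}\|_2$, with the residual norm of the small $(k+1)\times k$ problem.

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [3., 2., 1.],
              [1., 1., -1.]])
b = np.array([1., 1., 1.])

def reduced_problem_sketch(A, b, k):
    """Return (full residual norm, reduced residual norm) for K_k (hypothetical helper)."""
    # Krylov matrix [b, A b, ..., A^k b] with k+1 columns
    cols = [b]
    for _ in range(k):
        cols.append(A @ cols[-1])
    K_demo = np.column_stack(cols)
    # Assumption: QR is used only as a shortcut to get nested orthonormal bases
    Q_demo, _ = np.linalg.qr(K_demo)
    Qk, Qk1 = Q_demo[:, :k], Q_demo[:, :k + 1]
    H_tilde = Qk1.T @ A @ Qk   # (k+1) x k matrix playing the role of H_tilde_k
    rhs = Qk1.T @ b            # equals +-||b|| e_1, the sign depends on the QR convention
    c, *_ = np.linalg.lstsq(H_tilde, rhs, rcond=None)
    x_k = Qk @ c
    return np.linalg.norm(b - A @ x_k), np.linalg.norm(rhs - H_tilde @ c)

results = {k: reduced_problem_sketch(A, b, k) for k in (1, 2)}
for k, (full, small) in results.items():
    print('k =', k, '| full residual:', full, '| reduced residual:', small)
```

For each `k` the two norms should coincide up to round-off, even though the second one only involves a $(k+1)\times k$ matrix.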

<div id='questionsprelim' />

## What does GMRes do? What are its advantages and disadvantages?
[Back to TOC](#toc)

**It solves square linear systems of equations by means of a sequence of _small_ least-square problems.**


- Advantages
    - It solves a square linear system of equations without _modifying_ or _accessing_ any of its coefficients, as matrix factorizations, such as $PALU$, do. **It only requires computing, several times, the product between the matrix $A$ and a vector**.
    - The amount of memory it uses grows with the number of iterations performed, and a way to control this is to _restart_ GMRes, i.e. use the approximation found as an _initial guess_.
    - Considering exact arithmetic, it finds the **exact** solution in at most $n$ steps for an $n\times n$ matrix.
    - It can find a **numerical solution** in fewer than $n$ iterations; this is called _breakdown_ and it is a good thing!
- Disadvantages
    - The memory required grows with the number of iterations $k$: $Q_{k+1}$ needs $n\,(k+1)$ entries and $\widetilde{H}_k$ needs $(k+1)\,k$ entries, i.e. the latter grows quadratically in $k$.
    - It requires solving a least-square problem, although a _small_ one, per iteration.
    - It may be challenging to understand but **it can help A LOT to solve linear systems of equations that otherwise would require too much memory**. See for instance the Jupyter Notebook _"Bonus - 07 - Sylvester Equation with GMRes.ipynb"_.


<div id='smallexample' />

# The Small Example
[Back to TOC](#toc)

We will analyze the use of GMRes for solving the following linear system of equations,
$$
\begin{align*}
    \begin{bmatrix}
        1 & 2 & 3\\
        3 & 2 & 1\\
        1 & 1 & -1
    \end{bmatrix}
    \mathbf{x} &=
    \begin{bmatrix}
        1\\
        1\\
        1
    \end{bmatrix}.
\end{align*}
$$

In [26]:
A = np.array([[1,2,3],[3,2,1],[1,1,-1]])
b = np.array([1,1,1])

# We compute the determinant of A to make sure it is not singular.
print(textBoldH('The determinant of A'))
print(textBold('|A|:'),np.linalg.det(A))
print(textBoldH('Since the determinant is not equal to 0, the linear system of equations has a unique solution.'))

The determinant of A
|A|: 8.000000000000002
Since the determinant is not equal to 0, the linear system of equations has a unique solution.


<div id='krylovsubspace' />

# The Krylov sub-space
[Back to TOC](#toc)

Now we will build the _original_ Krylov sub-spaces. For completeness, we will show the three sub-spaces,
$$
\begin{align*}
    \mathcal{K}_1&=\textrm{span}\left(\mathbf{b}\right)\subset \mathcal{K}_2,\\
    \mathcal{K}_2&=\textrm{span}\left(\mathbf{b}, A\,\mathbf{b}\right) \subset \mathcal{K}_3, \\
    \mathcal{K}_3&=\textrm{span}\left(\mathbf{b}, A\,\mathbf{b}, A^2\,\mathbf{b}\right) = \mathbb{R}^3.
\end{align*}
$$
Notice that we only need 3 terms since we are solving a square linear system of equations with a matrix of dimension $3\times 3$.

Recall that, in general, we have three equivalent representations of the Krylov sub-space,
$$
\begin{align*}
\mathcal{K}_k
&=\text{span}\left(\mathbf{b}, A\,\mathbf{b}, A^2\,\mathbf{b}, A^3\,\mathbf{b}, \dots, A^{k-1}\,\mathbf{b}\right)\\
&=\text{span}\left(\mathbf{q}_1, \mathbf{q}_2, \mathbf{q}_3, \mathbf{q}_4, \dots, \mathbf{q}_k\right)\\
&= \text{span}\left(\mathbf{q}_1, A\,\mathbf{q}_1, A\,\mathbf{q}_2, A\,\mathbf{q}_3, \dots, A\,\mathbf{q}_{k-1}\right).
\end{align*}
$$

In the following, we will be using the relationship between the **second** and **third** representations.
See the classnotes for more details; here we will go directly into the computation.


In [27]:
# Storing the Krylov _original_ basis:
K = np.zeros((3,3))
K[:,0] = b              # b
K[:,1] = A @ b          # A @ b
K[:,2] = A @ (A @ b)    # A^2 @ b

print(textBoldH('Showing the computed basis vectors as ROW vectors:'))
print(textBold('K[:,0]:'),K[:,0])
print(textBold('K[:,1]:'),K[:,1])
print(textBold('K[:,2]:'),K[:,2])

print(textBoldH('Determinant of K to show it contains linearly-independent columns if it is not null:'))
print(textBold('|K|:'),np.linalg.det(K))

print(textBoldH('Showing the matrix K, where the columns are the basis vectors:'))
print(textBold('K:\n'),K)

Showing the computed basis vectors as ROW vectors:
K[:,0]: [1. 1. 1.]
K[:,1]: [6. 6. 1.]
K[:,2]: [21. 31. 11.]
Determinant of K to show it contains linearly-independent columns if it is not null:
|K|: 50.000000000000014
Showing the matrix K, where the columns are the basis vectors:
K:
 [[ 1.  6. 21.]
 [ 1.  6. 31.]
 [ 1.  1. 11.]]
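The equivalence of the three representations of $\mathcal{K}_k$ can be checked numerically for this example. The sketch below makes two assumptions that are not part of the original notebook: the orthonormal vectors are obtained with `np.linalg.qr` as a stand-in for the Arnoldi iteration presented next, and `same_span` is a hypothetical helper that compares sub-spaces through matrix ranks with an assumed tolerance of `1e-10`.

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [3., 2., 1.],
              [1., 1., -1.]])
b = np.array([1., 1., 1.])

K_demo = np.column_stack([b, A @ b, A @ (A @ b)])
# Assumption: QR of K_demo stands in for the Arnoldi iteration
Q_demo, _ = np.linalg.qr(K_demo)

def same_span(X, Y, tol=1e-10):
    """True if the columns of X and Y span the same sub-space (hypothetical helper)."""
    rx = np.linalg.matrix_rank(X, tol=tol)
    ry = np.linalg.matrix_rank(Y, tol=tol)
    rxy = np.linalg.matrix_rank(np.column_stack([X, Y]), tol=tol)
    return rx == ry == rxy

checks = []
for k in (1, 2, 3):
    first = K_demo[:, :k]    # b, A b, ..., A^(k-1) b
    second = Q_demo[:, :k]   # q_1, ..., q_k
    # q_1, A q_1, ..., A q_(k-1)
    third = np.column_stack([Q_demo[:, 0]] + [A @ Q_demo[:, j] for j in range(k - 1)])
    checks.append(same_span(first, second) and same_span(second, third))
    print('k =', k, '| the three representations span the same sub-space:', checks[-1])
```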


<div id='arnoldi' />

# Arnoldi Iteration for the computation of the upper Hessenberg form
[Back to TOC](#toc) 

Here we will use the _Arnoldi iteration_ (i.e. Modified Gram-Schmidt algorithm) for the orthonormalization of the columns of $K$.
In particular, we will compute the orthonormal vectors $\mathbf{q}_1$, $\mathbf{q}_2$, $\mathbf{q}_3$, and the matrix $\widetilde{H}_k$, from the following set of equations,
$$
\begin{align}
	A\,\mathbf{q}_1 &= h_{11}\,\mathbf{q}_1+h_{21}\,\mathbf{q}_2,\\
	A\,\mathbf{q}_2 &= h_{12}\,\mathbf{q}_1+h_{22}\,\mathbf{q}_2+h_{32}\,\mathbf{q}_3,\\
	A\,\mathbf{q}_3 &= h_{13}\,\mathbf{q}_1+h_{23}\,\mathbf{q}_2+h_{33}\,\mathbf{q}_3,
\end{align}
$$
where we start from knowing that $\mathbf{q}_1=\dfrac{\mathbf{b}}{\|\mathbf{b}\|}$.
So, we first compute $\mathbf{q}_1$,

In [28]:
# Pre-allocating the memory to store Q and H
Q = np.zeros((3,3))
H = np.zeros((3,3))

# q1 = b/||b||
q1 = b/np.linalg.norm(b)

Now, the next step is to compute the $\textrm{\color{red}unknown}$ terms from **Equation (1)**, which are the terms in $\textrm{\color{red}red}$ and the $\textrm{\color{blue}known}$ terms are in $\textrm{\color{blue}blue}$,
$$
{\color{blue} A\,\mathbf{q}_1} = {\color{red} h_{11}}\,{\color{blue} \mathbf{q}_1}+{\color{red}h_{21}\,\mathbf{q}_2}.
$$
The following code computes the $\textrm{\color{red}unknown}$ terms using the unrolled Arnoldi iteration.

In [29]:
# A@q1 = h11*q1+h21*q2
y = A@q1
h11 = np.dot(q1,y)
y -= h11*q1
h21 = np.linalg.norm(y)
q2 = y/h21
print(textBold('Showing the coefficients obtained:'))
print('[h11,h21]:',[h11,h21])
print(textBold('Showing that equality is numerically satisfied:'))
print('A@q1-(h11*q1+h21*q2):',A@q1-(h11*q1+h21*q2))

Showing the coefficients obtained:
[h11,h21]: [4.333333333333334, 2.357022603955159]
Showing that equality is numerically satisfied:
A@q1-(h11*q1+h21*q2): [0.00000000e+00 0.00000000e+00 1.11022302e-16]


Now, we will work on **Equation (2)**, which is,
$$
{\color{blue} A\,\mathbf{q}_2} = 
    {\color{red} h_{12}}\,{\color{blue} \mathbf{q}_1}
    +
    {\color{red} h_{22}}\,{\color{blue} \mathbf{q}_2}
    +{\color{red}h_{32}\,\mathbf{q}_3}.
$$

In [30]:
# A@q2 = h12*q1+h22*q2+h32*q3
y = A@q2
h12 = np.dot(q1,y)
y -= h12*q1
h22 = np.dot(q2,y)
y -= h22*q2
h32 = np.linalg.norm(y)
q3 = y/h32
print(textBold('Showing the coefficients obtained:'))
print('[h12,h22,h32]:',[h12,h22,h32])
print(textBold('Showing that equality is numerically satisfied:'))
print('A@q2-(h12*q1+h22*q2+h32*q3):',A@q2-(h12*q1+h22*q2+h32*q3))

Showing the coefficients obtained:
[h12,h22,h32]: [0.9428090415820628, -1.333333333333334, 1.7320508075688772]
Showing that equality is numerically satisfied:
A@q2-(h12*q1+h22*q2+h32*q3): [0. 0. 0.]


Finally, we will work on **Equation (3)**, which is,
$$
{\color{blue} A\,\mathbf{q}_3} = 
    {\color{red} h_{13}}\,{\color{blue} \mathbf{q}_1}
    +
    {\color{red} h_{23}}\,{\color{blue} \mathbf{q}_2}
    +
    {\color{red} h_{33}}\,{\color{blue} \mathbf{q}_3}.
$$
This case is a bit _different_ from the previous ones. 
The main difference is that the last term now has a $\textrm{\color{blue}known}$ part, which is ${\color{blue} \mathbf{q}_3}$.
This implies that the procedure to find ${\color{red} h_{33}}$ needs to be different to the one we used for the _last_ terms before.

In [31]:
# A@q3 = h13*q1+h23*q2+h33*q3
y = A@q3
h13 = np.dot(q1,y)
y -= h13*q1
h23 = np.dot(q2,y)
y -= h23*q2
# IMPORTANT: Why can't we do this? Because we already have q3!
# h33 = np.linalg.norm(y)
# q3 = y/h33
h33 = np.dot(q3,y)
print(textBold('Showing the coefficients obtained:'))
print('[h13,h23,h33]:',[h13,h23,h33])
print(textBold('Showing that equality is numerically satisfied:'))
print('A@q3-(h13*q1+h23*q2+h33*q3):',A@q3-(h13*q1+h23*q2+h33*q3))

print(textBoldH('For completeness, we show the output for h33 when using:\n the norm and a dot product with q3'))
print(textBold('h33=np.dot(q3,y)='),h33, textBoldI(', which is the CORRECT procedure for this case.'))
print(textBold('h33=||y||='),np.linalg.norm(y), textBoldR(', which is the INCORRECT procedure for this case.'),textBoldH('The magnitude is correct but the sign is wrong.'))
print(textBold('\nNote that if we use the INCORRECT value for h33 the previous equality does not hold'))
print('A@q3-(h13*q1+h23*q2+h33_INCORRECT*q3):',A@q3-(h13*q1+h23*q2+np.linalg.norm(y)*q3))

Showing the coefficients obtained:
[h13,h23,h33]: [-1.6626775227096734e-15, -7.415643429738541e-16, -1.0]
Showing that equality is numerically satisfied:
A@q3-(h13*q1+h23*q2+h33*q3): [-4.44089210e-16 -2.22044605e-16 -8.96266451e-17]
For completeness, we show the output for h33 when using:
 the norm and a dot product with q3
h33=np.dot(q3,y)= -1.0 , which is the CORRECT procedure for this case.
h33=||y||= 0.9999999999999999 , which is the INCORRECT procedure for this case. The magnitude is correct but the sign is wrong.

Note that if we use the INCORRECT value for h33 the previous equality does not hold
A@q3-(h13*q1+h23*q2+h33_INCORRECT*q3): [ 1.41421356e+00 -1.41421356e+00  4.23163405e-16]


Now, we can show the matrices $Q_3$ and $H_3$,

In [32]:
Q[:,0] = q1
Q[:,1] = q2
Q[:,2] = q3

H = np.array([[h11,h12,h13],[h21,h22,h23],[0,h32,h33]])

print('H:\n',H)
print('Q:\n',Q)

H:
 [[ 4.33333333e+00  9.42809042e-01 -1.66267752e-15]
 [ 2.35702260e+00 -1.33333333e+00 -7.41564343e-16]
 [ 0.00000000e+00  1.73205081e+00 -1.00000000e+00]]
Q:
 [[ 5.77350269e-01  4.08248290e-01 -7.07106781e-01]
 [ 5.77350269e-01  4.08248290e-01  7.07106781e-01]
 [ 5.77350269e-01 -8.16496581e-01 -2.56395025e-16]]
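The unrolled steps above can be collected into a loop. The following is a minimal sketch of the Arnoldi iteration for $m<n$ steps; `arnoldi_sketch`, `Q_s`, and `H_s` are hypothetical names introduced here, and the sketch assumes that no break-down occurs, i.e. that every $h_{k+1,k}$ is not zero.

```python
import numpy as np

def arnoldi_sketch(A, b, m):
    """Arnoldi iteration for m < n steps, assuming no break-down (hypothetical helper)."""
    n = A.shape[0]
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = b / np.linalg.norm(b)
    for k in range(m):
        y = A @ Q[:, k]
        # Remove the components along the known vectors q_1, ..., q_{k+1}
        for j in range(k + 1):
            H[j, k] = Q[:, j] @ y
            y = y - H[j, k] * Q[:, j]
        # What remains defines the new coefficient and the new orthonormal vector
        H[k + 1, k] = np.linalg.norm(y)
        Q[:, k + 1] = y / H[k + 1, k]
    return Q, H

A = np.array([[1., 2., 3.],
              [3., 2., 1.],
              [1., 1., -1.]])
b = np.array([1., 1., 1.])

Q_s, H_s = arnoldi_sketch(A, b, 2)
print('H_tilde_2:\n', H_s)
print('||A@Q_2 - Q_3@H_tilde_2||:', np.linalg.norm(A @ Q_s[:, :2] - Q_s @ H_s))
```

The two columns of `H_s` should reproduce the coefficients $h_{11}, h_{21}$ and $h_{12}, h_{22}, h_{32}$ computed step by step above. The last column of $H_3$ is not covered by this sketch since, as discussed before, it requires the special treatment of $h_{33}$.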


Moreover, we can show that the upper Hessenberg form $A\,Q_3=Q_3\,\widetilde{H}_3$ is satisfied:

In [33]:
print(textBold('Showing the LHS of the upper Hessenberg form:'))
print('A@Q:',(A@Q))
print(textBold('Showing the RHS of the upper Hessenberg form:'))
print('Q@H:',(Q@H))
print(textBold('Computing the difference between the LHS and RHS.'),textBoldH('It should be the null matrix.'))
print('(A@Q)-(Q@H):',(A@Q)-(Q@H))
print(textBoldI('It is the null matrix!'))
print(textBold('Computing the matrix norm of the difference between the LHS and RHS.'),textBoldH('It should be 0 or close to 0.'))
print(np.linalg.norm(A@Q-Q@H))
print(textBoldI('It is close to 0!'))

Showing the LHS of the upper Hessenberg form:
A@Q: [[ 3.46410162e+00 -1.22474487e+00  7.07106781e-01]
 [ 3.46410162e+00  1.22474487e+00 -7.07106781e-01]
 [ 5.77350269e-01  1.63299316e+00 -1.87694185e-16]]
Showing the RHS of the upper Hessenberg form:
Q@H: [[ 3.46410162e+00 -1.22474487e+00  7.07106781e-01]
 [ 3.46410162e+00  1.22474487e+00 -7.07106781e-01]
 [ 5.77350269e-01  1.63299316e+00 -9.80675399e-17]]
Computing the difference between the LHS and RHS. It should be the null matrix.
(A@Q)-(Q@H): [[ 0.00000000e+00  0.00000000e+00 -4.44089210e-16]
 [ 0.00000000e+00  0.00000000e+00 -2.22044605e-16]
 [ 2.22044605e-16  0.00000000e+00 -8.96266451e-17]]
It is the null matrix!
Computing the matrix norm of the difference between the LHS and RHS. It should be 0 or close to 0.
5.512311447771244e-16
It is close to 0!


<div id='lookingatvectors' />

# Looking at the vectors obtained
[Back to TOC](#toc)


<div id='plotfirstcase' />

## First case: Krylov sub-space
[Back to TOC](#toc)

In this section we will show the basis of the Krylov sub-space.
Notice that in the plot we normalize the vectors just to avoid visualization issues. This does not affect the purpose of the plot since it does not change the direction of each vector.
$$
\begin{align*}
    K &= \begin{bmatrix} \mathbf{b}, & A\,\mathbf{b}, & A^2\,\mathbf{b} \end{bmatrix},\\
    Q &= \begin{bmatrix} \mathbf{q}_1, & \mathbf{q}_2, & \mathbf{q}_3 \end{bmatrix}.
\end{align*}
$$

In [34]:
def show_vectors1(elev=15, azim=18, roll=0,show_Ki=True,show_qi=False,Ki=1,qi=1):
    ax = plt.figure().add_subplot(projection='3d')

    # Make the direction data for the arrows
    n = 6
    V = np.zeros((n,3))
    # Normalizing for simplicity, but this does not change the direction.
    V[0,:] = K[:,0]/np.linalg.norm(K[:,0])
    V[1,:] = K[:,1]/np.linalg.norm(K[:,1])
    V[2,:] = K[:,2]/np.linalg.norm(K[:,2])

    V[0+3,:] = q1
    V[1+3,:] = q2
    V[2+3,:] = q3
    
    if show_Ki and show_qi:
        l=0
        ax.quiver(0, 0, 0, V[l,0], V[l,1], V[l,2], length=1, normalize=True, color='red',alpha=0.5)
        ax.quiver(0, 0, 0, V[3+l,0], V[3+l,1], V[3+l,2], length=1, normalize=True, color='black',alpha=0.5)
        ax.text(V[l,0], V[l,1], V[l,2], r'$\mathbf{K}_1=\textbf{q}_1$',(1,1,0))
        i = Ki
        for l in np.arange(1,i):
            ax.quiver(0, 0, 0, V[l,0], V[l,1], V[l,2], length=1, normalize=True, color='red',alpha=0.5)
            ax.text(V[l,0], V[l,1], V[l,2], r'$\mathbf{K}_{%d}$'%(l+1),(1,1,0)) 
        j = qi
        for l in np.arange(1,j):
            ax.quiver(0, 0, 0, V[3+l,0], V[3+l,1], V[3+l,2], length=1, normalize=True, color='black',alpha=0.5)
            ax.text(V[3+l,0], V[3+l,1], V[3+l,2], r'$\mathbf{q}_{%d}$'%(l+1),(1,1,0))
    else:
        if show_Ki:
            i = Ki
            for l in np.arange(i):
                ax.quiver(0, 0, 0, V[l,0], V[l,1], V[l,2], length=1, normalize=True, color='red',alpha=0.5)
                ax.text(V[l,0], V[l,1], V[l,2], r'$\mathbf{K}_{%d}$'%(l+1),(1,1,0))
        if show_qi:
            j = qi
            for l in np.arange(j):
                # ax.quiver(X[3:(3+j),0], X[3:(3+j),1], X[3:(3+j),2], V[3:(3+j),0], V[3:(3+j),1], V[3:(3+j),2], length=0.1, normalize=True,color='black',alpha=0.5)
                ax.quiver(0, 0, 0, V[3+l,0], V[3+l,1], V[3+l,2], length=1, normalize=True, color='black',alpha=0.5)
                ax.text(V[3+l,0], V[3+l,1], V[3+l,2], r'$\mathbf{q}_{%d}$'%(l+1),(1,1,0))
    
    ax.view_init(elev, azim, roll)
    ax.set_xlabel(r'$x_1$')
    ax.set_ylabel(r'$x_2$')
    ax.set_zlabel(r'$x_3$')
    ax.set_xlim(-1, 1)
    ax.set_ylim(-1, 1)
    ax.set_zlim(-1, 1)
    plt.title(r'Krylov sub-space: $K_i=A^{i-1}\,\mathbf{b}$ (red) and $\mathbf{q}_i$ (black)')
    plt.show()
    
interact(show_vectors1,elev=(0,360,1),azim=(0,360,1),roll=(0,360,1),Ki=(1,3,1),qi=(1,3,1))


<div id='plotsecondcase' />

## Second case: Looking at the vectors using the orthonormal parametrization of $\mathbf{x}$
[Back to TOC](#toc)

In the Preliminary section we showed how to connect the solution of a linear system of equations with a least-square problem.
Here, we will show a pre-GMRes analysis.
We will consider as the vector space $V$, described before, the Krylov sub-space $\mathcal{K}_3=\text{span}\left(\mathbf{b}, A\,\mathbf{b}, A^2\,\mathbf{b}\right)$.
Thus, if we consider the vectors $\mathbf{q}_1$, $\mathbf{q}_2$, and $\mathbf{q}_3$ as a basis for $\mathcal{K}_3$, the least-square minimization $ \min_{\mathbf{x}\in \mathcal{K}_3} \left\| \mathbf{b}-A\,\mathbf{x} \right\|_2^2$ can be written as,
$$
\begin{align}
    \min_{\mathbf{x}\in \mathcal{K}_3} \left\| \mathbf{b}-A\,\mathbf{x} \right\|_2^2
    &=
    \min_{\mathbf{c}\in \mathbb{R}^3} \left\| \mathbf{b}
        -A\,\left(c_1\,\mathbf{q}_1+c_2\,\mathbf{q}_2+c_3\,\mathbf{q}_3\right)
        \right\|_2^2\\
    &=
    \min_{\mathbf{c}\in \mathbb{R}^3} \left\| \mathbf{b}
        -c_1\,\left(A\,\mathbf{q}_1\right)
        -c_2\,\left(A\,\mathbf{q}_2\right)
        -c_3\,\left(A\,\mathbf{q}_3\right)
        \right\|_2^2.
\end{align}
$$
So, the following plot shows how well the vector $\mathbf{b}$ can be approximated by a linear combination of the **vectors** $A\,\mathbf{q}_1$, $A\,\mathbf{q}_2$, and $A\,\mathbf{q}_3$.

Note that the interesting cases here are the approximations obtained when using $\mathcal{K}_1$ or $\mathcal{K}_2$.
For $\mathcal{K}_3$, we actually recover $\mathbb{R}^3$, so it is clear we will find the numerical solution.

Thus, when you analyze the plot below, keep in mind that we are looking at the $\textrm{span}$ of the vectors shown.
- This means that when you show only one vector, i.e. $A\,\textbf{q}_1$, the $\textrm{span}$ is the line generated by that vector. So we are looking for the orthogonal projection of $\mathbf{b}$ onto the line spanned by $A\,\textbf{q}_1$. 
- Then, if you show $A\,\textbf{q}_1$ and $A\,\textbf{q}_2$, it means you will be looking for a solution which is the orthogonal projection of $\mathbf{b}$ onto $\textrm{span}(A\,\textbf{q}_1,A\,\textbf{q}_2)$.
- For the last case, we have three linearly independent vectors, so we recover $\mathbb{R}^3$.
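
The three cases above can also be checked numerically. The following is a minimal sketch, assuming the matrix $A_0$ and the right-hand side $\mathbf{b}=[1,1,1]^\top$ used in the Colorful section of this notebook. The helper name `projection_residuals` is hypothetical, and the orthonormal basis is obtained here with `np.linalg.qr` applied to the Krylov matrix instead of the Arnoldi iteration (an assumption made for brevity: it spans the same nested sub-spaces, possibly with different signs).

```python
import numpy as np

def projection_residuals(A, b):
    """Residual norm of the best approximation of b in span(A q_1, ..., A q_k), for k = 1..n."""
    n = b.shape[0]
    # Krylov matrix [b, A b, ..., A^(n-1) b]
    K = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(n)])
    # Orthonormal basis of the nested Krylov sub-spaces (assumption: QR instead of Arnoldi)
    Q, _ = np.linalg.qr(K)
    residuals = []
    for k in range(1, n + 1):
        AQk = A @ Q[:, :k]                            # columns A q_1, ..., A q_k
        c, *_ = np.linalg.lstsq(AQk, b, rcond=None)   # least-square coefficients c_1, ..., c_k
        residuals.append(np.linalg.norm(b - AQk @ c))
    return np.array(residuals)

# Assumed data: A_0 and b = ones, as in the Colorful section
A = np.array([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [1.0, 1.0, -1.0]])
b = np.ones(3)
residuals = projection_residuals(A, b)
print(residuals)
```

The residual should not increase with $k$, and it should vanish (up to rounding) for $k=3$, since $\mathcal{K}_3=\mathbb{R}^3$ for this data.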


In [35]:
def show_vectors2(elev=15, azim=18, roll=0, flag_Aq=False, Aq_i=1):
    ax = plt.figure().add_subplot(projection='3d')

    # Make the direction data for the arrows
    n = 4
    V = np.zeros((n,3))

    V[0,:] = A@q1/np.linalg.norm(A@q1)
    V[1,:] = A@q2/np.linalg.norm(A@q2)
    V[2,:] = A@q3/np.linalg.norm(A@q3)

    V[0+3,:] = b

    if flag_Aq:
        i = Aq_i        
        for j in np.arange(i):
            ax.quiver(0, 0, 0, V[j,0], V[j,1], V[j,2], length=1, normalize=True,color='blue',alpha=0.5)
            ax.text(V[j,0], V[j,1], V[j,2], r'$A\,\mathbf{q}_{%d}$'%(j+1),(1,1,0))
    ax.quiver(0, 0, 0, V[3,0], V[3,1], V[3,2], color='green',alpha=0.5)
    ax.text(V[3,0], V[3,1], V[3,2], r'$\mathbf{b}$',(1,1,0))
    
    ax.view_init(elev, azim, roll)
    ax.set_xlabel(r'$x_1$')
    ax.set_ylabel(r'$x_2$')
    ax.set_zlabel(r'$x_3$')
    ax.set_xlim(-1, 1)
    ax.set_ylim(-1, 1)
    ax.set_zlim(-1, 1)
    
    plt.title(r'Showing $A\,\mathbf{q}_i$ and $\mathbf{b}$')
    plt.show()
    
interact(show_vectors2,elev=(0,360,1),azim=(0,360,1),roll=(0,360,1),Aq_i=(1,3,1))

interactive(children=(IntSlider(value=15, description='elev', max=360), IntSlider(value=18, description='azim'…

<function __main__.show_vectors2(elev=15, azim=18, roll=0, flag_Aq=False, Aq_i=1)>

<div id='plotfinalcase' />

## Final case: Solving the small least-square problems
[Back to TOC](#toc)

**See more details in the class notes! Here we show a _brief_ review.**

For the final case, we will make use of the upper Hessenberg factorization $A\,Q_k=Q_{k+1}\,\widetilde{H}_k$. Its use depends on the value of $k$, because each value leads to a different least-square problem.

- For $k=1$ we use the identity $A\,\mathbf{q}_1=Q_2\,\widetilde{H}_1$ and note that $\mathbf{x}\in\mathcal{K}_1$ can be parametrized by $c_1\,\mathbf{q}_1$:
$$
\begin{align*}
    \min_{\mathbf{x}\in \mathcal{K}_1} \left\| \mathbf{b}-A\,\mathbf{x} \right\|_2^2
    &=
    \min_{c_1\in \mathbb{R}} \left\| \mathbf{b}
        -A\,\left(c_1\,\mathbf{q}_1\right)
        \right\|_2^2\\
    &=
    \min_{c_1\in \mathbb{R}} \left\| \mathbf{b}
        -A\,\mathbf{q}_1\,c_1
        \right\|_2^2\\
    &=
    \min_{c_1\in \mathbb{R}} \left\| \mathbf{b}
        -Q_2\,\widetilde{H}_1\,c_1
        \right\|_2^2\\
    &=
    \min_{c_1\in \mathbb{R}} \left\| \begin{bmatrix} \|\mathbf{b}\|\\ 0\end{bmatrix}
        -\widetilde{H}_1\,c_1
        \right\|_2^2\\
    &=
    \min_{c_1\in \mathbb{R}} \left\| \begin{bmatrix} \|\mathbf{b}\|\\ 0\end{bmatrix}
        -\begin{bmatrix} h_{1,1} \\ h_{2,1}\end{bmatrix}\,c_1
        \right\|_2^2\\
\end{align*}
$$ 

- For $k=2$ we use the identity $A\,Q_2=Q_3\,\widetilde{H}_2$ and note that $\mathbf{x}\in\mathcal{K}_2$ can be parametrized by $Q_2\,\begin{bmatrix} c_1 \\ c_2\end{bmatrix}$:
$$
\begin{align*}
    \min_{\mathbf{x}\in \mathcal{K}_2} \left\| \mathbf{b}-A\,\mathbf{x} \right\|_2^2
    &=
    \min_{c\in \mathbb{R}^2} \left\| \mathbf{b}
        -A\,Q_2\,\begin{bmatrix} c_1 \\ c_2\end{bmatrix}
        \right\|_2^2\\
    &=
    \min_{c\in \mathbb{R}^2} \left\| \mathbf{b}
        -Q_3\,\widetilde{H}_2\,\begin{bmatrix} c_1 \\ c_2\end{bmatrix}
        \right\|_2^2\\
    &=
    \min_{c\in \mathbb{R}^2} \left\| \begin{bmatrix} \|\mathbf{b}\|\\ 0 \\ 0 \end{bmatrix}
        -\widetilde{H}_2\,\begin{bmatrix} c_1 \\ c_2\end{bmatrix}
        \right\|_2^2\\
    &=
    \min_{c\in \mathbb{R}^2} \left\| \begin{bmatrix} \|\mathbf{b}\|\\ 0 \\ 0 \end{bmatrix}
        -\begin{bmatrix} h_{1,1} & h_{1,2}\\ h_{2,1} & h_{2,2} \\ 0 & h_{3,2} \end{bmatrix}\,
        \begin{bmatrix} c_1 \\ c_2\end{bmatrix}
        \right\|_2^2\\
\end{align*}
$$

- For $k=3$ we use the identity $A\,Q_3=Q_3\,\widetilde{H}_3$ and note that $\mathbf{x}\in\mathcal{K}_3$ can be parametrized by $Q_3\,\begin{bmatrix} c_1 \\ c_2 \\ c_3\end{bmatrix}$:
$$
\begin{align*}
    \min_{\mathbf{x}\in \mathcal{K}_3} \left\| \mathbf{b}-A\,\mathbf{x} \right\|_2^2
    &=
    \min_{c\in \mathbb{R}^3} \left\| \mathbf{b}
        -A\,Q_3\,\begin{bmatrix} c_1 \\ c_2 \\ c_3\end{bmatrix}
        \right\|_2^2\\
    &=
    \min_{c\in \mathbb{R}^3} \left\| \mathbf{b}
        -Q_3\,\widetilde{H}_3\,\begin{bmatrix} c_1 \\ c_2 \\ c_3\end{bmatrix}
        \right\|_2^2\\
    &=
    \min_{c\in \mathbb{R}^3} \left\| \begin{bmatrix} \|\mathbf{b}\|\\ 0 \\ 0 \end{bmatrix}
        -\widetilde{H}_3\,\begin{bmatrix} c_1 \\ c_2 \\ c_3\end{bmatrix}
        \right\|_2^2\\
    &=
    \min_{c\in \mathbb{R}^3} \left\| \begin{bmatrix} \|\mathbf{b}\|\\ 0 \\ 0 \end{bmatrix}
        -\begin{bmatrix} h_{1,1} & h_{1,2} & h_{1,3}\\ h_{2,1} & h_{2,2} & h_{2,3} \\ 0 & h_{3,2} & h_{3,3} \end{bmatrix}\,
        \begin{bmatrix} c_1 \\ c_2 \\ c_3\end{bmatrix}
        \right\|_2^2\\
\end{align*}
$$
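
The reductions above can be reproduced numerically for $k=1$ and $k=2$. The following is a minimal sketch, assuming the matrix $A_0$ and $\mathbf{b}=[1,1,1]^\top$ from the Colorful section. The function `arnoldi` is a hypothetical helper written for this illustration (modified Gram-Schmidt, no breakdown handling), not the implementation used elsewhere in this notebook. It checks that the residual of the small problem, $\left\|\,\|\mathbf{b}\|\,\mathbf{e}_1-\widetilde{H}_k\,\mathbf{c}\right\|_2$, agrees with $\left\|\mathbf{b}-A\,\mathbf{x}\right\|_2$.

```python
import numpy as np

def arnoldi(A, b, k):
    """Run k Arnoldi steps with modified Gram-Schmidt (assumes no breakdown occurs).

    Returns Q with k+1 orthonormal columns and the (k+1) x k matrix H_tilde,
    such that A @ Q[:, :k] equals Q @ H_tilde up to rounding.
    """
    n = b.shape[0]
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        v = A @ Q[:, j]
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ v
            v = v - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(v)
        Q[:, j + 1] = v / H[j + 1, j]
    return Q, H

# Assumed data: A_0 and b = ones, as in the Colorful section
A = np.array([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [1.0, 1.0, -1.0]])
b = np.ones(3)
nb = np.linalg.norm(b)

results = {}
for k in (1, 2):
    Q, H = arnoldi(A, b, k)
    rhs = np.zeros(k + 1)
    rhs[0] = nb                                    # ||b|| e_1
    c, *_ = np.linalg.lstsq(H, rhs, rcond=None)    # small least-square problem
    x = Q[:, :k] @ c                               # back to the original variables
    small_res = np.linalg.norm(rhs - H @ c)
    full_res = np.linalg.norm(b - A @ x)
    results[k] = (c, x, small_res, full_res)
    print(k, c, x, small_res, full_res)
```

Only $k=1$ and $k=2$ are covered because, for a $3\times 3$ matrix, a third Arnoldi step would produce $h_{4,3}=0$ and the sketch does not handle that breakdown.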

In summary, for $k=1$ and $k=2$ we get two least-square problems, and for $k=3$ it is just a linear system of equations.
To better understand the small least-square problem, we will plot the vectors involved in the minimization.

For simplicity of notation in the plot we define the following vectors:
$$
\begin{align*}
    \widetilde{\mathbf{b}}_2 &= \begin{bmatrix} \|\mathbf{b}\| & 0\end{bmatrix}^\top,\\
    \widetilde{\mathbf{b}}_3 &= \begin{bmatrix} \|\mathbf{b}\| & 0 & 0\end{bmatrix}^\top,\\
    \widetilde{\mathbf{h}}_1 &= \begin{bmatrix} h_{1,1} & h_{2,1}\end{bmatrix}^\top,\\
    \mathbf{h}_1 &= \begin{bmatrix} h_{1,1} & h_{2,1} & 0\end{bmatrix}^\top,\\
    \mathbf{h}_2 &= \begin{bmatrix} h_{1,2} & h_{2,2} & h_{3,2}\end{bmatrix}^\top,\\
    \mathbf{h}_3 &= \begin{bmatrix} h_{1,3} & h_{2,3} & h_{3,3}\end{bmatrix}^\top.
\end{align*}
$$
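
As a worked instance of the $k=1$ problem with this notation, and assuming the data $A_0$ and $\mathbf{b}=[1,1,1]^\top$ of the Colorful section (for which $\|\mathbf{b}\|=\sqrt{3}$, $h_{1,1}=13/3\approx 4.3333$ and $h_{2,1}=5\sqrt{2}/3\approx 2.3570$), the minimizer follows from the normal equation of a one-column least-square problem:
$$
\begin{align*}
    c_1 &= \frac{\widetilde{\mathbf{h}}_1^\top\,\widetilde{\mathbf{b}}_2}{\widetilde{\mathbf{h}}_1^\top\,\widetilde{\mathbf{h}}_1}
        = \frac{\|\mathbf{b}\|\,h_{1,1}}{h_{1,1}^2+h_{2,1}^2}
        = \frac{\sqrt{3}\,(13/3)}{73/3}
        = \frac{13\,\sqrt{3}}{73} \approx 0.3084,\\
    \left\| \widetilde{\mathbf{b}}_2-\widetilde{\mathbf{h}}_1\,c_1 \right\|_2
        &= \frac{\|\mathbf{b}\|\,|h_{2,1}|}{\sqrt{h_{1,1}^2+h_{2,1}^2}}
        = \frac{5\,\sqrt{2}}{\sqrt{73}} \approx 0.8276.
\end{align*}
$$
These values agree with the `ck` and residual printed for $A_0$ in the first iteration of the Colorful section.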

In [36]:
def show_vectors3(elev=15, azim=18, roll=0, k=1):

    if k==1:
        fig, ax = plt.subplots()
        # Normalized to unit length for visualization purposes only.
        b_tilde_2 = np.array([np.linalg.norm(b), 0])/np.linalg.norm(b)
        # Normalized to unit length for visualization purposes only.
        h_tilde_1 = H[:2,0]/np.linalg.norm(H[:2,0])
        ax.set_title(r'Small least-square problem for GMRes with $k=1$')
        #######################################
        Q_out = ax.quiver(0, 0, b_tilde_2[0], b_tilde_2[1], angles='xy', scale_units='xy', scale=1, units='x',color='green')
        ax.quiverkey(Q_out, b_tilde_2[0], b_tilde_2[1], 0, r'$\widetilde{\mathbf{b}}_2$',coordinates='data')
        #######################################
        Q_out = ax.quiver(0, 0, h_tilde_1[0], h_tilde_1[1], angles='xy', scale_units='xy', scale=1, units='xy', color='blue', alpha=0.5)
        ax.quiverkey(Q_out, h_tilde_1[0], h_tilde_1[1], 0, r'$\widetilde{\mathbf{h}}_1$',coordinates='data',labelpos='E')
        #######################################
        plt.grid(True)
        ax.set_xlim([-1.5,1.5])
        ax.set_ylim([-1.5,1.5])
        plt.show()
    elif k>1:
        ax = plt.figure().add_subplot(projection='3d')
        
        # Normalized to unit length for visualization purposes only.
        b_tilde_3 = np.array([np.linalg.norm(b), 0, 0])/np.linalg.norm(b)
        # Normalized to unit length for visualization purposes only.
        h_1 = H[:,0]/np.linalg.norm(H[:,0])
        h_2 = H[:,1]/np.linalg.norm(H[:,1])
        h_3 = H[:,2]/np.linalg.norm(H[:,2])
        
        ax.quiver(0, 0, 0, h_1[0], h_1[1], h_1[2], length=1, normalize=True,color='blue',alpha=0.5)
        ax.text(h_1[0], h_1[1], h_1[2], r'$\mathbf{h}_1$',(1,1,0))
        ax.quiver(0, 0, 0, h_2[0], h_2[1], h_2[2], length=1, normalize=True,color='blue',alpha=0.5)
        ax.text(h_2[0], h_2[1], h_2[2], r'$\mathbf{h}_2$',(1,1,0))
        if k==3:
            ax.quiver(0, 0, 0, h_3[0], h_3[1], h_3[2], length=1, normalize=True,color='blue',alpha=0.5)
            ax.text(h_3[0], h_3[1], h_3[2], r'$\mathbf{h}_3$',(1,1,0))
        
        ax.quiver(0, 0, 0, b_tilde_3[0], b_tilde_3[1], b_tilde_3[2], color='green',alpha=0.5)
        ax.text(b_tilde_3[0], b_tilde_3[1], b_tilde_3[2], r'$\widetilde{\mathbf{b}}_3$',(1,1,0))
        
        ax.view_init(elev, azim, roll)
        ax.set_xlabel(r'$x_1$')
        ax.set_ylabel(r'$x_2$')
        ax.set_zlabel(r'$x_3$')
        ax.set_xlim(-1, 1)
        ax.set_ylim(-1, 1)
        ax.set_zlim(-1, 1)
        
        if k==2:
            plt.title(r'Small least-square problem for GMRes with $k=2$')
        else:
            plt.title(r'Small least-square problem for GMRes with $k=3$')
            
        plt.show()
    
interact(show_vectors3,elev=(0,360,1),azim=(0,360,1),roll=(0,360,1),k=(1,3,1))

interactive(children=(IntSlider(value=15, description='elev', max=360), IntSlider(value=18, description='azim'…

<function __main__.show_vectors3(elev=15, azim=18, roll=0, k=1)>

<div id='colorfulgmres' />

# Colorful version of GMRes
[Back to TOC](#toc)

This implementation of GMRes shows the computation step by step.
The first cell defines the problem to be solved and the following cells execute GMRes.
For clarity, we show here the 4 matrices we will use as examples:
$$
\begin{align*}
    A_0 &= \begin{bmatrix}
            1 & 2 & 3\\
            3 & 2 & 1\\
            1 & 1 & -1
        \end{bmatrix},\\
    A_1 &= \mathrm{Random\; matrix},\\
    A_2 &= \begin{bmatrix}
            1 & 0 & 2\\
            0 & 1 & 3\\
            0 & 0 & 1
        \end{bmatrix},\\
    A_3 &= \begin{bmatrix}
            2 & 0 & 0\\
            0 & 2 & 0\\
            0 & 0 & 2
        \end{bmatrix}.\\
\end{align*}
$$
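
Before running GMRes, the number of iterations needed for each matrix can be anticipated from the dimension of the largest Krylov sub-space generated by $A$ and $\mathbf{b}$: in exact arithmetic and for a nonsingular $A$, GMRes reaches the solution of the linear system once that dimension is attained. The following is a minimal sketch, assuming $\mathbf{b}=[1,1,1]^\top$ as defined in the next cell; the helper name `krylov_dimension` is hypothetical, and $A_1$ is left out because it is random.

```python
import numpy as np

def krylov_dimension(A, b):
    """Rank of [b, A b, ..., A^(n-1) b], i.e. the dimension of the largest Krylov sub-space."""
    n = b.shape[0]
    K = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(n)])
    return int(np.linalg.matrix_rank(K))

b = np.ones(3)  # assumed right-hand side
A0 = np.array([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [1.0, 1.0, -1.0]])
A2 = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 3.0], [0.0, 0.0, 1.0]])
A3 = 2.0 * np.eye(3)

dims = {name: krylov_dimension(M, b) for name, M in (('A0', A0), ('A2', A2), ('A3', A3))}
print(dims)  # {'A0': 3, 'A2': 2, 'A3': 1}
```

For $A_3=2\,I$ we have $A_3\,\mathbf{b}=2\,\mathbf{b}$, so $\mathcal{K}_1$ already contains the solution; for $A_2$ we have $(A_2-I)^2=0$, so $A_2^2\,\mathbf{b}=2\,A_2\,\mathbf{b}-\mathbf{b}$ and the Krylov sub-space stops growing after two vectors.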

In [37]:
# Fixing the 'seed' of the random number generator to obtain reproducible outcomes.
rng = np.random.Generator(np.random.PCG64(seed=0))
threshold = 1e-12

# Defining size of the matrix
n = 3
# Defining number of iterations.
# For m=n, we need to change a little bit the main loop
m = 3
# Building a random matrix
A1 = rng.normal(0,1,size=(n,n))
b = np.ones(n)

###########################################################################
# Original matrix
A0 = np.array([[1,2,3],[3,2,1],[1,1,-1]])
###########################################################################
A2 = np.array([[1,0,2],[0,1,3],[0,0,1]])
###########################################################################
A3 = np.eye(3)*2
###########################################################################

<div id='ma0' />

## Matrix $A_0$
[Back to TOC](#toc)

$$
A_0 = \begin{bmatrix}
            1 & 2 & 3\\
            3 & 2 & 1\\
            1 & 1 & -1
        \end{bmatrix}
$$

In [38]:
colorful_GMRes(A0,b,3)

Processing column  k = 0
  Reduced problem solved:
   H_tilde :
 [[4.33333333]
 [2.3570226 ]]
   ||b||*e_1 : [1.73205081 0.        ]
   ck: [0.3084474]
   ||nb*e1-H_tilde@ck||	= 0.8276058886023681
   xk found: [0.17808219 0.17808219 0.17808219]
   ||b-A@xk||		= 0.8276058886023681
Processing column  k = 1
  Reduced problem solved:
   H_tilde :
 [[ 4.33333333  0.94280904]
 [ 2.3570226  -1.33333333]
 [ 0.          1.73205081]]
   ||b||*e_1 : [1.73205081 0.         0.        ]
   ck: [0.29921072 0.23839316]
   ||nb*e1-H_tilde@ck||	= 0.6041220933301769
   xk found: [ 0.27007299  0.27007299 -0.02189781]
   ||b-A@xk||		= 0.6041220933301769
Processing column  k = 2
  Reduced problem solved:
   H_tilde :
 [[ 4.33333333e+00  9.42809042

<div id='ma1' />

## Matrix $A_1$
[Back to TOC](#toc)

$$
A_1 = \mathrm{Random\; matrix}
$$

In [39]:
colorful_GMRes(A1,b,3)

Processing column  k = 0
  Reduced problem solved:
   H_tilde :
 [[0.70407319]
 [0.66179647]]
   ||b||*e_1 : [1.73205081 0.        ]
   ck: [1.30609282]
   ||nb*e1-H_tilde@ck||	= 1.186268165392827
   xk found: [0.75407304 0.75407304 0.75407304]
   ||b-A@xk||		= 1.186268165392827
Processing column  k = 1
  Reduced problem solved:
   H_tilde :
 [[ 0.70407319 -0.03621584]
 [ 0.66179647 -1.36212133]
 [ 0.          0.60214518]]
   ||b||*e_1 : [1.73205081 0.         0.        ]
   ck: [2.18341144 0.88370529]
   ||nb*e1-H_tilde@ck||	= 0.6267241610367571
   xk found: [1.20660774 0.66446337 1.91070844]
   ||b-A@xk||		= 0.6267241610367571
Processing column  k = 2
  Reduced problem solved:
   H_tilde :
 [[ 0.70407319 -0.03621584  0.5862

<div id='ma2' />

## Matrix $A_2$
[Back to TOC](#toc)

$$
A_2 = \begin{bmatrix}
            1 & 0 & 2\\
            0 & 1 & 3\\
            0 & 0 & 1
        \end{bmatrix}
$$

In [40]:
colorful_GMRes(A2,b,3)

Processing column  k = 0
  Reduced problem solved:
   H_tilde :
 [[2.66666667]
 [1.24721913]]
   ||b||*e_1 : [1.73205081 0.        ]
   ck: [0.53293871]
   ||nb*e1-H_tilde@ck||	= 0.7337993857053426
   xk found: [0.30769231 0.30769231 0.30769231]
   ||b-A@xk||		= 0.7337993857053426
Processing column  k = 1
  Reduced problem solved:
   H_tilde :
 [[ 2.66666667 -2.22717702]
 [ 1.24721913 -0.66666667]]
   ||b||*e_1 : [1.73205081 0.        ]
   ck: [-1.15470054 -2.1602469 ]
   ||nb*e1-H_tilde@ck||	= 4.440892098500626e-16
   xk found: [-1. -2.  1.]
   ||b-A@xk||		= 4.440892098500626e-16
####################################################################################
GMRes finished in only  2 iterations!!!
#

<div id='ma3' />

## Matrix $A_3$
[Back to TOC](#toc)

$$
A_3 = \begin{bmatrix}
            2 & 0 & 0\\
            0 & 2 & 0\\
            0 & 0 & 2
        \end{bmatrix}
$$

In [41]:
colorful_GMRes(A3,b,3)

Processing column  k = 0
  Reduced problem solved:
   H_tilde :
 [[2.]]
   ||b||*e_1 : [1.73205081]
   ck: [0.8660254]
   ||nb*e1-H_tilde@ck||	= 0.0
   xk found: [0.5 0.5 0.5]
   ||b-A@xk||		= 0.0
####################################################################################
GMRes finished in only  1 iterations!!!
####################################################################################

GMRes approximation	: [0.5 0.5 0.5]
np.linalg.solve		: [0.5 0.5 0.5]


<div id='uncolorfulgmres' />

## With a widget but losing the colors; nevertheless, it is useful for looking at different values of $m$
[Back to TOC](#toc)

In [42]:
Matrices=(A0,A1,A2,A3)
matrix_widget = widgets.Dropdown(
    options=[('A0',0),('A1',1),('A2',2),('A3',3)],
    value=0,
    description='Matrix:',
)

# m must be an argument of the lambda so that the slider value is actually used.
interact(lambda i, m: colorful_GMRes(Matrices[i],b,m), i=matrix_widget, m=(1,3,1))

interactive(children=(Dropdown(description='Matrix:', options=(('A0', 0), ('A1', 1), ('A2', 2), ('A3', 3)), va…

<function __main__.<lambda>(i)>

<div id='acknowledgements' />

# Acknowledgements
[Back to TOC](#toc)

* _Material created by professor Claudio Torres_ (`ctorres@inf.utfsm.cl` and `claudio.torres@usm.cl`). _DI UTFSM. June 2024._