# Gram-Schmidt orthogonalization

One of the fundamental ways to factor an $m\times n$ matrix $A$ with $m\ge n$ is as a product $A=QR$, where $Q$ is $m\times n$ with orthonormal columns and $R$ is $n\times n$ and upper triangular. A straightforward derivation of the factorization is by the Gram-Schmidt algorithm. However, there are two different ways to structure this algorithm, and while they are mathematically equivalent, they behave differently in floating-point arithmetic.

## Classical Gram-Schmidt

Using the columnwise interpretation of the product $A=QR$ leads to the following expression:
$$
a_j = Q r_j = \sum_{k=1}^j R_{kj} q_k.
$$
We have applied the upper triangularity of $R$ here to truncate the sum. Rearranging,
$$
R_{jj} q_j = a_j - \sum_{k=1}^{j-1} R_{kj} q_k.
$$
Left-multiplying by $q_k^*$ and using the orthonormality of the columns of $Q$, we get for all $k<j$,
$$
0 = q_k^* a_j - R_{kj}.
$$
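This determines the entries of the $j$th column of $R$ above the diagonal. Because $q_j$ is a unit vector, taking norms in the rearranged equation (and choosing $R_{jj}>0$) gives
$$
R_{jj} = \left\| a_j - \sum_{k=1}^{j-1} R_{kj} q_k\right\|_2,
$$
and dividing that same equation by $R_{jj}$ yields $q_j$.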
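
As a concrete instance of these formulas (a sketch, assuming $a_1\neq 0$ and that $a_2$ is not a multiple of $a_1$), the first two steps read:
$$
\begin{aligned}
R_{11} &= \|a_1\|_2, & q_1 &= a_1/R_{11}, \\
R_{12} &= q_1^* a_2, & R_{22} &= \|a_2 - R_{12}\,q_1\|_2, \\
q_2 &= (a_2 - R_{12}\,q_1)/R_{22}.
\end{aligned}
$$
Each later column follows the same pattern, with one more inner product per previously computed $q_k$.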

In [1]:
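using LinearAlgebra   # provides dot and norm

function cgs(A)
    m,n = size(A)
    Q = zeros(m,n)
    R = zeros(n,n)
    for j = 1:n
        v = A[:,j]
        for k = 1:j-1
            # classical: inner product with the original column a_j
            R[k,j] = dot(Q[:,k],A[:,j])
            v -= R[k,j]*Q[:,k]
        end
        R[j,j] = norm(v)      # length of what remains of a_j
        Q[:,j] = v/R[j,j]     # normalize to get q_j
    end
    return Q,R
end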

cgs (generic function with 1 method)

In [2]:
using LinearAlgebra
A = rand(0.:9.,6,4)

6×4 Array{Float64,2}:
 3.0  7.0  1.0  1.0
 5.0  4.0  9.0  7.0
 1.0  5.0  4.0  3.0
 5.0  9.0  7.0  3.0
 4.0  1.0  2.0  9.0
 6.0  8.0  1.0  6.0

In [3]:
Q,R = cgs(A);
display(Q), display(R);

6×4 Array{Float64,2}:
 0.283473    0.43367    -0.274145  -0.0758888 
 0.472456   -0.326169    0.620429  -0.171191  
 0.0944911   0.50941     0.336617   0.762969  
 0.472456    0.357931    0.277079  -0.351495  
 0.377964   -0.561939   -0.137569   0.509132  
 0.566947    0.0464211  -0.575242   0.00693307

4×4 Array{Float64,2}:
 10.583  13.5122   9.5436    12.0949 
  0.0     7.30887  0.963848  -4.02642
  0.0     0.0      7.74536    1.22038
  0.0     0.0      0.0        4.58399

In [4]:
norm(Q'*Q-I)

1.581854752060599e-15
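
One way to sanity-check `cgs` is against the built-in `qr` from `LinearAlgebra`. For a matrix with full column rank, the thin factorization is unique once the diagonal of $R$ is required to be positive, so after a sign fix the built-in factors should agree with the Gram-Schmidt ones up to rounding. The sketch below hard-codes the matrix shown in the cell above; the names `F`, `Qh`, `Rh`, and `S` are illustrative and not part of the original notebook.

```julia
using LinearAlgebra

# The 6×4 matrix displayed above, typed in by hand.
A = [3.0 7 1 1; 5 4 9 7; 1 5 4 3; 5 9 7 3; 4 1 2 9; 6 8 1 6]

F  = qr(A)            # Householder-based factorization from LinearAlgebra
Qh = Matrix(F.Q)      # thin 6×4 factor with orthonormal columns
Rh = F.R              # 4×4 upper triangular factor

# Flip signs so the diagonal of R is positive, matching the Gram-Schmidt convention.
S  = Diagonal(sign.(diag(Rh)))
Qh = Qh * S
Rh = S * Rh

@show norm(Qh' * Qh - I)
@show norm(A - Qh * Rh)
@show diag(Rh)
```

Since `S*S` is the identity, the product `Qh*Rh` is unchanged by the sign fix; only the individual factors are.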

## Modified Gram-Schmidt

Suppose instead we turn to the outer-product interpretation of the factorization. We'll use real matrices to avoid some distracting conjugates in the complex case. Let's write
$$
A = \begin{bmatrix} a_1 & \cdots & a_n \end{bmatrix},
\qquad Q = \begin{bmatrix} q_1 & \cdots & q_n \end{bmatrix},
\qquad R = \begin{bmatrix} r_1^T \\ \vdots \\ r_n^T \end{bmatrix},
$$
from which we have 
$$
A = \sum_{k=1}^n q_k r_k^T.
$$
Consider the rows $r_k^T$ appearing in the outer product sum. Of these, only $k=1$ can have a nonzero in the first column, because of the triangular structure of $R$. So if we restrict attention to the first column only, we conclude
$$
a_1 = R_{11}q_1.
$$
As in CGS, this can be used to find $R_{11}$ (take the 2-norm) and $q_1$. Multiplying the original sum through by $q_1^T$ and applying orthogonality gives
$$
q_1^TA =  r_1^T.
$$
This is used to fill in the rest of the first row of $R$; columnwise it is the same inner product as in CGS:
$$
R_{1j} = q_1^T a_j,
$$
valid for $j>1$. 

In [5]:
Q=zero(Q); R=zero(R);
m,n = size(A);
R[1,1] = norm(A[:,1]);
Q[:,1] = A[:,1]/R[1,1];
for j=2:n
    R[1,j] = dot(Q[:,1],A[:,j])
end
R

4×4 Array{Float64,2}:
 10.583  13.5122  9.5436  12.0949
  0.0     0.0     0.0      0.0   
  0.0     0.0     0.0      0.0   
  0.0     0.0     0.0      0.0   
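
The loop above fills the row entry by entry; the identity $q_1^T A = r_1^T$ says the same row comes from a single matrix-vector product. A minimal sketch, with the matrix from `In [2]` typed in by hand and the names `q1`, `r1`, and `r1_loop` chosen only for illustration:

```julia
using LinearAlgebra

A = [3.0 7 1 1; 5 4 9 7; 1 5 4 3; 5 9 7 3; 4 1 2 9; 6 8 1 6]

R11 = norm(A[:,1])
q1  = A[:,1] / R11

# Whole first row at once: r1 holds the entries of the row vector q1' * A.
r1 = A' * q1

# Same entries, one inner product per column, as in the loop above.
r1_loop = [dot(q1, A[:,j]) for j in 1:size(A,2)]

@show r1 ≈ r1_loop
@show r1[1] ≈ R11     # the j = 1 entry reproduces R11
```

The second check reflects that the inner-product formula also holds for $j=1$, since $q_1^T a_1 = R_{11}$.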

Now consider the following rearrangement of the outer product sum:
$$
A - q_1 r_1^T = \sum_{k=2}^n q_k r_k^T.
$$
In all the rows of $R$ appearing on the right, only $k=2$ has a nonzero in column 2. So the second column of the left-hand side is just $R_{22}q_2$. This lets us get both of these unknowns. 

In [6]:
B = A - Q[:,1]*R[1,:]';
R[2,2] = norm(B[:,2]);
Q[:,2] = B[:,2]/R[2,2]
R

4×4 Array{Float64,2}:
 10.583  13.5122   9.5436  12.0949
  0.0     7.30887  0.0      0.0   
  0.0     0.0      0.0      0.0   
  0.0     0.0      0.0      0.0   

Left-multiplying both sides by $q_2^T$ now isolates the second row of $R$ on the right, just as $q_1^T$ isolated the first row. The same two steps then repeat for $k=3,\dots,n$ (iteratively, recursively, or inductively, depending on your point of view).
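
Spelled out for a general step $k$ (a sketch of the induction; the symbols $A^{(k)}$ and $e_k$, the $k$th standard basis vector, are introduced here for convenience): once $q_1,\dots,q_{k-1}$ and $r_1^T,\dots,r_{k-1}^T$ are known, define
$$
A^{(k)} = A - \sum_{i=1}^{k-1} q_i r_i^T = \sum_{i=k}^{n} q_i r_i^T.
$$
The rows $r_i^T$ with $i>k$ are zero in column $k$, so column $k$ of $A^{(k)}$ is $R_{kk}q_k$, which gives
$$
R_{kk} = \left\| A^{(k)} e_k \right\|_2, \qquad q_k = A^{(k)} e_k / R_{kk}.
$$
Left-multiplying by $q_k^T$ and applying orthogonality then gives the rest of row $k$:
$$
q_k^T A^{(k)} = r_k^T.
$$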

In [5]:
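function mgs(A)
    m,n = size(A)
    T = float(eltype(A))      # so that integer input does not break the division
    Q = zeros(T,m,n)
    R = zeros(T,n,n)
    B = Matrix{T}(A)          # working copy; rank-one terms are subtracted below
    for k = 1:n
        R[k,k] = norm(B[:,k])
        Q[:,k] = B[:,k]/R[k,k]
        for j = k+1:n
            # modified: inner product with the updated column of B, not of A
            R[k,j] = dot(Q[:,k],B[:,j])
        end
        B -= Q[:,k]*transpose(R[k,:])   # subtract q_k r_k^T
    end
    return Q,R
end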

mgs (generic function with 1 method)

In [6]:
Q,R = mgs(A);
display(Q), display(R);

6×4 Array{Float64,2}:
 0.283473    0.43367    -0.274145  -0.0758888 
 0.472456   -0.326169    0.620429  -0.171191  
 0.0944911   0.50941     0.336617   0.762969  
 0.472456    0.357931    0.277079  -0.351495  
 0.377964   -0.561939   -0.137569   0.509132  
 0.566947    0.0464211  -0.575242   0.00693307

4×4 Array{Float64,2}:
 10.583  13.5122   9.5436    12.0949 
  0.0     7.30887  0.963848  -4.02642
  0.0     0.0      7.74536    1.22038
  0.0     0.0      0.0        4.58399

In [7]:
@show norm(Q'*Q-I);
@show norm(A-Q*R);

norm(Q' * Q - I) = 7.838469895773836e-16
norm(A - Q * R) = 2.059156955057807e-15


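Mathematically, there is no difference between CGS and MGS: in exact arithmetic they produce the same $Q$ and $R$, as the matching outputs above suggest. However, they perform the arithmetic operations in different orders, and in floating-point arithmetic that can make a large difference in how orthogonal the computed columns of $Q$ are.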
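
As a hedged illustration of that claim, the sketch below runs both variants on a small matrix whose columns are nearly parallel and measures the loss of orthogonality. The helper `gram_schmidt`, its `modified` keyword, the test matrix, and the value `ε = 1e-8` are all choices made for this example and do not come from the text above. The modified branch is the column-oriented form of MGS, which performs the same projections as the row-oriented `mgs` but is easier to set side by side with CGS: the only change is which vector enters the inner product.

```julia
using LinearAlgebra

# Hypothetical helper for this illustration: one loop, two variants.
function gram_schmidt(A; modified=false)
    m, n = size(A)
    Q = zeros(m, n)
    R = zeros(n, n)
    for j in 1:n
        v = A[:, j]
        for k in 1:j-1
            # CGS projects the original column; MGS projects the partly reduced one.
            R[k, j] = modified ? dot(Q[:, k], v) : dot(Q[:, k], A[:, j])
            v -= R[k, j] * Q[:, k]
        end
        R[j, j] = norm(v)
        Q[:, j] = v / R[j, j]
    end
    return Q, R
end

ε = 1e-8                            # assumed value; 1 + ε^2 rounds to 1 in Float64
A = [1 1 1; ε 0 0; 0 ε 0; 0 0 ε]    # 4×3 test matrix with nearly parallel columns

Qc, Rc = gram_schmidt(A)                  # classical
Qm, Rm = gram_schmidt(A; modified=true)   # modified

err_cgs = norm(Qc' * Qc - I)
err_mgs = norm(Qm' * Qm - I)
@show err_cgs err_mgs
@show norm(A - Qc * Rc) norm(A - Qm * Rm)
```

Under these assumptions, the CGS columns should lose orthogonality almost completely (an error of order one), while the MGS error should stay near the size of `ε`. Both variants should still reproduce $A$ to rounding error, so the residual alone does not reveal the problem.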