# Gram-Schmidt orthogonalization

One of the fundamental ways to factor an $m\times n$ matrix $A$ (with $m\ge n$) is as a product $A=QR$, where the $m\times n$ matrix $Q$ has orthonormal columns ($Q^*Q=I$) and the $n\times n$ matrix $R$ is upper triangular. A straightforward derivation of the factorization is by the Gram-Schmidt algorithm. However, there are two different ways to structure this algorithm, and while they are mathematically equivalent, they behave differently in floating-point computation.

## Classical Gram-Schmidt

Using the columnwise interpretation of the product $A=QR$, column $j$ of $A$ is $Q$ times column $j$ of $R$:
$$
a_j = \sum_{k=1}^n R_{kj} q_k = \sum_{k=1}^j R_{kj} q_k.
$$
We have used the upper triangularity of $R$ ($R_{kj}=0$ for $k>j$) to truncate the sum. Rearranging,
$$
R_{jj} q_j = a_j - \sum_{k=1}^{j-1} R_{kj} q_k.
$$
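Multiplying on the left by $q_k^*$ for any $k<j$ and using the orthonormality of the columns of $Q$ ($q_k^*q_i=0$ for $i\ne k$ and $q_k^*q_k=1$), we get
$$
0 = q_k^* a_j - R_{kj}, \qquad\text{i.e.,}\qquad R_{kj} = q_k^* a_j.
$$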
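This fills in the $j$th column of $R$ above the diagonal. Since $q_j$ must have unit 2-norm, choosing $R_{jj}>0$ gives
$$
R_{jj} = \left\| a_j - \sum_{k=1}^{j-1} R_{kj} q_k\right\|_2,
$$
and, assuming $A$ has full column rank so that $R_{jj}\neq 0$, dividing the vector $a_j - \sum_{k=1}^{j-1} R_{kj} q_k$ by $R_{jj}$ gives $q_j$.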
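To see the formulas in action before the general code, here is one CGS pass worked by hand on a small $3\times 2$ matrix. The matrix is chosen only for illustration and does not come from the original. $R_{11}=\|a_1\|_2=5$, $R_{12}=q_1^*a_2=2.2$, and $R_{22}$ is the length of the residual $a_2-R_{12}q_1$. The block should run in MATLAB or GNU Octave.

```matlab
% Hand-sized example (illustrative values): CGS on a 3x2 matrix
A  = [3 1; 4 2; 0 2];
a1 = A(:,1);  a2 = A(:,2);

% Column 1: R11 = ||a1||, q1 = a1/R11
R11 = norm(a1);          % 5
q1  = a1 / R11;          % [0.6; 0.8; 0]

% Column 2: R12 = q1'*a2, then normalize what is left over
R12 = q1' * a2;          % 0.6*1 + 0.8*2 = 2.2
v   = a2 - R12*q1;       % residual, orthogonal to q1
R22 = norm(v);
q2  = v / R22;

Q = [q1 q2];
R = [R11 R12; 0 R22];
disp(norm(A - Q*R))      % reconstruction error, at rounding level
disp(abs(q1'*q2))        % loss of orthogonality, at rounding level
```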

In [1]:
type cgs  


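function [Q,R] = cgs(A)
    % Classical Gram-Schmidt: thin QR factorization A = Q*R.
    [m,n] = size(A);
    Q = zeros(m,n);
    R = zeros(n,n);
    for j = 1:n
        v = A(:,j);
        for k = 1:j-1
            R(k,j) = Q(:,k)'*A(:,j);   % project the ORIGINAL column a_j
            v = v - R(k,j)*Q(:,k);     % remove the q_k component
        end
        R(j,j) = norm(v);              % length of what remains
        Q(:,j) = v/R(j,j);             % normalize to get q_j
    end
end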
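The inner loop of `cgs` can be collapsed, because every coefficient $R_{kj}$, $k<j$, uses the original column $a_j$. That means they can all be computed at once as $Q_{:,1:j-1}^*a_j$. The following is a minimal sketch of that vectorized variant, not the notebook's own code. It hard-codes the matrix printed in `In [2]` so that it runs on its own. For `j = 1` the index range `1:0` is empty, so the projection step does nothing.

```matlab
% Sketch: CGS with the inner k-loop replaced by matrix-vector products.
% Same arithmetic as cgs: every R(k,j) is computed from the ORIGINAL A(:,j).
A = [9 3 10 8; 10 6 5 10; 2 10 9 7; 10 10 2 1; 7 2 5 9; 1 10 10 10];
[m,n] = size(A);
Q = zeros(m,n);
R = zeros(n,n);
for j = 1:n
    R(1:j-1,j) = Q(:,1:j-1)' * A(:,j);       % all projections of a_j at once
    v = A(:,j) - Q(:,1:j-1) * R(1:j-1,j);    % subtract them in one step
    R(j,j) = norm(v);
    Q(:,j) = v / R(j,j);
end
disp(R)
```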


In [2]:
A = randi(10,6,4)

A =
     9     3    10     8
    10     6     5    10
     2    10     9     7
    10    10     2     1
     7     2     5     9
     1    10    10    10


In [3]:
[Q,R] = cgs(A)

Q =
    0.4917   -0.2328    0.6065   -0.5446
    0.5464   -0.0650   -0.1048    0.5508
    0.1093    0.6259    0.1908   -0.1309
    0.5464    0.2254   -0.6638   -0.3649
    0.3825   -0.2052    0.2193    0.4377
    0.0546    0.6760    0.3100    0.2412
R =
   18.3030   12.6209   12.1838   14.6970
         0   13.7736    9.1646    7.0069
         0         0   10.1275    9.5502
         0         0         0    6.2205


In [4]:
norm(Q'*Q-eye(4))

ans =
   4.7453e-16
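As an independent cross-check, the same factors can be compared against the built-in economy-size `qr(A,0)`, which does not use Gram-Schmidt. MATLAB and Octave compute it with LAPACK's Householder-based routines. A thin QR factorization of a full-rank matrix is unique only up to the signs of the columns of $Q$ and the matching rows of $R$. The sketch below therefore flips signs so that the diagonal of $R$ is positive, as Gram-Schmidt produces. The matrix is hard-coded from `In [2]`.

```matlab
% Cross-check against the built-in economy-size QR.
A = [9 3 10 8; 10 6 5 10; 2 10 9 7; 10 10 2 1; 7 2 5 9; 1 10 10 10];
[Q0,R0] = qr(A,0);
% Flip signs so R0 has a positive diagonal (assumes A has full column rank,
% so no diagonal entry of R0 is zero). S*S = I, so Q0*R0 is unchanged.
S  = diag(sign(diag(R0)));
Q0 = Q0*S;
R0 = S*R0;
disp(R0)
```

After the sign normalization, `R0` should agree with the `R` printed by `cgs` to rounding error.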


## Modified Gram-Schmidt

Suppose instead we turn to the outer-product interpretation of the factorization. Let's write
$$
A = \begin{bmatrix} a_1 & \cdots & a_n \end{bmatrix},
\qquad Q = \begin{bmatrix} q_1 & \cdots & q_n \end{bmatrix},
\qquad R = \begin{bmatrix} r_1^* \\ \vdots \\ r_n^* \end{bmatrix},
$$
from which we have 
$$
A = \sum_{k=1}^n q_k r_k^*.
$$
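The outer-product form can be checked numerically by rebuilding $A$ one rank-one term $q_k r_k^*$ at a time. For brevity, this sketch takes $Q$ and $R$ from the built-in `qr(A,0)`; that is an assumption, and any thin QR factorization of $A$ works the same way. The matrix is hard-coded from `In [2]`.

```matlab
% Rebuild A as a sum of outer products: column q_k times row r_k^*.
A = [9 3 10 8; 10 6 5 10; 2 10 9 7; 10 10 2 1; 7 2 5 9; 1 10 10 10];
[Q,R] = qr(A,0);
n = size(A,2);
S = zeros(size(A));
for k = 1:n
    S = S + Q(:,k) * R(k,:);   % add the k-th rank-one term
end
disp(norm(A - S))              % at rounding level
```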
Consider the rows $r_k^*$ appearing in the outer product sum. Of these, only $k=1$ has a nonzero in the first column, because of the triangular structure of $R$. So if we restrict attention to the first column only, we conclude
$$
a_1 = R_{11}q_1.
$$
As in CGS, this can be used to find $R_{11}$ (take the 2-norm) and $q_1$. Multiplying the original sum on the left by $q_1^*$ and applying orthonormality ($q_1^*q_k=0$ for $k\ne 1$, $q_1^*q_1=1$) gives
$$
q_1^*A =  r_1^*.
$$
This is used to fill in the rest of the first row of $R$; columnwise it is the same inner product as in CGS:
$$
R_{1j} = q_1^* a_j,
$$
valid for $j>1$. 
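The identity $q_1^*A=r_1^*$ produces the whole first row of $R$ as a single row-vector product. Its $j=1$ entry reproduces $R_{11}$, because $q_1^*a_1 = a_1^*a_1/\|a_1\|_2 = \|a_1\|_2$. The sketch below hard-codes the matrix from `In [2]`, for which $\|a_1\|_2^2=335$.

```matlab
% First MGS step as one row-vector product: r_1^* = q_1^* A.
A  = [9 3 10 8; 10 6 5 10; 2 10 9 7; 10 10 2 1; 7 2 5 9; 1 10 10 10];
q1 = A(:,1) / norm(A(:,1));
r1 = q1' * A;      % 1-by-4 row: [R11 R12 R13 R14]
disp(r1)
```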

In [5]:
[m,n] = size(A);
Q=zeros(m,n); R=zeros(n,n);
R(1,1) = norm(A(:,1));
Q(:,1) = A(:,1)/R(1,1);
for j=2:n
    R(1,j) = Q(:,1)'*A(:,j);
end
R

R =
   18.3030   12.6209   12.1838   14.6970
         0         0         0         0
         0         0         0         0
         0         0         0         0


Now consider the following rearrangement of the outer product sum:
$$
A - q_1 r_1^* = \sum_{k=2}^n q_k r_k^*.
$$
Among the rows $r_k^*$ appearing on the right, only $r_2^*$ has a nonzero in column 2. So the second column of the left-hand side is just $R_{22}q_2$: its 2-norm gives $R_{22}$, and normalizing it gives $q_2$.
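The rearrangement also implies two properties of $B = A - q_1 r_1^*$ that can be checked directly. Its first column vanishes, and every column is orthogonal to $q_1$, because the right-hand side involves only $q_2,\dots,q_n$. The following sketch hard-codes the matrix from `In [2]` and computes $R_{22}$ and $q_2$ the same way as the next cell.

```matlab
% After subtracting q_1 r_1^*, column 1 of B is gone and B is orthogonal to q_1.
A  = [9 3 10 8; 10 6 5 10; 2 10 9 7; 10 10 2 1; 7 2 5 9; 1 10 10 10];
q1 = A(:,1) / norm(A(:,1));
r1 = q1' * A;
B  = A - q1 * r1;          % A - q_1 r_1^*
R22 = norm(B(:,2));
q2  = B(:,2) / R22;
disp(norm(B(:,1)))         % at rounding level
disp(norm(q1' * B))        % at rounding level
```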

In [6]:
B = A - Q(:,1)*R(1,:);
R(2,2) = norm(B(:,2));
Q(:,2) = B(:,2)/R(2,2)
R

Q =
    0.4917   -0.2328         0         0
    0.5464   -0.0650         0         0
    0.1093    0.6259         0         0
    0.5464    0.2254         0         0
    0.3825   -0.2052         0         0
    0.0546    0.6760         0         0
R =
   18.3030   12.6209   12.1838   14.6970
         0   13.7736         0         0
         0         0         0         0
         0         0         0         0


Left-multiplying both sides by $q_2^*$ now isolates the second row of $R$ on the right: $q_2^*(A - q_1 r_1^*) = r_2^*$. The reasoning then repeats iteratively (or recursively, or inductively, depending on your point of view). Subtract $q_2 r_2^*$, read off $R_{33}$ and $q_3$ from column 3, and so on.
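One more step of the recursion makes the pattern explicit. The second row of $R$ comes from $q_2^*$ applied to the deflated matrix, and subtracting $q_2 r_2^*$ zeroes the first two columns. This is a sketch on the matrix hard-coded from `In [2]`. The entries of `r2` should match row 2 of the `R` printed by `cgs`, with a leading entry that is zero up to rounding.

```matlab
% Second step: r_2^* = q_2^* (A - q_1 r_1^*), then deflate with q_2 r_2^*.
A  = [9 3 10 8; 10 6 5 10; 2 10 9 7; 10 10 2 1; 7 2 5 9; 1 10 10 10];
q1 = A(:,1) / norm(A(:,1));
B  = A - q1 * (q1' * A);   % remove the q_1 component from every column
q2 = B(:,2) / norm(B(:,2));
r2 = q2' * B;              % second row of R: [0 R22 R23 R24] up to rounding
C  = B - q2 * r2;          % columns 1 and 2 of C are now (numerically) zero
disp(r2)
```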

In [7]:
type mgs    


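function [Q,R] = mgs(A)
    % Modified Gram-Schmidt: thin QR factorization A = Q*R.
    % A is a local copy and is deflated as the loop proceeds.
    [m,n] = size(A);
    Q = zeros(m,n);
    R = zeros(n,n);
    for k = 1:n
        R(k,k) = norm(A(:,k));         % column k is now R(k,k)*q_k alone
        Q(:,k) = A(:,k)/R(k,k);
        for j = k+1:n
            R(k,j) = Q(:,k)'*A(:,j);   % project the DEFLATED column
        end
        A = A - Q(:,k)*R(k,:);         % subtract the rank-one term q_k r_k^*
    end
end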
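The $j$-loop in `mgs` can likewise be vectorized. Row $k$ of $R$ is one row-vector product with the deflated trailing columns, and only columns $k+1{:}n$ need updating, since column $k$ and the earlier columns are already zero up to rounding. The following is a minimal sketch under those observations, not the notebook's code, using the matrix hard-coded from `In [2]`. For `k = n` the range `k+1:n` is empty and the last two statements do nothing.

```matlab
% Sketch: MGS with the j-loop vectorized and only trailing columns updated.
A = [9 3 10 8; 10 6 5 10; 2 10 9 7; 10 10 2 1; 7 2 5 9; 1 10 10 10];
[m,n] = size(A);
Q = zeros(m,n);
R = zeros(n,n);
V = A;                                   % working copy, deflated in place
for k = 1:n
    R(k,k) = norm(V(:,k));
    Q(:,k) = V(:,k) / R(k,k);
    R(k,k+1:n) = Q(:,k)' * V(:,k+1:n);   % rest of row k of R
    V(:,k+1:n) = V(:,k+1:n) - Q(:,k) * R(k,k+1:n);
end
disp(R)
```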


In [8]:
[Q,R] = mgs(A)

Q =
    0.4917   -0.2328    0.6065   -0.5446
    0.5464   -0.0650   -0.1048    0.5508
    0.1093    0.6259    0.1908   -0.1309
    0.5464    0.2254   -0.6638   -0.3649
    0.3825   -0.2052    0.2193    0.4377
    0.0546    0.6760    0.3100    0.2412
R =
   18.3030   12.6209   12.1838   14.6970
         0   13.7736    9.1646    7.0069
         0         0   10.1275    9.5502
         0         0         0    6.2205


In [9]:
norm(Q'*Q-eye(4))
norm(A-Q*R)

ans =
   2.7271e-16
ans =
   2.5712e-15


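Mathematically, there is no difference between CGS and MGS: in exact arithmetic they produce the same $Q$ and $R$. However, they perform the arithmetic in different orders. CGS computes each $R_{kj}$ from the original column $a_j$. MGS computes it from a column from which the components along $q_1,\dots,q_{k-1}$ have already been subtracted. We will find that this difference in ordering can matter a great deal on a computer.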
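A small experiment hints at how large the difference can be. The test matrix is our own choice, not from the original: a Läuchli-type $4\times 3$ matrix with $\varepsilon=10^{-8}$, small enough that $1+\varepsilon^2$ rounds to $1$ in double precision. Both loops are written column by column. The MGS loop performs the same floating-point operations on each column as `mgs`, only in a different loop order. Tracing the arithmetic by hand, CGS should produce $q_2^*q_3\approx 1/2$, while MGS should keep the loss of orthogonality near $\varepsilon$.

```matlab
% Illustrative ill-conditioned test (Lauchli-type matrix, our choice).
e = 1e-8;                       % 1 + e^2 rounds to 1 in double precision
A = [1 1 1; e 0 0; 0 e 0; 0 0 e];
[m,n] = size(A);

% Classical Gram-Schmidt: project the ORIGINAL column a_j
Qc = zeros(m,n);
for j = 1:n
    v = A(:,j);
    for k = 1:j-1
        v = v - (Qc(:,k)'*A(:,j)) * Qc(:,k);
    end
    Qc(:,j) = v / norm(v);
end

% Modified Gram-Schmidt: project the PARTIALLY REDUCED column v
Qm = zeros(m,n);
for j = 1:n
    v = A(:,j);
    for k = 1:j-1
        v = v - (Qm(:,k)'*v) * Qm(:,k);
    end
    Qm(:,j) = v / norm(v);
end

disp(norm(Qc'*Qc - eye(n)))     % loss of orthogonality, CGS
disp(norm(Qm'*Qm - eye(n)))     % loss of orthogonality, MGS
```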