---
title: Cholesky QR
description: High-performance QR factorization using Cholesky decomposition of the Gram matrix for improved computational efficiency
keywords: [Cholesky QR, Gram matrix, matrix multiplication, numerical stability, condition number, QR factorization]
numbering:
  equation:
    enumerator: 3.%s
    continue: true
  proof:theorem:
    enumerator: 3.%s
    continue: true
  proof:algorithm:
    enumerator: 3.%s
    continue: true
  proof:definition:
    enumerator: 3.%s
    continue: true
  proof:proposition:
    enumerator: 3.%s
    continue: true
---

The main downside to classical QR algorithms is that they are sequential. As we have [seen](../01-Background/cost-of-numerical-linear-algebra.ipynb), we get a much higher flop rate with matrix multiplication than with matrix factorization.
We can use this to our advantage by computing a QR factorization using the Cholesky factorization of the Gram matrix $\vec{A}^\T\vec{A}$.


:::{prf:algorithm} CholeskyQR
:label: cholesky-qr

**Input:** $\vec{A}\in\R^{n\times d}$

1. Form $\vec{X} = \vec{A}^\T\vec{A}$
1. Compute the Cholesky factor $\vec{R} = \Call{chol}(\vec{X})$
1. Form $\vec{Q} = \vec{A}\vec{R}^{-1}$

**Output:** $\vec{Q}, \vec{R}$
:::

This algorithm is easily implemented with NumPy and SciPy.

In [1]:
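import numpy as np
import scipy as sp
import scipy.linalg  # make sure sp.linalg is loaded on older SciPy versions

def cholesky_QR(A):
    """Return (Q, R) with A = Q R, using the Cholesky factor of A.T @ A."""

    X = A.T@A                     # Gram matrix (d x d)
    R = np.linalg.cholesky(X).T   # cholesky returns lower-triangular L; R = L.T is upper triangular
    Q = sp.linalg.solve_triangular(R.T,A.T,lower=True).T  # Q = A R^{-1}, from the solve R^T Q^T = A^T

    return Q,R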

The cost of [](cholesky-qr) is dominated by the matrix-matrix product in the first line, which costs $O(nd^2)$ operations. 
Thus, we might hope that this algorithm runs faster in practice than standard QR factorization algorithms, since matrix-matrix multiplication has a very high flop-rate.
In addition, [](cholesky-qr) is mathematically exact: in exact arithmetic, and provided $\vec{A}$ has full column rank so that the Cholesky factorization exists, it produces a true QR factorization.
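
As a rough breakdown, assuming standard dense kernels and counting only leading-order terms (these are textbook estimates, not measurements from this notebook), the three steps of CholeskyQR cost approximately

```{math}
\underbrace{2nd^2}_{\vec{X} = \vec{A}^\T\vec{A}}
+ \underbrace{\tfrac{1}{3}d^3}_{\Call{chol}(\vec{X})}
+ \underbrace{nd^2}_{\vec{Q} = \vec{A}\vec{R}^{-1}}
\approx 3nd^2
\quad\text{flops when } n \gg d.
```

For comparison, Householder QR needs roughly $2nd^2 - \tfrac{2}{3}d^3$ flops to compute $\vec{R}$ alone, so the operation counts are of the same order. Any speedup therefore comes from the higher flop rate of matrix multiplication, not from doing fewer operations.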

:::{prf:theorem}
Suppose $\vec{A}$ has full column rank. Then the output of {prf:ref}`cholesky-qr` is a QR factorization of $\vec{A}$, i.e., $\vec{A} = \vec{Q}\vec{R}$ where $\vec{Q}$ has orthonormal columns and $\vec{R}$ is upper triangular.
:::

:::{prf:proof}
:class: dropdown
:enumerated: false

By construction $\vec{R}$ is upper triangular and $\vec{A} = \vec{Q}\vec{R}$.
Since $\vec{R}$ is the Cholesky factor of $\vec{A}^\T\vec{A}$, we have that $\vec{R}^\T\vec{R} = \vec{A}^\T\vec{A}$.
This means that
\begin{equation*}
\vec{Q}^\T \vec{Q} = \vec{R}^{-\T}\vec{A}^\T\vec{A}\vec{R}^{-1} = \vec{R}^{-\T}\vec{R}^\T\vec{R}\vec{R}^{-1} = \vec{I},
\end{equation*}
so $\vec{Q}$ has orthonormal columns.
:::
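
The three claims in the proof can be checked numerically on a small, well-conditioned example. The following is a minimal sketch, not part of the original notebook: the matrix size, the seed, and the variable names are arbitrary choices made for illustration.

```python
import numpy as np
import scipy.linalg

# Small Gaussian test matrix (hypothetical example; well conditioned with high probability)
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))

# The steps of CholeskyQR
X = A.T @ A
R = np.linalg.cholesky(X).T                                   # upper-triangular factor, R^T R = X
Q = scipy.linalg.solve_triangular(R.T, A.T, lower=True).T     # Q = A R^{-1}

# The properties used in the proof
is_upper = np.allclose(R, np.triu(R))          # R is upper triangular
gram_ok = np.allclose(R.T @ R, X)              # R^T R = A^T A
recon_ok = np.allclose(Q @ R, A)               # A = Q R
orth_ok = np.allclose(Q.T @ Q, np.eye(10))     # Q has orthonormal columns

print(is_upper, gram_ok, recon_ok, orth_ok)
```

Because this matrix is well conditioned, floating-point effects are negligible here and all four checks should pass.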



## Numerical Experiment

Let's try to understand the performance of Cholesky QR relative to the default QR factorization method in NumPy.
In addition to the runtime, we will compute the orthogonality and reconstruction errors 
```{math}
\|\vec{Q}^\T\vec{Q} - \vec{I}\|
\quad\text{and}\quad
\|\vec{A} - \vec{Q}\vec{R}\|,
```
which are both zero for a perfect QR factorization.
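
As a small sketch of how these two quantities can be computed, the helper below (the name `qr_errors` is hypothetical, not from the original notebook) uses `np.linalg.norm`, which defaults to the Frobenius norm for matrices.

```python
import numpy as np

def qr_errors(A, Q, R):
    """Return (orthogonality error, reconstruction error) in the Frobenius norm."""
    d = Q.shape[1]
    orth_err = np.linalg.norm(Q.T @ Q - np.eye(d))   # ||Q^T Q - I||
    recon_err = np.linalg.norm(A - Q @ R)            # ||A - Q R||
    return orth_err, recon_err

# Usage on a small random matrix with NumPy's built-in QR
rng = np.random.default_rng(0)
A_demo = rng.standard_normal((100, 5))
Q_demo, R_demo = np.linalg.qr(A_demo, mode='reduced')
orth_err, recon_err = qr_errors(A_demo, Q_demo, R_demo)
print(f"orthogonality: {orth_err:.1e}, reconstruction: {recon_err:.1e}")
```

For a backward-stable method such as Householder QR, both errors should be near machine precision.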

In [2]:
import numpy as np
import scipy as sp
import time
import pandas as pd

import sys
sys.path.append('../')
from randnla import *

In [3]:
# Generate a random matrix with controlled condition number
n = 5000
d = 300

U,s,Vt = np.linalg.svd(np.random.rand(n,d),full_matrices=False)
s = np.geomspace(1e-4,1,d) # define singular values
A = U@np.diag(s)@Vt

In [4]:
# Define QR factorization methods
qr_methods = [
    {'name':'Householder QR',
     'func': lambda: np.linalg.qr(A,mode='reduced')},
    {'name':'Cholesky QR',
     'func': lambda: cholesky_QR(A)}
]

In [5]:
# Time the QR factorization methods
n_repeat = 10  # Number of repetitions for averaging

results = []

for qr_method in qr_methods:

    method_name = qr_method['name']

    # Time the method (perf_counter is a monotonic, high-resolution clock)
    start = time.perf_counter()
    for _ in range(n_repeat):
        Q, R = qr_method['func']()
    end = time.perf_counter()

    avg_time = (end - start) / n_repeat
    
    # Compute accuracy metrics
    results.append({
        'method': method_name,
        'time (s)': avg_time,
        'orthogonality': np.linalg.norm(Q.T @ Q - np.eye(d)),
        'reconstruction': np.linalg.norm(A - Q @ R)
    })

# Create DataFrame and compute relative performance
results_df = pd.DataFrame(results)
results_df['speedup'] = results_df['time (s)'].max() / results_df['time (s)']

# Display results with formatting
results_df.reindex(columns=['method','time (s)','speedup','orthogonality','reconstruction']).style.format({
    'time (s)': '{:.4f}',
    'speedup': '{:.1f}x',
    'orthogonality': '{:1.1e}',
    'reconstruction': '{:1.1e}',
})

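|   | method | time (s) | speedup | orthogonality | reconstruction |
|---|--------|----------|---------|---------------|----------------|
| 0 | Householder QR | 0.0909 | 1.0x | 7e-15 | 3e-15 |
| 1 | Cholesky QR | 0.0319 | 2.8x | 4.3e-09 | 6.7e-16 |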


As expected, Cholesky QR is substantially faster than the standard Householder QR factorization: about 2.8x in this run.

## Too good to be true?

While Cholesky QR is faster than a standard QR method, our numerical experiment reveals that the $\vec{Q}$ matrix it produces is much less orthogonal: the orthogonality error is about $4\times 10^{-9}$, compared with about $7\times 10^{-15}$ for Householder QR!
The numerical instability can be traced back to the presence of the Gram matrix $\vec{A}^\T\vec{A}$.
Indeed, $\cond(\vec{A}^\T\vec{A}) = \cond(\vec{A})^2$, so by forming the Gram matrix we end up making the conditioning way worse 💔. In our experiment $\cond(\vec{A}) = 10^4$, so the Gram matrix has condition number $10^8$.
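
To see this effect directly, one can sweep the condition number and record the orthogonality error of Cholesky QR. The sketch below is illustrative only: the sizes, the seed, the list of condition numbers, and the helper name `cholesky_qr_sketch` are assumptions chosen so that it runs quickly, and it re-implements the three steps of the algorithm so that it is self-contained.

```python
import numpy as np
import scipy.linalg

def cholesky_qr_sketch(A):
    """Cholesky QR: R from chol(A^T A), then Q = A R^{-1} by a triangular solve."""
    R = np.linalg.cholesky(A.T @ A).T
    Q = scipy.linalg.solve_triangular(R.T, A.T, lower=True).T
    return Q, R

rng = np.random.default_rng(0)
n, d = 500, 20

# Fixed orthonormal factors; only the singular values change between runs
U, _ = np.linalg.qr(rng.standard_normal((n, d)))
V, _ = np.linalg.qr(rng.standard_normal((d, d)))

orth_errors = {}
for kappa in [1e1, 1e2, 1e4, 1e6]:
    s = np.geomspace(1 / kappa, 1, d)      # singular values, so cond(A) = kappa
    A = (U * s) @ V.T                      # A = U diag(s) V^T
    Q, _ = cholesky_qr_sketch(A)
    orth_errors[kappa] = np.linalg.norm(Q.T @ Q - np.eye(d))
    print(f"cond(A) = {kappa:.0e}, "
          f"cond(A^T A) = {np.linalg.cond(A.T @ A):.1e}, "
          f"orthogonality error = {orth_errors[kappa]:.1e}")
```

The orthogonality error should grow as the condition number increases, tracking $\cond(\vec{A})^2$ rather than $\cond(\vec{A})$. For sufficiently ill-conditioned inputs the computed Gram matrix may fail to be numerically positive definite, in which case `np.linalg.cholesky` raises an error.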

In the next section, we will explore how RandNLA can be used to produce a more accurate approximation, *while maintaining the efficiency of Cholesky QR*.