# Cholesky QR


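The main downside of classical QR algorithms, such as Householder QR, is that they are inherently sequential: the columns are processed one after another. As we have [seen](../Background/cost-of-numerical-linear-algebra.ipynb), matrix-matrix multiplication achieves a much higher flop rate than matrix factorization. Cholesky QR takes advantage of this by computing a QR factorization almost entirely with matrix-matrix operations.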


```{prf:algorithm} CholeskyQR
:label: cholesky-qr

**Input:** $\vec{A}\in\R^{n\times d}$

1. Form $\vec{X} = \vec{A}^\T\vec{A}$
1. Compute Cholesky factorization $\vec{R} = \Call{chol}(\vec{X})$
1. Form $\vec{Q} = \vec{A}\vec{R}^{-1}$

**Output:** $\vec{Q}, \vec{R}$
```

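Note that the cost of this algorithm is dominated by the matrix-matrix product in step 1 and the triangular solve in step 3, each of which costs $O(nd^2)$ operations; the Cholesky factorization in step 2 costs only $O(d^3)$ operations.
This is the same asymptotic cost as standard QR factorization algorithms, but nearly all of the work is in operations that run at a high flop rate, so we might hope that this algorithm runs faster in practice.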

This algorithm is mathematically exact; i.e. in exact arithmetic, it will produce a true QR factorization.

```{prf:theorem}
Suppose $\vec{A}$ has full column rank. Then the output of {prf:ref}`cholesky-qr` is a QR factorization of $\vec{A}$, i.e., $\vec{A} = \vec{Q}\vec{R}$ where $\vec{Q}$ has orthonormal columns and $\vec{R}$ is upper triangular.
```

```{prf:proof}
By construction $\vec{R}$ is upper triangular and $\vec{A} = \vec{Q}\vec{R}$.
Since $\vec{R}$ is the Cholesky factorization of $\vec{A}^\T\vec{A}$, we have that $\vec{R}^\T\vec{R} = \vec{A}^\T\vec{A}$.
This means that $\vec{Q}^\T \vec{Q} = \vec{R}^{-\T}\vec{A}^\T\vec{A}\vec{R}^{-1} = \vec{R}^{-\T}\vec{R}^\T\vec{R}\vec{R}^{-1} = \vec{I}$, so $\vec{Q}$ is orthogonal.
```

However, the presence of the Gram matrix $\vec{A}^\T\vec{A}$ is worrying numerically, since $\cond(\vec{A}^\T\vec{A}) = \cond(\vec{A})^2$.
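
The squaring follows from the SVD: if $\vec{A}$ has singular values $\sigma_1,\ldots,\sigma_d$, then $\vec{A}^\T\vec{A}$ has singular values $\sigma_1^2,\ldots,\sigma_d^2$. The following is a minimal sketch that checks this numerically. The sizes, the seed, and the target condition number of $10^4$ are illustrative assumptions, not values taken from the text above.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
n, d = 200, 10                  # small illustrative sizes (assumed)

# Build A = U diag(s) V^T with prescribed singular values, so cond(A) = 1e4
U, _ = np.linalg.qr(rng.standard_normal((n, d)))  # orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((d, d)))  # orthogonal matrix
s = np.geomspace(1e-4, 1, d)
A = (U * s) @ V.T

# 2-norm condition numbers of A and of its Gram matrix
cond_A = np.linalg.cond(A)
cond_gram = np.linalg.cond(A.T @ A)

print(f"cond(A)     = {cond_A:.3e}")
print(f"cond(A)^2   = {cond_A**2:.3e}")
print(f"cond(A^T A) = {cond_gram:.3e}")
```

The last two printed values should agree closely, which is why a matrix that is only moderately ill-conditioned can still lead to a badly conditioned Gram matrix.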


## Numerical Experiment

Let's try to understand the performance of Cholesky QR relative to the standard Householder QR factorization.

```python
import numpy as np
import scipy as sp
import scipy.linalg  # make sure the sp.linalg submodule is loaded
import time
import pandas as pd
```

```python
# Generate a random matrix with controlled condition number
n = 5000
d = 300

# Take singular vectors from a random matrix, then prescribe the singular values
U, s, Vt = np.linalg.svd(np.random.rand(n, d), full_matrices=False)
s = np.geomspace(1e-4, 1, d)  # singular values from 1e-4 to 1, so cond(A) = 1e4
A = U @ np.diag(s) @ Vt
```

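```python
def cholesky_QR(A):
    """
    QR factorization using Cholesky decomposition
    """
    # np.linalg.cholesky returns the lower triangular factor L with L @ L.T = A.T @ A,
    # so R = L.T is upper triangular
    R = np.linalg.cholesky(A.T @ A).T
    # Q = A @ inv(R), computed by solving R.T @ Q.T = A.T with a triangular solve
    Q = sp.linalg.solve_triangular(R.T, A.T, lower=True).T
    return Q, R
```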

```python
# Define QR factorization methods and their theoretical flop counts
qr_methods = {
    'Householder QR': {
        'func': lambda: np.linalg.qr(A, mode='reduced'),
        # Standard count for the factorization itself; explicitly forming Q costs extra
        'flops': 2 * n * d**2 - (2/3) * d**3
    },
    'Cholesky QR': {
        'func': lambda: cholesky_QR(A),
        # n*d^2 (A.T@A, exploiting symmetry) + d^3/3 (Cholesky) + n*d^2 (triangular solve)
        'flops': 2 * n * d**2 + d**3/3
    }
}
```

```python
# Time the QR factorization methods
n_repeat = 10  # Number of repetitions for averaging

results = []

for method_name, method_info in qr_methods.items():
    # Time the method (perf_counter is a monotonic, high-resolution clock)
    start = time.perf_counter()
    for _ in range(n_repeat):
        Q, R = method_info['func']()
    end = time.perf_counter()

    avg_time = (end - start) / n_repeat

    # Compute accuracy metrics (Frobenius norms)
    results.append({
        'method': method_name,
        'time (s)': avg_time,
        'flops/s': method_info['flops'] / avg_time,
        'orthogonality': np.linalg.norm(Q.T @ Q - np.eye(d)),
        'reconstruction': np.linalg.norm(A - Q @ R)
    })

# Create DataFrame and compute speedup relative to the slowest method
results_df = pd.DataFrame(results)
results_df['speedup'] = results_df['time (s)'].max() / results_df['time (s)']

# Display results with formatting
results_df.style.format({
    'time (s)': '{:.4f}',
    'flops/s': '{:.1e}',
    'orthogonality': '{:1.1e}',
    'reconstruction': '{:1.1e}',
    'speedup': '{:.1f}x',
})
```

| method | time (s) | flops/s | orthogonality | reconstruction | speedup |
|---|---|---|---|---|---|
| Householder QR | 0.1485 | 5.9e+09 | 7.3e-15 | 2.8e-15 | 1.0x |
| Cholesky QR | 0.0299 | 3.0e+10 | 4.1e-09 | 6.8e-16 | 5.0x |


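As expected, Cholesky QR is much faster than the standard Householder QR factorization: about five times faster in this run.
However, the $\vec{Q}$ matrix produced is much less orthogonal: the orthogonality error is roughly $10^{-9}$, compared to roughly $10^{-15}$ for Householder QR.
This is consistent with the squared condition number of the Gram matrix, since here $\cond(\vec{A}) = 10^4$ and hence $\cond(\vec{A}^\T\vec{A}) = 10^8$.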
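
To get a feel for how the loss of orthogonality depends on conditioning, the experiment can be repeated over a range of condition numbers. The following is a rough sketch under stated assumptions: the helper `matrix_with_cond`, the sizes, the seed, and the list of condition numbers are all illustrative choices that are not part of the experiment above.

```python
import numpy as np
import scipy.linalg

def cholesky_qr(A):
    """Cholesky QR: R = chol(A^T A), Q = A R^{-1}."""
    R = np.linalg.cholesky(A.T @ A).T  # upper triangular factor
    Q = scipy.linalg.solve_triangular(R.T, A.T, lower=True).T
    return Q, R

def matrix_with_cond(n, d, kappa, rng):
    """Hypothetical helper: n x d matrix with singular values
    geometrically spaced between 1/kappa and 1, so cond = kappa."""
    U, _ = np.linalg.qr(rng.standard_normal((n, d)))
    V, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return (U * np.geomspace(1 / kappa, 1, d)) @ V.T

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
n, d = 1000, 50                 # illustrative sizes (assumed)

errors = {}
for kappa in [1e1, 1e2, 1e3, 1e4, 1e5, 1e6]:
    A = matrix_with_cond(n, d, kappa, rng)
    Q, R = cholesky_qr(A)
    # Frobenius-norm departure from orthonormality
    errors[kappa] = np.linalg.norm(Q.T @ Q - np.eye(d))
    print(f"cond(A) = {kappa:.0e}   ||Q^T Q - I||_F = {errors[kappa]:.1e}")
```

One would expect the printed error to grow roughly in proportion to $\cond(\vec{A})^2$. For condition numbers approaching $10^8$ in double precision, the computed Gram matrix may no longer be numerically positive definite, in which case `np.linalg.cholesky` raises an error.
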
In the next section, we will explore how RandNLA can be used to produce a more accurate approximation, *while maintaining the efficiency of Cholesky QR*.