# Conjugate gradients
When you need to solve a system of linear equations, [conjugate gradients](https://en.wikipedia.org/wiki/Conjugate_gradient_method) offer a fast, iterative way to do it.

The idea is quite clever. Take a function $F=\lVert X' - X^0\rVert_2^2$, that is, the squared $L2$ distance between a (not necessarily correct) candidate solution $X'$ and the real solution $X^0$. The function has a parabolic shape, and its minimum, $F=0$, sits exactly at the solution. In the best scenario the axes of this paraboloid line up with the coordinate axes. In that case, we may travel from $X'$ along one coordinate axis at a time, each step perpendicular to the previous ones, and arrive at $X^0$.
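The best scenario can be sketched with a diagonal matrix, where each coordinate can be fixed independently of the others. The matrix `A`, the vector `b`, and the step formula (an exact line search along one axis) are assumptions chosen for illustration; they do not come from the text above.

```python
import numpy as np

# Hypothetical diagonal system, chosen only for illustration
A = np.diag([2.0, 5.0, 10.0])
b = np.array([4.0, 10.0, 30.0])

x = np.zeros(3)
for i in range(3):
    axis = np.zeros(3)
    axis[i] = 1.0                      # unit vector along coordinate i
    residual = b - A @ x
    # step length that zeroes the residual component along this axis
    step = (axis @ residual) / (axis @ A @ axis)
    x += step * axis

print(x)                      # [2. 2. 3.]
print(np.allclose(A @ x, b))  # True
```

Three steps, one per axis, are enough here because moving along one axis never disturbs the progress already made along the others.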

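While this is not usually the case, real scenarios are not that much different. In general the axes of the paraboloid point along the eigenvectors of the matrix rather than along the coordinate axes. Conjugate gradients compensate by choosing search directions that are perpendicular to each other with respect to the matrix ($d_i^T A d_j = 0$ for $i \neq j$), which results in the same effect as described above.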
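The conjugacy of the search directions can be checked numerically. The sketch below is a minimal variant of the iteration that also records every direction it uses. The helper name `cg_directions`, its tolerance and iteration cap, and the small test matrix are all hypothetical choices made for this illustration.

```python
import numpy as np

def cg_directions(A, b, tol=1e-20, max_iter=50):
    """Run conjugate gradients and also return the search directions used."""
    x = np.zeros(len(b))
    r = b - A @ x
    d = r.copy()
    err = r @ r
    directions = []
    for _ in range(max_iter):
        if err <= tol:
            break
        directions.append(d.copy())
        x = x + d * err / (d @ A @ d)
        r = b - A @ x
        err_old, err = err, r @ r
        d = r + err / err_old * d
    return x, np.array(directions)

# Hypothetical symmetric positive definite matrix, not from the text
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

x, D = cg_directions(A, b)

# Entry (i, j) of this matrix is d_i^T A d_j
gram = D @ A @ D.T
off_diagonal = gram - np.diag(np.diag(gram))

print(np.abs(off_diagonal).max())   # should be close to zero
print(np.allclose(A @ x, b))
```

The off-diagonal entries vanish up to rounding, which is what "conjugate" means here: the directions are orthogonal in the inner product defined by `A`, not in the ordinary one.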

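Just remember that your matrix has to be symmetric and positive definite. If that's not the case, use the transformation `(A'A)x=(A'b)`, known as the normal equations. That will handle overdetermined and underdetermined systems, too: `A'A` is always symmetric and positive semidefinite, and it is positive definite whenever `A` has full column rank.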
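As a small illustration of the transformation, the sketch below builds the normal equations for an overdetermined system and compares their solution with NumPy's least-squares solver. The 3x2 matrix and the right-hand side are made-up values, and `np.linalg.solve` stands in for the iterative solver only to keep the example short.

```python
import numpy as np

# Hypothetical overdetermined system: 3 equations, 2 unknowns
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

M = A.T @ A        # square matrix of the normal equations
rhs = A.T @ b

print(np.allclose(M, M.T))         # symmetric by construction
print(np.linalg.eigvalsh(M))       # eigenvalues, all positive for full column rank

x_normal = np.linalg.solve(M, rhs)
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]
print(x_normal)
print(np.allclose(x_normal, x_lstsq))
```

The solution of the normal equations is the least-squares solution of the original system, so no exact solution of `Ax=b` needs to exist.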

In [1]:
import numpy as np

## algorithm

In [2]:
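def conjugate_gradients(A, b, tol=1e-8, max_iter=None):
    """Solve A x = b for a symmetric positive definite matrix A."""
    x = np.zeros(A.shape[1])
    residuals = b - A @ x
    direction = residuals
    error = residuals @ residuals  # squared norm of the residual

    # exact arithmetic needs at most n steps; leave room for rounding errors
    if max_iter is None:
        max_iter = 10 * A.shape[1]

    # step along conjugate directions
    for _ in range(max_iter):
        if error <= tol:
            break
        # exact line search along the current direction
        x += direction * error / (direction @ A @ direction)
        residuals = b - A @ x
        error1 = error
        error = residuals @ residuals
        # new direction: residual made conjugate to the previous direction
        direction = residuals + error / error1 * direction

    return x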

## run

In [3]:
A = np.random.rand(5, 3)
b = np.random.rand(5)

print('A')
print(A)
print('b')
print(b)
print('x')

# solve the normal equations; A.T @ A is symmetric positive semidefinite
print(conjugate_gradients(A.T @ A, A.T @ b))

A
[[ 0.72202644  0.70073834  0.09483571]
 [ 0.97657699  0.41447392  0.96942563]
 [ 0.35596098  0.91433461  0.52105508]
 [ 0.66857021  0.48664146  0.46385892]
 [ 0.65044125  0.98123656  0.55633599]]
b
[ 0.97013697  0.76971217  0.92539487  0.10820739  0.98406079]
x
[ 0.26730959  0.87199275 -0.04536451]
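One way to sanity-check a result like this is to compare it with NumPy's least-squares solver. The sketch below copies `A`, `b` and `x` from the printed output above (rounded to the digits shown) and uses `np.linalg.lstsq` as the reference. The variable names `x_printed` and `x_lstsq` are introduced here only for the comparison.

```python
import numpy as np

# Values copied from the printed output above
A = np.array([[0.72202644, 0.70073834, 0.09483571],
              [0.97657699, 0.41447392, 0.96942563],
              [0.35596098, 0.91433461, 0.52105508],
              [0.66857021, 0.48664146, 0.46385892],
              [0.65044125, 0.98123656, 0.55633599]])
b = np.array([0.97013697, 0.76971217, 0.92539487, 0.10820739, 0.98406079])
x_printed = np.array([0.26730959, 0.87199275, -0.04536451])

# Reference least-squares solution of the overdetermined system
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]

# Largest difference between the two solutions
max_difference = np.abs(x_lstsq - x_printed).max()
print(max_difference)

# At a least-squares solution the gradient A.T @ (b - A @ x) vanishes
gradient = A.T @ (b - A @ x_printed)
print(np.abs(gradient).max())
```

Both printed numbers should be small, limited by the stopping tolerance of the iteration and the rounding of the printed digits.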
