### Solving an overdetermined system using the pseudo-inverse

Consider the overdetermined system corresponding to the cat-brain problem from Chapter 2.

There are 15 training examples, each with an input and a desired output specified.

Our goal is to determine 3 unknowns ($w_0$, $w_1$, $b$).

This can be cast as an overdetermined system of equations
$$
A\vec{w} = \vec{y}
$$
where
$$ 
A =
\begin{bmatrix}
        0.11 & 0.09 & 1.00 \\
        0.01 & 0.02 & 1.00 \\
        0.98 & 0.91 & 1.00 \\
        0.12 & 0.21 & 1.00 \\
        0.98 & 0.99 & 1.00 \\
        0.85 & 0.87 & 1.00 \\
        0.03 & 0.14 & 1.00 \\
        0.55 & 0.45 & 1.00 \\
        0.49 & 0.51 & 1.00 \\
        0.99 & 0.01 & 1.00 \\
        0.02 & 0.89 & 1.00 \\
        0.31 & 0.47 & 1.00 \\
        0.55 & 0.29 & 1.00 \\
        0.87 & 0.76 & 1.00 \\
        0.63 & 0.24 & 1.00
\end{bmatrix}
\;\;\;\;\;\;\;
\vec{y} = 
\begin{bmatrix}
        -0.80 \\
        -0.97 \\
        0.89 \\ 
        -0.67 \\ 
        0.97 \\ 
        0.72 \\ 
        -0.83 \\ 
        0.00 \\
        0.00 \\
        0.00 \\
        -0.09 \\
        -0.22 \\ 
        -0.16 \\
        0.63 \\
        0.37
\end{bmatrix}
\;\;\;\;\;\;\;
\vec{w} = \begin{bmatrix} w_{0}\\w_{1}\\b\end{bmatrix}
$$
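
To make the matrix form concrete: each row of $A$, together with the matching entry of $\vec{y}$, is one linear equation in the three unknowns. As an illustration, the first two rows and the last row written out are:
$$
\begin{aligned}
0.11\,w_{0} + 0.09\,w_{1} + b &= -0.80 \\
0.01\,w_{0} + 0.02\,w_{1} + b &= -0.97 \\
&\;\;\vdots \\
0.63\,w_{0} + 0.24\,w_{1} + b &= 0.37
\end{aligned}
$$
With 15 such equations and only 3 unknowns, we cannot in general expect all of them to hold exactly.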

We solve for $\vec{w}$ using the pseudo-inverse formula
$$
\vec{w} = (A^TA)^{-1}A^T\vec{y}
$$
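
For readers who want to see where this formula comes from, here is a brief sketch of the standard least-squares argument. It assumes that $A$ has full column rank, so that $A^TA$ is invertible; the symbol $E$ for the squared error is introduced here only for illustration. We look for the $\vec{w}$ that minimizes
$$
E(\vec{w}) = \lVert A\vec{w} - \vec{y} \rVert^{2} = (A\vec{w} - \vec{y})^T(A\vec{w} - \vec{y})
$$
Setting the gradient with respect to $\vec{w}$ to zero gives
$$
\nabla_{\vec{w}} E = 2A^T(A\vec{w} - \vec{y}) = 0
\;\;\Longrightarrow\;\;
A^TA\,\vec{w} = A^T\vec{y}
\;\;\Longrightarrow\;\;
\vec{w} = (A^TA)^{-1}A^T\vec{y}
$$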

In [1]:
import torch
# Let us revisit our cat-brain dataset.
# There are 15 training examples and 3 unknowns (w0, w1, b),
# so this is an overdetermined system.
# The solution is roughly w0 = 1, w1 = 1, b = -1;
# the data has been deliberately chosen this way.
# But the equations are not fully consistent (i.e., there is
# no solution that satisfies all the equations exactly).
# We want the w that minimizes the squared error ||Aw - y||^2.
# This is what the pseudo-inverse gives us.

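def pseudo_inverse(A):
    # Computes (A^T A)^{-1} A^T; this requires A^T A to be invertible,
    # i.e., the columns of A must be linearly independent
    return torch.matmul(torch.linalg.inv(torch.matmul(A.T, A)), A.T)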

# The usual cat-brain input dataset
X = torch.tensor([[0.11, 0.09], [0.01, 0.02], [0.98, 0.91], [0.12, 0.21],
              [0.98, 0.99], [0.85, 0.87], [0.03, 0.14], [0.55, 0.45],
              [0.49, 0.51], [0.99, 0.01], [0.02, 0.89], [0.31, 0.47],
              [0.55, 0.29], [0.87, 0.76], [0.63, 0.24]])

# Output threat score modeled as a vector
y = torch.tensor([-0.8, -0.97, 0.89, -0.67, 0.97, 0.72, -0.83, 0.00, 0.00,
              0.00, -0.09, -0.22, -0.16, 0.63, 0.37])
# Column stack adds a column of 1s to the training
# dataset to represent the coefficient of the bias
A = torch.column_stack((X, torch.ones(X.shape[0])))
print(A)

# Multiplying y by the pseudo-inverse of A yields the least-squares solution
A_pi = pseudo_inverse(A)
print(A_pi)
w = torch.matmul(A_pi, y)

print("The solution is {}\n"
      "Note that this is almost equal to [1.0, 1.0, -1.0]".format(w))

tensor([[0.1100, 0.0900, 1.0000],
        [0.0100, 0.0200, 1.0000],
        [0.9800, 0.9100, 1.0000],
        [0.1200, 0.2100, 1.0000],
        [0.9800, 0.9900, 1.0000],
        [0.8500, 0.8700, 1.0000],
        [0.0300, 0.1400, 1.0000],
        [0.5500, 0.4500, 1.0000],
        [0.4900, 0.5100, 1.0000],
        [0.9900, 0.0100, 1.0000],
        [0.0200, 0.8900, 1.0000],
        [0.3100, 0.4700, 1.0000],
        [0.5500, 0.2900, 1.0000],
        [0.8700, 0.7600, 1.0000],
        [0.6300, 0.2400, 1.0000]])
tensor([[-0.1298, -0.1709,  0.1600, -0.1621,  0.1342,  0.0900, -0.1969,  0.0344,
         -0.0232,  0.4569, -0.4454, -0.1250,  0.0861,  0.1383,  0.1532],
        [-0.1494, -0.1697,  0.1850, -0.0626,  0.2450,  0.1969, -0.0861, -0.0214,
          0.0430, -0.4936,  0.4799,  0.0711, -0.1414,  0.1079, -0.2048],
        [ 0.1997,  0.2295, -0.0977,  0.1762, -0.1122, -0.0682,  0.2043,  0.0592,
          0.0586,  0.0639,  0.0699,  0.0966,  0.0883, -0.0517,  0.0837]])
The solution is tensor([ 1
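
As a sanity check on the hand-written formula, the sketch below compares it against two built-in PyTorch routines, `torch.linalg.pinv` and `torch.linalg.lstsq`, on the same data. The variable names (`w_normal`, `w_pinv`, `w_lstsq`, `residual`) and the use of `float64` are choices made for this illustration and are not part of the original notebook cell.

```python
import torch

# Same cat-brain inputs and threat scores as above. float64 is an assumption
# made here to keep round-off small; the original cell uses float32.
X = torch.tensor([[0.11, 0.09], [0.01, 0.02], [0.98, 0.91], [0.12, 0.21],
                  [0.98, 0.99], [0.85, 0.87], [0.03, 0.14], [0.55, 0.45],
                  [0.49, 0.51], [0.99, 0.01], [0.02, 0.89], [0.31, 0.47],
                  [0.55, 0.29], [0.87, 0.76], [0.63, 0.24]],
                 dtype=torch.float64)
y = torch.tensor([-0.8, -0.97, 0.89, -0.67, 0.97, 0.72, -0.83, 0.00, 0.00,
                  0.00, -0.09, -0.22, -0.16, 0.63, 0.37], dtype=torch.float64)
A = torch.column_stack((X, torch.ones(X.shape[0], dtype=torch.float64)))

# 1. Explicit formula (A^T A)^{-1} A^T y
w_normal = torch.linalg.inv(A.T @ A) @ A.T @ y
# 2. Built-in Moore-Penrose pseudo-inverse
w_pinv = torch.linalg.pinv(A) @ y
# 3. Built-in least-squares solver; it is given a 2-D right-hand side here
w_lstsq = torch.linalg.lstsq(A, y.unsqueeze(1)).solution.squeeze(1)

# At the least-squares solution, the residual is orthogonal to the columns of A,
# so A^T (Aw - y) should be numerically zero even though Aw - y itself is not
residual = A @ w_normal - y

print("Normal equations:", w_normal)
print("torch.linalg.pinv:", w_pinv)
print("torch.linalg.lstsq:", w_lstsq)
print("Residual norm:", torch.linalg.norm(residual).item())
print("A^T residual:", A.T @ residual)
```

All three routes should agree up to round-off. In practice the built-in routines are usually preferable to forming $(A^TA)^{-1}$ explicitly, since inverting $A^TA$ can lose accuracy when the columns of $A$ are nearly linearly dependent.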