# When can we solve for c?

Suppose $Xc = y$.

$X$ and $y$ are known, and we want to solve for $c$.

When does `c = np.linalg.solve(X, y)` work?

## Review Fruit

### Data

* `10 apples and 0 bananas sold for $7`
* `2 apples and 8 bananas sold for $5`
* `4 apples and 4 bananas sold for $5`

### Equations

* `10*apple + basket == 7`
* `2*apple + 8*banana + basket == 5`
* `4*apple + 4*banana + basket == 5`

### Matrix

```python
import numpy as np

# good: X is square (3 equations, 3 unknowns) and invertible, so solve works
X = np.array([
    [10,0,1],
    [2,8,1],
    [4,4,1],
])
y = np.array([7,5,5]).reshape(-1,1)

c = np.linalg.solve(X, y)
c
```

```
array([[0.5 ],
       [0.25],
       [2.  ]])
```
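A quick way to check this result is to plug $c$ back in. You can also test the two conditions `np.linalg.solve` needs: a square `X` with full rank, which is the same as being invertible. The sketch below does both. The helper name `solve_is_applicable` is hypothetical and not part of NumPy.

```python
import numpy as np

def solve_is_applicable(X):
    """Hypothetical helper (not a NumPy function): True if np.linalg.solve(X, y)
    can succeed, i.e. X is square and has full rank (is invertible)."""
    rows, cols = X.shape
    return rows == cols and np.linalg.matrix_rank(X) == cols

X = np.array([
    [10, 0, 1],
    [2, 8, 1],
    [4, 4, 1],
])
y = np.array([7, 5, 5]).reshape(-1, 1)

print(solve_is_applicable(X))  # True
c = np.linalg.solve(X, y)
# Plugging c back in should reproduce y (up to floating-point error)
print(np.allclose(X @ c, y))   # True
```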

```python
# how it can go wrong with numpy: X is now 4x3, not square
X = np.array([
    [10,0,1],
    [2,8,1],
    [4,4,1],
    [1,0,1],
])
y = np.array([7,5,5,2.5]).reshape(-1,1)

# c = np.linalg.solve(X, y)  # raises LinAlgError because X is not square
c = np.array([[0.5 ],
              [0.25],
              [2.  ]])

X @ c  # but there is a solution! The earlier c satisfies all four equations
```

```
array([[7. ],
       [5. ],
       [5. ],
       [2.5]])
```
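The failure here comes from NumPy's restriction, not from a missing solution. This sketch confirms that by catching the error from `np.linalg.solve` and then calling `np.linalg.lstsq`, which accepts non-square matrices. Here $y$ is in the column space of $X$, so the least-squares answer fits every equation exactly.

```python
import numpy as np

X = np.array([
    [10, 0, 1],
    [2, 8, 1],
    [4, 4, 1],
    [1, 0, 1],
])
y = np.array([7, 5, 5, 2.5]).reshape(-1, 1)

try:
    np.linalg.solve(X, y)
except np.linalg.LinAlgError as err:
    # solve only accepts square coefficient matrices
    print("solve failed:", err)

# lstsq handles tall matrices; rcond=None uses NumPy's current default cutoff
c, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
print(c.round(6))
print(np.allclose(X @ c, y))  # True: this system has an exact solution
```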

```python
# how it can go wrong, mathematically speaking
X = np.array([
    [10,0,1],
    [2,8,1],
    [4,4,1],
    [4,4,1],
])
y = np.array([7,5,5,4.9]).reshape(-1,1)
# the last two rows contradict each other:
# 4*apple + 4*banana + 1*basket == 5
# 4*apple + 4*banana + 1*basket == 4.9
```
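Subtracting the third equation from the fourth makes the contradiction explicit. The left-hand sides cancel completely, but the right-hand sides don't, so the difference claims that $0 = -0.1$. Here is a small sketch of that arithmetic:

```python
import numpy as np

X = np.array([
    [10, 0, 1],
    [2, 8, 1],
    [4, 4, 1],
    [4, 4, 1],
])
y = np.array([7, 5, 5, 4.9]).reshape(-1, 1)

# Row 4 minus row 3: the coefficients cancel to all zeros...
lhs = X[3] - X[2]
# ...but the prices differ, so the combined equation reads 0 == (about) -0.1
rhs = y[3, 0] - y[2, 0]
print(lhs, rhs)
```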

### Equivalent Statements

* there's a solution for the system of equations
* there's a solution for $c$ (in $Xc = y$), even if `np.linalg.solve` can't find it
* $y$ is in the column space of $X$
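The last statement can be checked numerically. $y$ is in the column space of $X$ exactly when appending $y$ as an extra column does not increase the rank. The sketch below applies that test to both four-row examples. `in_column_space` is a hypothetical helper name. `np.linalg.matrix_rank` uses a floating-point tolerance, so systems that are only barely inconsistent could be misclassified.

```python
import numpy as np

def in_column_space(X, y):
    """Hypothetical helper: True if Xc = y has a solution, i.e. appending
    y as a column does not increase the rank of X."""
    augmented = np.hstack([X, y])
    return np.linalg.matrix_rank(augmented) == np.linalg.matrix_rank(X)

# Four rows, but the fourth price agrees with the first three
X_consistent = np.array([[10, 0, 1], [2, 8, 1], [4, 4, 1], [1, 0, 1]])
y_consistent = np.array([7, 5, 5, 2.5]).reshape(-1, 1)

# Four rows, with two identical rows that have different prices
X_conflict = np.array([[10, 0, 1], [2, 8, 1], [4, 4, 1], [4, 4, 1]])
y_conflict = np.array([7, 5, 5, 4.9]).reshape(-1, 1)

print(in_column_space(X_consistent, y_consistent))
print(in_column_space(X_conflict, y_conflict))
```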

# The Problem

More rows than columns in our dataset means more equations than variables.

This *usually* means that the equations aren't solvable: $y$ isn't in the column space of $X$.
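The word *usually* can be illustrated with a sketch. Draw a random $4 \times 3$ $X$ and a random $y$ (the seed is arbitrary). Then compare the rank of $X$ with the rank of $X$ after appending $y$ as an extra column. For generic random data, the appended column raises the rank from 3 to 4. That means $y$ is not a combination of the columns of $X$, so no exact $c$ exists.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
X = rng.normal(size=(4, 3))     # 4 equations, 3 unknowns
y = rng.normal(size=(4, 1))

rank_X = np.linalg.matrix_rank(X)
rank_augmented = np.linalg.matrix_rank(np.hstack([X, y]))
# y is in the column space of X only if appending it leaves the rank unchanged
print(rank_X, rank_augmented)
```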

```python
X = np.array([
    [10,0,1],
    [2,8,1],
    [4,4,1],
    [4,4,1],
])
y = np.array([7,5,5,4.9]).reshape(-1,1)
# c = np.linalg.solve(X, y)  # not solvable: X isn't square, and no exact c exists
# least squares via the normal equations: solve (X^T X) c = X^T y instead
c = np.linalg.solve(X.T @ X, X.T @ y)
c  # similar to our answer before we added the troublesome row!
```

```
array([[0.525],
       [0.275],
       [1.75 ]])
```
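Solving $X^T X c = X^T y$ is the normal-equations form of least squares. It picks the $c$ that minimizes $\lVert Xc - y \rVert^2$, and it requires $X$ to have full column rank so that $X^T X$ is invertible. As a sketch, the result can be cross-checked against `np.linalg.lstsq`. The leftover error $y - Xc$ should also be orthogonal to every column of $X$, that is, $X^T(y - Xc) = 0$. In practice `lstsq` is often preferred, because forming $X^T X$ squares the condition number of the problem.

```python
import numpy as np

X = np.array([
    [10, 0, 1],
    [2, 8, 1],
    [4, 4, 1],
    [4, 4, 1],
])
y = np.array([7, 5, 5, 4.9]).reshape(-1, 1)

# Normal equations (X has full column rank here, so X.T @ X is invertible)
c_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Same least-squares problem via lstsq, which works on X directly
c_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(c_normal, c_lstsq))  # True

# The residual is orthogonal to the column space of X
residual = y - X @ c_normal
print(np.allclose(X.T @ residual, 0))  # True
# Approximately [0, 0, 0.05, -0.05]: only the two conflicting rows are off
print(residual.ravel())
```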