# Overview

**Last time**, we estimated house prices using a linear model based on the dot product as follows:

$Xc = y$

* $X$ (known) is a matrix with house features (from DataFrame)
* $c$ (known) is a vector of coefficients (our model parameters)
* $y$ (computed) are the prices
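
As a reminder of that direction, here is a minimal sketch of computing prices from known coefficients. The names `X_demo`, `c_demo`, and `y_demo` and all of the numbers are made up for illustration; they are not taken from `houses.csv`.

```python
import numpy as np

# Hypothetical feature matrix: one row per house.
# Columns are beds, baths, and a constant 1 (illustrative values only).
X_demo = np.array([[2.0, 1.0, 1.0],
                   [3.0, 2.0, 1.0]])

# Hypothetical coefficients, made up for this sketch
c_demo = np.array([[10.0], [5.0], [100.0]])

# Each price is the dot product of one row of X_demo with c_demo
y_demo = X_demo @ c_demo
print(y_demo)
```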

**This time**, what if $X$ and $y$ are known, and we want to find $c$?

In [1]:
import pandas as pd
import numpy as np

In [2]:
houses = pd.read_csv("houses.csv")
houses

Unnamed: 0,beds,baths,year,price
0,2,1,1985,196.55
1,3,1,1998,260.56
2,4,3,2005,334.55
3,4,2,2020,349.6


If we assume price is a linear function of the features, each house satisfies this equation:

* $beds*c_0 + baths*c_1 + year*c_2 + 1*c_3 = y$

Plugging in each row of the table gives four equations in four unknowns:

* $2*c_0 + 1*c_1 + 1985*c_2 + 1*c_3 = 196.55$
* $3*c_0 + 1*c_1 + 1998*c_2 + 1*c_3 = 260.56$
* $4*c_0 + 3*c_1 + 2005*c_2 + 1*c_3 = 334.55$
* $4*c_0 + 2*c_1 + 2020*c_2 + 1*c_3 = 349.60$
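
Stacked together, these four equations are a single matrix equation of the form $Xc = y$. The column of ones multiplies the intercept $c_3$:

$$
\begin{bmatrix}
2 & 1 & 1985 & 1 \\
3 & 1 & 1998 & 1 \\
4 & 3 & 2005 & 1 \\
4 & 2 & 2020 & 1
\end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \end{bmatrix}
=
\begin{bmatrix} 196.55 \\ 260.56 \\ 334.55 \\ 349.60 \end{bmatrix}
$$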

In [5]:
X = houses.values[:, :-1]
X

array([[2.000e+00, 1.000e+00, 1.985e+03],
       [3.000e+00, 1.000e+00, 1.998e+03],
       [4.000e+00, 3.000e+00, 2.005e+03],
       [4.000e+00, 2.000e+00, 2.020e+03]])

In [11]:
# append a column of ones so that c_3 acts as the intercept
X = np.concatenate((X, np.ones((len(X), 1))), axis=1)
X

array([[2.000e+00, 1.000e+00, 1.985e+03, 1.000e+00],
       [3.000e+00, 1.000e+00, 1.998e+03, 1.000e+00],
       [4.000e+00, 3.000e+00, 2.005e+03, 1.000e+00],
       [4.000e+00, 2.000e+00, 2.020e+03, 1.000e+00]])
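
One equivalent way to append the column of ones is `np.hstack`. The sketch below uses a small hypothetical array (`features`, `ones`, and `with_intercept` are illustrative names, not part of the notebook above) so it can run on its own.

```python
import numpy as np

# Hypothetical 2-row feature array (illustrative values only)
features = np.array([[2.0, 1.0],
                     [3.0, 1.0]])

# Build an (n, 1) column of ones sized to match the number of rows
ones = np.ones((features.shape[0], 1))

# Stack the ones column to the right of the features
with_intercept = np.hstack((features, ones))
print(with_intercept)
```

Sizing the ones column from the array's shape avoids hard-coding the number of rows.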

In [14]:
y = houses.values[:, -1:]
y

array([[196.55],
       [260.56],
       [334.55],
       [349.6 ]])

In [18]:
# X @ c = y
# numpy can solve for c, given X and y (X must be square and non-singular)
c = np.linalg.solve(X, y)
c

array([[ 4.230e+01],
       [ 1.000e+01],
       [ 1.670e+00],
       [-3.213e+03]])
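
`np.linalg.solve` expects a square coefficient matrix with a unique solution, which holds here because there are four houses and four unknowns. The sketch below illustrates both cases on hypothetical 2x2 systems; `A`, `b`, `coef`, and `singular` are made-up names and values.

```python
import numpy as np

# Hypothetical system (illustrative values):
#   1*c0 + 1*c1 = 3
#   1*c0 + 2*c1 = 5
A = np.array([[1.0, 1.0],
              [1.0, 2.0]])
b = np.array([[3.0], [5.0]])

# solve needs as many independent equations as unknowns
assert A.shape[0] == A.shape[1]
assert np.linalg.matrix_rank(A) == A.shape[0]
coef = np.linalg.solve(A, b)  # c0 = 1, c1 = 2

# A singular matrix (second row is twice the first) has no unique solution
singular = np.array([[1.0, 2.0],
                     [2.0, 4.0]])
try:
    np.linalg.solve(singular, b)
    solved = True
except np.linalg.LinAlgError:
    solved = False

print(coef, solved)
```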

In [21]:
X[0:1, :] @ c

array([[196.55]])

In [22]:
X @ c

array([[196.55],
       [260.56],
       [334.55],
       [349.6 ]])

In [25]:
list(c.reshape(-1))

[42.29999999999957, 9.99999999999991, 1.670000000000043, -3213.0000000000846]
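
The computed coefficients differ slightly from the round values 42.3, 10, 1.67, and -3213 because of floating-point rounding error. A minimal sketch of comparing them with a tolerance rather than exact equality (`c_computed` and `c_rounded` are illustrative names; the first array is copied from the output above):

```python
import numpy as np

# Values copied from the output above
c_computed = np.array([42.29999999999957, 9.99999999999991,
                       1.670000000000043, -3213.0000000000846])
c_rounded = np.array([42.3, 10.0, 1.67, -3213.0])

# Exact equality fails because of the tiny rounding differences
exact = bool((c_computed == c_rounded).all())

# np.allclose compares within default relative and absolute tolerances
close = bool(np.allclose(c_computed, c_rounded))
print(exact, close)
```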