# Calculating SSE

<p>
<font color='#21618c'>
<b>This notebook is intended to help you write the <code>find_sse</code> method for HW P01. Before jumping into an example, we need to review a few concepts relating to numpy arrays.</b></font>

### Multiplying 1D and 2D numpy arrays
<p>
<font color='#21618c'><b>
Assume that we have defined the following 1D and 2D numpy arrays, with shapes <code>(2,)</code> and <code>(3,2)</code>:
</b></font>

`a = np.array([2,5])`

`X = np.array([[1,2],[3,4],[5,6]])`

<p>
<font color='#21618c'><b>
Multiplying these two arrays multiplies every row of <code>X</code> by <code>a</code>, using standard numpy (entry-wise) multiplication. The result is another <code>(3,2)</code> array. Note that this is NOT the usual form of matrix multiplication used in mathematics.
</b></font>

`a * X` 
$ = \begin{bmatrix} 2 & 5 \end{bmatrix} * \begin{bmatrix} 1&2 \\ 3&4 \\ 5&6 \end{bmatrix}$
$ = \begin{bmatrix} 2(1)&5(2) \\ 2(3)&5(4) \\ 2(5)&5(6) \end{bmatrix}$
$ = \begin{bmatrix} 2&10 \\ 6&20 \\ 10&30 \end{bmatrix}$
`= [[2,10], [6,20], [10,30]]`

<p>
<font color='#21618c'><b>
To perform a multiplication of this sort, the number of entries in <code>a</code> must be equal to the number of columns in <code>X</code>. Otherwise, numpy raises an error. (The one exception is an array with a single entry, which numpy applies to every column.)
</b></font>
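
<p>
<font color='#21618c'><b>
As a minimal sketch of the shape requirement, the cell below multiplies <code>X</code> by an array with three entries. The name <code>a_bad</code> and its values are made up for this illustration.
</b></font>

```python
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6]])   # shape (3, 2)
a_bad = np.array([2, 5, 7])               # shape (3,): does not match the 2 columns of X

try:
    a_bad * X
    mismatch_raised = False
except ValueError as err:
    # numpy reports that the two shapes cannot be combined
    mismatch_raised = True
    print('ValueError:', err)
```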

<p>
<font color='#21618c'><b>
If we pass an array of any size/shape to the function <code>np.sum()</code>, the result will be a single number that is equal to the sum of all of the entries. 
</b></font>


`np.sum(a * X) = 2 + 10 + 6 + 20 + 10 + 30 = 78`
 
<p>
<font color='#21618c'><b>
However, if we specify the parameter <code>axis = 1</code> in the function <code>np.sum()</code>, then the sums will be performed across the rows only. The result will be a 1D numpy array that contains the row sums. 
</b></font>

 
`np.sum(a * X, axis=1) `
$ = \begin{bmatrix} 2 + 10 \\ 6 + 20 \\ 10 + 30 \end{bmatrix}$
$ = \begin{bmatrix} 12 \\ 26 \\ 40 \end{bmatrix}$
`= [12  26  40]` 

<p>
<font color='#21618c'><b>
If we had specified <code>axis = 0</code>, we would get column sums instead. 
</b></font>
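
<p>
<font color='#21618c'><b>
For comparison, a short sketch of the column sums for the same <code>a</code> and <code>X</code>. The variable name <code>col_sums</code> is introduced here only for illustration.
</b></font>

```python
import numpy as np

a = np.array([2, 5])
X = np.array([[1, 2], [3, 4], [5, 6]])

# Sum down each column of a * X: (2 + 6 + 10) and (10 + 20 + 30)
col_sums = np.sum(a * X, axis=0)
print(col_sums)   # [18 60]
```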

<p>
<font color='#21618c'><b>
We demonstrate this example in the code cells below. 
</b></font>



In [1]:
import numpy as np

a = np.array([2,5])
X = np.array([[1,2],[3,4],[5,6]])

print('a = ', a) 
print('X = \n', X) 

a =  [2 5]
X = 
 [[1 2]
 [3 4]
 [5 6]]


In [2]:
print(a * X)

[[ 2 10]
 [ 6 20]
 [10 30]]


In [3]:
print(np.sum(a * X))

78


In [4]:
print(np.sum(a * X, axis = 1))

[12 26 40]


## Finding SSE

<p>
<font color='#21618c'><b>
We will now discuss how to calculate SSE for a proposed (not necessarily optimal) regression model. We will illustrate the process using Model 1 from HW W02 as an example. 
</b> </font>

<p>
<font color='#21618c'><b>
Assume we have the training data described below. There are two features, which are stored in the columns of <code>X</code>. The labels are stored in <code>y</code>. For the purpose of discussion, assume the feature stored in the first column of <code>X</code> is named $x^{(1)}$ and the feature in the second column is named $x^{(2)}$.
</b> </font>

`X = np.array([[12,4], [14,3], [16,6], [20,5], [24,2]])` 
$ = \begin{bmatrix} 12 & 4 \\ 14 & 3 \\ 16 & 6 \\ 20 & 5 \\ 24 & 2 \end{bmatrix}$

`y = np.array([50, 53, 67, 70, 63])`
$ = \begin{bmatrix} 50 \\ 53 \\ 67 \\ 70 \\ 63 \end{bmatrix}$


<p>
<font color='#21618c'><b>
Assume we want to score the following model by calculating its SSE on the training data:
</b></font> 

$\hat{y} = 12 + 1.5 x^{(1)} + 5 x^{(2)}$

<p>
<font color='#21618c'><b>
We will store the parameters for this model in an array called <code>beta</code>. The individual parameter values will be denoted by $\beta_0$, $\beta_1$, and $\beta_2$, so <code>beta</code> $= [\beta_0, \beta_1, \beta_2]$.
</b></font> 

`beta = np.array([12, 1.5, 5])`

<p>
<font color='#21618c'><b>
Before calculating SSE, we first need to find $\hat{y}$. To do that, we need to perform the following calculation:
</b> </font> 

$\hat{y} = 
\begin{bmatrix} 
\beta_0 + \beta_1 (12) + \beta_2 (4) \\ \beta_0 + \beta_1 (14) + \beta_2 (3) \\ \beta_0 + \beta_1 (16) + \beta_2 (6) \\ 
\beta_0 + \beta_1 (20) + \beta_2 (5) \\ \beta_0 + \beta_1 (24) + \beta_2 (2)
\end{bmatrix} = 
\begin{bmatrix} 12 + 1.5(12) + 5(4) \\ 12 + 1.5(14) + 5(3)  \\ 12 + 1.5(16) + 5(6) \\ 12 + 1.5(20) + 5(5) \\ 12 + 1.5(24) + 5(2) \end{bmatrix} = \begin{bmatrix} 50 \\ 48 \\ 66 \\ 67 \\ 58  \end{bmatrix}$

<p>
<font color='#21618c'><b>
We can perform these calculations using the numpy tools discussed at the beginning of this notebook. We will illustrate the process using a mix of code and mathematical notation, and will summarize the resulting code afterward. 
</b> </font>

$\hat{y} = 
\begin{bmatrix} 
\beta_0 + \beta_1 (12) + \beta_2 (4) \\ \beta_0 + \beta_1 (14) + \beta_2 (3) \\ \beta_0 + \beta_1 (16) + \beta_2 (6) \\ 
\beta_0 + \beta_1 (20) + \beta_2 (5) \\ \beta_0 + \beta_1 (24) + \beta_2 (2)
\end{bmatrix} = 
\begin{bmatrix} \beta_0 \\ \beta_0 \\ \beta_0 \\ \beta_0 \\ \beta_0 \end{bmatrix} +
\begin{bmatrix} 
\beta_1 (12) + \beta_2 (4) \\ \beta_1 (14) + \beta_2 (3) \\ \beta_1 (16) + \beta_2 (6) \\ 
\beta_1 (20) + \beta_2 (5) \\ \beta_1 (24) + \beta_2 (2)
\end{bmatrix}$

`  = beta[0] + np.sum(`
$\begin{bmatrix} 
\beta_1 (12) & \beta_2 (4) \\ \beta_1 (14) & \beta_2 (3) \\ \beta_1 (16) &\beta_2 (6) \\ 
\beta_1 (20) & \beta_2 (5) \\ \beta_1 (24) & \beta_2 (2)
\end{bmatrix}$
`,axis=1)`

` = beta[0] + np.sum(beta[1:] * X, axis=1)`

<p>
<font color='#21618c'><b>
So, the code that we need to find <code>y_hat</code>, given <code>X</code> and <code>beta</code>, is:
</b> </font>

`y_hat = beta[0] + np.sum(beta[1:] * X, axis=1)`
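
<p>
<font color='#21618c'><b>
As an aside, the same values can also be obtained with numpy's matrix-vector product operator <code>@</code>. This is an alternative to the expression above, not the approach this notebook uses. The names <code>y_hat_sum</code> and <code>y_hat_matmul</code> are chosen only for this comparison.
</b></font>

```python
import numpy as np

X = np.array([[12, 4], [14, 3], [16, 6], [20, 5], [24, 2]])
beta = np.array([12, 1.5, 5])

y_hat_sum = beta[0] + np.sum(beta[1:] * X, axis=1)   # expression from the text
y_hat_matmul = beta[0] + X @ beta[1:]                # matrix-vector product alternative

# Both approaches produce the same predictions
print(np.allclose(y_hat_sum, y_hat_matmul))   # True
```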

<p>
<font color='#21618c'><b>
Once we have <code>y_hat</code>, we can calculate the residuals and SSE as follows:
</b> </font>

`residuals = y - y_hat`

`sse = np.sum(residuals**2)`

<p>
<font color='#21618c'><b>
The following code cells illustrate this example. 
</b> </font>

In [5]:
# We start by defining X, y, and beta.
X = np.array([[12,4], [14,3], [16,6], [20,5], [24,2]])
y = np.array([50, 53, 67, 70, 63])
beta = np.array([12, 1.5, 5])

# Now we calculate y_hat and sse.
y_hat = beta[0] + np.sum(beta[1:] * X, axis=1)
residuals = y - y_hat
sse = np.sum(residuals**2)

# Print the results
print('y_hat = ', y_hat)
print('residuals = ', residuals)
print('sse = ', sse)

y_hat =  [ 50.  48.  66.  67.  58.]
residuals =  [ 0.  5.  1.  3.  5.]
sse =  60.0


<p>
<font color='#21618c'><b>
In the example we used, there were two features. However, this code would work with any number of features. We would just have to make sure that the number of entries in <code>beta</code> is one greater than the number of features (which is equal to the number of columns in <code>X</code>). 
</b></font>
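
<p>
<font color='#21618c'><b>
One way to package these steps is sketched below. The function name <code>compute_sse</code>, its signature, and the shape check are assumptions made for this illustration and are not the required form of <code>find_sse</code>. The three-feature data (<code>X3</code>, <code>y3</code>, <code>beta3</code>) is made up.
</b></font>

```python
import numpy as np

def compute_sse(X, y, beta):
    """Hypothetical helper: SSE of the linear model with parameters beta."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # beta holds one intercept plus one coefficient per column of X
    if beta.shape[0] != X.shape[1] + 1:
        raise ValueError('beta must have one more entry than X has columns')
    y_hat = beta[0] + np.sum(beta[1:] * X, axis=1)
    residuals = y - y_hat
    return np.sum(residuals**2)

# Two-feature example from the text
X = np.array([[12, 4], [14, 3], [16, 6], [20, 5], [24, 2]])
y = np.array([50, 53, 67, 70, 63])
beta = np.array([12, 1.5, 5])
print(compute_sse(X, y, beta))      # 60.0

# Made-up three-feature data: the same code runs unchanged
X3 = np.array([[1, 2, 3], [4, 5, 6]])
y3 = np.array([10, 20])
beta3 = np.array([1, 1, 1, 1])
print(compute_sse(X3, y3, beta3))   # 25.0
```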


## Alternate Method: Finding y_hat Using Loops

<p>
<font color='#21618c'><b>
It is possible to calculate the elements of <code>y_hat</code> one at a time, using a for loop. This approach requires a bit more code and, more importantly, runs more slowly than using numpy operations. However, for completeness, we will also illustrate this method. 
</b></font>


In [6]:
y_hat = []

for i in range(X.shape[0]):
    temp = beta[0] + np.sum(beta[1:]*X[i,:])
    y_hat.append(temp)
    
y_hat = np.array(y_hat)
print(y_hat)

[ 50.  48.  66.  67.  58.]


<p>
<font color='#21618c'><b>
Let's take a closer look at the following line: 
</b></font>

`temp = beta[0] + np.sum(beta[1:]*X[i,:])`

<p>
<font color='#21618c'><b>
For this particular example, we could have written this line as follows:
</b></font>

`temp = beta[0] + beta[1]*X[i,0] + beta[2]*X[i,1]`

<p>
<font color='#21618c'><b>
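This second line would work for our example, but would NOT work in general. With only 1 feature, it raises an <code>IndexError</code> because <code>beta[2]</code> and <code>X[i,1]</code> do not exist. With more than 2 features, it silently ignores every feature after the second. The first line is more flexible, and would work with any number of features. 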
</b></font>
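
<p>
<font color='#21618c'><b>
The difference between the two lines can be checked on a small one-feature example. The data below (<code>X1</code>, <code>beta1</code>) is made up for this illustration.
</b></font>

```python
import numpy as np

# Made-up one-feature data, for illustration only
X1 = np.array([[2], [4], [6]])
beta1 = np.array([1, 3])
i = 0

# The flexible line works for any number of features
flexible = beta1[0] + np.sum(beta1[1:] * X1[i, :])
print(flexible)   # 7

# The hard-coded line assumes exactly two features
try:
    beta1[0] + beta1[1]*X1[i, 0] + beta1[2]*X1[i, 1]
    hard_coded_failed = False
except IndexError:
    # beta1 has no entry at index 2
    hard_coded_failed = True
print(hard_coded_failed)   # True
```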