<a href="https://colab.research.google.com/github/werowe/HypatiaAcademy/blob/master/class/calculate_linear_regression_manually.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Algorithm for Least Squares with Zero Intercept

1. **Initialize the Coefficient**:
   - Start with a guess for the slope ($ b $).
   - For example, $ b = 1 $.

2. **Compute Predictions**:
   - Use the current slope ($ b $) to predict $ \hat{y}_i = b \cdot x_i $.

3. **Calculate Error**:
   - Compute the Sum of Squared Errors (SSE):
$$
\text{SSE} = \sum_{i=1}^n (y_i - \hat{y}_i)^2 = \sum_{i=1}^n (y_i - b \cdot x_i)^2
$$


4. **Adjust Coefficient**:
   - If the last change to $ b $ lowered the SSE, keep moving $ b $ in the same direction.
   - If the last change raised the SSE, reverse direction.
   - Use a small step size ($ \Delta b $) for each adjustment.

5. **Repeat Until Convergence**:
   - Stop when SSE stabilizes or changes by less than a small threshold (e.g., $ 0.001 $).
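The five steps above can be sketched as a simple hill-climbing search. This is a minimal sketch, not the notebook's own code: the names `sse` and `fit_zero_intercept` are hypothetical. Halving the step every time the direction flips is an added assumption, so that a fixed $ \Delta b $ does not bounce back and forth around the minimum forever.

```python
import numpy as np


def sse(b, x, y):
    """Sum of squared errors for the zero-intercept model y_hat = b * x."""
    return float(np.sum((y - b * x) ** 2))


def fit_zero_intercept(x, y, b=1.0, step=1.0, tol=0.001, max_iter=10_000):
    """Hill-climb on b: keep going while SSE falls, reverse when it rises."""
    direction = 1.0
    current = sse(b, x, y)
    for _ in range(max_iter):
        candidate_b = b + direction * step
        candidate = sse(candidate_b, x, y)
        if candidate < current:
            # SSE decreased: accept the move and keep the same direction
            change = current - candidate
            b, current = candidate_b, candidate
            if change < tol:  # SSE has stabilized
                break
        else:
            # SSE increased: reject the move and reverse direction.
            # Assumption: also halve the step so the search can settle.
            direction = -direction
            step /= 2
            if step < 1e-12:
                break
    return b


x = np.array([0, 1, 2, 3, 4, 6, 7, 8, 9], dtype=float)
y = 840 * x
print(fit_zero_intercept(x, y))
```

Starting from $ b = 1 $, the search walks up one unit at a time until the SSE stops falling, then narrows in with ever smaller steps.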




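```python
import math

import numpy as np


def calc_error(old, new):
    # Error between an actual and a predicted value.
    # sqrt((old - new)**2) is the same as the absolute difference |old - new|.
    return math.sqrt((old - new) ** 2)


# Vectorize the function so it runs element by element over the two arrays
vectorized_calc_error = np.vectorize(calc_error)
```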
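Since the square root of a square is just the absolute value, `calc_error` returns $ |y_i - \hat{y}_i| $. NumPy can compute that for whole arrays with `np.abs`, without `np.vectorize` (which calls the Python function once per element). A small check, using three illustrative points taken from the first rows of the loop output further down:

```python
import math

import numpy as np


def calc_error(old, new):
    # Same function as in the cell above
    return math.sqrt((old - new) ** 2)


vectorized_calc_error = np.vectorize(calc_error)

y = np.array([0.0, 840.0, 1680.0])
new_y = np.array([0.0, 841.0, 1682.0])

loop_version = vectorized_calc_error(y, new_y)  # one Python call per element
numpy_version = np.abs(y - new_y)               # a single NumPy operation

print(loop_version)
print(np.array_equal(loop_version, numpy_version))
```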



### Learning Rate

We change the coefficient by subtracting the learning rate (`lr`) on every loop.

In this mocked example the initial guess is higher than the actual coefficient, because we simply added 1 to the actual value to get a starting point. That is why we subtract the learning rate; if the initial guess had been lower, we would add the learning rate instead. Unlike step 4 of the algorithm, the code does not check whether the error went up or down, so the direction has to be chosen by hand.
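A common alternative, not used in this notebook, is gradient descent: the derivative of the SSE, $ \frac{d\,\text{SSE}}{db} = -2\sum_{i=1}^n x_i (y_i - b \cdot x_i) $, says which way to move, so the same code works whether the starting guess is above or below the actual coefficient. The update is $ b \leftarrow b - \text{lr} \cdot \frac{d\,\text{SSE}}{db} $. For this model it only converges if $ \text{lr} < 1 / \sum_i x_i^2 $ (here $ \sum_i x_i^2 = 260 $), so the `lr = 0.001` below is an assumption chosen for this data, and `gradient_descent_zero_intercept` is a hypothetical name.

```python
import numpy as np


def gradient_descent_zero_intercept(x, y, b=0.0, lr=0.001, tol=1e-9, max_iter=10_000):
    """Minimise SSE = sum((y - b*x)**2) over b with plain gradient descent."""
    for _ in range(max_iter):
        grad = -2.0 * np.sum(x * (y - b * x))  # d(SSE)/db at the current b
        new_b = b - lr * grad                  # step against the gradient
        if abs(new_b - b) < tol:               # stop once updates are negligible
            return new_b
        b = new_b
    return b


x = np.array([0, 1, 2, 3, 4, 6, 7, 8, 9], dtype=float)
y = 840 * x

print(round(gradient_descent_zero_intercept(x, y, b=841.0), 6))  # guess above the answer
print(round(gradient_descent_zero_intercept(x, y, b=0.0), 6))    # guess below the answer
```

The sign of the gradient replaces the hand-picked direction, and its size shrinks near the minimum, so the steps get smaller on their own.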

```python
# Turn off scientific notation when printing arrays
np.set_printoptions(suppress=True)

# Actual coefficient used to generate the data.
# The loop only subtracts lr, so it assumes the starting guess is above this value.
actual_coeff = 840

# Input array
x = np.array([0, 1, 2, 3, 4, 6, 7, 8, 9])

# Actual output (a line through the origin)
y = actual_coeff * x

# First guess: 1 above the actual coefficient
guess_coeff = actual_coeff + 1

# Learning rate: the step subtracted from the guess on every loop
lr = 0.1

cnt = 0

# Positive placeholder so the while condition is true on the first pass.
# Note: despite the "mse" name, the loop measures the mean ABSOLUTE error.
old_mse = abs(guess_coeff) + 2

print("starting point")
print("actual coefficient ", actual_coeff)
print("initial guess actual_coeff + 1 = ", guess_coeff)

# Keep stepping until the rounded average error reaches 0
while round(old_mse) > 0:

    # Predictions with the current guess
    new_y = guess_coeff * x

    # Absolute error for every point, then its mean
    diff_y = vectorized_calc_error(y, new_y)
    new_mse = np.mean(diff_y)

    print("\n")
    print("loop", cnt)

    # Columns: actual y, predicted y, absolute error
    print(np.column_stack([y, new_y, diff_y]))

    print("new mse {:.3f} ".format(new_mse))

    # Move the guess down by one learning-rate step
    guess_coeff = guess_coeff - lr

    print("new guess_coeff {:.4f}".format(guess_coeff))

    print("old_mse {} new_mse {} ".format(old_mse, new_mse))

    print("rounded new_mse {}".format(round(new_mse)))

    # The current error becomes the previous error for the next loop
    old_mse = new_mse

    cnt += 1

print("\n==========================================")

print("\ncalculated coefficient", guess_coeff)
```

```
starting point
actual coefficient  840
initial guess actual_coeff + 1 =  841


loop 0
[[   0.    0.    0.]
 [ 840.  841.    1.]
 [1680. 1682.    2.]
 [2520. 2523.    3.]
 [3360. 3364.    4.]
 [5040. 5046.    6.]
 [5880. 5887.    7.]
 [6720. 6728.    8.]
 [7560. 7569.    9.]]
new mse 4.444 
new guess_coeff 840.9000
old_mse 843 new_mse 4.444444444444445 
rounded new_mse 4


loop 1
[[   0.     0.     0. ]
 [ 840.   840.9    0.9]
 [1680.  1681.8    1.8]
 [2520.  2522.7    2.7]
 [3360.  3363.6    3.6]
 [5040.  5045.4    5.4]
 [5880.  5886.3    6.3]
 [6720.  6727.2    7.2]
 [7560.  7568.1    8.1]]
new mse 4.000 
new guess_coeff 840.8000
old_mse 4.444444444444445 new_mse 3.999999999999861 
rounded new_mse 4


loop 2
[[   0.     0.     0. ]
 [ 840.   840.8    0.8]
 [1680.  1681.6    1.6]
 [2520.  2522.4    2.4]
 [3360.  3363.2    3.2]
 [5040.  5044.8    4.8]
 [5880.  5885.6    5.6]
 [6720.  6726.4    6.4]
 [7560.  7567.2    7.2]]
new mse 3.556 
new guess_coeff 840.7000
old_mse 3.99999999999986
```
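Because this model has a single coefficient and no intercept, least squares also has an exact solution that can be used to check the iterative result above. Setting the derivative of the SSE to zero, $ -2\sum_{i=1}^n x_i (y_i - b \cdot x_i) = 0 $, gives $ b = \frac{\sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i^2} $. The sketch below computes that formula directly and, for comparison, solves the same problem with NumPy's `np.linalg.lstsq`, passing `x` as a one-column matrix with no column of ones so that no intercept is fitted.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 6, 7, 8, 9], dtype=float)
y = 840 * x

# Closed-form least-squares slope for y_hat = b * x (no intercept)
b_closed = np.sum(x * y) / np.sum(x * x)

# Same problem via NumPy's solver: a single column and no column of ones
coef, residuals, rank, sv = np.linalg.lstsq(x.reshape(-1, 1), y, rcond=None)

print(b_closed)
print(coef[0])
```

The loop in this notebook can only land on values of the form $ 841 - k \cdot \text{lr} $, whereas the closed form gives the exact minimiser even for noisy data.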