__Linear Regression__
is an attractive model because its representation is so simple. The representation is a linear equation that combines a specific set of input values (x) to produce
the predicted output (y) for that set of input values. As such, both the input values (x) and
the output value (y) are numeric.
The linear equation assigns one scale factor to each input value or column, called a coefficient
and commonly represented by the Greek letter Beta (B). One additional coefficient is
also added, giving the line an additional degree of freedom (e.g. moving up and down on a
two-dimensional plot); it is often called the intercept or the bias coefficient. For example, in a
simple regression problem (a single x and a single y), the form of the model would be:
y = B0 + B1 * x
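
The same form extends to several inputs, with one coefficient per input column. The following is a minimal sketch of that idea; the helper name `predict` and the coefficient values are assumptions chosen only for illustration, not part of the original notebook.

```python
def predict(b0, coefficients, row):
    """Return b0 + sum(B_j * x_j) for one row of numeric inputs."""
    return b0 + sum(b * x for b, x in zip(coefficients, row))


# Simple regression (one input): y = B0 + B1 * x with B0 = 1.0, B1 = 2.0
print(predict(1.0, [2.0], [3.0]))  # 7.0

# Two inputs: y = B0 + B1 * x1 + B2 * x2
print(predict(0.5, [2.0, -1.0], [3.0, 4.0]))  # 2.5
```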

### Assumptions

- __Linear Assumption__. Linear regression assumes that the relationship between your input
and output is linear. It does not support anything else. This may be obvious, but it is
good to remember when you have a lot of attributes. You may need to transform data to
make the relationship linear (e.g. log transform for an exponential relationship).
- __Remove Noise__. Linear regression assumes that your input and output variables are
not noisy. Consider using data cleaning operations that let you better expose and clarify
the signal in your data. This matters most for the output variable, so
remove outliers in the output variable (y) if possible.
- __Remove Collinearity__. Linear regression will overfit your data when you have highly correlated input variables. Consider calculating pairwise correlations for your input data and removing the most highly correlated inputs.
- __Gaussian Distributions__. Linear regression will make more reliable predictions if your
input and output variables have a Gaussian distribution. You may get some benefit from using
transforms (e.g. log or Box-Cox) on your variables to make their distribution more Gaussian
looking.
- __Rescale Inputs__. Linear regression will often make more reliable predictions if you rescale
input variables using standardization or normalization.
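
Several of the preparation steps above can be sketched with the standard library alone. The helper names `pearson` and `standardize` and the toy columns are assumptions made for this illustration; a real project would more likely rely on a numerical library.

```python
import math
import statistics


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def standardize(column):
    """Rescale a column to zero mean and unit (sample) standard deviation."""
    mean, sd = statistics.mean(column), statistics.stdev(column)
    return [(v - mean) / sd for v in column]


# Hypothetical toy columns: x2 is exactly 2 * x1, so the two are collinear.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [math.exp(v) for v in x1]  # y grows exponentially with x1

print(pearson(x1, x2))            # close to 1.0: one of the two could be dropped
log_y = [math.log(v) for v in y]  # log transform makes the relationship linear
print(pearson(x1, log_y))         # close to 1.0 after the transform
print(standardize(x1))            # rescaled copy of x1
```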

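For simple linear regression, the slope B1 can be estimated directly from summary statistics of the data:

B1 = corr(x, y) * stdev(y) / stdev(x)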
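
As a rough illustration of this shortcut, the sketch below estimates both coefficients for a single input. The function name `fit_simple` and the toy data are assumptions, and the intercept formula B0 = mean(y) - B1 * mean(x) is the standard least-squares result rather than something stated in the text above.

```python
import statistics


def fit_simple(x, y):
    """Estimate B0 and B1 for y = B0 + B1 * x from summary statistics."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)
    # Sample Pearson correlation, using the same n - 1 divisor as stdev().
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)
    corr = cov / (sd_x * sd_y)
    b1 = corr * sd_y / sd_x
    b0 = mean_y - b1 * mean_x  # the fitted line passes through the means
    return b0, b1


# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 3.0, 5.0]
b0, b1 = fit_simple(x, y)
print(b0, b1)  # approximately 0.4 and 0.8
```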

In [1]:
def linear_regression(x, y, iterations=100,
                      learning_rate=0.01):
  # x is a list of m rows with n input values each; y is a list of m outputs
  n, m = len(x[0]), len(x)
  beta_0, beta_other = initialize_params(n)
  for _ in range(iterations):
    # compute the step direction at the current parameters, then take one step
    gradient_beta_0, gradient_beta_other = compute_gradient(
        x, y, beta_0, beta_other, n, m
    )
    beta_0, beta_other = update_params(
        beta_0, beta_other, gradient_beta_0, gradient_beta_other,
        learning_rate
    )
  return beta_0, beta_other

In [3]:
import random

def initialize_params(dimensions):
  # the intercept starts at 0; each coefficient starts at a random value in [0, 1)
  beta_0 = 0
  beta_other = [random.random() for _ in range(dimensions)]
  return beta_0, beta_other

In [5]:
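def compute_gradient(x, y, beta_0, beta_other, dimension, m):
  # accumulates the negative gradient of the mean squared error,
  # which is why update_params adds it to the parameters
  gradient_beta_0 = 0
  gradient_beta_other = [0] * dimension

  for i in range(m):
    # prediction for row i under the current parameters
    y_i_hat = sum(x[i][j] * beta_other[j] for j in range(dimension)) + beta_0
    derror_dy = 2 * (y[i] - y_i_hat)
    for j in range(dimension):
      # chain rule: multiply by x[i][j], not add
      gradient_beta_other[j] += derror_dy * x[i][j] / m
    gradient_beta_0 += derror_dy / m
  return gradient_beta_0, gradient_beta_other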

In [6]:
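def update_params(beta_0, beta_other, gradient_beta_0,
                  gradient_beta_other, learning_rate):
  # compute_gradient returns the downhill direction (negative gradient of the
  # error), so the step is added; beta_other is updated in place
  beta_0 += gradient_beta_0 * learning_rate
  for i in range(len(beta_other)):
    beta_other[i] += gradient_beta_other[i] * learning_rate
  return beta_0, beta_other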
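
One way to check a fitted model is to compare its predictions with the known outputs. The sketch below does this with the mean squared error; the helper names `predict_row` and `mean_squared_error` and the toy data are assumptions added for illustration, not part of the original notebook.

```python
def predict_row(row, beta_0, beta_other):
    """Prediction for one row: beta_0 + sum(x_j * beta_j)."""
    return beta_0 + sum(x_j * b_j for x_j, b_j in zip(row, beta_other))


def mean_squared_error(x, y, beta_0, beta_other):
    """Average squared difference between outputs and predictions."""
    squared_errors = [
        (y_i - predict_row(row, beta_0, beta_other)) ** 2
        for row, y_i in zip(x, y)
    ]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical data generated by y = 1 + 2 * x.
x = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]

print(mean_squared_error(x, y, 1.0, [2.0]))  # 0.0 for the exact coefficients
print(mean_squared_error(x, y, 0.0, [2.0]))  # 1.0: every prediction is off by 1
```

The `beta_0` and `beta_other` values returned by `linear_regression` have the same shape as the arguments used here, so they can be passed to `mean_squared_error` directly to see whether more iterations or a different learning rate reduces the error.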