<a href="https://colab.research.google.com/github/yasmine-mk/implementing_linear_regression/blob/main/implementing_gradient_decent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [7]:
import numpy as np
import copy, math

<a name="toc_15456_5"></a>
# Implementing the Predict Function
The model's prediction with multiple variables is given by the linear model:

$$ f_{\mathbf{w},b}(\mathbf{x}) =  w_0x_0 + w_1x_1 +... + w_{n-1}x_{n-1} + b $$
or in vector notation:
$$ f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b  $$ 
where $\cdot$ is the vector dot product.

Here is an implementation of the predict function for a single example:

In [3]:
# predict the target for a single example
# x: array of shape (n,), where n is the number of features in the dataset
# w: weight vector of shape (n,); b: scalar bias
def predict(x,w,b):
  res = np.dot(x,w) + b
  return res
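The `predict` function above handles one example at a time. The sketch below, assuming the same conventions (`x` and `w` of shape `(n,)`, scalar `b`), checks that `np.dot` gives the same value as the explicit sum in the first formula, and shows how a whole training matrix `X` of shape `(m, n)` could be predicted in one call with `X @ w + b`. The helper names `predict_loop` and `predict_batch` and the sample values are hypothetical.

```python
import numpy as np

def predict_loop(x, w, b):
    """Explicit sum w_0*x_0 + ... + w_{n-1}*x_{n-1} + b (mirrors the first formula)."""
    total = 0.0
    for j in range(x.shape[0]):
        total += w[j] * x[j]
    return total + b

def predict_batch(X, w, b):
    """Predictions for every row of X (shape (m, n)) at once; returns shape (m,)."""
    return X @ w + b

# Hypothetical example values with 3 features
x = np.array([2.0, 1.0, 3.0])
w = np.array([0.5, -1.0, 2.0])
b = 4.0

print(predict_loop(x, w, b))   # 0.5*2 - 1*1 + 2*3 + 4 = 10.0
print(np.dot(x, w) + b)        # same value via the dot product

X = np.array([[2.0, 1.0, 3.0],
              [0.0, 0.0, 0.0]])
print(predict_batch(X, w, b))  # one prediction per row: 10.0 and 4.0
```

The batched form avoids a Python loop over examples, which matters once `m` grows large.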

<a name="toc_15456_5"></a>
# Implementing the Cost Function
The equation for the cost function with multiple variables $J(\mathbf{w},b)$ is:
$$J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2 $$ 
where:
$$ f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b   $$ 

In [2]:
# compute cost
def compute_cost(X,Y,w,b):
  """
  X is the training data of shape (m,n)
  Y is the target data of shape (m,)
  w is the parameter (weights) vector of shape (n,)
  b is a scalar parameter (bias term)
  Returns the cost J(w,b) as a scalar.
  """
  m = X.shape[0] # number of training examples
  cost = 0.0     # running sum of squared errors
  for i in range(m):
    cost = cost + (np.dot(w, X[i]) + b - Y[i])**2
  cost = cost/(2*m)
  return cost
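`compute_cost` sums the squared errors with an explicit loop. A minimal vectorized sketch of the same formula is shown below, assuming `X` of shape `(m, n)`, `y` of shape `(m,)`, `w` of shape `(n,)` and scalar `b`. The name `compute_cost_vectorized` and the toy data are hypothetical; the comment works the arithmetic through by hand.

```python
import numpy as np

def compute_cost_vectorized(X, y, w, b):
    """J(w,b) = 1/(2m) * sum_i (w . x_i + b - y_i)^2, without an explicit loop."""
    m = X.shape[0]
    errors = X @ w + b - y          # shape (m,): prediction minus target per example
    return np.sum(errors ** 2) / (2 * m)

# Hypothetical toy data: 2 examples, 2 features
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
y = np.array([5.0, 6.0])
w = np.array([1.0, 1.0])
b = 0.0

# predictions are [3, 7], errors are [-2, 1], so J = (4 + 1) / (2*2) = 1.25
print(compute_cost_vectorized(X, y, w, b))
```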

<a name="toc_15456_5"></a>
# Implementing Gradient Descent
The gradient descent algorithm for multiple variables is:

$$\begin{align*} \text{repeat}&\text{ until convergence:} \; \lbrace \newline\;
& w_j = w_j -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j}  \; & \text{for j = 0..n-1}\newline
&b\ \ = b -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial b}  \newline \rbrace
\end{align*}$$

where $n$ is the number of features, the parameters $w_j$ and $b$ are updated simultaneously, and

$$
\begin{align}
\frac{\partial J(\mathbf{w},b)}{\partial w_j}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)}   \\
\frac{\partial J(\mathbf{w},b)}{\partial b}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) 
\end{align}
$$

where:

$$ f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b $$

* $m$ is the number of training examples in the dataset
* $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the true (target) value
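One way to sanity-check the partial-derivative formulas above is to compare them with a central finite-difference approximation of $J$, nudging each parameter by a small `eps` in both directions. The sketch below assumes random toy data and `eps = 1e-6`; the helper names `cost`, `analytic_gradient` and `numerical_gradient` are hypothetical.

```python
import numpy as np

def cost(X, y, w, b):
    m = X.shape[0]
    err = X @ w + b - y
    return np.sum(err ** 2) / (2 * m)

def analytic_gradient(X, y, w, b):
    """dJ/dw_j = 1/m * sum_i err_i * x_ij ;  dJ/db = 1/m * sum_i err_i."""
    m = X.shape[0]
    err = X @ w + b - y
    return X.T @ err / m, np.sum(err) / m

def numerical_gradient(X, y, w, b, eps=1e-6):
    """Central differences (J(p + eps) - J(p - eps)) / (2*eps) for each parameter p."""
    dj_dw = np.zeros_like(w)
    for j in range(w.shape[0]):
        step = np.zeros_like(w)
        step[j] = eps
        dj_dw[j] = (cost(X, y, w + step, b) - cost(X, y, w - step, b)) / (2 * eps)
    dj_db = (cost(X, y, w, b + eps) - cost(X, y, w, b - eps)) / (2 * eps)
    return dj_dw, dj_db

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # hypothetical data: m = 5, n = 3
y = rng.normal(size=5)
w = rng.normal(size=3)
b = 0.7

dw_a, db_a = analytic_gradient(X, y, w, b)
dw_n, db_n = numerical_gradient(X, y, w, b)
# both differences should be tiny (only floating-point rounding remains)
print(np.max(np.abs(dw_a - dw_n)), abs(db_a - db_n))
```

Because $J$ is quadratic in $\mathbf{w}$ and $b$, the central difference is exact apart from rounding, so any large mismatch would point to an error in the formulas or the code.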


The implementation is split into two functions:

 * `compute_gradient()`: computes the partial derivatives (the sums over the training examples)

 * `gradient_decent()`: runs the outer loop, repeatedly updating $\mathbf{w}$ and $b$
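"Updated simultaneously" means both derivatives are evaluated at the same old $(\mathbf{w}, b)$ before either parameter changes. The sketch below contrasts one simultaneous step with a sequential step in which $b$'s derivative is (incorrectly, for this algorithm) computed after $\mathbf{w}$ has already moved. The helper name `gradients`, the data and `alpha = 0.1` are hypothetical.

```python
import numpy as np

def gradients(X, y, w, b):
    """Return (dJ/dw, dJ/db) at the given (w, b)."""
    m = X.shape[0]
    err = X @ w + b - y
    return X.T @ err / m, np.sum(err) / m

X = np.array([[1.0], [2.0]])   # hypothetical data: m = 2, n = 1
y = np.array([2.0, 4.0])
w = np.array([0.0])
b = 0.0
alpha = 0.1

# Simultaneous update: both derivatives use the old (w, b)
dj_dw, dj_db = gradients(X, y, w, b)
w_sim = w - alpha * dj_dw
b_sim = b - alpha * dj_db

# Sequential update: b's derivative sees the already-updated w
dj_dw, _ = gradients(X, y, w, b)
w_seq = w - alpha * dj_dw
_, dj_db_new = gradients(X, y, w_seq, b)
b_seq = b - alpha * dj_db_new

print(w_sim, b_sim)  # w moves to 0.5, b to about 0.3
print(w_seq, b_seq)  # same w, but b ends up at about 0.225
```

Both versions often still converge, but only the simultaneous one implements the update rule as written above.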

In [4]:
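def compute_gradient(X,Y,w,b):
  """Return the partial derivatives (dj_db, dj_dw) of the cost at (w, b)."""
  m,n = X.shape                       # m examples, n features
  dj_dw = np.zeros((n,))
  dj_db = 0.
  for i in range(m):
    err = np.dot(X[i],w) + b - Y[i]   # prediction error for example i
    for j in range(n):
      dj_dw[j] = dj_dw[j] + err * X[i,j]
    dj_db = dj_db + err
  dj_db = dj_db/m
  dj_dw = dj_dw/m
  return dj_db, dj_dw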
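The double loop in `compute_gradient` can be replaced by matrix operations: the error vector has shape `(m,)`, and `X.T @ err` computes $\sum_i \text{err}_i \, x_j^{(i)}$ for every $j$ at once. The sketch below keeps the same return order `(dj_db, dj_dw)` and checks it against a loop version on random data; the names `compute_gradient_loop` and `compute_gradient_vectorized` and the data are hypothetical.

```python
import numpy as np

def compute_gradient_loop(X, y, w, b):
    """Loop version with the same structure as compute_gradient above."""
    m, n = X.shape
    dj_dw = np.zeros((n,))
    dj_db = 0.0
    for i in range(m):
        err = np.dot(X[i], w) + b - y[i]
        for j in range(n):
            dj_dw[j] += err * X[i, j]
        dj_db += err
    return dj_db / m, dj_dw / m

def compute_gradient_vectorized(X, y, w, b):
    """Same return order (dj_db, dj_dw), without explicit loops."""
    m = X.shape[0]
    err = X @ w + b - y              # shape (m,)
    dj_dw = X.T @ err / m            # shape (n,): sum_i err_i * x_ij / m
    dj_db = np.sum(err) / m
    return dj_db, dj_dw

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))         # hypothetical data: m = 50, n = 4
y = rng.normal(size=50)
w = rng.normal(size=4)
b = -0.3

db_loop, dw_loop = compute_gradient_loop(X, y, w, b)
db_vec, dw_vec = compute_gradient_vectorized(X, y, w, b)
print(np.allclose(dw_loop, dw_vec), np.isclose(db_loop, db_vec))  # True True
```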

In [8]:
def gradient_decent(X, y, w_in, b_in, cost_function, gradient_function, learning_rate, num_iter):
  # list storing the cost J after each iteration, mainly for plotting later
  J_history = []
  w = copy.deepcopy(w_in)  # avoid modifying the caller's w inside the function
  b = b_in
  for i in range(num_iter):
    # compute the gradient at the current (w, b)
    dj_db, dj_dw = gradient_function(X, y, w, b)
    # update both parameters simultaneously
    w = w - learning_rate*dj_dw
    b = b - learning_rate*dj_db
    # save the cost J at each iteration
    if i < 100000:      # cap the history length to prevent resource exhaustion
      J_history.append(cost_function(X, y, w, b))
    # print the cost at 10 evenly spaced intervals (or every iteration if num_iter < 10)
    if i % math.ceil(num_iter / 10) == 0:
      print(f"Iteration {i:4d}: Cost {J_history[-1]:8.2f}")

  return w, b, J_history  # final w, b and the cost history for plotting
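To see the whole pipeline work end to end, the sketch below runs the same outer-loop structure on synthetic, noise-free data generated from $y = 2x_0 - 3x_1 + 5$ and checks that the parameters approach those values. Everything here is an assumption for illustration: the data, `learning_rate = 0.1`, `num_iter = 1000`, and the stand-in names `cost_vectorized`, `gradient_vectorized` and `run_gradient_descent` (chosen so they do not overwrite the notebook's own functions).

```python
import copy
import math
import numpy as np

def cost_vectorized(X, y, w, b):
    m = X.shape[0]
    err = X @ w + b - y
    return np.sum(err ** 2) / (2 * m)

def gradient_vectorized(X, y, w, b):
    m = X.shape[0]
    err = X @ w + b - y
    return np.sum(err) / m, X.T @ err / m    # (dj_db, dj_dw), same order as compute_gradient

def run_gradient_descent(X, y, w_in, b_in, cost_function, gradient_function,
                         learning_rate, num_iter):
    """Same outer-loop structure as gradient_decent, without the history cap."""
    w = copy.deepcopy(w_in)
    b = b_in
    J_history = []
    for i in range(num_iter):
        dj_db, dj_dw = gradient_function(X, y, w, b)   # gradient at the old (w, b)
        w = w - learning_rate * dj_dw                  # simultaneous update
        b = b - learning_rate * dj_db
        J_history.append(cost_function(X, y, w, b))
        if i % math.ceil(num_iter / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]:8.2f}")
    return w, b, J_history

# Hypothetical noise-free data: y = 2*x0 - 3*x1 + 5
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
true_w = np.array([2.0, -3.0])
true_b = 5.0
y = X @ true_w + true_b

w_final, b_final, J_hist = run_gradient_descent(
    X, y, np.zeros(2), 0.0, cost_vectorized, gradient_vectorized,
    learning_rate=0.1, num_iter=1000)
print(w_final, b_final)  # should be close to [2, -3] and 5
```

With features on a similar scale (roughly standard normal here) a learning rate of 0.1 converges comfortably; unscaled features may need a smaller rate or feature normalization.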