### Subject
Given the *test_scores.csv* file, which contains students' test scores in mathematics and computer science, find the correlation between the scores in these two subjects.
1. For the test scores in the .csv file, run the gradient descent algorithm to find the values of m and b, and choose an appropriate learning rate.
2. On each iteration, compare the previous cost with the current cost. Stop when the costs are similar (use the *math.isclose* function with a *1e-20* threshold).
3. Now use *sklearn.linear_model* to find the coefficient (i.e. m) and the intercept (i.e. b). Compare them with the m and b produced by your gradient descent algorithm.
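
The quantities computed on each gradient descent iteration can be written out explicitly. This is a sketch that assumes the cost is the mean squared error of the line $m x_i + b$ over $n$ students, where $x_i$ is the math score and $y_i$ is the CS score; the symbols $J$ and $\alpha$ (the learning rate) are notation introduced here for illustration.

$$
J(m, b) = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)^2
$$

$$
\frac{\partial J}{\partial m} = -\frac{2}{n} \sum_{i=1}^{n} x_i \bigl( y_i - (m x_i + b) \bigr),
\qquad
\frac{\partial J}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)
$$

$$
m \leftarrow m - \alpha \, \frac{\partial J}{\partial m},
\qquad
b \leftarrow b - \alpha \, \frac{\partial J}{\partial b}
$$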

In [1]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import math

In [2]:
def predict_using_sklearn():
    df = pd.read_csv('test_scores.csv')
    r = LinearRegression()
    r.fit(df[['math']],df.cs)
    return r.coef_, r.intercept_
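
For a single feature, the ordinary least squares fit that `LinearRegression` performs also has a simple closed form, which can serve as a third reference point. The sketch below assumes the scores are copied by hand from the table in this notebook; `closed_form_fit` is a hypothetical helper name, not part of any library.

```python
import numpy as np

# Scores copied from the test_scores.csv table shown in this notebook
math_scores = np.array([92, 56, 88, 70, 80, 49, 65, 35, 66, 67], dtype=float)
cs_scores = np.array([98, 68, 81, 80, 83, 52, 66, 30, 68, 73], dtype=float)

def closed_form_fit(x, y):
    """Least-squares slope and intercept for y = m*x + b (hypothetical helper)."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means
    b = y_mean - m * x_mean
    return m, b

m_closed, b_closed = closed_form_fit(math_scores, cs_scores)
print(m_closed, b_closed)
```

The slope and intercept printed here should agree with the sklearn values to within floating-point rounding.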

In [3]:
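def gradient_descent(x, y):
    # Fit y = m*x + b by minimizing the mean squared error
    m_curr = 0
    b_curr = 0
    iterations = 1000000
    n = len(x)
    learning_rate = 0.0002
    previous_cost = 0

    for i in range(iterations):
        y_predicted = m_curr*x + b_curr
        # Mean squared error for the current m and b
        cost = (1/n)*sum([value**2 for value in (y-y_predicted)])
        # Partial derivatives of the cost with respect to m and b
        md = -(2/n)*sum(x*(y-y_predicted))
        bd = -(2/n)*sum(y-y_predicted)
        m_curr = m_curr - learning_rate*md
        b_curr = b_curr - learning_rate*bd
        # Stop once the cost no longer changes between iterations
        if math.isclose(cost, previous_cost, rel_tol=1e-20):
            break
        # Remember this cost for the comparison in the next iteration
        previous_cost = cost
        # print("m {}, b {}, cost {} iteration {}".format(m_curr,b_curr,cost,i))

    return m_curr, b_curr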
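
It helps to see what a relative tolerance of *1e-20* means for Python floats. The sketch below is an illustration, not part of the exercise solution; the variable names are made up for the example. Because double-precision floats carry roughly 16 significant digits, two different floats can never be within a relative distance of 1e-20, so the check only succeeds when the two costs are exactly equal.

```python
import math
import sys

a = 1.0
b = 1.0 + sys.float_info.epsilon  # the next representable float above 1.0

# Adjacent floats differ by far more than a 1e-20 relative tolerance
tight_distinct = math.isclose(a, b, rel_tol=1e-20)
# Identical values always compare as close
tight_equal = math.isclose(a, a, rel_tol=1e-20)
# A looser tolerance accepts the adjacent pair
loose_distinct = math.isclose(a, b, rel_tol=1e-9)

print(tight_distinct, tight_equal, loose_distinct)
```

In practice this means the stopping rule fires only once the cost has stopped changing at full float precision.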

In [4]:
df = pd.read_csv('test_scores.csv')
df

Unnamed: 0,name,math,cs
0,david,92,98
1,laura,56,68
2,sanjay,88,81
3,wei,70,80
4,jeff,80,83
5,aamir,49,52
6,venkat,65,66
7,virat,35,30
8,arthur,66,68
9,paul,67,73
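
Since the exercise asks about the correlation between the two subjects, the Pearson correlation coefficient can be computed directly from these rows. This is a minimal sketch that assumes the scores are typed in from the table above rather than read from the file; `np.corrcoef` returns a 2x2 matrix whose off-diagonal entry is the coefficient.

```python
import numpy as np

# Scores copied from the table above
math_scores = np.array([92, 56, 88, 70, 80, 49, 65, 35, 66, 67], dtype=float)
cs_scores = np.array([98, 68, 81, 80, 83, 52, 66, 30, 68, 73], dtype=float)

# Entry [0, 1] of the correlation matrix is the math/cs correlation
r = np.corrcoef(math_scores, cs_scores)[0, 1]
print(round(r, 3))
```

A value close to 1 indicates a strong positive linear relationship, which is what makes a straight-line fit reasonable for this data.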


In [5]:
x = np.array(df.math)
y = np.array(df.cs)

m, b = gradient_descent(x,y)
print('Using gradient descent function: Coef {} Intercept {}'.format(m, b))

m_sklearn, b_sklearn = predict_using_sklearn()
print('Using sklearn: Coef {} Intercept {}'.format(m_sklearn,b_sklearn))

Using gradient descent function: Coef 1.0177362378598027 Intercept 1.9152193109535014
Using sklearn: Coef [1.01773624] Intercept 1.9152193111569318
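
Step 1 also asks for an appropriate learning rate. One way to explore this, sketched below under the assumption that the scores are copied from the table above, is to run a handful of updates with different rates and compare the resulting cost. `cost_after` is a hypothetical helper, and the rate 0.001 is chosen only to illustrate a step that is too large for this data.

```python
import numpy as np

# Scores copied from the test_scores.csv table shown in this notebook
math_scores = np.array([92, 56, 88, 70, 80, 49, 65, 35, 66, 67], dtype=float)
cs_scores = np.array([98, 68, 81, 80, 83, 52, 66, 30, 68, 73], dtype=float)

def cost_after(x, y, learning_rate, steps):
    """Run `steps` updates from m = b = 0 and return the final mean squared error."""
    m, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        residual = y - (m * x + b)
        # Same gradient expressions as in gradient_descent above
        m -= learning_rate * (-(2 / n) * np.sum(x * residual))
        b -= learning_rate * (-(2 / n) * np.sum(residual))
    return np.mean((y - (m * x + b)) ** 2)

# Cost at the starting point m = b = 0
initial_cost = np.mean(cs_scores ** 2)
cost_small = cost_after(math_scores, cs_scores, 0.0002, 10)
cost_large = cost_after(math_scores, cs_scores, 0.001, 10)
print(initial_cost, cost_small, cost_large)
```

With the rate used in this notebook the cost drops below its starting value, while the larger rate makes it grow, which is the sign of a step size that overshoots.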
