
In [21]:
from sklearn.datasets import load_diabetes
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

In [22]:

# load the diabetes dataset as (features, target) arrays
X,y = load_diabetes(return_X_y=True)
print(X.shape)
print(y.shape)
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=2)
reg = LinearRegression()
reg.fit(X_train,y_train)
print(reg.coef_)
print(reg.intercept_)


(442, 10)
(442,)
[  -9.15865318 -205.45432163  516.69374454  340.61999905 -895.5520019
  561.22067904  153.89310954  126.73139688  861.12700152   52.42112238]
151.88331005254167


In [23]:
X_train.shape

(353, 10)

In [24]:
y_pred = reg.predict(X_test)
r2_score(y_test,y_pred)

0.4399338661568968

# **Stochastic Gradient Descent**

The key difference between SGDRegressor and GDRegressor is how often they update the model parameters during training. SGDRegressor uses stochastic gradient descent: it updates the weights and intercept once per sample rather than once per epoch. Within each epoch, an inner loop runs once for every training row. Each iteration picks a random row index (idx), computes the prediction y_hat from that row alone, calculates the gradients of the squared error for both the intercept and the coefficients, and updates the parameters immediately. Because idx is drawn with replacement, some rows may be picked several times in an epoch and others not at all. The per-sample updates are noisy, but each one is cheap, so on large datasets the model often makes useful progress in fewer passes over the data.
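A single stochastic update can be worked through by hand. The sketch below is only an illustration: `sgd_step` is a hypothetical helper that is not part of the notebook, and the sample values and learning rate are made up. For the squared error (y - y_hat)^2 of one sample, the gradient is -2 (y - y_hat) for the intercept and -2 (y - y_hat) x for the coefficients.

```python
import numpy as np

def sgd_step(x, y, coef, intercept, lr):
    """One SGD update from a single sample (hypothetical helper, not from the notebook)."""
    y_hat = np.dot(x, coef) + intercept      # prediction for this one sample
    error = y - y_hat
    intercept_der = -2 * error               # d/d(intercept) of (y - y_hat)**2
    coef_der = -2 * error * x                # d/d(coef) of (y - y_hat)**2
    return coef - lr * coef_der, intercept - lr * intercept_der

# made-up sample and starting parameters (coefficients start at 1, intercept at 0)
x = np.array([1.0, 2.0])
y = 5.0
coef = np.ones(2)
intercept = 0.0

before = (y - (np.dot(x, coef) + intercept)) ** 2
coef, intercept = sgd_step(x, y, coef, intercept, lr=0.1)
after = (y - (np.dot(x, coef) + intercept)) ** 2
print(before, after)
```

With these numbers the prediction moves from 3.0 to about 5.4, so the squared error on this sample drops from 4.0 to roughly 0.16. The step overshoots the target slightly, which is the kind of noise that single-sample updates introduce.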

GDRegressor, on the other hand, uses batch gradient descent, which updates the parameters only once per epoch using the entire training set. It computes the predictions for all samples at once (y_hat = np.dot(X_train, self.coef_) + self.intercept_), averages the gradient over the full dataset, and uses that average to update the coefficients and intercept. This makes learning smoother and more stable, but every update has to process all samples, so training can be computationally heavy and slow on very large datasets. The fit() method in SGDRegressor also needs an extra inner loop and random sampling logic, while GDRegressor has a simpler structure because it processes the whole batch at once.
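GDRegressor itself is not defined in this notebook. The following is a minimal sketch of what such a class might look like, assuming it mirrors the SGDRegressor interface and follows the description above (one averaged-gradient update per epoch). The initial values, the synthetic data, and the hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

class GDRegressor:
    """Hypothetical batch gradient descent regressor, reconstructed from the description."""

    def __init__(self, learning_rate=0.01, epochs=100):
        self.coef_ = None
        self.intercept_ = None
        self.lr = learning_rate
        self.epochs = epochs

    def fit(self, X_train, y_train):
        # assumed initialisation: intercept 0, every coefficient 1
        self.intercept_ = 0.0
        self.coef_ = np.ones(X_train.shape[1])
        n_samples = X_train.shape[0]

        for _ in range(self.epochs):
            # predictions for all rows at once
            y_hat = np.dot(X_train, self.coef_) + self.intercept_
            error = y_train - y_hat

            # gradients of the mean squared error, averaged over the full dataset
            intercept_der = -2 * np.mean(error)
            coef_der = -2 * np.dot(error, X_train) / n_samples

            # exactly one parameter update per epoch
            self.intercept_ = self.intercept_ - self.lr * intercept_der
            self.coef_ = self.coef_ - self.lr * coef_der
        return self

    def predict(self, X_test):
        return np.dot(X_test, self.coef_) + self.intercept_

# synthetic, noise-free data with known parameters (illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + 4.0

gd = GDRegressor(learning_rate=0.1, epochs=500).fit(X_demo, y_demo)
print(gd.intercept_, gd.coef_)
```

On this noise-free data the fitted values should land very close to the true intercept 4.0 and coefficients [2.0, -1.0, 0.5]. The loop body runs once per epoch, in contrast to the once-per-row inner loop of the stochastic version.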


In [19]:
class SGDRegressor:

    def __init__(self,learning_rate=0.01,epochs=100):

        self.coef_ = None
        self.intercept_ = None
        self.lr = learning_rate
        self.epochs = epochs

    def fit(self,X_train,y_train):
        # initialise the intercept to 0 and every coefficient to 1
        self.intercept_ = 0
        self.coef_ = np.ones(X_train.shape[1])

        for i in range(self.epochs):
            # one update per training row, so each epoch makes n_samples updates
            for j in range(X_train.shape[0]):
                # pick a random row (sampled with replacement)
                idx = np.random.randint(0,X_train.shape[0])

                # prediction for the selected row only
                y_hat = np.dot(X_train[idx],self.coef_) + self.intercept_

                # gradient of the squared error (y - y_hat)**2 w.r.t. the intercept
                intercept_der = -2 * (y_train[idx] - y_hat)
                self.intercept_ = self.intercept_ - (self.lr * intercept_der)

                # gradient w.r.t. the coefficients: scalar error times the feature row
                coef_der = -2 * (y_train[idx] - y_hat) * X_train[idx]
                self.coef_ = self.coef_ - (self.lr * coef_der)

        print(self.intercept_,self.coef_)

    def predict(self,X_test):
        return np.dot(X_test,self.coef_) + self.intercept_

In [26]:
gdr = SGDRegressor(epochs=1000,learning_rate=0.5)
gdr.fit(X_train,y_train)

151.5531043227468 [  25.52560977  -82.08266568  524.53664661  311.27344914 -996.63042266
  647.18806302   91.14400109  131.88414053  892.85249008  148.28071358]


In [27]:
y_pred = gdr.predict(X_test)

In [28]:
r2_score(y_test,y_pred)

0.4127502223048788