
# Assignment: Linear Regression, Logistic Regression, and K-Means (From Scratch)

**Instructions**
- You are NOT allowed to use `scikit-learn` for model implementation or feature scaling.
- Exception: you may use `scikit-learn` for the clustering task in Question 3.
- You may use: `numpy`, `matplotlib`, and standard Python libraries only.
- Every step (scaling, loss, gradients, optimization) must be implemented manually.
- Clearly comment your code and explain your reasoning in Markdown cells.


## Question 1: Linear Regression from Scratch (with Standardization and Regularization)

You are given a dataset `(X, y)`.

### Tasks
1. Implement **StandardScaler manually**:
   - Compute mean and standard deviation for each feature.
   - Standardize the features.
2. Implement **Linear Regression using Gradient Descent**.
3. Add **L2 Regularization (Ridge Regression)**.
4. Plot:
   - Loss vs iterations
   - True vs predicted values

Do NOT use `sklearn`.
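
For reference, the quantities in the tasks above can be written as follows. The notation is an assumption made here, not part of the assignment: $\mu_j$ and $\sigma_j$ are the per-feature mean and standard deviation, $\lambda$ stands for `l2_lambda`, $\alpha$ is the learning rate, and the bias $b$ is left unpenalized, which is a common convention rather than a requirement.

$$
\begin{aligned}
z_{ij} &= \frac{x_{ij} - \mu_j}{\sigma_j}, \qquad \hat{y}_i = w^\top z_i + b \\
J(w, b) &= \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 + \lambda \lVert w \rVert_2^2 \\
\nabla_w J &= \frac{2}{n} Z^\top (\hat{y} - y) + 2 \lambda w, \qquad
\frac{\partial J}{\partial b} = \frac{2}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i) \\
w &\leftarrow w - \alpha \nabla_w J, \qquad b \leftarrow b - \alpha \frac{\partial J}{\partial b}
\end{aligned}
$$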


In [13]:

import numpy as np
import matplotlib.pyplot as plt


In [14]:

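# Implement StandardScaler manually: first read about how it works, then implement it.
class StandardScalerManual:
    def __init__(self):
        self.mean = None  # per-feature mean, set by fit()
        self.std = None   # per-feature standard deviation, set by fit()

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean = np.mean(X, axis=0)
        std = np.std(X, axis=0)
        # Constant features have zero std; use 1.0 to avoid division by zero.
        self.std = np.where(std == 0, 1.0, std)
        return self

    def transform(self, X):
        # Use the attributes set in fit(): `mean` and `std` (no trailing underscore).
        return (np.asarray(X, dtype=float) - self.mean) / self.std

    def fit_transform(self, X):
        self.fit(X)
        return self.transform(X)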
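
A quick sanity check for any standardization routine is that each non-constant column ends up with mean 0 and standard deviation 1. The sketch below uses a standalone helper, `standardize`, and a made-up array `X_demo`; both are hypothetical names introduced only for this illustration.

```python
import numpy as np

def standardize(X):
    """Hypothetical helper: column-wise z-scores, with zero std replaced by 1."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # constant columns would otherwise divide by zero
    return (X - mean) / std

# Made-up data: two features on very different scales plus one constant feature.
X_demo = np.array([[1.0, 100.0, 5.0],
                   [2.0, 200.0, 5.0],
                   [3.0, 300.0, 5.0]])
Z_demo = standardize(X_demo)

print("column means:", Z_demo.mean(axis=0))
print("column stds: ", Z_demo.std(axis=0))
```

The constant third column maps to all zeros, so its standard deviation stays 0 while the other two columns come out with unit standard deviation.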


In [15]:

# Implement Linear Regression from scratch. You also have to construct the regularization
# term, whose coefficient is denoted by l2_lambda.
# Try to implement L1 regularization as well, or at least read about it and where it is used.
class LinearRegressionManual:
    def __init__(self, lr=0.01, epochs=1000, l2_lambda=0.0):
        self.lr = lr
        self.epochs = epochs
        self.l2_lambda = l2_lambda
        self.w = None
        self.b = None

    def fit(self, X, y):
        n, m = X.shape
        y = y.reshape(-1, 1)
        self.w = np.zeros((m, 1))
        self.b = 0.0
        msearr = np.zeros(self.epochs)
        for epoch in range(self.epochs):
            ypred = np.dot(X, self.w) + self.b
            # Record the regularized loss: MSE + l2_lambda * ||w||^2 (bias not penalized).
            msearr[epoch] = np.mean((ypred - y) ** 2) + self.l2_lambda * np.sum(self.w ** 2)
            dw = 2 * np.dot(X.T, ypred - y) / n + 2 * self.l2_lambda * self.w
            db = 2 * np.sum(ypred - y) / n
            self.w -= self.lr * dw
            self.b -= self.lr * db
        ypred = self.predict(X)  # predictions from the final parameters

        plt.plot(range(self.epochs), msearr, color='green', label="Loss vs iterations")
        plt.xlabel("Iteration")
        plt.ylabel("Loss")
        plt.legend()
        plt.show()

        # Points on the diagonal correspond to perfect predictions.
        plt.scatter(y, ypred, color="pink", label="True vs predicted values")
        plt.xlabel("True value")
        plt.ylabel("Predicted value")
        plt.legend()
        plt.show()

        # A fitted line can only be drawn against a single feature.
        if m == 1:
            order = np.argsort(X[:, 0])
            plt.scatter(X[:, 0], y, color='blue', label="Actual data")
            plt.plot(X[order, 0], ypred[order], color="red", label="Predicted line")
            plt.legend()
            plt.show()
        return self

    def predict(self, X):
        return np.dot(X, self.w) + self.b
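
One way to gain confidence in hand-derived gradients is to compare them with finite differences. The sketch below does this for an MSE loss with an L2 penalty on the weights. The functions `ridge_loss` and `ridge_grad` and the random test data are hypothetical, written only for this check, and the bias is assumed to be unpenalized.

```python
import numpy as np

def ridge_loss(X, y, w, b, l2_lambda):
    """MSE plus an L2 penalty on the weights (the bias is not penalized)."""
    residual = X @ w + b - y
    return np.mean(residual ** 2) + l2_lambda * np.sum(w ** 2)

def ridge_grad(X, y, w, b, l2_lambda):
    """Analytic gradients of ridge_loss with respect to w and b."""
    n = X.shape[0]
    residual = X @ w + b - y
    dw = 2 * (X.T @ residual) / n + 2 * l2_lambda * w
    db = 2 * np.sum(residual) / n
    return dw, db

rng = np.random.default_rng(0)
X_chk = rng.normal(size=(20, 3))
y_chk = rng.normal(size=(20, 1))
w_chk = rng.normal(size=(3, 1))
b_chk, lam, h = 0.5, 0.1, 1e-6

dw, db = ridge_grad(X_chk, y_chk, w_chk, b_chk, lam)

# Central differences: perturb one parameter at a time by +/- h.
dw_num = np.zeros_like(w_chk)
for j in range(w_chk.shape[0]):
    step = np.zeros_like(w_chk)
    step[j] = h
    dw_num[j] = (ridge_loss(X_chk, y_chk, w_chk + step, b_chk, lam)
                 - ridge_loss(X_chk, y_chk, w_chk - step, b_chk, lam)) / (2 * h)
db_num = (ridge_loss(X_chk, y_chk, w_chk, b_chk + h, lam)
          - ridge_loss(X_chk, y_chk, w_chk, b_chk - h, lam)) / (2 * h)

max_err = max(float(np.max(np.abs(dw - dw_num))), abs(float(db - db_num)))
print("largest gradient mismatch:", max_err)
```

If the analytic and numerical gradients disagree by more than rounding error, the derivation or the array shapes are the first things to inspect.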



## Question 2: Logistic Regression from Scratch (with Standardization and Regularization)

You are given a binary classification dataset.

### Tasks
1. Reuse your **manual StandardScaler**.
2. Implement **Logistic Regression using Gradient Descent**.
3. Use:
   - Sigmoid function
   - Binary Cross Entropy loss
4. Add **L2 Regularization**.
5. Report:
   - Training loss curve
   - Final accuracy

Do NOT use `sklearn`.
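
For reference, the pieces listed in the tasks above can be written as follows. The notation is an assumption made here: $p_i$ is the predicted probability for sample $i$, $\lambda$ stands for `l2_lambda`, and the bias $b$ is left unpenalized by convention.

$$
\begin{aligned}
\sigma(z) &= \frac{1}{1 + e^{-z}}, \qquad p_i = \sigma(w^\top x_i + b) \\
J(w, b) &= -\frac{1}{n} \sum_{i=1}^{n} \bigl[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \bigr] + \lambda \lVert w \rVert_2^2 \\
\nabla_w J &= \frac{1}{n} X^\top (p - y) + 2 \lambda w, \qquad
\frac{\partial J}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} (p_i - y_i)
\end{aligned}
$$

A sample is assigned to class 1 when $p_i \ge 0.5$ and to class 0 otherwise.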


In [16]:

# Implement the sigmoid function as covered in the lectures
def sigmoid(z):
    res = 1 / (1 + np.exp(-z))
    return res
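
The direct formula `1 / (1 + np.exp(-z))` can emit overflow warnings when `z` is a large negative number, because `np.exp(-z)` becomes very large. A possible alternative, sketched below under the hypothetical name `sigmoid_stable`, only ever exponentiates non-positive values. It is optional and not required by the assignment.

```python
import numpy as np

def sigmoid_stable(z):
    """Sigmoid that only exponentiates non-positive numbers, so exp() cannot overflow."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))  # always in [0, 1]
    # z >= 0: 1 / (1 + exp(-z));  z < 0: exp(z) / (1 + exp(z)). Both equal sigmoid(z).
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

z_demo = np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])
p_demo = sigmoid_stable(z_demo)
print(p_demo)
```

For moderate inputs the result matches the direct formula; the difference only shows up at the extremes.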


In [17]:
# Implement Logistic Regression from scratch, and add the regularization term here as well
class LogisticRegressionManual:
    def __init__(self, lr=0.01, epochs=1000, l2_lambda=0.0):
        self.lr = lr
        self.epochs = epochs
        self.l2_lambda = l2_lambda
        self.w = None
        self.b = None
        self.y = None  # training labels, stored as a column vector by fit()

    def fit(self, X, y):
        n, m = X.shape
        # Work with a column vector throughout: mixing shapes (n,) and (n, 1)
        # would broadcast `res - y` to an (n, n) matrix.
        y = self.y = np.asarray(y).reshape(-1, 1)
        self.w = np.zeros((m, 1))
        self.b = 0.0
        loss = np.zeros(self.epochs)
        eps = 1e-9  # keeps log() away from zero
        for epoch in range(self.epochs):
            z = np.dot(X, self.w) + self.b
            res = sigmoid(z)
            bce = -np.mean(y * np.log(res + eps) + (1 - y) * np.log(1 - res + eps))
            loss[epoch] = bce + self.l2_lambda * np.sum(self.w ** 2)
            dw = np.dot(X.T, res - y) / n + 2 * self.l2_lambda * self.w
            db = np.sum(res - y) / n
            self.w -= self.lr * dw
            self.b -= self.lr * db
        plt.plot(range(self.epochs), loss, color='green', label="Loss vs iterations")
        plt.xlabel("Iteration")
        plt.ylabel("Loss")
        plt.legend()
        plt.show()
        return self

    def predict_proba(self, X):
        z = np.dot(X, self.w) + self.b
        return sigmoid(z)

    def predict(self, X):
        probab = self.predict_proba(X)  # sigmoid outputs
        preds = (probab >= 0.5).astype(int)
        # Accuracy is measured against the labels stored by fit(), so it is
        # only meaningful when X is the training set.
        return preds, np.mean(preds == self.y) * 100
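
To see the whole training loop in one place, the sketch below fits logistic regression on a small synthetic dataset. Everything here is an illustrative assumption: the two Gaussian blobs, the hyperparameter values, and the variable names are made up, and the loop is written inline rather than through a class.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_class = 100
# Hypothetical toy data: two Gaussian blobs centred at (-2, -2) and (+2, +2).
X_toy = np.vstack([rng.normal(-2.0, 1.0, size=(n_per_class, 2)),
                   rng.normal(+2.0, 1.0, size=(n_per_class, 2))])
y_toy = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)]).reshape(-1, 1)

w = np.zeros((2, 1))
b = 0.0
lr, epochs, l2_lambda, eps = 0.1, 500, 0.01, 1e-9
losses = []
for _ in range(epochs):
    p = 1.0 / (1.0 + np.exp(-(X_toy @ w + b)))  # predicted probabilities, shape (n, 1)
    bce = -np.mean(y_toy * np.log(p + eps) + (1 - y_toy) * np.log(1 - p + eps))
    losses.append(float(bce + l2_lambda * np.sum(w ** 2)))
    # Gradient step on the regularized loss (bias not penalized).
    w -= lr * (X_toy.T @ (p - y_toy) / len(X_toy) + 2 * l2_lambda * w)
    b -= lr * float(np.mean(p - y_toy))

# Evaluate on the training data with the final parameters.
p = 1.0 / (1.0 + np.exp(-(X_toy @ w + b)))
preds = (p >= 0.5).astype(int)
accuracy = float(np.mean(preds == y_toy)) * 100
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}, accuracy {accuracy:.1f}%")
```

Keeping `y_toy`, `p`, and `preds` all in shape `(n, 1)` avoids accidental broadcasting when computing the gradient and the accuracy.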


## Question 3: K-Means Clustering from Scratch (Matrix Clustering)

You are given a **random matrix** `M` of shape `(n, m)`.

### Tasks
Implement K-Means clustering **from scratch** such that:

1. Input:
   - A random matrix `M`
   - Number of clusters `k`
2. Output:
   - `assignment_table`: a matrix of the same shape as `M`, where each element stores its **cluster label**
   - `cookbook`: a dictionary (hashmap) where:
     - Key = cluster index
     - Value = list of **positions (i, j)** belonging to that cluster
   - `centroids`: array storing centroid values

You must cluster **individual elements**, not rows.
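
To make the expected output format concrete, the sketch below performs a single assignment step on a tiny matrix with hand-picked centroids. It is not a full K-Means implementation; `M_demo`, `centroids_demo`, and the other names are hypothetical and chosen only to show what the three outputs look like.

```python
import numpy as np

M_demo = np.array([[1.0, 9.0, 2.0],
                   [10.0, 1.5, 9.5]])
centroids_demo = np.array([1.5, 9.5])  # fixed by hand, for illustration only

# Distance of every element to every centroid: shape (n, m, k).
distances = np.abs(M_demo[:, :, None] - centroids_demo)
# Label of each element = index of its nearest centroid: shape (n, m).
assignment_demo = np.argmin(distances, axis=2)

# Group (i, j) positions by cluster label.
cookbook_demo = {c: [] for c in range(len(centroids_demo))}
for (i, j), label in np.ndenumerate(assignment_demo):
    cookbook_demo[int(label)].append((i, j))

print(assignment_demo)   # [[0 1 0]
                         #  [1 0 1]]
print(cookbook_demo)     # {0: [(0, 0), (0, 2), (1, 1)], 1: [(0, 1), (1, 0), (1, 2)]}
```

Each element is treated as a one-dimensional data point, so the distance to a centroid is just an absolute difference.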


In [18]:

# Implement K-Means for matrix elements
# scikit-learn may be used for this task, as it will help us directly in our project.
def kmeans_matrix(M, k, max_iters=100):
    '''
    Returns:
    assignment_table: same shape as M, contains cluster labels
    cookbook: dict -> cluster_id : list of (i, j) positions
    centroids: numpy array of centroid values
    '''
    n, m = M.shape
    values = M.ravel()
    # Initialise centroids from k distinct positions (requires k <= M.size).
    centroids = np.random.choice(values, size=k, replace=False).astype(float)
    assignment_table = np.zeros((n, m), dtype=int)
    for _ in range(max_iters):
        # Assignment step: nearest centroid for every element, via broadcasting.
        distance = np.abs(M[:, :, None] - centroids[None, None, :])
        assignment_table = np.argmin(distance, axis=2)
        # Update step: mean of each cluster; re-seed empty clusters at random.
        new_centroids = np.empty(k)
        for r in range(k):
            members = M[assignment_table == r]
            if len(members) > 0:
                new_centroids[r] = np.mean(members)
            else:
                new_centroids[r] = np.random.choice(values)
        # Stop once the centroids no longer move, so labels and centroids agree.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    cookbook = {c: [] for c in range(k)}
    for i in range(n):
        for j in range(m):
            cookbook[int(assignment_table[i, j])].append((i, j))
    return assignment_table, cookbook, centroids


## Submission Guidelines
- Submit the completed `.ipynb` file.
- Clearly label all plots and outputs.
- Code readability and correctness matter.
- Partial credit will be given for logically correct implementations.

**Bonus**
- Compare convergence with and without standardization.
- Try different values of regularization strength.
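
One possible starting point for the first bonus item is sketched below. The synthetic data, the helper `gd_losses`, and both learning rates are assumptions made for illustration. The raw run uses a much smaller learning rate because the large-scale feature makes bigger steps diverge, and that small step is also why the raw run converges slowly on the other feature.

```python
import numpy as np

def gd_losses(X, y, lr, epochs=200):
    """Hypothetical helper: plain gradient descent on MSE, returning the loss per epoch."""
    n, m = X.shape
    w, b = np.zeros((m, 1)), 0.0
    losses = np.zeros(epochs)
    for epoch in range(epochs):
        residual = X @ w + b - y
        losses[epoch] = np.mean(residual ** 2)
        w -= lr * (2 * (X.T @ residual) / n)
        b -= lr * (2 * float(np.mean(residual)))
    return losses

rng = np.random.default_rng(0)
# Made-up data: the second feature has 50x the scale of the first.
X_raw = rng.normal(size=(200, 2)) * np.array([1.0, 50.0])
y_syn = 3.0 * X_raw[:, [0]] + 0.1 * X_raw[:, [1]] + rng.normal(scale=0.1, size=(200, 1))
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

loss_raw = gd_losses(X_raw, y_syn, lr=1e-4)  # small step needed for stability
loss_std = gd_losses(X_std, y_syn, lr=0.1)   # standardized features tolerate a larger step

print("final loss, raw features:         ", loss_raw[-1])
print("final loss, standardized features:", loss_std[-1])
```

Plotting `loss_raw` and `loss_std` against the epoch index on a logarithmic y-axis makes the difference in convergence speed easy to see.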
