
## Linear Regression

Linear regression is a supervised machine learning algorithm that models the linear relationship between a dependent variable and one or more independent variables in order to predict continuous numerical values. With a single independent variable, it fits the straight line that best matches the data points on a scatter plot, typically by minimizing the sum of squared vertical distances (residuals) between the line and the points. Typical uses include predicting housing prices from size or forecasting sales revenue from historical data.

### How it works
`Relationship modeling:` Linear regression establishes a mathematical relationship between variables to understand how they influence each other.

`Finding the best-fit line:` The algorithm identifies the "best fit" line that minimizes the error between the predicted values and the actual data points.

`Prediction:` Once the line is established, it can be used to predict the value of the dependent variable for new, unseen data points based on their independent variable values.

_Formula :_

          y = w*x + b
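
In the formula, `w` is the slope and `b` is the intercept. As a sketch of how the best-fit values are found with gradient descent, the mean squared error loss and its gradients can be written as follows. The symbols `n` (number of samples), `\hat{y}_i` (prediction for sample `i`), and `\eta` (learning rate) are notation assumed here for illustration.

```latex
\begin{aligned}
\hat{y}_i &= w x_i + b \\
L(w, b) &= \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \\
\frac{\partial L}{\partial w} &= -\frac{2}{n} \sum_{i=1}^{n} x_i \left( y_i - \hat{y}_i \right) \\
\frac{\partial L}{\partial b} &= -\frac{2}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right) \\
w &\leftarrow w - \eta \frac{\partial L}{\partial w}, \qquad
b \leftarrow b - \eta \frac{\partial L}{\partial b}
\end{aligned}
```

Each update moves `w` and `b` a small step against the gradient, which is the direction that reduces the loss.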

In [None]:
import numpy as np

#Data
X= np.array([1,2,3,4,5])
Y= np.array([3,5,7,9,11])

#Initialize
w = 0.5 #slope
b = 0.5 #intercept
lr = 0.01  #learning rate
epochs = 30  #keep it small so we can watch each update

#Training Loop
for i in range(epochs):
  #step 1 :- Forward pass [predictions]
  Y_pred = w * X + b

  #step 2 :- compute loss
  loss= np.mean((Y - Y_pred)**2)

  #step 3 :- Gradient
  dw = -(2/len(X)) * np.sum(X * (Y - Y_pred))
  db = -(2/len(X)) * np.sum(Y - Y_pred)

  #step 4 :- Update
  w = w - lr * dw
  b = b - lr * db

  #print progress
  print(f"epoch {i +1} : w = {w: .3f}, b = {b: .3f}, loss = {loss: .4f}")

def predict(X,w,b):
  return w*X+b
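
For a sanity check on the training loop, the least-squares line can also be computed directly rather than iteratively. The sketch below uses `np.linalg.lstsq` on the same toy data; the names `X_toy`, `Y_toy`, `w_exact`, and `b_exact` are illustrative and not part of the original notebook. Because the toy points lie exactly on `y = 2x + 1`, gradient descent should move `w` toward 2 and `b` toward 1 as the number of epochs grows.

```python
import numpy as np

# Same toy data as the training loop (names are illustrative)
X_toy = np.array([1, 2, 3, 4, 5], dtype=float)
Y_toy = np.array([3, 5, 7, 9, 11], dtype=float)

# Design matrix with a column of ones so the intercept is fitted as well
A = np.column_stack((X_toy, np.ones_like(X_toy)))

# Least-squares solution of A @ [w, b] ≈ Y_toy
(w_exact, b_exact), *_ = np.linalg.lstsq(A, Y_toy, rcond=None)

print(f"exact w = {w_exact:.3f}, exact b = {b_exact:.3f}")
```

With only 30 epochs and a learning rate of 0.01, the loop's `w` and `b` may still be some distance from these values.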


In [None]:
X_test = np.array([5,6,7])

Y_pred_test = predict(X_test,w,b)
print("predicted value :",Y_pred_test)

 Plot the Model

In [None]:
import matplotlib.pyplot as plt

plt.scatter(X, Y, color='blue', label="actual points")
plt.plot(X, Y_pred, color='red', label='predicted line')

plt.xlabel("X [I/P features]")
plt.ylabel("Y [Target O/P]")
plt.title("Linear Regression Line")
plt.legend()
plt.grid(True)
plt.show()

In [None]:
Y_test = np.array([11,13,15])

plt.scatter(X, Y, color='blue', label="actual points")
plt.plot(X, Y_pred, color='red')

plt.scatter(X_test, Y_test, color='green', label="unseen points")
plt.plot(X_test, Y_pred_test, color='red', label='predicted line')

plt.xlabel("X [I/P features]")
plt.ylabel("Y [Target O/P]")
plt.title("Linear Regression Line")
plt.legend()
plt.grid(True)
plt.show()

Compute the Error of the Model


In [None]:
from sklearn.metrics import mean_squared_error, r2_score

# metrics on the known (training) data
mse = mean_squared_error(Y, Y_pred)
r2 = r2_score(Y, Y_pred)

print("Mean Squared Error:", mse)
print("R-squared:", r2)

In [None]:
# metrics on the unseen (test) data
mse_test = mean_squared_error(Y_test, Y_pred_test)
r2_test = r2_score(Y_test, Y_pred_test)

print("Mean Squared Error:", mse_test)
print("R-squared:", r2_test)
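
To make the two metrics less of a black box, here is a minimal sketch that computes them by hand with NumPy. The helper name `mse_and_r2` and the sample numbers are made up for illustration; for ordinary 1-D inputs the results should agree with `mean_squared_error` and `r2_score`.

```python
import numpy as np

def mse_and_r2(y_true, y_pred):
    """Return (MSE, R^2) for two 1-D sequences of equal length."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Residual sum of squares: error left over after the model's predictions
    ss_res = np.sum((y_true - y_pred) ** 2)
    # Total sum of squares: error of always predicting the mean of y_true
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)

    mse = ss_res / y_true.size
    r2 = 1.0 - ss_res / ss_tot
    return mse, r2

# Illustrative values only
mse_demo, r2_demo = mse_and_r2([11, 13, 15], [10, 13, 16])
print("MSE:", mse_demo)
print("R-squared:", r2_demo)
```

An R² of 1 means the predictions match the targets exactly, while 0 means the model does no better than predicting the mean.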

### Custom Linear Regression Implementation

In [None]:
from sklearn.model_selection import train_test_split
import pandas as pd

class MyLinearRegression:
  def __init__(self,lr =0.01,epochs=1000):
    self.lr = lr
    self.epochs = epochs
    self.w = None
    self.b = 0
    self.loss_history = []

  def fit(self,X,y):
    # if X is 1D, reshape it to 2D
    if X.ndim ==1:
      X = X.reshape(-1,1)

    n_samples, n_features = X.shape
    self.w = np.zeros(n_features)
    self.b = 0

    for i in range(self.epochs):
      # forward pass
      y_pred = np.dot(X,self.w) + self.b

      # compute gradients
      dw = -(2/n_samples) * np.dot(X.T, (y - y_pred))
      db = -(2/n_samples) * np.sum(y - y_pred)

      # update weights and bias
      self.w -= self.lr * dw
      self.b -= self.lr * db

      # record the loss every epoch so loss_history has one entry per epoch
      loss = np.mean((y - y_pred)**2)
      self.loss_history.append(loss)

      # print loss occasionally
      if i % 100 == 0:
        print(f"Epoch {i}: Loss={loss:.4f}")
    return self

  def predict(self, X):
    if X.ndim ==1:
      X = X.reshape(-1,1)

    return np.dot(X,self.w) + self.b
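
One way to gain confidence in the gradient expressions used in `fit` is to compare them with numerical (finite-difference) gradients. The following is a sketch under assumed names (`mse_loss`, `analytic_grads`, `numeric_grads`, and the `_chk` variables are all hypothetical helpers, not part of the class above), using random data purely for the check.

```python
import numpy as np

def mse_loss(X, y, w, b):
    """Mean squared error of the linear model X @ w + b."""
    return np.mean((y - (X @ w + b)) ** 2)

def analytic_grads(X, y, w, b):
    """Same gradient formulas as in the fit method above."""
    n = X.shape[0]
    err = y - (X @ w + b)
    dw = -(2 / n) * (X.T @ err)
    db = -(2 / n) * np.sum(err)
    return dw, db

def numeric_grads(X, y, w, b, eps=1e-6):
    """Central finite differences, one parameter at a time."""
    dw = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        dw[j] = (mse_loss(X, y, w + step, b) - mse_loss(X, y, w - step, b)) / (2 * eps)
    db = (mse_loss(X, y, w, b + eps) - mse_loss(X, y, w, b - eps)) / (2 * eps)
    return dw, db

# Random data used only for the check
rng = np.random.default_rng(0)
X_chk = rng.normal(size=(20, 3))
y_chk = rng.normal(size=20)
w_chk = rng.normal(size=3)
b_chk = 0.3

dw_a, db_a = analytic_grads(X_chk, y_chk, w_chk, b_chk)
dw_n, db_n = numeric_grads(X_chk, y_chk, w_chk, b_chk)

print("dw matches:", np.allclose(dw_a, dw_n, atol=1e-6))
print("db matches:", np.isclose(db_a, db_n, atol=1e-6))
```

If either comparison fails after a change to the gradient code, the usual suspects are a sign error or a missing `2/n` factor.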



### Practice Problems

1. Student Performance Prediction

`Problem Statement:` Predict a student’s final CGPA based on internal academic indicators such as attendance, assignment scores, and midterm marks.

`Goal:` Identify students likely to underperform so teachers can intervene early.

In [None]:
# Create Dataset
n =100
attendance = np.random.randint(60,100, n)
assignment = np.random.randint(50,100, n)
midterm = np.random.randint(40,95, n)

cgpa = 0.03*attendance + 0.04*assignment + 0.05*midterm + np.random.normal(0,0.3,n) +2

X = np.column_stack((attendance, assignment, midterm))
y = cgpa

# train-test split
split = int(0.8*n)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# train your model
# model = MyLinearRegression(lr=0.00001, epochs=1000)
model = MyLinearRegression(lr=0.00001, epochs=2000)  # going beyond ~2000 epochs probably gives about the same result
model.fit(X_train, y_train)

# predict on test data
y_pred = model.predict(X_test)

# evaluate
mse = np.mean((y_test - y_pred)**2)
ss_total = np.sum((y_test - np.mean(y_test))**2)
ss_res = np.sum((y_test - y_pred)**2)
r2 = 1 - (ss_res / ss_total)

print("\nFinal Results:")
print(f"MSE: {mse:.4f}")
print(f"R² : {r2:.4f}")
print("Weights:", model.w)
print("Bias:", model.b)

In [None]:
import matplotlib.pyplot as plt
plt.scatter(y_test, y_pred, color='blue')
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], color='red')
plt.xlabel('Actual CGPA')
plt.ylabel('Predicted CGPA')
plt.title('Actual vs Predicted')
plt.show()
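
The student example trains on unscaled features in the range of roughly 40 to 100, which is why it needs such a small learning rate. The sketch below illustrates this on similar synthetic data: a learning rate of `1e-5` lowers the loss, while `1e-2` makes it blow up within a few epochs. The helper `run_gd`, the seed, and the `_demo` variable names are assumptions made for this illustration.

```python
import numpy as np

def run_gd(X, y, lr, epochs):
    """Plain gradient descent on MSE; returns the loss at each epoch."""
    n_samples, n_features = X.shape
    w, b = np.zeros(n_features), 0.0
    losses = []
    for _ in range(epochs):
        y_pred = X @ w + b
        losses.append(np.mean((y - y_pred) ** 2))
        dw = -(2 / n_samples) * (X.T @ (y - y_pred))
        db = -(2 / n_samples) * np.sum(y - y_pred)
        w -= lr * dw
        b -= lr * db
    return np.array(losses)

# Synthetic data in the same ranges as the student dataset (seed is arbitrary)
rng = np.random.default_rng(0)
n = 100
X_demo = np.column_stack((
    rng.integers(60, 100, n),   # attendance
    rng.integers(50, 100, n),   # assignment
    rng.integers(40, 95, n),    # midterm
)).astype(float)
y_demo = (0.03 * X_demo[:, 0] + 0.04 * X_demo[:, 1]
          + 0.05 * X_demo[:, 2] + 2 + rng.normal(0, 0.3, n))

losses_small = run_gd(X_demo, y_demo, lr=1e-5, epochs=10)
losses_large = run_gd(X_demo, y_demo, lr=1e-2, epochs=10)

print("lr=1e-5:", losses_small[0], "->", losses_small[-1])
print("lr=1e-2:", losses_large[0], "->", losses_large[-1])
```

Standardizing the features first, as the later problems do, is the usual way to allow a larger learning rate.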


2. Car Price Estimation

`Problem Statement:` Predict a car’s market price based on measurable specifications like age, mileage, and engine power.

`Goal:` Help used-car dealers estimate a fair resale value.

In [None]:
# 1️⃣ Generate data
np.random.seed(2)
n = 120
age = np.random.randint(1, 10, n)
mileage = np.random.randint(5000, 150000, n)
power = np.random.randint(60, 250, n)
price = 20 - (0.5*age) - (0.00005*mileage) + (0.08*power) + np.random.normal(0, 0.5, n)

df2 = pd.DataFrame({
    'Age': age,
    'Mileage': mileage,
    'Power': power,
    'Price': np.round(price, 2)
})

X = df2[['Age','Mileage','Power']]
y = df2['Price']

# 2️⃣ Split first
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3️⃣ Normalize (fit only on train, apply to both)
X_mean, X_std = X_train.mean(axis=0), X_train.std(axis=0)
X_train_scaled = (X_train - X_mean) / X_std
X_test_scaled = (X_test - X_mean) / X_std   # same mean/std

# 4️⃣ Train model
model = MyLinearRegression(lr=0.001, epochs=5000)
model.fit(X_train_scaled, y_train)

# 5️⃣ Predict & evaluate
y_pred = model.predict(X_test_scaled)
print("\nMSE:", mean_squared_error(y_test, y_pred))
print("R²:", r2_score(y_test, y_pred))

# 6️⃣ Unscale coefficients back to real-world units
real_w = model.w / X_std
real_b = model.b - np.sum(real_w * X_mean)

print("\nReal-world coefficients:")
print("Weights:", real_w)
print("Bias:", real_b)
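
The unscaling step relies on the identity `w_scaled · (x - mean) / std + b_scaled = (w_scaled / std) · x + (b_scaled - Σ w_scaled · mean / std)`. Here is a small sketch that checks it numerically; the data, the coefficient values, and all variable names are hypothetical and chosen only to resemble the car example.

```python
import numpy as np

# Hypothetical raw features in ranges similar to age, mileage, power
rng = np.random.default_rng(0)
X_raw = rng.uniform([1, 5000, 60], [10, 150000, 250], size=(50, 3))

# Standardize with per-column mean and standard deviation
mu, sigma = X_raw.mean(axis=0), X_raw.std(axis=0)
X_scaled = (X_raw - mu) / sigma

# Made-up coefficients standing in for a model trained on scaled data
w_scaled = np.array([-1.3, -2.1, 4.4])
b_scaled = 25.0

# Convert them to coefficients that act on the raw features
w_real = w_scaled / sigma
b_real = b_scaled - np.sum(w_real * mu)

pred_from_scaled = X_scaled @ w_scaled + b_scaled
pred_from_raw = X_raw @ w_real + b_real

print("predictions agree:", np.allclose(pred_from_scaled, pred_from_raw))
```

The mean and standard deviation must be the same ones used to scale the training data, otherwise the converted coefficients describe a different model.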


In [None]:
fig, axes = plt.subplots(3, 1, figsize=(7, 12))
fig.subplots_adjust(hspace=0.6)  # <-- gap between plots

# Loss Curve
axes[0].plot(model.loss_history, label='Training Loss')
axes[0].set_title("Loss Curve")
axes[0].set_xlabel("Epoch")
axes[0].set_ylabel("MSE Loss")
axes[0].legend()

# Actual vs Predicted
axes[1].scatter(y_test, y_pred, color='blue')
axes[1].plot([y_test.min(), y_test.max()],
             [y_test.min(), y_test.max()], 'r--')
axes[1].set_title("Actual vs Predicted Prices")
axes[1].set_xlabel("Actual")
axes[1].set_ylabel("Predicted")

# Residuals
residuals = y_test - y_pred
axes[2].scatter(y_pred, residuals, color='green')
axes[2].axhline(0, color='red', linestyle='--')
axes[2].set_title("Residual Plot")
axes[2].set_xlabel("Predicted")
axes[2].set_ylabel("Residual")

plt.show()


3. Fitness Tracker: Calorie Burn Prediction

`Problem Statement:` Estimate the number of calories burned in a day based on daily steps, workout time, and age.

`Goal:` Help users understand their activity–calorie relationship and plan workouts effectively.

In [None]:
from sklearn.preprocessing import StandardScaler

np.random.seed(3)
n = 150

steps = np.random.randint(2000, 20000, n)
workout_time = np.random.randint(10, 120, n)
age = np.random.randint(18, 60, n)

calories = 0.03*steps + 5*workout_time - 3*age + np.random.normal(0, 100, n) + 500

df3 = pd.DataFrame({
    'Steps': steps,
    'WorkoutTime': workout_time,
    'Age': age,
    'CaloriesBurned': np.round(calories,2)
})

X = df3[['Steps','WorkoutTime','Age']]
y = df3['CaloriesBurned']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)

X_test = scaler.transform(X_test)

model = MyLinearRegression(lr = 0.001, epochs=2000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("\nMSE:", mean_squared_error(y_test, y_pred))
print("R²:", r2_score(y_test, y_pred))

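# Unscale coefficients back to real-world units using this cell's scaler
# (X_mean and X_std from the car-price cell belong to a different dataset)
real_w = model.w / scaler.scale_
real_b = model.b - np.sum(real_w * scaler.mean_)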

print("\nReal-world coefficients:")
print("Weights:", real_w)
print("Bias:", real_b)

