What is Supervised Learning (Regression)?

Regression is a supervised learning technique used to predict continuous numerical values (e.g., price, temperature, salary, etc.).

Types of Regression Algorithms in Supervised Learning

| Algorithm                                             | Type                   | Use Cases                 | Example                                   |
| ----------------------------------------------------- | ---------------------- | ------------------------- | ----------------------------------------- |
| **1. Linear Regression**                              | Linear                 | Predicting prices, scores | House price prediction                    |
| **2. Polynomial Regression**                          | Non-linear             | Curved data trends        | Growth curve modeling                     |
| **3. Ridge Regression**                               | Regularized linear     | Multicollinearity         | Predicting rent prices                    |
| **4. Lasso Regression**                               | Regularized linear     | Feature selection         | Stock price prediction                    |
| **5. Elastic Net Regression**                         | Combo of Lasso + Ridge | Balanced regularization   | Risk analysis                             |
| **6. Decision Tree Regressor**                        | Non-linear             | Rule-based regression     | Car price estimation                      |
| **7. Random Forest Regressor**                        | Ensemble               | High performance          | Salary prediction                         |
| **8. Gradient Boosting Regressor**                    | Ensemble               | Accurate prediction tasks | Energy load forecasting                   |
| **9. SVR (Support Vector Regression)**                | Margin-based           | Complex, small datasets   | Time series forecasting                   |
| **10. MLP Regressor (Neural Network)**                | Deep learning          | Complex non-linear        | Predicting house price with many features |
| **11. HGB Regressor (HistGradientBoostingRegressor)** | Scalable boosting      | Large data regression     | Click-through rate prediction             |


In [2]:
#- Linear Regression :-
'''
Use: It’s used for predicting a continuous target using a linear relationship between input features and output.

How it works: Models relationship using a straight line
'''
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression        #- Generates a synthetic regression dataset; helpful for testing and learning without real-world data.

X, y = make_regression(n_samples=100, n_features=1, noise=10)
''' 
- n_samples=100: Creates 100 data points.
- n_features=1: Each point has one feature (i.e., one input variable).
- noise=10: Adds some randomness to simulate real-world data, making the regression line imperfect.
'''
model = LinearRegression()      #- It will fit a line of the form y = m·x + b to the data.
model.fit(X, y)                 #- Internally, it computes the best-fitting line using least squares—minimizing the error between predicted and actual values.
print(model.coef_, model.intercept_)

''' 
model.coef_       # β₁: the coefficient (slope of the line)
model.intercept_  # β₀: the intercept (where the line crosses the y-axis)
- y: Predicted output
- x: Input feature
'''

[37.2305456] -1.052880660116693



In [3]:
#- Example :-

X = [[1], [2], [3], [4], [5]]
y = [50, 90, 130, 170, 210]

model = LinearRegression()
model.fit(X, y)
print(model.coef_, model.intercept_)


[40.] 10.000000000000028
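The slope and intercept printed above can be cross-checked against the closed-form least-squares solution. The sketch below is only an illustration: it uses NumPy's `np.linalg.lstsq` on the same five points, and the variable names are assumptions made for this example rather than part of the notebook.

```python
import numpy as np

# Same toy data as the example above
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([50.0, 90.0, 130.0, 170.0, 210.0])

# Design matrix with a column of ones so the intercept is estimated too
A = np.column_stack([x, np.ones_like(x)])

# Solve min ||A @ [slope, intercept] - y||^2
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

print(round(slope, 6), round(intercept, 6))  # 40.0 10.0
```

Because the points lie exactly on y = 40x + 10, the least-squares line recovers those values up to floating-point rounding.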


🔧 Use Cases of Linear Regression
Real-world applications where this model shines:
- Predicting house prices based on size, location, etc.
- Forecasting sales from advertising spend.
- Estimating salary based on years of experience.


In [4]:
#Polynomial Regression (which is a powerful way to model complex relationships between input and output)

''' 
Use: Non-linear trend

How it works: Adds polynomial features to linear regression
'''

from sklearn.preprocessing import PolynomialFeatures    #- This brings in a tool to convert original features into polynomial features.
from sklearn.linear_model import LinearRegression       #- Even though it's “linear,” it's used on polynomial-transformed data to fit curves
from sklearn.pipeline import make_pipeline              #- Chains preprocessing and model training into one estimator, so both steps run with a single fit/predict call.


model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict(X[:5]))     #- Predicted y values for the first five rows of X.


[ 50.  90. 130. 170. 210.]
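The data fitted above is exactly linear (y = 40x + 10), so the degree-2 term has nothing to add and the predictions match the targets. To see what the polynomial features buy you, here is a small sketch on curved data. The quadratic `y_curve` and the names `X_curve`, `linear`, and `quadratic` are made up for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical curved data: y = 2x^2 - 3x + 1 (no noise, for clarity)
X_curve = np.linspace(-3, 3, 30).reshape(-1, 1)
y_curve = 2 * X_curve.ravel() ** 2 - 3 * X_curve.ravel() + 1

# A straight line versus a degree-2 polynomial fit on the same data
linear = LinearRegression().fit(X_curve, y_curve)
quadratic = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X_curve, y_curve)

r2_linear = linear.score(X_curve, y_curve)
r2_quadratic = quadratic.score(X_curve, y_curve)
print(f"linear R^2:    {r2_linear:.3f}")
print(f"quadratic R^2: {r2_quadratic:.3f}")
```

The straight line can only capture the `-3x` part of the trend, while the degree-2 pipeline reproduces the curve.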


Use Cases of Polynomial Regression :-

| Domain         | Example                                                    |
| -------------- | ---------------------------------------------------------- |
| 📈 Finance     | Modeling profit growth or compound interest curves         |
| 🏥 Healthcare  | Predicting patient recovery trends over time               |
| 🌡️ Weather     | Forecasting temperature with seasonal effects              |
| 🚗 Engineering | Vehicle performance modeling based on torque, speed, etc.  |





RIDGE REGRESSION

What Is Ridge Regression?

Ridge Regression is a type of linear regression that adds L2 regularization to the model. Regularization helps prevent overfitting by penalizing large coefficient values.

Why use it?

- Helps when there’s multicollinearity (i.e., features are highly correlated).
- Works well when there are more features than observations, or when the data is noisy.
- Keeps model complexity in check by shrinking coefficients.


In [5]:
#- Ridge Regression

''' 
Use: Ridge Regression improves linear regression when predictors are correlated (multicollinearity).

How it works: Adds an L2 penalty (the sum of squared coefficients) to the least-squares loss, which shrinks the coefficients and stabilizes the model.
'''

from sklearn.linear_model import Ridge

model = Ridge(alpha=1.0)            # Step 1: Create Ridge Regression model
''' 
- alpha controls the strength of regularization.
- Higher alpha means more penalty ➜ simpler model.
- Default is 1.0, but tuning it is key to good performance.
'''
model.fit(X, y)                     # Step 2: Train the model on features X and target y
''' 
- X is the input features (like age, income, etc.)
- y is the target variable (like house price, exam score, etc.)
- The model learns coefficients that best fit the data
'''
print(model.score(X, y))            # Step 3: Evaluate the model with R^2 score
''' 
- Returns the R² score, which tells you how much of the variability in the output the model explains.
- Score of 1.0 = perfect fit, 0.0 = no better than always predicting the mean, negative = worse than always predicting the mean.
'''


0.9917355371900827
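For a single feature, the Ridge solution has a simple closed form, which makes the effect of `alpha` easy to see. The sketch below assumes the cell above ran on the five-point `X`, `y` from the linear regression example. Under that assumption it reproduces the printed R² (120/121 ≈ 0.99174). The variable names are illustrative only.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([50.0, 90.0, 130.0, 170.0, 210.0])
alpha = 1.0

# Center the data so the intercept is left unpenalized
xc, yc = x - x.mean(), y - y.mean()

# One-feature ridge solution: slope = (xc . yc) / (xc . xc + alpha)
slope = (xc @ yc) / (xc @ xc + alpha)
intercept = y.mean() - slope * x.mean()

# R^2 of the shrunken line on the training data
pred = slope * x + intercept
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(slope, intercept, r2)
```

The unpenalized slope is 40. With `alpha=1.0` the denominator grows from 10 to 11, so the slope shrinks to 400/11 ≈ 36.36 and the fit is slightly worse than perfect.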



Example :-

Predict student exam scores based on:
- Number of hours studied
- Sleep hours
- Class attendance



In [6]:
import numpy as np
from sklearn.linear_model import Ridge

X = np.array([[5, 8, 90],
              [7, 6, 80],
              [4, 9, 95]])   # Features: [study_hours, sleep_hours, attendance]
y = np.array([85, 87, 82])   # Target: exam scores

model = Ridge(alpha=0.5)
''' 
🛠️ model = Ridge(alpha=0.5)
You're creating a Ridge Regression model from scikit-learn, and you're setting the regularization strength using the alpha parameter.
🔍 So what does alpha=0.5 do?
- alpha controls how much penalty the model applies to large coefficient values.
- Lower alpha (like 0.5) means less regularization, so the model is closer to standard linear regression.
- Higher alpha (like 5.0 or 10.0) means more regularization, which shrinks the coefficients more and may help reduce overfitting

'''
model.fit(X, y)
print(model.score(X, y))     # R² on the training data; with only 3 samples it says little about how the model generalizes

0.9097602228633961


🚀 When to Use Ridge Over Regular Linear Regression?
- Your model is overfitting
- Your dataset has many correlated features
- You want more stable and reliable predictions


LASSO REGRESSION

📘 What Is Lasso Regression?

Lasso Regression (Least Absolute Shrinkage and Selection Operator) is a linear model that includes L1 regularization. It's popular for both prediction and automatic feature selection.

🧪 Definition

- Adds a penalty based on the absolute value of coefficients.
- This penalty can shrink some coefficients to zero, essentially removing unimportant features.
- Great when you have many features, but only a few truly matter.
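Why can an absolute-value penalty set coefficients to exactly zero while a squared penalty cannot? In the simplified case of orthonormal features, the L1 penalty acts on each unpenalized coefficient as "soft-thresholding", and the L2 penalty acts as a plain rescaling. The sketch below illustrates that contrast; the helper names and the sample coefficients are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """L1 (Lasso-style) shrinkage: move z toward 0 by t, clipping at 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ridge_shrink(z, t):
    """L2 (Ridge-style) shrinkage: rescale z; never exactly 0 unless z is."""
    return z / (1.0 + t)

coefs = np.array([3.0, -0.5, 0.2, -2.0])  # hypothetical unpenalized coefficients

lasso_like = soft_threshold(coefs, 1.0)
ridge_like = ridge_shrink(coefs, 1.0)
print(lasso_like)  # small entries become exactly 0
print(ridge_like)  # all entries shrink, none become 0
```

Entries smaller than the threshold vanish under soft-thresholding, which is the mechanism behind Lasso's feature selection.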


In [7]:
#Lasso regression
''' 
Use: Feature selection

How it works: Adds L1 penalty (shrinks coefficients to zero)
'''

from sklearn.linear_model import Lasso

model = Lasso(alpha=0.1)
'''
- Initializes the model with regularization strength alpha = 0.1
- Smaller alpha → less penalty, more freedom to keep features
- Larger alpha → stronger penalty, more features dropped
'''
model.fit(X, y)
''' 
- Trains the model to learn the relationship between:
- X: features/input variables (e.g. age, income, etc.)
- y: target/output variable (e.g. price, score, etc.)

'''
print(model.coef_)
''' 
- Displays the model’s learned coefficients.
- Any zero values mean those features were deemed unimportant and removed by Lasso.

'''

[ 0.         -0.         -0.31171429]



Example :-

Predicting bike rental demand based on:
- Temperature
- Humidity
- Wind speed
- Holiday indicator
- Day of the week


In [8]:
import numpy as np
from sklearn.linear_model import Lasso

X = np.array([[25, 60, 12, 1, 2],
              [30, 50, 8, 0, 5],
              [22, 70, 10, 0, 6]])   # Features: [temp, humidity, wind, holiday, day]
y = np.array([120, 140, 100])        # Target: bike rentals

model = Lasso(alpha=0.2)
model.fit(X, y)

print(model.coef_)  # Some values may be 0 → features auto-removed by Lasso

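# Here only humidity (index 1) keeps a non-zero coefficient; temp, wind, holiday, and day are zeroed out.
# With just 3 samples, treat this selection as illustrative rather than reliable.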

[ 0.    -1.997 -0.    -0.    -0.   ]


Why Use Lasso?

- ✅ Automatic feature selection (simpler, cleaner models)
- ✅ Prevents overfitting
- ✅ Handles high-dimensional datasets
- ✅ Useful when many features are irrelevant


ELASTIC NET REGRESSION

🧠 What Is Elastic Net Regression?

Elastic Net Regression is a linear regression model enhanced with regularization techniques. It blends:
- L1 penalty from Lasso Regression: promotes sparsity by shrinking some coefficients to zero (feature selection)
- L2 penalty from Ridge Regression: discourages large coefficients (adds stability when predictors are correlated)
Together, Elastic Net gives you the best of both worlds: feature selection and robustness against multicollinearity.


🔧 How It Works

Elastic Net minimizes the following cost function:

$$
\text{Loss} = \text{RSS} + \alpha \left( \text{l1\_ratio} \cdot \|\beta\|_1 + (1 - \text{l1\_ratio}) \cdot \|\beta\|_2^2 \right)
$$

Where:
- $\text{RSS}$ = residual sum of squares (the ordinary least-squares loss)
- $\alpha$ = overall regularization strength
- $\text{l1\_ratio}$ controls how much L1 vs. L2 penalty to apply (1 = pure Lasso, 0 = pure Ridge)
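To make the cost function concrete, the sketch below evaluates it term by term for a tiny made-up example. It follows the formula exactly as written in this section. scikit-learn's `ElasticNet` optimizes the same structure with extra constant factors (the RSS is scaled by 1/(2·n_samples) and the L2 term by 1/2), so the numbers are not directly comparable with its `alpha`. The function name and the demo arrays are assumptions for illustration.

```python
import numpy as np

def elastic_net_loss(X, y, beta, alpha, l1_ratio):
    """Loss as written above: RSS + alpha * (l1_ratio * L1 + (1 - l1_ratio) * L2^2)."""
    residuals = y - X @ beta
    rss = np.sum(residuals ** 2)
    l1 = np.sum(np.abs(beta))
    l2_squared = np.sum(beta ** 2)
    return rss + alpha * (l1_ratio * l1 + (1 - l1_ratio) * l2_squared)

# Tiny hypothetical example
X_demo = np.array([[1.0, 0.0], [0.0, 1.0]])
y_demo = np.array([1.0, 2.0])
beta_demo = np.array([1.0, 1.0])

loss = elastic_net_loss(X_demo, y_demo, beta_demo, alpha=0.1, l1_ratio=0.5)
print(loss)
```

Here RSS = 1, the L1 norm is 2, and the squared L2 norm is 2, so the loss is 1 + 0.1 · (0.5 · 2 + 0.5 · 2) = 1.2.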


🎯 Use Cases

Elastic Net is especially useful when:
- There are many features, some of which are highly correlated
- You need feature selection but Lasso drops too many or Ridge keeps too many
- Data is noisy or high-dimensional (like in genomics, finance, or image recognition)


In [10]:
from sklearn.linear_model import ElasticNet

# Create model with 50% L1 and 50% L2 penalty
model = ElasticNet(alpha=0.1, l1_ratio=0.5)

# Fit model to training data
model.fit(X, y)

# Evaluate performance
print(model.score(X, y))  # R² score on training set

''' 
Let’s say you’re predicting house prices with 100 features, some of which overlap—like area in square feet and number of rooms. A plain linear regression may struggle with collinearity. Elastic Net:
- Shrinks unimportant features (L2)
- Eliminates redundant ones (L1)
- Boosts prediction accuracy and model interpretability

'''

0.999997213768693



DECISION TREE REGRESSOR

Definition :- 

A Decision Tree Regressor is a supervised machine learning algorithm that predicts a continuous value by learning rules from data. It builds a tree structure that splits data into smaller subsets based on feature thresholds, ending in leaf nodes that hold predictions.

How It Works

- Start at the Root Node: The model looks at all features and finds the split that minimizes prediction error (usually using Mean Squared Error).
- Recursive Splitting: It continues splitting each subset by selecting features and thresholds until:
  - A stopping condition is met (e.g. maximum depth, minimum samples per leaf).
  - No further improvement is possible.
- Leaf Nodes: Each leaf holds a constant prediction value: the mean of the target values in that subset.
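The root-node step can be sketched for a single feature: try every threshold between consecutive sorted values and keep the one with the lowest total squared error, where each side predicts its own mean. This is a simplified illustration, not scikit-learn's implementation, and `best_split` is a hypothetical helper.

```python
import numpy as np

def best_split(x, y):
    """Return (threshold, sse) of the single-feature split with the lowest squared error."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best_threshold, best_sse = None, np.inf
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no threshold fits between equal values
        left, right = y_sorted[:i], y_sorted[i:]
        # Each side predicts its own mean, as a leaf would
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if sse < best_sse:
            best_threshold = (x_sorted[i] + x_sorted[i - 1]) / 2
            best_sse = sse
    return best_threshold, best_sse

x_demo = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y_demo = np.array([5.0, 5.0, 5.0, 20.0, 20.0, 20.0])
threshold, sse = best_split(x_demo, y_demo)
print(threshold, sse)  # 6.5 0.0
```

A full tree repeats this search over every feature and then recurses into each side until a stopping condition is hit.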

Use Cases

Decision Trees shine when:
- You want easy interpretability.
- Your data has non-linear relationships.
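- You have mixed-type features (numerical and categorical; scikit-learn's trees need categorical features encoded as numbers first).
- You don't need to extrapolate beyond the range of the training data (leaves predict constants, and deep trees tend to overfit).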
Common in:
- Forecasting stock prices
- Predicting medical costs
- Estimating house prices
- Customer behavior modeling



In [None]:
from sklearn.tree import DecisionTreeRegressor

# Initialize model with default parameters
model = DecisionTreeRegressor()

# Train (fit) model to feature matrix X and target y
model.fit(X, y)

# Predict values for first 5 instances in X
print(model.predict(X[:5]))

''' 
Example :-
Let’s say you’re predicting salaries based on years of experience, education level, and job role.
- The tree first checks “Is experience > 5 years?”
- Then maybe “Is education = Master’s?”
- At each step, it narrows the decision to groups with similar salary ranges.
In the end, each leaf holds the average salary value for a sub-group of employees that match a certain path of rules.

'''

[120. 140. 100.]


RANDOM FOREST REGRESSOR

What Is Random Forest Regression?

A Random Forest Regressor is an ensemble model that builds multiple decision trees during training and predicts by averaging their outputs. It combines the simplicity of decision trees with the accuracy boost of bagging (bootstrap aggregating).

How It Works

- Bootstrapping: The model draws multiple random samples (with replacement) from the original dataset.
- Tree Building: For each sample, a decision tree is built using random subsets of features.
- Prediction Averaging: For regression, the final prediction is the average of predictions from all trees.
This randomness reduces overfitting and improves generalization.
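The averaging step can be checked directly: a fitted `RandomForestRegressor` exposes its individual trees through `estimators_`, and averaging their predictions by hand should match `predict`. The synthetic data, sizes, and seeds below are arbitrary choices for this sketch.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data; sizes and seed are arbitrary choices for this sketch
X_rf, y_rf = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)

forest = RandomForestRegressor(n_estimators=25, random_state=0).fit(X_rf, y_rf)

# Average the individual trees by hand and compare with forest.predict
tree_preds = np.array([tree.predict(X_rf[:5]) for tree in forest.estimators_])
manual_average = tree_preds.mean(axis=0)

print(np.allclose(manual_average, forest.predict(X_rf[:5])))  # True
```

Each row of `tree_preds` is one tree's opinion. The spread across rows gives a rough sense of how much the trees disagree on a given input.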

Use Cases
 
Random Forest is ideal when:
- You need strong prediction accuracy out-of-the-box.
- Your data has lots of features, possibly with non-linear relationships.
- You're dealing with tabular datasets, either clean or noisy.
Popular in:
- Environmental modeling (e.g. rainfall prediction)
- Healthcare (e.g. risk scoring)
- Real estate price estimates
- Fraud detection in finance


In [12]:
from sklearn.ensemble import RandomForestRegressor

# Create a random forest regressor model
model = RandomForestRegressor()     #initializes the model using default settings (100 trees).

# Fit model to training data
model.fit(X, y)                     #trains each tree on a different bootstrap sample of the data

# Predict on first five instances
print(model.predict(X[:5]))         #outputs an averaged prediction across trees for each input

''' 
Suppose you’re predicting crop yield based on soil quality, rainfall, temperature, and fertilizer type. One decision tree might overfit to rainfall patterns. But with 100 slightly different trees, Random Forest will:
- Smooth out noisy decisions
- Leverage diverse perspectives from all trees
- Result in a more stable and accurate prediction

'''

[119.2 129.8 112.4]



GRADIENT BOOSTING REGRESSOR

What Is Gradient Boosting Regressor?

Gradient Boosting is a powerful ensemble technique that builds a strong predictor by combining many weak learners—usually shallow decision trees. Each tree corrects the errors of the previous one, producing highly accurate models, especially for structured/tabular data.

How It Works

The core idea: sequential learning with gradient descent optimization.
- Start with an initial prediction (usually the mean of targets).
- Calculate the residuals (errors from previous prediction).
- Fit a tree to these residuals.
- Add the new tree's predictions (scaled by a learning rate) to the previous model.
- Repeat for a set number of iterations (trees).
This step-by-step error correction allows the model to focus on what previous trees got wrong.
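The steps above can be sketched as a short loop. This is a minimal illustration for squared-error loss (where the residuals are the negative gradient), built from scikit-learn's `DecisionTreeRegressor`. It is not how `GradientBoostingRegressor` is implemented internally, and the data, tree depth, learning rate, and number of trees are arbitrary assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_gb = rng.uniform(-3, 3, size=(200, 1))
y_gb = np.sin(X_gb.ravel()) + rng.normal(scale=0.1, size=200)  # hypothetical noisy curve

learning_rate, n_trees = 0.1, 50
prediction = np.full_like(y_gb, y_gb.mean())   # step 1: start from the mean
mse_history = [np.mean((y_gb - prediction) ** 2)]

for _ in range(n_trees):
    residuals = y_gb - prediction                                    # step 2: current errors
    tree = DecisionTreeRegressor(max_depth=2).fit(X_gb, residuals)   # step 3: fit a tree to them
    prediction += learning_rate * tree.predict(X_gb)                 # step 4: scaled update
    mse_history.append(np.mean((y_gb - prediction) ** 2))

print(f"MSE before boosting: {mse_history[0]:.4f}")
print(f"MSE after {n_trees} trees: {mse_history[-1]:.4f}")
```

Note that the trees' contributions are added together, not averaged. The training error shrinks a little with every tree, and the learning rate controls how large each correction is.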

Use Cases

Use Gradient Boosting when:
- You need high predictive accuracy on structured/tabular data.
- Relationships are non-linear and complex.
- You’re dealing with heterogeneous feature sets.
Common applications:
- Credit scoring and risk modeling 💳
- Sales forecasting 📈
- Disease progression modeling 🧬
- Energy consumption predictions 



In [13]:
from sklearn.ensemble import GradientBoostingRegressor

# Create the model with default settings
model = GradientBoostingRegressor()
''' 
- GradientBoostingRegressor() uses parameters like:
- n_estimators: number of trees
- learning_rate: how much each tree contributes
- max_depth: depth of each tree

'''

# Fit to training data
model.fit(X, y)             #builds the sequence of trees to minimize error.

# Predict for the first 5 samples
print(model.predict(X[:5]))     #produces predictions for the first five samples: the initial prediction plus the learning-rate-scaled contributions of all trees (summed, not averaged).

[120.         139.99946877 100.00053123]


SVR (SUPPORT VECTOR REGRESSION)

Definition :- 

SVR (Support Vector Regression) is a type of Support Vector Machine used for predicting continuous values. It doesn’t try to perfectly fit the data—instead, it finds a function that deviates from actual values by no more than a certain margin (epsilon), while keeping the model as flat (simple) as possible.
It prioritizes robustness and generalization, especially for small to medium-sized datasets.

How It Works

- SVR defines a margin of tolerance (epsilon) where errors within this zone are ignored.
- Only data points on or outside this margin shape the model; these are the "support vectors."
- A kernel function (e.g., rbf) implicitly maps data into a higher-dimensional space, enabling non-linear regression.
- Points strictly inside the margin get a zero coefficient in the solution, which keeps the model sparse and efficient.
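The "margin of tolerance" corresponds to the epsilon-insensitive loss: errors smaller than epsilon cost nothing, and larger errors are penalized only for the part that sticks out of the tube. The sketch below is a small illustration of that loss. The function name and sample values are hypothetical, and `epsilon=0.1` mirrors scikit-learn's default for `SVR`.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon tube, linear in the excess error outside it."""
    return np.maximum(np.abs(y_true - y_pred) - epsilon, 0.0)

y_true = np.array([1.00, 1.00, 1.00])
y_pred = np.array([1.05, 0.80, 1.50])  # errors of 0.05, 0.20, 0.50

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(losses)
```

The first prediction is inside the tube and contributes no loss, while the other two are charged roughly 0.1 and 0.4.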

When to Use SVR

SVR is ideal when:
- Your dataset is not too large (computationally expensive for big data).
- You need a non-linear model with flexibility (especially with kernels).
- Accuracy matters, but you want to tolerate small prediction errors.
- Data might be noisy, and you want to avoid overfitting.
Applications include:
- Time series forecasting 📊
- Stock price prediction 💹
- Environmental modeling 🌧️
- Engineering simulations 



In [14]:
from sklearn.svm import SVR

# Create SVR model with Radial Basis Function kernel
model = SVR(kernel='rbf')       #chooses a non-linear kernel to handle complex patterns.

# Fit model to training data
model.fit(X, y)                 #learns the regression function using support vectors.

# Predict on first 5 inputs
print(model.predict(X[:5]))     #gives continuous predictions for the first 5 samples

''' 
You can tweak C (penalty for errors), epsilon (margin width), and gamma (kernel coefficient) for better control and performance

Example

Say you’re predicting battery health based on usage patterns, temperature, and charge cycles:
- SVR will build a regression function that captures the trend while ignoring minor deviations.
- Only samples whose prediction error reaches or exceeds the epsilon margin (the support vectors) influence the fitted function.
- This creates a balanced model that doesn’t overreact to noisy data.

'''

[120.00000003 120.18247461 119.8337276 ]
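The predictions above sit near 120 for every row. With the default `C=1.0`, each support vector's weight is capped at 1, so with targets between 100 and 140 the fitted function can barely move away from its intercept. SVR is also sensitive to feature scale. The sketch below assumes the cell above ran on the three-row bike-rental data from the Lasso example, and compares the default model with a pipeline that standardizes features and uses a larger `C`. The value `C=1000` is an arbitrary illustrative choice, not a tuned recommendation.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Bike-rental toy data from the Lasso example
X_bike = np.array([[25, 60, 12, 1, 2],
                   [30, 50, 8, 0, 5],
                   [22, 70, 10, 0, 6]], dtype=float)
y_bike = np.array([120.0, 140.0, 100.0])

default_svr = SVR(kernel="rbf").fit(X_bike, y_bike)

# Standardize features, then allow larger weights via C (illustrative value)
scaled_svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1000.0)).fit(X_bike, y_bike)

default_error = np.max(np.abs(default_svr.predict(X_bike) - y_bike))
scaled_error = np.max(np.abs(scaled_svr.predict(X_bike) - y_bike))
print(f"max training error, default SVR: {default_error:.2f}")
print(f"max training error, scaled features + larger C: {scaled_error:.2f}")
```

On real data, `C`, `epsilon`, and `gamma` would normally be chosen by cross-validation rather than by matching three training points.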



MLP REGRESSOR (NEURAL NETWORK)

What Is MLP Regressor?

An MLP Regressor is a feedforward neural network designed for regression tasks. It consists of fully connected layers of neurons that learn complex nonlinear mappings from inputs to outputs using backpropagation and gradient descent.

How It Works

- Architecture:
  - Input layer → Hidden layers → Output layer
  - Each neuron applies a weighted sum of inputs, adds bias, and passes through an activation function (typically ReLU or tanh).
- Training:
  - The model starts with random weights.
  - It predicts output values, compares them to actual targets using a loss function (usually MSE).
  - Backpropagation adjusts weights to minimize error.
  - This repeats over multiple iterations (max_iter) until convergence.
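The architecture bullet can be traced by hand for one input. The sketch below runs a single forward pass through one hidden layer with ReLU and a linear output. The weights are hypothetical values picked so the arithmetic is easy to follow, and `mlp_forward` is an illustrative helper, not part of scikit-learn.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer: weighted sum -> ReLU -> linear output (no output activation)."""
    hidden = relu(x @ W1 + b1)
    return hidden @ W2 + b2

# Hypothetical fixed weights, chosen so the arithmetic is easy to follow
x = np.array([1.0, 2.0])
W1 = np.array([[1.0, -1.0],
               [1.0,  1.0]])
b1 = np.array([0.0, -5.0])
W2 = np.array([2.0, 3.0])
b2 = 1.0

output = mlp_forward(x, W1, b1, W2, b2)
print(output)  # 7.0
```

The hidden pre-activations are [3, -4]. ReLU zeroes the negative one, leaving [3, 0], and the output is 2·3 + 3·0 + 1 = 7. Training repeats this forward pass and uses backpropagation to nudge `W1`, `b1`, `W2`, and `b2` toward lower loss.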

When to Use MLP Regressor

MLP Regressor works best when:
- You have nonlinear relationships that simpler models (like trees or linear regressors) can’t capture.
- The dataset size is moderate (too large, and training can be slow).
- You’re comfortable tuning parameters like layer sizes, learning rate, and activation functions.
It’s often used for:
- Predicting housing or vehicle prices 🏡🚗
- Modeling climate or energy consumption ☀️⚡
- Forecasting demand or user engagement 📈
- Any regression task requiring deep, abstract feature learning


In [15]:
from sklearn.neural_network import MLPRegressor

# Create a neural network model with default architecture
model = MLPRegressor(max_iter=1000)
''' 
- MLPRegressor() builds the neural network. 
You can customize layers using hidden_layer_sizes=(100,), 
activation with activation='relu', and solver with solver='adam'.

'''

# Train it on your feature matrix X and target variable y
model.fit(X, y)     #trains the weights with backpropagation and the default 'adam' optimizer.

# Predict outputs for first 5 samples
print(model.predict(X[:5]))     #outputs predicted values based on learned weights.

''' 
Example
Imagine predicting air pollution levels based on traffic, weather, time, and industrial activity. Linear models may miss the interactions. MLP can:
- Encode nonlinear effects (e.g., traffic and humidity combined may amplify pollution)
- Learn hidden patterns through its deep layers
- Yield more accurate predictions where simpler models fail

'''

[119.93687987 140.14295046  99.94516437]



HISTOGRAM-BASED GRADIENT BOOSTING REGRESSOR (HGB)

What Is HGB Regressor?

The HistGradientBoostingRegressor is a high-speed, scalable version of gradient boosting, designed for large datasets. It uses histogram-based binning of numerical features to speed up training and reduce memory usage, which suits large real-world tabular datasets.
It lives in scikit-learn’s `sklearn.ensemble` module (stable since scikit-learn 1.0; earlier releases required an experimental import) and is inspired by the LightGBM framework.

How It Works

HGB speeds things up using these tricks:
- Feature Binning:
  - Instead of using raw feature values, it groups them into discrete bins.
  - This reduces the number of possible splits in trees.
- Gradient Boosting Strategy:
  - Builds trees sequentially, each learning from the errors (residuals) of the previous ones.
  - Optimizes predictions using gradient descent.

By binning features, the model can handle datasets with millions of rows efficiently.
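The binning idea can be sketched with NumPy: choose a few quantile-based cut points, then replace each raw value with the index of the bin it falls into. This is a simplified illustration of the concept, not scikit-learn's internal binning code, and `bin_feature` is a hypothetical helper.

```python
import numpy as np

def bin_feature(values, n_bins):
    """Map raw values to integer bin indices using quantile-based edges."""
    quantiles = np.linspace(0, 1, n_bins + 1)[1:-1]   # interior cut points
    edges = np.quantile(values, quantiles)
    return np.digitize(values, edges), edges

values = np.arange(100, dtype=float)   # hypothetical raw feature
binned, edges = bin_feature(values, n_bins=4)

print(edges)                 # 3 thresholds instead of 99 candidate split points
print(np.bincount(binned))   # [25 25 25 25]
```

A tree built on `binned` only has to evaluate a handful of thresholds per feature. The real estimator does the same with up to 255 bins by default (`max_bins`).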

When to Use It

Use HGB when:
- Your dataset is large and high-dimensional.
- You need both accuracy and speed.
- You’re solving regression on structured/tabular data.
It shines in:
- E-commerce: price/demand predictions 💰
- Energy modeling: load forecasting 🔋
- Web analytics: engagement or conversion forecasting 



In [None]:
from sklearn.ensemble import HistGradientBoostingRegressor

# Initialize the fast gradient boosting model
model = HistGradientBoostingRegressor()     #uses efficient feature binning

# Fit to training data
model.fit(X, y)                     #builds and trains gradient-boosted trees.

# Predict outputs for first 5 samples
print(model.predict(X[:5]))         #makes fast predictions using the ensemble

''' 
You can tune:
- max_iter (number of trees)
- learning_rate
- max_depth
- max_bins (for histogram resolution)

Example

Say you are modeling real estate prices across an entire country, with millions of records. Traditional models may struggle. But HGB:
- Bins features like square footage, location score, and age of property
- Builds fast, accurate models using fewer calculations
- Keeps memory usage low, even as data scales

'''

[120. 120. 120.]

