What is Supervised Learning (Classification)?

Supervised Learning is a type of machine learning where the model learns from labeled data. In classification, the goal is to predict a category or class label (e.g., spam/not spam, yes/no, disease/no disease).

| Algorithm                                          | Type                 | Use Cases                                   | Example                   |
| -------------------------------------------------- | -------------------- | ------------------------------------------- | ------------------------- |
| **1. Logistic Regression**                         | Linear               | Binary classification, medical diagnosis    | Spam detection (spam/ham) |
| **2. K-Nearest Neighbors (KNN)**                   | Lazy, non-parametric | Recommendation systems, pattern recognition | Classify image as dog/cat |
| **3. Support Vector Machine (SVM)**                | Linear/non-linear    | Text classification, face detection         | Sentiment analysis        |
| **4. Decision Tree**                               | Non-linear           | Rule-based systems, fraud detection         | Loan approval             |
| **5. Random Forest**                               | Ensemble             | Credit scoring, feature importance          | Classify loan defaulters  |
| **6. Naive Bayes**                                 | Probabilistic        | Text classification, spam filtering         | Email spam classification |
| **7. Gradient Boosting** (XGBoost, LightGBM, etc.) | Ensemble             | High-performance tasks                      | Disease prediction        |
| **8. Neural Networks (MLP)**                       | Deep learning        | Image & speech classification               | Handwriting recognition   |
| **9. Histogram-based Gradient Boosting (HGB)**     | Scalable ensemble    | Large datasets                              | Click prediction          |


In [None]:
#Logistic Regression

'''
Use: Binary classification

How it works: Uses sigmoid function to output probabilities between 0 and 1.'''

from sklearn.linear_model import LogisticRegression     #Model used for binary classification.
from sklearn.datasets import load_breast_cancer         #Loads the breast cancer dataset.
from sklearn.model_selection import train_test_split    #Splits the dataset into training and testing parts.

X, y = load_breast_cancer(return_X_y=True)              # Feature matrix X and target vector y.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)   # Splits data: 75% train, 25% test (set explicitly here; 0.25 is also the default test size).

model = LogisticRegression(max_iter=1000)               #initializes the model, allowing up to 1000 iterations for convergence.
model.fit(X_train, y_train)                             #trains the model using the training data.
print(model.score(X_test, y_test))                      #calculates accuracy — that is, how well your model performs on unseen data.

0.958041958041958


STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT

Increase the number of iterations to improve the convergence (max_iter=1000).
You might also want to scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
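
The warning above points at feature scaling. A minimal sketch, assuming the same dataset and split as the cell above: standardize the features inside a pipeline before fitting, then check that the predicted probability is the sigmoid of the model's linear score. The names `scaled_model` and `manual_proba` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Standardizing features (zero mean, unit variance) usually helps the solver converge.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_train, y_train)
print(scaled_model.score(X_test, y_test))

# decision_function returns the linear score z = w.x + b for each sample.
z = scaled_model.decision_function(X_test)

# For binary logistic regression, P(class 1) is the sigmoid of that score.
manual_proba = 1.0 / (1.0 + np.exp(-z))
print(np.allclose(manual_proba, scaled_model.predict_proba(X_test)[:, 1]))
```

Putting the scaler inside the pipeline means it is fit on the training split only, so no information from the test split leaks into preprocessing.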


In [None]:
#K-Nearest Neighbors (KNN)
'''
Use: Pattern recognition

How it works: Classifies a data point based on the majority label of its k-nearest neighbors.
'''

from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier(n_neighbors=3)  # Looks at the 3 nearest neighbors; a test point gets the majority label among them.

model.fit(X_train, y_train)         # Stores the training data. KNN is a lazy learner: it builds no model upfront and computes distances at prediction time.

print(model.score(X_test, y_test))     # Returns the accuracy


0.9020979020979021
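
The majority vote described above can be reproduced by hand. This is a small sketch, assuming the same dataset and split as the earlier cells; `manual_pred` and `neighbor_labels` are illustrative names.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# kneighbors returns the distances to, and training-set indices of, the k nearest points.
distances, indices = knn.kneighbors(X_test[:5])
neighbor_labels = y_train[indices]          # shape (5, 3): labels of the 3 neighbors per test point

# Majority vote per test point: count the labels and take the most frequent one.
manual_pred = np.array([np.bincount(row).argmax() for row in neighbor_labels])

print(manual_pred)
print(knn.predict(X_test[:5]))
```

With two classes and an odd `n_neighbors`, the vote can never tie, which is one reason odd values of k are a common choice for binary problems.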


In [4]:
# Support Vector Machine (SVM)
'''
Use: Text, image classification

How it works: Finds the hyperplane that best separates classes.
'''

from sklearn.svm import SVC         #Stands for Support Vector Classification, which is a powerful algorithm for both linear and non-linear classification tasks.

model = SVC(kernel='linear')        #SVM model using a linear kernel, which means it will attempt to find a linear hyperplane that separates the data.
model.fit(X_train, y_train)         #Fits the SVM model to the training data.
print(model.score(X_test, y_test))


0.951048951048951


| Kernel      | Description                                                                    | Use Case                                 |
| ----------- | ------------------------------------------------------------------------------ | ---------------------------------------- |
| `'linear'`  | No transformation. Tries to separate data with a **straight line (or plane)**. | When data is linearly separable.         |
| `'poly'`    | Polynomial kernel. Allows **curved decision boundaries**.                      | When the relationship is polynomial.     |
| `'rbf'`     | Radial Basis Function (Gaussian). Most commonly used.                          | For non-linear, complex data structures. |
| `'sigmoid'` | Uses the sigmoid function. Similar to a neural network activation function.    | Rarely used.                             |
| `custom`    | You can define your own kernel function.                                       | For very specific use cases.             |
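
To see how the kernels in the table behave on this dataset, the sketch below fits one SVM per kernel. It assumes the same split as the earlier cells and adds feature scaling, which SVMs are generally sensitive to. `dot_kernel` is a hypothetical custom kernel written for illustration; it computes the same inner product as `'linear'`.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)


def dot_kernel(A, B):
    # A custom kernel is any callable returning the Gram matrix between two sample sets.
    return A @ B.T


kernels = {
    "linear": "linear",
    "poly": "poly",
    "rbf": "rbf",
    "sigmoid": "sigmoid",
    "custom (dot product)": dot_kernel,
}

kernel_scores = {}
for name, kernel in kernels.items():
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    clf.fit(X_train, y_train)
    kernel_scores[name] = clf.score(X_test, y_test)
    print(f"{name:<22} {kernel_scores[name]:.3f}")
```

Exact scores depend on the split and the library version, so treat the ranking as indicative only.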


In [None]:
#Decision tree
'''
Use: Rule-based systems

How it works: Creates a tree where each node is a decision rule.
'''

from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
'''
Initializes a decision tree classifier using default parameters:

criterion='gini' (default): uses the Gini impurity to split nodes.

max_depth=None: the tree expands until all leaves are pure or contain fewer than min_samples_split samples.

random_state=None: randomness in tree construction is not fixed.
'''
model.fit(X_train, y_train)
print(model.score(X_test, y_test))


DECISION TREE || COMMON ISSUES AND CONSIDERATIONS

| Problem                   | Cause                           | Fix                                        |
| ------------------------- | ------------------------------- | ------------------------------------------ |
| **Overfitting**           | Tree is too deep and fits noise | Use `max_depth`, `min_samples_split`, etc. |
| **Poor generalization**   | Too little pruning              | Prune or restrict tree growth              |
| **Inconsistent accuracy** | Randomness in tree construction | Set `random_state`                         |


You can control overfitting by tuning hyperparameters that prune or restrict the tree:

| Parameter           | Description                                                   | Helps Fix Overfitting? |
| ------------------- | ------------------------------------------------------------- | ---------------------- |
| `max_depth`         | The maximum depth of the tree.                                | ✅ Yes                  |
| `min_samples_split` | Minimum samples required to split an internal node.           | ✅ Yes                  |
| `min_samples_leaf`  | Minimum samples required at a leaf node.                      | ✅ Yes                  |
| `max_leaf_nodes`    | Limit on number of leaf nodes.                                | ✅ Yes                  |
| `max_features`      | Number of features to consider when splitting.                | ✅ Yes                  |
| `ccp_alpha`         | Complexity parameter for **post-pruning**.                    | ✅ Yes                  |
| `random_state`      | Seed for randomness (for reproducibility).                    | ❌ No                   |
| `criterion`         | Function to measure quality of a split (`gini` or `entropy`). | ❌ No                   |
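
The effect of these parameters is easiest to see by comparing an unrestricted tree with a constrained one. A minimal sketch, assuming the same split as the earlier cells; the specific values `max_depth=3` and `min_samples_leaf=5` are arbitrary choices for illustration, not tuned settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Unrestricted tree: grows until the leaves are pure.
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Constrained tree: shallow, and every leaf must hold at least 5 samples.
small_tree = DecisionTreeClassifier(
    max_depth=3, min_samples_leaf=5, random_state=42
).fit(X_train, y_train)

for name, tree in [("full", full_tree), ("constrained", small_tree)]:
    print(
        f"{name:<12} depth={tree.get_depth():<3} leaves={tree.get_n_leaves():<4} "
        f"train={tree.score(X_train, y_train):.3f} test={tree.score(X_test, y_test):.3f}"
    )
```

A large gap between training and test accuracy for the full tree is the usual sign of overfitting; the constrained tree trades some training accuracy for a simpler model.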


All Key Parameters in DecisionTreeClassifier (with Explanation)
| Parameter                  | Description                                                                               |
| -------------------------- | ----------------------------------------------------------------------------------------- |
| `criterion`                | `'gini'` (default) or `'entropy'`. Measure used to select best splits.                    |
| `splitter`                 | `'best'` (default) or `'random'`. Strategy used to choose the split at each node.         |
| `max_depth`                | Maximum depth of the tree. Prevents the tree from growing too deep.                       |
| `min_samples_split`        | Minimum number of samples required to split a node. Default is 2.                         |
| `min_samples_leaf`         | Minimum number of samples at a leaf node. Prevents leaves with very few samples.          |
| `max_features`             | Max number of features to consider when looking for the best split.                       |
| `max_leaf_nodes`           | Grow tree with at most this many leaf nodes.                                              |
| `min_weight_fraction_leaf` | Like `min_samples_leaf` but uses fraction of total weights (for weighted samples).        |
| `max_samples`              | Not a `DecisionTreeClassifier` parameter; it belongs to bagging ensembles such as `RandomForestClassifier` (with `bootstrap=True`). |
| `random_state`             | Controls randomness in tree building (e.g. `splitter='random'`). Ensures reproducibility. |
| `ccp_alpha`                | Complexity parameter used for **Minimal Cost-Complexity Pruning**. Higher → simpler tree. |
| `class_weight`             | Can be used to handle imbalanced classes.                                                 |
| `presort`                  | Deprecated and removed from current scikit-learn releases; do not pass it.                |


 What is a Random Forest?

A Random Forest is an ensemble of multiple decision trees, where:

Each tree is trained on a random subset of the data and features.

Final prediction is made by voting (classification) or averaging (regression).

It's robust, less prone to overfitting, and works well even without feature scaling.

In [None]:
#Random Forest

''' 
Use: Robust predictions

How it works: Combines multiple decision trees (ensemble).
'''

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
''' 
Uses 100 trees (n_estimators=100) by default.

Each tree is built on a random sample with replacement (bootstrap).

Splits are made using a random subset of features at each node.
'''
model.fit(X_train, y_train)
print(model.score(X_test, y_test))


0.9790209790209791


Why use Random Forest?
| Feature                   | Benefit                                             |
| ------------------------- | --------------------------------------------------- |
| **Reduces overfitting**   | Averaging over many trees smooths out the variance. |
| **Handles non-linearity** | Can capture complex patterns.                       |
| **Works without scaling** | No need to normalize features.                      |
| **Feature importance**    | You can find which features are most useful.        |
| **Robust**                | Tolerates noisy data and outliers; native missing-value support depends on the scikit-learn version. |


Important Parameters
| Parameter           | Description                                                            | Helps Control          |
| ------------------- | ---------------------------------------------------------------------- | ---------------------- |
| `n_estimators`      | Number of trees in the forest. More trees → lower variance (with diminishing returns) but slower. | Variance               |
| `criterion`         | `'gini'` (default) or `'entropy'`. Split quality metric.               | Accuracy               |
| `max_depth`         | Max depth of trees. Prevents overfitting.                              | Overfitting            |
| `min_samples_split` | Minimum samples to split an internal node.                             | Overfitting            |
| `min_samples_leaf`  | Minimum samples in a leaf node.                                        | Overfitting            |
| `max_features`      | Number of features to consider when splitting.                         | Performance            |
| `bootstrap`         | Whether sampling is done with replacement.                             | Diversity              |
| `random_state`      | Reproducibility of results.                                            | Consistency            |
| `n_jobs=-1`         | Use all CPU cores for faster training.                                 | Speed                  |
| `oob_score=True`    | Use out-of-bag samples for validation.                                 | Model evaluation       |


In [None]:
'''#Tuning the MODEL
model = RandomForestClassifier(
    n_estimators=200,
    max_depth=10,
    min_samples_leaf=4,
    random_state=42,
    n_jobs=-1,
    oob_score=True
)'''
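
A short sketch of how the tuned configuration above could be fit and checked, assuming the same split as the earlier cells. With `oob_score=True`, each tree is evaluated on the training rows left out of its bootstrap sample, and the result is exposed as `oob_score_`. The name `tuned_forest` is illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

tuned_forest = RandomForestClassifier(
    n_estimators=200,
    max_depth=10,
    min_samples_leaf=4,
    random_state=42,
    n_jobs=-1,
    oob_score=True,
)
tuned_forest.fit(X_train, y_train)

# Out-of-bag accuracy: an estimate of generalization that needs no separate validation set.
print("OOB accuracy: ", tuned_forest.oob_score_)
print("Test accuracy:", tuned_forest.score(X_test, y_test))
```

The out-of-bag estimate is only available when `bootstrap=True`, which is the default.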

In [None]:
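'''# Feature Importance (After Training)

import pandas as pd
from sklearn.datasets import load_breast_cancer

feature_names = load_breast_cancer().feature_names   # Names of the 30 input features.
feature_importances = pd.Series(model.feature_importances_, index=feature_names)
feature_importances.sort_values(ascending=False).head()

# Shows which features are most influential in the fitted forest's predictions.'''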

🧠 What Is Naive Bayes?
Naive Bayes is a probabilistic classifier based on Bayes’ Theorem with the naive assumption that all features are independent of each other.

📊 Gaussian Naive Bayes
The GaussianNB classifier assumes that features follow a normal (Gaussian) distribution.

Best suited for continuous input features (like those in the Breast Cancer dataset).

It’s fast, simple, and often performs well for high-dimensional datasets.

In [None]:
#Naive Bayes
''' 
Use: Spam filters

How it works: Uses Bayes theorem assuming feature independence.
'''

from sklearn.naive_bayes import GaussianNB

model = GaussianNB()            # Initializes the model with default parameters. Assumes each feature is normally distributed within each class.
model.fit(X_train, y_train)     # Trains the model using the training data. It estimates the mean and variance of each feature for each class.
print(model.score(X_test, y_test))


0.951048951048951
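
Fitting GaussianNB amounts to computing per-class statistics. The sketch below, which assumes the same split as the earlier cells, checks that the fitted `theta_` attribute holds the per-class feature means and that `class_prior_` holds the class frequencies. The per-class variances are stored separately (in `var_` in recent scikit-learn releases, which is an assumption about the installed version), with a small smoothing term added.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

nb = GaussianNB().fit(X_train, y_train)

# theta_[i] is the mean of every feature over the training rows of class i.
for idx, label in enumerate(nb.classes_):
    manual_mean = X_train[y_train == label].mean(axis=0)
    print(label, np.allclose(nb.theta_[idx], manual_mean))

# class_prior_ is the fraction of training rows in each class.
manual_prior = np.bincount(y_train) / len(y_train)
print(np.allclose(nb.class_prior_, manual_prior))
```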


In [7]:
pip install xgboost

Collecting xgboost
  Downloading xgboost-3.0.2-py3-none-win_amd64.whl.metadata (2.1 kB)
Downloading xgboost-3.0.2-py3-none-win_amd64.whl (150.0 MB)

What Is XGBoost? 

XGBoost (Extreme Gradient Boosting) is a fast, regularized, and scalable implementation of gradient boosting. It builds an ensemble of decision trees sequentially, where each tree tries to correct the errors of the previous ones.

It’s known for:

High performance

Speed

Built-in regularization to reduce overfitting
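
The idea of each tree correcting the errors of the previous ones can be sketched in a few lines. This is plain gradient boosting for log loss built from scikit-learn regression trees, not XGBoost's regularized algorithm; it assumes the same split as the earlier cells, and `learning_rate`, `n_rounds`, and the helper functions are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_loss_from_scores(y_true, scores):
    # Binary log loss computed from raw scores (log-odds) in a numerically stable way.
    return np.mean(np.logaddexp(0.0, scores) - y_true * scores)


learning_rate = 0.1
n_rounds = 50

# Round 0: a constant score equal to the log-odds of the positive class.
p0 = y_train.mean()
base_score = np.log(p0 / (1.0 - p0))
train_scores = np.full(len(y_train), base_score)
test_scores = np.full(len(y_test), base_score)

losses = [log_loss_from_scores(y_train, train_scores)]
for _ in range(n_rounds):
    # Errors of the current ensemble: the negative gradient of the log loss.
    residuals = y_train - sigmoid(train_scores)
    # The next tree is trained to predict those errors.
    tree = DecisionTreeRegressor(max_depth=3, random_state=42).fit(X_train, residuals)
    # Add a shrunken version of the tree's output to the running scores.
    train_scores += learning_rate * tree.predict(X_train)
    test_scores += learning_rate * tree.predict(X_test)
    losses.append(log_loss_from_scores(y_train, train_scores))

test_accuracy = np.mean((test_scores > 0).astype(int) == y_test)
print(f"training log loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print(f"test accuracy: {test_accuracy:.3f}")
```

XGBoost adds second-order gradient information, regularization terms, and many engineering optimizations on top of this basic loop.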

In [None]:
#Gradient Boosting
''' 
Use: Structured data competitions

How it works: Builds models sequentially to correct errors from previous models.
'''
from xgboost import XGBClassifier

model = XGBClassifier()

''' 
Initializes the classifier with default settings, including:

n_estimators=100: number of boosting rounds (trees)

learning_rate=0.3: how much each tree contributes

max_depth=6: max depth of each tree

use_label_encoder: deprecated; recent XGBoost releases (such as the 3.0.2 installed above) no longer use it, so it can be omitted

eval_metric='logloss' (for binary classification)
'''
model.fit(X_train, y_train)     # Builds trees sequentially; each new tree is fit to the gradient of the loss with respect to the current predictions.
print(model.score(X_test, y_test))

0.993006993006993


Key Advantages of XGBoost
| Feature                                        | Benefit                                              |
| ---------------------------------------------- | ---------------------------------------------------- |
| **Regularization** (`reg_alpha`, `reg_lambda`) | Reduces overfitting                                  |
| **Handling of missing values**                 | Automatically handled by the algorithm               |
| **Tree pruning**                               | Splits that reduce the loss by less than `gamma` are pruned; `max_depth` and `max_leaves` cap tree size |
| **Parallel training**                          | Faster than traditional GBM                          |
| **Early stopping support**                     | Can stop training when no improvement is seen        |
| **Custom loss functions**                      | Flexible for advanced use cases                      |


Important Parameters (Commonly Tuned)
| Parameter                  | Description                                                |
| -------------------------- | ---------------------------------------------------------- |
| `n_estimators`             | Number of boosting rounds (trees)                          |
| `max_depth`                | Maximum depth of trees                                     |
| `learning_rate` (or `eta`) | Shrinks the contribution of each tree                      |
| `subsample`                | Fraction of rows sampled for each tree (0 to 1)            |
| `colsample_bytree`         | Fraction of features sampled for each tree (0 to 1)        |
| `gamma`                    | Minimum loss reduction to make a split                     |
| `reg_alpha`                | L1 regularization                                          |
| `reg_lambda`               | L2 regularization                                          |
| `scale_pos_weight`         | Used for imbalanced data                                   |
| `objective`                | Specifies the learning task (default: `'binary:logistic'`) |


XGBoost Use Cases
| Domain       | Use Case                                        |
| ------------ | ----------------------------------------------- |
| Finance      | Fraud detection, credit scoring                 |
| Healthcare   | Disease prediction, risk classification         |
| Marketing    | Churn prediction, customer segmentation         |
| Competitions | Widely used in **Kaggle** and **AI challenges** |


In [None]:
'''#Example with Tuned Parameters

model = XGBClassifier(
    n_estimators=200,
    max_depth=4,
    learning_rate=0.1,
    subsample=0.8,
    colsample_bytree=0.8,
    eval_metric='logloss',
    random_state=42
)
'''

In [None]:
#Multilayer Perceptron (Neural Network) A type of artificial neural network.
''' 
Use: Deep learning for non-linear data and supervised learning tasks—especially classification

How it works: Layers of neurons connected with weights.
'''
from sklearn.neural_network import MLPClassifier        #- MLPClassifier is a feedforward artificial neural network used for classification.

model = MLPClassifier(max_iter=1000)
model.fit(X_train, y_train)     #- Internally, it uses backpropagation and gradient descent to adjust weights and minimize classification error.
print(model.score(X_test, y_test))


0.8881118881118881
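
Neural networks are sensitive to feature scale, and the breast cancer features span very different ranges. The sketch below assumes the same split as the earlier cells, standardizes the inputs, and inspects the weight matrices that connect the layers. The hidden layer sizes `(32, 16)` are an arbitrary choice for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=42),
)
mlp.fit(X_train, y_train)
print(mlp.score(X_test, y_test))

# The last pipeline step is the fitted network; coefs_ holds one weight matrix per layer transition.
network = mlp[-1]
weight_shapes = [w.shape for w in network.coefs_]
print(weight_shapes)
```

The shapes come out as `(30, 32)`, `(32, 16)`, and `(16, 1)`: 30 input features, two hidden layers, and a single output unit for the binary target.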


Use Cases (MLP Classifier)

| Use Case | Description |
| -------- | ----------- |
| 📧 Spam detection | Classifies emails as spam or not |
| 🖼️ Image classification | Distinguishes digits, animals, etc. |
| 🏥 Medical diagnosis | Predicts conditions from patient data |
| 🎓 Student performance | Predicts pass/fail based on study patterns |






In [None]:
#Histogram-based Gradient Boosting
''' 
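Use: Scalable boosting for big data. It is faster and more memory-efficient than traditional Gradient Boosting, especially on large datasets.

How it works: Bins each continuous feature into a small number of integer-valued bins (histograms) before growing trees, so split finding scans bins instead of every sorted value.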
'''
from sklearn.ensemble import HistGradientBoostingClassifier #- It uses gradient boosting with decision trees and histogram-based binning to speed up computation.

model = HistGradientBoostingClassifier()    #- Handles missing values natively and has built-in regularization; categorical features must be declared via the categorical_features parameter.
model.fit(X_train, y_train)     #- Internally, it builds an ensemble of decision trees, where each tree corrects the mistakes of the previous one.
print(model.score(X_test, y_test))


0.993006993006993


| Algorithm           | Best For                | Pros                 | Cons                         |
| ------------------- | ----------------------- | -------------------- | ---------------------------- |
| Logistic Regression | Simple binary problems  | Fast, interpretable  | Limited to linear boundaries |
| KNN                 | Low-dimensional data    | Easy to implement    | Slow on large data           |
| SVM                 | High-dimensional data   | Effective, flexible  | Needs tuning                 |
| Decision Tree       | Rule-based logic        | Easy to understand   | Overfitting                  |
| Random Forest       | Ensemble learning       | Accurate             | Less interpretable           |
| Naive Bayes         | Text data               | Fast, scalable       | Assumes independence         |
| Gradient Boosting   | Performance tasks       | High accuracy        | Training time                |
| MLP/Neural Net      | Complex non-linear data | Deep learning ready  | Needs large data             |
| HGB                 | Scalable gradient boost | Very fast & accurate | Little speed advantage on small datasets |
