## Objective of this assignment
In this assignment, you will choose two classical machine learning algorithms implemented by
scikit-learn, spend some time researching them, then describe each algorithm in your own words,
compare and contrast their strengths and weaknesses, and apply both algorithms to the same
dataset of your choice.

## Part 1: Algorithm Selection
#### I will be looking at Logistic Regression and Support Vector Machine models

## Part 2: Description

### Main Concepts
**Logistic Regression Model**
1. Predicts a binary outcome (Yes/No, 0/1) by modeling the probability that a data point belongs to the positive class.
2. Uses a linear combination of features, then squashes that value through the logistic (sigmoid) function to get a probability between 0 and 1.
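
Written out, the two points above amount to the following. The symbols $w$ (weights), $b$ (intercept), and $x$ (feature vector) are notation chosen here for illustration:

$$
p(y = 1 \mid x) = \sigma(w^\top x + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
$$

A point is assigned to class 1 when this probability is at least 0.5, which happens exactly when $w^\top x + b \ge 0$.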

**Support Vector Machine (SVM) Model**
1. Finds a decision boundary (a hyperplane) that separates classes with the maximum margin (the distance from the boundary to the closest points of each class).
2. Can create non-linear boundaries using kernels (e.g., RBF) by implicitly mapping data to higher dimensions.
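
As a sketch of what "maximum margin" means, assume labels $y_i \in \{-1, +1\}$ and perfectly separable data (the hard-margin case). The notation $w$, $b$, $x_i$, and $\gamma$ is chosen here for illustration:

$$
\min_{w,\, b} \ \tfrac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad
y_i \,(w^\top x_i + b) \ge 1 \ \text{ for all } i
$$

The margin width is $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ maximizes the margin. The RBF kernel mentioned above replaces dot products between points with

$$
K(x, x') = \exp\!\left(-\gamma \lVert x - x' \rVert^2\right)
$$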

### How the Algorithms Work
**Logistic Regression**
1. Compute a linear combination of the features (weights times features, plus an intercept).
2. Pass that value through the sigmoid function to turn it into a probability.
3. Compare the predicted probabilities to the real outcomes using the log loss.
4. Adjust the weights to reduce that loss.
5. Repeat until the loss stops improving, which gives the learned boundary between classes.
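
The five steps above can be sketched from scratch in plain Python. This is a minimal illustration using batch gradient descent on a tiny made-up dataset. It is not how scikit-learn fits the model (its `LogisticRegression` uses the `lbfgs` solver by default), and the function names, learning rate, and epoch count are assumptions chosen for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and intercept by batch gradient descent on the log loss."""
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):                      # step 5: repeat
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b   # step 1: linear combination
            p = sigmoid(z)                                   # step 2: probability
            err = p - yi                                     # step 3: prediction vs. outcome
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # step 4: move the weights against the average gradient
        w = [wj - lr * gj / n_samples for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n_samples
    return w, b


def predict_proba(X, w, b):
    """Probability of class 1 for each row of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


# Tiny made-up 1-D dataset: negative values are class 0, positive are class 1.
X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(X, y)
probs = predict_proba(X, w, b)
labels = [int(p >= 0.5) for p in probs]
```

Thresholding the probabilities at 0.5 turns them into class labels, which is the same as checking the sign of the linear combination.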

**Support Vector Machine**
1. Find a separating line (or surface) between the classes.
2. Maximize the distance between that boundary and the closest points (the support vectors).
3. Use the kernel trick to handle data that is not linearly separable.
4. Control flexibility with the regularization parameter `C`: a small `C` allows a wider margin with more misclassified points, while a large `C` penalizes misclassification more heavily.
5. Predict the class of a new point from the side of the boundary it falls on.
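
Steps 1, 2, 4, and 5 can be sketched for the linear case with sub-gradient descent on the soft-margin (hinge loss) objective. This is only an illustration: it leaves out kernels (step 3), and scikit-learn's `SVC` relies on the libsvm solver rather than this loop. The function names, learning rate, epoch count, and the made-up data are assumptions for the example. Labels are encoded as -1 and +1.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def fit_linear_svm(X, y, C=1.0, lr=0.01, epochs=2000):
    """Minimize 0.5*||w||^2 + C * sum(max(0, 1 - y_i*(w.x_i + b)))."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)        # gradient of the 0.5*||w||^2 term (favors a wide margin)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (dot(w, xi) + b)
            if margin < 1:      # point is inside the margin or misclassified
                for j in range(n_features):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(X, w, b):
    """Step 5: the class is the side of the boundary the point falls on."""
    return [1 if dot(w, xi) + b >= 0 else -1 for xi in X]


# Tiny made-up 2-D dataset with two well-separated clusters.
X = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]]
y = [-1, -1, -1, 1, 1, 1]

w, b = fit_linear_svm(X, y, C=1.0)
preds = predict(X, w, b)
```

Only points with a margin below 1 contribute to the update, which reflects the idea that the boundary is determined by the points closest to it.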

## Part 3: Comparison and Key Differences: Logistic Regression vs SVM

| **Aspect**                         | **Logistic Regression (LR)**                                                      | **Support Vector Machine (SVM)**                                                             |
| ---------------------------------- | --------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------- |
| **Type of Model**                  | Probabilistic (outputs probabilities)                                             | Geometric / Margin-based (finds best boundary)                                               |
| **Main Idea**                      | Finds a linear boundary that best fits data by minimizing the log loss.           | Finds the hyperplane that maximizes the margin between classes.                              |
| **Output**                         | Predicts probabilities (e.g., “70% chance of class 1”).                           | Predicts class labels directly (in scikit-learn, probability estimates require `probability=True`). |
| **Decision Boundary**              | Linear: the set of points where the predicted probability is 0.5, with probabilities changing smoothly across it. | Defined by the support vectors; linear or non-linear depending on the kernel. |
| **Handling Non-linearity**         | Needs feature engineering (like polynomial features) to handle non-linear data.   | Uses kernel functions (RBF, polynomial) to model complex, non-linear patterns.               |
| **Regularization Parameter**       | Controlled by `C` (inverse of regularization strength).                           | Also controlled by `C` (trade-off between margin width and misclassification).               |
| **Interpretability**               | <font color="green">Easy to interpret: coefficients show feature influence.</font> | Harder to interpret, especially with non-linear kernels.                                     |
| **Computation Speed**              | <font color="green">Faster on large datasets.</font>                                                         | Slower on large datasets (especially with non-linear kernels).                               |
| **Sensitivity to Feature Scaling** | Less sensitive, but scaling helps the solver converge and keeps regularization even across features. | Highly sensitive: scaling is essential.                                    |
| **Typical Use Cases**              | When interpretability and probabilities matter (e.g., credit scoring, marketing). | <font color="green">When accuracy and clear separation matter (e.g., image classification, text classification).</font> |

<font color='green'>_Green marks a strength of one model that is a corresponding weakness of the other._</font>
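
Several rows of the table (output type, handling of non-linearity, the `C` parameter, and feature scaling) can be seen side by side in a short scikit-learn sketch. It uses synthetic `make_moons` data as a stand-in, not the dataset from Part 4, and the sample size, noise level, and `C` values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, non-linearly separable data (two interleaving half circles).
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Both models sit behind a StandardScaler, since scaling matters for each.
log_reg = make_pipeline(StandardScaler(), LogisticRegression(C=1.0))
svm_rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))

log_reg.fit(X, y)
svm_rbf.fit(X, y)

# Logistic regression exposes class probabilities directly.
lr_proba = log_reg.predict_proba(X[:5])          # shape (5, 2), rows sum to 1

# SVC exposes signed scores relative to the boundary, plus hard labels.
svm_scores = svm_rbf.decision_function(X[:5])    # shape (5,)
svm_labels = svm_rbf.predict(X[:5])

# Logistic regression coefficients: one weight per (scaled) feature.
lr_coef = log_reg.named_steps["logisticregression"].coef_

print("LR probabilities:\n", lr_proba)
print("SVM decision scores:", svm_scores)
print("LR coefficients:", lr_coef)
print("Training accuracy (LR, SVM):", log_reg.score(X, y), svm_rbf.score(X, y))
```

The logistic regression coefficients can be read as feature influence on the scaled inputs, while the RBF SVM offers no comparable per-feature weights.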

## Part 4: Application of Dataset

### Developing two independent models on the Social Network Ads dataset from Kaggle

[Social Network Ads @ Kaggle](https://www.kaggle.com/datasets/d4rklucif3r/social-network-ads)

##### Objective: The dataset used in these models records whether a person of a given age and income purchased a product.

#### ❓ Questions I Seek to Answer

1. Did the user purchase the advertised product or not?

The modelling process is kept in a separate notebook for the sake of separation of concerns.

**Here is the link to both ML models used on this dataset:**
[Logistic Regression and SVM model](social_ntwrk_model.ipynb)