**Author:** Vivek Singh Solanki
**Book:** Approaching Almost Any Machine Learning Problem.
**Date:**  04-03-2023


# Machine Learning
## Supervised v/s Unsupervised ML
  * **Supervised ML:** Models learn from labelled data to predict one or more target variables.
    * <b>Classification v/s Regression:</b> Predicting a category v/s a numeric value.
  * <b>Un-supervised ML:</b> There is no target variable; the goal is to find patterns in the data or group similar instances.
    * __Clustering:__ Instances with no target variable can be grouped into clusters, which helps us identify patterns. <br> e.g. credit card transactions: so much data arrives every second that it is very difficult for humans to mark each transaction (trx) as fraud or genuine. We can use clustering to group those transactions, find patterns, and try to spot abnormal behaviour.
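
To make the clustering idea concrete, here is a minimal sketch of k-means on a single feature. The transaction amounts are made up for illustration, and the evenly spaced initial centres are a simplifying assumption (real implementations usually initialise randomly and work on many features).

```python
def kmeans_1d(values, k=2, n_iter=20):
    """Tiny 1-D k-means (k >= 2): returns (centers, labels)."""
    # Deterministic initialisation (an assumption for this sketch):
    # spread the initial centres evenly between min and max.
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each centre moves to the mean of its points.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, labels


# Made-up transaction amounts; the last two are unusually large.
amounts = [12.0, 15.5, 9.9, 14.2, 11.3, 980.0, 1020.0]
centers, labels = kmeans_1d(amounts, k=2)
print(labels)   # the two large amounts end up in their own cluster
```

A small, isolated cluster like this is only a hint that something is unusual; whether it is fraud still has to be checked separately.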

## Cross-Validation
* It's a step in the process of building an ML model that checks the model *fits the data accurately* and does not *overfit*.
* Helps us choose the best model from a set of candidate hypotheses.
* Helps us estimate how the model will perform in production by mimicking the live environment using the existing data.

##### Overfitting:
* When a model learns the training data well but fails to generalize to unseen samples/test data: high performance on the train set but much lower on the test set.
* A hypothesis overfits the training examples if some other hypothesis that fits the training data less well performs better on unseen data, i.e. over the entire distribution of instances.
* High Variance, Low Bias.

*Occam's Razor* states that one should not complicate things that can be solved in a much simpler manner.

**Overfitting arises when:**
   * We use a complex hypothesis/model where a simpler one would do.
   * There is noise in the data.
   * The \# of training examples is too small to produce a representative sample of the true target function.
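
The sketch below illustrates the train/test gap on synthetic data. Everything in it is an assumption made for the example: the data generator (true rule `x > 0.5` with 20% label noise), the "complex" model (1-nearest-neighbour, which memorises the training set) and the "simple" model (a hand-picked cut at 0.5, not learned).

```python
import random

random.seed(0)


def make_data(n):
    """Synthetic data: the true rule is x > 0.5, with 20% of labels flipped."""
    data = []
    for _ in range(n):
        x = random.random()
        y = int(x > 0.5)
        if random.random() < 0.2:   # label noise
            y = 1 - y
        data.append((x, y))
    return data


train, test = make_data(200), make_data(2000)


def predict_1nn(x):
    """Complex model: copy the label of the nearest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


def predict_threshold(x):
    """Simple model: a single fixed cut at 0.5 (hand-picked for this sketch)."""
    return int(x > 0.5)


def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)


for name, model in [("1-NN", predict_1nn), ("threshold", predict_threshold)]:
    print(name, "train:", accuracy(model, train), "test:", accuracy(model, test))
```

The 1-NN model reproduces every training label, noise included, so its train accuracy is perfect while its test accuracy drops; the simple rule cannot memorise the noise and so shows a much smaller gap between train and test.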

##### How is Cross-Validation done?
The simplest approach divides the data into two sets: one to train the models and another (the hold-out/validation/test set) to measure performance. There are several ways to cross-validate a model:
 * __Hold-out:__ The split described above. Useful when the data is large and it's expensive to train the model several times, for time-series data, etc.
 * __k-fold:__ Divide the data randomly into k mutually exclusive sets, use k-1 sets for training and the remaining set for testing, and repeat this k times so that each set is used for testing once.
 * __stratified k-fold:__ When classes are highly imbalanced, dividing randomly could leave no or very few samples of the small class in some folds, so you need to preserve the class proportions in each of the k sets.
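 * __Leave-one-out:__ k-fold with k equal to the number of samples; each sample is used once as a single-item test set.
 * __Group k-fold:__ k-fold where all samples belonging to the same group are kept in the same fold, so a group never appears in both the training and the test set.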
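
A minimal sketch of how plain and stratified k-fold splits can be generated with the standard library only. The function names and the round-robin dealing are choices made for this example; in practice scikit-learn's `KFold`, `StratifiedKFold`, `LeaveOneOut` and `GroupKFold` (in `sklearn.model_selection`) cover these cases.

```python
import random
from collections import defaultdict


def k_fold_indices(n_samples, k, seed=42):
    """Yield (train_idx, test_idx) pairs for plain k-fold."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]   # k disjoint folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def stratified_k_fold_indices(labels, k, seed=42):
    """Like k-fold, but each fold keeps roughly the overall class ratio."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    rng = random.Random(seed)
    for members in by_class.values():
        rng.shuffle(members)
        # Deal each class's samples round-robin across the folds.
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


# Imbalanced toy labels: 90 negatives, 10 positives.
labels = [0] * 90 + [1] * 10
for train_idx, test_idx in stratified_k_fold_indices(labels, k=5):
    # Each test fold holds 20 samples, 2 of them positive.
    print(len(train_idx), len(test_idx), sum(labels[i] for i in test_idx))
```

With plain k-fold on the same labels, the number of positives per fold depends on the shuffle and a fold could end up with none, which is the problem stratification avoids.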


## Evaluation Metrics
  * Classification Metrics:
    * Accuracy, Precision, Recall, F1-Score, TPR, FPR, Area under the ROC curve
    * Log-Loss
    * Precision at k (P@K)

  * Regression
    * Mean Absolute Error
    * Mean Squared Error / Root MSE
    * RMSLE
    * MAPE: Mean Absolute Percentage Error
    * R squared

#### Classification Metrics
###### Confusion Matrix / Error Matrix
<div><img src="images/ConfusedMat.png" width="1000"/></div><br><br><br>

  * FP, False Positive, False Alarm, type-I error: A test result which wrongly indicates a condition is **present** when it is actually absent.
  * FN, False Negative, Miss, type-II error: A test result which wrongly indicates a condition is **absent** when it is actually present.
  * Prevalence: % of +ve class instances
  * Accuracy: What % of instances are correctly classified?
    * = (TP+TN)/(TP+FP+TN+FN) => `# of correctly classified instances` / `# total instances`

  * Precision: What % of +ve predicted instances are actually +ve?
     * = TP/(TP+FP) => `# of True Positives`/ `# of +ve predicted instances`

  * Recall / Hit-Rate: What % of actual +ve instances detected?
    * = TP/(TP+FN) => `# of True Positives`/ `# of actual +ve instances`
    * Also Called, True Positive Rate (TPR), Sensitivity.

  * Miss-rate / False Negative Rate (FNR): What % of actual +ve instances are not detected?
    * = FN/(TP+FN) => `# of +ve incorrectly classified`/ `# of actual +ve instances`

  * FPR / False Positive Rate / False Alarm Rate: What % of actual -ve instances are detected as +ve?
    * = FP/(FP+TN) = 1 - TNR

  * True Negative Rate (TNR) / Specificity: What % of actual -ve instances are rejected (detected as -ve)?
    * = TN/(TN+FP)

  * F1-Score: Harmonic mean of Precision (P) and Recall (R); = 2*P*R / (P+R)
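
The formulas above can be checked on a small worked example. This is a sketch with made-up labels (1 = +ve, 0 = -ve); the function names are chosen for illustration, and there is no guard against empty classes (division by zero).

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = +ve, 0 = -ve)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Compute the metrics listed above from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "prevalence": (tp + fn) / total,
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,              # TPR / sensitivity / hit-rate
        "miss_rate": fn / (tp + fn),   # FNR = 1 - recall
        "fpr": fp / (fp + tn),         # 1 - TNR
        "tnr": tn / (tn + fp),         # specificity
        "f1": 2 * precision * recall / (precision + recall),
    }


# Made-up example: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred))   # (3, 1, 5, 1) -> TP, FP, TN, FN
for name, value in classification_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.3f}")
```

Here TP=3, FP=1, TN=5, FN=1, so accuracy is 8/10 while precision and recall are both 3/4.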

Some terms which go hand in hand:
* Precision - Recall (TPR): For P-R curves
* Sensitivity (TPR)- Specificity (TNR): For ROC curve, Sensitivity v/s 1-Specificity
* TPR - FPR: ROC Curves
* Hit Rate (TPR) - Miss Rate (FNR)

##### P-R Curve v/s ROC curve
**Why are P-R or ROC curves used in classification?**
When a model predicts the probability of the +ve class (e.g. logistic regression), we have the flexibility to assign the +ve class only if the probability is above a threshold. <br>We can then choose the threshold based on the problem, where the cost of one type of error outweighs the cost of the other, i.e. type I vs type II errors. P-R and ROC curves are the two diagnostic tools that help interpret probabilistic forecasts for binary classification.

**P-R Curve:**
* Summarizes the trade-off between Recall (TPR) and Precision for a predictive model.
* Recall on the x-axis and Precision on the y-axis (the usual convention). As the threshold changes, Precision typically increases when Recall decreases and vice versa.
* Helps find a threshold that gives good Recall while keeping Precision sufficient, or good Precision while keeping Recall sufficient, depending on the problem.
* Preferred when classes are highly imbalanced.
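
A small sketch of how the points of a P-R curve come from sweeping the threshold. The labels and probabilities are made up, and returning a precision of 1.0 when nothing is predicted +ve is a convention assumed here, not something fixed by the definition.

```python
def precision_recall_at(y_true, y_prob, threshold):
    """Precision and recall when predicting +ve for prob >= threshold."""
    y_pred = [int(p >= threshold) for p in y_prob]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    # Assumed convention: precision is 1.0 when no +ve predictions are made.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels and predicted probabilities of the +ve class.
y_true = [0, 0, 1, 0, 1, 1]
y_prob = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]

for thr in (0.2, 0.5, 0.8):
    p, r = precision_recall_at(y_true, y_prob, thr)
    print(f"threshold={thr}: precision={p:.2f}, recall={r:.2f}")
```

On this data, raising the threshold from 0.2 to 0.8 moves precision from 0.60 up to 1.00 while recall falls from 1.00 to 0.33, which is the trade-off the curve summarizes.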

**ROC Curve / Receiver Operating Characteristic Curve**
* Summarizes the trade-off between TPR and FPR for a predictive model.
* Sensitivity/Recall/Hit-Rate/TPR on y-axis v/s 1-specificity/1-TNR/FPR on x-axis.
* Helps find a threshold that gives a good TPR at a tolerable FPR, depending on the problem (cost).
* Used for balanced or moderately imbalanced classes.

**AUC - Area Under the Curve (ROC/PR) is another metric to assess the model.** For the ROC curve:
  * 0 <= AUC < 0.5: Worse than random. There is probably a logic or code bug; you might be assigning the probability to the wrong class.
  * AUC = 0.5: Equivalent to assigning classes at random.
  * 0.5 < AUC <= 1: Better than a random model; 1 means a perfect model.
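
A minimal sketch of computing the ROC curve and its AUC by hand, using the trapezoidal rule. The data is made up and the helper names are chosen for this example; it assumes both classes are present.

```python
def roc_points(y_true, y_prob):
    """Return ROC points (FPR, TPR), one per distinct threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]   # threshold above every score: nothing predicted +ve
    for thr in sorted(set(y_prob), reverse=True):
        tp = sum(t == 1 and p >= thr for t, p in zip(y_true, y_prob))
        fp = sum(t == 0 and p >= thr for t, p in zip(y_true, y_prob))
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Made-up labels and predicted probabilities of the +ve class.
y_true = [0, 0, 1, 0, 1, 1]
y_prob = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]

print(round(auc(roc_points(y_true, y_prob)), 3))   # 8/9, about 0.889

# Scoring the wrong class (1 - p) mirrors the curve: AUC becomes 1 - 8/9.
flipped = [1 - p for p in y_prob]
print(round(auc(roc_points(y_true, flipped)), 3))  # 1/9, about 0.111
```

The second call shows why an AUC well below 0.5 usually points to swapped classes or inverted probabilities: flipping the scores turns it into a value equally far above 0.5.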

<div><img src="images/ROC.png" width="300"/> <img src="images/PR.png" width="300"/></div><br><br><br>

#### Regression Metrics:
* R-squared: quantifies how well your model explains the variability of the target variable
    * = 1 - `variance not explained by the model` / `variance of the target variable`
    * = 1 - `squared error of the model` / `squared error of the mean model`
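
The regression metrics listed earlier can be sketched in a few lines. The values are made up, and the function assumes strictly positive targets so that RMSLE and MAPE are defined.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE, RMSLE, MAPE (%) and R-squared for two equal-length lists."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # RMSLE: RMSE computed on log(1 + y); needs non-negative values.
    rmsle = math.sqrt(
        sum((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)) / n
    )
    # MAPE: undefined when a true value is 0.
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n * 100
    # R-squared: 1 - (squared error of the model / squared error of the mean model).
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": mae,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "rmsle": rmsle,
        "mape": mape,
        "r2": 1 - ss_res / ss_tot,
    }


# Made-up targets and predictions.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
for name, value in regression_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.4f}")
```

For this data the model's squared error is 5.25 and the mean model's is 14.75, so R-squared is 1 - 5.25/14.75, roughly 0.64. A model that always predicts the mean would score 0, and a model worse than that would go negative.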

