# Support Vector Machine (SVM)

https://www.youtube.com/watch?v=efR1C6CvhmE




## Bias and Variance

https://www.youtube.com/watch?v=EuBBz3bI-aA

- The inability of a machine learning method (like linear regression) to capture the true relationship is called **bias**. Because the **Straight Line** can't be curved like the "true" relationship, it has a relatively large amount of **bias**.

- Another machine learning method might fit a **Squiggly Line** to the training set... The **Squiggly Line** is super flexible and hugs the training set along the arc of the true relationship. The **Squiggly Line** has very little **bias**.

- In Machine Learning lingo, the difference in fits between training sets and test sets is called **variance**.

- The **Squiggly Line** has **low bias**, since it is flexible and can adapt to the curve in the relationship between weight and height. In contrast, the **Straight Line** has relatively **high bias**, since it cannot capture the curve in the relationship between weight and height... but the **Straight Line** has relatively **low variance**, because the Sums of Squares are very similar for different datasets.

- In other words, the **Straight Line** might only give good predictions, and not great predictions. But they will be **consistently** good predictions.

- Three commonly used methods for finding the sweet spot between simple and complicated models are: **regularization, boosting, and bagging**.
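The Straight Line versus Squiggly Line comparison above can be sketched in a few lines. This is only an illustration under stated assumptions: the weight/height data is made up (a curve plus noise), the helper names are hypothetical, and the "squiggly line" is stood in for by a 1-nearest-neighbour predictor, chosen because it passes through every training point.

```python
import math
import random

def make_data(n, rng):
    """Hypothetical mouse data: height follows a curve in weight, plus noise."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [math.sqrt(x) + rng.gauss(0.0, 0.2) for x in xs]
    return xs, ys

def fit_straight_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a predictor function."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def fit_squiggly_line(xs, ys):
    """Assumed stand-in for the squiggly line: predict the height of the
    closest training weight, so every training point is hit exactly."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def sum_of_squares(predict, xs, ys):
    """Sum of squared differences between observed and predicted heights."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))

rng = random.Random(0)
train_x, train_y = make_data(20, rng)
test_x, test_y = make_data(20, rng)

results = {}
for name, fit in [("straight", fit_straight_line), ("squiggly", fit_squiggly_line)]:
    predict = fit(train_x, train_y)
    results[name] = (sum_of_squares(predict, train_x, train_y),
                     sum_of_squares(predict, test_x, test_y))
    print(name, "train SS = %.3f, test SS = %.3f" % results[name])
```

The squiggly predictor has a training Sum of Squares of zero (low bias), so any error it makes on the test set shows up as a gap between the two fits (variance). The straight line never fits the training set perfectly, but its two Sums of Squares tend to be closer to each other.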


## Cross Validation

https://www.youtube.com/watch?v=fSytzGwwBVw

- **Cross Validation** allows us to compare different machine learning methods and get a sense of how well they will work in practice.

- Using machine learning lingo, we need data to:
    - **Train** the machine learning methods.
    - **Test** the machine learning methods.

- Rather than worry too much about which set of data would be best for testing, **cross validation** uses them all, one at a time, and summarizes the results at the end. 

- In this example, the **support vector machine** did the best job classifying the test datasets, so we'll use it!
    - The data can be split into any number of blocks, e.g. Four-Fold Cross Validation or Ten-Fold Cross Validation.
    - If a method has a tuning parameter, we could use 10-fold cross validation to help find the best value for that tuning parameter.
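The procedure above (use each block as the test set once, then add up the results) could look roughly like the sketch below. Everything here is an assumption for illustration: the weights and labels are invented, and the two "methods" (`fit_midpoint`, `fit_majority`) are toy classifiers standing in for real ones such as a support vector machine.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices); each block is the test set once."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def cross_validate(fit, xs, ys, k):
    """Return the total number of correctly classified test samples."""
    correct = 0
    for train, test in k_fold_splits(len(xs), k):
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        correct += sum(predict(xs[i]) == ys[i] for i in test)
    return correct

def fit_midpoint(xs, ys):
    """Toy method 1: threshold halfway between the two class means."""
    zeros = [x for x, y in zip(xs, ys) if y == 0]
    ones = [x for x, y in zip(xs, ys) if y == 1]
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: 1 if x >= threshold else 0

def fit_majority(xs, ys):
    """Toy method 2: always predict the most common label (ties go to 0)."""
    label = 1 if 2 * sum(ys) > len(ys) else 0
    return lambda x: label

# Hypothetical data: mouse weights and whether each mouse is obese (1) or not (0).
weights = [18, 35, 22, 31, 20, 37, 24, 33]
obese = [0, 1, 0, 1, 0, 1, 0, 1]

scores = {
    "midpoint": cross_validate(fit_midpoint, weights, obese, k=4),
    "majority": cross_validate(fit_majority, weights, obese, k=4),
}
print(scores)  # {'midpoint': 8, 'majority': 4}
```

The method with the most correct test classifications across all folds is the one to keep. The same loop could be run once per candidate value of a tuning parameter to pick the best value.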


## ROC and AUC

https://www.youtube.com/watch?v=4jRBRDbJemM

- When we're doing **Logistic Regression**, the y-axis is converted to the probability that a mouse is obese. So this **Logistic Regression** tells us the ***probability*** that a mouse is obese based on its weight.

    - However, if we want to *classify* the mice as obese or not obese, then we need a way to turn probabilities into classifications, such as setting up a threshold at 0.5.

- Now we create a **confusion matrix** to summarize the classifications.
![jupyter](./figs/confusion_matrix.png)

    - Once the **Confusion Matrix** is filled in, we can calculate **Sensitivity and Specificity** to evaluate this **Logistic Regression** when 0.5 is the threshold for obesity.
    
    - For example, if it was super important to correctly classify every **obese** sample, we could set the threshold to 0.1 ...
        - The lower threshold would increase the number of **False Positives (area I)**.
        - The lower threshold would also reduce the number of **False Negatives (area III)**, because all of the **obese** mice would be correctly classified.
        - And it would reduce the number of **True Negatives (area IV)**, because two of the mice that were not obese would be incorrectly classified as obese.
        
    - **In some cases, it's absolutely essential to correctly classify *every* sample infected with a disease in order to minimize the risk of an outbreak, and that means lowering the threshold, even if that results in more *False Positives*.**
    
    - With this data, a **higher threshold** does a better job classifying samples as obese or not obese.
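To make the effect of the threshold concrete, here is a small sketch that fills in a confusion matrix at thresholds 0.5 and 0.1 and computes Sensitivity and Specificity. The labels and predicted probabilities are invented for illustration (they are not the mice from the video), and the function names are hypothetical.

```python
def confusion_matrix(labels, probabilities, threshold):
    """Count TP, FP, FN, TN when 'obese' means probability >= threshold."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for label, p in zip(labels, probabilities):
        predicted = 1 if p >= threshold else 0
        if predicted == 1:
            counts["TP" if label == 1 else "FP"] += 1
        else:
            counts["FN" if label == 1 else "TN"] += 1
    return counts

def sensitivity(c):
    """Proportion of obese samples that were correctly classified."""
    return c["TP"] / (c["TP"] + c["FN"])

def specificity(c):
    """Proportion of not-obese samples that were correctly classified."""
    return c["TN"] / (c["TN"] + c["FP"])

# Hypothetical data: 1 = obese, 0 = not obese, with made-up model probabilities.
labels = [0, 0, 0, 1, 0, 1, 1, 1]
probabilities = [0.05, 0.15, 0.30, 0.40, 0.60, 0.70, 0.85, 0.95]

for threshold in (0.5, 0.1):
    c = confusion_matrix(labels, probabilities, threshold)
    print(threshold, c, sensitivity(c), specificity(c))
```

With this made-up data, lowering the threshold from 0.5 to 0.1 removes the one False Negative (Sensitivity goes from 0.75 to 1.0) at the cost of two extra False Positives (Specificity drops from 0.75 to 0.25).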

**How to decide which threshold is the best**

- Instead of being overwhelmed with confusion matrices, **Receiver Operating Characteristic (ROC)** graphs provide a simple way to summarize all of the information.

> The Y-axis shows the **True Positive Rate**, which is the same thing as **Sensitivity**

> True Positive Rate = Sensitivity = True Positives / (True Positives + False Negatives)

![jupyter](./figs/confusion_matrix2.png)

> The **True Positive Rate** tells you what proportion of obese samples were **correctly** classified.


> The X-axis shows the **False Positive Rate**, which is the same thing as (1 - **Specificity**)

> False Positive Rate = (1 - Specificity) = False Positives / (False Positives + True Negatives)

![jupyter](./figs/confusion_matrix3.png)

> The **False Positive Rate** tells you the proportion of not obese samples that were **incorrectly** classified and are False Positives.


- The **ROC graph** summarizes all of the confusion matrices that each threshold produced.

![jupyter](./figs/confusion_matrix4.png)

- The **AUC (Area Under the Curve)** makes it easy to compare one **ROC** curve to another: the curve with the larger **AUC** is the better one.
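As a rough sketch of how the ROC points and the AUC come out of the thresholds: the code below tries one threshold per distinct predicted probability, records (False Positive Rate, True Positive Rate) for each, and integrates with the trapezoid rule. The data is invented and the function names are hypothetical; the trapezoid rule is one common way to compute the area, assumed here for simplicity.

```python
def roc_points(labels, probabilities):
    """Return (FPR, TPR) pairs, one per threshold, from strictest to loosest."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # An infinite threshold classifies nothing as positive, giving the point (0, 0).
    thresholds = [float("inf")] + sorted(set(probabilities), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for y, p in zip(labels, probabilities) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(labels, probabilities) if p >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve using the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical data: 1 = obese, 0 = not obese, with made-up model probabilities.
labels = [0, 0, 0, 1, 0, 1, 1, 1]
probabilities = [0.05, 0.15, 0.30, 0.40, 0.60, 0.70, 0.85, 0.95]

points = roc_points(labels, probabilities)
for fpr, tpr in points:
    print("FPR = %.2f, TPR = %.2f" % (fpr, tpr))
print("AUC =", auc(points))  # AUC = 0.9375
```

An AUC of 1.0 would mean some threshold separates the two classes perfectly, while the diagonal line from (0, 0) to (1, 1) has an AUC of 0.5.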

- Although **ROC** graphs are drawn using **True Positive Rates** and **False Positive Rates** to summarize confusion matrices, there are other metrics that attempt to do the same thing.
    - People often replace the **False Positive Rate** with **Precision**.
    > Precision = True Positives / (True Positives + False Positives)

![jupyter](./figs/confusion_matrix5.png)

> Precision is the proportion of positive results that were correctly classified.
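A short sketch of the Precision formula next to the False Positive Rate it often replaces, using invented labels and probabilities (the function name is hypothetical):

```python
def precision_and_fpr(labels, probabilities, threshold):
    """Return (Precision, False Positive Rate) at the given threshold."""
    tp = fp = tn = 0
    for label, p in zip(labels, probabilities):
        predicted_obese = p >= threshold
        if predicted_obese and label == 1:
            tp += 1
        elif predicted_obese and label == 0:
            fp += 1
        elif not predicted_obese and label == 0:
            tn += 1
    # Precision is undefined when nothing is classified as positive.
    precision = tp / (tp + fp) if (tp + fp) else None
    fpr = fp / (fp + tn)
    return precision, fpr

# Hypothetical data: 1 = obese, 0 = not obese, with made-up model probabilities.
labels = [0, 0, 0, 1, 0, 1, 1, 1]
probabilities = [0.05, 0.15, 0.30, 0.40, 0.60, 0.70, 0.85, 0.95]

for threshold in (0.5, 0.1):
    print(threshold, precision_and_fpr(labels, probabilities, threshold))
```

Note that Precision does not use the number of True Negatives at all, whereas the False Positive Rate does.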
    
