# Machine Learning Fundamentals


## Problem Premise

Supervised machine learning problems typically start with pairs of (**observations**, **targets**):

$$(x_i, y_i) \text{ where } x_i \in \mathcal{X} \text{ and } y_i \in \mathcal{Y}$$

#### Examples of Observations and Targets in ML Problems

| Observation Space $\mathcal{X}$ | Target Space $\mathcal{Y}$ |
| ---: | :---: |
| Images | Image classes: "cat", "dog", ... |
| Images | Caption: "kids playing soccer" |
| Face Images | User's identity |
| Natural Images | Stylized Images (cartoons) |
| Signals of Human Speech | Text transcript of the speech |
| Sentence in English | Translation into Spanish |
| Demographic info: age, income | Other info: education, employment |
| Diet, lifestyle | Risk of Heart Disease |

## Generative vs Discriminative Models

**Generative Models** aim to describe the joint distribution of observations and targets.

$$\text{ in other words, } p(x_i, y_i) \text{ is modeled}$$

- Though costly to fit, generative models tend to converge faster than discriminative models
- Better performance with sparse data
- Insights into the physical process generating the data
- Assume that each point in the joint distribution *p(x, y)* is independent; accuracy drops if this assumption is violated
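
To make "modeling $p(x_i, y_i)$" concrete, here is a minimal sketch that estimates a joint distribution by counting and then derives a conditional from it. The toy dataset and the helper names `fit_joint` and `conditional` are assumptions made for illustration; they are not part of these notes.

```python
from collections import Counter

# Hypothetical toy dataset of (observation, target) pairs; values are illustrative only.
data = [
    ("sunny", "play"), ("sunny", "play"), ("sunny", "stay"),
    ("rainy", "stay"), ("rainy", "stay"), ("rainy", "play"),
    ("sunny", "play"), ("rainy", "stay"),
]

def fit_joint(pairs):
    """Estimate the joint distribution p(x, y) by relative frequency."""
    counts = Counter(pairs)
    total = len(pairs)
    return {xy: c / total for xy, c in counts.items()}

def conditional(joint, x):
    """Derive p(y | x) from the joint: p(x, y) divided by the sum of p(x, y') over y'."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

joint = fit_joint(data)
print(joint[("sunny", "play")])     # 3 of 8 pairs -> 0.375
print(conditional(joint, "sunny"))  # {'play': 0.75, 'stay': 0.25}
```

Because the whole joint table is available, the same model can answer questions about $p(x)$, $p(y)$, or $p(y \mid x)$, which is one way to read the "insights into the process generating the data" point.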

---

**Discriminative Models** aim to model the probability of a target given a known observation.

$$\text{ in other words, } p(y_i \mid x_i) \text{ is modeled}$$

- Weak assumptions about data and dependencies
- Little or no insight into data generation
- May require more training data for modest accuracy
- Slower convergence than generative models
- Fewer forms of bias
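
As a rough illustration of modeling $p(y_i \mid x_i)$ directly, the sketch below fits a one-dimensional logistic regression by gradient descent. The data, learning rate, and step count are assumptions chosen for this example, and `fit_logistic` is a hypothetical helper rather than a library function.

```python
import math

# Hypothetical 1-D dataset: x is a feature, y is a binary target (0 or 1).
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by gradient descent on the mean log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean log loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = fit_logistic(xs, ys)
p_low = sigmoid(w * 1.0 + b)   # estimated p(y=1 | x=1.0)
p_high = sigmoid(w * 4.0 + b)  # estimated p(y=1 | x=4.0)
print(p_low, p_high)
```

Note that nothing here describes how the `xs` themselves are distributed; the model only learns the boundary between the two classes.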

| Generative | Discriminative |
| ---: | :---: |
| Noisy linear functions | Linear Least Squares |
| Naïve Bayes | Logistic Regression |
| Hidden Markov Models | Conditional Random Fields |
| Gaussian mixture models | |
| Latent Dirichlet Allocation | |
| | Support Vector Machines (SVM) |
| | Decision trees + Random Forests |
| | Neural Networks |

*Question:* If neural networks are discriminative models, then why can we use a neural network to model the generative network in a GAN?


# ML Components

There are four components in a typical supervised machine learning problem:  

**Dataset, Model, Loss function, Optimization.**

## Dataset

The dataset defines the problem to be solved. There are two types of supervised ML problems:

1. Regression
2. Classification

**Regression** problems can be described with a dataset

$$(x_i, y_i) \text{ where } y_i \in \mathbb{R}$$

**Classification** problems can be described with a dataset

$$(x_i, y_i) \text{ where } y_i \in \{1, 2, \ldots, k\} $$

where *k* is the number of classes.
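
The two dataset types can be sketched as lists of pairs. The values below are made up, and `is_valid_classification` is a hypothetical helper that only checks the label set described above.

```python
# Hypothetical toy datasets; the values are invented for illustration.
regression_data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]          # y_i is a real number
classification_data = [(0.2, 1), (1.7, 3), (0.9, 2), (2.4, 3)]  # y_i is a label in {1, ..., k}

def is_valid_classification(pairs, k):
    """Check that every target is an integer class label between 1 and k."""
    return all(isinstance(y, int) and 1 <= y <= k for _, y in pairs)

print(is_valid_classification(classification_data, k=3))  # True
print(is_valid_classification(regression_data, k=3))      # False: targets are real-valued
```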

## Model

We can represent a model with a **prediction function**:

$$y = f(x)$$

where *y* takes a single value for a given *x*.
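
One simple instance of a prediction function is a linear model. The sketch below assumes a scalar input and the hypothetical parameters `w` and `b`; the notes do not commit to any particular form of $f$.

```python
def make_linear_model(w, b):
    """Return a prediction function f(x) = w * x + b."""
    def f(x):
        return w * x + b
    return f

f = make_linear_model(w=2.0, b=1.0)
y_hat = f(3.0)  # 2.0 * 3.0 + 1.0 = 7.0
print(y_hat)
```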

## Loss Function

A **loss function** measures the difference between a predicted target value and the target value in the data.

*Why do we generalize accuracy metrics with a loss function?*

- There may be no "true" target value *y* for *x*
- There may be unmodeled effects such as noise in the dataset
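
Two common choices illustrate how a loss generalizes accuracy: the squared loss for regression and the 0-1 loss for classification. Averaging the 0-1 loss gives the error rate, which is one minus the accuracy. The function names and sample values below are assumptions for this sketch.

```python
def squared_loss(y_hat, y):
    """Squared difference between a prediction and a target (regression)."""
    return (y_hat - y) ** 2

def zero_one_loss(y_hat, y):
    """0 if the predicted class matches the target class, 1 otherwise (classification)."""
    return 0 if y_hat == y else 1

print(squared_loss(2.5, 3.0))  # (2.5 - 3.0)^2 = 0.25

# Hypothetical class predictions and labels.
preds = [1, 2, 2, 3]
labels = [1, 2, 3, 3]
error_rate = sum(zero_one_loss(p, t) for p, t in zip(preds, labels)) / len(labels)
print(error_rate)  # one mistake out of four -> 0.25, i.e. accuracy 0.75
```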

**Risk vs Loss Function**

Since the target data is given during training, we define the loss function based on the known data. *How do we measure accuracy on data we haven't seen yet?* We use the **expected loss:**

$$\mathbb{E}[(\hat{y} - y)^2]$$

The expected loss is called the **risk.** The loss averaged over the available data is called the **empirical risk.**

Minimizing the empirical risk instead of the true risk is generally fine, unless:
1. The **data sample is biased**
2. There is **not enough data**
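
The effect of sample size can be sketched with a simulated data-generating process where the true risk is known. Everything below is an assumption for illustration: the process $y = 2x + \text{noise}$, the unit noise variance, and the sample sizes. For the predictor $f(x) = 2x$ the true risk under squared loss equals the noise variance, 1.0.

```python
import random

def squared_loss(y_hat, y):
    return (y_hat - y) ** 2

def empirical_risk(f, pairs):
    """Average loss of the prediction function f over a finite sample."""
    return sum(squared_loss(f(x), y) for x, y in pairs) / len(pairs)

def sample(n, rng):
    """Hypothetical data-generating process: y = 2x + Gaussian noise with std 1."""
    pairs = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        pairs.append((x, 2.0 * x + rng.gauss(0.0, 1.0)))
    return pairs

def f(x):
    return 2.0 * x  # noise-free part of the process; true risk is 1.0 here

rng = random.Random(0)
small = empirical_risk(f, sample(5, rng))        # noisy estimate from very little data
large = empirical_risk(f, sample(100_000, rng))  # should land close to the true risk of 1.0
print(small, large)
```

With only a handful of points the empirical risk can sit far from the true risk, which is the "not enough data" case; a biased sample would shift the estimate no matter how many points are drawn.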

## Optimization

**Optimization** is the technique we use to tune the parameters of a model so that the predicted value is close to the target value.
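
As one possible illustration of parameter tuning, the sketch below uses plain gradient descent to fit a linear model by minimizing the mean squared error. The data, the learning rate, and the helper `fit_linear` are assumptions for this example; other optimizers would serve equally well.

```python
# Hypothetical data generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

def fit_linear(xs, ys, lr=0.05, steps=2000):
    """Minimize the mean squared error of f(x) = w*x + b by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = fit_linear(xs, ys)
print(w, b)  # approaches w = 2, b = 1
```

The learning rate matters: on this data a value much larger than about 0.15 would make the updates diverge instead of converging.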

## Support Vector Machines

## Linear Regression

[Optimization](./Optimization-5305d7a6-51dd-418e-bb8a-640c44f88543.md)

[Support Vector Machines](./SVM-approach.md)