### Supervised and Unsupervised Learning

*Supervised learning* is an approach in machine learning where we have a *labeled* dataset, meaning that each data point consists of certain *features* and a corresponding label. The labels supervise the learning algorithm, whose goal is to infer a function that maps input feature vectors to output labels. In supervised learning, unlike unsupervised learning, we have a *ground truth*: we know the correct outputs for the input samples in the training set. We use **feature extraction** techniques to describe the raw data with features that are useful for our predictive model, e.g. the RGB values of an image. Sometimes the extracted features do not represent the data well enough for our model, and we instead derive new features from existing ones. This process is called **feature engineering**; for example, we can use acceleration derived from velocity as a function of time, or use the log of a feature instead of the feature itself.
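
The two feature engineering examples mentioned above can be sketched in a few lines of Python. The sample values and the names `finite_difference`, `velocities`, and `incomes` are made up for illustration; a forward difference is only one of several ways to approximate a derivative.

```python
import math

# Hypothetical raw measurements: velocity (m/s) sampled at times (s).
times = [0.0, 1.0, 2.0, 3.0]
velocities = [0.0, 2.0, 4.0, 6.0]


def finite_difference(values, ts):
    """Approximate the time derivative with forward differences."""
    return [
        (values[i + 1] - values[i]) / (ts[i + 1] - ts[i])
        for i in range(len(values) - 1)
    ]


# Engineered feature 1: acceleration derived from velocity over time.
accelerations = finite_difference(velocities, times)

# Engineered feature 2: log of a strictly positive, widely spread feature
# (hypothetical income values), used instead of the raw feature.
incomes = [1_000.0, 10_000.0, 100_000.0]
log_incomes = [math.log10(v) for v in incomes]
```

The log transform compresses values that span several orders of magnitude into a narrow range, which is often easier for a model to work with.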

The most widely used supervised learning algorithms include:
- Linear regression
- Logistic regression
- Support-vector machines (SVM)
- Naive Bayes
- Decision trees
- K-nearest neighbors (KNN)
- Neural networks
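
To make the idea of learning from labeled data concrete, here is a minimal sketch of one algorithm from the list, k-nearest neighbors, written from scratch. The function name `knn_predict`, the toy points, and the labels `"a"` and `"b"` are assumptions for illustration; a real project would more likely use a library implementation.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy labeled dataset: two well-separated groups in a 2-D feature space.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]

prediction_near_a = knn_predict(train_X, train_y, (1.1, 1.0))
prediction_near_b = knn_predict(train_X, train_y, (5.0, 5.1))
```

Here the labels in `train_y` play the role of the ground truth: the prediction for a new point is determined entirely by the labeled examples closest to it.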


*Unsupervised learning* is characterized by the lack of ground truth. The algorithms do not take labeled input; instead, the goal is to infer the inherent structure present in the dataset and to perform exploratory analysis. The output can take many forms, such as features in an image or, most commonly, clusters in the data. **Dimensionality reduction** is a key technique within unsupervised learning. Working in high-dimensional spaces is often difficult: it is computationally expensive, and the data tends to be sparse. Dimensionality reduction is the transformation of data from a high-dimensional space (many distinct features or independent variables) into a low-dimensional space such that it retains the intrinsic characteristics of the original data. It enables us to reduce noise and redundancy in the dataset and to find an approximate version of the dataset using fewer features. Unsupervised learning (along with supervised learning) is also used for **representation/feature learning**, the set of techniques that let a system *automatically* extract features from raw data or discover the representations needed for feature detection or classification. This allows the model to learn the features automatically (as opposed to manual feature extraction and engineering) and then use them, perhaps in a supervised learning model, to perform a certain task.
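
As an illustration of dimensionality reduction, the sketch below uses principal component analysis (PCA) computed with a singular value decomposition. PCA is not named in the text above; it is chosen here as one common technique, and the function name `pca` and the toy data are assumptions.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto the top principal components via SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = singular_values**2 / (X.shape[0] - 1)
    return X_centered @ components.T, components, explained_variance


# Toy data: 3 features, but all points lie on a single line,
# so one component is enough to describe them.
t = np.linspace(-1.0, 1.0, 20)
X = np.column_stack([t, 2.0 * t, -t])

X_reduced, components, explained_variance = pca(X, n_components=1)
explained_ratio = explained_variance / explained_variance.sum()
```

Because the three features are perfectly redundant in this toy dataset, the first component carries essentially all of the variance, and the 20 x 3 matrix is represented by a 20 x 1 matrix with no loss.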

Unsupervised learning models are used in three main tasks:
- Clustering
- Association
- Dimensionality reduction
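
Clustering, the first task in the list, can be sketched with k-means (Lloyd's algorithm). This is one clustering method among many and is not prescribed by the text; the function name `k_means`, the fixed initial centroids, and the toy points are assumptions made to keep the example deterministic.

```python
import numpy as np


def k_means(X, initial_centroids, n_iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = initial_centroids.astype(float).copy()
    for _ in range(n_iterations):
        # Assignment step: index of the nearest centroid for every point.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(len(centroids)):
            if np.any(labels == j):  # keep the old centroid if a cluster is empty
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids


# Toy unlabeled data: two visibly separate groups of points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

# Start from one point of each group (an assumption; random
# initialization is more common in practice).
labels, centroids = k_means(X, initial_centroids=X[[0, 3]])
```

No labels are supplied anywhere: the grouping in `labels` is inferred purely from the geometry of the data.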

### Linear Regression

*Linear regression* is used to model a linear relationship between a target variable and one or more features:

$$
\begin{align}
y_i &= \beta_0 + \beta_1 x_{i1} + \cdots + \beta_m x_{im} + \varepsilon_i = \boldsymbol{x}_i \cdot \boldsymbol{\beta} + \varepsilon_i, \ \ i = 1, \cdots, n \\
\boldsymbol{y} &= \boldsymbol{X} \cdot \boldsymbol{\beta} + \boldsymbol{\varepsilon} \ \ \textsf{in matrix notation, where} \\
\boldsymbol{y} &= 
\begin{bmatrix}
y_1 \\
y_2 \\
\vdots\\
y_n
\end{bmatrix}
, \ \ \boldsymbol{X} = 
\begin{bmatrix}
1 & x_{11} & \cdots & x_{1m} \\
1 & x_{21} & \cdots & x_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & \cdots & x_{nm}
\end{bmatrix}
, \ \ \boldsymbol{\beta} = 
\begin{bmatrix}
\beta_0 \\
\beta_1 \\
\vdots \\
\beta_m
\end{bmatrix}
, \ \ \boldsymbol{\varepsilon} = 
\begin{bmatrix}
\varepsilon_1 \\
\varepsilon_2 \\
\vdots \\
\varepsilon_n
\end{bmatrix}
\end{align}
$$

In a linear regression model, each target $y_i$ is a linear combination of $m$ features plus the intercept term $\beta_0$, which is the value of the prediction when all the features are $0$. Writing $\boldsymbol{x}_i = (1, x_{i1}, \ldots, x_{im})$, the $i$-th row of $\boldsymbol{X}$, lets the intercept be absorbed into the dot product $\boldsymbol{x}_i \cdot \boldsymbol{\beta}$. The elements of $\boldsymbol{\beta}$ are known as *regression coefficients*. $\boldsymbol{\varepsilon}$ is the *error term*, or *noise*, as opposed to the signal provided by the rest of the model. It captures all factors that influence the dependent variable $y$ other than the *regressors* $\boldsymbol{x}$.
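
The matrix form $\boldsymbol{y} = \boldsymbol{X} \cdot \boldsymbol{\beta} + \boldsymbol{\varepsilon}$ can be tried out on synthetic data. The text above does not say how $\boldsymbol{\beta}$ is estimated; this sketch assumes ordinary least squares, solved with NumPy's `np.linalg.lstsq`. The coefficient values, noise scale, and sample size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 200, 2
# [beta_0, beta_1, beta_2], chosen arbitrarily for illustration.
true_beta = np.array([1.0, 2.0, -3.0])

features = rng.normal(size=(n, m))
# Design matrix X: a leading column of ones carries the intercept beta_0.
X = np.column_stack([np.ones(n), features])
epsilon = rng.normal(scale=0.1, size=n)  # error term
y = X @ true_beta + epsilon

# Least-squares estimate: the beta that minimizes ||y - X beta||^2.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat
```

With low noise and enough samples, `beta_hat` lands close to `true_beta`, and `residuals` serves as an empirical estimate of the error term.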

#### References
- [Wikipedia - Supervised Learning](https://en.wikipedia.org/wiki/Supervised_learning)
- [IBM Learn - Unsupervised Learning](https://www.ibm.com/cloud/learn/unsupervised-learning)
- [A Review on Linear Regression Comprehensive in Machine Learning](https://jastt.org/index.php/jasttpath/article/download/57/20)
- [Wikipedia - Linear Regression](https://en.wikipedia.org/wiki/Linear_regression)