# Introduction
Elliptical envelope is a technique for detecting outliers in multi-dimensional data that is assumed to follow a Gaussian distribution. It fits an ellipse (an ellipsoid in more than two dimensions) around the "normal" data points and flags anything falling outside this boundary as an anomaly.

### Intuition behind elliptical envelope
- Imagine a dataset with multiple features (dimensions). Elliptical envelope assumes that the joint distribution of these features is a multivariate Gaussian, whose contours of equal density are ellipses.
- This ellipse represents the "expected" or "normal" behavior of the data. Points clustered within or close to the ellipse are considered typical.

# Elliptical Envelope Algorithm

### Core idea
1. Distribution fitting: This technique starts by fitting a Gaussian distribution (multivariate Gaussian for multiple features) to the data. This involves estimating the key parameters of the distribution:
    - $\mu$ (mean): The centre of the ellipse, representing the average value of each feature.
    - $\Sigma$ (covariance matrix): Captures the relationship between the features, influencing the shape and orientation of the ellipse.
2. Estimating $\mu$ and $\Sigma$: These parameters are not known in advance, so they are estimated from the actual data points. The estimation aims to find the best-fitting ellipse, one that captures the central tendency and the variation within the data.
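
As a minimal sketch of the estimation step, the snippet below computes the plain sample mean and sample covariance with NumPy. The dataset, `true_mean` and `true_cov` are made up for illustration. Note that these plain estimates are themselves sensitive to outliers, so practical implementations often use a robust estimator instead (scikit-learn's `EllipticEnvelope`, for example, builds on the Minimum Covariance Determinant estimator).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-feature dataset drawn from a correlated Gaussian
true_mean = np.array([1.0, -2.0])
true_cov = np.array([[2.0, 0.8],
                     [0.8, 1.0]])
X = rng.multivariate_normal(true_mean, true_cov, size=5000)

# mu: per-feature average, i.e. the centre of the ellipse
mu_hat = X.mean(axis=0)

# Sigma: sample covariance matrix (rows are observations, columns are features)
sigma_hat = np.cov(X, rowvar=False)

print("estimated mean:", mu_hat)
print("estimated covariance:\n", sigma_hat)
```

With enough samples, the estimates should land close to the parameters used to generate the data.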

### Algorithm
1. Calculate the mean and covariance matrix:
    - The mean vector represents the centre of the ellipse.
    - The covariance matrix represents the shape and orientation of the ellipse.
2. Determine the Mahalanobis distance: The Mahalanobis distance of each data point from the mean vector is calculated. This distance takes into account the covariance structure of the data.
3. Set a threshold: A threshold value is chosen for the Mahalanobis distance, for example based on the fraction of points expected to be outliers.
4. Identify outliers: Compare the Mahalanobis distance of each point to the threshold. If the distance is greater than the threshold, the data point is classified as an outlier.
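
The four steps above can be sketched in a few lines of NumPy. This is an illustrative sketch, not a reference implementation: the function name `detect_outliers`, the synthetic data, and the threshold of `3.0` are all assumptions chosen for the example, and the mean and covariance are the plain (non-robust) sample estimates.

```python
import numpy as np

def detect_outliers(X, threshold):
    """Flag rows of X whose Mahalanobis distance from the mean exceeds threshold."""
    # Step 1: mean vector and covariance matrix
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)
    sigma_inv = np.linalg.inv(sigma)

    # Step 2: Mahalanobis distance of every point from the mean
    diff = X - mu
    squared = np.einsum("ij,jk,ik->i", diff, sigma_inv, diff)
    distances = np.sqrt(squared)

    # Steps 3 and 4: compare each distance to the chosen threshold
    return distances, distances > threshold

rng = np.random.default_rng(42)
inliers = rng.normal(size=(200, 2))            # hypothetical "normal" points
planted = np.array([[8.0, 8.0], [-9.0, 7.0]])  # two obviously unusual points
X = np.vstack([inliers, planted])

distances, is_outlier = detect_outliers(X, threshold=3.0)
print("number flagged:", int(is_outlier.sum()))
print("planted points flagged:", is_outlier[-2:])
```

The two planted points lie far from the bulk of the data, so their distances exceed the threshold even though they also inflate the estimated covariance slightly.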

The Mahalanobis distance is mathematically represented as,

$d(x_i, \mu) = \sqrt{(x_i - \mu)^T \Sigma^{-1} (x_i - \mu)}$

Where,
- $x_i$ = A vector representing a data point.
- $\mu$ = The mean vector of the distribution.
- $\Sigma$ = The covariance matrix of the distribution.
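
To make the formula concrete, here is a small worked example with hand-picked values (the vectors `a`, `b` and the diagonal `sigma` are assumptions chosen so the arithmetic is easy to check). Both points are at Euclidean distance 2 from the mean, but the first lies along the high-variance feature, so its Mahalanobis distance is smaller.

```python
import numpy as np

def mahalanobis(x, mu, sigma):
    """Mahalanobis distance of a single point x from mean mu under covariance sigma."""
    diff = x - mu
    # solve(sigma, diff) gives sigma^{-1} @ diff without forming the inverse explicitly
    return float(np.sqrt(diff @ np.linalg.solve(sigma, diff)))

mu = np.array([0.0, 0.0])
sigma = np.array([[4.0, 0.0],   # feature 1 has variance 4
                  [0.0, 1.0]])  # feature 2 has variance 1

a = np.array([2.0, 0.0])  # 2 units along the high-variance feature
b = np.array([0.0, 2.0])  # 2 units along the low-variance feature

d_a = mahalanobis(a, mu, sigma)  # sqrt(2^2 / 4) = 1.0
d_b = mahalanobis(b, mu, sigma)  # sqrt(2^2 / 1) = 2.0
print(d_a, d_b)
```

This is why the distance is said to account for the covariance structure: a deviation counts for less in a direction where the data naturally varies more.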