# Summer of Code - Artificial Intelligence
## Week 02: Descriptive Statistics and Probability
### Day 05: Measures of Dispersion

In this notebook, we will learn about **Measures of Dispersion** using Python.

## `scipy` Library
`scipy` is a powerful library for scientific and technical computing in Python.
### `scipy.stats` Module
The `scipy.stats` module provides a wide range of statistical functions and tools.

# Measures of Dispersion
Dispersion, or variability, describes the extent to which a data distribution is spread out or clustered together. It provides insights into the spread of data points around a central value, such as the mean or median. Common measures of dispersion include:
- **Range**
- **Variance**
- **Standard Deviation**

## Range
The range is the simplest measure of dispersion, calculated as the difference between the maximum and minimum values in a dataset. Mathematically, it is expressed as:
$$\text{Range} = \text{Max} - \text{Min}$$
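A minimal sketch of the range formula in plain Python. The dataset and the helper name `data_range` are hypothetical, chosen only for illustration:

```python
# Hypothetical sample data, chosen only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

def data_range(values):
    """Return the range: the maximum value minus the minimum value."""
    return max(values) - min(values)

print(data_range(data))  # 9 - 2 = 7
```

With NumPy, `np.ptp(data)` ("peak to peak") computes the same quantity. Because it uses only the two most extreme values, the range is very sensitive to outliers.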

## Variance
Variance is the average squared deviation of the data points from the mean, so it grows as the points spread further away from the mean. The formula for the population variance (σ²) is:
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$
where:
- $N$ is the number of data points
- $x_i$ is each individual data point
- $\mu$ is the mean of the data points
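The formula above divides by $N$, which gives the *population* variance. The sketch below implements it directly and compares it with Python's `statistics` module, where `pvariance` divides by $N$ and `variance` divides by $N - 1$ (the usual estimate when the data is a sample). The dataset and the helper `population_variance` are hypothetical, for illustration only:

```python
import statistics

# Hypothetical sample data, chosen only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

def population_variance(values):
    """Average squared deviation from the mean (divides by N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

print(population_variance(data))   # 4.0
print(statistics.pvariance(data))  # 4, standard-library population variance
print(statistics.variance(data))   # sample variance, divides by N - 1
```

NumPy follows the same distinction through its `ddof` parameter: `np.var(data)` uses `ddof=0` (divide by $N$) by default, and `np.var(data, ddof=1)` divides by $N - 1$.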

## Standard Deviation
Standard deviation is the square root of the variance and provides a measure of dispersion in the same units as the original data. It indicates how much the data points typically deviate from the mean. The formula for the population standard deviation (σ) is:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$
where:
- $N$ is the number of data points
- $x_i$ is each individual data point
- $\mu$ is the mean of the data points
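A minimal sketch of the population standard deviation, checked against the standard library's `statistics.pstdev`. The dataset and the helper `population_std` are hypothetical, for illustration only:

```python
import math
import statistics

# Hypothetical sample data, chosen only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

def population_std(values):
    """Square root of the population variance (same units as the data)."""
    n = len(values)
    mu = sum(values) / n
    variance = sum((x - mu) ** 2 for x in values) / n
    return math.sqrt(variance)

print(population_std(data))     # 2.0
print(statistics.pstdev(data))  # 2.0
```

For sample data, `statistics.stdev` (or `np.std(data, ddof=1)`) divides by $N - 1$ instead of $N$.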

### Coefficient of Variation
The coefficient of variation (CV) is a standardized measure of spread that expresses the standard deviation as a percentage of the mean. It is useful for comparing the relative variability of datasets with different units or scales. The formula for CV is:
$$\text{CV} = \left( \frac{\sigma}{\mu} \right) \times 100\%$$
where:
- $\sigma$ is the standard deviation
- $\mu$ is the mean of the data points
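A minimal sketch of the CV formula using the population standard deviation. The dataset and the helper `coefficient_of_variation` are hypothetical; the zero-mean check is added because the ratio is undefined in that case:

```python
import statistics

# Hypothetical sample data, chosen only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

def coefficient_of_variation(values):
    """Population standard deviation expressed as a percentage of the mean."""
    mu = statistics.mean(values)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(values) * 100 / mu

print(coefficient_of_variation(data))  # 2.0 * 100 / 5 = 40.0
```

`scipy.stats.variation(data)` computes the same ratio σ/μ (with `ddof=0` by default) but returns it as a fraction (0.4 here), so multiply by 100 to get a percentage.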


## Skewness
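Skewness measures the asymmetry of a data distribution around its mean. Positive skewness indicates a longer right tail, negative skewness a longer left tail, and a perfectly symmetric distribution has a skewness of zero.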
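One common definition is the moment-based (Fisher-Pearson) coefficient $g_1 = m_3 / m_2^{3/2}$, where $m_k$ is the $k$th central moment (the average of $(x_i - \mu)^k$). A minimal sketch of that definition, with a hypothetical dataset and hypothetical helper names:

```python
# Hypothetical sample data, chosen only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

def central_moment(values, k):
    """k-th central moment: the average of (x - mean) ** k."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** k for x in values) / n

def skewness(values):
    """Fisher-Pearson coefficient of skewness: g1 = m3 / m2 ** 1.5."""
    m2 = central_moment(values, 2)
    m3 = central_moment(values, 3)
    return m3 / m2 ** 1.5

print(skewness(data))  # about 0.656, positive: the tail extends to the right
```

`scipy.stats.skew(data)` computes this same biased estimator with its default `bias=True`; passing `bias=False` applies a small-sample correction.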


## Kurtosis
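Kurtosis measures the "tailedness" of a data distribution: a higher kurtosis means heavier tails, so extreme values (outliers) are more likely. A normal distribution has a kurtosis of 3, and *excess kurtosis* subtracts 3 so that the normal distribution scores 0.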
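A minimal sketch of moment-based kurtosis, $m_4 / m_2^2$, with an option to subtract 3 to obtain excess kurtosis (0 for a normal distribution). The dataset and the helper names are hypothetical, for illustration only:

```python
# Hypothetical sample data, chosen only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

def central_moment(values, k):
    """k-th central moment: the average of (x - mean) ** k."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** k for x in values) / n

def kurtosis(values, excess=True):
    """Kurtosis m4 / m2 ** 2; subtract 3 when excess=True."""
    m2 = central_moment(values, 2)
    m4 = central_moment(values, 4)
    k = m4 / m2 ** 2
    return k - 3 if excess else k

print(kurtosis(data, excess=False))  # 2.78125
print(kurtosis(data))                # -0.21875, slightly lighter tails than a normal distribution
```

`scipy.stats.kurtosis(data)` returns excess kurtosis by default (`fisher=True`); pass `fisher=False` to get the value without subtracting 3.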

# Measures of Position
Measures of position describe the relative standing of a data point within a dataset. They help to identify where a particular value lies in relation to the rest of the data. Common measures of position include:
- ***z* scores**
- **Percentiles**
- **Quartiles**

## *z* scores
*z* scores standardize data points by expressing them in terms of standard deviations from the mean. A *z* score indicates how many standard deviations a data point is from the mean. The formula for calculating the *z* score is:
$$z = \frac{(x - \mu)}{\sigma}$$
where:
- $x$ is the individual data point
- $\mu$ is the mean of the data points
- $\sigma$ is the standard deviation
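A minimal sketch that applies the formula to every value in a hypothetical dataset, using the population standard deviation. The helper name `z_scores` is hypothetical:

```python
import statistics

# Hypothetical sample data, chosen only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

def z_scores(values):
    """Express each value as a number of standard deviations from the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]

print(z_scores(data))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

By construction, the *z* scores of a dataset have mean 0 and standard deviation 1. `scipy.stats.zscore(data)` gives the same values, since it also uses `ddof=0` by default.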


## Percentiles
Percentiles indicate the value below which a given percentage of observations in a dataset fall. For example, the 25th percentile (P25) is the value below which 25% of the data points lie.
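To find the $p$th percentile, first sort the data in ascending order, then compute its position (rank) in the sorted list:
$$L_p = \left( \frac{p}{100} \right) \times (N + 1)$$
where:
- $L_p$ is the position of the $p$th percentile in the sorted data; if it is not a whole number, the percentile is interpolated between the two neighboring values
- $p$ is the desired percentile (e.g., 25 for the 25th percentile)
- $N$ is the number of data points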
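A minimal sketch of the $(N + 1)$ position rule with linear interpolation. Clamping positions below 1 or above $N$ to the minimum or maximum is an assumed convention; other percentile definitions treat the ends (and the interpolation) differently, so libraries can return slightly different values. The dataset and the helper `percentile` are hypothetical:

```python
import statistics

# Hypothetical sample data, chosen only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

def percentile(values, p):
    """p-th percentile via the (N + 1) position rule with linear interpolation.

    Positions outside 1..N are clamped to the min/max (an assumed convention).
    """
    x = sorted(values)
    n = len(x)
    pos = p / 100 * (n + 1)  # 1-based position L_p in the sorted data
    if pos <= 1:
        return x[0]
    if pos >= n:
        return x[-1]
    lower = int(pos)         # whole part: 1-based index of the lower neighbor
    frac = pos - lower       # fractional part used for interpolation
    return x[lower - 1] + frac * (x[lower] - x[lower - 1])

for p in (25, 50, 75):
    print(p, percentile(data, p))
# 25 4.0
# 50 4.5
# 75 6.5

# The standard library's default "exclusive" method uses the same (N + 1) positions
print(statistics.quantiles(data, n=4))  # [4.0, 4.5, 6.5]
```

For example, the 75th percentile sits at position $0.75 \times 9 = 6.75$, three quarters of the way from the 6th value (5) to the 7th value (7), which gives 6.5.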


## Quartiles
Quartiles divide a dataset into four equal parts. The first quartile (Q1) is the 25th percentile, the second quartile (Q2) is the median (50th percentile), and the third quartile (Q3) is the 75th percentile.
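One way to compute all three quartiles is Python's `statistics.quantiles` (available since Python 3.8), which returns the cut points that split the sorted data into `n` groups. Its default `method="exclusive"` uses the $(N + 1)$ position rule; other tools, such as NumPy's default linear interpolation, may give slightly different quartiles for the same data. The dataset is hypothetical:

```python
import statistics

# Hypothetical sample data, chosen only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Three cut points divide the sorted data into four parts
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

print(q1, q2, q3)                     # 4.0 4.5 6.5
print(q2 == statistics.median(data))  # True: the second quartile is the median
```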

### Interquartile Range (IQR)
The interquartile range (IQR) is the difference between the third quartile (Q3) and the first quartile (Q1). It measures the spread of the middle 50% of the data and is calculated as:
$$\text{IQR} = Q3 - Q1$$
Because it depends only on the middle half of the data, the IQR is not affected by extreme values, which makes it useful for identifying outliers.
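A minimal sketch of IQR-based outlier detection using the conventional 1.5 × IQR fences (Tukey's rule): values below $Q1 - 1.5 \times \text{IQR}$ or above $Q3 + 1.5 \times \text{IQR}$ are flagged. The factor 1.5 is a convention rather than a fixed law, and the dataset, including its unusually large value, is hypothetical:

```python
import statistics

# Hypothetical data with one unusually large value
data = [2, 4, 4, 4, 5, 5, 7, 9, 20]

# Quartiles via the (N + 1) position rule (the default "exclusive" method)
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Tukey's fences: a common 1.5 * IQR convention for flagging outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr)  # 4.0 8.0 4.0
print(outliers)     # [20]
```

Here the fences are $-2$ and $14$, so only 20 is flagged. SciPy also provides `scipy.stats.iqr` for computing the IQR directly.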