# Statistical Inference

## Introduction

**Statistical Inference** uses data analysis and probabilistic modeling to observe, study, and predict patterns in data. There are two main approaches to the field: **Frequentist** and **Bayesian**. The Frequentist approach assumes that a fixed distribution, with fixed but unknown parameters, generates the patterns repeated in the data. We then look for the best estimate of the distribution's parameters given data samples. The Bayesian approach assumes that our belief about the state of the world can be updated using observed samples. The parameters of the distributions are themselves described using probability.

As an example, here is a situation explained using the two frameworks. Imagine your phone is ringing somewhere in your house, and you want to find it:
- **Frequentist**: I have a mental model of my house. Given the beeping sound, I can infer which area of the house to search for my phone.
- **Bayesian**: On top of having a mental model of my house, I also remember where I have misplaced the phone in the past. By combining what I infer from the beeps with my prior information on its location, I can identify the area of the house and locate my phone.

## Statistical Analysis Fundamentals

**Statistical Analysis** allows us to describe patterns observed in any given dataset via **Visualization** or **Statistical Descriptors**. In statistics, we are interested in studying a given **Sample** from a **Population** of **Individuals**.

```{note}
Statistical Descriptors are closely related to [Probability Descriptors](./2_probabilities_and_information_theory.ipynb). As we will observe later in the chapter, the Observed Mean can be associated with the Expectation, the Observed Variance with the Variance, and so on.
```

Suppose we design a study on the effectiveness of some medicine for some given disease. In that case, the sample corresponds to the set of patients who participated in the study, and the population represents all the people suffering from the studied disease.

To give credit to a **Study**, in other words to **Generalize** its results, the sample needs to **Reflect** the **Variation** present in the entire population of interest. One way to obtain such a sample is to include **Randomness** in the process that selects individuals from the population.
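As a minimal sketch of random selection, the snippet below draws a simple random sample without replacement using Python's standard `random` module. The population of 1,000 patient identifiers and the sample size of 50 are hypothetical values chosen only for illustration.

```python
import random

# Hypothetical population: 1,000 patient identifiers (illustrative, not real data).
population = list(range(1000))

# A seeded generator makes the draw reproducible.
rng = random.Random(42)

# Simple random sample without replacement: every individual
# has the same chance of being selected.
sample = rng.sample(population, k=50)

print(len(sample), "individuals selected out of", len(population))
```

Other sampling designs exist (stratified, cluster, and so on); simple random sampling is used here only because it is the most direct way to introduce randomness.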

Now that we have defined the context of application, let us define the fundamentals of Statistical Analysis.

```{note}
The different types of visualizations that can be used to express some data series' underlying variations have been discussed in the [Data Visualization Chapter](../1_data_representations/1_data_visualization.ipynb).
```

### Central Tendency

The first type of descriptor is the **Central Tendency**. It expresses the central, or typical, value of a data distribution. The most common central tendency measures are the **Mode**, the **Mean**, and the **Median**.

The **Mode** $M_o$ is the value that occurs most often in a given set of data values.
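A small sketch of this definition, using `collections.Counter` from the standard library. The data values are made up for illustration, and the code returns a list because several values can tie for the highest count.

```python
from collections import Counter

# Illustrative data values (not from a real dataset).
values = [2, 3, 3, 5, 7, 3, 5]

# Count how often each value occurs.
counts = Counter(values)
highest = max(counts.values())

# Keep every value that reaches the highest count (handles ties).
modes = [value for value, count in counts.items() if count == highest]

print(modes)
```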

The **Mean** $\overline{x}$ refers to the **Arithmetic Mean**:

$$
\overline{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
$$

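In some cases, it instead refers to the **Weighted Arithmetic Mean**, where each value $x_i$ carries a weight $w_i$ and the sum is normalized by the total weight:

$$
\overline{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}
$$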
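The two means can be sketched in plain Python as follows. The function names `arithmetic_mean` and `weighted_mean`, as well as the grades and weights, are hypothetical and chosen for illustration. The weighted version divides by the sum of the weights, which is the same as dividing by $n$ only when the weights sum to $n$ (for example, when all weights equal 1).

```python
def arithmetic_mean(xs):
    """Sum of the values divided by their count."""
    if not xs:
        raise ValueError("at least one value is required")
    return sum(xs) / len(xs)


def weighted_mean(xs, ws):
    """Weighted sum of the values divided by the total weight."""
    if len(xs) != len(ws):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(ws)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(ws, xs)) / total_weight


# Illustrative data: three grades, the last one counting double.
grades = [10.0, 14.0, 18.0]
weights = [1.0, 1.0, 2.0]

plain = arithmetic_mean(grades)
weighted = weighted_mean(grades, weights)
print(plain, weighted)
```

With equal weights, `weighted_mean` gives the same result as `arithmetic_mean`.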

The **Median** $M_e$ is the value that separates the data into two equally populated groups. With the values sorted in ascending order, $x_1 \le x_2 \le \dots \le x_n$, it is defined as:

$$
M_e = \begin{cases}
x_{(n+1)/2} & \text{if}\;n\;\text{is odd} \\
\frac{x_{n/2} + x_{n/2 + 1}}{2} & \text{if}\;n\;\text{is even}
\end{cases}
$$
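The piecewise definition above can be sketched as a short function. The name `median` and the example values are illustrative. Note that the formula uses 1-based positions, while Python lists are 0-based, so position $(n+1)/2$ corresponds to index `n // 2` when $n$ is odd.

```python
def median(xs):
    """Median of a non-empty sequence, following the odd/even cases."""
    ordered = sorted(xs)  # the definition assumes ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one value is required")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single middle value.
        return ordered[mid]
    # Even n: average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median([7, 1, 3])
even_case = median([7, 1, 3, 5])
print(odd_case, even_case)
```

The standard library's `statistics.median` implements the same rule and is usually preferable in practice.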

### Spread
### Correlation

## Sampling and Central Limit Theorem
## Point Estimate
## Confidence Intervals
## Hypothesis Testing
## Two-Sample Hypothesis Testing
## Chi-Square Test
## Analysis of Variance
## Bayesian Inference