# Basic Statistical Concepts

Statistics studies variability in real-world phenomena where chance plays a role. It provides objective data from which conclusions can be drawn.

Statistics comprises the methods and procedures for collecting, classifying, analyzing, and representing data, as well as drawing conclusions from them, with the aim of making predictions and aiding in decision-making.

- **Descriptive Statistics** or **Deductive Statistics** is the branch of statistics that deals with organizing, summarizing, and graphically representing the results collected during research. Descriptive statistics not only describe but also analyze and represent data using numerical and graphical elements.

- **Inferential Statistics** or **Inductive Statistics** is the branch of statistics that aims to draw conclusions about the entire population based on data obtained from a subset of it or a representative group of elements (sample).

- **Population:** The group under study. It is useful to distinguish between the *universe* (the set of all individuals or elements under study, considered apart from any variable) and the *population* (the same group viewed through the variable under study). In other words, the universe is the set of individuals who possess the characteristic or characteristics of interest, and the values of those characteristics, taken together, form the population.

- **Sample:** Any subset of the population. For the sample to be useful for drawing conclusions about the population, it must be representative. The usual way to achieve this is to select its elements at random, which yields a random sample.

- **Sampling:** The procedure for obtaining a sample.

- **Purposive Sampling:** This is a procedure for selecting sample elements based on the researcher's judgment. It is, therefore, subjective, and the resulting sample may not be representative of the population.

- **Random Sampling:** This is a sampling procedure in which every element of the population has a known probability of being chosen. Thus, if we have a population of N elements and want a sample of n elements (a sample of size n), each subset of n elements from the population also has a known probability of being the chosen sample.

- **Simple Random Sampling (SRS):** In the convention used here, this is a type of random sampling where the probability of selecting an element remains constant throughout the entire sampling process. It can be likened to drawing balls from an urn with replacement, so the same data point may be sampled more than once. Each selection is independent of the previous ones, which makes the sample data stochastically independent. (Many texts instead use the term "simple random sampling" for equal-probability sampling without replacement.)

- **Unrestricted Sampling (without replacement):** In this type of sampling, the same data point cannot be selected more than once, so the probability of selecting a given data point changes with each draw and depends on the previous results. This corresponds to a model of drawing without replacement. Because the hypergeometric distribution converges to the binomial distribution as the population grows (N → ∞ with n fixed), unrestricted sampling from a very large population can be treated as simple random sampling.

Therefore, in the study of samples from large populations, we will consider only simple random sampling. In the study of samples from finite populations, however, it is essential to analyze the sampling distributions generated by unrestricted sampling.
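
As a minimal sketch of the two drawing models described above, the following Python snippet draws a sample with and without replacement and then compares a hypergeometric probability with its binomial counterpart as N grows. The population, the 30% "marked" fraction, and the helper names `hypergeom_pmf` and `binom_pmf` are assumptions made only for illustration.

```python
import random
from math import comb

random.seed(0)  # fixed seed so the illustration is repeatable

population = list(range(1, 11))  # hypothetical population, N = 10
n = 5                            # sample size

# With replacement: every draw is independent, so repeats are possible.
with_replacement = random.choices(population, k=n)

# Without replacement: each element can be chosen at most once.
without_replacement = random.sample(population, k=n)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)


def hypergeom_pmf(k, N, K, n):
    """P(k marked elements in n draws without replacement, K of N marked)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binom_pmf(k, n, p):
    """P(k marked elements in n independent draws, each marked with prob. p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Keep the marked fraction at 30% and let the population grow:
# the hypergeometric probability approaches the binomial one.
for N in (20, 200, 20_000):
    K = N * 3 // 10
    print(N, round(hypergeom_pmf(2, N, K, n), 4), round(binom_pmf(2, n, 0.3), 4))
```

The printed probabilities should get closer to each other as N increases, which is the intuition behind treating sampling from a very large population as if it were done with replacement.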

---

- **Mean:** The most common form of the mean for a set of discrete values is the arithmetic mean (the average) of all the terms. To calculate it, add up the values of all the terms and then divide by the number of terms.
$$
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
$$

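- **Median:** The middle number in a sorted sequence of numbers. To find the median, sort the numbers by size; the number in the middle is the median. If the count is even, the median is the average of the two middle numbers. In the formula below, $x_1 \le x_2 \le \dots \le x_n$ are the values after sorting.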
$$
\text{Median} =
\begin{cases} 
x_{\frac{n+1}{2}} & \text{if } n \text{ is odd} \\
\frac{x_{\frac{n}{2}} + x_{\frac{n}{2} + 1}}{2} & \text{if } n \text{ is even}
\end{cases}
$$

- **Mode:** The value of the term that occurs the most often. It is not uncommon for a distribution with a discrete random variable to have more than one mode, especially if there are not many terms. This happens when two or more terms occur with equal frequency, and more often than any of the others.
$$
\text{Mode} = \text{the value of the term that occurs the most often}
$$

- **Range:** The difference between the maximum value and the minimum value. For a distribution with a continuous random variable, the range is the difference between the two extreme points of the distribution curve, where the density function falls to zero. For any value outside the range of a distribution, the density function is equal to 0.
$$
\text{Range} = x_{\text{maximum}} - x_{\text{minimum}}
$$
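
The four measures above can be computed with Python's standard `statistics` module. This is a small sketch on a made-up data set; the values in `data` are an assumption chosen so that the sample has an even size and two modes.

```python
import statistics

data = [2, 3, 3, 5, 7, 8, 8, 10]  # hypothetical sample, n = 8

mean = statistics.mean(data)        # sum of the values divided by n
median = statistics.median(data)    # n is even: average of the two middle values
modes = statistics.multimode(data)  # every value tied for the highest frequency
value_range = max(data) - min(data) # maximum minus minimum

print(mean)         # 5.75
print(median)       # 6.0
print(modes)        # [3, 8]
print(value_range)  # 8
```

`statistics.multimode` is used instead of `statistics.mode` because it returns every value tied for the highest frequency, which matches the remark that a distribution can have more than one mode.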

---

## The Central Limit Theorem
The central limit theorem states that if you take sufficiently large samples from a population, the samples’ means will be approximately [normally distributed](https://www.scribbr.com/statistics/normal-distribution/), even if the population isn’t normally distributed.

<style>
    .center {
        display: block;
        margin-left: auto;
        margin-right: auto;
    }
</style>

<img src="https://www.scribbr.com/wp-content/uploads/2022/07/Central-limit-theorem.webp" alt="The central limit theorem" class="center" width="800">

A population follows a [Poisson distribution](https://www.scribbr.com/statistics/poisson-distribution/) (left image). If we take 10,000 samples from the population, each with a sample size of 50, the sample means follow a normal distribution, as predicted by the central limit theorem (right image).

### What is the central limit theorem?

The central limit theorem relies on the concept of a sampling distribution, which is the [probability distribution](https://www.scribbr.com/statistics/probability-distributions/) of a statistic for a large number of [samples](https://www.scribbr.com/methodology/population-vs-sample/) taken from a population.

Imagining an experiment may help you to understand sampling distributions:

1. Suppose that you draw a random sample from a population and calculate a statistic for the sample, such as the mean.
2. Now you draw another random sample of the same size, and again calculate the mean.
3. You repeat this process many times, and end up with a large number of means, one for each sample.

The distribution of the sample means is an example of a sampling distribution.

The central limit theorem says that the sampling distribution of the mean will be approximately normal, as long as the sample size is large enough and the population has a finite variance. Whether the population has a normal, Poisson, binomial, or some other such distribution, the sampling distribution of the mean approaches a normal distribution as the sample size grows.

A normal distribution is a symmetrical, bell-shaped distribution in which observations become less frequent the farther they are from the center.
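
The thought experiment above can be simulated in a few lines of Python. This is only a sketch under stated assumptions: the population is taken to be exponential with mean 1 (a strongly skewed distribution chosen for convenience, not the Poisson population from the figure), and the helper name `sample_mean` is hypothetical. The sample size of 50 and the 10,000 repetitions mirror the figure caption.

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulation is repeatable

sample_size = 50      # observations per sample
num_samples = 10_000  # number of repeated samples


def sample_mean(n):
    """Draw n values from an exponential population (mean 1) and average them."""
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])


# One mean per sample: this list approximates the sampling distribution.
means = [sample_mean(sample_size) for _ in range(num_samples)]

center = statistics.fmean(means)  # should be close to the population mean, 1
spread = statistics.stdev(means)  # should be close to 1 / sqrt(50)

# For a normal distribution, roughly 68% of values lie within one
# standard deviation of the center.
within_one_sd = sum(abs(m - center) <= spread for m in means) / num_samples

print(f"mean of sample means: {center:.3f}")
print(f"std of sample means:  {spread:.3f} (theory: {sample_size ** -0.5:.3f})")
print(f"share within 1 sd:    {within_one_sd:.3f}")
```

Even though the exponential population is far from symmetric, the sample means should cluster around 1 with a spread near 1/√50, and the share within one standard deviation should be close to the 68% expected of a normal distribution.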

## Null & Alternative Hypotheses

The null and alternative hypotheses are two competing claims that researchers weigh evidence for and against using a statistical test:
- Null hypothesis *($H_0$)*: There’s no effect in the population.
- Alternative hypothesis *($H_a$ or $H_1$)*: There’s an effect in the population.

The effect is usually the effect of the independent variable on the dependent variable.

<style>
    .center {
        display: block;
        margin-left: auto;
        margin-right: auto;
        width: 50%;
    }
</style>

<img src="https://miro.medium.com/v2/resize:fit:1066/1*6hxenSajbjoRftDEVm2ROA.png" alt="Null hypothesis vs. alternative hypothesis" class="center" width="400">
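
To make the two competing claims concrete, here is a minimal sketch of one possible statistical test: an exact two-sided binomial test for a coin. The experiment (60 heads in 100 flips), the variable names, and the choice of this particular test are all assumptions made for illustration. The null hypothesis $H_0$ is that the coin is fair ($p = 0.5$); the alternative $H_a$ is that it is not.

```python
from math import comb

n_flips = 100  # hypothetical experiment
n_heads = 60   # hypothetical observed result
p_null = 0.5   # H0: the coin is fair


def binom_pmf(k, n, p):
    """Probability of exactly k heads in n flips if P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Two-sided p-value: total probability, under H0, of every outcome that is
# at most as likely as the one observed (small tolerance for float ties).
observed = binom_pmf(n_heads, n_flips, p_null)
p_value = sum(
    binom_pmf(k, n_flips, p_null)
    for k in range(n_flips + 1)
    if binom_pmf(k, n_flips, p_null) <= observed * (1 + 1e-9)
)

print(f"p-value under H0: {p_value:.4f}")
```

A small p-value means the observed data would be unusual if $H_0$ were true, which counts as evidence for $H_a$. How small is small enough depends on the significance level chosen before the experiment.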


## References
* [Basic statistic concepts](https://is.gd/Kd0rb4)
* [Basic concepts about Inference](https://www.uv.es/ceaces/tex1t/3%20infemues/conceptos.htm#)
* [Statistical mean, median, mode and range](https://is.gd/SiEcYy)
* [Central limit theorem](https://is.gd/GsOLZB)
* [Central Limit Theorem | Formula, Definition & Examples](https://www.scribbr.com/statistics/central-limit-theorem/)
* [Null & Alternative Hypotheses | Definitions, Templates & Examples](https://www.scribbr.com/statistics/null-and-alternative-hypotheses/)