# Statistics

### Descriptive Statistics

Descriptive statistics are methods for summarizing and describing the main features of a dataset. They organize and present data so that its characteristics are easy to see. Common descriptive statistics include measures of central tendency (mean, median, mode), measures of variability or dispersion (variance, standard deviation, range), and graphical representations (histograms, box plots, scatter plots). Descriptive statistics summarize the data at hand, identify patterns, and highlight key features without drawing inferences beyond that data.

#### Tools for Descriptive Statistics

- **Measures**: Mean, median, mode, range, variance, standard deviation.
- **Graphical Tools**: Histograms, bar charts, pie charts, box plots, scatter plots.
- **Software**: Excel, SPSS, R, Python (libraries like Pandas, Matplotlib, Seaborn).
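
As a minimal sketch, assuming only Python's standard-library `statistics` module (rather than the Pandas-based tooling listed above), the measures above can be computed for a small hypothetical dataset:

```python
import statistics

# Hypothetical sample data used only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": statistics.mean(data),          # arithmetic average
    "median": statistics.median(data),      # middle value of the sorted data
    "mode": statistics.mode(data),          # most frequent value
    "range": max(data) - min(data),         # distance between the extremes
    "variance": statistics.variance(data),  # sample variance (divides by n - 1)
    "std_dev": statistics.stdev(data),      # sample standard deviation
}

for name, value in summary.items():
    print(f"{name:>9}: {value:.3f}")
```

For this data the mean is 5, the median is 4.5, the mode is 4, and the range is 7. Note that `statistics.variance` and `statistics.stdev` compute the sample versions (dividing by $n - 1$); `statistics.pvariance` and `statistics.pstdev` give the population versions.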

### Inferential Statistics

Inferential statistics involves drawing conclusions about a population from a sample of data taken from that population. It uses probability theory to generalize from the sample to the larger population while quantifying the uncertainty of the resulting estimates and predictions. Inferential statistics includes techniques such as hypothesis testing, confidence intervals, regression analysis, and analysis of variance (ANOVA). These methods help researchers estimate population parameters, assess relationships between variables, test hypotheses, and make data-driven decisions.

#### Tools for Inferential Statistics

- **Hypothesis Testing**: Z-test, t-test, chi-square test, ANOVA.
- **Estimation**: Confidence intervals, regression analysis.
- **Software**: SPSS, R, Python (libraries like statsmodels, SciPy).
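
A one-sample z-test and its matching confidence interval can be sketched with the standard library alone. This sketch assumes the population standard deviation is known (here an assumed σ = 15); `one_sample_z_test` is a hypothetical helper and the measurements are made-up illustration data, not results from any study. For real analyses, SciPy and statsmodels provide ready-made tests.

```python
import math
from statistics import NormalDist, fmean

def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test, assuming the population standard
    deviation `sigma` is known (otherwise a t-test is the usual choice)."""
    n = len(sample)
    xbar = fmean(sample)
    se = sigma / math.sqrt(n)                      # standard error of the mean
    z = (xbar - mu0) / se                          # test statistic under H0
    std_normal = NormalDist()                      # N(0, 1)
    p_value = 2 * (1 - std_normal.cdf(abs(z)))     # two-sided p-value
    z_crit = std_normal.inv_cdf(1 - alpha / 2)     # about 1.96 for alpha = 0.05
    ci = (xbar - z_crit * se, xbar + z_crit * se)  # (1 - alpha) confidence interval
    return z, p_value, ci

# Hypothetical measurements; H0: population mean is 100, sigma assumed to be 15
sample = [112, 98, 105, 121, 101, 109, 117, 95, 108, 114]
z, p, (lo, hi) = one_sample_z_test(sample, mu0=100, sigma=15)
print(f"z = {z:.3f}, p = {p:.4f}, 95% CI = ({lo:.2f}, {hi:.2f})")
print("reject H0" if p < 0.05 else "fail to reject H0")
```

The sample mean here is 108, giving $z \approx 1.69$ and a two-sided p-value of about 0.09, so at $\alpha = 0.05$ the test fails to reject $H_0$. Consistently, the 95% interval (roughly 98.70 to 117.30) contains 100.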


### Descriptive Statistics Formulas

| Concept                | Definition                                                                                     | Population Formula                                     | Sample Formula                                          |
|------------------------|------------------------------------------------------------------------------------------------|--------------------------------------------------------|---------------------------------------------------------|
| **Mean**               | The arithmetic average of the data points ($N$ = population size, $n$ = sample size).           | $\mu = \frac{1}{N} \sum_{i=1}^{N} X_i$                 | $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$              |
| **Variance**           | The average squared deviation of the data points from the mean. The sample version divides by $n - 1$ (Bessel's correction) so that $s^2$ is an unbiased estimator of $\sigma^2$. | $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2$ | $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$ |
| **Standard Deviation** | The square root of the variance; it measures the typical deviation from the mean, in the same units as the data. | $\sigma = \sqrt{\sigma^2}$                           | $s = \sqrt{s^2}$                                        |
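
To make the $N$ versus $n - 1$ distinction concrete, here is a sketch that implements the table's variance formulas directly and cross-checks them against Python's `statistics.pvariance` and `statistics.variance`. The helper names `population_variance` and `sample_variance` are hypothetical, and the data is illustrative only.

```python
import math
import statistics

def population_variance(values):
    """sigma^2 = (1/N) * sum((X_i - mu)^2)"""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """s^2 = (1/(n-1)) * sum((x_i - xbar)^2), i.e. with Bessel's correction"""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two data points")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

sigma2 = population_variance(data)  # 32 / 8 = 4.0
s2 = sample_variance(data)          # 32 / 7 ≈ 4.571

# Cross-check against the standard library's implementations
assert math.isclose(sigma2, statistics.pvariance(data))
assert math.isclose(s2, statistics.variance(data))

print(f"population: var = {sigma2:.3f}, sd = {math.sqrt(sigma2):.3f}")
print(f"sample:     var = {s2:.3f}, sd = {math.sqrt(s2):.3f}")
```

The sample variance is larger because dividing by $n - 1$ compensates for measuring deviations from $\bar{x}$ rather than from the unknown $\mu$.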


### Random Variable
*A random variable is a function that assigns a real number to each outcome of a random experiment.*
Random variables are commonly categorized as either `discrete` (taking a countable set of values, such as the number of heads in ten coin flips) or `continuous` (taking any value in an interval, such as a measured height).
They are conventionally denoted by uppercase letters (X, Y, Z), with lowercase letters (x, y, z) for the particular values they take, and they are essential for describing probabilities, distributions, and statistical measures in data analysis.
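
To illustrate the definition of a random variable as a function, the sketch below enumerates the sample space of two fair coin flips and defines $X$ as the number of heads. Equally likely outcomes are assumed, and the Python function `X` is named only to mirror the uppercase convention.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of a random experiment: two fair coin flips
sample_space = list(product("HT", repeat=2))  # [('H','H'), ('H','T'), ('T','H'), ('T','T')]

def X(outcome):
    """Random variable: maps each outcome to a real number (the count of heads)."""
    return outcome.count("H")

# With equally likely outcomes, P(X = x) = (# outcomes mapped to x) / |S|
counts = Counter(X(outcome) for outcome in sample_space)
pmf = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

print(pmf)  # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```

Two different outcomes, ('H', 'T') and ('T', 'H'), map to the same value 1, which is why $P(X = 1)$ is twice as large as $P(X = 0)$.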

### Probability Distribution
A probability distribution is a mathematical function that describes the likelihood of each possible outcome of a random variable. It can be characterized by its probability mass function (PMF) for discrete random variables or its probability density function (PDF) for continuous random variables.


| Characteristic                   | Probability Mass Function (PMF)                                                | Probability Density Function (PDF)                                                  |
|----------------------------------|--------------------------------------------------------------------------------|-------------------------------------------------------------------------------------|
| **Definition**                   | Gives the probability that a discrete random variable $X$ takes the specific value $x$. | Gives the probability density of a continuous random variable $X$ at each point $x$. |
| **Applicability**                | Used for discrete random variables.                                            | Used for continuous random variables.                                               |
| **Mathematical Representation**  | $P(X = x)$, where $x$ is a specific value of $X$.                              | $f(x)$, the density at the point $x$.                                               |
| **Range of Values**              | Nonzero only on a countable set of values $x$ (the support).                   | Defined for all values within a continuous range.                                   |
| **Probability Values**           | Directly interpretable as probabilities, each between 0 and 1.                 | Densities, not probabilities: $f(x)$ can exceed 1 and $P(X = x) = 0$; probabilities require integration, $P(a \le X \le b) = \int_a^b f(x)\,dx$. |
| **Summation/Integration**        | Summing over all possible values gives 1: $\sum_{x} P(X = x) = 1$              | Integrating over the entire range gives 1: $\int_{-\infty}^{\infty} f(x)\,dx = 1$   |
| **Example Distributions**        | Bernoulli, Binomial, Poisson, etc.                                             | Normal, Exponential, Uniform, etc.                                                  |
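
The normalization and interpretation rows of the table can be checked numerically. This sketch sums a Binomial(10, 0.3) PMF and integrates a Normal(0, 0.2²) PDF with a hand-written trapezoidal rule (chosen to avoid a SciPy dependency). `binomial_pmf` and `trapezoid` are hypothetical helpers written for illustration, and integrating over ±10σ stands in for the infinite range, an approximation whose neglected tails are vanishingly small.

```python
import math
from statistics import NormalDist

# Discrete: Binomial(n, p) PMF, P(X = k) = C(n, k) p^k (1 - p)^(n - k)
def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
pmf_total = sum(binomial_pmf(k, n, p) for k in range(n + 1))  # sum over every possible x

# Continuous: probabilities come from integrating the density f(x)
def trapezoid(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps))
    return total * h

narrow = NormalDist(mu=0, sigma=0.2)
# Integrate over mu ± 10 sigma as a stand-in for (-inf, inf)
pdf_total = trapezoid(narrow.pdf, -2.0, 2.0)
prob_interval = trapezoid(narrow.pdf, -0.2, 0.2)  # P(-0.2 <= X <= 0.2), about 0.683

print(f"sum of binomial PMF    = {pmf_total:.6f}")
print(f"integral of normal PDF = {pdf_total:.6f}")
print(f"f(0) with sigma = 0.2  = {narrow.pdf(0):.3f}  (a density, not a probability)")
print(f"P(-0.2 <= X <= 0.2)    = {prob_interval:.4f}")
```

Both totals come out at 1 up to floating-point and discretization error, while $f(0) \approx 1.995$ exceeds 1, showing that a density value is not itself a probability.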
