# Notes for Think Stats by Allen B. Downey

In [1]:
from typing import List

import numpy as np
import pandas as pd
import scipy

## Chapter 01

### Glossary

- anecdotal evidence - evidence based on personal experience rather than on a well-designed, rigorous study.
- cross-sectional study - a study that collects data about a population at a particular point in time.
- longitudinal study - a study that follows the same group repeatedly and collects data over time.

## Chapter 02

#### Mean - central tendency

$$ \overline{x} = \frac{1}{n} \sum_i x_i $$

In [2]:
sample = [1, 3, 5, 6]

In [3]:
np.mean(sample)

3.75

In [4]:
pd.DataFrame(sample).mean()

0    3.75
dtype: float64
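
As a small cross-check, the mean can also be computed straight from the formula above. This is only an illustrative sketch; the name `mean_manual` is not part of the original notes.

```python
sample = [1, 3, 5, 6]

# Mean from the definition: sum of the values divided by their count.
n = len(sample)
mean_manual = sum(sample) / n

print(mean_manual)  # 3.75
```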

#### Variance

$$ S^2 = \frac{1}{n} \sum_i (x_i - \overline{x})^2 $$

In [5]:
np.var(sample)

3.6875

In [6]:
# Warning! By default pandas normalizes the variance by N-1 (ddof=1).
# Pass ddof=0 (delta degrees of freedom) to normalize by N, as NumPy does.
pd.DataFrame(sample).var(ddof=0)

0    3.6875
dtype: float64
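
To make the effect of `ddof` concrete, the sketch below compares the two normalizations on the same sample. The variable names (`population_var`, `sample_var`, `pandas_default`) are illustrative and not from the original notes.

```python
import numpy as np
import pandas as pd

sample = [1, 3, 5, 6]

# NumPy defaults to ddof=0: the sum of squared deviations is divided by n.
population_var = np.var(sample)

# With ddof=1 the same sum is divided by n - 1 instead.
sample_var = np.var(sample, ddof=1)

# pandas defaults to ddof=1, so it agrees with sample_var, not population_var.
pandas_default = pd.Series(sample).var()

print(population_var)  # 3.6875
print(np.isclose(sample_var, pandas_default))  # True
```

For this sample the sum of squared deviations is 14.75, so the two results are 14.75 / 4 and 14.75 / 3 respectively.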

#### Standard Deviation

$$ \sigma = \sqrt{S^{2}} $$ 

In [7]:
np.std(sample)

1.920286436967152

In [8]:
# Warning! By default pandas computes std from the variance normalized by N-1 (ddof=1).
# Pass ddof=0 (delta degrees of freedom) to normalize by N, as NumPy does.
pd.DataFrame(sample).std(ddof=0)

0    1.920286
dtype: float64
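
The relation between standard deviation and variance can be checked numerically. This is a minimal sketch; `std_from_var` and `std_direct` are names introduced here for illustration.

```python
import numpy as np

sample = [1, 3, 5, 6]

# Standard deviation as the square root of the (ddof=0) variance.
std_from_var = np.sqrt(np.var(sample))

# The same quantity computed directly by NumPy.
std_direct = np.std(sample)

print(np.isclose(std_from_var, std_direct))  # True
```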

#### Effect size - Cohen's d

Given groups **G1** and **G2** with **N1** and **N2** elements, the effect size is the difference between the group means divided by the pooled standard deviation, where $\sigma^2(G)$ is the variance of group G:

$$ d = \frac{\overline{G1} - \overline{G2}}{\sqrt{(\sigma^2 (G1) \cdot (N1-1) + \sigma^2 (G2) \cdot (N2-1)) / ((N1-1) + (N2-1))}} $$

In [9]:
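def effect_size(g1: pd.DataFrame, g2: pd.DataFrame) -> pd.Series:
    # Column-wise difference between the group means.
    diff = g1.mean() - g2.mean()
    # Group variances normalized by N-1 (ddof=1).
    var_g1, var_g2 = g1.var(ddof=1), g2.var(ddof=1)
    n1, n2 = len(g1), len(g2)

    # Pooled variance: group variances weighted by their degrees of freedom.
    pooled_var = (var_g1 * (n1 - 1) + var_g2 * (n2 - 1)) / ((n1 - 1) + (n2 - 1))
    cohen_d = diff / np.sqrt(pooled_var)
    # With DataFrame inputs the result is a Series with one value per column.
    return cohen_d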

Note that the group variances here are calculated with delta degrees of freedom = 1, i.e. normalized by N-1.

In [10]:
effect_size(pd.DataFrame([1, 2, 3, 4]), pd.DataFrame([3, 3, 1, 2]))

0    0.219971
dtype: float64
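
Because `effect_size` takes DataFrames, it returns a one-element Series rather than a plain number. As a hedged alternative, the sketch below does the same calculation on NumPy arrays and returns a scalar; the function name `cohen_d_scalar` is hypothetical and not part of the original notes.

```python
import numpy as np


def cohen_d_scalar(g1, g2) -> float:
    """Cohen's d for two 1-D samples, using the pooled variance with ddof=1."""
    a = np.asarray(g1, dtype=float)
    b = np.asarray(g2, dtype=float)
    n1, n2 = a.size, b.size

    # Pooled variance: group variances weighted by their degrees of freedom.
    pooled_var = (a.var(ddof=1) * (n1 - 1) + b.var(ddof=1) * (n2 - 1)) / (n1 + n2 - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))


d = cohen_d_scalar([1, 2, 3, 4], [3, 3, 1, 2])
print(round(d, 6))  # 0.219971
```

The value agrees with the Series output above, which is a quick sanity check that both versions implement the same pooled-variance formula.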