## Types of effect size measures

* difference measures 
  * mean difference
  * standardized difference (Cohen's d)
* correlation measures
  * **r<sup>2</sup>** - proportion (%) of variation in one variable that is related to ("explained by") another variable 
  
## Statistical significance

* reject the null
* results not likely due to chance (sampling error) 

### Meaningfulness of result

1. What was measured? 
  * variables
2. Effect size
3. Can we rule out random chance for the result (can we rule out sampling error as an explanation)?
4. Can we rule out alternative explanations? (lurking variables)

## Cohen's d

**Standardized mean difference**

In [1]:
%%latex

$$d = \frac{\bar{x} - \mu}{s}$$


s - standard deviation of the sample

**Interpretation of Cohen's d**: 
```
How far apart the two means are, measured in standard deviation units
```
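As a quick check of that reading, here is a minimal sketch of the formula above. The function name `cohens_d` and the numbers are made up for illustration and do not come from a real study.

```python
def cohens_d(sample_mean, mu, s):
    """Standardized mean difference: (sample mean - population mean) / sample SD."""
    return (sample_mean - mu) / s

# hypothetical values, chosen only to make the arithmetic easy to follow
d = cohens_d(sample_mean=105, mu=100, s=10)
print(d)  # 0.5 -> the sample mean sits half a standard deviation above mu
```

A negative value would mean the sample mean lies below the population mean.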

## r<sup>2</sup>

r<sup>2</sup> - Coefficient of determination. It takes values from **0.00** to **1.00**.

* r<sup>2</sup> = 0 : variables **not** related
* r<sup>2</sup> = 1 : variables **perfectly** related

In [2]:
%%latex
$$
r^2 = \frac{t^2}{t^2 + df}
$$


* ``t`` - value from t-test (**not** t-critical value)
* ``df`` - degrees of freedom

In [3]:
# examples of r^2 calculations
def r_sq(t, df):
    return (t ** 2) / (t ** 2 + df)

t = 2
df = 20

r_sq(t, df)

0.16666666666666666

## Results sections

1. Descriptive statistics (M, SD)
  * text 
  * graphs
  * tables
2. Inferential statistics
  * hypothesis test 
    * kind of test - one sample t-test
    * test statistic (value of t)
    * degrees of freedom ``df``
    * p-value
    * direction of the test (1-tailed/2-tailed)
    * **always** provide *alpha* level
  * APA style: the American Psychological Association (APA) publishes a full guide to writing statistical reports.
  
  ``t(df) = x.xx, p = .xx, direction``
  
  Example: ``t(24) = -2.50, p < .05, one-tailed`` reads as: the t statistic with 24 degrees of freedom equals -2.50, and the p-value is below .05 for a one-tailed test.
  
  * Confidence intervals
      * confidence level (e.g. 95%): report both the lower and the upper limit.
      * state what the interval is on: a single mean, or the difference between two means.
  * Confidence interval - APA style
      * confidence interval on the mean difference: ``95% CI = (4, 6)``
     
3. Effect size measures
  * d, r<sup>2</sup>
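The reporting format in item 2 can be produced with a small helper. This is only a sketch: `apa_t` is a hypothetical name, and it assumes the result is significant, so it always reports `p <` the chosen alpha level rather than an exact p-value.

```python
def apa_t(t, df, alpha, tails):
    """Format a t-test result in the 't(df) = x.xx, p < .xx, direction' pattern."""
    # the pattern writes the alpha level without a leading zero, e.g. ".05"
    alpha_txt = f"{alpha:.2f}".lstrip("0")
    return f"t({df}) = {t:.2f}, p < {alpha_txt}, {tails}"

print(apa_t(t=-2.50, df=24, alpha=0.05, tails="one-tailed"))
# t(24) = -2.50, p < .05, one-tailed
```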

In [11]:
%%latex
Degrees of freedom
$$df = n - 1$$

Standard error of the mean
$$SEM = \frac{s}{\sqrt{n}}$$

One sample t-test
$$t = \frac{\bar{x} - \mu}{SEM}$$

Confidence Interval
$$CI = \bar{x} \pm t_{critical} \cdot SEM$$
$$\text{margin of error} = t_{critical} \cdot SEM$$

Cohen's d (s - std for the sample)
$$d = \frac{\bar{x} - \mu}{s}$$

r^2
$$r^2 = \frac{t^2}{t^2 + df}$$

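To see how these formulas chain together when starting from raw data instead of summary statistics, here is a rough sketch using only the standard library. The five spending values are invented for illustration; `statistics.stdev` is used because it computes the sample standard deviation (dividing by n - 1).

```python
import math
import statistics

# hypothetical weekly food spending for five households (made-up numbers)
spend = [120, 130, 110, 140, 125]
mu = 151  # population mean to compare against

n = len(spend)
df = n - 1                      # degrees of freedom
x_bar = statistics.mean(spend)  # sample mean
s = statistics.stdev(spend)     # sample standard deviation
sem = s / math.sqrt(n)          # standard error of the mean
t = (x_bar - mu) / sem          # one-sample t statistic
d = (x_bar - mu) / s            # Cohen's d
r2 = t ** 2 / (t ** 2 + df)     # r squared

print(df, x_bar, round(sem, 2), round(t, 2), round(d, 2), round(r2, 2))
```

Each quantity depends only on the ones computed before it, so the order of the lines mirrors the order of the formulas.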

# Full one-sample t-test

US families spent an average of **$151** per week on food in 2012 (Gallup). We assume this sample represents the whole population, so we treat this value as the population mean.

In [5]:
import pandas as pd
import numpy as np

mu = 151

Consider a food cooperative called **Food Now!**.

The cooperative wants to reduce the cost of food for its members, so it implements a cost-saving program (e.g. buying from local growers).

* **Dollars spent on food per week is the dependent variable**
* **The cost-saving program is the treatment**

### Null hypothesis

*The program does not reduce the cost of food*

### Alternative hypothesis

*The program reduces the cost of food*

In [6]:
%%latex

$$H_0: \mu_{program} \geq 151$$
$$H_A: \mu_{program} < 151$$


This is a **one-tailed test** in the negative (**-**) direction.

In [7]:
n = 25
df = n - 1  # degrees of freedom

t_critical = -1.711  # from t-table
s = 50  # sample standard deviation

In [8]:
# standard error of the mean
sem = s / np.sqrt(n)
sem

10.0

In [10]:
x_prog = 126

mean_diff = x_prog - mu
mean_diff

-25

In [12]:
t = mean_diff / sem
t

-2.5
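The decision step is not written out in the cells, so here is a minimal sketch of it. It assumes the usual rule for a lower-tailed test (reject the null when t falls below the negative critical value); `reject_null` is a name introduced here for illustration.

```python
t = -2.5             # test statistic computed above
t_critical = -1.711  # one-tailed critical value for df = 24, alpha = .05 (t-table)

# lower-tailed test: reject H0 when t is more negative than t_critical
reject_null = t < t_critical
print(reject_null)  # True
```

Since -2.5 lies beyond -1.711, the null hypothesis is rejected at the .05 level.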

In [13]:
# Cohen's d
cd = mean_diff / s
cd

-0.5

In [15]:
r2 = r_sq(t, df)
r2

0.20661157024793389

In [17]:
# margin of error for 95% CI
# a confidence interval uses the two-tailed critical value
# df = 24
# alpha = .05
t_cr2 = 2.064  # from t-table

margin_of_error = t_cr2 * sem
margin_of_error

20.640000000000001

In [18]:
CI_95_percent = (x_prog - margin_of_error, x_prog + margin_of_error)

CI_95_percent

(105.36, 146.63999999999999)

This means we can be 95% confident that the mean weekly food cost for co-op members under the program lies between ``$105.36`` and ``$146.64``.
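The results checklist also mentions a confidence interval on the mean difference. A sketch of that version, assuming the same margin of error is applied around the difference between the sample mean and the population mean (`ci_diff` is a name introduced here):

```python
mu = 151                 # population mean
x_prog = 126             # sample mean under the program
margin_of_error = 20.64  # two-tailed t critical (2.064) * SEM (10.0)

mean_diff = x_prog - mu
# rounded to cents to avoid floating-point noise in the printed limits
ci_diff = (round(mean_diff - margin_of_error, 2),
           round(mean_diff + margin_of_error, 2))
print(ci_diff)  # (-45.64, -4.36)
```

The interval does not include 0, which is consistent with the reduction found by the t-test.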