# Did Mace's TA group hand in more essays than other groups?

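A statistical analysis of whether Mace's TA group handed in more essays than the other groups.

There are 165 students in total, and they are assumed to be distributed uniformly at random across the four TA groups, so each group is treated as having 165 / 4 = 41.25 students.

Each submission is treated as a Bernoulli trial, so the number of submissions in a group is binomial; submissions are assumed to be independent of each other.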

In [1]:
from math import sqrt
import scipy.stats

## Data

In [2]:
dimga = {'noStudents': 165,
         'essayGroups': {
             'ta0': 2,
             'ta1': 4,
             'mace': 8,
             'ta3': 3}
        }

In [3]:
for ta, essays in dimga['essayGroups'].items():
    print(ta, round(essays / (dimga['noStudents'] / len(dimga['essayGroups'])) * 100, 2), "%")

ta0 4.85 %
ta1 9.7 %
mace 19.39 %
ta3 7.27 %

## Formalizing the problem

Now we test for a one-tailed difference: did Mace's $n_m$ students submit at a higher rate than the $n_o$ students of the other TAs?

Let the null hypothesis $H_0$ be that there is no difference, and the alternative hypothesis $H_a$ be that Mace's students submitted more. Formally, $H_0 : p_o = p_m$ and $H_a : p_o < p_m$.

$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p} (1 - \hat{p})(\frac{1}{n_1} + \frac{1}{n_2})}}$

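where index 1 refers to Mace's group ($\hat{p}_1 = \hat{p}_m$, $n_1 = n_m$), index 2 refers to the other groups ($\hat{p}_2 = \hat{p}_o$, $n_2 = n_o$), and the pooled proportion is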

$\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$.
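The two formulas above can be wrapped in a small helper. This is only a sketch: the function name `two_proportion_z` and its argument names are hypothetical and not part of the notebook, and it takes submission counts `x1`, `x2` rather than proportions, which is equivalent because $n_i\hat{p}_i = x_i$.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic (hypothetical helper).

    x1, x2 are the numbers of submissions and n1, n2 the group sizes.
    A positive result means group 1 has the higher observed rate.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled proportion: (n1*p1_hat + n2*p2_hat) / (n1 + n2) == (x1 + x2) / (n1 + n2)
    p_hat = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error


# Group 1: Mace (8 essays, 41.25 students assumed).
# Group 2: the other three groups (2 + 4 + 3 = 9 essays, 123.75 students assumed).
z_example = two_proportion_z(8, 41.25, 9, 123.75)
print(z_example)
```

Passing counts instead of proportions avoids one source of rounding error, but the result is the same statistic as in the formula.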

## Calculating the test statistic $z$

Calculate the group sizes, $n_o$ for the other TAs' students and $n_m$ for Mace's students, and the corresponding submission rates $p_o$ and $p_m$.

In [4]:
n_o = (dimga['noStudents'] / (len(dimga['essayGroups']))) * 3
n_m = dimga['noStudents'] / len(dimga['essayGroups'])
p_o = sum([dimga['essayGroups'][ta] for ta in ['ta0', 'ta1', 'ta3']]) / n_o
p_m = dimga['essayGroups']['mace'] / n_m

In [5]:
n_o, n_m, p_o, p_m

(123.75, 41.25, 0.07272727272727272, 0.19393939393939394)

Calculate $\hat{p}$.

In [6]:
phat = ((n_o*p_o) + (n_m*p_m)) / (n_o + n_m)
phat

0.10303030303030303

Calculate the test statistic $z$.

In [7]:
z = (p_m - p_o) / sqrt(phat * (1 - phat) * ((1/n_o) + (1/n_m)))
z

2.2177739881780356

## Hypothesis test results

We test at the 95% confidence level, so $\alpha = 0.05$. Because the test is one-tailed, $\alpha$ is not halved: from the [z-table](http://www.statisticshowto.com/tables/z-table/) we look up the critical value for an area of $0.5 - 0.05 = 0.4500$ and retrieve $z_\alpha = 1.645$. (Halving to $0.025$, an area of $0.4750$, gives the two-tailed critical value $1.96$.)

In [8]:
z_a = 1.645
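As a cross-check on the table lookup, the critical value can be computed from the inverse normal CDF. The sketch below uses the standard library's `statistics.NormalDist` (Python 3.8+) instead of a z-table; the variable names are illustrative and not part of the notebook.

```python
from statistics import NormalDist

alpha = 0.05
standard_normal = NormalDist()  # mean 0, standard deviation 1

# One-tailed test: all of alpha sits in the upper tail.
z_one_tailed = standard_normal.inv_cdf(1 - alpha)
# Two-tailed test: alpha is split between both tails.
z_two_tailed = standard_normal.inv_cdf(1 - alpha / 2)

print(round(z_one_tailed, 3), round(z_two_tailed, 3))  # prints: 1.645 1.96
```

The observed $z \approx 2.218$ exceeds both values, so the choice between them does not change the conclusion here.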

$H_0$: there is no difference between Mace's group and the other TA groups; $H_a$: Mace's students handed in more.

In [9]:
if (z > z_a):
    print("reject the null hypothesis")
else:
    print("retain the null hypothesis")

reject the null hypothesis
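An equivalent way to reach the decision is to convert $z$ into a one-tailed p-value and compare it with $\alpha$. The following is a minimal sketch assuming $\alpha = 0.05$ and the value of $z$ printed above; it uses the standard library's `statistics.NormalDist`, and the names `p_value` and `decision` are illustrative.

```python
from statistics import NormalDist

z = 2.2177739881780356  # test statistic computed earlier in the notebook
alpha = 0.05            # assumed significance level

# Upper-tail probability of a standard normal: P(Z >= z) under H_0.
p_value = 1 - NormalDist().cdf(z)
print(round(p_value, 3))  # roughly 0.013

decision = "reject the null hypothesis" if p_value < alpha else "retain the null hypothesis"
print(decision)
```

Reporting the p-value shows how far the result is from the threshold, which a bare comparison against the critical value does not.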


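## Calculate using scipy

An alternative way to test the same hypothesis, using Fisher's exact test from scipy on the 2×2 table of essays not submitted versus submitted (rows) for the other groups versus Mace's group (columns). This is a different test from the $z$-test above, so it yields a p-value directly and no $z$ statistic.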

In [11]:
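submitted_o = sum(dimga['essayGroups'][ta] for ta in ['ta0', 'ta1', 'ta3'])  # 9 essays from the other groups
(ratio, pvalue) = scipy.stats.fisher_exact([
    [n_o - submitted_o, n_m - dimga['essayGroups']['mace']],  # not submitted: others, Mace
    [submitted_o, dimga['essayGroups']['mace']]],             # submitted: others, Mace
    alternative='greater')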

$H_0$: there is no difference between Mace's group and the other TA groups; $H_a$: Mace's students handed in more.

In [12]:
if pvalue < 0.05:
    print("reject the null hypothesis")
else:
    print("retain the null hypothesis")

reject the null hypothesis
