# Statistical Testing

Let's see whether the relative frequencies in the Wikipedia table (https://en.wikipedia.org/wiki/Pass_the_Pigs#Relative_frequencies) reflect the data we collected from the trials in our class. First, we read in the data again.

In [19]:
import pandas as pd

data = pd.read_csv("pass_the_pigs.csv")
data

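| | dot | dot_f | razorback | trotter | snouter | leaning_jowler | other |
|---|---|---|---|---|---|---|---|
| 0 | 38 | 52 | 45 | 17 | 2 | 0 | 1 |
| 1 | 19 | 32 | 32 | 4 | 3 | 0 | 0 |
| 2 | 14 | 26 | 20 | 7 | 3 | 0 | 2 |
| 3 | 14 | 21 | 25 | 5 | 1 | 0 | 3 |
| 4 | 15 | 27 | 40 | 13 | 5 | 0 | 0 |
| 5 | 32 | 45 | 43 | 10 | 9 | 0 | 0 |
| 6 | 16 | 24 | 20 | 6 | 1 | 1 | 2 |
| 7 | 16 | 50 | 53 | 13 | 4 | 1 | 0 |
| 8 | 24 | 23 | 17 | 7 | 0 | 3 | 0 |
| 9 | 13 | 25 | 38 | 12 | 3 | 2 | 4 |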


In [23]:
#keys for shorthand, I may want to actually type these out in future or just use a basic conversion
keys = {
    'dot' : 'dot',
    'dot_f' : 'nod',
    'razorback' : 'rzb',
    'trotter' : 'ttr',
    'snouter' : 'str',
    'leaning_jowler' : 'ljw',
    'other' : 'otr'
}

# full column names and their shorthand labels
cols = list(keys.keys())
shortcut = list(keys.values())

# total number of occurrences of each outcome, summed over all rows
total_counts = data.sum()

# input of relative frequencies from wikipedia
wiki_pct = pd.DataFrame.from_dict({
    'dot' : [.349],
    'nod' : [.302],
    'rzb' : [.224],
    'ttr' : [.088],
    'str' : [.030],
    'ljw' : [.0061],
    'otr' : [.0009]
})

In [24]:
#total number of occurrences for each outcome
total_counts

dot               215
dot_f             350
razorback         355
trotter            98
snouter            32
leaning_jowler      7
other              14
dtype: int64

In [25]:
#relative frequencies from Wikipedia 
wiki_pct

| | dot | nod | rzb | ttr | str | ljw | otr |
|---|---|---|---|---|---|---|---|
| 0 | 0.349 | 0.302 | 0.224 | 0.088 | 0.03 | 0.0061 | 0.0009 |


To compare the observed counts with the Wikipedia proportions, we use a $\chi^2$ test.

## $\chi^2$ Test of Proportions

Let the null hypothesis $H_{0}$ be:

$H_{0}:$ The expected proportions from Wikipedia are consistent with the findings of our samples.

Let the alternative hypothesis $H_{a}$ be:

$H_{a}:$ The expected proportions from Wikipedia are not consistent with the findings of our samples.

Let the significance level be $\alpha = 0.05$.
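For reference, the statistic behind this kind of test can be written out as follows. The symbols $O_i$, $E_i$, $p_i$, $n$, and $k$ are introduced here for illustration and are not named elsewhere in this notebook: $O_i$ is the observed count for outcome $i$, $p_i$ is the Wikipedia proportion, $n$ is the total number of recorded outcomes, and $k = 7$ is the number of outcome categories.

$$
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad E_i = n\,p_i
$$

Under $H_{0}$ this statistic is compared against a $\chi^2$ distribution with $k - 1 = 6$ degrees of freedom, which is the default used by `scipy.stats.chisquare`.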

In [34]:
import scipy.stats as stat
import numpy as np

expected = np.array(wiki_pct.values) * np.sum(total_counts.values)
expected

array([[373.779 , 323.442 , 239.904 ,  94.248 ,  32.13  ,   6.5331,
          0.9639]])

In [35]:
stat.chisquare(total_counts.values, f_exp=expected[0])

Power_divergenceResult(statistic=301.335074306209, pvalue=4.233447373297275e-62)

From the result above, we see that the $p$-value is approximately $4.233\times 10^{-62}$, which is far below $\alpha = 0.05$. Thus, we reject the null hypothesis that the expected proportions from Wikipedia are consistent with our findings in favor of the alternative hypothesis.
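To see which outcomes drive the statistic, the sketch below recomputes each category's term $(O - E)^2 / E$ in plain Python. The counts and proportions are copied from the cells above. The names `observed`, `wiki`, and `contributions` are illustrative, and pairing the shorthand labels with the full column names by position (for example `nod` with `dot_f`) is an assumption that follows the order used in `keys`.

```python
# Observed totals and Wikipedia proportions, copied from the cells above.
# Assumption: the shorthand labels line up with these columns by position.
observed = {
    "dot": 215, "dot_f": 350, "razorback": 355, "trotter": 98,
    "snouter": 32, "leaning_jowler": 7, "other": 14,
}
wiki = {
    "dot": 0.349, "dot_f": 0.302, "razorback": 0.224, "trotter": 0.088,
    "snouter": 0.030, "leaning_jowler": 0.0061, "other": 0.0009,
}

n = sum(observed.values())  # total number of recorded outcomes

# Each category's term (O - E)^2 / E of the chi-square statistic
contributions = {}
for name, obs in observed.items():
    exp = wiki[name] * n
    contributions[name] = (obs - exp) ** 2 / exp

chi2 = sum(contributions.values())

# Print the categories from largest to smallest contribution
for name, term in sorted(contributions.items(), key=lambda kv: -kv[1]):
    print(f"{name:15s} expected={wiki[name] * n:8.2f}  term={term:8.2f}")
print(f"chi-square statistic: {chi2:.3f}")
```

In this sketch the largest term comes from `other`, whose expected count is below 1. A common rule of thumb asks for expected counts of about 5 or more in every category, so that term should be read with caution. The `dot` and `razorback` terms are each well above 12.59, the 5% critical value for 6 degrees of freedom, so the rejection does not appear to hinge on `other` alone.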

What this means is that either the values from Wikipedia are unreliable or the findings from our classroom trials were heavily skewed. For the purposes of my simulations, then, I will use the proportions found empirically from our class.
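A minimal sketch of how the empirical proportions could be derived from the class totals and used to draw simulated outcomes is shown here. The counts are copied from `total_counts` above. The names `observed`, `empirical`, and the helper `simulate_rolls` are hypothetical and only illustrate one possible approach, not necessarily the one used in the later simulations.

```python
import random

# Class totals copied from total_counts above
observed = {
    "dot": 215, "dot_f": 350, "razorback": 355, "trotter": 98,
    "snouter": 32, "leaning_jowler": 7, "other": 14,
}

n = sum(observed.values())

# Empirical proportion of each outcome: count divided by the grand total
empirical = {name: count / n for name, count in observed.items()}


def simulate_rolls(k, seed=None):
    """Hypothetical helper: draw k single-pig outcomes from the empirical proportions."""
    rng = random.Random(seed)
    return rng.choices(list(empirical), weights=list(empirical.values()), k=k)


for name, p in empirical.items():
    print(f"{name:15s} {p:.4f}")

# Draw a handful of simulated outcomes; the seed only makes the draw repeatable
print(simulate_rolls(5, seed=0))
```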