## Question 1
The following table shows the number of 6-point scores per match in American rugby during the 1979 season.

![](table1.png)

Based on these results, we fit a Poisson distribution whose parameter is the sample mean, λ = 2.435. At the 0.05 significance level, is there any reason to believe that the number of scores is a Poisson variable?

In [8]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as st
from scipy.stats import poisson
from scipy.stats import binom

In [7]:
# your answer here
mu = 2.435

# 1. Hypotheses
# H0: the number of scores follows Poisson(mu)
# H1: the number of scores does not follow Poisson(mu)
poisson_dist = poisson(mu)

# Probabilities for 0..6 scores, plus the upper tail P(X >= 7)
poisson_pmfs = np.array([poisson_dist.pmf(i) for i in range(0,7)])
with_tail = np.append(poisson_pmfs, poisson_dist.sf(6))

# 2. Significance level
alpha = 0.05

# 3. Sample: observed counts and expected counts under H0
O = np.array([35, 99, 104, 110, 62, 25, 10, 3])
E = with_tail * O.sum()  # O.sum() is 448, the total number of observations

# 4. Compute the statistics and p-value
stat, p_value = st.chisquare(O, f_exp=E)
print('stat', stat)
print('p_value', p_value)

# Decision
if p_value < alpha:
    print('Reject the null hypothesis')
else:
    print('Do not reject the null hypothesis')

stat 6.491310681109792
p_value 0.48368890685373034
Do not reject the null hypothesis
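
Because the Poisson mean 2.435 was estimated from the same sample, a common refinement is to remove one extra degree of freedom through the `ddof` argument of `scipy.stats.chisquare`. The sketch below is not part of the original answer. It assumes the eight observed counts correspond to 0 through 6 scores plus a "7 or more" tail, as the cell above implies, and it also recomputes the Pearson statistic by hand as a cross-check.

```python
import numpy as np
from scipy import stats as st

mu = 2.435  # sample mean given in the question
# Assumed layout: counts for 0..6 scores, then "7 or more"
observed = np.array([35, 99, 104, 110, 62, 25, 10, 3])

# Poisson probabilities for 0..6 plus the upper tail P(X >= 7)
probs = np.append(st.poisson.pmf(np.arange(7), mu), st.poisson.sf(6, mu))
expected = probs * observed.sum()

# Pearson statistic computed directly from its definition
manual_stat = np.sum((observed - expected) ** 2 / expected)

# ddof=1 removes one extra degree of freedom for the estimated mean
stat, p_adjusted = st.chisquare(observed, f_exp=expected, ddof=1)

print('manual stat', manual_stat)
print('scipy stat', stat)
print('p_value with ddof=1', p_adjusted)
```

With 6 instead of 7 degrees of freedom the p-value shrinks somewhat, but a statistic near 6.49 stays well below the 5% critical value for 6 degrees of freedom (about 12.59), so the decision should be unchanged.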


## BONUS/OPTIONAL - Question 2
Let's analyze a discrete distribution. To study the number of defective items in a factory in the city of Medellín, we took a random sample of n = 60 articles and recorded the number of defectives in the following table:

![](table2.png)

A Poisson distribution was proposed, since it is defined for x = 0, 1, 2, 3, ..., using the following model:

![](image1.png)

For some extra insights check the following link: https://online.stat.psu.edu/stat504/node/63/ 

Does the number of defective items follow this distribution?

In [42]:
# your code here
n = 60

# 1. Hypotheses
# H0: the number of defectives follows Poisson(mu)
# H1: the number of defectives does not follow Poisson(mu)

# Sample mean estimated from the observed table (value * frequency, divided by n)
mu = sum([(32*0), (15*1), (0*2), (9*3), (4*4)]) / n
poisson_dist = poisson(mu)

# Probabilities for 0..3 defectives, plus the upper tail P(X >= 4)
poisson_pmfs = np.array([poisson_dist.pmf(i) for i in range(0,4)])
total_pmfs = np.append(poisson_pmfs, poisson_dist.sf(3))

# 2. Significance level
alpha = 0.05

# 3. Sample: observed counts and expected counts under H0
O = np.array([32, 15, 0, 9, 4])
E = total_pmfs * n


# 4. Compute the statistics and p-value
stat, p_value = st.chisquare(O, f_exp=E)
print('stat', stat)
print('p_value', p_value)

# Decision
if p_value < alpha:
    print('Reject the null hypothesis')
else:
    print('Do not reject the null hypothesis')

stat 34.32169618960071
p_value 6.401393042020205e-07
Reject the null hypothesis
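
Several expected counts in this question are small, and the chi-square approximation is usually considered reliable only when expected counts are at least about 5 (a rule of thumb, not something the exercise states). As a hedged sketch, the hypothetical helper `pool_right_tail` below merges categories from the right until the last one meets that threshold, then reruns the test with `ddof=1` because the mean was estimated from the sample. It assumes the observed counts are for 0, 1, 2, 3 and "4 or more" defectives, as in the cell above.

```python
import numpy as np
from scipy import stats as st

def pool_right_tail(obs, exp, min_expected=5.0):
    """Merge the last category into its neighbour until its expected count is large enough."""
    obs, exp = list(obs), list(exp)
    while len(exp) > 2 and exp[-1] < min_expected:
        last_obs, last_exp = obs.pop(), exp.pop()
        obs[-1] += last_obs
        exp[-1] += last_exp
    return np.array(obs), np.array(exp)

# Observed counts, assumed to be for 0, 1, 2, 3 and "4 or more" defectives
observed = np.array([32, 15, 0, 9, 4])
n = observed.sum()
mu = np.dot(np.arange(5), observed) / n  # sample mean, as in the cell above

# Poisson probabilities for 0..3 plus the upper tail P(X >= 4)
probs = np.append(st.poisson.pmf(np.arange(4), mu), st.poisson.sf(3, mu))
expected = probs * n

pooled_obs, pooled_exp = pool_right_tail(observed, expected)

# ddof=1: one parameter (the mean) was estimated from the data
stat, p_value = st.chisquare(pooled_obs, f_exp=pooled_exp, ddof=1)
print('pooled observed', pooled_obs)
print('pooled expected', pooled_exp)
print('stat', stat, 'p_value', p_value)
```

On this data the pooling should leave three categories, so the adjusted test has a single degree of freedom. The helper only inspects the right tail, which is where the small expected counts sit here.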


## Question 3
Each day for 200 days, a quality control engineer takes a sample of 10 tires coming off an assembly line and records the number of tires with defects. Based on the data that follows, the engineer would like to verify whether 5% of all tires have defects (that is, whether the sample comes from a binomial population with n = 10 and p = 0.05).

![](table3.png)


In [13]:
# your code here
n = 10
p = 0.05

# 1. Hypotheses
# H0: the number of defective tires per sample follows Binomial(n, p)
# H1: the number of defective tires per sample does not follow Binomial(n, p)
binom_dist = binom(n,p)

# Probabilities for 0 and 1 defective tires, plus the upper tail P(X >= 2)
binom_pmfs = np.array([binom_dist.pmf(i) for i in range(0,2)])
with_tail = np.append(binom_pmfs, binom_dist.sf(1))

# 2. Significance level
alpha = 0.05

# 3. Sample: observed counts over 200 days and expected counts under H0
O = np.array([138, 53, 9])
E = with_tail * 200

# 4. Compute the statistics and p-value
stat, p_value = st.chisquare(O, f_exp=E)
print('stat', stat)
print('p_value', p_value)

# Decision
if p_value < alpha:
    print('Reject the null hypothesis')
else:
    print('Do not reject the null hypothesis')

stat 8.30617951954277
p_value 0.015715783395951168
Reject the null hypothesis
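
The same decision can be reached by comparing the statistic with a critical value instead of using the p-value. A minimal sketch, assuming the three categories used above and no estimated parameters (p = 0.05 is given by the question), so the degrees of freedom are 3 - 1 = 2:

```python
from scipy.stats import chi2

alpha = 0.05
stat = 8.30617951954277  # statistic printed by the cell above
df = 3 - 1               # three categories, no parameters estimated from the data

# Value that a chi-square variable with df degrees of freedom exceeds with probability alpha
critical_value = chi2.ppf(1 - alpha, df)
p_value = chi2.sf(stat, df)  # should agree with the p-value printed above

print('critical value', critical_value)
print('p_value', p_value)
print('Reject the null hypothesis' if stat > critical_value
      else 'Do not reject the null hypothesis')
```

Both rules are equivalent: the statistic exceeds the critical value exactly when the p-value falls below `alpha`.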


## Question 4
A researcher gathers information about the physical activity patterns of fifth-grade children at a public primary school. The researcher defines three categories of physical activity (Low, Medium, High) and also asks about regular consumption of sugary drinks at school, with two categories (Yes = consumed, No = not consumed). We would like to evaluate, at the 5% significance level, whether there is an association between physical activity patterns and the consumption of sugary drinks for the children of this school. The results are in the following table:

![](table4.png)

In [12]:
# your answer here

# 1. Hypotheses
# H0: physical activity and sugary drink consumption are independent
# H1: physical activity and sugary drink consumption are associated

# 2. Significance level
alpha = 0.05

# 3. Sample: rows are activity levels, columns are consumption categories
table = np.array([[32, 12], [14, 22], [6, 9]])

# 4. Compute the statistic and p-value (the third return value is the degrees of freedom)
stat, p_value, dof, exp_values = st.chi2_contingency(table)

print('stat =', stat)
print('p_value =', p_value)


# Decision
if p_value < alpha:
    print('Reject the null hypothesis')
else:
    print('Do not reject the null hypothesis')

stat = 10.712198008709638
p_value = 0.004719280137040844
Reject the null hypothesis
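
To see where the statistic comes from, the expected counts under independence can be rebuilt from the row and column totals (row total times column total, divided by the grand total). The sketch below assumes the table layout used in the cell above, with three rows for the activity levels and two columns for consumption, and checks the hand-built counts against the ones returned by `st.chi2_contingency`.

```python
import numpy as np
from scipy import stats as st

table = np.array([[32, 12], [14, 22], [6, 9]])

row_totals = table.sum(axis=1)
col_totals = table.sum(axis=0)
grand_total = table.sum()

# Expected count for each cell under independence
expected_manual = np.outer(row_totals, col_totals) / grand_total

# Pearson statistic computed directly from its definition
manual_stat = np.sum((table - expected_manual) ** 2 / expected_manual)

stat, p_value, dof, expected = st.chi2_contingency(table)

print('expected counts\n', expected_manual)
print('manual stat', manual_stat, 'scipy stat', stat)
print('degrees of freedom', dof)
```

For a 3 x 2 table the degrees of freedom are (3 - 1) * (2 - 1) = 2, and SciPy applies its continuity correction only when there is a single degree of freedom, so the manual and library statistics should match here.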
