## Question 1
The following table indicates the number of 6-point scores in an American rugby match in the 1979 season.

![](table1.png)

Based on these results, we fit a Poisson distribution whose parameter is the sample mean, λ = 2.435. At the 0.05 significance level, is there any reason to believe that the number of scores is a Poisson variable?

In [2]:
import scipy.stats as st
import numpy as np
from scipy.stats import poisson

In [3]:
# H0: The number of scores follows a Poisson distribution
# H1: The number of scores doesn't follow a Poisson distribution

In [19]:
observed = [35,99,104,110,62,25,10,3]

In [20]:
sum(observed)

448

In [13]:
poisson = st.poisson(2.435)

In [14]:
round(poisson.pmf(1)*448)

<scipy.stats._distn_infrastructure.rv_frozen at 0x7fe2d4a52340>

In [17]:
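# NOTE: 449 (not the sample size 448) is a manual adjustment chosen so that the
# rounded expected counts add up to the observed total of 448.
expected = [round(poisson.pmf(i)*449) for i in range(8)]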
sum(expected)

448

In [21]:
st.chisquare(observed,expected)

Power_divergenceResult(statistic=5.255071638635614, pvalue=0.6288667875814871)

In [39]:
# Conclusion

# The p-value (0.63) is higher than .05, so we can't reject the null hypothesis.
# The number of scores is consistent with a Poisson distribution.
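The test above uses rounded expected counts and the default degrees of freedom. A minimal sketch of a common variant follows: it keeps the expected counts unrounded and passes `ddof=1`, because λ was estimated from the sample. It assumes that the last row of the table stands for "7 or more" scores, which is an assumption, since the table is only available as an image.

```python
import numpy as np
import scipy.stats as st

observed = np.array([35, 99, 104, 110, 62, 25, 10, 3])
lam = 2.435          # sample mean quoted in the question
n = observed.sum()   # 448 matches

dist = st.poisson(lam)
# Probabilities for 0..6 scores plus the upper tail P(X >= 7) for the last bin.
# ASSUMPTION: the last category is open-ended ("7 or more").
probs = np.append(dist.pmf(np.arange(7)), dist.sf(6))
expected = n * probs  # unrounded; sums to n because probs sum to 1

# ddof=1 removes one additional degree of freedom for the estimated lambda.
stat, pvalue = st.chisquare(observed, expected, ddof=1)
print(stat, pvalue)
```

With this setup no manual scaling factor is needed, since the probabilities already sum to 1.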

## BONUS/OPTIONAL - Question 2
Let's analyze a discrete distribution. To analyze the number of defective items in a factory in the city of Medellín, we took a random sample of n = 60 articles and observed the number of defectives in the following table:

![](table2.png)

A Poisson distribution was proposed, since it is defined for x = 0, 1, 2, 3, ..., using the following model:

![](image1.png)

For some extra insights check the following link: https://online.stat.psu.edu/stat504/node/63/ 

Does the distribution of defective items follow this distribution?

In [10]:
# H0: The number of defective items follows a Poisson distribution
# H1: The number of defective items doesn't follow a Poisson distribution

In [11]:
mean = np.array([32*0, 15*1, 9*3, 4*4]).sum() / 60
mean

0.9666666666666667

In [68]:
observed = [32,15,9,4]
n_items = [0,1,3,4]

In [69]:
# another way of calculating the mean done by Miguel (a better one)

In [71]:
list(zip(n_items,observed))

[(0, 32), (1, 15), (3, 9), (4, 4)]

In [74]:
# Weighted mean: sum(value * frequency) / total number of items
weighted_sum = sum(x * y for x, y in zip(n_items, observed))

lambda_poisson = weighted_sum / sum(observed)

In [75]:
lambda_poisson

0.9666666666666667
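The same weighted mean can also be obtained with `np.average` and its `weights` argument. This is only a sketch of an alternative, reusing the values of `n_items` and `observed` defined above; it should reproduce the value of `lambda_poisson`.

```python
import numpy as np

n_items = [0, 1, 3, 4]      # number of defectives, as defined above
observed = [32, 15, 9, 4]   # how many articles showed each value

# Weighted average of the values, using the observed frequencies as weights.
lambda_poisson = np.average(n_items, weights=observed)
print(lambda_poisson)
```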

In [76]:
defective_poisson = st.poisson(lambda_poisson)

In [77]:
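# NOTE: 74 (not the sample size 60) is a manual adjustment chosen so that the rounded
# expected counts add up to 60. n_items also skips x = 2, so these values are not
# the plain Poisson expected counts for a sample of 60.
expected = [round(defective_poisson.pmf(i)*74) for i in n_items]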
sum(expected)

60

In [79]:
print(observed)
print(expected)

[32, 15, 9, 4]
[28, 27, 4, 1]


In [78]:
st.chisquare(observed,expected)

Power_divergenceResult(statistic=21.154761904761905, pvalue=9.776539976039265e-05)

In [15]:
# Conclusion: the p-value is lower than 0.05, so we can reject the null hypothesis and conclude that
# the number of defective items doesn't follow a Poisson distribution.
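One caveat worth checking: the SciPy documentation for `chisquare` notes that the test is unreliable when frequencies are too small, with a typical rule of thumb that every observed and expected frequency should be at least 5. The short sketch below counts how many of the expected values printed above fall under that threshold; the threshold of 5 is the usual rule of thumb, not a hard limit.

```python
import numpy as np

expected = np.array([28, 27, 4, 1])  # expected counts printed above

# Flag cells whose expected count falls below the rule-of-thumb threshold.
threshold = 5
small = expected < threshold
print(small.sum(), "of", expected.size, "cells have an expected count below", threshold)
```

When cells are flagged, a common remedy is to merge sparse neighbouring categories before running the test.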

## BONUS/OPTIONAL - Question 3
A quality control engineer takes a sample of 10 tires coming off an assembly line each day and records the number of tires with defects over 200 days. Based on the data that follows, the engineer would like to verify whether 5% of all tires have defects (that is, whether the sample comes from a binomial population with n = 10 and p = 0.05).

![](table3.png)


In [16]:
# Solved in class with Miguel

# H0: 5% of all tires have defects (defects per sample follow a binomial with n = 10, p = 0.05)
# H1: the number of defects per sample doesn't follow that binomial distribution

In [81]:
tires_binom = st.binom(n=10,p=0.05)

In [82]:
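# NOTE: 202 (not the 200 observed days) is a manual adjustment chosen so that the
# rounded expected counts add up to 200.
expected = [round(tires_binom.pmf(i)*202) for i in range(3)]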
sum(expected)

200

In [83]:
expected

[121, 64, 15]

In [36]:
observed = [138,53,9]

In [37]:
st.chisquare(observed,expected)

Power_divergenceResult(statistic=6.679054752066115, pvalue=0.0354537100355941)

In [38]:
# Conclusion: as the p-value is lower than 0.05, we can reject the null hypothesis: the data are
# not consistent with 5% of all tires having defects.
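As a cross-check, the expected counts can be built without rounding or a manual scaling factor by assigning the whole upper tail to the last bin. This is a sketch under the assumption that the third category of the table means "2 or more" defective tires; the table itself is only available as an image.

```python
import numpy as np
import scipy.stats as st

observed = np.array([138, 53, 9])
n_days = observed.sum()  # 200 days

tires_binom = st.binom(n=10, p=0.05)
# P(X = 0), P(X = 1) and the upper tail P(X >= 2).
# ASSUMPTION: the last category is open-ended ("2 or more").
probs = np.array([tires_binom.pmf(0), tires_binom.pmf(1), tires_binom.sf(1)])
expected = n_days * probs

# No ddof adjustment here: n and p are fixed by H0, not estimated from the data.
stat, pvalue = st.chisquare(observed, expected)
print(expected.round(2), stat, pvalue)
```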

## Question 4
A researcher gathers information about the patterns of physical activity of fifth-grade children at a public primary school. He defines three categories of physical activity (Low, Medium, High). He also asks about the regular consumption of sugary drinks at school, and defines two categories (Yes = consumed, No = not consumed). We would like to evaluate whether there is an association between patterns of physical activity and the consumption of sugary drinks for the children of this school, at a 5% significance level. The results are in the following table:

![](table4.png)

In [33]:
# H0: The level of physical activity and the consumption of sugary drinks are independent
# H1: The level of physical activity and the consumption of sugary drinks are associated

physical_table = [[32,12],[14,22],[6,9]]

In [34]:
st.chi2_contingency(np.array(physical_table))

(10.712198008709638,
 0.004719280137040844,
 2,
 array([[24.08421053, 19.91578947],
        [19.70526316, 16.29473684],
        [ 8.21052632,  6.78947368]]))

In [19]:
# Conclusion: the p-value (0.0047) is lower than 0.05, so we reject the null hypothesis: there is a
# statistically significant association between sugary drink consumption and physical activity.
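The tuple returned by `st.chi2_contingency` can be unpacked into named values, which makes the decision rule explicit. A minimal sketch, where `alpha` is an illustrative name for the 5% significance level stated in the question:

```python
import numpy as np
import scipy.stats as st

physical_table = np.array([[32, 12], [14, 22], [6, 9]])

# Returns the test statistic, the p-value, the degrees of freedom,
# and the table of expected frequencies under independence.
chi2, pvalue, dof, expected = st.chi2_contingency(physical_table)

alpha = 0.05  # significance level from the question
print(f"chi2={chi2:.3f}, dof={dof}, p={pvalue:.4f}")
print("reject H0" if pvalue < alpha else "fail to reject H0")
```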