## Question 1
The following table indicates the number of 6-point scores in an American rugby match in the 1979 season.

![](table1.png)

Based on these results, we fit a Poisson distribution whose parameter is the sample mean, λ = 2.435. At the 0.05 significance level, is there any reason to believe that the number of scores is a Poisson variable?

In [63]:
# your answer here
# imports
from scipy.stats import poisson
from scipy.stats import chisquare

# observed sample
observed = [35, 99, 104, 110, 62, 25, 10, 3]

# expected frequencies under a Poisson distribution with mu = 2.435 (448 observations in total)
mu, total = 2.435, 448
expected = [poisson.pmf(k, mu) * total for k in range(7)]
# last category: 7 or more scores (upper tail of the distribution)
expected.append((1 - poisson.cdf(6, mu)) * total)

# chi-square
chisquare(f_obs= observed, f_exp= expected)
# H0: the observed frequencies follow a Poisson distribution with mu = 2.435
# H1: the observed frequencies do not follow a Poisson distribution with mu = 2.435

# The p-value is higher than 0.05 (significance level), so we cannot reject the null hypothesis. The data give no
# evidence against the number of scores being a Poisson variable with mu = 2.435 (failing to reject H0 does not
# prove that it is true).

Power_divergenceResult(statistic=6.491310681109786, pvalue=0.4836889068537311)
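
Because the parameter 2.435 is described as the sample mean, it was presumably estimated from the same data. A common convention in that case is to remove one further degree of freedom. The following is a minimal sketch of that variant, assuming the estimate does come from this sample; it reuses the observed counts from the cell above, and the names `result_unadjusted`, `result_adjusted` and `critical_value` are illustrative only.

```python
from scipy.stats import chi2, chisquare, poisson

observed = [35, 99, 104, 110, 62, 25, 10, 3]
mu = 2.435
total = sum(observed)

# expected frequencies for 0..6 scores, plus the "7 or more" tail
expected = [poisson.pmf(k, mu) * total for k in range(7)]
expected.append((1 - poisson.cdf(6, mu)) * total)

# test as run above: degrees of freedom = k - 1
result_unadjusted = chisquare(f_obs=observed, f_exp=expected)

# assumption: mu was estimated from this sample, so one more degree of freedom is removed
result_adjusted = chisquare(f_obs=observed, f_exp=expected, ddof=1)

# critical value at the 0.05 level for k - 1 - 1 degrees of freedom
critical_value = chi2.ppf(0.95, df=len(observed) - 1 - 1)

print(result_unadjusted.pvalue, result_adjusted.pvalue, critical_value)
```

The test statistic is the same in both calls; only the reference distribution changes, so the adjusted p-value is smaller but the decision at the 0.05 level stays the same here.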

## BONUS/OPTIONAL - Question 2
Let's analyze a discrete distribution. To analyze the number of defective items in a factory in the city of Medellín, we took a random sample of n = 60 articles and observed the number of defectives in the following table:

![](table2.png)

A Poisson distribution was proposed, since it is defined for x = 0, 1, 2, 3, ..., using the following model:

![](image1.png)

For some extra insights check the following link: https://online.stat.psu.edu/stat504/node/63/ 

Does the distribution of defective items follow this distribution?

In [74]:
# your code here
# observed sample (the last two rows of the table were merged, because the chi-square test is unreliable when the
# observed or expected frequencies in a category are too small)
observed = [32, 15, 13]

# maximum likelihood estimate of the Poisson mean (the sample mean)
estimated_mean = (32*0+15*1+9*3+4*4)/60

# expected frequencies under a Poisson distribution with the estimated mean
expected = [poisson.pmf(0, estimated_mean)*60,
            poisson.pmf(1, estimated_mean)*60,
            (1-poisson.cdf(1, estimated_mean))*60]

# chi-square test with ddof=1, because one parameter (the mean) was estimated from the sample
chisquare(f_obs= observed, f_exp= expected, ddof= 1)

# H0: the observed frequencies follow a Poisson distribution
# H1: the observed frequencies do not follow a Poisson distribution

# The p-value is less than 0.05 (significance level), so we reject the null hypothesis: the data are not
# consistent with the number of defective items being a Poisson variable.

Power_divergenceResult(statistic=6.248550837473296, pvalue=0.012429495465327694)
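
To make the role of `ddof` concrete, the sketch below recomputes the same test by hand: the statistic is the sum of (observed - expected)² / expected, and the p-value comes from a chi-square distribution with k - 1 - ddof degrees of freedom. It copies the observed counts and the mean expression from the cell above unchanged; the names `statistic`, `dof` and `p_value` are illustrative.

```python
from scipy.stats import chi2, poisson

observed = [32, 15, 13]
estimated_mean = (32*0 + 15*1 + 9*3 + 4*4) / 60  # same expression as in the cell above

expected = [poisson.pmf(0, estimated_mean) * 60,
            poisson.pmf(1, estimated_mean) * 60,
            (1 - poisson.cdf(1, estimated_mean)) * 60]

# Pearson chi-square statistic
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# k categories, minus 1, minus 1 estimated parameter (the ddof=1 passed to chisquare)
dof = len(observed) - 1 - 1

# upper-tail probability of the chi-square distribution
p_value = chi2.sf(statistic, dof)

print(statistic, dof, p_value)
```

With three categories and one estimated parameter, only one degree of freedom remains, which is why merging categories any further would leave nothing to test.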

## Question 3
A quality control engineer takes a sample of 10 tires coming off an assembly line on each of 200 days and records the number of defective tires in each sample. Based on the data that follows, he would like to verify whether it is true that 5% of all tires have defects (that is, whether the sample comes from a binomial population with n = 10 and p = 0.05).

![](table3.png)


In [77]:
# your answer here
# imports
from scipy.stats import binom

# observed sample
observed = [138, 53, 9]

# expected sample computed based on a Binomial distribution with n = 10 and p = 0.05
expected = [binom.pmf(0, 10, 0.05)*200, 
            binom.pmf(1, 10, 0.05)*200, 
            (1-binom.cdf(1, 10, 0.05))*200]

# chi-square
chisquare(f_obs= observed, f_exp= expected)
# H0: the observed frequencies follow a Binomial distribution with n = 10 and p = 0.05
# H1: the observed frequencies do not follow a Binomial distribution with n = 10 and p = 0.05

# The p-value is less than 0.05 (significance level), so we reject the null hypothesis: the data are not
# consistent with the number of defective tires being a Binomial variable with n = 10 and p = 0.05.

Power_divergenceResult(statistic=8.306179519542772, pvalue=0.015715783395951144)
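
A rejection alone does not say where the fit breaks down. As a rough sketch (not part of the original answer), the per-category terms of the chi-square statistic can be listed to see which category departs most from the Binomial model; `contributions` and `largest` are illustrative names.

```python
from scipy.stats import binom

observed = [138, 53, 9]

# expected frequencies for 0, 1 and "2 or more" defective tires
expected = [binom.pmf(0, 10, 0.05) * 200,
            binom.pmf(1, 10, 0.05) * 200,
            (1 - binom.cdf(1, 10, 0.05)) * 200]

# each category's share of the chi-square statistic
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

# index of the category with the largest contribution
largest = max(range(len(contributions)), key=contributions.__getitem__)

for category, (o, e, c) in enumerate(zip(observed, expected, contributions)):
    print(category, o, round(e, 2), round(c, 3))
```

The contributions add up to the statistic reported above, and the last category (two or more defective tires) accounts for the largest share.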

## Question 4
A researcher gathers information about the patterns of physical activity of fifth-grade children at a public primary school. He defines three categories of physical activity (Low, Medium, High). He also asks about the regular consumption of sugary drinks at school, and defines two categories (Yes = consumed, No = not consumed). We would like to evaluate whether there is an association between patterns of physical activity and the consumption of sugary drinks for the children of this school, at the 5% significance level. The results are in the following table:

![](table4.png)

In [82]:
#your answer here
# imports
import numpy as np
from scipy.stats import chi2_contingency

# definition of table
table = np.array([[32, 12], [14, 22], [6, 9]])

# chi-square
chi2_contingency(table)

# H0: the variables (sugary drinks, physical activity) are independent
# H1: the variables (sugary drinks, physical activity) are not independent
# The variables (sugary drinks, physical activity) are not independent, because p-value is less than 0.05 (significance 
# level). It allows us to reject the null hypothesis which states that the variables (sugary drinks, physical activity) are 
# independent

(10.712198008709638,
 0.004719280137040844,
 2,
 array([[24.08421053, 19.91578947],
        [19.70526316, 16.29473684],
        [ 8.21052632,  6.78947368]]))
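
The expected frequencies in the last element of the output follow from the independence assumption: each cell is (row total × column total) / grand total. The sketch below rebuilds them and the test statistic by hand, assuming no continuity correction (SciPy applies one only when there is a single degree of freedom, which is not the case for this 3×2 table); the variable names are illustrative.

```python
import numpy as np
from scipy.stats import chi2

table = np.array([[32, 12], [14, 22], [6, 9]])

# marginal totals, kept 2-D so that they broadcast into a full table
row_totals = table.sum(axis=1, keepdims=True)
col_totals = table.sum(axis=0, keepdims=True)

# expected counts under independence
expected = row_totals * col_totals / table.sum()

# Pearson chi-square statistic and its degrees of freedom: (rows - 1) * (columns - 1)
statistic = ((table - expected) ** 2 / expected).sum()
dof = (table.shape[0] - 1) * (table.shape[1] - 1)
p_value = chi2.sf(statistic, dof)

print(statistic, p_value, dof)
print(expected)
```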