## Question 1
The following table shows the number of 6-point scores per match in American rugby during the 1979 season.

![](table1.png)

Based on these results, we model the number of scores with a Poisson distribution whose parameter is the sample mean, λ = 2.435. At the 0.05 significance level, is there any reason to believe that the number of scores is a Poisson variable?
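For reference, the goodness-of-fit test in this question compares the observed number of matches $O_k$ in each category with the expected number $E_k$ under the Poisson model, over n = 448 matches (the total of the observed counts). The categories are 0, 1, …, 6 scores plus a pooled "7 or more" tail. Under H0, the statistic approximately follows a chi-square distribution with (number of categories − 1) degrees of freedom, reduced by one for each parameter estimated from the data.

$$
P(X = k) = \frac{e^{-\lambda}\lambda^{k}}{k!}, \qquad
E_k = 448\,P(X = k), \qquad
P(X \ge 7) = 1 - \sum_{k=0}^{6} P(X = k), \qquad
\chi^2 = \sum_{k} \frac{(O_k - E_k)^2}{E_k}
$$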

In [3]:
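from scipy.stats import poisson
import numpy as np
import scipy.stats as st

# Poisson parameter: the sample mean number of 6-point scores per match
mu = 2.435

poisson_dist = poisson(mu)

# Expected probabilities for 0, 1, ..., 6 scores
expected = [poisson_dist.pmf(i) for i in range(0, 7)]

# Pool the right tail into a single category: P(X >= 7) = P(X > 6)
expected.append(poisson_dist.sf(6))

print(expected)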



[0.08759774704805763, 0.21330051406202033, 0.2596933758705097, 0.21078445674823038, 0.12831503804548525, 0.06248942352815135, 0.025360291048508066, 0.012459153649037175]
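The probabilities printed above can be reproduced directly from the Poisson formula, which is a quick sanity check that `pmf` and `sf` were used as intended. This sketch uses only the standard library; the pooled tail is computed as one minus the sum of the first seven probabilities.

```python
import math

# Poisson parameter used in the notebook (sample mean of scores per match)
mu = 2.435

# P(X = k) = e^{-mu} * mu^k / k!, for k = 0, ..., 6
probs = [math.exp(-mu) * mu**k / math.factorial(k) for k in range(7)]

# Pooled right tail: P(X >= 7) = 1 - P(X <= 6)
probs.append(1 - sum(probs))

print([round(p, 4) for p in probs])
```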


In [16]:
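# H0: the number of scores per match follows a Poisson distribution with mean 2.435
# H1: it does not follow a Poisson distribution with mean 2.435

alpha = 0.05

# Observed number of matches with 0, 1, ..., 6 and 7+ scores (448 matches in total)
O = np.array([35,99,104,110,62,25,10,3])
# Expected counts under H0: probabilities times the number of matches
E = np.array(expected)*448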

st.chisquare(f_obs=O, f_exp=E)

Power_divergenceResult(statistic=6.491310681109792, pvalue=0.48368890685373034)
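To show what `st.chisquare` computes, here is a minimal sketch that rebuilds the Pearson statistic by hand and takes the p-value from the chi-square survival function. It uses scipy's default of (number of categories − 1) = 7 degrees of freedom, which is what produced the p-value reported above.

```python
import numpy as np
from scipy.stats import poisson, chi2, chisquare

mu = 2.435
# Observed matches with 0, 1, ..., 6 and 7+ scores (from the table above)
O = np.array([35, 99, 104, 110, 62, 25, 10, 3])
n = O.sum()  # 448 matches

# Expected counts under H0: categories 0..6 plus the pooled tail P(X >= 7)
probs = np.append(poisson.pmf(np.arange(7), mu), poisson.sf(6, mu))
E = probs * n

# Pearson statistic computed by hand: sum of (O - E)^2 / E
stat = np.sum((O - E) ** 2 / E)

# p-value with scipy's default df = number of categories - 1
df = len(O) - 1
p_value = chi2.sf(stat, df)

print(round(stat, 4), round(p_value, 4))
```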

In [6]:
0.48368890685373034 < alpha

# Because the p-value is not smaller than alpha, we fail to reject H0:
# the number of 6-point scores per match is consistent with a Poisson
# distribution with mean 2.435. (Failing to reject H0 does not prove
# that the distribution is Poisson.)

False
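Because λ = 2.435 is the sample mean of these same data, a common convention subtracts one more degree of freedom for the estimated parameter, giving 8 − 1 − 1 = 6. `scipy.stats.chisquare` supports this through its `ddof` argument. The sketch below also recomputes the sample mean, under the assumption (the table is only available as an image) that the last category is counted as exactly 7 scores; with that assumption the mean is 1091/448 ≈ 2.435.

```python
import numpy as np
from scipy.stats import poisson, chisquare

# Observed matches with 0, 1, ..., 6 and 7+ scores
O = np.array([35, 99, 104, 110, 62, 25, 10, 3])
scores = np.arange(8)  # assumption: the "7+" category is counted as exactly 7

# Sample mean number of scores per match
lam = np.sum(scores * O) / O.sum()

# Expected counts with the notebook's rounded parameter 2.435
probs = np.append(poisson.pmf(np.arange(7), 2.435), poisson.sf(6, 2.435))
E = probs * O.sum()

# ddof=1 removes one degree of freedom for the estimated parameter: df = 8 - 1 - 1 = 6
res = chisquare(O, E, ddof=1)

print(round(lam, 3), round(res.pvalue, 3))
```

The statistic is unchanged; only the reference distribution moves to 6 degrees of freedom, so the p-value drops to roughly 0.37. It is still well above 0.05, and the conclusion does not change.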

## BONUS/OPTIONAL - Question 2
Let's analyze a discrete distribution. To analyze the number of defective items in a factory in the city of Medellín, we took a random sample of n = 60 articles and observed the number of defectives in the following table:

![](table2.png)

A Poisson distribution was proposed, since it is defined for x = 0, 1, 2, 3, …, using the following model:

![](image1.png)

For additional background, see: https://online.stat.psu.edu/stat504/node/63/ 

Does the number of defective items follow this distribution?

In [None]:
# your code here
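One possible approach is sketched here, with stated assumptions. The defect counts from table2.png are not reproduced in this document, so the helper (the name `poisson_gof` is hypothetical) is demonstrated on the Question 1 data instead. It estimates λ by the sample mean, treats the last entry as a pooled "this value or more" category (counted at that value when estimating λ, an assumption), and uses `ddof=1` for the estimated parameter. It does not merge sparse categories, so categories with expected counts below about 5 should be combined before it is called.

```python
import numpy as np
from scipy.stats import poisson, chisquare


def poisson_gof(counts, alpha=0.05):
    """Chi-square goodness-of-fit test of observed counts against a Poisson model.

    counts[k] is the number of observations equal to k for k < len(counts) - 1;
    the last entry is the pooled "len(counts) - 1 or more" category.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    k = np.arange(len(counts))
    last = len(counts) - 1

    # Estimate lambda by the sample mean (pooled category counted at its lower bound)
    lam = np.sum(k * counts) / n

    # Expected probabilities: P(X = 0), ..., P(X = last - 1), then P(X >= last)
    probs = np.append(poisson.pmf(k[:-1], lam), poisson.sf(last - 1, lam))
    expected = probs * n

    # One degree of freedom is lost for the estimated lambda
    res = chisquare(counts, expected, ddof=1)
    return lam, expected, res.statistic, res.pvalue, res.pvalue < alpha


# Example with the Question 1 data (the Question 2 data is only in table2.png)
lam, expected, stat, p, reject = poisson_gof([35, 99, 104, 110, 62, 25, 10, 3])
print(round(lam, 3), round(stat, 3), round(p, 3), reject)
```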

## Question 3
Each day, a quality control engineer takes a sample of 10 tires coming off an assembly line. Using the number of defective tires observed over 200 days (table below), the engineer would like to verify whether it is true that 5% of all tires have defects, that is, whether the sample comes from a binomial population with n = 10 and p = 0.05.

![](table3.png)
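Under H0, the number of defective tires X in a daily sample of 10 is binomial, so the expected number of days in each category follows from the binomial probability mass function. Because p = 0.05 is fixed by the hypothesis rather than estimated from the data, the 3 categories used in the code (0, 1, and 2 or more defective tires) give 3 − 1 = 2 degrees of freedom.

$$
P(X = k) = \binom{10}{k}(0.05)^{k}(0.95)^{10-k}, \qquad
E_k = 200\,P(X = k), \qquad
P(X \ge 2) = 1 - P(X = 0) - P(X = 1)
$$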


In [15]:
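# Expected probabilities under the binomial model
from scipy.stats import binom

# Binomial model under H0: 10 tires per sample, 5% defect rate
n = 10
p = 0.05

binomial_dist = binom(n, p)

# P(X = 0) and P(X = 1); outcomes with 2 or more defects are pooled into P(X >= 2)
exp = [binomial_dist.pmf(i) for i in range(0, 2)]
exp.append(binomial_dist.sf(1))  # sf(1) = P(X > 1) = P(X >= 2)

exp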


[0.5987369392383787, 0.3151247048623047, 0.08613835589931637]
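The probabilities above can be checked with the binomial formula. The sketch also prints the expected number of days for each individual count of defects. It shows that from 3 defects onward the expected counts drop below 5, which is consistent with the common rule of thumb (assumed here to be the motivation) for pooling everything from 2 defects upward into a single category.

```python
import math

n, p, days = 10, 0.05, 200

# Binomial probabilities P(X = k) = C(n, k) p^k (1 - p)^(n - k)
pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Expected number of days with exactly k defective tires, for k = 0..3
for k, prob in enumerate(pmf[:4]):
    print(k, round(days * prob, 2))

# Pooled category used in the notebook: P(X >= 2)
tail = 1 - pmf[0] - pmf[1]
print("2+", round(days * tail, 2))
```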

In [18]:
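# H0: the number of defective tires per sample follows a binomial distribution (n = 10, p = 0.05)
# H1: it does not follow that binomial distribution

alpha = 0.05

# Observed number of days with 0, 1, and 2 or more defective tires (200 days in total)
O = np.array([138,53,9])
# Expected counts under H0: probabilities times the number of days
E = np.array(exp)*(138+53+9)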

st.chisquare(f_obs=O, f_exp=E)

Power_divergenceResult(statistic=8.30617951954277, pvalue=0.015715783395951168)

In [20]:
0.015715783395951168 < alpha

# Because the p-value is smaller than alpha, we reject H0: the observed
# numbers of defective tires are not consistent with a binomial
# distribution with n = 10 and p = 0.05.

True
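The same decision can be reached with the critical-value approach: reject H0 when the statistic exceeds the 95th percentile of the chi-square distribution with 2 degrees of freedom. This sketch copies the statistic from the `st.chisquare` output above.

```python
from scipy.stats import chi2

alpha = 0.05
stat = 8.30617951954277  # statistic reported by st.chisquare above
df = 3 - 1               # 3 categories; p is given by H0, not estimated

# Critical value: the (1 - alpha) quantile of chi-square with df degrees of freedom
critical = chi2.ppf(1 - alpha, df)
print(round(critical, 3), stat > critical)
```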

## Question 4
A researcher gathers information about the physical activity patterns of fifth-grade children at a public primary school. He defines three categories of physical activity (Low, Medium, High). He also asks whether the children regularly consume sugary drinks at school, and defines two categories (Yes = consumed, No = not consumed). We would like to evaluate, at the 5% significance level, whether there is an association between physical activity patterns and sugary-drink consumption among the children of this school. The results are in the following table: 

![](table4.png)

In [21]:
## H0: physical activity level is independent of sugary-drink consumption
## H1: physical activity level is associated with (not independent of) sugary-drink consumption

# Significance level

alpha = 0.05

# Observed contingency table (rows: physical activity level; columns: sugary-drink consumption)

sugar_table = np.array([[32, 12],
                        [14, 22],
                        [6, 9]])

# Test statistic, p-value, degrees of freedom and expected frequencies

st.chi2_contingency(sugar_table)

(10.712198008709638,
 0.004719280137040844,
 2,
 array([[24.08421053, 19.91578947],
        [19.70526316, 16.29473684],
        [ 8.21052632,  6.78947368]]))
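The expected frequencies in the output come from the row and column totals: under independence, each cell is expected to hold (row total × column total) / N. The sketch below rebuilds them and the Pearson statistic by hand. For this 3 × 2 table there are (3 − 1)(2 − 1) = 2 degrees of freedom; scipy applies its Yates continuity correction only when there is 1 degree of freedom, so the hand computation should match `chi2_contingency` exactly here.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Observed table (rows: physical activity level, columns: sugary-drink consumption)
sugar_table = np.array([[32, 12],
                        [14, 22],
                        [6, 9]])

N = sugar_table.sum()
row_totals = sugar_table.sum(axis=1)
col_totals = sugar_table.sum(axis=0)

# Expected counts under independence: E_ij = (row total i * column total j) / N
expected = np.outer(row_totals, col_totals) / N

# Pearson statistic and df = (rows - 1) * (columns - 1)
stat = np.sum((sugar_table - expected) ** 2 / expected)
dof = (sugar_table.shape[0] - 1) * (sugar_table.shape[1] - 1)
p_value = chi2.sf(stat, dof)

print(expected.round(2))
print(round(stat, 3), dof, round(p_value, 4))
```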

In [22]:
# Decision 

0.004719280137040844 < alpha

## Because the p-value is smaller than alpha, we reject H0 at the 5% significance level: the data provide
## evidence that physical activity level and sugary-drink consumption are associated (not independent).

True