In [30]:
import scipy.stats as st
from scipy.stats import poisson
import numpy as np

## Question 1
The following table indicates the number of 6-point scores in an American rugby match in the 1979 season.

![](table1.png)

Based on these results, we fit a Poisson distribution whose parameter is the sample mean, λ = 2.435. At the 0.05 significance level, is there reason to believe that the number of scores follows a Poisson distribution?

In [13]:
#H0: our observations follow Poisson
#H1: our observations do not follow Poisson
observed=np.array([35,99,104,110,62,25,10,3])
mu=2.435
poisson_dist=poisson(mu)

#we have observed but not expected
#we know probability of values 0 to 6 under poisson
poisson_pmfs=np.array([poisson_dist.pmf(i) for i in range(0,7)])
poisson_pmfs



array([0.08759775, 0.21330051, 0.25969338, 0.21078446, 0.12831504,
       0.06248942, 0.02536029])

In [14]:
#remaining probability mass covers the last category (7 or more scores)
with_tail=np.append(poisson_pmfs,1-poisson_pmfs.sum())
with_tail

array([0.08759775, 0.21330051, 0.25969338, 0.21078446, 0.12831504,
       0.06248942, 0.02536029, 0.01245915])

In [15]:
#448 observations total
expected=with_tail*448

In [16]:
st.chisquare(f_exp=expected,f_obs=observed)

Power_divergenceResult(statistic=6.491310681109821, pvalue=0.4836889068537269)

In [9]:
#we fail to reject H0 at the 0.05 level because the p-value (0.484) is greater than 0.05, so the data are consistent with a Poisson distribution
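Because the Poisson parameter was estimated from the same sample, a stricter version of this test removes one more degree of freedom. The following is a minimal sketch of that variant, not a replacement for the result above. It assumes the open-ended last category is counted as exactly 7 when computing the sample mean; `mu_hat` and `values` are illustrative names.

```python
import numpy as np
from scipy import stats as st

# Observed frequencies for 0, 1, ..., 6 and "7 or more" scores (from the cell above)
observed = np.array([35, 99, 104, 110, 62, 25, 10, 3])
values = np.arange(8)

# ASSUMPTION: the open-ended last category is counted as exactly 7
mu_hat = (values * observed).sum() / observed.sum()

# Expected counts; the last cell absorbs the remaining tail probability
pmfs = st.poisson.pmf(np.arange(7), mu_hat)
expected = np.append(pmfs, 1 - pmfs.sum()) * observed.sum()

# ddof=1 removes one degree of freedom for the estimated parameter,
# so the test uses 8 - 1 - 1 = 6 degrees of freedom
result = st.chisquare(f_obs=observed, f_exp=expected, ddof=1)
print(mu_hat, result.statistic, result.pvalue)
```

The statistic barely changes; only the reference distribution does, so the p-value is smaller than with the default `ddof=0`.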

## BONUS/OPTIONAL - Question 2
Let's analyze a discrete distribution. To analyze the number of defective items in a factory in the city of Medellín, we took a random sample of n = 60 articles and observed the number of defectives in the following table:

![](table2.png)

A Poisson distribution was proposed, since it is defined for x = 0, 1, 2, 3, ..., using the following model:

![](image1.png)

For some extra insights check the following link: https://online.stat.psu.edu/stat504/node/63/ 

Does the distribution of defective items follow this distribution?

In [12]:
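#note: this is the unweighted mean of the category labels; the usual Poisson estimate of mu is the frequency-weighted sample mean
mu=(0+1+3+4)/4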
mu

2.0

In [37]:
#H0: our observations follow Poisson
#H1: our observations do not follow Poisson
observed=np.array([32,15,9,4])
mu=2
poisson_dist=poisson(mu)

poisson_pmfs=np.array([poisson_dist.pmf(i) for i in [0,1,3,4]])

expected=poisson_pmfs*60

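#this call fails: the four pmf values do not sum to 1, so expected sums to about 40.6 instead of 60
st.chisquare(f_obs=observed,f_exp=expected)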

ValueError: For each axis slice, the sum of the observed frequencies must agree with the sum of the expected frequencies to a relative tolerance of 1e-08, but the percent differences are:
0.4778112197861302
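The error arises because the expected frequencies cover only part of the Poisson probability mass, so they cannot add up to the 60 observations. One way around it is sketched below, under loudly stated assumptions: the table is only available as an image, so the category labels used here (0, 1, 2 and "3 or more") are hypothetical, and the open-ended category is counted as exactly 3 when estimating the mean. `values` and `mu_hat` are illustrative names.

```python
import numpy as np
from scipy import stats as st

observed = np.array([32, 15, 9, 4])

# ASSUMPTION: categories are 0, 1, 2 and "3 or more" (the real table is an image)
values = np.array([0, 1, 2, 3])

# Frequency-weighted sample mean, counting the open-ended category as exactly 3
mu_hat = (values * observed).sum() / observed.sum()

# Probabilities for 0, 1, 2, plus the remaining tail mass for "3 or more",
# so that the expected counts sum to the number of observations
pmfs = st.poisson.pmf(values[:-1], mu_hat)
probs = np.append(pmfs, 1 - pmfs.sum())
expected = probs * observed.sum()

# ddof=1 because mu was estimated from the data
result = st.chisquare(f_obs=observed, f_exp=expected, ddof=1)
print(mu_hat, expected, result)
```

With a tail category the expected and observed totals agree, so `st.chisquare` runs, and `result.pvalue` can be compared against 0.05 as in Question 1.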

## Question 3
A quality control engineer takes a daily sample of 10 tires coming off an assembly line and, based on the number of defective tires observed over 200 days (data below), would like to verify whether it is true that 5% of all tires have defects (that is, whether the sample comes from a binomial population with n = 10 and p = 0.05).

![](table3.png)


In [43]:
n=10
p=0.05
observed=np.array([138,53,9])

from scipy.stats import binom
binom_dist=binom(n,p)

binom_pmfs=np.array([binom_dist.pmf(i) for i in range(0,2)])

with_tail=np.append(binom_pmfs,1-binom_pmfs.sum())

expected=with_tail*200

st.chisquare(f_obs=observed,f_exp=expected)

Power_divergenceResult(statistic=8.306179519542805, pvalue=0.015715783395950887)

In [44]:
#p-value (0.016) is below 0.05, so we reject H0: the observations are not consistent with a binomial distribution with n=10 and p=0.05
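To make the test statistic less of a black box, here is a minimal sketch that recomputes it by hand as the sum of (observed - expected)² / expected and looks up the p-value in the chi-square distribution. It follows the cell above in treating the three cells as 0, 1, and 2 or more defective tires; the names `statistic`, `dof` and `p_value` are illustrative.

```python
import numpy as np
from scipy import stats as st

# Days with 0, 1, and 2 or more defective tires, as in the cell above
observed = np.array([138, 53, 9])
n, p = 10, 0.05

# Binomial probabilities for 0 and 1, plus the remaining mass for 2 or more
pmfs = st.binom.pmf([0, 1], n, p)
expected = np.append(pmfs, 1 - pmfs.sum()) * observed.sum()

# Pearson chi-square statistic computed term by term
statistic = ((observed - expected) ** 2 / expected).sum()

# p is given rather than estimated, so dof = number of cells - 1
dof = len(observed) - 1
p_value = st.chi2.sf(statistic, dof)
print(expected, statistic, p_value)
```

The values should agree with the `st.chisquare` output above.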

## Question 4
A researcher gathers information about the physical activity patterns of fifth-grade children at a public primary school. He defines three categories of physical activity (Low, Medium, High). He also asks about regular consumption of sugary drinks at school and defines two categories (Yes = consumed, No = not consumed). We would like to evaluate, at the 5% significance level, whether there is an association between physical activity patterns and sugary drink consumption for the children of this school. The results are in the following table:

![](table4.png)

In [45]:
#contingency table

#H0: physical activity is independent of consumption of sugary drinks
#H1: physical activity is dependent on consumption of sugary drinks


In [48]:
table=np.array([[32,12],[14,22],[6,9]])
st.chi2_contingency(table)

Chi2ContingencyResult(statistic=10.712198008709638, pvalue=0.004719280137040844, dof=2, expected_freq=array([[24.08421053, 19.91578947],
       [19.70526316, 16.29473684],
       [ 8.21052632,  6.78947368]]))

In [49]:
#we reject the null hypothesis at the 5% level (p is about 0.005), so there seems to be an association between physical activity and consumption of sugary drinks
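The `expected_freq` array returned above comes from the independence assumption: each expected cell is (row total × column total) / grand total. The sketch below reproduces it and the statistic by hand. The variable names are illustrative, and no continuity correction is applied, which matches `chi2_contingency` here because that correction is only used when there is one degree of freedom.

```python
import numpy as np
from scipy import stats as st

# Rows and columns in the same order as the cell above
table = np.array([[32, 12], [14, 22], [6, 9]])

# Expected counts under independence: row total * column total / grand total
row_totals = table.sum(axis=1, keepdims=True)
col_totals = table.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / table.sum()

# Pearson chi-square statistic and its degrees of freedom
statistic = ((table - expected) ** 2 / expected).sum()
dof = (table.shape[0] - 1) * (table.shape[1] - 1)
p_value = st.chi2.sf(statistic, dof)
print(expected, statistic, dof, p_value)
```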