## Question 1
The following table shows the number of 6-point scores per American rugby match in the 1979 season.

![](table1.png)

Based on these results, we fit a Poisson distribution whose parameter is the sample mean, λ = 2.435. At the 0.05 significance level, is there any reason to believe that the number of scores is a Poisson variable?
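For reference, the estimate of λ and the Pearson goodness-of-fit statistic used in the code can be written out as follows. This assumes, as the code does, that the last table category is 7 scores and that it absorbs the tail probability P(X ≥ 7):

$$\hat\lambda = \bar{x} = \frac{0\cdot 35 + 1\cdot 99 + 2\cdot 104 + 3\cdot 110 + 4\cdot 62 + 5\cdot 25 + 6\cdot 10 + 7\cdot 3}{448} = \frac{1091}{448} \approx 2.435$$

$$\chi^2 = \sum_{k=0}^{7} \frac{(O_k - E_k)^2}{E_k}, \qquad E_k = 448\,P(X = k) \ \text{for } k \le 6, \qquad E_7 = 448\,P(X \ge 7), \qquad X \sim \text{Poisson}(\hat\lambda)$$

With 8 categories and one parameter estimated from the data, the usual rule gives df = 8 - 1 - 1 = 6.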

```python
import numpy as np
from scipy import stats
```
```python
## Number of matches observed (sum of the frequencies in the table)
obs = 448

## Sample mean, used as the Poisson parameter lambda
mu = 2.435

## Number of matches with 0, 1, ..., 7 scores
times = [35, 99, 104, 110, 62, 25, 10, 3]

## Number of scores per match (the last category is treated as "7 or more")
scores = [0, 1, 2, 3, 4, 5, 6, 7]

## Poisson probabilities for 0 to 6 scores
probability = [stats.poisson.pmf(i, mu) for i in range(len(scores) - 1)]

## The last category takes the remaining tail probability,
## P(X >= 7) = 1 - P(X <= 6), so the expected counts add up to obs
probability.append(1 - sum(probability))

## Expected number of matches in each category
expected_value = [proba * obs for proba in probability]

## Chi-square goodness-of-fit test of observed vs. expected counts
## (the default ddof=0 computes the p-value with 8 - 1 = 7 degrees of freedom)
stats.chisquare(f_obs=times, f_exp=expected_value)
```

```
Power_divergenceResult(statistic=6.491310681109821, pvalue=0.4836889068537269)
```

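The p-value (0.484) is greater than 0.05, so we fail to reject H0: the sample data are consistent with a Poisson distribution. Strictly, failing to reject H0 does not prove that the scores are Poisson, only that this test finds no evidence against it.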
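Because λ was estimated from the same data, the usual rule removes one more degree of freedom (8 - 1 - 1 = 6 instead of the 7 that `chisquare` uses by default). The sketch below repeats the test with `ddof=1`; the statistic is unchanged and only the p-value is recomputed. It also uses `stats.poisson.sf(6, mu)` for the tail, which equals one minus the sum of the other probabilities.

```python
import numpy as np
from scipy import stats

# Observed number of matches with 0, 1, ..., 6 and "7 or more" scores
times = np.array([35, 99, 104, 110, 62, 25, 10, 3])
obs = times.sum()  # 448 matches

# lambda is the sample mean, i.e. it was estimated from these data
mu = 2.435

# Poisson probabilities for 0..6, plus the upper tail P(X >= 7) = P(X > 6)
probability = np.append(stats.poisson.pmf(np.arange(7), mu), stats.poisson.sf(6, mu))
expected_value = probability * obs

# ddof=1 accounts for the estimated lambda: df = 8 - 1 - 1 = 6
result = stats.chisquare(f_obs=times, f_exp=expected_value, ddof=1)
print(result.statistic, result.pvalue)
```

The 0.95 quantile of the chi-square distribution with 6 degrees of freedom is about 12.59, well above the statistic of about 6.49, so the conclusion at the 0.05 level does not change.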

## BONUS/OPTIONAL - Question 2
Let's analyze a discrete distribution. To study the number of defective items in a factory in the city of Medellín, we took a random sample of n = 60 articles and recorded the number of defectives, shown in the following table:

![](table2.png)

A Poisson distribution was proposed, since it is defined for x = 0, 1, 2, 3, ..., using the following model:

![](image1.png)
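Written out, the standard Poisson probability mass function (assumed here to be the model shown in the image) and the sample-mean estimate of λ used in the code are:

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \qquad x = 0, 1, 2, \dots$$

$$\hat\lambda = \bar{x} = \frac{0\cdot 32 + 1\cdot 15 + 2\cdot 0 + 3\cdot 9 + 4\cdot 4}{60} = \frac{58}{60} \approx 0.967$$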

For additional background, see the following link: https://online.stat.psu.edu/stat504/node/63/ 

Does the distribution of defective items follow this distribution?

```python
## Number of articles sampled
obs = 60

## Number of defects per article
defect_items = [0, 1, 2, 3, 4]

## Observed frequency: number of articles with each defect count
observed_freq = [32, 15, 0, 9, 4]

## Sample mean (58/60), used as the Poisson parameter lambda
mu = (0*32 + 1*15 + 2*0 + 3*9 + 4*4) / obs

## Poisson probabilities for 0 to 4 defects.
## Caveat: the tail P(X >= 5) is not added to the last category, so the expected
## counts sum to slightly less than obs; recent SciPy versions of chisquare raise
## an error when the observed and expected totals disagree
probability = [stats.poisson.pmf(i, mu) for i in range(len(defect_items))]

## Expected number of articles in each category
expected_value = [proba * obs for proba in probability]

## Chi-square goodness-of-fit test of observed vs. expected counts
stats.chisquare(f_obs=observed_freq, f_exp=expected_value)
```

```
Power_divergenceResult(statistic=37.72656768931596, pvalue=1.2759420913385983e-07)
```

The p-value (about 1.3e-07) is far below 0.05, so we reject the null hypothesis and conclude that the sample does not follow a Poisson distribution.
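Two points in the cell above weaken the test: the tail P(X ≥ 5) is left out, so the expected counts do not add up to 60, and several expected counts are well below 5 (roughly 3.4 for three defects and 0.8 for four), which makes the chi-square approximation unreliable. A common remedy, sketched here as one reasonable choice rather than the prescribed solution, is to pool the sparse categories into a single "2 or more" bin that carries the whole tail, and to pass `ddof=1` because λ was estimated.

```python
import numpy as np
from scipy import stats

# Observed number of articles with 0, 1, 2, 3 and 4 defects (n = 60)
defects = np.array([0, 1, 2, 3, 4])
observed_freq = np.array([32, 15, 0, 9, 4])
obs = observed_freq.sum()

# Estimate lambda by the sample mean (58/60)
mu = (defects * observed_freq).sum() / obs

# Pool 2, 3 and 4 defects into one "2 or more" category;
# its probability is the whole tail P(X >= 2) = P(X > 1)
pooled_obs = np.array([observed_freq[0], observed_freq[1], observed_freq[2:].sum()])
probability = np.array([stats.poisson.pmf(0, mu),
                        stats.poisson.pmf(1, mu),
                        stats.poisson.sf(1, mu)])
expected_value = probability * obs  # every expected count is now at least 5

# 3 categories - 1 - 1 estimated parameter = 1 degree of freedom
result = stats.chisquare(f_obs=pooled_obs, f_exp=expected_value, ddof=1)
print(expected_value.round(2), result.statistic, result.pvalue)
```

With the pooled categories the p-value is about 0.012, still below 0.05, so the Poisson model is rejected under this version of the test as well.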

## Question 3
A quality control engineer takes a sample of 10 tires coming off an assembly line each day. Using the data below, recorded over 200 days, he would like to verify whether it is true that 5% of all tires have defects, that is, whether the number of defective tires per sample comes from a binomial population with n = 10 and p = 0.05.

![](table3.png)
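Under the hypothesized model, the number of defective tires per sample is X ~ Binomial(10, 0.05). Assuming, as the code does, that the last category collects 2 or more defective tires, the probabilities and expected counts over 200 days work out to:

$$P(X = k) = \binom{10}{k}(0.05)^{k}(0.95)^{10-k}$$

$$P(X = 0) = 0.95^{10} \approx 0.5987, \qquad P(X = 1) = 10\,(0.05)\,(0.95)^{9} \approx 0.3151, \qquad P(X \ge 2) = 1 - P(X = 0) - P(X = 1) \approx 0.0861$$

$$E_k = 200\,P(X = k) \;\Rightarrow\; E \approx (119.75,\ 63.02,\ 17.23)$$

Since p is specified by the hypothesis rather than estimated, df = 3 - 1 = 2.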


```python
## Number of days observed
obs = 200

## Tires inspected per day (binomial n)
n = 10

## Hypothesized proportion of defective tires (binomial p)
p = 0.05

## Number of defective tires per sample (the last category is treated as "2 or more")
defect = [0, 1, 2]

## Observed frequency: number of days with each count
observed_freq = [138, 53, 9]

## Binomial probabilities for 0 and 1 defective tires
probability = [stats.binom.pmf(i, n, p) for i in range(len(defect) - 1)]

## The last category takes the remaining tail probability,
## P(X >= 2) = 1 - P(X = 0) - P(X = 1)
probability.append(1 - sum(probability))

## Expected number of days in each category
expected_value = [proba * obs for proba in probability]

## Chi-square goodness-of-fit test (p is given, not estimated, so df = 3 - 1 = 2)
stats.chisquare(f_obs=observed_freq, f_exp=expected_value)
```

```
Power_divergenceResult(statistic=8.30617951954273, pvalue=0.015715783395951474)
```

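The p-value (0.0157) is less than 0.05, so we reject the null hypothesis and conclude that the sample data do not follow a binomial distribution with n = 10 and p = 0.05; the claim that 5% of all tires have defects is not supported by these data.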
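Equivalently, the decision can be made by comparing the statistic with the 0.95 quantile of the chi-square distribution with 2 degrees of freedom. A minimal sketch using `scipy.stats.chi2.ppf`; here the tail P(X ≥ 2) is taken from `stats.binom.sf(1, n, p)`, which matches one minus the other two probabilities.

```python
from scipy import stats

n, p, days = 10, 0.05, 200
observed_freq = [138, 53, 9]  # days with 0, 1, and 2 or more defective tires

# Binomial probabilities for 0 and 1 defectives, plus the tail P(X >= 2) = P(X > 1)
probability = [stats.binom.pmf(0, n, p),
               stats.binom.pmf(1, n, p),
               stats.binom.sf(1, n, p)]
expected_value = [pr * days for pr in probability]

result = stats.chisquare(f_obs=observed_freq, f_exp=expected_value)

# p is fixed by the hypothesis, so df = 3 - 1 = 2 and no ddof adjustment is needed
critical_value = stats.chi2.ppf(0.95, df=2)

print(f"statistic = {result.statistic:.3f}, critical value = {critical_value:.3f}")
# prints: statistic = 8.306, critical value = 5.991
print("reject H0" if result.statistic > critical_value else "fail to reject H0")
# prints: reject H0
```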

## Question 4
A researcher gathers information about the physical activity patterns of fifth-grade children at a public primary school. He defines three categories of physical activity (Low, Medium, High). He also asks about the regular consumption of sugary drinks at school and defines two categories (Yes = consumed, No = not consumed). We would like to evaluate, at the 5% significance level, whether there is an association between patterns of physical activity and the consumption of sugary drinks for the children of this school. The results are in the following table: 

![](table4.png)
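For a test of independence, the expected count in each cell comes from the row and column totals, and the degrees of freedom from the table's dimensions. With row totals R_i, column totals C_j and grand total N (from the counts entered in the code: R = 52 and 43, C = 44, 36 and 15, N = 95):

$$E_{ij} = \frac{R_i\,C_j}{N}, \qquad \chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}}, \qquad df = (r-1)(c-1) = (2-1)(3-1) = 2$$

$$E_{\text{Yes},\,\text{Low}} = \frac{52 \times 44}{95} \approx 24.08$$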

```python
## Observed counts. Rows: sugary drinks (Yes, No); columns: physical activity,
## assumed to be in the table's order (Low, Medium, High)
x_yes = [32, 14, 6]
x_no = [12, 22, 9]

table = np.array([x_yes, x_no])

## Chi-square test of independence of variables in a contingency table
chi2, p_value, dof, expected = stats.chi2_contingency(observed=table)

result = {"Test_Statistics": chi2,
          "P_Value": p_value,
          "Degrees_of_Freedom": dof,
          "Expected_Frequencies": expected}

result
```

```
{'Test_Statistics': 10.712198008709638,
 'P_Value': 0.004719280137040844,
 'Degrees_of_Freedom': 2,
 'Expected_Frequencies': array([[24.08421053, 19.70526316,  8.21052632],
        [19.91578947, 16.29473684,  6.78947368]])}
```

The p-value (0.0047) is less than 0.05, so we reject the null hypothesis of independence and conclude that there IS an association between patterns of physical activity and the consumption of sugary drinks among the children of this school.
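To see where the numbers reported by `chi2_contingency` come from, here is a sketch that computes the expected counts and the Pearson statistic by hand using NumPy broadcasting, then checks them against SciPy. `chi2_contingency` applies Yates' continuity correction only when there is a single degree of freedom, so for this 2×3 table both computations should agree.

```python
import numpy as np
from scipy import stats

# Rows: sugary drinks (Yes, No); columns assumed to be Low, Medium, High activity
table = np.array([[32, 14, 6],
                  [12, 22, 9]])

# Expected count under independence: row total * column total / grand total
row_totals = table.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = table.sum(axis=0, keepdims=True)  # shape (1, 3)
expected = row_totals * col_totals / table.sum()

# Pearson statistic and df = (rows - 1) * (cols - 1)
chi2 = ((table - expected) ** 2 / expected).sum()
dof = (table.shape[0] - 1) * (table.shape[1] - 1)
p_value = stats.chi2.sf(chi2, dof)

# Reference values from SciPy (no Yates correction is applied because dof != 1)
ref_chi2, ref_p, ref_dof, ref_expected = stats.chi2_contingency(table)
print(chi2, p_value, dof)
```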