## Question 1
The following table shows how often each number of 6-point scores occurred in American rugby matches in the 1979 season.

![](table1.png)

Based on these results, we fit a Poisson distribution whose parameter is the sample mean, λ = 2.435. Is there any reason to believe, at the 0.05 significance level, that the number of scores is a Poisson variable?

In [15]:
import pandas as pd
import numpy as np
import scipy.stats as st

from scipy.stats import poisson

significance = 0.05
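mu = 2.435  # Poisson mean, taken from the sample mean
# Observed frequencies; the last category is treated as the upper tail
f_obs = np.array([35, 99, 104, 110, 62, 25, 10, 3])
poisson_dist = poisson(mu)

# Probabilities for 0..6 scores, plus the remaining tail so they sum to 1
poisson_pmfs = poisson_dist.pmf(np.arange(7))
poisson_pmfs_last = np.append(poisson_pmfs, 1 - poisson_pmfs.sum())
# Expected frequencies under H0, scaled to the observed total (448)
f_exp = poisson_pmfs_last * f_obs.sum()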

# Perform the chi-squared goodness-of-fit test
chi_squared, p_value = st.chisquare(f_obs=f_obs, f_exp=f_exp)

# Compare p-value with significance level
# (failing to reject H0 does not prove the data are Poisson)
if p_value < significance:
    result = "Reject the null hypothesis: The data does not follow a Poisson distribution."
else:
    result = "Fail to reject the null hypothesis: The data is consistent with a Poisson distribution."

print("Chi-squared statistic:", chi_squared)
print("P-value:", p_value)
print(result)

Chi-squared statistic: 6.491310681109821
P-value: 0.4836889068537269
Fail to reject the null hypothesis: The data is consistent with a Poisson distribution.
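One point worth checking: the Poisson mean was taken from the sample itself, and the usual convention is to subtract one degree of freedom for each estimated parameter. The sketch below repeats the test with `ddof=1` in `scipy.stats.chisquare`, assuming the same observed counts and the same tail grouping as the cell above. It is an illustrative variant, not a replacement for the result already shown.

```python
import numpy as np
from scipy import stats as st

# Same observed counts and Poisson mean as in the cell above
f_obs = np.array([35, 99, 104, 110, 62, 25, 10, 3])
mu = 2.435

# Expected frequencies: P(X = 0..6) plus the remaining upper tail
probs = st.poisson.pmf(np.arange(7), mu)
probs = np.append(probs, 1 - probs.sum())
f_exp = probs * f_obs.sum()

# Default: degrees of freedom = k - 1 = 7
stat_default, p_default = st.chisquare(f_obs, f_exp)

# One parameter estimated from the data: degrees of freedom = k - 1 - 1 = 6
stat_adj, p_adj = st.chisquare(f_obs, f_exp, ddof=1)

print("Statistic:", stat_adj)
print("p-value with 7 degrees of freedom:", p_default)
print("p-value with 6 degrees of freedom:", p_adj)
```

The statistic is unchanged; only the reference distribution differs. The adjusted p-value is smaller but still above 0.05, so the conclusion stays the same.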


## BONUS/OPTIONAL - Question 2
Let's analyze a discrete distribution. To analyze the number of defective items in a factory in the city of Medellín, we took a random sample of n = 60 articles and observed the number of defectives in the following table:

![](table2.png)

A Poisson distribution was proposed, since it is defined for x = 0, 1, 2, 3, ..., using the following model:

![](image1.png)

For some extra insights check the following link: https://online.stat.psu.edu/stat504/node/63/ 

Does the distribution of defective items follow this distribution?
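
For reference, and assuming the model in the image above is the standard Poisson probability mass function, the quantities needed for the goodness-of-fit test can be written as follows. The symbols $O_i$ (observed count in category $i$), $E_i$ (expected count), $k$ (number of categories), and $\hat{\lambda}$ (estimated mean) are notation introduced here for illustration; $n = 60$ is the sample size stated above.

```latex
P(X = x) = \frac{e^{-\lambda}\,\lambda^{x}}{x!}, \qquad x = 0, 1, 2, 3, \dots

\hat{\lambda} = \bar{x} = \frac{1}{n} \sum_{x} x \, O_x

E_i = n \, P(X = i \mid \hat{\lambda}), \qquad
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad
\text{df} = k - 1 - 1
```

The extra degree of freedom is subtracted because $\lambda$ is estimated from the sample rather than specified in advance.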

In [16]:
# your code here

## Question 3
A quality control engineer takes a sample of 10 tires that come off an assembly line and would like to verify, based on the number of tires with defects observed over 200 days (data below), whether it is true that 5% of all tires have defects (that is, whether the sample comes from a binomial population with n = 10 and p = 0.05).

![](table3.png)


In [17]:
from scipy.stats import binom

# H0: The sample comes from a binomial population (with n = 10 and p = 0.05)
# H1: The sample does not come from a binomial population (with n = 10 and p = 0.05)

# Observed frequencies of defective tires
O = np.array([138, 53, 9])

population = O.sum()
n = 10
p = 0.05
alpha = 0.05 

binom_dist = binom(n, p)


# Probabilities of 0 and 1 defective tires under H0
binom_pmfs = binom_dist.pmf(np.arange(2))

# Remaining probability mass: 2 or more defective tires
tail = 1 - binom_pmfs.sum()

binom_with_tail = np.append(binom_pmfs, tail)

E = binom_with_tail * population

chisquare_result = st.chisquare(f_obs = O, f_exp = E)

if chisquare_result.pvalue < alpha:
    print("We can reject the null hypothesis")
else:
    print("We cannot reject the null hypothesis")
    
print("p-value:", chisquare_result.pvalue)

We can reject the null hypothesis
p-value: 0.015715783395950887


Since this p-value is less than the chosen significance level (alpha = 0.05), I can reject the null hypothesis. The observed counts of defective tires deviate significantly from what a binomial distribution with n = 10 and p = 0.05 would produce, so the data do not support the claim that 5% of all tires have defects.
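
To see where the rejection comes from, the statistic can be broken down by category and compared with the critical value. This is a minimal sketch that assumes the three observed counts correspond to 0, 1, and 2 or more defective tires, as in the cell above; the variable names are chosen here for illustration.

```python
import numpy as np
from scipy.stats import binom, chi2

# Observed days with 0, 1, and 2-or-more defective tires (assumed grouping)
observed = np.array([138, 53, 9])
n_days = observed.sum()

# Category probabilities under H0: Binomial(n = 10, p = 0.05)
probs = binom.pmf([0, 1], 10, 0.05)
probs = np.append(probs, 1 - probs.sum())
expected = probs * n_days

# Per-category contributions to the chi-squared statistic
contributions = (observed - expected) ** 2 / expected
statistic = contributions.sum()

# No parameter is estimated from the data, so df = k - 1 = 2
dof = len(observed) - 1
critical_value = chi2.ppf(0.95, dof)
p_value = chi2.sf(statistic, dof)

print("Expected counts:", expected)
print("Contributions:", contributions)
print("Statistic:", statistic, "critical value:", critical_value)
print("p-value:", p_value)
```

The largest contribution comes from the last category, where 9 days were observed against an expected count of roughly 17.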

## Question 4
A researcher gathers information about the physical activity patterns of fifth-grade children at a public primary school. He defines three categories of physical activity (Low, Medium, High). He also asks about regular consumption of sugary drinks at school and defines two categories (Yes = consumed, No = not consumed). We would like to evaluate whether there is an association between physical activity patterns and the consumption of sugary drinks for the children of this school, at a 5% significance level. The results are in the following table:

![](table4.png)

In [18]:
from scipy.stats import chi2_contingency

# Hypotheses
# H0: Physical activity is independent of sugary drink consumption
# H1: Physical activity is not independent of sugary drink consumption

alpha = 0.05

# Observed counts from the table above
children = [[32, 12],
            [14, 22],
            [6, 9]]

chi2_stat, p_val, dof, expected = chi2_contingency(children)

print("Chi-squared statistic:", chi2_stat)
print("P-value:", p_val)
print("Degrees of freedom:", dof)
print("Expected frequencies:\n", expected)

if p_val < alpha:
    print("We can reject the null hypothesis")
else:
    print("We cannot reject the null hypothesis")

Chi-squared statistic: 10.712198008709638
P-value: 0.004719280137040844
Degrees of freedom: 2
Expected frequencies:
 [[24.08421053 19.91578947]
 [19.70526316 16.29473684]
 [ 8.21052632  6.78947368]]
We can reject the null hypothesis


In [19]:
# Rejecting the null hypothesis (p-value below 0.05) suggests an association between physical activity and sugary drink consumption for the children of this school.
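
The expected frequencies reported by `chi2_contingency` can be reproduced from the row and column totals, which makes the independence assumption explicit. The sketch below uses the same counts as the cell above; names ending in `_manual` are introduced here for illustration.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Same observed counts as in the cell above
children = np.array([[32, 12],
                     [14, 22],
                     [6, 9]])

# Under independence: expected = row total * column total / grand total
row_totals = children.sum(axis=1, keepdims=True)
col_totals = children.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / children.sum()

# Chi-squared statistic and degrees of freedom (rows - 1) * (columns - 1)
stat_manual = ((children - expected_manual) ** 2 / expected_manual).sum()
dof_manual = (children.shape[0] - 1) * (children.shape[1] - 1)
p_manual = chi2.sf(stat_manual, dof_manual)

# Cross-check against SciPy
stat, p_val, dof, expected = chi2_contingency(children)

print("Manual statistic:", stat_manual, "SciPy statistic:", stat)
print("Manual p-value:", p_manual, "SciPy p-value:", p_val)
```

The two results agree here because `chi2_contingency` applies its continuity correction only when there is a single degree of freedom, and this table has two.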