## Question 1
The following table indicates the number of 6-point scores in an American rugby match in the 1979 season.

![](table1.png)

Based on these results, we fit a Poisson distribution whose parameter is the sample mean, λ = 2.435. At the 0.05 significance level, is there any reason to believe that the number of scores is a Poisson variable?
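The quoted mean can be recovered from the frequency table. A minimal sketch, assuming the table lists the number of observations with 0 through 7 scores (the counts are the ones used in the notebook's own code cell; `scores`, `counts` and `lam` are names chosen here for illustration):

```python
import numpy as np

scores = np.arange(8)  # assumed categories: 0, 1, ..., 7 scores
counts = np.array([35, 99, 104, 110, 62, 25, 10, 3])  # observed frequencies

# weighted mean: total number of scores divided by the number of observations
lam = (scores * counts).sum() / counts.sum()
print(round(lam, 3))  # 2.435
```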

In [3]:
import scipy
import numpy as np
import pandas as pd
from scipy import stats
from scipy.stats import poisson

In [4]:
# total number of observations (sum of the observed counts)
obs_size = 448
# observed counts for 0, 1, ..., 6 and "7 or more" scores
obs = np.array([35, 99, 104, 110, 62, 25, 10, 3])
# Poisson probabilities for 0..6, with the remaining mass assigned to "7 or more"
probs = np.array([poisson.pmf(i, 2.435) for i in range(7)])
probs = np.append(probs, 1 - probs.sum())
# the chi-square test needs expected counts, not proportions
exp = probs * obs_size
# ddof=1 because the Poisson mean was estimated from the sample (df = 8 - 1 - 1 = 6)
res = stats.chisquare(obs, exp, ddof=1)
print('statistic={:.3f}, pvalue={:.3f}'.format(res.statistic, res.pvalue))
# the p-value is well above 0.05, so we fail to reject the null hypothesis: the data are consistent with a Poisson distribution

statistic=6.491, pvalue=0.370
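The decision can also be reached by comparing the statistic with a chi-square critical value. The sketch below assumes 8 categories and one parameter estimated from the data, giving 8 - 1 - 1 = 6 degrees of freedom. It works on counts rather than proportions, because the chi-square statistic scales with the sample size. The names `counts`, `expected` and `critical_value` are illustrative.

```python
import numpy as np
from scipy import stats

counts = np.array([35, 99, 104, 110, 62, 25, 10, 3])  # observed frequencies

# Poisson probabilities for 0..6, remaining mass assigned to "7 or more"
probs = stats.poisson.pmf(np.arange(7), 2.435)
probs = np.append(probs, 1 - probs.sum())
expected = probs * counts.sum()

# Pearson chi-square statistic computed on counts
statistic = ((counts - expected) ** 2 / expected).sum()

# degrees of freedom: categories - 1 - number of estimated parameters
df = len(counts) - 1 - 1
critical_value = stats.chi2.ppf(0.95, df)

# True means the statistic is below the critical value (fail to reject)
print(statistic < critical_value)
```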

## Question 2
The following are the ordered values of a random sample of SAT scores (a university entrance exam) for several students: 852, 875, 910, 933, 957, 963, 981, 998, 1010, 1015, 1018, 1023, 1035, 1048, 1063. In previous years, the scores followed a normal distribution N(985, 50). Based on the sample, is there any reason to believe that the distribution of scores has changed this year? Use the significance level α = 0.05.

In [5]:
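# create array of observed values
obs = np.array([852, 875, 910, 933, 957, 963, 981, 998, 1010, 1015, 1018, 1023, 1035, 1048, 1063])
# hypothesized distribution: loc is the mean, scale is the standard deviation
exp = stats.norm(loc=985, scale=50)
# one-sample Kolmogorov-Smirnov test against the hypothesized CDF
stats.kstest(obs, exp.cdf)
# the p-value is above 0.05, so there is no evidence that the distribution of scores has changed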

KstestResult(statistic=0.1581291279406798, pvalue=0.847406396427736)
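To see where the Kolmogorov-Smirnov statistic comes from, it can be computed by hand as the largest gap between the empirical CDF and the hypothesized normal CDF. This is a sketch under the same assumption as the cell above (50 is taken as the standard deviation); `d_plus`, `d_minus` and `d_stat` are illustrative names.

```python
import numpy as np
from scipy import stats

sample = np.sort(np.array([852, 875, 910, 933, 957, 963, 981, 998,
                           1010, 1015, 1018, 1023, 1035, 1048, 1063]))
n = len(sample)

# hypothesized CDF evaluated at each ordered observation
cdf = stats.norm(loc=985, scale=50).cdf(sample)

# the empirical CDF jumps from (i - 1)/n to i/n at the i-th ordered value
d_plus = np.max(np.arange(1, n + 1) / n - cdf)   # ECDF above the model
d_minus = np.max(cdf - np.arange(0, n) / n)      # ECDF below the model
d_stat = max(d_plus, d_minus)
print(d_stat)
```

The value should agree with the `statistic` field reported by `stats.kstest`.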

## Question 3
Let's analyze a discrete distribution. To analyze the number of defective items in a factory in the city of Medellín, we took a random sample of n = 60 articles and observed the number of defectives in the following table:

![](table4.png)

A Poisson distribution was proposed, since it is defined for x = 0, 1, 2, 3, ..., using the following model:

![](image1.png)

Does the distribution of defective items follow this distribution?

In [6]:
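obs_size = 60
# observed counts, assumed to correspond to 0, 1, 2, 3 and "4 or more" defectives
obs = np.array([32, 15, 0, 9, 4])
# estimate the Poisson mean as the weighted average number of defectives per article
lam = (np.arange(len(obs)) * obs).sum() / obs_size
probs = np.array([poisson.pmf(i, lam) for i in range(4)])
probs = np.append(probs, 1 - probs.sum())
# expected counts; ddof=1 because the mean was estimated from the sample
exp = probs * obs_size
res = stats.chisquare(obs, exp, ddof=1)
print('statistic={:.1f}, reject H0 at 0.05: {}'.format(res.statistic, res.pvalue < 0.05))
# the p-value is far below 0.05, so we reject the null hypothesis

statistic=34.3, reject H0 at 0.05: True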

## Question 4
A quality control engineer takes a daily sample of 10 tires coming off an assembly line and records the number of defective tires in each sample over 200 days. Based on the data below, the engineer would like to verify whether 5% of all tires have defects (that is, whether the samples come from a binomial population with n = 10 and p = 0.05).

![](table6.png)


In [7]:
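# number of trials per sample and hypothesized defect probability
n = 10
p = 0.05
obs_size = 200
# observed number of days with 0, 1 and "2 or more" defective tires
obs = np.array([138, 53, 9])
# binomial probabilities for 0 and 1 defects, with the remaining mass assigned to "2 or more"
probs = np.array([stats.binom.pmf(x, n, p) for x in range(2)])
probs = np.append(probs, 1 - probs.sum())
# the chi-square test needs expected counts, not proportions
exp = probs * obs_size
res = stats.chisquare(obs, exp)
print('statistic={:.3f}, pvalue={:.3f}'.format(res.statistic, res.pvalue))
# the p-value is below 0.05, so we reject the hypothesis that 5% of all tires have defects

statistic=8.306, pvalue=0.016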
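A chi-square goodness-of-fit test is usually considered reliable only when every expected count is at least 5. The sketch below, which assumes the three categories are 0, 1 and "2 or more" defective tires per sample, computes the expected number of days in each category under the binomial model with n = 10 and p = 0.05. The names `days` and `expected` are illustrative.

```python
import numpy as np
from scipy import stats

n, p, days = 10, 0.05, 200

# binomial probabilities for 0 and 1 defects, remaining mass for "2 or more"
probs = stats.binom.pmf(np.arange(2), n, p)
probs = np.append(probs, 1 - probs.sum())

# expected number of days in each category
expected = probs * days
print(np.round(expected, 1))

# rule-of-thumb check that no expected count falls below 5
print((expected >= 5).all())
```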

## Question 5
A researcher gathers information about the patterns of physical activity (PA) of fifth-grade children at a public primary school. He defines three categories of physical activity (1 = Low, 2 = Medium, 3 = High). He also asks about the regular consumption of sugary drinks at school, and defines two categories (1 = consumed, 0 = not consumed). We would like to evaluate whether there is an association between patterns of physical activity and the consumption of sugary drinks for the children of this school, at the 5% significance level. The results are in the following table:

![](table5.png)

In [11]:
obs_size = 95
# observed counts, listed row by row: one row per activity level, one column per drink category
obs = np.array([32, 12, 14, 22, 6, 9])
# expected count for each cell: column total (52 or 43) times row total (44, 36 or 15), divided by the grand total
exp = np.array([52*44, 43*44, 52*36, 43*36, 52*15, 43*15]) / obs_size
chi_squared_statistic = ((obs - exp)**2 / exp).sum()

# a 3 x 2 contingency table has (3 - 1) * (2 - 1) = 2 degrees of freedom
dof = (3 - 1) * (2 - 1)
alpha = 0.05
critical_value = stats.chi2.ppf(q=1-alpha, df=dof)

print('Chi-squared Statistic: {:.3f}'.format(chi_squared_statistic))
print('Critical Value: {:.3f}'.format(critical_value))
# the statistic exceeds the critical value, so we reject independence: physical activity and sugary drink consumption are associated

Chi-squared Statistic: 10.712
Critical Value: 5.991
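As a cross-check, `scipy.stats.chi2_contingency` computes the statistic, p-value, degrees of freedom and expected counts directly from the table. This is a sketch assuming the table has one row per activity level and one column per drink category, in the same order as the counts above; `table` is an illustrative name.

```python
import numpy as np
from scipy import stats

# rows: activity levels, columns: drink categories (assumed layout)
table = np.array([[32, 12],
                  [14, 22],
                  [6, 9]])

# Yates' continuity correction only applies when there is 1 degree of freedom,
# so it has no effect on this 3 x 2 table
statistic, p_value, dof, expected = stats.chi2_contingency(table)
print(statistic, p_value, dof)
```

For an r x c table the degrees of freedom are (r - 1)(c - 1), which is 2 here.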
