# Mean and Proportion Hypothesis Tests

In [3]:
import numpy as np
import pandas as pd
import scipy.stats

file_path = '~/Documents/UNT/csce5310/houston-aqi-2010-2021.csv'
df = pd.read_csv(file_path)

In [4]:
print(f"Mean Wind: {df['avg_wind'].mean()}")

Mean Wind: 5.238881443739425


## Sample Data

In [5]:
n = 100
random_indices = np.random.choice(df.index, n, replace=False)
wind_samples = df.loc[random_indices, 'avg_wind']
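
The draw above is not seeded, so each run of the notebook selects a different sample and produces different statistics. A minimal sketch of a reproducible draw follows, assuming NumPy's `default_rng` generator is acceptable here; the `values` array is a hypothetical stand-in for `df['avg_wind']`, and the seed of 42 is an arbitrary choice.

```python
import numpy as np

# Hypothetical stand-in for df['avg_wind']; the notebook reads these from a CSV.
values = np.linspace(0.0, 12.0, 1000)

n = 100
rng = np.random.default_rng(seed=42)  # arbitrary seed, chosen only for illustration
sample_idx = rng.choice(len(values), size=n, replace=False)  # n distinct positions
sample = values[sample_idx]

# Re-creating the generator with the same seed reproduces the same draw.
rng_again = np.random.default_rng(seed=42)
sample_again = values[rng_again.choice(len(values), size=n, replace=False)]
print(np.array_equal(sample, sample_again))
```

With a pandas DataFrame, the same positions could be passed to `df.iloc[sample_idx]` to select rows.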

## Proportion Test

### Hypothesis Test

We test the claim that more than 50% of the wind samples exceed 5 units, at the 0.01 significance level (99% confidence), using a right-tailed hypothesis test. The null hypothesis states that the proportion of wind samples greater than 5 units equals 50%; the alternative hypothesis states that it is greater than 50%.

In [6]:
alpha = 0.01
p = q = 0.5  # hypothesized proportion under the null, and its complement
wind_over_5 = (wind_samples > 5).sum() / n  # sample proportion above 5 units
z = (wind_over_5 - p) / np.sqrt(p * q / n)  # standard error uses the null proportion
print(f'The z-statistic is: {z}')
print(f'The p-value is: {scipy.stats.norm.sf(z)}')  # right-tail area beyond z
print(f'The critical value is: {scipy.stats.norm.ppf(1 - alpha)}')
print(f'Reject null hypothesis? {scipy.stats.norm.sf(z) <= alpha}')

The z-statistic is: -0.7999999999999996
The p-value is: 0.7881446014166031
The critical value is: 2.3263478740408408
Reject null hypothesis? False


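Since the p-value (about 0.79) is greater than our significance level of 0.01 for this right-tailed test, we fail to reject the null hypothesis. The sample does not provide sufficient evidence that the proportion of wind samples greater than 5 units exceeds 50%. Note that failing to reject the null hypothesis does not prove that the proportion is exactly 50%.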
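
The calculation above can be packaged as a small function. The following is a minimal sketch, not the notebook's own code: the function name `one_proportion_ztest` is hypothetical, and the count of 46 successes is inferred from the reported sample proportion of 0.46 with n = 100.

```python
import numpy as np
from scipy import stats

def one_proportion_ztest(successes, n, p0=0.5):
    """Right-tailed z-test of H0: p = p0 against H1: p > p0 (hypothetical helper)."""
    p_hat = successes / n
    # The standard error is computed under the null hypothesis, so it uses p0.
    standard_error = np.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_value = stats.norm.sf(z)  # area to the right of z
    return z, p_value

# 46 of 100 samples above 5 units, inferred from the proportion reported above.
z, p_value = one_proportion_ztest(successes=46, n=100, p0=0.5)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")  # z = -0.80, p-value = 0.7881
```

Because the sample proportion falls below the hypothesized 0.5, the z-statistic is negative and a right-tailed test cannot reject the null hypothesis at any conventional significance level.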

### Confidence Interval

We calculate a 99% confidence interval for the population proportion, matching the chosen significance level of 0.01. Unlike the test statistic, the margin of error uses the sample proportion rather than the hypothesized value of 0.5.

In [7]:
margin_of_error = scipy.stats.norm.ppf(1 - alpha / 2) * np.sqrt(wind_over_5 * (1 - wind_over_5) / n)
print(f'The proportion of wind samples greater than 5 units is: {wind_over_5}')
print(f'The confidence interval is: ({round(wind_over_5 - margin_of_error, 2)}, {round(wind_over_5 + margin_of_error, 2)})')

The proportion of wind samples greater than 5 units is: 0.46
The confidence interval is: (0.33, 0.59)


The confidence interval for the proportion of wind samples above 5 units contains 0.50, so a population proportion of 50% remains plausible. This is consistent with failing to reject the null hypothesis.
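
The interval above is the normal-approximation (Wald) interval. A minimal sketch that reproduces it from the reported values follows; the function name `wald_interval` is hypothetical, and the inputs 0.46 and 100 are the sample proportion and sample size reported above.

```python
import numpy as np
from scipy import stats

def wald_interval(p_hat, n, confidence=0.99):
    """Normal-approximation confidence interval for a proportion (hypothetical helper)."""
    alpha = 1 - confidence
    z_crit = stats.norm.ppf(1 - alpha / 2)  # two-sided critical value
    margin = z_crit * np.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = wald_interval(p_hat=0.46, n=100, confidence=0.99)
print(f"({low:.2f}, {high:.2f})")  # (0.33, 0.59)
print(low <= 0.5 <= high)          # True: 0.5 lies inside the interval
```

The Wald interval is a reasonable approximation here because both n·p̂ and n·(1 − p̂) are well above 10; for small samples or proportions near 0 or 1, other intervals may behave better.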

## Mean Test

### Hypothesis Test

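We conduct a two-tailed hypothesis test of whether the population mean wind value equals 5 units, at the 0.05 significance level (95% confidence). The null hypothesis is that the mean equals 5 units; the alternative hypothesis is that it differs from 5 units.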

In [9]:
alpha = 0.05
degrees_of_freedom = len(wind_samples) - 1
t_statistic, p_value = scipy.stats.ttest_1samp(wind_samples, 5)
critical_value = scipy.stats.t.ppf(1 - alpha / 2, degrees_of_freedom)
print(f'The mean of the wind samples is: {round(wind_samples.mean(), 2)}')
print(f'The t-statistic is: {t_statistic}')
print(f'The p-value is: {p_value}')
print(f'The critical value is: {critical_value}')
print(f'Reject null hypothesis? {p_value <= alpha}')

The mean of the wind samples is: 5.05
The t-statistic is: 0.2518272258484837
The p-value is: 0.8016966458250255
The critical value is: 1.9842169515086827
Reject null hypothesis? False


Since the p-value (about 0.80) is greater than the significance level of 0.05, we fail to reject the null hypothesis. The sample does not provide sufficient evidence that the population mean wind value differs from 5 units.
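
To make explicit what `scipy.stats.ttest_1samp` computes, the sketch below derives the t-statistic and the two-tailed p-value by hand and compares them with SciPy's result. The data are synthetic (drawn from a normal distribution with an arbitrary seed), not the Houston wind samples, so the printed values will differ from those above.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for wind_samples; distribution and seed are assumptions.
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=100)

mu0 = 5.0  # hypothesized population mean
n = sample.size

# t = (sample mean - mu0) / (sample standard deviation / sqrt(n)), with ddof=1
t_manual = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))
# Two-tailed p-value: twice the tail area beyond |t| with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(sample, mu0)
print(np.isclose(t_manual, t_scipy), np.isclose(p_manual, p_scipy))
```

Rejecting when the p-value is at most alpha is equivalent to rejecting when the absolute t-statistic is at least the critical value from `scipy.stats.t.ppf(1 - alpha / 2, n - 1)`.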

### Confidence Interval

We construct a 95% confidence interval for the population mean, using the t critical value computed above and the sample standard deviation.

In [10]:
margin_of_error = critical_value * wind_samples.std() / np.sqrt(n)
print(f'Confidence interval: ({round(wind_samples.mean() - margin_of_error, 2)}, {round(wind_samples.mean() + margin_of_error, 2)})')

Confidence interval: (4.66, 5.44)


The confidence interval contains the hypothesized population mean of 5 units for wind, which is consistent with failing to reject the null hypothesis in the two-tailed test.
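
The same kind of interval can also be obtained from `scipy.stats.t.interval`. The sketch below checks that it matches the manual margin-of-error calculation; it uses synthetic data with an arbitrary seed rather than the Houston wind samples, so the bounds will not equal those reported above.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for wind_samples; distribution and seed are assumptions.
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=100)

n = sample.size
sem = stats.sem(sample)  # standard error of the mean, s / sqrt(n) with ddof=1

# Interval from SciPy: confidence level, degrees of freedom, center, scale
low, high = stats.t.interval(0.95, n - 1, loc=sample.mean(), scale=sem)

# Manual version: mean +/- t critical value * standard error
margin = stats.t.ppf(1 - 0.05 / 2, n - 1) * sem
print(np.isclose(low, sample.mean() - margin), np.isclose(high, sample.mean() + margin))
```

A two-tailed test at significance level alpha rejects the hypothesized mean exactly when that value falls outside the corresponding (1 − alpha) confidence interval, which is why the test and the interval above agree.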