# Mean and Proportion Hypothesis Tests

In [1]:
import numpy as np
import pandas as pd
import scipy.stats

file_path = '~/Documents/UNT/csce5310/houston-aqi-2010-2021.csv'
df = pd.read_csv(file_path)

In [2]:
print(f"Mean Humidity:  {round(df['avg_humidity'].mean(), 2)}")

Mean Humidity:  65.64


## Sample Data

In [3]:
n = 100
random_indices = np.random.choice(df.index, n, replace=False)
humidity_samples = df.loc[random_indices, 'avg_humidity']
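
The draw above is unseeded, so re-running the notebook will select different rows and produce different statistics than the ones printed below. A minimal sketch of a reproducible draw follows. The seed value and the synthetic `demo_df` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset (hypothetical values).
demo_df = pd.DataFrame({'avg_humidity': np.linspace(30.0, 95.0, 500)})

n = 100
seed = 42  # arbitrary; any fixed integer makes the draw repeatable

# Option 1: a seeded NumPy Generator choosing row labels without replacement.
rng = np.random.default_rng(seed)
random_indices = rng.choice(demo_df.index, size=n, replace=False)
samples_numpy = demo_df.loc[random_indices, 'avg_humidity']

# Option 2: pandas' built-in sampler, which also samples without replacement by default.
samples_pandas = demo_df['avg_humidity'].sample(n=n, random_state=seed)

# Re-running with the same seed reproduces the same rows.
repeat = demo_df['avg_humidity'].sample(n=n, random_state=seed)
print(samples_pandas.equals(repeat))
```

Either option keeps the sample size at `n`; the two options do not select the same rows as each other, only the same rows across reruns.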

## Proportion Test

### Hypothesis Test

We test the claim that more than 50% of the humidity readings exceed 65 units, using a right-tailed hypothesis test at the 0.05 significance level (95% confidence). The null hypothesis is that the proportion of readings above 65 units equals 50%; the alternative is that it is greater than 50%.

In [6]:
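alpha = 0.05
p = q = 0.5  # hypothesized proportion under the null, and its complement
# Sample proportion of readings above 65; dividing by n avoids hard-coding the sample size
humidity_over_65 = (humidity_samples > 65).sum() / n
# The standard error uses the null proportion, not the sample proportion
z = (humidity_over_65 - p) / np.sqrt(p * q / n)
p_value = scipy.stats.norm.sf(z)  # right-tail probability
print(f'The z-statistic is: {z}')
print(f'The p-value is: {p_value}')
print(f'The critical value is: {scipy.stats.norm.ppf(1 - alpha)}')
print(f'Reject null hypothesis? {p_value <= alpha}')

The z-statistic is: 1.399999999999999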
The p-value is: 0.08075665923377118
The critical value is: 1.6448536269514722
Reject null hypothesis? False


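Since the p-value (about 0.081) is greater than the significance level of 0.05, we fail to reject the null hypothesis that the proportion of humidity readings above 65 units equals 50%. Equivalently, the test statistic of 1.40 falls below the critical value of 1.645.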
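
As a cross-check, the same test can be computed from summary counts alone. The sketch below assumes 57 of the 100 sampled readings exceeded 65 units, which follows from the sample proportion of 0.57 reported in the next section. The helper name `one_proportion_ztest` is hypothetical, not a SciPy function.

```python
import math
from scipy import stats

def one_proportion_ztest(successes, n, p0, alpha=0.05):
    """Right-tailed z-test of H0: p = p0 against H1: p > p0."""
    p_hat = successes / n
    # Standard error under the null hypothesis uses p0, not p_hat.
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = stats.norm.sf(z)  # area in the right tail beyond z
    return z, p_value, p_value <= alpha

z, p_value, reject = one_proportion_ztest(successes=57, n=100, p0=0.5)
print(f'z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject}')
```

With these inputs the function reproduces the statistic of about 1.40 and the p-value of about 0.081 shown in the cell output.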

### Confidence Interval

We construct a 95% confidence interval (significance level 0.05, as set above) for the population proportion of readings above 65 units, centered on the sample proportion.

In [7]:
margin_of_error = scipy.stats.norm.ppf(1 - alpha / 2) * np.sqrt(humidity_over_65 * (1 - humidity_over_65) / n)
print(f'The proportion of humidity samples greater than 65 units is: {humidity_over_65}')
print(f'The confidence interval is: ({round(humidity_over_65 - margin_of_error, 2)}, {round(humidity_over_65 + margin_of_error, 2)})')

The proportion of humidity samples greater than 65 units is: 0.57
The confidence interval is: (0.47, 0.67)


The interval contains 0.50, so the sample is consistent with the null hypothesis that the population proportion is 50%. Note that this is a two-sided interval, whereas the hypothesis test was right-tailed.
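
To make the link between the confidence level and the interval width concrete, here is a small sketch of the same normal-approximation (Wald) interval evaluated at several confidence levels. The function name `wald_interval` is hypothetical; the inputs 0.57 and 100 are the sample proportion and sample size from this notebook.

```python
import math
from scipy import stats

def wald_interval(p_hat, n, confidence=0.95):
    """Normal-approximation interval for a proportion."""
    alpha = 1 - confidence
    z = stats.norm.ppf(1 - alpha / 2)  # two-sided critical value
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

intervals = {}
for confidence in (0.90, 0.95, 0.99):
    low, high = wald_interval(0.57, 100, confidence)
    intervals[confidence] = (low, high)
    print(f'{confidence:.0%}: ({low:.2f}, {high:.2f})')
```

A higher confidence level gives a wider interval, so the stated level and the `alpha` used in the code need to match.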

## Mean Test

### Hypothesis Test

We conduct a two-tailed test of the null hypothesis that the population mean humidity equals 65 units, at the 0.05 significance level (95% confidence).

In [9]:
alpha = 0.05
degrees_of_freedom = len(humidity_samples) - 1
t_statistic, p_value = scipy.stats.ttest_1samp(humidity_samples, 65)
critical_value = scipy.stats.t.ppf(1 - alpha / 2, degrees_of_freedom)
print(f'The mean of the humidity samples is: {round(humidity_samples.mean(), 2)}')
print(f'The t-statistic is: {t_statistic}')
print(f'The p-value is: {p_value}')
print(f'The critical value is: {critical_value}')
print(f'Reject null hypothesis? {p_value <= alpha}')

The mean of the humidity samples is: 65.59
The t-statistic is: 0.475670949326064
The p-value is: 0.6353567509044982
The critical value is: 1.9842169515086827
Reject null hypothesis? False


Since the p-value (about 0.635) is greater than the significance level of 0.05, we fail to reject the null hypothesis that the population mean humidity is 65 units. The t-statistic of 0.48 also lies well inside the critical values of ±1.98.
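
For reference, the statistic returned by `scipy.stats.ttest_1samp` can be reproduced by hand as the sample mean minus the hypothesized mean, divided by the standard error. The sketch below uses synthetic data, since the real sample is random; `demo_samples` and its parameters are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for humidity_samples (hypothetical distribution).
rng = np.random.default_rng(0)
demo_samples = rng.normal(loc=66.0, scale=12.0, size=100)
mu0 = 65.0  # hypothesized population mean

n = demo_samples.size
# Sample standard deviation (ddof=1) divided by sqrt(n) gives the standard error.
standard_error = demo_samples.std(ddof=1) / np.sqrt(n)
t_manual = (demo_samples.mean() - mu0) / standard_error
# Two-tailed p-value: double the tail area beyond |t|.
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(demo_samples, mu0)
print(np.isclose(t_manual, t_scipy), np.isclose(p_manual, p_scipy))
```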

### Confidence Interval

We construct a 95% confidence interval for the population mean, reusing the two-tailed critical value from the t-test.

In [10]:
# Series.std() returns the sample standard deviation (ddof=1), as the t-interval requires
margin_of_error = critical_value * humidity_samples.std() / np.sqrt(n)
print(f'Confidence interval: ({round(humidity_samples.mean() - margin_of_error, 2)}, {round(humidity_samples.mean() + margin_of_error, 2)})')

Confidence interval: (63.13, 68.05)


The interval contains 65, which agrees with the test result: a population mean of 65 units is plausible given this sample. It also contains the full-dataset mean of 65.64 computed at the start.
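
The agreement between the interval and the test is not a coincidence: a two-tailed t-test at level alpha fails to reject a hypothesized mean exactly when that mean lies inside the (1 - alpha) t-interval. A minimal sketch of this duality on synthetic data follows; `demo_samples` and the helper `mean_confidence_interval` are illustrative assumptions, not part of the notebook.

```python
import numpy as np
from scipy import stats

def mean_confidence_interval(samples, confidence=0.95):
    """Two-sided t-interval for the population mean."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    sem = samples.std(ddof=1) / np.sqrt(n)  # standard error of the mean
    critical = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    margin = critical * sem
    return samples.mean() - margin, samples.mean() + margin

# Synthetic stand-in for humidity_samples (hypothetical distribution).
rng = np.random.default_rng(1)
demo_samples = rng.normal(loc=66.0, scale=12.0, size=100)

mu0, alpha = 65.0, 0.05
low, high = mean_confidence_interval(demo_samples, confidence=1 - alpha)
_, p_value = stats.ttest_1samp(demo_samples, mu0)

inside = low <= mu0 <= high
fail_to_reject = p_value > alpha
print(f'mu0 inside interval: {inside}, fail to reject: {fail_to_reject}')
```

The two printed flags should always match, whichever way the test comes out for a given sample.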