# Mean and Proportion Hypothesis Tests

In [1]:
import numpy as np
import pandas as pd
import scipy.stats

file_path = '~/Documents/UNT/csce5310/houston-aqi-2010-2021.csv'
df = pd.read_csv(file_path)

In [2]:
print(f"Mean Pressure: {round(df['avg_pressure'].mean(), 2)}")

Mean Pressure: 1017.89


## Sample Data

We start by drawing a simple random sample of 100 pressure readings, without replacement, from the dataset.

In [3]:
n = 100
random_indices = np.random.choice(df.index, n, replace=False)
pressure_samples = df.loc[random_indices, 'avg_pressure']
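np.random.choice is called here without a seed, so every run draws a different sample and the statistics printed below change from run to run. Below is a minimal sketch of seeded sampling. It uses a synthetic stand-in Series, and its distribution and the seed 42 are hypothetical, not the Houston data. The point is that a fixed random_state always returns the same rows.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['avg_pressure']; the real values come from the CSV.
rng = np.random.default_rng(0)
avg_pressure = pd.Series(rng.normal(1018, 6, size=4000), name="avg_pressure")

n = 100
# Seeded draws without replacement: the same random_state yields the same rows.
sample_a = avg_pressure.sample(n=n, replace=False, random_state=42)
sample_b = avg_pressure.sample(n=n, replace=False, random_state=42)

print(sample_a.index.equals(sample_b.index))  # True
print(len(sample_a))  # 100
```

In the notebook, the equivalent would be something like df['avg_pressure'].sample(n=100, random_state=42). Any fixed seed works. The numbers reported below come from one unseeded draw.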

## Proportion Test

### Hypothesis Test

We test the claim that more than 50% of pressure readings exceed 1017 units at a significance level of 0.01 (99% confidence), using a right-tailed one-proportion z-test. The null hypothesis is the statement of equality, H0: p = 0.5, where p is the population proportion of readings above 1017 units. The alternative is H1: p > 0.5. We compute the z test statistic with NumPy and obtain the p-value from the survival function scipy.stats.norm.sf.
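For reference, the hypotheses, test statistic, and decision rule used in the next cell can be written as follows. Note that the standard error uses the null values p0 and q0, not the sample proportion. This matches np.sqrt(p * q / n) in the code.

```latex
H_0: p = p_0 = 0.5 \qquad H_1: p > 0.5

z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}, \qquad q_0 = 1 - p_0, \qquad n = 100

\text{p-value} = P(Z \ge z) = 1 - \Phi(z), \qquad
\text{reject } H_0 \text{ if } z \ge \Phi^{-1}(1 - \alpha) = \Phi^{-1}(0.99) \approx 2.326
```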

In [4]:
alpha = 0.01
p = q = 0.5  # null proportion p0 and its complement q0 = 1 - p0
# Sample proportion of readings above 1017 (divide by n rather than a hard-coded 100)
p_over_1017 = (pressure_samples > 1017).sum() / n
# One-proportion z-statistic; the standard error uses the null values p and q
z = (p_over_1017 - p) / np.sqrt(p * q / n)
print(f'The z-statistic is: {z}')
print(f'The p-value is: {scipy.stats.norm.sf(z)}')  # right tail: P(Z >= z)
print(f'The critical value is: {scipy.stats.norm.ppf(1 - alpha)}')
print(f'Reject null hypothesis? {scipy.stats.norm.sf(z) <= alpha}')

The z-statistic is: -0.20000000000000018
The p-value is: 0.5792597094391031
The critical value is: 2.3263478740408408
Reject null hypothesis? False


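Since the p-value (0.579) is greater than α = 0.01, we fail to reject the null hypothesis. Equivalently, z = -0.2 is below the critical value of 2.326. There is not sufficient evidence to support the claim that more than 50% of pressure readings exceed 1017 units. This does not prove that the proportion equals 50%; it only shows that the data are consistent with it.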
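The test computed in the cell above can be packaged as a small reusable function. The sketch below uses a hypothetical helper, one_proportion_ztest, which is not a library API. It is checked against the observed sample proportion of 0.49 (49 of 100 readings), which is what produces z = -0.2. As a separate cross-check, it also runs an exact binomial test, scipy.stats.binomtest (SciPy 1.7 or later). This swaps the normal approximation for exact binomial probabilities.

```python
import math
from scipy import stats

def one_proportion_ztest(successes: int, n: int, p0: float = 0.5):
    """Right-tailed one-proportion z-test (H1: p > p0), normal approximation."""
    p_hat = successes / n
    # Standard error under H0 uses p0, not p_hat.
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Survival function gives the right-tail probability P(Z >= z).
    p_value = stats.norm.sf(z)
    return z, p_value

# 49 of 100 readings above 1017 gives p_hat = 0.49.
z, p_value = one_proportion_ztest(49, 100)
print(round(z, 4), round(p_value, 4))  # -0.2 0.5793

# Exact binomial cross-check of the same right-tailed hypothesis.
exact = stats.binomtest(49, 100, p=0.5, alternative="greater")
print(exact.pvalue > 0.01)  # True: also fails to reject at alpha = 0.01
```

With only 49 successes, both approaches agree that there is no evidence for p > 0.5. The normal approximation is reasonable here because n·p0 = n·q0 = 50.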

### Confidence Interval

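We construct a 99% confidence interval (α = 0.01) for the population proportion of readings above 1017 units. Unlike the test statistic, the interval's standard error uses the sample proportion p̂ rather than the null value 0.5. The interval is two-sided, so the critical value is the standard normal quantile at 1 - α/2.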

In [5]:
# Two-sided 99% interval: z at 1 - alpha/2 times the standard error based on p_hat
margin_of_error = scipy.stats.norm.ppf(1 - alpha / 2) * np.sqrt(p_over_1017 * (1 - p_over_1017) / n)
print(f'The proportion of pressure samples greater than 1017 units is: {p_over_1017}')
print(f'The confidence interval is: ({round(p_over_1017 - margin_of_error, 2)}, {round(p_over_1017 + margin_of_error, 2)})')

The proportion of pressure samples greater than 1017 units is: 0.49
The confidence interval is: (0.36, 0.62)


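We can see that the confidence interval (0.36, 0.62) contains 0.5, which is consistent with our failure to reject the null hypothesis.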
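The interval computed in this section is the normal-approximation (Wald) interval p̂ ± z·√(p̂(1 - p̂)/n). The sketch below wraps it in a hypothetical helper, wald_interval, which is not a library function, and reproduces the notebook's (0.36, 0.62) from p̂ = 0.49 and n = 100.

```python
import math
from scipy import stats

def wald_interval(p_hat: float, n: int, confidence: float = 0.99):
    """Two-sided normal-approximation (Wald) interval for a proportion."""
    alpha = 1 - confidence
    z_crit = stats.norm.ppf(1 - alpha / 2)  # about 2.576 for 99%
    # Standard error uses the sample proportion, not the null value.
    margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = wald_interval(0.49, 100, confidence=0.99)
print(round(low, 2), round(high, 2))  # 0.36 0.62
print(low <= 0.5 <= high)  # True: 0.5 lies inside the interval
```

The Wald interval is the textbook choice and matches the notebook. For small n or proportions near 0 or 1, alternatives such as the Wilson interval tend to have better coverage.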

## Mean Test

### Hypothesis Test

We conduct a two-tailed hypothesis test of whether the population mean pressure equals 1017 units, at a 95% confidence level (α = 0.05): H0: μ = 1017 versus H1: μ ≠ 1017. As is common when testing a population mean, the population standard deviation is unknown. We therefore use a one-sample t-test with n - 1 = 99 degrees of freedom.
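To show what scipy.stats.ttest_1samp computes, the sketch below reproduces it by hand on hypothetical synthetic data. The seed and distribution parameters are made up, not taken from the Houston dataset. The statistic is t = (x̄ - μ0)/(s/√n), where s is the sample standard deviation. The two-sided p-value is twice the upper-tail area beyond |t| under a t distribution with n - 1 degrees of freedom.

```python
import numpy as np
from scipy import stats

# Hypothetical synthetic sample standing in for pressure_samples.
rng = np.random.default_rng(1)
x = rng.normal(loc=1018.0, scale=6.0, size=100)
mu0 = 1017.0

n = x.size
# NumPy's std defaults to ddof=0, so ddof=1 is needed for the sample standard deviation.
t_manual = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
# Two-sided p-value: twice the upper-tail area beyond |t| with n - 1 degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

res = stats.ttest_1samp(x, mu0)
print(np.isclose(t_manual, res.statistic), np.isclose(p_manual, res.pvalue))  # True True
```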

In [6]:
alpha = 0.05
degrees_of_freedom = len(pressure_samples) - 1
# One-sample t-test of H0: mu = 1017 (two-sided by default)
t_statistic, p_value = scipy.stats.ttest_1samp(pressure_samples, 1017)
# Two-tailed critical value: upper alpha/2 quantile of the t distribution
critical_value = scipy.stats.t.ppf(1 - alpha / 2, degrees_of_freedom)
print(f'The mean of the pressure samples is: {round(pressure_samples.mean(), 2)}')
print(f'The t-statistic is: {t_statistic}')
print(f'The p-value is: {p_value}')
print(f'The critical value is: {critical_value}')
print(f'Reject null hypothesis? {p_value <= alpha}')

The mean of the pressure samples is: 1018.29
The t-statistic is: 2.034918215786118
The p-value is: 0.0445322340054183
The critical value is: 1.9842169515086827
Reject null hypothesis? True


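Since the p-value (0.045) is less than the significance level of 0.05, we reject the null hypothesis. Equivalently, |t| = 2.03 exceeds the critical value of 1.98. We conclude that the population mean pressure differs from 1017 units. Because the sample mean (1018.29) is above 1017, the evidence points to a population mean higher than 1017 units.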
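A two-tailed rejection only says that the mean differs from 1017; the direction comes from the sign of t. If the claim of interest had been "greater than 1017" from the start, a right-tailed test could be run directly. The sketch below does this on hypothetical synthetic data, assuming SciPy 1.6 or later, where ttest_1samp accepts an alternative argument.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(loc=1018.5, scale=6.0, size=100)  # hypothetical data
mu0 = 1017.0

two_sided = stats.ttest_1samp(x, mu0)  # default alternative="two-sided"
greater = stats.ttest_1samp(x, mu0, alternative="greater")

t_stat = two_sided.statistic
# The right-tailed p-value is the upper-tail area P(T >= t) with n - 1 degrees of freedom.
p_right = stats.t.sf(t_stat, df=x.size - 1)
print(np.isclose(greater.pvalue, p_right))  # True
# When t > 0, this equals half the two-sided p-value.
print(t_stat > 0 and np.isclose(greater.pvalue, two_sided.pvalue / 2))
```

Applied to the notebook's result (t = 2.03 > 0), a right-tailed p-value would be half of 0.0445, about 0.022. This is only valid if the one-sided alternative is chosen before looking at the data.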

### Confidence Interval

We construct a 95% confidence interval for the population mean, reusing the t critical value from the test above.

In [7]:
# pandas .std() is the sample standard deviation (ddof=1 by default)
margin_of_error = critical_value * pressure_samples.std() / np.sqrt(n)
print(f'Confidence interval: ({round(pressure_samples.mean() - margin_of_error, 2)}, {round(pressure_samples.mean() + margin_of_error, 2)})')

Confidence interval: (1017.03, 1019.55)


The interval (1017.03, 1019.55) does not contain the hypothesized mean of 1017 units, which agrees with our decision to reject the null hypothesis at α = 0.05.
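The agreement between the interval and the test is not a coincidence. A 95% two-sided t-interval excludes μ0 exactly when the two-sided t-test rejects at α = 0.05. The sketch below checks this on hypothetical synthetic data (not the Houston readings). It also compares the manual interval with scipy.stats.t.interval. The confidence level is passed positionally because recent SciPy releases renamed that keyword from alpha to confidence.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(loc=1018.0, scale=6.0, size=100)  # hypothetical data
mu0, confidence = 1017.0, 0.95

n = x.size
mean = x.mean()
sem = stats.sem(x)  # s / sqrt(n), with ddof=1 by default

# Manual interval: mean +/- t critical value * SEM, as in the notebook.
t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
manual = (mean - t_crit * sem, mean + t_crit * sem)

# SciPy's built-in central interval of the t distribution, shifted and scaled.
low, high = stats.t.interval(confidence, n - 1, loc=mean, scale=sem)
print(np.allclose(manual, (low, high)))  # True

# Duality: the interval excludes mu0 exactly when the two-sided p-value < 0.05.
p = stats.ttest_1samp(x, mu0).pvalue
print((not (low <= mu0 <= high)) == (p < 1 - confidence))  # True
```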