# Mean and Proportion Hypothesis Tests

In [1]:
import numpy as np
import pandas as pd
import scipy.stats

file_path = '~/Documents/UNT/csce5310/houston-aqi-2010-2021.csv'
df = pd.read_csv(file_path)

In [3]:
print(f"Mean Temperature: {round(df['avg_temperature'].mean(), 2)}")

Mean Temperature: 68.74


## Sample Data

In [4]:
n = 100
random_indices = np.random.choice(df.index, n, replace=False)
temperature_samples = df.loc[random_indices, 'avg_temperature']
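
Because `np.random.choice` is called without a seed, each run draws a different sample, so the statistics reported in the rest of this notebook will vary between runs. A minimal sketch of a repeatable draw follows. It assumes a seed of 42 (an arbitrary choice) and uses a hypothetical stand-in DataFrame, `demo_df`, in place of the CSV:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset; only the column name matches.
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({'avg_temperature': rng.normal(68, 15, size=1000)})

n = 100
seed = 42  # arbitrary; any fixed integer makes the draw repeatable

# Series.sample draws without replacement by default.
sample_a = demo_df['avg_temperature'].sample(n=n, random_state=seed)
sample_b = demo_df['avg_temperature'].sample(n=n, random_state=seed)

print(sample_a.equals(sample_b))  # True when both draws pick the same rows
print(sample_a.index.is_unique)   # True when no row is drawn twice
```

Fixing the seed trades a little generality for results that can be checked against the text.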

## Proportion Test

### Hypothesis Test

We test the claim that more than 50% of temperature readings exceed 65 units, using a right-tailed hypothesis test at a significance level of 0.01 (99% confidence). The null hypothesis states that the proportion of temperature readings above 65 equals 50%; the alternative hypothesis states that it is greater than 50%.

In [5]:
alpha = 0.01
p = q = 0.5
temperature_over_65 = (temperature_samples > 65).sum() / n  # sample proportion above 65
z = (temperature_over_65 - p) / np.sqrt(p * q / n)
print(f'The z-statistic is: {z}')
print(f'The p-value is: {scipy.stats.norm.sf(z)}')
print(f'The critical value is: {scipy.stats.norm.ppf(1 - alpha)}')
print(f'Reject null hypothesis? {scipy.stats.norm.sf(z) <= alpha}')

The z-statistic is: 2.1999999999999997
The p-value is: 0.013903447513498616
The critical value is: 2.3263478740408408
Reject null hypothesis? False


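Since the p-value (about 0.014) is slightly greater than the significance level of 0.01, we fail to reject the null hypothesis. At this level, the sample does not provide sufficient evidence that the proportion of temperatures above 65 exceeds 50%. Equivalently, the test statistic of 2.20 falls short of the critical value of 2.33. Note that failing to reject the null hypothesis does not establish that the proportion is exactly 50%.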
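
The test above can be wrapped in a small helper that makes the two equivalent decision rules (p-value versus critical value) explicit. This is a sketch only: the function name `one_proportion_ztest` is hypothetical, and the inputs are the values reported in this notebook (61 of 100 samples above 65) rather than a fresh draw from the dataset.

```python
import numpy as np
import scipy.stats


def one_proportion_ztest(p_hat, p0, n, alpha):
    """Right-tailed z-test of H0: p = p0 against H1: p > p0."""
    standard_error = np.sqrt(p0 * (1 - p0) / n)  # uses the null proportion
    z = (p_hat - p0) / standard_error
    p_value = scipy.stats.norm.sf(z)             # upper-tail area
    critical_value = scipy.stats.norm.ppf(1 - alpha)
    return z, p_value, critical_value


alpha = 0.01
z, p_value, critical_value = one_proportion_ztest(p_hat=0.61, p0=0.5, n=100, alpha=alpha)

# For a right-tailed test the two decision rules give the same answer.
reject_by_p_value = p_value <= alpha
reject_by_critical_value = z >= critical_value

print(round(z, 2), round(p_value, 4), round(critical_value, 2))
print(reject_by_p_value, reject_by_critical_value)
```

The normal approximation used here is usually considered reasonable when both `n * p0` and `n * (1 - p0)` are at least 10; with `n = 100` and `p0 = 0.5` both equal 50.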

### Confidence Interval

We construct a 99% confidence interval (matching the significance level of 0.01) for the population proportion, centered on the sample proportion.

In [6]:
margin_of_error = scipy.stats.norm.ppf(1 - alpha / 2) * np.sqrt(temperature_over_65 * (1 - temperature_over_65) / n)
print(f'The proportion of temperature samples greater than 65 units is: {temperature_over_65}')
print(f'The confidence interval is: ({round(temperature_over_65 - margin_of_error, 2)}, {round(temperature_over_65 + margin_of_error, 2)})')

The proportion of temperature samples greater than 65 units is: 0.61
The confidence interval is: (0.48, 0.74)


The interval contains 0.5, so the estimate of the proportion of temperatures above 65 units is consistent with our failure to reject the null hypothesis that the proportion is 50%.
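
Because the hypothesis test was right-tailed, a one-sided lower confidence bound is arguably a closer counterpart to it than the two-sided interval. The sketch below computes both from the reported sample proportion of 0.61. The function name `proportion_intervals` is hypothetical. The match with the test is only approximate, since the interval uses the standard error from the sample proportion while the test uses the one from the null proportion.

```python
import numpy as np
import scipy.stats


def proportion_intervals(p_hat, n, alpha):
    """Normal-approximation two-sided interval and one-sided lower bound."""
    standard_error = np.sqrt(p_hat * (1 - p_hat) / n)  # uses the sample proportion
    two_sided_margin = scipy.stats.norm.ppf(1 - alpha / 2) * standard_error
    one_sided_margin = scipy.stats.norm.ppf(1 - alpha) * standard_error
    two_sided = (p_hat - two_sided_margin, p_hat + two_sided_margin)
    lower_bound = p_hat - one_sided_margin
    return two_sided, lower_bound


two_sided, lower_bound = proportion_intervals(p_hat=0.61, n=100, alpha=0.01)
print(tuple(round(x, 2) for x in two_sided))
print(round(lower_bound, 4))
```

The lower bound falls just below 0.5, which matches the decision not to reject the null hypothesis at the 0.01 level.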

## Mean Test

### Hypothesis Test

We conduct a two-tailed hypothesis test of whether the population mean temperature equals 65 units, at a significance level of 0.05 (95% confidence).

In [7]:
alpha = 0.05
degrees_of_freedom = len(temperature_samples) - 1
t_statistic, p_value = scipy.stats.ttest_1samp(temperature_samples, 65)
critical_value = scipy.stats.t.ppf(1 - alpha / 2, degrees_of_freedom)
print(f'The mean of the temperature samples is: {round(temperature_samples.mean(), 2)}')
print(f'The t-statistic is: {t_statistic}')
print(f'The p-value is: {p_value}')
print(f'The critical value is: {critical_value}')
print(f'Reject null hypothesis? {p_value <= alpha}')

The mean of the temperature samples is: 68.25
The t-statistic is: 2.1795097697845724
The p-value is: 0.03166514327125672
The critical value is: 1.9842169515086827
Reject null hypothesis? True


Since the p-value is less than the significance level of 0.05, we reject the null hypothesis and conclude that the population mean temperature differs from 65 units. The sample mean of 68.25 indicates that it is higher.
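
To show what `scipy.stats.ttest_1samp` computes, the sketch below reproduces the t-statistic and two-tailed p-value by hand and compares them with the library result. It uses hypothetical, randomly generated data (`demo_samples`) in place of `temperature_samples`, so its numbers will not match those above.

```python
import numpy as np
import scipy.stats

# Hypothetical data standing in for temperature_samples.
rng = np.random.default_rng(0)
demo_samples = rng.normal(loc=68, scale=15, size=100)
mu_0 = 65  # hypothesized population mean

n = len(demo_samples)
standard_error = demo_samples.std(ddof=1) / np.sqrt(n)  # ddof=1: sample standard deviation
t_manual = (demo_samples.mean() - mu_0) / standard_error
p_manual = 2 * scipy.stats.t.sf(abs(t_manual), df=n - 1)  # both tails

t_scipy, p_scipy = scipy.stats.ttest_1samp(demo_samples, mu_0)
print(np.isclose(t_manual, t_scipy), np.isclose(p_manual, p_scipy))
```

Note the `ddof=1` argument: NumPy's `std` defaults to the population formula (`ddof=0`), whereas pandas' `Series.std` defaults to the sample formula.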

### Confidence Interval

We construct a 95% confidence interval for the population mean.

In [8]:
margin_of_error = critical_value * temperature_samples.std() / np.sqrt(n)
print(f'Confidence interval: ({round(temperature_samples.mean() - margin_of_error, 2)}, {round(temperature_samples.mean() + margin_of_error, 2)})')

Confidence interval: (65.29, 71.22)


The confidence interval lies entirely above 65, which agrees with the rejection of the null hypothesis: the population mean temperature is estimated to lie between 65.29 and 71.22 units.
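
The same interval can also be obtained from `scipy.stats.t.interval`, which avoids assembling the margin of error by hand. The sketch below checks that both routes agree, again on hypothetical generated data (`demo_samples`) rather than the notebook's sample. The confidence level is passed positionally because the keyword name for that argument has changed between SciPy releases.

```python
import numpy as np
import scipy.stats

# Hypothetical data standing in for temperature_samples.
rng = np.random.default_rng(0)
demo_samples = rng.normal(loc=68, scale=15, size=100)

n = len(demo_samples)
mean = demo_samples.mean()
standard_error = scipy.stats.sem(demo_samples)  # sample std (ddof=1) / sqrt(n)

# Manual construction: mean +/- critical value * standard error.
critical_value = scipy.stats.t.ppf(1 - 0.05 / 2, n - 1)
manual = (mean - critical_value * standard_error,
          mean + critical_value * standard_error)

# Library construction with the same degrees of freedom, center, and scale.
built_in = scipy.stats.t.interval(0.95, n - 1, loc=mean, scale=standard_error)

print(np.allclose(manual, built_in))
```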