In [1]:
import numpy as np
import math as m
from scipy.stats import norm

## Section 10.1

### Problem 10.1.1
Given that:
* Hypothesis test: $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \neq \mu_2$
* For Population 1: $\sigma_1 = 10$, sample size: $n_1 = 10$, sample mean $\bar{x}_1 = 4.7$
* For Population 2: $\sigma_2 = 5$, sample size: $n_2 = 15$, sample mean $\bar{x}_2 = 7.8$.
* Significance level: $\alpha = 0.05$.

a.

We can conduct the hypothesis test as follows:
* Compute the test statistic for the difference between the two population means, which follows the standard normal distribution $\mathcal{N}(0,1)$ under $H_0$:
$$
Z_0 = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} = \frac{4.7 - 7.8}{\sqrt{10^2/10 + 5^2/15}} \approx -0.90759
$$
* Make a decision based on the value of the test statistic: because this is a two-tailed test, the critical values are $\pm Z_{\alpha/2} \approx \pm 1.96$. Since $|Z_0| \approx 0.908 \leq Z_{\alpha/2}$, we fail to reject the null hypothesis.

The corresponding p-value for the above test statistic, where $\Phi$ denotes the standard normal CDF, is:
$$
p = 2 \times (1 - \Phi(|Z_0|)) = 2 \times (1 - \Phi(|-0.907587|)) \approx 0.364096
$$

Since the p-value of 0.364096 is greater than the significance level $\alpha = 0.05$, we again fail to reject the null hypothesis.
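
The two steps of part (a) can be wrapped in a small reusable helper. This is only a sketch under the stated assumptions (known standard deviations, two-sided alternative); the function name `two_sample_z_test` and its parameter names are illustrative and do not come from the original solution. It uses `norm.sf`, the survival function $1 - \Phi$, for the tail probability.

```python
import math
from scipy.stats import norm

def two_sample_z_test(xbar_1, xbar_2, sigma_1, sigma_2, n_1, n_2, diff_0=0.0):
    """Two-sided z-test of H0: mu_1 - mu_2 = diff_0 with known sigmas (hypothetical helper)."""
    # Standard error of the difference between the two sample means
    se = math.sqrt(sigma_1 ** 2 / n_1 + sigma_2 ** 2 / n_2)
    z = (xbar_1 - xbar_2 - diff_0) / se
    # Two-sided p-value: probability mass in both tails beyond |z|
    p = 2 * norm.sf(abs(z))
    return z, p

# Values from Problem 10.1.1
z_0, p_value = two_sample_z_test(4.7, 7.8, 10, 5, 10, 15)
print(round(z_0, 5), round(p_value, 6))
```

The returned pair should agree with the hand calculation above, up to rounding.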

b. 

We can conduct a similar test using a $100(1-\alpha)\%$ confidence interval of the difference between two population means $\Delta = \mu_1 - \mu_2$ as follows:
* Compute the estimator of $\Delta$, which is the difference between two sample means $\hat{\Delta}$:
$$
\hat{\Delta} = \bar{x}_1 - \bar{x}_2 = 4.7 - 7.8 = -3.1
$$ 
* Compute the margin of error for the estimator:
$$
\text{MOE} = Z_{\alpha/2}\times\text{SE} = 1.96\times(\sqrt{10^2/10 + 5^2/15}) \approx 6.69467
$$
* Compute the $100(1-\alpha)\%$ confidence interval:
$$
\text{CI}_{\alpha/2} = \hat{\Delta} \pm Z_{\alpha/2}\times\text{SE} = -3.1 \pm 6.69467 = [-9.79467, 3.59467]
$$

Since the 95% confidence interval contains zero, we fail to reject the null hypothesis.
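
The agreement between parts (a) and (b) is not a coincidence: the two-sided interval contains zero exactly when $|Z_0| \leq Z_{\alpha/2}$. The sketch below checks this for the numbers of this problem. It relies on `scipy.stats.norm.interval`, which returns the central interval of a normal distribution with the given location and scale; the variable names are illustrative.

```python
import math
from scipy.stats import norm

# Values from Problem 10.1.1
xbar_1, xbar_2 = 4.7, 7.8
se = math.sqrt(10 ** 2 / 10 + 5 ** 2 / 15)
alpha = 0.05

# Central 100(1 - alpha)% interval of N(xbar_1 - xbar_2, se^2)
lower, upper = norm.interval(1 - alpha, loc=xbar_1 - xbar_2, scale=se)

# Decision from the test statistic
z_0 = (xbar_1 - xbar_2) / se
z_crit = norm.ppf(1 - alpha / 2)

contains_zero = lower <= 0 <= upper
fail_to_reject = abs(z_0) <= z_crit
print(contains_zero, fail_to_reject)
```

Both flags should carry the same truth value for any data, since the two conditions are algebraically equivalent.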

c. 

The power of the test in part (a) for a true difference in means of $\Delta = 3$ can be computed as follows:
* Compute $\beta$:
$$
\begin{aligned}
\beta &= \Phi\left(Z_{\alpha/2} - \frac{\Delta - \Delta_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}\right) - \Phi\left(-Z_{\alpha/2} - \frac{\Delta - \Delta_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}\right) \\
&= \Phi\left(1.96 - \frac{3 - 0}{\sqrt{10^2/10 + 5^2/15}}\right) - \Phi\left(-1.96 - \frac{3 - 0}{\sqrt{10^2/10 + 5^2/15}}\right) \approx 0.85803
\end{aligned}
$$
* Compute the power:
$$
\text{power} = 1 - \beta \approx 0.14197
$$
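
The power calculation can be expressed as a function of the true difference, which makes it easy to evaluate at other values. This is a minimal sketch; `power_two_sided` and its parameters are hypothetical names introduced here for illustration.

```python
import math
from scipy.stats import norm

def power_two_sided(delta, sigma_1, sigma_2, n_1, n_2, alpha=0.05, diff_0=0.0):
    """Power of the two-sided z-test when the true difference is delta (hypothetical helper)."""
    se = math.sqrt(sigma_1 ** 2 / n_1 + sigma_2 ** 2 / n_2)
    shift = (delta - diff_0) / se          # standardized true difference
    z_crit = norm.ppf(1 - alpha / 2)
    # beta: probability that Z_0 falls inside the acceptance region
    beta = norm.cdf(z_crit - shift) - norm.cdf(-z_crit - shift)
    return 1 - beta

# Values from Problem 10.1.1 (c)
print(round(power_two_sided(3, 10, 5, 10, 15), 5))
```

As a sanity check, evaluating the function at `delta = 0` returns the significance level $\alpha$, because under $H_0$ the rejection probability is exactly $\alpha$.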

d. 

Assuming equal sample sizes ($n_1 = n_2 = n$), the sample size required for $\beta = 0.05$ when the true difference in means is 3 and $\alpha = 0.05$ is approximately:
$$
n \simeq \frac{(Z_{\alpha/2} + Z_\beta)^2 (\sigma_1^2 + \sigma_2^2)}{(\Delta - \Delta_0)^2} \approx 181
$$ 
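
The sample-size formula is an approximation, so it is worth confirming that the resulting $n$ actually meets the target. The sketch below computes $n$ and then evaluates $\beta$ at $n$ and at $n - 1$; the helper names `equal_sample_size` and `achieved_beta` are assumptions made for this illustration.

```python
import math
from scipy.stats import norm

def equal_sample_size(delta, sigma_1, sigma_2, alpha=0.05, beta=0.05, diff_0=0.0):
    """Approximate common sample size for a two-sided z-test (hypothetical helper)."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(1 - beta)
    return math.ceil((z_a + z_b) ** 2 * (sigma_1 ** 2 + sigma_2 ** 2) / (delta - diff_0) ** 2)

def achieved_beta(n, delta, sigma_1, sigma_2, alpha=0.05, diff_0=0.0):
    """Type II error probability when both samples have size n (hypothetical helper)."""
    shift = (delta - diff_0) / math.sqrt((sigma_1 ** 2 + sigma_2 ** 2) / n)
    z_a = norm.ppf(1 - alpha / 2)
    return norm.cdf(z_a - shift) - norm.cdf(-z_a - shift)

# Values from Problem 10.1.1 (d)
n = equal_sample_size(3, 10, 5)
meets_target = achieved_beta(n, 3, 10, 5) <= 0.05
one_fewer_meets_target = achieved_beta(n - 1, 3, 10, 5) <= 0.05
print(n, meets_target, one_fewer_meets_target)
```

If the second flag is true and the third is false, the computed $n$ is the smallest equal sample size that reaches the target.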

The results can be verified with the following Python code:

In [2]:

diff_0 = 0
sigma_1 = 10
sigma_2 = 5
n_1 = 10
n_2 = 15
xbar_1 = 4.7
xbar_2 = 7.8

##
# (a):
#
Z = (xbar_1 - xbar_2) / m.sqrt(sigma_1 ** 2 / n_1 + sigma_2 ** 2 / n_2)
p_Z = 2 * (1 - norm.cdf(abs(Z)))
print(p_Z)

##
# (b):
#
# Two-sided 95% interval, so the critical value is z_{alpha/2}, not z_{alpha}
moe = abs(norm.ppf(0.05 / 2)) * m.sqrt(sigma_1 ** 2 / n_1 + sigma_2 ** 2 / n_2)
diff = xbar_1 - xbar_2
print(f"[{diff - moe:.5f},{diff + moe:.5f}]")

##
# (c):
#
diff = 3
Z_b_u = abs(norm.ppf(0.05/2)) * (1) - (diff - diff_0) / m.sqrt(sigma_1 ** 2 / n_1 + sigma_2 ** 2 / n_2)
Z_b_l = abs(norm.ppf(0.05/2)) * (-1) - (diff - diff_0) / m.sqrt(sigma_1 ** 2 / n_1 + sigma_2 ** 2 / n_2)
beta = norm.cdf(Z_b_u) - norm.cdf(Z_b_l)
power = 1 - beta
print(power)

##
# (d)
#
Z_a = abs(norm.ppf(0.05/2))
Z_b = abs(norm.ppf(0.05))
n = m.ceil((Z_a + Z_b) ** 2 * (sigma_1 ** 2 + sigma_2 ** 2) / (diff - diff_0) ** 2)
print(n)

0.3640964293652713
[-9.79455,3.59455]
0.14197107914234086
181


### Problem 10.1.2
Given that:
* For Population 1: $\sigma_1 = 0.020$, sample size $n_1 = 10$, $\bar{x}_1 \approx 16.015$
* For Population 2: $\sigma_2 = 0.025$, sample size $n_2 = 10$, $\bar{x}_2 \approx 16.005$

a.

The hypothesis test is $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \neq \mu_2$ with $\alpha = 0.05$. The test compares the two means with each other, so the nominal value 16 does not enter the hypotheses. We can test the hypotheses as follows:
* Compute the test statistic for the difference between the two population means, which follows the standard normal distribution $\mathcal{N}(0,1)$ under $H_0$:
$$
Z_0 = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} = \frac{16.015 - 16.005}{\sqrt{0.020^2/10 + 0.025^2/10}} \approx 0.98773
$$
* Make a decision based on the value of the test statistic: because this is a two-tailed test, the critical values are $\pm Z_{\alpha/2} \approx \pm 1.96$. Since $|Z_0|\leq Z_{\alpha/2}$, we fail to reject the null hypothesis.

The corresponding p-value for the above test statistic is:
$$
p = 2 \times (1 - \Phi(|Z_0|)) = 2 \times (1 - \Phi(|0.98773|)) \approx 0.32329
$$

Since the p-value is greater than the significance level, we again fail to reject the null hypothesis. Therefore, the data are consistent with the engineer's claim.

b.

The 95% confidence interval for the difference in means $\Delta = \mu_1 - \mu_2$, centered at the estimate $\hat{\Delta} = \bar{x}_1 - \bar{x}_2$, is:
$$
\text{CI}_{\alpha/2} = (\bar{x}_1 - \bar{x}_2) \pm \text{moe} \approx (16.015 - 16.005) \pm 0.01984 \approx [-0.00984, 0.02984]
$$
with $\text{moe} = Z_{\alpha/2}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2} = 1.96\times\sqrt{0.020^2/10 + 0.025^2/10} \approx 0.01984$

We can interpret the 95% confidence interval as follows: we are 95% confident that the true difference in means lies within this interval. Since the interval contains zero, we fail to reject the null hypothesis.

c.

The power of the test in part (a) for a true difference in means of 0.04 is given as:
$$
\text{power} = 1- \beta = 1 - \Big(\Phi(Z_{\alpha/2} - Z_0^*) - \Phi(-Z_{\alpha/2} - Z_0^*)\Big) \approx 1 - \Big(\Phi(1.96 - 3.95092) - \Phi(-1.96 - 3.95092) \Big) \approx 0.97676
$$
with $Z_0^* = 0.04/\sqrt{0.020^2/10 + 0.025^2/10} \approx 3.95092$

d.

Assuming equal sample sizes ($n_1 = n_2 = n$) and the same significance level $\alpha = 0.05$, the sample size needed to ensure $\beta = 0.05$ when the true difference in means is 0.04 is:
$$
n \simeq \frac{(Z_{\alpha/2} + Z_\beta)^2 (\sigma_1^2 + \sigma_2^2)}{(\Delta - \Delta_0)^2} \approx 9
$$
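
For reference, the arithmetic behind this value can be written out with the rounded table values $Z_{0.025} \approx 1.96$ and $Z_{0.05} \approx 1.645$:

$$
n \simeq \frac{(1.96 + 1.645)^2 (0.020^2 + 0.025^2)}{(0.04 - 0)^2} = \frac{12.996 \times 0.001025}{0.0016} \approx 8.33 \quad\Rightarrow\quad n = 9
$$

The existing samples of size 10 already exceed this requirement, which is consistent with the power of about 0.977 found in part (c).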

The results can be verified with the following Python code:

In [3]:
machine_1 = np.array([16.03, 16.01, 16.04, 15.96, 16.05, 15.98, 16.05, 16.02, 16.02, 15.99])
xbar_1 = np.mean(machine_1)
machine_2 = np.array([16.02, 16.03, 15.97, 16.04, 15.96, 16.02, 16.01, 16.01, 15.99, 16.00])
xbar_2 = np.mean(machine_2)
sigma_1 = 0.020
sigma_2 = 0.025
n_1 = 10
n_2 = 10
alpha = 0.05

##
# (a)
#
Z = (xbar_1 - xbar_2) / m.sqrt(sigma_1 ** 2 / n_1 + sigma_2 ** 2 / n_2)
p_Z = 2 * (1 - norm.cdf(abs(Z)))
print(p_Z)

##
# (b)
#
# sigma_1 and sigma_2 are known, so use the z-interval rather than a t-interval
moe = abs(norm.ppf(alpha / 2)) * m.sqrt(sigma_1 ** 2 / n_1 + sigma_2 ** 2 / n_2)
diff = xbar_1 - xbar_2
print(f"[{diff - moe:.5f},{diff + moe:.5f}]")

##
# (c)
#
diff = 0.04
Z = diff / m.sqrt(sigma_1 ** 2 / n_1 + sigma_2 ** 2 / n_2)
Z_a = abs(norm.ppf(alpha / 2))
# beta is the difference of the two CDF terms, so the parentheses are required
power = 1 - (norm.cdf(Z_a - Z) - norm.cdf(-Z_a - Z))
print(f"{power:.5f}")

##
# (d)
#
diff_0 = 0  # hypothesized difference under H0
beta = 0.05
Z_a = abs(norm.ppf(alpha / 2))
Z_b = abs(norm.ppf(beta))
n = m.ceil((Z_a + Z_b) ** 2 * (sigma_1 ** 2 + sigma_2 ** 2) / (diff - diff_0) ** 2)
print(n)

0.3232850955136608
[-0.00984,0.02984]
0.97676
9


### Problem 10.1.3
Given that:
* For Population 1: $\sigma_1 = 3$, sample size $n_1 = 20$, and sample mean $\bar{x}_1 = 18$.
* For Population 2: $\sigma_2 = 3$, sample size $n_2 = 20$, and sample mean $\bar{x}_2 = 24$.

a. 

The hypothesis test is $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \neq \mu_2$ with $\alpha = 0.05$, which is a two-tailed test. The test statistic for the difference between the two population means is:
$$
Z_0 = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} = \frac{18 - 24}{\sqrt{3^2/20 + 3^2/20}} \approx -6.32456
$$

The corresponding p-value for the computed test statistic, which follows the standard normal distribution $\mathcal{N}(0,1)$ under $H_0$, is:
$$
p = 2 \times (1 - \Phi(|Z_0|)) = 2 \times (1 - \Phi(|-6.32456|)) \approx 2.5 \times 10^{-10} \approx 0
$$

Since the computed p-value is smaller than the significance level $\alpha = 0.05$, we reject the null hypothesis.

b. 

The 95% confidence interval of the estimated difference in means $\hat{\Delta}$ is given as:
$$
\text{CI}_{\alpha/2} = \hat{\Delta} \pm Z_{\alpha/2}\times\text{SE} = (18-24) \pm 1.96\times(\sqrt{3^2/20 + 3^2/20}) \approx -6 \pm 1.85942 = [-7.85942, -4.14058]
$$

Practically speaking, we are 95% confident that the true difference in means lies between -7.85942 and -4.14058. Since the 95% confidence interval does not contain zero, it is unlikely that there is no difference between the means of these two populations. Hence, we reject the null hypothesis.

c. 

The $\beta$-error of the test in part (a) if the true difference in mean burning rate is 2.5 is given as:
$$
\begin{aligned}
\beta &= \Phi\left(Z_{\alpha/2} - \frac{\Delta - \Delta_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}\right) - \Phi\left(-Z_{\alpha/2} - \frac{\Delta - \Delta_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}\right) \\
&= \Phi\left(1.96 - \frac{2.5 - 0}{\sqrt{3^2/20 + 3^2/20}}\right) - \Phi\left(-1.96 - \frac{2.5 - 0}{\sqrt{3^2/20 + 3^2/20}}\right) \\
&\approx 0.24975
\end{aligned}
$$
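
The analytical value of $\beta$ can be cross-checked by simulation. The sketch below is a Monte Carlo estimate, not part of the original solution: it assumes normally distributed populations, fixes $\mu_2 = 0$ for convenience (only the difference matters), and uses an arbitrary seed and repetition count.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable (arbitrary choice)
sigma, n, alpha = 3.0, 20, 0.05  # values from Problem 10.1.3
true_diff = 2.5                  # assumed true value of mu_1 - mu_2
reps = 200_000                   # number of simulated experiments (arbitrary choice)

# The mean of n normal observations is N(mu, sigma^2 / n), so the sample
# means can be drawn directly instead of simulating every observation.
xbar_1 = rng.normal(true_diff, sigma / np.sqrt(n), size=reps)
xbar_2 = rng.normal(0.0, sigma / np.sqrt(n), size=reps)

z = (xbar_1 - xbar_2) / np.sqrt(sigma ** 2 / n + sigma ** 2 / n)
z_crit = norm.ppf(1 - alpha / 2)

# Fraction of simulated experiments that fail to reject H0
beta_hat = np.mean(np.abs(z) <= z_crit)
print(beta_hat)
```

With this many repetitions the estimate should land within a few thousandths of the analytical value, since its standard error is roughly $\sqrt{0.25 \times 0.75 / 200000} \approx 0.001$.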

d. 

Assuming equal sample sizes, the sample size needed to obtain a power of 0.9 (that is, $\beta = 0.1$) when the true difference in means is 14 is:
$$
n \simeq \frac{(Z_{\alpha/2} + Z_\beta)^2 (\sigma_1^2 + \sigma_2^2)}{(\Delta - \Delta_0)^2} \approx 1
$$
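
Written out with $Z_{0.025} \approx 1.96$ and $Z_{0.10} \approx 1.2816$, the arithmetic is:

$$
n \simeq \frac{(1.96 + 1.2816)^2 (3^2 + 3^2)}{(14 - 0)^2} = \frac{10.508 \times 18}{196} \approx 0.965 \quad\Rightarrow\quad n = 1
$$

The result is this small because a difference of 14 is very large relative to $\sigma = 3$, so even a single observation per sample detects it with the required power.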

The results can be verified with the following Python code:


In [None]:
diff_0 = 0
sigma_1 = sigma_2 = 3
n_1 = n_2 = 20
xbar_1 = 18
xbar_2 = 24

##
# (a):
#
Z = (xbar_1 - xbar_2) / m.sqrt(sigma_1 ** 2 / n_1 + sigma_2 ** 2 / n_2)
p_Z = 2 * (1 - norm.cdf(abs(Z)))
print(p_Z)

##
# (b):
#
# Two-sided 95% interval, so the critical value is z_{alpha/2}, not z_{alpha}
moe = abs(norm.ppf(0.05 / 2)) * m.sqrt(sigma_1 ** 2 / n_1 + sigma_2 ** 2 / n_2)
diff = xbar_1 - xbar_2
print(f"[{diff - moe},{diff + moe}]")

##
# (c):
#
diff = 2.5
Z_b_u = abs(norm.ppf(0.05 / 2)) - (diff - diff_0) / m.sqrt(sigma_1 ** 2 / n_1 + sigma_2 ** 2 / n_2)
Z_b_l = -abs(norm.ppf(0.05 / 2)) - (diff - diff_0)/ m.sqrt(sigma_1 ** 2 / n_1 + sigma_2 ** 2 / n_2)
beta = norm.cdf(Z_b_u) - norm.cdf(Z_b_l)
print(beta)

##
# (d)
#
diff = 14
Z_a = abs(norm.ppf(0.05/2))
Z_b = abs(norm.ppf(0.1))
n = m.ceil((Z_a + Z_b) ** 2 * (sigma_1 ** 2 + sigma_2 ** 2) / (diff - diff_0) ** 2)
print(n)

### Problem 10.1.4
Given that:
* For Population 1: $n_1 = 15$, $\sigma_1 = 20$
* For Population 2: $n_2 = 8$, $\sigma_2 = 20$

a.

The hypothesis test is $H_0: \mu_1 - \mu_2 = 10$ versus $H_1: \mu_1 - \mu_2 < 10$ with $\alpha = 0.1$. We can test the hypotheses as follows:
* Compute the test statistic, which follows the standard normal distribution $\mathcal{N}(0,1)$ under $H_0$, using the sample means $\bar{x}_1 = 750.2$ and $\bar{x}_2 = 756.875$ computed from the data:
$$
Z_0 = \frac{(\bar{x}_1 - \bar{x}_2) - \Delta_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} = \frac{(750.2 - 756.875) - 10}{\sqrt{20^2/15 + 20^2/8}} \approx -1.9044
$$
* Compute the corresponding p-value for the test statistic (this is a lower one-sided test, so the p-value is the lower-tail probability):
$$
p = \Phi(Z_0) \approx \Phi(-1.9044) \approx 0.0284
$$
Since the p-value is smaller than the significance level, we reject the null hypothesis.

b.

The 90% one-sided confidence interval (upper confidence bound) for the difference in means is:
$$
\text{CI}_{\alpha, lwr} = (-\infty, (\bar{x}_1 - \bar{x}_2)  + \text{moe}] \approx (-\infty, 4.53262]
$$
with $\text{moe} = Z_{\alpha}\times\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2} \approx 1.28\times\sqrt{20^2/15 + 20^2/8} \approx 11.20762$ and $\bar{x}_1 - \bar{x}_2 = 750.2 - 756.875 = -6.675$.

Since the interval does not contain the value 10, it is unlikely that the true difference in means is 10. Hence, we reject the null hypothesis.

c.

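The conclusions of parts (a) and (b) agree: the test rejects $H_0$ at $\alpha = 0.1$, and the one-sided 90% confidence interval excludes the hypothesized difference of 10.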


In [4]:
exp_1 = np.array([724, 718, 776, 760, 745, 759, 795, 756, 742, 740, 761, 749, 739, 747, 742])
xbar_1 = np.mean(exp_1)
exp_2 = np.array([735, 775, 729, 755, 783, 760, 738, 780])
xbar_2 = np.mean(exp_2)
sigma_1 = sigma_2 = 20
# Set the sample sizes explicitly; otherwise n_1 and n_2 keep the values
# left over from an earlier cell (15 and 8 observations here)
n_1 = len(exp_1)
n_2 = len(exp_2)

##
# (a):
#
diff_0 = 10
Z = (xbar_1 - xbar_2 - diff_0) / m.sqrt(sigma_1 ** 2 / n_1 + sigma_2 ** 2 / n_2)
p_Z = norm.cdf(Z)
print(f"{Z:.4f}"); print(f"{p_Z:.4f}")

##
# (b):
#
# Uses the unrounded z_0.10 (about 1.2816), so the bound differs slightly
# from a hand calculation with 1.28
Z_a = abs(norm.ppf(0.1))
moe = Z_a * m.sqrt(sigma_1 ** 2 / n_1 + sigma_2 ** 2 / n_2)
print(f"(-oo,{(xbar_1 - xbar_2) + moe:.4f}]")

-1.9044
0.0284
(-oo,4.5462]


## Section 10.2