In [None]:
%matplotlib inline

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns

### q1

A pharmaceutical company is interested in testing a potential blood pressure lowering medication. Their first examination considers only subjects that received the medication, measured at baseline and then two weeks later. The data are as follows (SBP in mmHg).

Consider testing the hypothesis that there was a mean reduction in blood pressure. Give the P-value for the associated two-sided t-test.

(Hint, consider that the observations are paired.)




In [None]:
r = pd.DataFrame([
    [140,132],
    [138,135],
    [150, 151],
    [148, 146],
    [135, 130],
], columns=['baseline', 'week2'])

$H_{1}$ - mean reduction in blood pressure over 2 weeks

- consider observations are paired

- two sided t-test

In [None]:
from scipy import stats
stats.ttest_rel(r.baseline, r.week2) 

We can reject the null hypothesis at the 10% significance level (P ≈ 0.087), but not at the 5% level.
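As a cross-check, the paired test can be reproduced by hand as a one-sample t-test on the per-subject differences. This is a minimal sketch; the names `diff`, `t_stat`, and `p_value` are introduced here for illustration and are not part of the original notebook.

```python
import numpy as np
from scipy import stats

baseline = np.array([140, 138, 150, 148, 135])
week2 = np.array([132, 135, 151, 146, 130])

# A paired t-test is a one-sample t-test on the within-subject differences
diff = baseline - week2
n = len(diff)
t_stat = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))

# Two-sided P-value from the t distribution with n - 1 degrees of freedom
p_value = 2 * stats.t.sf(np.abs(t_stat), df=n - 1)
print(t_stat, p_value)
```

The result should agree with `stats.ttest_rel` on the same two columns.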

### q2

A sample of 9 men yielded a sample average brain volume of 1,100 cc and a standard deviation of 30 cc. What is the complete set of values of $\mu_0$ for which a two-sided 5% Student's t-test of $H_0: \mu = \mu_0$ would fail to reject the null hypothesis?

$E_{est} \pm t_{n-1,\,1-\alpha/2}\times\frac{S}{\sqrt{n}}$

In [None]:
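n = 9
nu = n - 1        # degrees of freedom
quantile = 0.975  # 1 - alpha/2 for a two-sided 5% test
sd = 30
m = 1100
# Values of mu0 inside this interval are not rejected
m + np.array([-1, 1]) * stats.t.ppf(quantile, df=nu) * sd / np.sqrt(n)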
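The link between the interval and the test can be checked directly: a value of $\mu_0$ inside the interval should give a two-sided P-value above 0.05, and a value outside it should give one below 0.05. The sketch below assumes only the summary statistics from the question; the helper `two_sided_p` and the trial values 1,080 and 1,070 are hypothetical choices for illustration.

```python
import numpy as np
from scipy import stats

n, m, sd = 9, 1100, 30
se = sd / np.sqrt(n)

def two_sided_p(mu0):
    """Two-sided one-sample t-test P-value from summary statistics."""
    t_stat = (m - mu0) / se
    return 2 * stats.t.sf(np.abs(t_stat), df=n - 1)

# 1,080 lies inside the interval, 1,070 lies outside it
for mu0 in (1080, 1070):
    print(mu0, two_sided_p(mu0))
```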

### q3 

Researchers conducted a blind taste test of Coke versus Pepsi. Each of four people was asked which of two blinded drinks, given in random order, they preferred. The data were such that 3 of the 4 people chose Coke. Assuming that this sample is representative, report a P-value for a test of the hypothesis that Coke is preferred to Pepsi using a one-sided exact test.

[Binomial Test](https://en.wikipedia.org/wiki/Binomial_test)

In [None]:
stats.binom.pmf(k=3,n=4,p=0.5) + stats.binom.pmf(k=4,n=4,p=0.5)

In [None]:
# binom_test was removed in SciPy 1.12; binomtest (SciPy >= 1.7) replaces it
stats.binomtest(3, 4, p=0.5, alternative='greater').pvalue

### q4

Infection rates at a hospital above 1 infection per 100 person days at risk are believed to be too high and are used as a benchmark. A hospital that had previously been above the benchmark recently had 10 infections over the last 1,787 person days at risk. About what is the one sided P-value for the relevant test of whether the hospital is *below* the standard?

In [None]:
p = 1/100
# Binomial approximation; binomtest replaces the removed binom_test
stats.binomtest(10, 1787, p=p, alternative='less').pvalue

$H_0: \lambda = 0.01$ versus $H_a: \lambda < 0.01$. $X = 10$, $t = 1{,}787$, and assume $X \sim \text{Poisson}(0.01 \times t)$ under $H_0$.

In [None]:
# Python equivalent of R's ppois(10, lambda = 0.01 * 1787)
stats.poisson.cdf(10, mu=0.01 * 1787)

### q5

Suppose that 18 obese subjects were randomized, 9 each, to a new diet pill and a placebo. Subjects’ body mass indices (BMIs) were measured at baseline and again after having received the treatment or placebo for four weeks. The average difference from follow-up to baseline (followup - baseline) was −3 kg/m2 for the treated group and 1 kg/m2 for the placebo group. The corresponding standard deviations of the differences were 1.5 kg/m2 for the treatment group and 1.8 kg/m2 for the placebo group. Does the change in BMI appear to differ between the treated and placebo groups? Assuming normality of the underlying data and a common population variance, give a P-value for a two-sided t-test.

In [None]:
n0 = 18
n=9
mean_diff_treated = -3
sd_diff_treated = 1.5

mean_diff_placebo = 1
sd_diff_placebo = 1.8
# common population variance

$t = \frac{\overline{X}_1-\overline{X}_2}{S_p\sqrt{\frac{2}{n}}}$

$S_p = \sqrt{\frac{S^2_{X1} + S^2_{X2}}{2}}$

In [None]:
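nu = 2*n - 2  # degrees of freedom for the pooled test
sp = np.sqrt((sd_diff_treated**2 + sd_diff_placebo**2) / 2)  # pooled SD (equal group sizes)
tst = np.abs(mean_diff_treated - mean_diff_placebo) / (sp * np.sqrt(2 / n))
2 * stats.t.sf(tst, df=nu)  # two-sided P-value (upper tail, doubled)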

```
n1 <- n2 <- 9
x1 <- -3 ##treated
x2 <- 1 ##placebo
s1 <- 1.5 ##treated
s2 <- 1.8 ##placebo
s <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2)/(n1 + n2 - 2))
ts <- (x1 - x2)/(s * sqrt(1/n1 + 1/n2))
2 * pt(ts, n1 + n2 - 2)
```
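The same pooled-variance test can also be run directly from the summary statistics with `scipy.stats.ttest_ind_from_stats`. This is a sketch using only the numbers given in the question; the name `res` is introduced here for illustration.

```python
from scipy import stats

# Summary statistics from the problem statement
res = stats.ttest_ind_from_stats(
    mean1=-3, std1=1.5, nobs1=9,   # treated
    mean2=1, std2=1.8, nobs2=9,    # placebo
    equal_var=True,                # pooled (common) variance
)
print(res.statistic, res.pvalue)
```

The P-value is well below 0.01, so the change in BMI appears to differ between the two groups.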

### q6

Brain volumes for 9 men yielded a 90% confidence interval of 1,077 cc to 1,123 cc. Would you reject in a two sided 5% hypothesis test of

H0:μ=1,078?

- No, I would not reject. The 95% interval is wider than the 90% interval (1,077 to 1,123), so it also contains 1,078.
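To make this concrete, the 95% interval can be rebuilt from the 90% interval, assuming both are Student's t intervals with 8 degrees of freedom centered on the sample mean. That assumption and the variable names below are illustrative, not taken from the question.

```python
import numpy as np
from scipy import stats

n = 9
lower90, upper90 = 1077, 1123
center = (lower90 + upper90) / 2
half90 = (upper90 - lower90) / 2

# Recover the standard error from the 90% half-width: half = t_{0.95, n-1} * se
se = half90 / stats.t.ppf(0.95, df=n - 1)

# Rebuild the 95% interval with the 0.975 quantile
half95 = stats.t.ppf(0.975, df=n - 1) * se
ci95 = (center - half95, center + half95)
print(ci95, ci95[0] < 1078 < ci95[1])
```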

### q7

Researchers would like to conduct a study of 100 healthy adults to detect a four year mean brain volume loss of .01 mm3. Assume that the standard deviation of four year volume loss in this population is .04 mm3. About what would be the power of the study for a 5% one sided test versus a null hypothesis of no volume loss?

In [None]:
import statsmodels.api as sm

effect_size = 0.01/0.04
sm.stats.tt_solve_power(alpha=0.05, effect_size=effect_size, nobs=100, alternative='larger')

The hypothesis is $H_0: \mu_\Delta = 0$ versus $H_a: \mu_\Delta > 0$, where $\mu_\Delta$ is volume loss (change defined as Baseline - Four Years). The test statistic is $\frac{10 \bar{X}_\Delta}{.04}$, and $H_0$ is rejected if it is larger than $Z_{.95} = 1.645$.

We want to calculate

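$P\left(\frac{\bar{X}_\Delta}{\sigma_\Delta/10} > 1.645 \mid \mu_\Delta = .01\right) = P\left(\frac{\bar{X}_\Delta - .01}{.004} > 1.645 - \frac{.01}{.004} \mid \mu_\Delta = .01\right) = P(Z > -.855) = .80$

Or note that $\bar{X}_\Delta$ is normal with mean .01 and standard deviation .004 under the alternative, and we want $P(\bar{X}_\Delta > 1.645 \times .004)$ under $H_a$.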
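The normal-approximation calculation above can be sketched in a few lines. It treats the standard deviation as known, so the result may differ slightly from the t-based `tt_solve_power` value. Variable names are illustrative.

```python
import numpy as np
from scipy import stats

n, mu_a, sigma, alpha = 100, 0.01, 0.04, 0.05
se = sigma / np.sqrt(n)              # standard error of the mean, 0.004
z_crit = stats.norm.ppf(1 - alpha)   # about 1.645

# Power = P(Z > z_crit - mu_a / se) under the alternative
power = stats.norm.sf(z_crit - mu_a / se)
print(power)
```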

### q8

Researchers would like to conduct a study of n healthy adults to detect a four year mean brain volume loss of .01 mm3. Assume that the standard deviation of four year volume loss in this population is .04 mm3. About what value of n is needed for 90% power in a one-sided test with a 5% Type I error rate, versus a null hypothesis of no volume loss?


In [None]:
import statsmodels.api as sm

effect_size = 0.01/0.04
sm.stats.tt_solve_power(alpha=0.05, power=0.9, effect_size=effect_size, alternative='larger')

The hypothesis is $H_0: \mu_\Delta = 0$ versus $H_a: \mu_\Delta > 0$, where $\mu_\Delta$ is volume loss (change defined as Baseline - Four Years). The test statistic is $\frac{\bar{X}_\Delta}{.04/\sqrt{n}}$, and $H_0$ is rejected if it is larger than $Z_{.95} = 1.645$.

We want to calculate

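$P\left(\frac{\bar{X}_\Delta}{\sigma_\Delta/\sqrt{n}} > 1.645 \mid \mu_\Delta = .01\right) = P\left(\frac{\bar{X}_\Delta - .01}{.04/\sqrt{n}} > 1.645 - \frac{.01}{.04/\sqrt{n}} \mid \mu_\Delta = .01\right) = P\left(Z > 1.645 - \sqrt{n}/4\right) = .90$

So we need $1.645 - \sqrt{n}/4 = Z_{.10} = -1.282$ and thus

$n = (4 \times (1.645 + 1.282))^2$.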
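Evaluating that closed form numerically gives the required sample size under the normal approximation. This is a sketch with illustrative variable names; the t-based `tt_solve_power` call can return a slightly larger value because it accounts for estimating the standard deviation.

```python
import numpy as np
from scipy import stats

mu_a, sigma, alpha, power = 0.01, 0.04, 0.05, 0.90
z_alpha = stats.norm.ppf(1 - alpha)  # about 1.645
z_power = stats.norm.ppf(power)      # about 1.282

# n = ((z_alpha + z_power) * sigma / mu_a)^2, rounded up to a whole subject
n_exact = ((z_alpha + z_power) * sigma / mu_a) ** 2
n_needed = int(np.ceil(n_exact))
print(n_exact, n_needed)
```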