
![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

$$\large \textbf{Constructing A Confidence Interval } 🔶 \text{ CI }🔶$$
$$\large \textbf{ for A True } 🔶\text{ Population Parameter }🔶$$
$$\large \textbf{With a Confidence Level }🔶\text{ CL}🔶 $$
$$\large \textbf{and a Probability Distribution }🔶\text{ pdf}🔶 $$







![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## **Choices of Probability Distributions (pdf) for CI**

<details>
  <summary><font size="4.5"><b>1. Review of key terms and parameters</b></font></summary>

* **$\bar{x}$:** sample mean
* **$\mu$:** population mean
* **$s$:** Sample Standard Deviation
* **$\sigma$:** population standard deviation
* **$n$:** Sample size
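* **$df = n-1$:** degrees of freedom
* **$\sigma_{\bar{x}} \approx \frac{s}{\sqrt{n}}$** (`std_sample_mean` in the code): standard deviation of the sample mean (the standard error), estimated from $s$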

No matter which distribution we use to calculate the error bound, the confidence interval for the mean has the same form:

CI $= (\overline{x} - EBM, \overline{x} + EBM)$

Similarly, the confidence interval for a proportion is centered at the sample proportion $\hat{p}$:

CI $= (\hat{p} - EBP, \hat{p} + EBP)$

where EBM = Error Bound for the Mean and EBP = Error Bound for the Proportion
</details>
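To make the EBM term concrete, here is a minimal sketch, assuming the t-based interval used in Lab Work 1 and reusing its sensory-rate data, that computes $EBM = t^{*} \cdot \frac{s}{\sqrt{n}}$ by hand and checks it against `scipy.stats.t.interval`. The helper name `error_bound_mean` is hypothetical and only for illustration.

```python
import statistics

import numpy as np
import scipy.stats as stats


def error_bound_mean(sample, confidence_level):
    """Hypothetical helper: EBM = t* * s / sqrt(n) for a t-based CI."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    # two-sided critical value: leaves (1 - CL) / 2 of the area in each tail
    t_star = stats.t.ppf(1 - (1 - confidence_level) / 2, df=n - 1)
    return t_star * s / np.sqrt(n)


sample = [8.6, 9.4, 7.9, 6.8, 8.3, 7.3, 9.2, 9.6, 8.7, 11.4, 10.3, 5.4, 8.1, 5.5, 6.9]
n = len(sample)
x_bar = statistics.mean(sample)
ebm = error_bound_mean(sample, 0.95)

# CI = (x_bar - EBM, x_bar + EBM), built by hand
manual_ci = (x_bar - ebm, x_bar + ebm)
# The same interval from SciPy
scipy_ci = stats.t.interval(0.95, df=n - 1, loc=x_bar, scale=statistics.stdev(sample) / np.sqrt(n))

print(f"EBM = {ebm:.2f}")
print(f"manual CI: ({manual_ci[0]:.2f}, {manual_ci[1]:.2f})")
print(f"scipy  CI: ({scipy_ci[0]:.2f}, {scipy_ci[1]:.2f})")
```

Both lines should agree, which shows that `t.interval` is simply $\bar{x} \pm EBM$ with `loc` $=\bar{x}$ and `scale` $= s/\sqrt{n}$.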

<details>
  <summary><font size="4.5"><b>2. Choices of Distributions</b></font></summary>

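**$\bar{X}$**: random variable for the sample mean
  * If the population standard deviation $\sigma$ is known, use the normal distribution: $\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
  * If $\sigma$ is unknown and must be estimated by $s$ (especially with a small sample, $n < 30$), use Student's t distribution with $df = n - 1$: $t = \frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{df}$. In SciPy this means `loc` $= \bar{x}$ and `scale` = `std_sample_mean`.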

</details>
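Why does the t distribution give a wider interval? A quick sketch (95% confidence assumed; the degrees of freedom listed are arbitrary examples) compares the normal critical value $z^{*}$ with $t^{*}$ for several values of $df$:

```python
import scipy.stats as stats

confidence_level = 0.95
# upper-tail cutoff that leaves (1 - CL) / 2 of the area in each tail
q = 1 - (1 - confidence_level) / 2

z_star = stats.norm.ppf(q)  # sigma known: standard normal critical value
print(f"z* = {z_star:.3f}")

# sigma unknown: t critical values for several degrees of freedom
t_stars = {df: stats.t.ppf(q, df) for df in (4, 14, 29, 100, 1000)}
for df, t_star in t_stars.items():
    print(f"df = {df:>4}: t* = {t_star:.3f}")
```

Every $t^{*}$ is larger than $z^{*}$, which widens the interval to account for the extra uncertainty of estimating $\sigma$ with $s$; as $df$ grows, $t^{*}$ approaches $z^{*}$.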

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)
## **Python Syntax for Confidence Interval**

### **Syntax Structure:** $\hspace{20mm}$ ```scipy.stats.```$\color{blue}{\text{distribution}}$**.interval**$( \color{magenta}{\text{parameters}})$

\

|Name of <br> $\color{blue}{\textbf{ Distribution }}$|Math Notation of <br> $\color{blue}{\textbf{ Distribution }}$|Syntax for Confidence Interval <br>$\color{blue}{\textbf{ distribution }}$.interval( $\color{magenta}{\text{parameters}}$ )|
|--|--|--|
| Normal |$N(\mu,\, \sigma)$ |`norm.interval(confidence_level, loc=0, scale=1)`|
|Student's t|  $t_{df}$|`t.interval(confidence_level, df, loc=0, scale=1)`|
|Binomial distribution| $B(n,\, p)$ |`binom.interval(confidence_level, n, p, loc=0)`|
| Chi-Squared  |$\chi^2_{df}$ |`chi2.interval(confidence_level, df, loc=0, scale=1)`|
| F |$F_{df_1,\, df_2}$ |`f.interval(confidence_level, dfn, dfd, loc=0, scale=1)`|

\
### **Output:** A two-element tuple ```ci``` holding the endpoints of the confidence interval.

### Confidence Interval = [`ci[0]`, `ci[1]`]
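A small sketch of what `.interval` returns, using SciPy's default `loc=0` and `scale=1` purely for illustration. The confidence level is passed positionally here; recent SciPy releases renamed that keyword from `alpha` to `confidence`, so the positional form should work with either.

```python
import scipy.stats as stats

# Standard normal (loc=0, scale=1 are SciPy's defaults)
ci = stats.norm.interval(0.95)
print(ci)              # a tuple of two endpoints
print(ci[0], ci[1])    # lower and upper bound

# Unpacking into two names works the same way for any distribution
ci_lower, ci_upper = stats.t.interval(0.95, 14, loc=0, scale=1)
print(ci_lower, ci_upper)

# Discrete distributions such as the binomial take no scale argument
# and return whole-number endpoints
b_lower, b_upper = stats.binom.interval(0.95, 10, 0.5)
print(b_lower, b_upper)
```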


![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## $$\textbf{ CI for Population Mean } \mu $$

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)
## 🔖 **$\color{blue}{\textbf{Lab Work 1: }}$** Construct a **CI** for Population Mean $\mu$ Using Student's t Distribution

<details>
<summary><b>Show the Problem </b></summary>
Suppose you do a study of acupuncture to determine how effective it is in relieving pain.  You measure sensory rates for 15 subjects with the results given.  Use the sample data to construct a 95% confidence interval for the mean sensory rate for the population.

8.6, 9.4, 7.9, 6.8, 8.3, 7.3, 9.2, 9.6, 8.7, 11.4, 10.3, 5.4, 8.1, 5.5, 6.9

</details>


## **1. Numerical Calculation with Python (SciPy):**

### `scipy.stats.` $\color{blue}{\text{t}}$**.interval**$( \color{blue}{\text{confidence_level, df, loc, scale}})$

In [None]:
import statistics

import numpy as np
import scipy.stats as stats

sample = [8.6, 9.4, 7.9, 6.8, 8.3, 7.3, 9.2, 9.6, 8.7, 11.4, 10.3, 5.4, 8.1, 5.5, 6.9]

# Compute sample statistic for the distribution to be used
x_bar = statistics.mean(sample)
# Sample standard deviation
std_sample = statistics.stdev(sample)
n = len(sample)
std_sample_mean = std_sample / np.sqrt(n) # Standard deviation of the sample mean
confidence_level = 0.95

ci = stats.t.interval(
    confidence_level,
    df=(n - 1),
    loc=x_bar,
    scale=std_sample_mean
)
print(f"We estimate with {confidence_level:.0%} confidence that the true population mean sensory rate\nis between {ci[0]:.2f} and {ci[1]:.2f}")
print(f"The margin of error is {(x_bar - ci[0]):.2f}")

We estimate with 95% confidence that the true population mean sensory rate
is between 7.30 and 9.15
The margin of error is 0.93


In [None]:
# Assign the output to two variables
ci_lower, ci_upper = stats.t.interval(
    confidence_level,
    df=(n - 1),
    loc=x_bar,
    scale=std_sample_mean
)
print(f"We estimate with {confidence_level:.0%} confidence that the true population mean\nsensory rate is between {ci_lower:.2f} and {ci_upper:.2f}")
print(f"The margin of error is {(x_bar - ci_lower):.2f}")

We estimate with 95% confidence that the true population mean
sensory rate is between 7.30 and 9.15
The margin of error is 0.93


### 2. **Graphic Illustration of the Confidence Interval**

In [None]:
# @title Adjust the slider to change the confidence level {"run":"auto"}
confidence_level = 0.95 # @param {"type":"slider","min":0,"max":1,"step":0.01}
import statistics

import numpy as np
import plotly.graph_objects as go
import scipy.stats as stats

sample = [8.6, 9.4, 7.9, 6.8, 8.3, 7.3, 9.2, 9.6, 8.7, 11.4, 10.3, 5.4, 8.1, 5.5, 6.9]

#confidence_level = 0.95
mu = statistics.mean(sample)
std_sample = statistics.stdev(sample)
n = len(sample)
df = n - 1
std = std_sample / np.sqrt(n)

ci_lower, ci_upper = stats.t.interval(
    confidence_level,
    df=df,
    loc=mu,
    scale=std
)

# define the plotting interval
x_lower = mu - 5 * std
x_upper = mu + 5 * std

# confidence interval is always symmetric, centered at the mean
fig = go.Figure()

# variables for pdf
x_pdf = np.arange(x_lower, x_upper, 0.001)
fig.add_trace(
    go.Scatter(
        x=x_pdf,
        y=stats.t.pdf(x_pdf, df, mu, std),
        mode="lines",
        name="pdf of Student's T Distribution",
        line_color="black"
    )
)

# Boolean mask keeps only the x values between ci_lower and ci_upper (the shaded region)
x_ci = x_pdf[(ci_lower <= x_pdf) & (x_pdf <= ci_upper)]

fig.add_trace(
    go.Scatter(
        x=x_ci,
        y=stats.t.pdf(x_ci, df, mu, std),
        mode="lines",
        line_width=1,
        line_color="green",
        name=f"Probability {confidence_level}",
        fill="tozeroy",
        fillcolor="rgba(17, 151, 37, 0.5)"
    )
)

# Add a vertical line for mu
fig.add_trace(
    go.Scatter(
        x=[mu, mu],
        y=[0, stats.t.pdf(mu, df, mu, std)],
        mode="lines",
        line=dict(
            color="orange",
            width=3,
            dash="dash"
        ),
        name=f"Sample Mean: {mu:.2f}",
    )
)

# Add a vertical line for lower bound of CI
fig.add_trace(
    go.Scatter(
        x=[ci_lower, ci_lower],
        y=[0, stats.t.pdf(ci_lower, df, mu, std)],
        mode="lines",
        line=dict(
            color="red",
            width=3,
            dash="dash"
        ),
        name=f"CI Lower Bound: {ci_lower:.2f}",
    )
)

# Add a vertical line for upper bound of CI
fig.add_trace(
    go.Scatter(
        x=[ci_upper, ci_upper],
        y=[0, stats.t.pdf(ci_upper, df, mu, std)],
        mode="lines",
        line=dict(
            color="red",
            width=3,
            dash="dash"
        ),
        name=f"CI Upper Bound: {ci_upper:.2f}",
    )
)

fig.add_annotation(
    x=mu,
    y=(1/3) * stats.t.pdf(mu, df, mu, std), # Positions of annotation
    text=f"<b>Shaded area<br>represents<br>probability<br>P({ci_lower:.1f} < x < {ci_upper:.1f})\
    <br>of {confidence_level:.0%} confidence level",
    align="center",
    font=dict(
        size=18,
        color="black",
        family="Sans Serif"
    )

)

fig.update_layout(
    height=600,
    width=1500,
    title="Confidence Interval with Student's T Distribution",
    yaxis = dict(title=r"$pdf(\overline{X})$", linewidth=1, zeroline=True, linecolor="black"),
    xaxis = dict(title=r"$\text{Sample Mean }(\overline{X})$", linewidth=1, zeroline=True, linecolor="black"),
    plot_bgcolor="grey",
    showlegend=True,
    font=dict(size=18, color="black", family="Sans Serif")
)

fig.show()

### **3.** Discussion: Move the slider to see how the shaded area, the lower and upper bounds, and the CI change.


#### **a.** Complete the following table
|Confidence Level (cl)|Sample Mean $\bar{x}$|Lower Bound of CI ($ci_{\text{lower}}$)|Upper Bound of CI ($ci_{\text{upper}}$)|Probability (Shaded Area)|
|:--:|:--:|:--:|:--:|:--:|
|cl = 0.00|8.23|8.23|8.23|0.0|
|cl = 0.4|8.23|7.99|8.46| 0.4 |
|cl = 0.95|8.23|7.30|9.15| 0.95 |
|cl = 1.00|8.23|$-\infty$|$\infty$|1.0|


#### **b.** Narrative Summary

When cl = 0, the confidence interval collapses to the single point $\bar{x}$ (width 0) and the shaded probability is 0: a single point estimate has essentially no chance of landing exactly on the true mean $\mu$.
As the confidence level increases, the confidence interval widens and the shaded probability grows with it. If we want a higher level of confidence, we need a bigger "net".
If we choose a confidence level of 1, the interval must stretch to $(-\infty, \infty)$ to guarantee that it captures the true population mean, which makes it useless as an estimate. In every case, the shaded area under the curve equals the confidence level.
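The confidence level describes how often the *procedure* captures $\mu$ over many repeated samples. A minimal simulation sketch makes this visible; the population values `mu_true = 8.0` and `sigma_true = 1.7` are hypothetical assumptions, not taken from the lab data.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(seed=0)

# Hypothetical population (assumed values for illustration only)
mu_true, sigma_true = 8.0, 1.7
n, reps, confidence_level = 15, 2000, 0.95

# Draw `reps` independent samples, each of size n
samples = rng.normal(mu_true, sigma_true, size=(reps, n))
x_bars = samples.mean(axis=1)
std_means = samples.std(axis=1, ddof=1) / np.sqrt(n)  # s / sqrt(n) for each sample

# The same t critical value applies to every sample of size n
t_star = stats.t.ppf(1 - (1 - confidence_level) / 2, df=n - 1)
lower = x_bars - t_star * std_means
upper = x_bars + t_star * std_means

# Fraction of the intervals that actually contain the true mean
coverage = np.mean((lower <= mu_true) & (mu_true <= upper))
print(f"{coverage:.1%} of {reps} intervals captured mu = {mu_true}")
```

The printed fraction should sit close to 95%; changing `confidence_level` moves the coverage with it, just as the shaded area moves with the slider.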


![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)
## $$ \textbf{CI for Population} \color{red}{\text{ Proportion or Percentage}} $$


<details>
  <summary><b>Show review of requirements, key terms and parameters.</b></summary>

**Requirement:** Each trial must have exactly two possible outcomes, such as {yes, no}, {up, down}, or {H, T}. Therefore the underlying <b>distribution is a binomial distribution</b>. Recall that if $X$ is a binomial random variable, then $X \sim B(n, p)$, where $n$ is the number of trials and $p$ is the probability of success.

* **$p$:**  probability of success of each Bernoulli trial
* **$q=1-p$:** probability of failure of each Bernoulli trial
* **$n$:**  sample size

</details>


<details>
  <summary><b> Random Variables and Their Distributions</b></summary>

**$X$:** a binomial random variable for number of successes in a binomial experiment,  $X \sim B(n, p)$.

$P^{\prime} = \hat{P} = \frac{X}{n}$: sample proportion random variable in a binomial experiment.

For a **large number** of trials: $X \sim B(n, p) \approx N\left(np, \sqrt{npq}\right)$, so $P^{\prime} = \hat{P} = \frac{X}{n} \approx N\left(p, \sqrt{\frac{pq}{n}}\right)$

</details>
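A quick simulation sketch of this approximation, assuming a hypothetical true proportion `p = 0.84` and `n = 500` trials: it repeats the binomial experiment many times and compares the mean and spread of $\hat{P} = X/n$ with $p$ and $\sqrt{pq/n}$.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Assumed values for illustration only
p, n, reps = 0.84, 500, 100_000

# Each entry is X / n for one binomial experiment of n trials
p_hats = rng.binomial(n, p, size=reps) / n

theory_sd = np.sqrt(p * (1 - p) / n)
print(f"mean of p-hat: {p_hats.mean():.4f}  (theory: {p})")
print(f"sd of p-hat:   {p_hats.std():.4f}  (theory: {theory_sd:.4f})")
```

A histogram of `p_hats` would look close to the normal curve $N\left(p, \sqrt{pq/n}\right)$, which is what justifies using `norm.interval` below.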




---
## **Normal Distribution Approximation of Repeated Binomial Trials**

|Term|Conditions and Parameters|
|--|--|
|Conditions|1. $n$ is large. <br> 2. The data should be a <b>single random sample</b> <br> drawn from the population. <br> 3. Trials are independent, each with two outcomes (a Bernoulli trial) <br> and the same probability of success $p$. <br> 4. $n\hat{p} \geq 10$ and $n\hat{q} = n(1-\hat{p}) \geq 10$ <br>|
|Population Proportion| $p$ (unknown; estimated by $\hat{p}$)|
|Standard Deviation of $\hat{p}$| $std = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$|
|Normal Distribution Approximation|$P^{\prime} \approx N(\hat{p},\, std)$|
|Point Estimate| $\hat{p} =\frac{x}{n}$<br> where <br>$x$ is the number of successes and <br>$n$ is the sample size.|
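Condition 4 is easy to check in code. Because $n\hat{p}$ is just the number of successes and $n\hat{q}$ the number of failures, a minimal sketch can test the counts directly. The helper `normal_approx_ok` is hypothetical, and the last case `(49, 50)` is an invented example chosen to fail.

```python
def normal_approx_ok(x_success, n, threshold=10):
    """Hypothetical helper: success/failure check for the normal approximation.

    n * p_hat equals the number of successes and n * q_hat the number of
    failures, so the check can be done on the counts themselves.
    """
    successes = x_success       # n * p_hat
    failures = n - x_success    # n * q_hat
    return successes >= threshold and failures >= threshold


# Cell-phone survey, tablet survey, and a hypothetical lopsided sample
for x, n in [(421, 500), (114, 250), (49, 50)]:
    print(f"x = {x}, n = {n}: normal approximation OK? {normal_approx_ok(x, n)}")
```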


![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)
## 🔖 **$\color{blue}{\textbf{Lab Work 2: }}$** Construct a confidence interval for a population $\color{red}{\text{proportion}}$
> ## Using the Normal Approximation of binomial trials with probability of success $p$ and sample size $n$.
> ## The sample proportion $P^{\prime} = \frac{X}{n}$, where $X \sim B(n, p)$, is approximately $N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$ for large $n$ (number of binomial trials)
<details>
<summary><b>Show the Problem </b></summary>
Suppose that a market research firm is hired to estimate the percent of adults living in a large city who have cell phones. Five hundred randomly selected
adult residents in this city are surveyed to determine whether they have cell phones. Of the 500 people surveyed, 421 responded yes - they own cellphones. Using a <b> 95%</b> confidence level, compute a confidence interval estimate for the true proportion of adult residents of this city who have cell phones.
</details>

### **1. Python Syntax for CI with Normal Approximation of Binomial Trials**

### `scipy.stats.` $\color{blue}{\text{norm}}$**.interval**$\left( \color{blue}{\text{confidence_level, loc = p, scale}=\sqrt{\frac{p(1-p)}{n}}}\right)$

> ## where the sample proportion $P^{\prime} = \frac{X}{n}$, with $X \sim B(n, p)$, is approximately $N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$ for large $n$ (number of binomial trials)

* `loc` (mean) $= \hat{p}$
* `scale` $= \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, using $\hat{p}$ in place of the unknown $p$
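As a cross-check, here is a minimal sketch that builds the same interval by hand as $\hat{p} \pm EBP$ with $EBP = z^{*}\sqrt{\hat{p}(1-\hat{p})/n}$, using the cell-phone numbers from the problem, and compares it with `norm.interval`:

```python
import numpy as np
import scipy.stats as stats

confidence_level = 0.95
x_success, n = 421, 500

p_hat = x_success / n
std_phat = np.sqrt(p_hat * (1 - p_hat) / n)

# EBP = z* * sqrt(p_hat * (1 - p_hat) / n)
z_star = stats.norm.ppf(1 - (1 - confidence_level) / 2)
ebp = z_star * std_phat

manual_ci = (p_hat - ebp, p_hat + ebp)
scipy_ci = stats.norm.interval(confidence_level, loc=p_hat, scale=std_phat)

print(f"EBP = {ebp:.3f}")
print(f"manual CI: ({manual_ci[0]:.3f}, {manual_ci[1]:.3f})")
print(f"scipy  CI: ({scipy_ci[0]:.3f}, {scipy_ci[1]:.3f})")
```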

In [None]:
import statistics

import numpy as np
import plotly.graph_objects as go
import scipy.stats as stats

# Find the relevant parameters for normal approximation of binomial trials
confidence_level = 0.95
x_success = 421
n = 500
p_hat = x_success / n

std_phat = np.sqrt(p_hat * (1 - p_hat) / n)

# Compute confidence interval for true population proportion
ci_lower, ci_upper = stats.norm.interval(
    confidence_level,
    loc=p_hat,
    scale=std_phat
)

print(f"We estimate with {confidence_level:.0%} confidence that the true population proportion\nof adults who own cell phones is between {ci_lower:.1%} and {ci_upper:.1%}")
print(f"The margin of error is {(p_hat - ci_lower):.1%}")

We estimate with 95% confidence that the true population proportion
of adults who own cell phones is between 81.0% and 87.4%
The margin of error is 3.2%


### 2. **Graphic Illustration of the Confidence Interval for the Population Proportion**

In [None]:
# @title Adjust the slider to change the confidence level {"run":"auto"}
confidence_level = 0.95 # @param {"type":"slider","min":0,"max":1,"step":0.01}
# Find the relevant parameters for normal approximation of binomial trials
x_success = 421
n = 500
p_hat = x_success / n

std_phat = np.sqrt(p_hat * (1 - p_hat) / n)

# Compute confidence interval for true population proportion
ci_lower, ci_upper = stats.norm.interval(
    confidence_level,
    loc=p_hat,
    scale=std_phat
)

# Make a connection between the proportion variable and the plotting variable
# define the plotting interval
mu = p_hat
std = std_phat
x_lower = mu - 5 * std
x_upper = mu + 5 * std

fig = go.Figure()

# variables for pdf
x_pdf = np.arange(x_lower, x_upper, 0.001)
fig.add_trace(
    go.Scatter(
        x=x_pdf,
        y=stats.norm.pdf(x_pdf, mu, std),
        mode="lines",
        name="pdf of Normal Distribution",
        line_color="black"
    )
)

# Boolean mask keeps only the x values between ci_lower and ci_upper (the shaded region)
x_ci = x_pdf[(ci_lower <= x_pdf) & (x_pdf <= ci_upper)]

fig.add_trace(
    go.Scatter(
        x=x_ci,
        y=stats.norm.pdf(x_ci, mu, std),
        mode="lines",
        line_width=1,
        line_color="green",
        name=f"Probability {confidence_level}",
        fill="tozeroy",
        fillcolor="rgba(17, 151, 37, 0.5)"
    )
)

# Add a vertical line for mu
fig.add_trace(
    go.Scatter(
        x=[mu, mu],
        y=[0, stats.norm.pdf(mu, mu, std)],
        mode="lines",
        line=dict(
            color="orange",
            width=3,
            dash="dash"
        ),
        name=f"Sample Proportion: {mu:.2f}",
    )
)

# Add a vertical line for lower bound of CI
fig.add_trace(
    go.Scatter(
        x=[ci_lower, ci_lower],
        y=[0, stats.norm.pdf(ci_lower, mu, std)],
        mode="lines",
        line=dict(
            color="red",
            width=3,
            dash="dash"
        ),
        name=f"CI Lower Bound: {ci_lower:.2f}",
    )
)

# Add a vertical line for upper bound of CI
fig.add_trace(
    go.Scatter(
        x=[ci_upper, ci_upper],
        y=[0, stats.norm.pdf(ci_upper, mu, std)],
        mode="lines",
        line=dict(
            color="red",
            width=3,
            dash="dash"
        ),
        name=f"CI Upper Bound: {ci_upper:.2f}",
    )
)

fig.add_annotation(
    x=mu,
    y=(1/3) * stats.norm.pdf(mu, mu, std), # Positions of annotation
    text=f"<b>Shaded area<br>represents<br>probability<br>P({ci_lower:.3f} < x < {ci_upper:.3f})\
    <br>of {confidence_level:.0%} confidence level",
    align="center",
    font=dict(
        size=18,
        color="black",
        family="Sans Serif"
    )

)

fig.update_layout(
    height=600,
    width=1500,
    title=f"Confidence Interval with Normal Approximation of {n} Binomial Trials<br>for population proportion",
    title_x=0.5,
    yaxis = dict(title=r"$pdf(\hat{P})$", linewidth=1, zeroline=True, linecolor="black"),
    xaxis = dict(title=r"$\text{Sample Proportion }(\hat{P})$", linewidth=1, zeroline=True, linecolor="black"),
    plot_bgcolor="white",
    showlegend=True,
    font=dict(size=18, color="black", family="Sans Serif")
)

fig.show()

### **3.** Discussion: Move the slider to see how the shaded area, the lower and upper bounds, and the CI change.


#### **a.** Complete the following table
|Confidence Level (cl)|Sample Proportion $\hat{p}$|Lower Bound of CI ($ci_{\text{lower}}$)|Upper Bound of CI ($ci_{\text{upper}}$)|Probability (Shaded Area)|
|:--:|:--:|:--:|:--:|:--:|
|cl = 0.00|0.84|0.84|0.84|0|
|cl = 0.4|0.84|0.83|0.85|0.4|
|cl = 0.95|0.84|0.81|0.87|0.95|
|cl = 1.00|0.84|$-\infty$|$\infty$|1.00|


#### **b.** Narrative Summary

When cl = 0, the confidence interval collapses to the single point $\hat{p}$ (width 0) and the shaded probability is 0: a single point estimate has essentially no chance of equaling the true population proportion exactly.
> As cl increases, the confidence interval widens and the shaded probability increases, so the interval becomes more likely to capture the true population proportion.
If we choose $cl = 1$, the interval must become infinitely wide, $(-\infty, \infty)$, to guarantee that it captures the true population proportion.

> The probability (shaded area) is always equal to the value of cl.

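#### **c.** Will the lower or upper confidence bounds ever be greater than 1?

> ANSWER: The true proportion can never exceed 1, but the normal-approximation bounds $\hat{p} \pm z^{*} \cdot std$ are not automatically confined to $[0, 1]$. The upper bound exceeds 1 exactly when $(z^{*})^2\,\hat{p} > n\hat{q}$. When the condition $n\hat{q} \geq 10$ holds, this requires $z^{*} > \sqrt{10} \approx 3.16$ (a confidence level above about 99.8%), so for this data the bounds stay below 1 at the usual confidence levels; at cl = 1 the upper bound is $\infty$, as the table shows.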
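To see the bound escape $[0, 1]$, here is a minimal sketch that wraps the lab's normal-approximation (Wald) interval in a hypothetical helper `wald_interval` and applies it to an invented small sample, 49 successes out of 50, where $n\hat{q} = 1$ violates the success/failure condition. Clipping the result to $[0, 1]$ is shown as one common patch, not as the method used in this lab.

```python
import numpy as np
import scipy.stats as stats


def wald_interval(x_success, n, confidence_level):
    """Hypothetical helper: normal-approximation CI for a proportion, as in this lab."""
    p_hat = x_success / n
    std_phat = np.sqrt(p_hat * (1 - p_hat) / n)
    return stats.norm.interval(confidence_level, loc=p_hat, scale=std_phat)


# Cell-phone data: conditions hold, bounds stay inside [0, 1]
phone_lower, phone_upper = wald_interval(421, 500, 0.95)
print(phone_lower, phone_upper)

# Hypothetical lopsided sample: n * q_hat = 1 < 10
lower, upper = wald_interval(49, 50, 0.95)
print(lower, upper)  # the upper bound lands above 1

# One common patch: clip the reported bounds to [0, 1]
clipped = (max(lower, 0.0), min(upper, 1.0))
print(clipped)
```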

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)
## **$\color{green}{\textbf{TODO 1: }}$ Construct a confidence interval for a true population mean.**

<details>
<summary><b>Show the Problem </b></summary>
The Human Toxome Project (HTP) is working to understand the scope of industrial pollution in the human body. Industrial chemicals may enter the body
through pollution or as ingredients in consumer products. In October 2008, the scientists at HTP tested cord blood samples for 20 newborn infants in the United States. The cord blood of the "In utero/newborn" group was tested for 430 industrial compounds, pollutants, and other chemicals, including chemicals linked to brain and nervous system toxicity, immune system toxicity, reproductive toxicity, and fertility problems. There are health concerns about the effects of some chemicals on the brain and nervous system. The data below shows
how many of the targeted chemicals were found in each infant’s cord blood.

79, 145, 147, 160, 116, 100, 159, 151, 156, 126,
137, 83, 156, 94, 121, 144, 123, 114, 139, 99

Use this sample data to construct a 90% confidence interval for the mean number of targeted industrial chemicals to be found in an infant’s blood.

</details>

## **1. Compute Numerical CI with Python Syntax and write a description of your output.**

### `scipy.stats.` $\color{blue}{\text{t}}$**.interval**$\left( \color{blue}{\text{confidence_level, df, loc, scale}}\right)$

In [None]:
import statistics

import numpy as np
import scipy.stats as stats

sample = [79, 145, 147, 160, 116, 100, 159, 151, 156, 126, 137, 83, 156, 94, 121, 144, 123, 114, 139, 99]

# Compute sample statistic for the distribution to be used
x_bar = statistics.mean(sample)
# Sample standard deviation
std_sample = statistics.stdev(sample)
n = len(sample)
std_sample_mean = std_sample / np.sqrt(n) # Standard deviation of the sample mean
confidence_level = 0.90

# Assign the output to two variables
ci_lower, ci_upper = stats.t.interval(
    confidence_level,
    df=(n - 1),
    loc=x_bar,
    scale=std_sample_mean
)
print(f"""We estimate with {confidence_level:.0%} confidence that the true population mean
for the number of targeted chemicals in each infant's cord blood
is between {ci_lower:.2f} and {ci_upper:.2f} chemicals

The margin of error is {(x_bar - ci_lower):.2f}
""")

We estimate with 90% confidence that the true population mean
for the number of targeted chemicals in each infant's cord blood
is between 117.41 and 137.49 chemicals

The margin of error is 10.04



## **2. Create a graph to illustrate the confidence interval construct.**

In [None]:
# @title Adjust the slider to change the confidence level {"run":"auto"}
confidence_level = 0.95 # @param {"type":"slider","min":0,"max":1,"step":0.01}
import statistics

import numpy as np
import plotly.graph_objects as go
import scipy.stats as stats

sample = [79, 145, 147, 160, 116, 100, 159, 151, 156, 126, 137, 83, 156, 94, 121, 144, 123, 114, 139, 99]

mu = statistics.mean(sample)
std_sample = statistics.stdev(sample)
n = len(sample)
df = n - 1
std = std_sample / np.sqrt(n)

ci_lower, ci_upper = stats.t.interval(
    confidence_level,
    df=df,
    loc=mu,
    scale=std
)

# define the plotting interval
x_lower = mu - 5 * std
x_upper = mu + 5 * std

# confidence interval is always symmetric, centered at the mean
fig = go.Figure()

# variables for pdf
x_pdf = np.arange(x_lower, x_upper, 0.001)
fig.add_trace(
    go.Scatter(
        x=x_pdf,
        y=stats.t.pdf(x_pdf, df, mu, std),
        mode="lines",
        name="pdf of Student's T Distribution",
        line_color="black"
    )
)

# Boolean mask keeps only the x values between ci_lower and ci_upper (the shaded region)
x_ci = x_pdf[(ci_lower <= x_pdf) & (x_pdf <= ci_upper)]

fig.add_trace(
    go.Scatter(
        x=x_ci,
        y=stats.t.pdf(x_ci, df, mu, std),
        mode="lines",
        line_width=1,
        line_color="green",
        name=f"Probability {confidence_level}",
        fill="tozeroy",
        fillcolor="rgba(17, 151, 37, 0.5)"
    )
)

# Add a vertical line for mu
fig.add_trace(
    go.Scatter(
        x=[mu, mu],
        y=[0, stats.t.pdf(mu, df, mu, std)],
        mode="lines",
        line=dict(
            color="orange",
            width=3,
            dash="dash"
        ),
        name=f"Sample Mean: {mu:.2f}",
    )
)

# Add a vertical line for lower bound of CI
fig.add_trace(
    go.Scatter(
        x=[ci_lower, ci_lower],
        y=[0, stats.t.pdf(ci_lower, df, mu, std)],
        mode="lines",
        line=dict(
            color="red",
            width=3,
            dash="dash"
        ),
        name=f"CI Lower Bound: {ci_lower:.2f}",
    )
)

# Add a vertical line for upper bound of CI
fig.add_trace(
    go.Scatter(
        x=[ci_upper, ci_upper],
        y=[0, stats.t.pdf(ci_upper, df, mu, std)],
        mode="lines",
        line=dict(
            color="red",
            width=3,
            dash="dash"
        ),
        name=f"CI Upper Bound: {ci_upper:.2f}",
    )
)

fig.add_annotation(
    x=mu,
    y=(1/3) * stats.t.pdf(mu, df, mu, std), # Positions of annotation
    text=f"<b>Shaded area<br>represents<br>probability<br>P({ci_lower:.1f} < x < {ci_upper:.1f})\
    <br>of {confidence_level:.0%} confidence level",
    align="center",
    font=dict(
        size=18,
        color="black",
        family="Sans Serif"
    )

)

fig.update_layout(
    height=600,
    width=1500,
    title={
        "text": "Confidence Interval with Student's T Distribution",
        "x": 0.5
    },
    yaxis = dict(title=r"$pdf(\overline{X})$", linewidth=1, zeroline=True, linecolor="black"),
    xaxis = dict(title=r"$\text{Sample Mean }(\overline{X})$", linewidth=1, zeroline=True, linecolor="black"),
    plot_bgcolor="white",
    showlegend=True,
    font=dict(size=18, color="black", family="Sans Serif")
)

fig.show()


In [None]:
print(len(sample))

120


### **3.** Discussion: Move the slider to see how the shaded area, the lower and upper bounds, and the CI change.


#### **a.** Complete the following table
|Confidence Level (cl)|Sample Mean $\bar{x}$|Lower Bound of CI ($ci_{\text{lower}}$)|Upper Bound of CI ($ci_{\text{upper}}$)|Probability (Shaded Area)|
|:--:|:--:|:--:|:--:|:--:|
|cl = 0.00|127.45|127.45|127.45|0.0|
|cl = 0.4|127.45|124.35|130.55|0.4|
|cl = 0.95|127.45|115.30|139.60|0.95|
|cl = 1.00|127.45|$-\infty$|$\infty$|1.0|


#### **b.** Narrative Summary

When cl = 0, the confidence interval collapses to the single point $\bar{x}$ (width 0) and the shaded probability is 0: a single point estimate has essentially no chance of equaling the true mean exactly.


As in our previous lab example, when the confidence level increases, the confidence interval widens, and the probability (shaded area) increases in tandem with the confidence level. If we want a higher level of confidence that the interval built from our existing sample captures the true value, we need a wider confidence interval. If we chose a confidence level of 1, the interval would have to be infinitely wide, $(-\infty, \infty)$, to guarantee that it contains the true population mean.

As an experiment, I increased the size of the sample from 20 to 120 and checked the values of the confidence intervals to see what would change. At a confidence level of 0.95, the lower bound of the CI was 122.86 and the upper bound was 132.04. These bounds are much closer together than they were with a sample of 20 (lower bound of 115.30 and upper bound of 139.60), which makes intuitive sense: the standard error $s/\sqrt{n}$ shrinks as $n$ grows, so a larger sample gives a more precise estimate at the same confidence level.
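The effect of sample size can be isolated with a short sketch. It assumes, purely for illustration, that a larger sample would have the same standard deviation $s$ as the 20 infants, and computes the 95% half-width $t^{*} \cdot s/\sqrt{n}$ for several hypothetical sample sizes:

```python
import statistics

import numpy as np
import scipy.stats as stats

sample = [79, 145, 147, 160, 116, 100, 159, 151, 156, 126,
          137, 83, 156, 94, 121, 144, 123, 114, 139, 99]
s = statistics.stdev(sample)  # spread held fixed across n (an assumption)
confidence_level = 0.95

half_widths = {}
for n in (20, 45, 80, 120):
    t_star = stats.t.ppf(1 - (1 - confidence_level) / 2, df=n - 1)
    half_widths[n] = t_star * s / np.sqrt(n)  # EBM for this sample size
    print(f"n = {n:>3}: CI half-width = {half_widths[n]:.2f}")
```

The half-width falls roughly like $1/\sqrt{n}$ (with a small extra gain as $t^{*}$ shrinks toward $z^{*}$), so a sixfold larger sample gives an interval about $\sqrt{6} \approx 2.4$ times narrower.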

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)
## **$\color{green}{\textbf{TODO 2: }}$** Construct a confidence interval estimate for the true population proportion.

<details>
<summary><b>Show the Problem </b></summary>
Suppose  250  randomly selected people are surveyed to determine if they own a tablet. Of the  250  surveyed,  114  reported owning a tablet. Using a  99%  confidence level, compute a confidence interval estimate for the true proportion of people who own tablets.

</details>

## **1. Calculate Numerical CI with Python Syntax and write a meaningful description to your output:**

### `scipy.stats.` $\color{blue}{\text{norm}}$**.interval**$\left( \color{blue}{\text{confidence_level, loc = p, scale}=\sqrt{\frac{p(1-p)}{n}}}\right)$
* Write a meaningful description for your output

In [None]:
x_success = 114
n = 250
cl = 0.99

p_hat = x_success / n

std_phat = np.sqrt(p_hat * (1 - p_hat) / n)

# Compute confidence interval for true population proportion
ci_lower, ci_upper = stats.norm.interval(
    cl,
    loc=p_hat,
    scale=std_phat
)

print(f"We estimate with {cl:.0%} confidence that the true population proportion\nof people who own a tablet is between {ci_lower:.1%} and {ci_upper:.1%}")
print(f"The margin of error is {(p_hat - ci_lower):.1%}")

We estimate with 99% confidence that the true population proportion
of people who own a tablet is between 37.5% and 53.7%
The margin of error is 8.1%


## **2. Create a graph to illustrate the confidence interval construct.**

In [None]:
# @title Adjust the slider to change the confidence level {"run":"auto"}
cl = 0.99 # @param {"type":"slider","min":0,"max":1,"step":0.01}
# Find the relevant parameters for normal approximation of binomial trials
x_success = 114
n = 250
p_hat = x_success / n

std_phat = np.sqrt(p_hat * (1 - p_hat) / n)

# Compute confidence interval for true population proportion
ci_lower, ci_upper = stats.norm.interval(
    cl,
    loc=p_hat,
    scale=std_phat
)

# Make a connection between the proportion variable and the plotting variable
# define the plotting interval
mu = p_hat
std = std_phat
x_lower = mu - 5 * std
x_upper = mu + 5 * std

fig = go.Figure()

# variables for pdf
x_pdf = np.arange(x_lower, x_upper, 0.001)
fig.add_trace(
    go.Scatter(
        x=x_pdf,
        y=stats.norm.pdf(x_pdf, mu, std),
        mode="lines",
        name="pdf of Normal Distribution",
        line_color="black"
    )
)

# Boolean mask keeps only the x values between ci_lower and ci_upper (the shaded region)
x_ci = x_pdf[(ci_lower <= x_pdf) & (x_pdf <= ci_upper)]

fig.add_trace(
    go.Scatter(
        x=x_ci,
        y=stats.norm.pdf(x_ci, mu, std),
        mode="lines",
        line_width=1,
        line_color="green",
        name=f"Probability {cl}",
        fill="tozeroy",
        fillcolor="rgba(17, 151, 37, 0.5)"
    )
)

# Add a vertical line for mu
fig.add_trace(
    go.Scatter(
        x=[mu, mu],
        y=[0, stats.norm.pdf(mu, mu, std)],
        mode="lines",
        line=dict(
            color="orange",
            width=3,
            dash="dash"
        ),
        name=f"Sample Proportion: {mu:.2f}",
    )
)

# Add a vertical line for lower bound of CI
fig.add_trace(
    go.Scatter(
        x=[ci_lower, ci_lower],
        y=[0, stats.norm.pdf(ci_lower, mu, std)],
        mode="lines",
        line=dict(
            color="red",
            width=3,
            dash="dash"
        ),
        name=f"CI Lower Bound: {ci_lower:.2f}",
    )
)

# Add a vertical line for upper bound of CI
fig.add_trace(
    go.Scatter(
        x=[ci_upper, ci_upper],
        y=[0, stats.norm.pdf(ci_upper, mu, std)],
        mode="lines",
        line=dict(
            color="red",
            width=3,
            dash="dash"
        ),
        name=f"CI Upper Bound: {ci_upper:.2f}",
    )
)

fig.add_annotation(
    x=mu,
    y=(1/3) * stats.norm.pdf(mu, mu, std), # Positions of annotation
    text=f"<b>Shaded area<br>represents<br>probability<br>P({ci_lower:.3f} < x < {ci_upper:.3f})\
    <br>of {cl:.0%} confidence level",
    align="center",
    font=dict(
        size=18,
        color="black",
        family="Sans Serif"
    )

)

fig.update_layout(
    height=600,
    width=1500,
    title=f"Confidence Interval with Normal Approximation of {n} Binomial Trials<br>for population proportion",
    title_x=0.5,
    yaxis = dict(title=r"$pdf(\hat{P})$", linewidth=1, zeroline=True, linecolor="black"),
    xaxis = dict(title=r"$\text{Sample Proportion }(\hat{P})$", linewidth=1, zeroline=True, linecolor="black"),
    plot_bgcolor="white",
    showlegend=True,
    font=dict(size=18, color="black", family="Sans Serif")
)

fig.show()

### **3.** Discussion: Move the slider to see how the shaded area, the lower and upper bounds, and the CI change.


#### **a.** Complete the following table
|Confidence Level (cl)|Sample Proportion $\hat{p}$|Lower Bound of CI ($ci_{\text{lower}}$)|Upper Bound of CI ($ci_{\text{upper}}$)|Probability (Shaded Area)|
|:--:|:--:|:--:|:--:|:--:|
|cl = 0.00|0.46|0.46|0.46|0.00|
|cl = 0.4|0.46|0.44|0.47|0.4|
|cl = 0.95|0.46|0.39|0.52|0.95|
|cl = 1.00|0.46|$-\infty$|$\infty$|1.00|


#### **b.** Narrative Summary

When cl = 0, the confidence interval collapses to the single point $\hat{p}$ (width 0) and the shaded probability is 0, so the interval has essentially no chance of capturing the true population proportion.


As we increase the confidence level from zero, the confidence interval widens and the shaded probability grows, because a wider interval has a greater chance of capturing the true population proportion.

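#### **c.** Will the lower or upper confidence bounds ever be greater than 1?


A true proportion can never be greater than one (you cannot have more than 100% of a sample/population), but the normal-approximation bounds are not automatically capped at 1. Here $n\hat{p} = 114$ and $n\hat{q} = 136$ are both well above 10, so the upper bound stays far below 1 at the usual confidence levels; only at cl = 1 does it become $\infty$, as the table shows.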