![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)
# **Introduction to Inferential Statistics**

We use $\color{green}{\text{sample data}}$ to make generalizations about an $\color{red}{\text{unknown population}}.$ This part of statistics is called <b>inferential statistics</b>. The sample data help us estimate a $\color{red}{\text{population parameter}}.$

<details>
  <summary><b>Learning Objectives</b></summary>

By the end of this chapter, the student should be able to use Python to:

* Calculate and interpret confidence intervals for estimating a population mean and a population proportion
* Perform one-sample and two-sample hypothesis tests for means and proportions.
* Graph the $p$-value as an area under a probability distribution for one-sample tests, and draw side-by-side boxplots for two-sample tests.
* Perform a Chi-Square test on frequency-based sample data.
* Perform One-Way Analysis of Variance (ANOVA).
</details>

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)
## **Confidence Intervals**




### **Understanding Confidence Intervals**
<details>
  <summary><b>Show details</b></summary>

A **confidence interval** is a statistical range or interval that provides an estimate of the plausible values for an unknown population parameter based on sample data. It quantifies the level of uncertainty or variability associated with the estimated parameter.

In practical terms, a confidence interval consists of two values: an upper bound and a lower bound. These bounds define a range of values within which the true population parameter is likely to fall with a specified level of confidence. The **confidence level** is typically expressed as a percentage (e.g., 95%, 99%) and represents the probability that the interval will capture the true population parameter in repeated sampling.

Here's how a confidence interval is typically interpreted: If we were to conduct the same study multiple times and calculate a confidence interval each time, we would expect the true population parameter to be contained within the confidence interval in a certain proportion of those intervals (the proportion is equal to the confidence level).
</details>
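
The repeated-sampling interpretation above can be checked with a small simulation. The sketch below assumes a hypothetical normal population (mean 50, standard deviation 10, both made up) and uses SciPy's `stats.t.interval` to build a 95% interval from each of 1,000 simulated samples; it is an illustration, not part of the lab code.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)   # fixed seed so the run is repeatable
true_mean, true_sd = 50.0, 10.0       # hypothetical population parameters
n, n_studies = 30, 1000               # sample size and number of repeated studies

covered = 0
for _ in range(n_studies):
    sample = rng.normal(true_mean, true_sd, size=n)
    # 95% t-interval for the mean, built from this sample alone
    lower, upper = stats.t.interval(0.95, df=n - 1,
                                    loc=sample.mean(), scale=stats.sem(sample))
    covered += int(lower <= true_mean <= upper)

coverage = covered / n_studies
print(f"{coverage:.1%} of the intervals contain the true mean")
```

The printed share should land close to 95%, which is what the confidence level describes: a property of the procedure across many samples, not of any single interval.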

### **Graphic Illustration of 95% Confidence Interval of Population Mean**
<details>
  <summary><b>Show diagram</b></summary>
  
<img src = "https://drive.google.com/uc?id=1IDze0EcvE5TVRp6WkyDGoSlwNcyWos3t" alt="A probability distribution density function showing confidence level of 95% probability as the area under the curve center at the mean with corresponding interval centered at the mean and 2.5% probability as area shaded in red on each side of the probability representing the 95% confidence interval. Four lines for sample confidence intervals are drawn along the horizontal random variable axis, with text indicating if the confidence interval line included the mean or not."/>

</details>

We will be learning how to create confidence intervals for the mean and proportion in Lab 5.1.

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)
## **Hypothesis Testing Basics: $\color{magenta}{\text{Not}}$ about $\color{magenta}{\text{certainty}}$.  BUT $\color{magenta}{\text{are}}$  about $\color{magenta}{\text{confidence}}$.**



### **Description**


<details>
  <summary><b>Description of a Hypothesis Test</b></summary>

A hypothesis test is a statistical procedure used to make inferences and draw conclusions about a population based on sample data. It involves formulating two competing **hypotheses**, the **null hypothesis ($H_0$)** and the **alternative hypothesis ($H_a$ or $H_1$)**, and examining the evidence from the data to determine which hypothesis is more plausible.

Hypothesis testing lets us make decisions about a population without having to examine every element in it.
</details>

<details>
  <summary><b>Why are we working with sample data?</b></summary>

Why are we bothering to take a sample? Because we want to make a decision, and checking every element in the population might be too difficult (billions of events), or just impossible (testing food means destroying it).

The issues then become:

* There is something we want to know about the entire population, but we can’t interrogate all of it.
* We sample the population and learn something about that sample, but since it’s only a sample, we can’t be sure that it is, in fact, representative of the entire population.
* Finally, what, if anything, can we infer about the population, given what we’ve learned about the sample?

</details>

### **Hypothesis Testing Steps**

#### **Step 1: Set up the Null and Alternative Hypotheses.**

<details>
  <summary><b>Show Definitions of Null and Alternative Hypothesis</b></summary>

A hypothesis test starts with making two hypotheses:

* The null hypothesis – in general, this is a “suppose there’s nothing to see here” case.
* The alternative hypothesis – this is what we’re checking for.

The test works by assuming the null hypothesis is true and then checking how likely a sample like ours would be under that assumption. If it is not likely enough, we reject the null hypothesis, which suggests that the alternative hypothesis may be true.

</details>

#### **Step 2: Choose a Significance Level**

<details>
  <summary><b>Importance of the Significance Level</b></summary>

The **significance level**, denoted as $\alpha$ (alpha), is a predefined threshold used in hypothesis testing to determine the level of evidence required to reject the null hypothesis.

In other words, the significance level is the probability we are willing to accept of reaching a wrong conclusion by rejecting the null hypothesis when it is actually true (a Type I error). It is set before conducting the hypothesis test and serves as a guide for decision-making.

The most commonly used significance levels are 0.05 (5%) and 0.01 (1%). These values are somewhat arbitrary but are widely adopted in many fields of research.
</details>
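
One way to see what $\alpha$ controls is to run many tests on data for which the null hypothesis really is true and count how often it gets rejected. The sketch below is a simulation under assumed values (a normal population with mean 100 and standard deviation 15, chosen only for illustration) and uses `scipy.stats.ttest_1samp`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)   # fixed seed so the run is repeatable
alpha = 0.05
mu_0 = 100.0                          # value claimed by the null hypothesis (hypothetical)
n, n_tests = 25, 2000

false_rejections = 0
for _ in range(n_tests):
    # The null hypothesis is true here: the population mean really is mu_0.
    sample = rng.normal(loc=mu_0, scale=15.0, size=n)
    p_value = stats.ttest_1samp(sample, popmean=mu_0).pvalue
    false_rejections += int(p_value < alpha)

type_i_rate = false_rejections / n_tests
print(f"rejected a true null hypothesis in {type_i_rate:.1%} of tests")
```

The rejection rate should come out near 5%, matching the chosen $\alpha$. Lowering $\alpha$ to 0.01 would make these false rejections rarer, at the cost of requiring stronger evidence.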

#### **Step 3: Collect Sample Data and Compare to Null Hypothesis**

<details>
  <summary><b>Collecting and Analyzing Sample Data</b></summary>

In hypothesis testing, sample data is used to assess the strength of evidence against the null hypothesis ($H_0$) and make a decision about its validity.

We gather data from a representative sample of the population and calculate relevant statistics (such as means or proportions) based on the sample. We want to determine if our sample is unusual under the assumption that the null hypothesis is true.

If a sample is considered unusual, the observed results deviate significantly from what we would expect if the null hypothesis were true.
</details>

<details>
  <summary><b>Two Approaches for Determining If Our Sample Is Unusual</b></summary>

We compare our sample statistic to the curve for the null hypothesis, and we figure out how likely it is that this sample could have come from a population where the null hypothesis is true.

There are two main approaches to hypothesis testing:

1. **Critical value approach**: In this approach, a critical value or cutoff is determined based on the desired significance level ($\alpha$). If the test statistic is more extreme than the critical value (that is, it falls in the rejection region), the null hypothesis is rejected in favor of the alternative hypothesis.

2. **P-value approach**: The p-value is calculated based on the observed test statistic and represents the probability of obtaining results as extreme or more extreme than what was observed, assuming the null hypothesis is true. If the p-value is less than the chosen significance level ($\alpha$), the null hypothesis is rejected.

Our class will use the more commonly used p-value approach.

</details>
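
The two approaches always lead to the same decision, because both compare the same tail area to $\alpha$. The sketch below shows this for a right-tailed t-test; the test statistic and degrees of freedom are hypothetical values picked for illustration.

```python
from scipy import stats

alpha = 0.05
df = 24            # degrees of freedom, e.g. a sample of 25 (hypothetical)
t_stat = 2.30      # hypothetical observed test statistic, right-tailed test

# Critical value approach: find the cutoff with area alpha to its right.
t_crit = stats.t.ppf(1 - alpha, df)
reject_by_critical_value = t_stat > t_crit

# p-value approach: find the area to the right of the observed statistic.
p_value = stats.t.sf(t_stat, df)
reject_by_p_value = p_value < alpha

print(f"critical value = {t_crit:.3f}, p-value = {p_value:.4f}")
print(f"reject H0? critical value: {reject_by_critical_value}, p-value: {reject_by_p_value}")
```

Here `stats.t.ppf` is the inverse of the cumulative distribution function and `stats.t.sf` is the survival function (one minus the CDF), so the two comparisons are the same statement seen from the statistic's scale and from the probability scale.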




#### **Step 4. Compute the $p$ Value**


<details>
  <summary><b>What is a $p$ value?</b></summary>
  
A **$p$-value** is a statistical measure that quantifies the strength of evidence against the **null hypothesis** ($H_0$) in hypothesis testing. It represents the probability of obtaining results as extreme or more extreme than the observed data, assuming the null hypothesis is true.

The $p$-value is calculated from the observed test statistic and its distribution under the null hypothesis: it is the area under the curve beyond the observed test statistic in the direction specified by the **alternative hypothesis**.

The $p$-value is then compared to a predetermined **significance level** ($\alpha$) to make a decision about the null hypothesis.

Thus, a $p$-value is specific to the sample it was computed from: it cannot be directly compared with the $p$-value from another sample, and it does not indicate how accurate the sample estimate is.

</details>

<details>
  <summary><b>How is the $p$-value related to the area under the probability density curve?</b></summary>

For a **one-tailed test**: If the alternative hypothesis is one-sided (e.g., $H_a$: $\mu > \mu_0$ or $H_a$: $\mu < \mu_0$), the $p$-value is the area under the curve beyond the observed test statistic in the corresponding tail of the distribution.

For a **two-tailed test**: If the alternative hypothesis is two-sided (e.g., $H_a$: $\mu \neq \mu_0$), the $p$-value is the combined area in both tails beyond the observed test statistic.

In both cases, a smaller $p$-value means a smaller area under the curve beyond the observed test statistic. That indicates more extreme data and stronger evidence against the null hypothesis.

In terms of the graph, we are, in effect, comparing:
* the area under the probability density curve to the left or right of the sample result (one-tailed tests), or in both tails (two-tailed tests),
* to the total area under this curve, which equals 1.

This fraction is the probability of obtaining a sample at least as extreme as ours if the population mean or proportion matched the null hypothesis. It is not the probability that the null hypothesis is true.


</details>
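
The tail areas described above can be computed directly from the t distribution. The sketch below assumes a hypothetical test statistic of $-1.90$ with 15 degrees of freedom and shows the $p$-value for each form of the alternative hypothesis.

```python
from scipy import stats

df = 15          # hypothetical degrees of freedom
t_stat = -1.90   # hypothetical observed test statistic

p_left = stats.t.cdf(t_stat, df)           # H_a: mu < mu_0, area to the left
p_right = stats.t.sf(t_stat, df)           # H_a: mu > mu_0, area to the right
p_two = 2 * stats.t.sf(abs(t_stat), df)    # H_a: mu != mu_0, area in both tails

print(f"left-tailed:  {p_left:.4f}")
print(f"right-tailed: {p_right:.4f}")
print(f"two-tailed:   {p_two:.4f}")
```

Because the t distribution is symmetric, the two-tailed $p$-value is twice the smaller one-tailed area, and the left and right areas add up to 1.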



  




#### **Step 5. Drawing Conclusions about the Sample**

<details>
  <summary><b>Interpreting the $p$ value</b></summary>

  Here's how the interpretation of the $p$-value typically works:

* If the $p$-value is less than the significance level ($\alpha$), typically 0.05 or 0.01, it is considered **statistically significant**. This means that the observed data is unlikely to have occurred by chance alone under the assumption of the null hypothesis. In such cases, the null hypothesis is rejected in favor of the alternative hypothesis ($H_1$ or $H_a$).

* If the $p$-value is greater than the significance level, it is not considered statistically significant. This suggests that the observed data is reasonably likely to have occurred by chance under the null hypothesis. In these cases, there is insufficient evidence to reject the null hypothesis.

It's important to note that the $p$-value does not provide direct information about the effect size, the practical significance of the results, or the probability of the alternative hypothesis being true. It simply measures the strength of evidence against the null hypothesis based on the observed data.
</details>

<details>
  <summary><b>Show Options</b></summary>
  
|Significance level ($\alpha$) and $p$-value|Conclusion|
|:--:|--|
|$\mathbf{p}$ **value** $\mathbf{< \alpha}$|**1.** The result is **statistically significant**: a sample like this would be unlikely if the null hypothesis were true.<br>**2.** We **can** reject the null hypothesis, which suggests the alternative may be true. **NOTE:** this doesn’t prove the alternative hypothesis; it only gives us a degree of confidence.<br>**3.** We **cannot** say anything else about the actual value of the underlying population parameter, i.e. we can’t say that it’s likely to be 82%, or even close to 82%.|
|$\mathbf{p}$ **value** $\mathbf{> \alpha}$|**1.** The result is **not statistically significant**: a sample like this would be reasonably likely if the null hypothesis were true.<br>**2.** We **cannot** reject the null hypothesis.<br>**3.** We **cannot** say _anything_ else about the actual value of the underlying population parameter, i.e. we can’t say that it’s likely to be 82%, or even close to 82%.|

</details>

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)
## **The Different Types of Hypothesis Tests in this Lab:**

<details>
  <summary><b>Show Chart</b></summary>

<img src = "https://drive.google.com/uc?id=12ZK8o_7pYuAyIHS3qR7yANMPSyQm0VVW" />

</details>

### **Lab 5.2 Mean-Based One Sample t-Test**
We learn about **hypothesis tests for the mean when the population standard deviation $\sigma$ is unknown** (one-sample t-test).  This is the most common type of single-population mean test, because it is rarely realistic to know $\sigma$ without also knowing the mean. One-sample t-tests compare sample data to some claimed value (for example, the claim that college students sleep less than 6.5 hours a night).
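
A minimal sketch of a one-sample t-test for the sleep example, using `scipy.stats.ttest_1samp`. The sleep times are made up for illustration, and the `alternative` keyword assumes SciPy 1.6 or later.

```python
import numpy as np
from scipy import stats

# Made-up sleep times (hours) for 12 college students.
sleep_hours = np.array([6.2, 5.9, 6.8, 5.5, 6.0, 6.4, 5.7, 6.1, 7.0, 5.8, 6.3, 5.6])

# H_0: mu = 6.5   vs   H_a: mu < 6.5  (left-tailed test)
result = stats.ttest_1samp(sleep_hours, popmean=6.5, alternative="less")
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```

The test statistic is the sample mean minus 6.5, divided by the sample standard error, so a negative value means the sample mean fell below the claimed value.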

### **Lab 5.3 Proportion-Based One Sample Z-Test**
We learn about **hypothesis tests involving proportions**.  Proportions come from binomial data (either the subject is in the category or it is not).  We use the normal distribution to approximate the binomial distribution, which results in **z-tests**.  Similar to the mean tests, hypothesis tests involving proportions can compare a **single sample** to some claimed value (for example, more than 10% of the population is left-handed) or compare **two different samples** from different populations (for example, males are more likely to be left-handed than females).
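
A minimal sketch of the one-sample proportion z-test for the left-handedness claim, computed by hand with the normal approximation. The survey counts are made up, and the lab itself may use a library function instead of this manual calculation.

```python
from math import sqrt
from scipy import stats

n, successes = 200, 28      # made-up survey: 28 of 200 people are left-handed
p_0 = 0.10                  # proportion claimed by H_0
p_hat = successes / n       # sample proportion

# The standard error uses p_0 because the test assumes H_0 is true.
z = (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)
p_value = stats.norm.sf(z)  # right-tail area, since H_a: p > 0.10

print(f"p_hat = {p_hat:.2f}, z = {z:.3f}, p = {p_value:.4f}")
```

The normal approximation is usually considered reasonable when both $n p_0$ and $n(1 - p_0)$ are at least about 10, which holds for these assumed counts.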

### **Lab 5.4 Proportion-Based Two Sample Z-Test**
We learn about **hypothesis tests involving two independent proportions**.  Proportions come from binomial data (either the subject is in the category or it is not).  We use the normal distribution to approximate the binomial distribution, which results in **z-tests**.  Here the proportions come from **two different samples** drawn from different populations (for example, testing whether males are more likely to be left-handed than females).
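
A minimal sketch of a two-sample proportion z-test, the test named in this lab's title, computed by hand with a pooled proportion. The counts are made up for illustration.

```python
from math import sqrt
from scipy import stats

# Made-up counts of left-handed people in two independent samples.
x1, n1 = 30, 250    # males
x2, n2 = 20, 250    # females

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)      # pooled proportion under H_0: p1 = p2
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

z = (p1 - p2) / se
p_value = stats.norm.sf(z)          # right-tail area, since H_a: p1 > p2

print(f"z = {z:.3f}, p = {p_value:.4f}")
```

Pooling is used because the null hypothesis says the two populations share one proportion, so both samples together give the best estimate of it.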

### **Lab 5.5 Mean-Based Two Sample t-Test**
We learn about **hypothesis tests for the mean involving two independent samples when $\sigma$ is unknown**. The two samples can come from **two different populations** (for example, first-year college students vs. second-year college students).  This type of test is a **2-sample independent t-test**.  The two samples can also come from the **same population** (for example, weight before vs. after going on a diet).  This scenario is often referred to as a matched-pairs test, a **2-sample paired t-test**, or a **2-sample dependent t-test**.
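
A minimal sketch of an independent two-sample t-test with `scipy.stats.ttest_ind`. The data are simulated, not real, and the choice of Welch's version (`equal_var=False`) is an assumption made here; the lab may use the pooled-variance version instead.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)   # fixed seed so the run is repeatable
# Simulated (not real) weekly study hours for two independent groups.
first_years = rng.normal(loc=14.0, scale=3.0, size=40)
second_years = rng.normal(loc=15.5, scale=3.5, size=35)

# H_0: mu_1 = mu_2   vs   H_a: mu_1 != mu_2  (two-tailed by default)
# equal_var=False requests Welch's t-test, which does not assume equal variances.
result = stats.ttest_ind(first_years, second_years, equal_var=False)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```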

### **Lab 5.6 Dependent variables Mean-Based t-Test**
We learn about **hypothesis tests for the mean involving two dependent samples**.  The key characteristic of dependent samples is that each data point in one group is uniquely associated with a data point in the other group. This pairing could arise from before-and-after measurements on the same individuals, measurements taken on the same subject under different conditions, or any other scenario where the pairs have a natural connection.
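
A minimal sketch of a paired t-test with `scipy.stats.ttest_rel`, using made-up before-and-after weights. It also shows that the paired test is equivalent to a one-sample t-test on the differences.

```python
import numpy as np
from scipy import stats

# Made-up weights (kg) for the same 8 people before and after a diet.
before = np.array([82.0, 91.5, 77.3, 68.9, 95.2, 88.0, 73.4, 80.1])
after = np.array([80.2, 89.9, 77.0, 67.5, 92.8, 87.1, 73.9, 78.6])

# Paired test: each "before" value is matched with the same person's "after" value.
paired = stats.ttest_rel(before, after)

# Equivalent view: a one-sample t-test on the differences against a mean of 0.
on_differences = stats.ttest_1samp(before - after, popmean=0.0)

print(f"paired:         t = {paired.statistic:.3f}, p = {paired.pvalue:.4f}")
print(f"on differences: t = {on_differences.statistic:.3f}, p = {on_differences.pvalue:.4f}")
```

Working with the differences removes person-to-person variation, which is why a paired design can detect a change that an independent-samples test on the same numbers might miss.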

### **Lab 5.7 Frequency-Based Two Sample Chi-Square Test**
We learn three different tests that involve frequencies. The **goodness of fit** test compares sample data to some claimed distribution.  For example, we could test whether concert goers buy the same number of S, M, L, XL sized t-shirts.  A **test for homogeneity** assesses whether the distribution of a categorical variable is the same across different groups or populations.  A **test for independence** determines whether there is a significant association between two categorical variables, that is, whether the occurrence of one is related to the occurrence of the other.  All three of these tests use the chi-squared distribution.
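
A minimal sketch of two chi-square tests with SciPy: a goodness-of-fit test using `scipy.stats.chisquare` and a test on a contingency table using `scipy.stats.chi2_contingency`. All counts are made up for illustration.

```python
import numpy as np
from scipy import stats

# Goodness of fit: made-up t-shirt sales by size (S, M, L, XL).
observed = np.array([18, 30, 34, 18])
# With no expected counts given, chisquare assumes equal counts in every category.
gof = stats.chisquare(observed)
print(f"goodness of fit: chi2 = {gof.statistic:.2f}, p = {gof.pvalue:.4f}")

# Contingency table: made-up counts of left/right-handedness (columns) by sex (rows).
table = np.array([[30, 220],
                  [20, 230]])
# For a 2x2 table, SciPy applies Yates' continuity correction by default.
chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(f"contingency table: chi2 = {chi2:.2f}, p = {p_value:.4f}, dof = {dof}")
```

The same `chi2_contingency` calculation serves both the test for independence and the test for homogeneity; the difference between them lies in how the data were sampled, not in the arithmetic.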

### **Lab 5.8 Analysis of Variance**
We learn about ANOVA, which stands for **Analysis of Variance**. It is a statistical method used to compare the means of three or more groups to determine if there are significant differences among them. ANOVA is commonly used when dealing with continuous data and multiple categorical groups.  (Note: ANOVA can be done with two groups as well, but in this course we will use 2-sample t-tests for two groups.)
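
A minimal sketch of a one-way ANOVA with `scipy.stats.f_oneway`, using made-up exam scores for three hypothetical teaching methods.

```python
from scipy import stats

# Made-up exam scores for three teaching methods.
group_a = [78, 85, 82, 90, 74]
group_b = [88, 92, 85, 95, 91]
group_c = [70, 75, 68, 80, 72]

# H_0: all three group means are equal   vs   H_a: at least one mean differs
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

A small $p$-value here only says that at least one group mean differs from the others; it does not say which one, so a follow-up comparison is needed to locate the difference.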





![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)
# **Additional Resources**

https://www.statology.org/two-sample-t-test-python/

https://www.marsja.se/how-to-perform-a-two-sample-t-test-with-python-3-different-methods/

https://www.geeksforgeeks.org/how-to-conduct-a-two-sample-t-test-in-python/

https://tirthajyoti.github.io/Intro_Hypothesis_Testing.html

https://sonalake.com/latest/hypothesis-testing-of-proportion-based-samples/#
