# Lesson 4: Hypothesis Testing


Welcome to our lesson on **"Hypothesis Testing"!** Hypothesis testing is a key tool in statistics for making data-driven decisions. For example, imagine a scientist testing if a new drug is effective. Hypothesis testing helps determine if the observed effect is due to the drug or just chance. By the end of this lesson, you'll understand the basics, conduct a hypothesis test, and interpret its results using Python.

---

## What is Hypothesis Testing?

Hypothesis testing is like being a detective. You gather data and decide if there's enough evidence to support your claim.

---

## Null and Alternative Hypotheses

- **Null Hypothesis (H₀)**:  
  The default position that there is no effect or difference. We assume it is true unless the data provide strong evidence against it.  
  Example: Testing if a coin is fair.  
  - **H₀**: The coin is fair (it lands heads 50% of the time).

- **Alternative Hypothesis (Hₐ)**:  
  The claim you are looking for evidence of. It represents an effect or difference.  
  - **Hₐ**: The coin is not fair (it does not land heads 50% of the time).

---

## Significance Level and P-value

- **Significance Level (α)**:  
  The threshold for rejecting the null hypothesis, usually set at 0.05 (5%). If the p-value is less than α, you reject H₀.

- **P-value**:  
  Indicates how likely it is to get results at least as extreme as yours, assuming H₀ is true.  
  - Smaller p-values = Stronger evidence against H₀.
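
To make the p-value concrete, here is a minimal sketch for the coin example using only the standard library. The counts (60 heads in 100 flips) and the helper name `two_sided_binomial_p` are assumptions made up for illustration; they do not come from the lesson data.

```python
from math import comb

def two_sided_binomial_p(heads: int, flips: int, p_heads: float = 0.5) -> float:
    """Exact two-sided p-value: total probability, under H0, of every
    outcome that is no more likely than the observed one."""
    probs = [
        comb(flips, k) * p_heads**k * (1 - p_heads) ** (flips - k)
        for k in range(flips + 1)
    ]
    observed = probs[heads]
    # Small tolerance so outcomes tied with the observed one are counted
    return sum(p for p in probs if p <= observed * (1 + 1e-9))

alpha = 0.05
p_value = two_sided_binomial_p(heads=60, flips=100)  # hypothetical counts

print(f"P-value: {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

With these hypothetical counts the p-value lands just above 0.05, so a fair coin is not rejected at α = 0.05 even though 60 heads looks lopsided.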

---

## Step-by-Step Explanation of Hypothesis Testing: Part 1

### Example:
You have test scores from a small class and want to test if the class average is different from 70.

- **H₀**: The mean test score is 70.  
- **Hₐ**: The mean test score is not 70.  
- **Significance Level**: α = 0.05.

---

## Step-by-Step Explanation of Hypothesis Testing: Part 2

### Sample Data:  
`[71, 72, 69, 74, 73]`

We will conduct a **one-sample t-test** to compare the sample mean against the hypothesized population mean (70). The t-test helps determine whether the sample mean differs significantly from the hypothesized mean, taking sample size and variability into account.
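
As a rough sketch of what the test computes, the t-statistic is the gap between the sample mean and the hypothesized mean, divided by the standard error of the mean. The variable names below (`mu_0`, `standard_error`, `t_manual`) are chosen for this illustration only.

```python
from math import sqrt
from statistics import mean, stdev

data = [71, 72, 69, 74, 73]
mu_0 = 70  # hypothesized population mean under H0

n = len(data)
sample_mean = mean(data)
sample_sd = stdev(data)               # sample standard deviation (divides by n - 1)
standard_error = sample_sd / sqrt(n)  # estimated spread of the sample mean

# t = (sample mean - hypothesized mean) / standard error
t_manual = (sample_mean - mu_0) / standard_error

print("Sample mean:", sample_mean)
print("Manual t-statistic:", round(t_manual, 4))
```

A larger sample or a smaller standard deviation shrinks the standard error, so the same gap from 70 yields a larger t-statistic.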

---

## Performing a One-Sample t-Test in Python

```python
# Conducting a t-test
from scipy.stats import ttest_1samp

# Sample data
data = [71, 72, 69, 74, 73]

# Perform one-sample t-test against the null hypothesis mean = 70
t_stat, p_value = ttest_1samp(data, 70)

print("T-statistic:", t_stat)  # T-statistic: 2.092457497388744
print("P-value:", p_value)      # P-value: 0.10453999977837579
```

### Interpreting the Results:
- **T-statistic**: Measures the difference between your sample mean and the population mean relative to sample variation.
- **P-value**: Tells how likely it is to observe results at least as extreme as yours if H₀ is true.

#### Decision:
- If p-value < α (0.05): Reject H₀.  
- If p-value ≥ α (0.05): Fail to reject H₀.  

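In this case, **p-value > 0.05**, so we fail to reject the null hypothesis. There isn't enough evidence to conclude that the mean test score differs from 70. This does not show that the mean is exactly 70; with only five scores, the test has little power to detect a small difference.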
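
The decision rule can be sketched as a small helper, which also shows how the conclusion depends on the chosen significance level. The function name `decide` and the list of α values are assumptions for illustration; the p-value is the one printed above.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 only when p-value < alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

p_value = 0.10453999977837579  # p-value from the one-sample t-test above

for alpha in (0.01, 0.05, 0.10, 0.20):
    print(f"alpha = {alpha:.2f}: {decide(p_value, alpha)}")
```

The significance level should be fixed before looking at the data; picking α afterwards to get the result you want defeats the purpose of the test.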

---

## Lesson Summary

🎉 Congratulations! You've learned the basics of hypothesis testing, a vital method for making data-based decisions. Here's what we covered:  

- **Key terms**:  
  - Null hypothesis (H₀)  
  - Alternative hypothesis (Hₐ)  
  - Significance level (α)  
  - P-value  

Now it's your turn! In the practice session, you will run hypothesis tests on different datasets. Experiment with various sample data and significance levels to see their effects.  

**Happy coding!**


## Hypothesis Testing for Exam Scores

The complete code below tests whether the average final exam score differs from 75 and prints the decision:

```python
from scipy.stats import ttest_1samp

# Final exam scores
data = [78, 82, 74, 79, 85, 67, 90, 71, 76, 73]

# Perform one-sample t-test against the null hypothesis mean = 75
t_stat, p_value = ttest_1samp(data, 75)

print("T-statistic:", t_stat)
print("P-value:", p_value)

# Decision based on p-value
if p_value < 0.05:
    print("We reject the null hypothesis. The average final exam score is significantly different from 75.")
else:
    print("We fail to reject the null hypothesis. There is not enough evidence to say the average final exam score is different from 75.")
```

### Explanation:
1. **T-statistic**: Shows the difference between the sample mean and the hypothesized mean, adjusted for variability and sample size.
2. **P-value**: If it’s less than 0.05, we reject the null hypothesis, indicating the average is significantly different from 75.
3. **Decision Statement**:
   - Reject **H₀**: When there’s strong evidence against it.
   - Fail to reject **H₀**: When there’s insufficient evidence to conclude a difference.
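
To connect the two numbers, here is a hedged sketch that rebuilds the p-value from the t-statistic: for a two-sided test it is twice the upper-tail probability of a t-distribution with n − 1 degrees of freedom. It relies on `scipy.stats.t.sf` (the survival function); the names `t_manual` and `p_manual` are illustrative.

```python
from math import sqrt
from statistics import mean, stdev
from scipy.stats import t, ttest_1samp

data = [78, 82, 74, 79, 85, 67, 90, 71, 76, 73]
mu_0 = 75  # hypothesized mean under H0

n = len(data)
t_manual = (mean(data) - mu_0) / (stdev(data) / sqrt(n))

# Two-sided p-value: probability of a t-statistic at least this far
# from zero, in either direction, with n - 1 degrees of freedom
degrees_of_freedom = n - 1
p_manual = 2 * t.sf(abs(t_manual), degrees_of_freedom)

t_scipy, p_scipy = ttest_1samp(data, mu_0)
print("Manual:", t_manual, p_manual)
print("SciPy: ", t_scipy, p_scipy)
```

Both lines should agree up to floating-point rounding, which confirms that `ttest_1samp` performs a two-sided test by default.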

## Perform a One-Sample T-Test on Daily Water Intake


Hey there, Celestial Traveler! 🌌  

You're given some data on the daily water intake (in liters) of a sample of people. Your mission is to perform a one-sample t-test to determine if the average daily water intake is significantly different from the recommended **2.5 liters**. Complete the code to calculate the t-statistic and p-value, and interpret the results.  

Let’s see if people are meeting their hydration goals! 💧

---

## Python Code to Perform One-Sample t-Test

```python
from scipy.stats import ttest_1samp

# Sample data: daily water intake in liters
data = [2.3, 2.8, 2.1, 2.4, 2.9, 3.0, 2.2, 2.7, 2.5]

# Perform one-sample t-test against the null hypothesis mean = 2.5 liters
t_stat, p_value = ttest_1samp(data, 2.5)

# Print the t-statistic and p-value
print("T-statistic:", t_stat)
print("P-value:", p_value)

# Interpretation
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference in daily water intake.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference in daily water intake.")
```

---

## Explanation

1. **Null Hypothesis (H₀)**:  
   The average daily water intake is **2.5 liters**.  

2. **Alternative Hypothesis (Hₐ)**:  
   The average daily water intake is **not 2.5 liters**.  

3. **Significance Level (α)**:  
   Set at 0.05 (5%).  

4. **T-Statistic**:  
   Measures the difference between the sample mean and the hypothesized mean, accounting for sample variability.  

5. **P-Value**:  
   Indicates the probability of observing results at least as extreme as the sample if the null hypothesis is true.  

6. **Decision**:  
   - If **p-value < 0.05**, reject H₀: There is a significant difference.  
   - If **p-value ≥ 0.05**, fail to reject H₀: No significant difference.  

---

## Results Interpretation Example

Running the code above on the sample data prints, after rounding:  
- **T-statistic**: about `0.42`  
- **P-value**: about `0.69`  

Then:  
- Since **p-value > 0.05**, we fail to reject the null hypothesis.  
- **Conclusion**: There is not enough evidence that the average daily water intake differs from 2.5 liters.  

💡 Happy analyzing, and stay hydrated! 💦


## Two Sample T-Test for Teaching Methods

Hey there, Galactic Pioneer!

You're given test scores of students from two different teaching methods. Complete the code to perform a two-sample t-test and determine if there's a significant difference between the two teaching methods. Because the test is two-sided, it tells you whether the average scores differ, not which method is better.

Let's see how the two samples compare. You've got this!

```python
from scipy.stats import ttest_ind

# Test scores for two teaching methods
method_1_scores = [88, 90, 78, 92, 85]
method_2_scores = [78, 82, 85, 90, 88]

# TODO: Perform two-sample t-test by calling ttest_ind(method_1_scores, method_2_scores)

print("T-statistic:", t_stat)
print("P-value:", p_value)

# Conclusion based on p-value
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The two teaching methods result in significantly different scores.")
else:
    print("Fail to reject the null hypothesis: No significant difference between teaching methods.")
```


---

## Python Code for Two-Sample t-Test

```python
from scipy.stats import ttest_ind

# Test scores for two teaching methods
method_1_scores = [88, 90, 78, 92, 85]
method_2_scores = [78, 82, 85, 90, 88]

# Perform two-sample t-test
t_stat, p_value = ttest_ind(method_1_scores, method_2_scores)

# Print the t-statistic and p-value
print("T-statistic:", t_stat)
print("P-value:", p_value)

# Conclusion based on p-value
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The two teaching methods result in significantly different scores.")
else:
    print("Fail to reject the null hypothesis: No significant difference between teaching methods.")
```

---

## Explanation

1. **Null Hypothesis (H₀)**:  
   The average scores for both teaching methods are the same.  

2. **Alternative Hypothesis (Hₐ)**:  
   The average scores for the two methods are different.  

3. **Significance Level (α)**:  
   Set at 0.05 (5%).  

4. **Two-Sample t-Test**:  
   Compares the means of two independent groups (e.g., students taught by different methods).  

5. **T-Statistic**:  
   Indicates the size of the difference relative to sample variability.  

6. **P-Value**:  
   Measures the probability of observing a difference at least this large if the null hypothesis is true.  

7. **Decision**:  
   - If **p-value < 0.05**, reject H₀: The two methods are significantly different.  
   - If **p-value ≥ 0.05**, fail to reject H₀: No significant difference.  
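
As a sketch of what `ttest_ind` computes with its default settings (equal variances assumed), the t-statistic divides the difference in sample means by a standard error built from the pooled variance of both groups. The names `pooled_variance` and `t_manual` are chosen for this illustration.

```python
from math import sqrt
from statistics import mean, variance

method_1_scores = [88, 90, 78, 92, 85]
method_2_scores = [78, 82, 85, 90, 88]

n1, n2 = len(method_1_scores), len(method_2_scores)

# Pooled variance: average of the two sample variances,
# weighted by their degrees of freedom
pooled_variance = (
    (n1 - 1) * variance(method_1_scores) + (n2 - 1) * variance(method_2_scores)
) / (n1 + n2 - 2)

# Standard error of the difference between the two sample means
standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))

t_manual = (mean(method_1_scores) - mean(method_2_scores)) / standard_error
print("Manual t-statistic:", round(t_manual, 4))
```

If the two groups may have clearly different variances, `ttest_ind` accepts `equal_var=False`, which switches to Welch's t-test instead of the pooled version sketched here.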

---

## Results Interpretation Example

Running the code above on the sample data prints, after rounding:  
- **T-statistic**: about `0.62`  
- **P-value**: about `0.55`  

Then:  
- Since **p-value > 0.05**, we fail to reject the null hypothesis.  
- **Conclusion**: There is not enough evidence of a difference between the two teaching methods.  

🌟 Great work, Pioneer! Keep exploring the galaxies of data! 🌌