### Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.
Ans: \
The **t-test** and **z-test** are both statistical tests used to determine if there is a significant difference between groups or values, but they are used in different situations based on the sample size and knowledge about the population.

---

###  **T-Test vs Z-Test: Key Differences**

| Feature                   | T-Test                                                    | Z-Test                                                      |
|---------------------------|------------------------------------------------------------|--------------------------------------------------------------|
| **Used When**             | Population standard deviation is **unknown**               | Population standard deviation is **known**                   |
| **Sample Size**           | Usually for **small samples** (n < 30)                     | Typically for **large samples** (n ≥ 30)                     |
| **Distribution**          | Follows **t-distribution**                                 | Follows **normal distribution**                              |
| **Variance**              | Estimated from sample                                      | Assumed known                                                |
| **Application**           | More common in real-world applications                     | Less common (requires population SD)                         |

---

###  **Example Use Cases:**

#### **T-Test Example**  
> A researcher wants to know if a new teaching method improves test scores. She tests it on a small group of 20 students and compares the results with historical scores. Since the population standard deviation is unknown and the sample is small, a **t-test** is appropriate.

#### **Z-Test Example**  
> A manufacturer knows that the average weight of a product is 100g with a known population standard deviation of 5g. He samples 50 items to check if the mean weight has changed. Since the population SD is known and the sample is large, a **z-test** is suitable.
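
The z-test scenario can be sketched in a few lines of Python using only the standard library. The population mean (100 g), population SD (5 g), and sample size (50) come from the example; the observed sample mean of 101.5 g is a hypothetical value added purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Return the z statistic and two-sided p-value for a one-sample z-test."""
    standard_error = pop_sd / sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# The sample mean of 101.5 g is hypothetical; the other values are from the example.
z, p_value = one_sample_z_test(sample_mean=101.5, pop_mean=100, pop_sd=5, n=50)
print(f"z = {z:.3f}, two-sided p = {p_value:.4f}")
```

A t-test follows the same pattern, except that the sample SD replaces the population SD and the reference distribution is the t-distribution with n − 1 degrees of freedom.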

### Q2: Differentiate between one-tailed and two-tailed tests.
Ans: \
###  **One-Tailed Test**

- **Definition:** Tests for the possibility of the relationship in **one direction only** (greater than *or* less than).
- **Use case:** When the research hypothesis predicts a specific direction of the effect.
- **Hypotheses example:**
  - **H₀ (null):** μ = 100  
  - **H₁ (alt):** μ > 100 *(right-tailed)* or μ < 100 *(left-tailed)*

- **Visual:** Only one "tail" (side) of the normal distribution is considered.

 **Example:**  
A company wants to test if a new training method increases employee productivity. The test will only check for *increase*, not decrease ⇒ **one-tailed test**.

---

###  **Two-Tailed Test**

- **Definition:** Tests for the possibility of the relationship in **both directions** (greater than or less than).
- **Use case:** When no specific direction is predicted — just a difference.
- **Hypotheses example:**
  - **H₀ (null):** μ = 100  
  - **H₁ (alt):** μ ≠ 100

- **Visual:** Both tails (sides) of the distribution are considered.

 **Example:**  
A researcher wants to know if a new diet plan changes average weight (increase or decrease) ⇒ **two-tailed test**.

---

###  **Key Difference Summary:**

| Feature             | One-Tailed Test                  | Two-Tailed Test                     |
|---------------------|----------------------------------|-------------------------------------|
| Direction           | Tests one direction (>, <)       | Tests both directions (≠)           |
| Critical region     | One side of distribution         | Both sides of distribution          |
| Use when            | Direction of effect is predicted | Direction is **not** predicted      |
| p-value threshold   | Entire α on one side             | α/2 on each side                    |
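
As a rough illustration of the last row, the sketch below computes the critical z-values for both cases. It assumes α = 0.05 and a z-based test; neither value is fixed by the text above.

```python
from statistics import NormalDist

alpha = 0.05               # assumed significance level
std_normal = NormalDist()  # standard normal: mean 0, sd 1

# One-tailed (right): the whole alpha sits in the upper tail.
z_one_tailed = std_normal.inv_cdf(1 - alpha)
# Two-tailed: alpha is split, with alpha/2 in each tail.
z_two_tailed = std_normal.inv_cdf(1 - alpha / 2)

print(f"one-tailed critical z: {z_one_tailed:.3f}")
print(f"two-tailed critical z: +/-{z_two_tailed:.3f}")
```

The two-tailed cutoff is further from zero, so the same test statistic can be significant in a one-tailed test but not in a two-tailed one.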


### Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.
Ans: \
### **Type I Error (False Positive)**

- **Definition:** Rejecting the null hypothesis when it is actually true.
- **Symbol:** Its probability is denoted by α (alpha), the significance level.
- **Consequence:** We conclude there is an effect or difference when there actually isn't.

**Example:**  
A medical test wrongly indicates that a patient has a disease (positive result) when they actually don't.

**Scenario:**  
A researcher tests a new drug and concludes it works, but in reality the drug has no real effect.  
→ Type I Error: a false alarm.

### **Type II Error (False Negative)**

- **Definition:** Failing to reject the null hypothesis when it is actually false.
- **Symbol:** Its probability is denoted by β (beta).
- **Consequence:** We miss detecting a real effect or difference.

**Example:**  
A medical test fails to detect a disease in a person who actually has it.

**Scenario:**  
A company tests a new marketing strategy and concludes it doesn't increase sales, when in fact it does.  
→ Type II Error: a missed opportunity.
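
To make α and β concrete, here is a minimal sketch for a right-tailed one-sample z-test. All numbers (null mean 100, true mean 102, σ = 5, n = 25, α = 0.05) are hypothetical and chosen only to show how the two error rates relate.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error_rate(mu0, mu_true, sigma, n, alpha):
    """Beta for a right-tailed one-sample z-test when the true mean is mu_true."""
    std_normal = NormalDist()
    z_critical = std_normal.inv_cdf(1 - alpha)
    # Size of the true effect, measured in standard-error units.
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    return std_normal.cdf(z_critical - shift)


alpha = 0.05  # Type I error rate, fixed in advance by the analyst
beta = type_ii_error_rate(mu0=100, mu_true=102, sigma=5, n=25, alpha=alpha)
print(f"alpha = {alpha:.2f}, beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Lowering α pushes the critical value outward and raises β, which is why the two error types have to be traded off against each other for a fixed sample size.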

### Q4: Explain Bayes's theorem with an example.
Ans: \
**Bayes' Theorem** is a fundamental concept in probability theory used to update the probability of a hypothesis based on new evidence.

### **Bayes' Theorem Formula**
$$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
$$
Where:

- \(P(A|B)\) = **Posterior Probability**: Probability of event A given that B has occurred  
- \(P(B|A)\) = **Likelihood**: Probability of event B given that A is true  
- \(P(A)\) = **Prior Probability**: Initial probability of event A  
- \(P(B)\) = **Marginal Probability**: Total probability of event B

---

### **Example Scenario: Medical Testing**

Suppose:

- 1% of people have a disease → \(P(D) = 0.01\)
- The test has 99% sensitivity and a 5% false-positive rate:
  - If a person has the disease, the test is positive 99% of the time → \(P(T|D) = 0.99\)
  - If a person does **not** have the disease, the test is falsely positive 5% of the time → \(P(T|\neg D) = 0.05\)

We want to find:  
**What is the probability that someone has the disease if they tested positive?**  
→ \(P(D|T)\)

---

### **Using Bayes' Theorem**

Let’s plug in the values:
$$
P(D|T) = \frac{P(T|D) \cdot P(D)}{P(T)}
$$

First, calculate \(P(T)\) using **total probability**:
$$
\begin{aligned}
P(T) &= P(T|D) \cdot P(D) + P(T|\neg D) \cdot P(\neg D) \\
&= (0.99)(0.01) + (0.05)(0.99) = 0.0099 + 0.0495 = 0.0594
\end{aligned}
$$
Now, compute the posterior:
$$
P(D|T) = \frac{(0.99)(0.01)}{0.0594} = \frac{0.0099}{0.0594} \approx 0.1667
$$

---

###  **Interpretation**

Even if someone **tests positive**, there's only about a **16.7% chance** they actually have the disease due to the low prevalence in the population.
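
The calculation above can be checked with a short Python sketch. The function and parameter names are illustrative choices, not part of any library.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """P(D|T) via Bayes' theorem, with P(T) from the law of total probability."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


p_disease_given_positive = posterior_given_positive(
    prior=0.01, sensitivity=0.99, false_positive_rate=0.05
)
print(f"P(D|T) = {p_disease_given_positive:.4f}")
```

Calling the function with a larger `prior` shows how quickly the posterior rises once the disease is less rare.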


### Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.
Ans: \
A **confidence interval (CI)** is a **range of values** that is likely to contain a population parameter (like the mean) with a certain level of **confidence**.

---

### **Why Use It?**

It helps us understand how much uncertainty is associated with a sample statistic.  
For example:  
> “We are 95% confident that the true average height of students lies between 165 cm and 175 cm.”

---

###  **Formula (for population mean, when population standard deviation is unknown):**
$$
\text{CI} = \bar{x} \pm t^* \left(\frac{s}{\sqrt{n}}\right)
$$

Where:
- \( \bar{x} \) = Sample mean  
- \( s \) = Sample standard deviation  
- \( n \) = Sample size  
- \( t^* \) = t-value from the t-distribution (based on confidence level and degrees of freedom)

---

### **Example:**

Let's say:
- Sample size \( n = 25 \)
- Sample mean \( \bar{x} = 100 \)
- Sample standard deviation \( s = 15 \)
- Confidence level = 95%  
- Degrees of freedom \( df = 24 \)

From the t-table, for 95% confidence and df = 24, \( t^* \approx 2.064 \)

Now:
$$
\text{Margin of Error} = 2.064 \times \left(\frac{15}{\sqrt{25}}\right) = 2.064 \times 3 = 6.192
$$

So,
$$
\text{Confidence Interval} = 100 \pm 6.192 = [93.81, 106.19]
$$

---

### **Interpretation:**

> We are 95% confident that the true population mean lies between **93.81** and **106.19**.
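
The same interval can be reproduced with a small sketch. The t-value is passed in as a table lookup (2.064 for 95% confidence and df = 24), because the Python standard library has no t-distribution; the function name is an illustrative choice.

```python
from math import sqrt


def t_confidence_interval(sample_mean, sample_sd, n, t_star):
    """CI for a mean with unknown population SD; t_star comes from a t-table."""
    margin = t_star * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# t_star = 2.064 is the table value for 95% confidence and df = 24.
lower, upper = t_confidence_interval(sample_mean=100, sample_sd=15, n=25, t_star=2.064)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```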

### Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.
Ans: \
###  **Bayes' Theorem Formula:**
$$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
$$
Where:
- \( P(A|B) \) = Probability of event **A** given **B** is true (posterior probability)  
- \( P(B|A) \) = Probability of event **B** given **A** is true (likelihood)  
- \( P(A) \) = Probability of event **A** (prior)  
- \( P(B) \) = Probability of event **B** (marginal likelihood)

---

### **Problem Statement:**

A medical test is used to detect a rare disease:
- 1% of the population has the disease → \( P(D) = 0.01 \)
- If a person has the disease, the test is positive 99% of the time → \( P(\text{Positive} | D) = 0.99 \)
- If a person doesn't have the disease, the test still gives a false positive 5% of the time → \( P(\text{Positive} | \text{No D}) = 0.05 \)

A person tests positive. What is the probability they actually have the disease?

---

### **Solution Using Bayes' Theorem:**

Let:
- \( D \) = has disease  
- \( Pos \) = tests positive

We need to calculate \( P(D|Pos) \).

#### Step 1: Known values
- \( P(D) = 0.01 \)
- \( P(\text{Pos}|D) = 0.99 \)
- \( P(\text{No D}) = 0.99 \)
- \( P(\text{Pos}|\text{No D}) = 0.05 \)

#### Step 2: Compute \( P(\text{Pos}) \):
$$
P(\text{Pos}) = P(\text{Pos}|D) \cdot P(D) + P(\text{Pos}|\text{No D}) \cdot P(\text{No D})
$$

$$
P(\text{Pos}) = (0.99 \cdot 0.01) + (0.05 \cdot 0.99) = 0.0099 + 0.0495 = 0.0594
$$
#### Step 3: Apply Bayes’ Theorem
$$
P(D|\text{Pos}) = \frac{0.99 \cdot 0.01}{0.0594} = \frac{0.0099}{0.0594} \approx 0.1667
$$

---

### Final Answer:

> The probability that the person **actually has the disease**, given they tested positive, is **~16.67%**.

This shows why **even accurate tests can give surprising results** when the condition is rare!
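
One way to see where the surprising result comes from is to count people instead of multiplying probabilities. The sketch below applies the same rates to a hypothetical cohort of 100,000 people; the cohort size is an assumption made only for illustration.

```python
population = 100_000  # hypothetical cohort size, chosen only for illustration

diseased = population * 0.01       # P(D) = 0.01
healthy = population - diseased

true_positives = diseased * 0.99   # P(Pos|D) = 0.99
false_positives = healthy * 0.05   # P(Pos|No D) = 0.05

# Among everyone who tests positive, what share actually has the disease?
share_diseased = true_positives / (true_positives + false_positives)
print(f"true positives:  {true_positives:.0f}")
print(f"false positives: {false_positives:.0f}")
print(f"P(D|Pos) = {share_diseased:.4f}")
```

False positives from the large healthy group outnumber true positives from the small diseased group, which is what pulls the posterior down.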

### Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.
Ans: \
To calculate the **95% confidence interval** for a sample, we use the following formula:

---

### **Confidence Interval Formula** (for large samples or known population standard deviation):
$$
\text{CI} = \bar{x} \pm z \cdot \frac{\sigma}{\sqrt{n}}
$$
Where:  
- \( \bar{x} \) = sample mean  
- \( \sigma \) = standard deviation  
- \( n \) = sample size  
- \( z \) = z-score for the desired confidence level  
  - For 95% confidence, \( z = 1.96 \)

---

### Let's Assume:
- Sample mean \( \bar{x} = 50 \)  
- Standard deviation \( \sigma = 5 \)  
- Sample size \( n = 100 \) (not given in the question; assumed here so the interval can be computed)

---

### Step-by-step Calculation:
$$
\text{Standard Error (SE)} = \frac{5}{\sqrt{100}} = \frac{5}{10} = 0.5
$$

$$
\text{CI} = 50 \pm 1.96 \cdot 0.5 = 50 \pm 0.98
$$

$$
\text{95\% Confidence Interval} = (49.02, 50.98)
$$

---

###  **Interpretation**:

We are 95% confident that the **true population mean** lies between **49.02 and 50.98**.
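
A minimal sketch of this calculation follows, using `statistics.NormalDist` from the Python standard library to obtain the z-value instead of a table. The function name is an illustrative choice, and n = 100 is the assumed sample size from the worked example.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sd, n, confidence=0.95):
    """Two-sided z-interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin


# n = 100 is the assumed sample size from the worked example.
lower, upper = z_confidence_interval(mean=50, sd=5, n=100)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```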

### Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.
Ans: \
The **margin of error (MOE)** is the range **above and below the sample estimate** in a confidence interval. It tells you how much **uncertainty** there is in your sample estimate.

###  **Formula:**
$$
\text{Margin of Error (MOE)} = z \cdot \frac{\sigma}{\sqrt{n}}
$$

Where:  
- \( z \) = z-score for the desired confidence level  
- \( \sigma \) = standard deviation  
- \( n \) = sample size  

---

###  How Sample Size Affects MOE:

As **sample size increases**, the denominator \( \sqrt{n} \) increases, making the **MOE smaller**.  
**Larger samples = more precise estimates (smaller margin of error)**

---

### Example:

Let’s say you are estimating the average time users spend on an app.

#### Scenario A: Small Sample
- Mean = 30 minutes  
- Std Dev = 10  
- Sample Size = 25  
- Confidence = 95% (z = 1.96)
$$
\text{MOE} = 1.96 \cdot \frac{10}{\sqrt{25}} = 1.96 \cdot 2 = 3.92
$$
Confidence Interval = (26.08, 33.92)

---

#### Scenario B: Large Sample
- Mean and std dev are same  
- Sample Size = 100
$$
\text{MOE} = 1.96 \cdot \frac{10}{\sqrt{100}} = 1.96 \cdot 1 = 1.96
$$
Confidence Interval = (28.04, 31.96)

---

### Conclusion:

In **Scenario B**, the larger sample leads to a **smaller margin of error** and a **narrower confidence interval**, meaning we can be more certain about where the true mean lies.
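
The two scenarios can be tabulated with a short loop. The sample sizes 25 and 100 come from the scenarios above; n = 400 is an extra, hypothetical size added to show the trend.

```python
from math import sqrt

z = 1.96    # z-score for 95% confidence
sigma = 10  # standard deviation from the app-usage example

# n = 400 is a hypothetical third sample size, not part of the scenarios.
margins = {n: z * sigma / sqrt(n) for n in (25, 100, 400)}
for n, moe in margins.items():
    print(f"n = {n:>3}: MOE = {moe:.2f}")
```

Because the margin shrinks with the square root of n, quadrupling the sample size only halves the margin of error.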

### Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.
Ans: \
To calculate the **z-score**, we use the formula:
$$
z = \frac{x - \mu}{\sigma}
$$
Where:  
- \( x \) = data point = 75  
- \( \mu \) = population mean = 70  
- \( \sigma \) = population standard deviation = 5

---

### Calculation:
$$
z = \frac{75 - 70}{5} = \frac{5}{5} = 1.0
$$

---

### Interpretation:

A **z-score of 1.0** means the value **75** is **1 standard deviation above** the population mean.

This indicates that 75 is relatively close to the mean and falls within the **typical range** for a normally distributed dataset.  
In a standard normal distribution, about **84.13%** of data falls **below** a z-score of 1.0.
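
Both the z-score and the percentile quoted above can be reproduced with the standard library. The percentile step assumes the population is normally distributed, which the question does not state explicitly.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) mu."""
    return (x - mu) / sigma


z = z_score(x=75, mu=70, sigma=5)
# Share of values below z, assuming a normally distributed population.
share_below = NormalDist().cdf(z)
print(f"z = {z:.1f}, share below = {share_below:.4f}")
```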

### Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.
Ans: \
To determine if the new weight loss drug is **significantly effective**, we will perform a **one-sample t-test**. Here's how we break it down:

---

###  **Step 1: Define Hypotheses**

We want to test whether the drug causes **weight loss**.

- **Null Hypothesis (H₀):** μ = 0 (The drug has no effect; average weight loss is 0 pounds)
- **Alternative Hypothesis (H₁):** μ > 0 (The drug is effective; average weight loss is greater than 0)

This is a **one-tailed t-test**.

---

###  **Step 2: Given Data**

- Sample size (n) = 50  
- Sample mean (x̄) = 6  
- Sample standard deviation (s) = 2.5  
- Population mean under H₀ (μ₀) = 0  
- Confidence level = 95% → Significance level (α) = 0.05

---

###  **Step 3: Calculate t-statistic**
$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{6 - 0}{2.5 / \sqrt{50}} = \frac{6}{2.5 / 7.071} \approx \frac{6}{0.3535} \approx 16.97
$$

---

###  **Step 4: Degrees of Freedom**
$$
df = n - 1 = 49
$$

---

###  **Step 5: Find Critical t-value**

Using a t-table or calculator for one-tailed test at α = 0.05 and df = 49:
$$
t_{\text{critical}} \approx 1.677
$$

---

### **Step 6: Compare t-statistic to t-critical**

- Calculated t = 16.97  
- Critical t = 1.677

Since **16.97 > 1.677**, we **reject the null hypothesis**.

---

### **Conclusion:**

There is **significant evidence** at the 95% confidence level to conclude that the **drug is effective** in causing weight loss.
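
The test statistic and decision can be checked with a few lines of Python. The critical value 1.677 is taken from the t-table lookup in Step 5 rather than computed, since the standard library has no t-distribution; the function name is an illustrative choice.

```python
from math import sqrt


def one_sample_t_statistic(sample_mean, mu0, sample_sd, n):
    """t statistic for testing whether the population mean equals mu0."""
    return (sample_mean - mu0) / (sample_sd / sqrt(n))


t_stat = one_sample_t_statistic(sample_mean=6, mu0=0, sample_sd=2.5, n=50)
t_critical = 1.677  # table value: one-tailed, alpha = 0.05, df = 49

reject_null = t_stat > t_critical  # right-tailed decision rule
print(f"t = {t_stat:.2f}, reject H0: {reject_null}")
```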

### Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.
Ans: \
To calculate the **95% confidence interval** for the **population proportion**, we’ll use the formula for the **confidence interval of a proportion**:

---

### **Formula:**
$$
CI = \hat{p} \pm Z \cdot \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}
$$
Where:  
- \(\hat{p}\) = sample proportion  
- \(Z\) = z-score for 95% confidence (≈ 1.96)  
- \(n\) = sample size

---

###  **Given:**
- Sample size \(n = 500\)  
- Sample proportion \(\hat{p} = 0.65\)  
- Z = 1.96 (for 95% confidence level)

---

### **Step 1: Calculate Standard Error (SE)**
$$
SE = \sqrt{\frac{0.65(1 - 0.65)}{500}} = \sqrt{\frac{0.65 \cdot 0.35}{500}} = \sqrt{\frac{0.2275}{500}} \approx \sqrt{0.000455} \approx 0.0213
$$

---

###  **Step 2: Calculate Margin of Error (ME)**
$$
ME = 1.96 \cdot 0.0213 \approx 0.0418
$$

---

### **Step 3: Calculate Confidence Interval**
$$
CI = 0.65 \pm 0.0418 \Rightarrow (0.6082, 0.6918)
$$

---

### **Final Answer:**
The **95% confidence interval** for the true proportion of people who are satisfied with their job is approximately:
$$
\boxed{(60.8\%,\ 69.2\%)}
$$
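
The three steps can be combined into one small function. This is a sketch of the normal-approximation (Wald) interval used above; the function name is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Normal-approximation (Wald) interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


lower, upper = proportion_confidence_interval(p_hat=0.65, n=500)
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```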

### Q12. A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.
Ans: \
To determine if there's a **significant difference** between the two teaching methods, we'll perform an **independent two-sample t-test**.

---

###  **Step 1: Define the Hypotheses**

- **Null Hypothesis \(H_0\):** There is no significant difference between the means of the two groups: \(\mu_A = \mu_B\).
- **Alternative Hypothesis \(H_1\):** There is a significant difference between the means: \(\mu_A \ne \mu_B\).
- **Significance Level:** \(\alpha = 0.01\)

---

### **Step 2: Given Data**

- Sample A:  
  - \(\bar{x}_A = 85\), \(s_A = 6\)
- Sample B:  
  - \(\bar{x}_B = 82\), \(s_B = 5\)
- Assumed equal sample sizes: \(n_A = n_B = 30\) (the question does not give sample sizes, so this value is an assumption made for the calculation)

---

### **Step 3: Calculate the t-statistic**
$$
t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}}
= \frac{85 - 82}{\sqrt{\frac{6^2}{30} + \frac{5^2}{30}}}
= \frac{3}{\sqrt{\frac{36}{30} + \frac{25}{30}}}
= \frac{3}{\sqrt{1.200 + 0.833}} = \frac{3}{\sqrt{2.033}} \approx \frac{3}{1.426} \approx 2.104
$$

---

### **Step 4: Degrees of Freedom (approximate)**
$$
df = \frac{\left( \frac{s_A^2}{n_A} + \frac{s_B^2}{n_B} \right)^2}{\frac{\left(\frac{s_A^2}{n_A}\right)^2}{n_A - 1} + \frac{\left(\frac{s_B^2}{n_B}\right)^2}{n_B - 1}} \approx 56.2 \approx 56
$$

---

###  **Step 5: Find the Critical t-value**

For a two-tailed test with \(\alpha = 0.01\) and \(df \approx 56\), the **critical t-value** ≈ ±2.67.

---

###  **Step 6: Compare and Conclude**

- Calculated \(t = 2.104\)  
- Critical \(t = ±2.67\)  

Since \(2.104 < 2.67\), **we fail to reject the null hypothesis**.

---

###  **Conclusion:**
There is **no statistically significant difference** in student performance between the two teaching methods at the 0.01 significance level.
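
Since the arithmetic in Steps 3 and 4 is easy to slip on, it can help to recompute both quantities from the summary statistics. The sketch below does this for the unequal-variance (Welch) form of the test used above; n = 30 per group is the assumption made in Step 2, and the function name is an illustrative choice.

```python
from math import sqrt


def welch_t_test(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    var_a = sd_a**2 / n_a  # squared standard error of mean A
    var_b = sd_b**2 / n_b  # squared standard error of mean B
    t = (mean_a - mean_b) / sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a**2 / (n_a - 1) + var_b**2 / (n_b - 1))
    return t, df


# n = 30 per group is an assumption; the question gives no sample sizes.
t_stat, df = welch_t_test(mean_a=85, sd_a=6, n_a=30, mean_b=82, sd_b=5, n_b=30)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

Rerunning it with other group sizes shows how strongly the conclusion depends on the assumed n.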

### Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.
Ans: \
To calculate the **90% confidence interval** for the **true population mean**, we'll use the formula for the **confidence interval when population standard deviation is known**:

---

###  **Given:**
- Population mean \( \mu = 60 \) (not used in the calculation)  
- Population standard deviation \( \sigma = 8 \)  
- Sample mean \( \bar{x} = 65 \)  
- Sample size \( n = 50 \)  
- Confidence level = 90%

---

###  **Step 1: Find the z-value for 90% confidence**

- For 90% confidence level, the **z-value** = **1.645**

---

###  **Step 2: Use the confidence interval formula**
$$
\text{CI} = \bar{x} \pm z \cdot \frac{\sigma}{\sqrt{n}}
$$

$$
\text{CI} = 65 \pm 1.645 \cdot \frac{8}{\sqrt{50}} = 65 \pm 1.645 \cdot 1.131
$$

$$
\text{CI} = 65 \pm 1.860
$$

---

### **Final Answer:**
The **90% confidence interval** for the population mean is approximately:
$$
(63.14,\ 66.86)
$$
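
To see how the choice of confidence level affects the result, the sketch below recomputes the interval for the same data at 90%, 95%, and 99%. Only the 90% level is asked for in the question; the other two are added for comparison.

```python
from math import sqrt
from statistics import NormalDist

sample_mean, sigma, n = 65, 8, 50
standard_error = sigma / sqrt(n)

intervals = {}
for confidence in (0.90, 0.95, 0.99):  # 0.95 and 0.99 are shown for comparison only
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * standard_error
    intervals[confidence] = (sample_mean - margin, sample_mean + margin)
    lower, upper = intervals[confidence]
    print(f"{confidence:.0%} CI: ({lower:.2f}, {upper:.2f})")
```

Higher confidence requires a larger z-value, so the interval widens as the confidence level rises.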

### Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.
Ans: \
To test whether caffeine has a **significant effect on reaction time**, we will conduct a **one-sample t-test**.

---

### **Given:**
- Sample size \( n = 30 \)  
- Sample mean \( \bar{x} = 0.25 \) seconds  
- Sample standard deviation \( s = 0.05 \) seconds  
- Significance level \( \alpha = 0.10 \) (for a 90% confidence level)
- Population mean \( \mu_0 \): not given in the problem. For illustration, we assume the baseline mean reaction time without caffeine is 0.27 seconds.

---

### **Step 1: State the hypotheses**
Assuming the baseline (population) reaction time without caffeine is 0.27 seconds.

- **Null Hypothesis (H₀):** µ = 0.27 (no effect of caffeine)
- **Alternative Hypothesis (H₁):** µ < 0.27 (caffeine reduces reaction time)

This is a **left-tailed** test.

---

###  **Step 2: Calculate the t-statistic**

$$
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{0.25 - 0.27}{0.05/\sqrt{30}} = \frac{-0.02}{0.00913} \approx -2.19
$$

---

###  **Step 3: Find the critical t-value**

- Degrees of freedom = \(n - 1 = 29\)
- For a one-tailed test at the 90% confidence level, \(t_{\text{critical}} \approx -1.311\)

---
###  **Step 4: Make a decision**

Since:
$$
t = -2.19 < -1.311 = t_{\text{critical}}
$$

We **reject the null hypothesis**.

---

### **Conclusion:**

There is **significant evidence** at the 90% confidence level that **caffeine reduces reaction time**, given the assumed baseline of 0.27 seconds.
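
Because the baseline mean is an assumption rather than a given, it is worth checking how sensitive the decision is to it. The sketch below repeats the left-tailed test for a few candidate baselines; 0.26 and 0.28 are hypothetical alternatives, and the critical value −1.311 is the t-table lookup from Step 3.

```python
from math import sqrt

sample_mean, sample_sd, n = 0.25, 0.05, 30
t_critical = -1.311  # table value: left-tailed, alpha = 0.10, df = 29

decisions = {}
for mu0 in (0.26, 0.27, 0.28):  # 0.27 is the assumed baseline; the others are hypothetical
    t_stat = (sample_mean - mu0) / (sample_sd / sqrt(n))
    decisions[mu0] = t_stat < t_critical  # left-tailed decision rule
    print(f"mu0 = {mu0:.2f}: t = {t_stat:.2f}, reject H0: {decisions[mu0]}")
```

With a baseline of 0.26 the null hypothesis is not rejected, so the conclusion holds only if the true baseline is close to or above the assumed 0.27 seconds.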