### **Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.**

- **t-test**:
  - A **t-test** is used when the population variance is unknown and has to be estimated from the sample, which matters most when the sample is small (typically \(n < 30\)). It assumes that the data are approximately normally distributed.
  - It uses the **t-distribution** to account for the extra uncertainty in the estimate of the population variance due to small sample sizes.
  
  **Example**: You want to test if the mean weight of a small group of 15 individuals is different from 70 kg. Since the sample is small and the population standard deviation is unknown, you would use a **t-test**.

- **z-test**:
  - A **z-test** is used when the population variance is known, or when the sample is large enough (typically \(n \geq 30\)) that the sample standard deviation is a reliable stand-in for it. It uses the **standard normal distribution** (z-distribution).
  
  **Example**: You are testing whether the average height of a population is equal to 170 cm using a large sample (say, 1,000 people), and you know the population standard deviation. In this case, you would use a **z-test**.

---

### **Q2: Differentiate between one-tailed and two-tailed tests.**

- **One-tailed test**:
  - A **one-tailed test** is used when you are testing for the possibility of the relationship in one direction only. You set up the hypothesis to test whether the mean is greater than or less than a certain value.
  - **Example**: You want to test if a new drug improves recovery time, so you are only interested in whether the mean recovery time is **lower** than a benchmark.

- **Two-tailed test**:
  - A **two-tailed test** is used when you are testing for the possibility of a relationship in both directions. You check whether the mean is either **greater than or less than** a certain value.
  - **Example**: You are testing if the mean weight of a sample is different from 70 kg (it could be either higher or lower).
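
The practical consequence of the choice shows up in the p-value. The sketch below is illustrative only: the test statistic `z = 1.8` is a hypothetical value, not taken from the examples above, and the function name `p_values` is made up for this note. It assumes a z-test, so the standard normal CDF from Python's `statistics` module applies.

```python
from statistics import NormalDist

def p_values(z):
    """Return (upper-tail p-value, two-tailed p-value) for a z statistic."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    upper_tail = 1 - std_normal.cdf(z)
    two_tailed = 2 * (1 - std_normal.cdf(abs(z)))
    return upper_tail, two_tailed

# Hypothetical test statistic chosen for illustration
one_sided, two_sided = p_values(1.8)
print(f"one-tailed p = {one_sided:.4f}, two-tailed p = {two_sided:.4f}")
```

With this hypothetical statistic, the same data would be significant at \( \alpha = 0.05 \) under a one-tailed test but not under a two-tailed test, which is why the direction of the hypothesis should be fixed before looking at the data.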

---

### **Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.**

- **Type 1 Error**:
  - A **Type 1 error** occurs when you **reject** a true null hypothesis (false positive). This means you mistakenly conclude that there is an effect when there isn't one.
  - **Example**: A drug trial shows a statistically significant improvement in recovery, but in reality, the drug has no effect. The test incorrectly rejects the null hypothesis (that the drug has no effect).

- **Type 2 Error**:
  - A **Type 2 error** occurs when you **fail to reject** a false null hypothesis (false negative). This means you mistakenly conclude that there is no effect when there actually is one.
  - **Example**: A drug trial shows no significant improvement in recovery, but in reality, the drug does improve recovery. The test fails to detect this effect and so fails to reject the null hypothesis.
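
A small simulation can make the Type 1 error rate concrete. This is a sketch under stated assumptions, not part of the original answer: the data are drawn from a standard normal distribution so the null hypothesis (mean 0) is true by construction, the standard deviation is treated as known (a two-tailed z-test), and the trial count, sample size, and seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist

def type1_error_rate(trials=2000, n=25, alpha=0.05, seed=0):
    """Fraction of trials in which a true null hypothesis is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        # The null hypothesis is true here: the population mean really is 0
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) / (1.0 / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # a false positive (Type 1 error)
    return rejections / trials

rate = type1_error_rate()
print(f"Observed Type 1 error rate: {rate:.3f}")
```

The observed rate should land close to the chosen \( \alpha \), since \( \alpha \) is by definition the probability of rejecting a true null hypothesis.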

---

### **Q4: Explain Bayes's theorem with an example.**

- **Bayes's Theorem**:
  Bayes's theorem provides a way to update the probability of a hypothesis based on new evidence. It is mathematically expressed as:
  
  \[
  P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
  \]
  
  Where:
  - \( P(A|B) \) is the **posterior** probability, i.e., the probability of \( A \) given \( B \).
  - \( P(B|A) \) is the **likelihood**, i.e., the probability of \( B \) given \( A \).
  - \( P(A) \) is the **prior** probability of \( A \).
  - \( P(B) \) is the **marginal likelihood** or the total probability of \( B \).

  **Example**: Suppose 1% of a population has a rare disease. A test for the disease is 99% accurate (both sensitivity and specificity). You test positive for the disease. What is the probability that you actually have the disease?

  Using Bayes's theorem:
  - \( P(A) = 0.01 \) (prior probability of having the disease),
  - \( P(B|A) = 0.99 \) (probability of testing positive if you have the disease),
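  - \( P(B|\neg A) = 1 - 0.99 = 0.01 \) (probability of testing positive if you do not have the disease),
  - \( P(B) = 0.99 \cdot 0.01 + 0.01 \cdot 0.99 = 0.0198 \) (total probability of testing positive).

  Applying Bayes's theorem gives the **posterior probability**:

  \[
  P(A|B) = \frac{0.99 \cdot 0.01}{0.0198} = 0.5
  \]

  So even after a positive result from a 99% accurate test, the probability of actually having the disease is only **50%**, because the disease is rare enough that false positives are as common as true positives.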
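
The disease example can be checked numerically. The sketch below assumes that "99% accurate" means sensitivity and specificity are both 0.99, so the false-positive rate is 0.01. The function name `bayes_posterior` is made up for illustration.

```python
def bayes_posterior(prior, sensitivity, false_positive_rate):
    """P(A | B) for a positive test result B, via Bayes's theorem."""
    true_positive = sensitivity * prior                  # P(B|A) * P(A)
    false_positive = false_positive_rate * (1 - prior)   # P(B|not A) * P(not A)
    evidence = true_positive + false_positive            # P(B)
    return true_positive / evidence

# Assumption: specificity 0.99 implies a false-positive rate of 0.01
posterior = bayes_posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.01)
print(f"P(disease | positive) = {posterior:.2f}")  # 0.50
```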

---

### **Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.**

- **Confidence Interval**:
  A **confidence interval (CI)** is a range of values that is used to estimate the true population parameter. It gives an interval within which we expect the true parameter to fall, with a certain level of confidence (e.g., 95% confidence).

  The general formula for a confidence interval is:
  \[
  \text{CI} = \bar{x} \pm Z \times \frac{\sigma}{\sqrt{n}}
  \]
  where:
  - \( \bar{x} \) is the sample mean,
  - \( Z \) is the z-value (for 95% confidence, \( Z = 1.96 \)),
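  - \( \sigma \) is the population standard deviation (if it is unknown, the sample standard deviation \( s \) can be substituted for large samples; for small samples, replace \( Z \) with the corresponding t-value),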
  - \( n \) is the sample size.

  **Example**:
  Suppose the average test score of a sample of 100 students is 75 with a standard deviation of 10. To calculate the 95% confidence interval for the population mean:
  
  \[
  \text{CI} = 75 \pm 1.96 \times \frac{10}{\sqrt{100}} = 75 \pm 1.96 \times 1 = 75 \pm 1.96
  \]

  The confidence interval is **[73.04, 76.96]**, meaning we are 95% confident that the true population mean lies within this range.
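
The calculation can be reproduced in a few lines. This is a minimal sketch: the function name `z_confidence_interval` is an illustrative choice, and the z-value is derived from the confidence level with `statistics.NormalDist` instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sd, n, confidence=0.95):
    """Two-sided z-based confidence interval for a mean."""
    # For 95% confidence this is the 97.5th percentile, about 1.96
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = z_confidence_interval(sample_mean=75, sd=10, n=100)
print(f"95% CI: [{low:.2f}, {high:.2f}]")  # 95% CI: [73.04, 76.96]
```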

---

### **Q6: Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.**

**Problem**: 
A factory has a defect rate of 5% for its products. A quality control test can detect defects 95% of the time (True Positive Rate). However, the test also has a 10% false positive rate (i.e., it incorrectly indicates a defect 10% of the time when there is no defect). You test a product and the test shows a defect. What is the probability that the product actually has a defect?

- **Given**:
  - Probability the product has a defect, \( P(\text{Defect}) = 0.05 \)
  - Probability the product does not have a defect, \( P(\text{No Defect}) = 0.95 \)
  - Probability the test is positive given the product has a defect, \( P(\text{Test Positive}|\text{Defect}) = 0.95 \)
  - Probability the test is positive given the product does not have a defect, \( P(\text{Test Positive}|\text{No Defect}) = 0.10 \)
  - You got a positive test result.

**Solution**:
We need to find \( P(\text{Defect}|\text{Test Positive}) \), the probability the product has a defect given that the test was positive, using Bayes' Theorem:

\[
P(\text{Defect}|\text{Test Positive}) = \frac{P(\text{Test Positive}|\text{Defect}) \cdot P(\text{Defect})}{P(\text{Test Positive})}
\]

First, calculate \( P(\text{Test Positive}) \), which is the total probability of a positive test:

\[
P(\text{Test Positive}) = P(\text{Test Positive}|\text{Defect}) \cdot P(\text{Defect}) + P(\text{Test Positive}|\text{No Defect}) \cdot P(\text{No Defect})
\]

Substitute values:

\[
P(\text{Test Positive}) = (0.95 \cdot 0.05) + (0.10 \cdot 0.95) = 0.0475 + 0.095 = 0.1425
\]

Now apply Bayes' Theorem:

\[
P(\text{Defect}|\text{Test Positive}) = \frac{(0.95 \cdot 0.05)}{0.1425} = \frac{0.0475}{0.1425} \approx 0.3333
\]

So, the probability that the product actually has a defect, given a positive test result, is approximately **33.33%**.
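
Another way to see the same result is to count products instead of multiplying probabilities. The sketch below imagines a hypothetical batch of 10,000 products (the batch size is an assumption made for illustration and does not affect the answer) and tallies true and false alarms.

```python
def posterior_from_counts(batch_size, defect_rate, true_positive_rate, false_positive_rate):
    """Tally flagged products in a batch and return the share that are truly defective."""
    defective = batch_size * defect_rate
    sound = batch_size - defective
    true_alarms = defective * true_positive_rate   # defective and flagged
    false_alarms = sound * false_positive_rate     # sound but flagged anyway
    posterior = true_alarms / (true_alarms + false_alarms)
    return true_alarms, false_alarms, posterior

true_alarms, false_alarms, posterior = posterior_from_counts(10_000, 0.05, 0.95, 0.10)
print(f"{true_alarms:.0f} true alarms, {false_alarms:.0f} false alarms, "
      f"posterior = {posterior:.4f}")
```

Because false alarms outnumber true alarms two to one, only about one flagged product in three is actually defective.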

---

### **Q7: Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.**

**Problem**:
We are given:
- Sample mean \( \bar{x} = 50 \)
- Standard deviation \( \sigma = 5 \)
- Sample size \( n = 30 \) (assumed)

**Solution**:
The formula for the 95% confidence interval is:

\[
\text{CI} = \bar{x} \pm Z \times \frac{\sigma}{\sqrt{n}}
\]

For a 95% confidence level, the z-value is **1.96**.

\[
\text{CI} = 50 \pm 1.96 \times \frac{5}{\sqrt{30}} = 50 \pm 1.96 \times 0.9129 \approx 50 \pm 1.79
\]

So the 95% confidence interval is **[48.21, 51.79]**.

**Interpretation**: We are 95% confident that the true population mean lies between **48.21** and **51.79**.
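
If the standard deviation of 5 is a sample estimate and not a known population value, a t-based interval is the more cautious choice. The sketch below compares the two under the same assumed sample size of 30. The t critical value of 2.045 for \( df = 29 \) is a standard t-table figure hard-coded here, and the helper name `interval` is illustrative.

```python
from math import sqrt

def interval(mean, sd, n, critical_value):
    """Confidence interval as mean +/- critical_value * standard error."""
    margin = critical_value * sd / sqrt(n)
    return mean - margin, mean + margin

Z_95 = 1.96        # standard normal critical value for 95% confidence
T_95_DF29 = 2.045  # t-table value for df = 29 (relies on the assumed n = 30)

z_low, z_high = interval(50, 5, 30, Z_95)
t_low, t_high = interval(50, 5, 30, T_95_DF29)
print(f"z-based: [{z_low:.2f}, {z_high:.2f}]")  # z-based: [48.21, 51.79]
print(f"t-based: [{t_low:.2f}, {t_high:.2f}]")  # t-based: [48.13, 51.87]
```

The t-based interval is slightly wider, reflecting the extra uncertainty from estimating the standard deviation.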

---

### **Q8: What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.**

**Margin of Error**:
The **margin of error** is the half-width of the confidence interval: the largest amount by which the sample estimate is expected to differ from the true population value at the chosen confidence level. It is calculated as:

\[
\text{Margin of Error} = Z \times \frac{\sigma}{\sqrt{n}}
\]

Where:
- \( Z \) is the z-value (for 95% confidence, \( Z = 1.96 \)),
- \( \sigma \) is the population standard deviation,
- \( n \) is the sample size.

**Effect of Sample Size**:
A larger sample size \( n \) leads to a **smaller margin of error** because the denominator \( \sqrt{n} \) increases, which reduces the overall value of the margin of error.

**Example**:
- If \( n = 30 \), and \( \sigma = 5 \), the margin of error is:

\[
1.96 \times \frac{5}{\sqrt{30}} \approx 1.96 \times 0.9129 \approx 1.79
\]

- If \( n = 100 \), and \( \sigma = 5 \), the margin of error is:

\[
1.96 \times \frac{5}{\sqrt{100}} = 1.96 \times 0.5 = 0.98
\]

Thus, with a larger sample size of 100, the margin of error reduces to **0.98**, indicating more precision in the estimate.
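
The formula can also be rearranged to ask how large a sample is needed for a target margin of error, \( n = (Z\sigma / E)^2 \). The sketch below uses \( \sigma = 5 \) from the example. The target margins of 0.5 and 0.25 and the function name `required_sample_size` are illustrative assumptions.

```python
from math import ceil

def required_sample_size(sigma, target_margin, z=1.96):
    """Smallest n whose margin of error is at most target_margin."""
    return ceil((z * sigma / target_margin) ** 2)

n_half = required_sample_size(sigma=5, target_margin=0.5)
n_quarter = required_sample_size(sigma=5, target_margin=0.25)
print(n_half)     # 385
print(n_quarter)  # 1537
```

Halving the margin of error requires roughly four times as many observations, because the margin shrinks with \( \sqrt{n} \) and not with \( n \).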

---

### **Q9: Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.**

**Problem**:
We are given:
- Data point value \( X = 75 \),
- Population mean \( \mu = 70 \),
- Population standard deviation \( \sigma = 5 \).

**Solution**:
The z-score formula is:

\[
Z = \frac{X - \mu}{\sigma}
\]

Substitute the values:

\[
Z = \frac{75 - 70}{5} = \frac{5}{5} = 1
\]

**Interpretation**: The data point 75 is **1 standard deviation** above the mean. If the population is normally distributed, roughly 84% of values fall below this point, so it is moderately high but not unusual.
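
A z-score can be converted into a percentile when the population is assumed to be normally distributed (an assumption the problem does not state). A minimal sketch using Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above the mean."""
    return (x - mu) / sigma

z = z_score(x=75, mu=70, sigma=5)
share_below = NormalDist().cdf(z)  # valid only under the normality assumption
print(f"z = {z:.2f}, share of values below = {share_below:.4f}")
# z = 1.00, share of values below = 0.8413
```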

---

### **Q10: In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.**

**Problem**:
- Sample mean \( \bar{x} = 6 \),
- Sample size \( n = 50 \),
- Standard deviation \( s = 2.5 \),
- Population mean \( \mu_0 = 0 \) (assuming the null hypothesis states no weight loss).

**Solution**:

Step 1: **Set up hypotheses**:
- Null hypothesis (\( H_0 \)): \( \mu = 0 \) (no weight loss).
- Alternative hypothesis (\( H_1 \)): \( \mu \neq 0 \) (the mean weight change differs from zero).

Step 2: **Calculate the t-statistic**:
The t-statistic is calculated as:

\[
t = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}}
\]

Substitute the values:

\[
t = \frac{6 - 0}{\frac{2.5}{\sqrt{50}}} = \frac{6}{\frac{2.5}{7.071}} = \frac{6}{0.3536} \approx 16.97
\]

Step 3: **Find the degrees of freedom (df)**:
\[
df = n - 1 = 50 - 1 = 49
\]

Step 4: **Find the critical t-value**:
For a 95% confidence level and \( df = 49 \), the critical t-value from the t-distribution table is approximately **2.0096** for a two-tailed test.

Step 5: **Compare the t-statistic with the critical t-value**:
Since the calculated t-statistic (16.97) is much greater than the critical t-value (2.0096), we **reject the null hypothesis**.

**Conclusion**: The drug is significantly effective at a 95% confidence level, as the t-statistic exceeds the critical value. The observed weight loss is statistically significant.
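
The steps above can be sketched in code from the summary statistics alone. The critical value 2.0096 is the table figure quoted in Step 4 and is hard-coded, since the Python standard library has no t-distribution. The function name `one_sample_t` is illustrative.

```python
from math import sqrt

def one_sample_t(sample_mean, mu0, s, n):
    """t statistic for a one-sample t-test computed from summary statistics."""
    standard_error = s / sqrt(n)
    return (sample_mean - mu0) / standard_error

T_CRIT_DF49 = 2.0096  # two-tailed, alpha = 0.05, df = 49 (t-table value)

t_stat = one_sample_t(sample_mean=6, mu0=0, s=2.5, n=50)
reject_null = abs(t_stat) > T_CRIT_DF49
print(f"t = {t_stat:.2f}, reject H0: {reject_null}")  # t = 16.97, reject H0: True
```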

---

### **Q11: In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.**

**Problem**:
- Sample proportion \( \hat{p} = 0.65 \),
- Sample size \( n = 500 \),
- Confidence level = 95% (z-value = 1.96).

**Solution**:
The formula for the confidence interval for a proportion is:

\[
\text{CI} = \hat{p} \pm Z \times \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}
\]

Substitute the values:

\[
\text{CI} = 0.65 \pm 1.96 \times \sqrt{\frac{0.65(1 - 0.65)}{500}} = 0.65 \pm 1.96 \times \sqrt{\frac{0.65 \times 0.35}{500}} = 0.65 \pm 1.96 \times \sqrt{0.2275 / 500}
\]

\[
\text{CI} = 0.65 \pm 1.96 \times \sqrt{0.000455} = 0.65 \pm 1.96 \times 0.02133 = 0.65 \pm 0.0418
\]

So, the 95% confidence interval is:

\[
[0.6082, 0.6918]
\]

**Interpretation**: We are 95% confident that the true proportion of people satisfied with their job is between **60.82%** and **69.18%**.
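
The same interval can be computed without rounding intermediate steps. This is a minimal sketch of the normal-approximation interval used above. The function name `proportion_ci` is illustrative.

```python
from math import sqrt

def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(p_hat=0.65, n=500)
print(f"95% CI: [{low:.4f}, {high:.4f}]")  # 95% CI: [0.6082, 0.6918]
```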

---

### **Q12: A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.**

**Problem**:
- Sample A: \( \bar{x}_A = 85 \), \( s_A = 6 \).
- Sample B: \( \bar{x}_B = 82 \), \( s_B = 5 \).
- Sample sizes are not given in the question, so we assume \( n_A = n_B = 30 \).
- Significance level \( \alpha = 0.01 \).

**Solution**:

Step 1: **Set up hypotheses**:
- Null hypothesis (\( H_0 \)): \( \mu_A = \mu_B \) (no difference in performance).
- Alternative hypothesis (\( H_1 \)): \( \mu_A \neq \mu_B \) (there is a difference in performance).

Step 2: **Calculate the pooled standard deviation**:

\[
s_p = \sqrt{\frac{(n_A - 1) \cdot s_A^2 + (n_B - 1) \cdot s_B^2}{n_A + n_B - 2}}
\]

Substitute values:

\[
s_p = \sqrt{\frac{(30 - 1) \cdot 6^2 + (30 - 1) \cdot 5^2}{30 + 30 - 2}} = \sqrt{\frac{29 \cdot 36 + 29 \cdot 25}{58}} = \sqrt{\frac{1044 + 725}{58}} = \sqrt{\frac{1769}{58}} = \sqrt{30.5} \approx 5.52
\]

Step 3: **Calculate the t-statistic**:

\[
t = \frac{\bar{x}_A - \bar{x}_B}{s_p \cdot \sqrt{\frac{1}{n_A} + \frac{1}{n_B}}}
\]

Substitute values:

\[
t = \frac{85 - 82}{5.52 \cdot \sqrt{\frac{1}{30} + \frac{1}{30}}} = \frac{3}{5.52 \cdot \sqrt{\frac{2}{30}}} = \frac{3}{5.52 \cdot 0.2582} = \frac{3}{1.426} \approx 2.10
\]

Step 4: **Degrees of freedom (df)**:

\[
df = n_A + n_B - 2 = 30 + 30 - 2 = 58
\]

Step 5: **Find the critical t-value**:
For \( \alpha = 0.01 \) (two-tailed), the critical t-value for \( df = 58 \) is approximately **2.663**.

Step 6: **Compare the t-statistic with the critical t-value**:
Since the calculated t-statistic \( 2.10 \) is **less than** the critical t-value \( 2.663 \), we **fail to reject the null hypothesis**.

**Conclusion**: There is not enough evidence at the 0.01 significance level to conclude that the two teaching methods have a significant difference in student performance.
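
The pooled two-sample calculation can be sketched as follows. The sample sizes of 30 are the same assumption made in the problem setup, the critical value is an approximate t-table figure hard-coded for comparison, and the function name `pooled_t` is illustrative.

```python
from math import sqrt

def pooled_t(mean_a, s_a, n_a, mean_b, s_b, n_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / df
    standard_error = sqrt(pooled_var) * sqrt(1 / n_a + 1 / n_b)
    return (mean_a - mean_b) / standard_error, df

T_CRIT = 2.66  # approximate two-tailed value for alpha = 0.01, df = 58

# Assumption: n_A = n_B = 30, since the question does not give sample sizes
t_stat, df = pooled_t(85, 6, 30, 82, 5, 30)
reject_null = abs(t_stat) > T_CRIT
print(f"t = {t_stat:.2f}, df = {df}, reject H0: {reject_null}")
# t = 2.10, df = 58, reject H0: False
```

Because the sample sizes are assumed, the conclusion is only as reliable as that assumption: larger samples with the same means and standard deviations would produce a larger t-statistic.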

---

### **Q13: A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.**

**Problem**:
- Population mean \( \mu = 60 \),
- Population standard deviation \( \sigma = 8 \),
- Sample mean \( \bar{x} = 65 \),
- Sample size \( n = 50 \),
- Confidence level = 90% (z-value = 1.645).

**Solution**:
The formula for the confidence interval for a population mean when the population standard deviation is known is:

\[
\text{CI} = \bar{x} \pm Z \times \frac{\sigma}{\sqrt{n}}
\]

Substitute the values:

\[
\text{CI} = 65 \pm 1.645 \times \frac{8}{\sqrt{50}} = 65 \pm 1.645 \times \frac{8}{7.071} = 65 \pm 1.645 \times 1.1314 = 65 \pm 1.861
\]

So the 90% confidence interval is:

\[
[63.14, 66.86]
\]

**Interpretation**: We are 90% confident that the true population mean lies between **63.14** and **66.86**.
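
The stated population mean of 60 is not needed to build the interval, but it can be compared against the result. The sketch below recomputes the interval and checks whether 60 falls inside it. The helper name `z_interval` is illustrative.

```python
from math import sqrt

def z_interval(sample_mean, sigma, n, z):
    """z-based confidence interval with a known population standard deviation."""
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

stated_mu = 60
low, high = z_interval(sample_mean=65, sigma=8, n=50, z=1.645)
contains_stated_mu = low <= stated_mu <= high
print(f"90% CI: [{low:.2f}, {high:.2f}]")                   # 90% CI: [63.14, 66.86]
print(f"Contains stated mean 60: {contains_stated_mu}")     # Contains stated mean 60: False
```

The stated mean lies outside the interval, which suggests this sample is not typical of a population with mean 60.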

---

### **Q14: In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.**

**Problem**:
- Sample mean \( \bar{x} = 0.25 \),
- Standard deviation \( s = 0.05 \),
- Sample size \( n = 30 \),
- Hypothesized mean \( \mu_0 = 0.2 \) (an assumed baseline reaction time without caffeine, since the question does not state one; the null hypothesis is that caffeine has no effect),
- Confidence level = 90% (t-value for \( df = 29 \) is approximately 1.699).

**Solution**:

Step 1: **Set up hypotheses**:
- Null hypothesis (\( H_0 \)): \( \mu = 0.2 \) (no effect of caffeine).
- Alternative hypothesis (\( H_1 \)): \( \mu \neq 0.2 \) (caffeine has an effect).

Step 2: **Calculate the t-statistic**:

\[
t = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}}
\]

Substitute the values:

\[
t = \frac{0.25 - 0.2}{\frac{0.05}{\sqrt{30}}} = \frac{0.05}{\frac{0.05}{5.477}} = \frac{0.05}{0.00913} \approx 5.48
\]

Step 3: **Degrees of freedom (df)**:

\[
df = n - 1 = 30 - 1 = 29
\]

Step 4: **Find the critical t-value**:
For \( \alpha = 0.10 \) (two-tailed), the critical t-value for \( df = 29 \) is approximately **1.699**.

Step 5: **Compare the t-statistic with the critical t-value**:
Since the calculated t-statistic \( 5.48 \) is **greater than** the critical t-value \( 1.699 \), we **reject the null hypothesis**.

**Conclusion**: There is enough evidence at the 90% confidence level to conclude that caffeine has a significant effect on reaction time.
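
Since the baseline of 0.2 seconds is an assumption, it is worth checking how the decision changes with other baselines. The sketch below reruns the test for two additional hypothetical baselines (0.23 and 0.24, chosen only for illustration) against the table value 1.699 quoted in Step 4.

```python
from math import sqrt

T_CRIT_DF29 = 1.699  # two-tailed, alpha = 0.10, df = 29 (t-table value)

def t_statistic(sample_mean, mu0, s, n):
    """One-sample t statistic from summary statistics."""
    return (sample_mean - mu0) / (s / sqrt(n))

results = {}
for baseline in (0.20, 0.23, 0.24):  # 0.23 and 0.24 are hypothetical baselines
    t = t_statistic(sample_mean=0.25, mu0=baseline, s=0.05, n=30)
    results[baseline] = t
    decision = "reject H0" if abs(t) > T_CRIT_DF29 else "fail to reject H0"
    print(f"mu0 = {baseline:.2f}: t = {t:.2f} -> {decision}")
```

The null hypothesis is rejected for baselines of 0.20 and 0.23 but not for 0.24, so the conclusion depends on how well the assumed baseline reflects reaction times without caffeine.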