In [None]:
#Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.
The t-test and z-test are both statistical tests used to compare means, but they differ in their assumptions and applications.

### Key Differences:

1. **Sample Size**:
   - **t-test**: Typically used when the sample size is small (\( n < 30 \)) and/or the population standard deviation is unknown.
   - **z-test**: Used when the population standard deviation is known. It is usually applied to larger samples (\( n \geq 30 \) is a common rule of thumb).

2. **Distribution**:
   - **t-test**: Based on the t-distribution. Its tails are heavier than those of the normal distribution, which accounts for the extra uncertainty of estimating the standard deviation from the sample and makes it more appropriate for small samples.
   - **z-test**: Based on the standard normal distribution. It assumes the sample is large enough for the Central Limit Theorem to make the sampling distribution of the mean approximately normal.

3. **Application**:
   - **t-test**: More flexible, because it can handle cases where the population standard deviation is unknown by using the sample standard deviation instead.
   - **z-test**: Used when the population variance is known and the sample mean can be treated as approximately normally distributed, either because the data are normal or because the sample is large.

### Example Scenarios:

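- **t-test Example**: A researcher wants to compare the mean test scores of two small groups of students (e.g., \( n = 20 \) each) to determine whether there is a significant difference between them. The population standard deviation is unknown, so a (two-sample) t-test is appropriate.

- **z-test Example**: A manufacturer knows from long-run production records that bolt lengths have a population standard deviation of 0.2 mm. A quality engineer measures a sample of 100 bolts to check whether the mean length differs from the 50 mm specification. Because \( \sigma \) is known and the sample is large, a z-test is appropriate.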
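Below is a minimal sketch of both scenarios in Python with NumPy and SciPy, run on synthetic data. The group means, the spread of 8 points, and the bolt measurements (known \( \sigma = 0.2 \) mm, \( n = 100 \), target 50 mm) are made-up assumptions for illustration and do not come from any real study. `scipy.stats.ttest_ind` performs the two-sample t-test. The z statistic is computed directly from its formula, using the known \( \sigma \).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# t-test scenario (synthetic): two small groups, population SD unknown.
group_a = rng.normal(loc=72, scale=8, size=20)
group_b = rng.normal(loc=78, scale=8, size=20)
# Default equal_var=True gives the pooled-variance (Student's) two-sample t-test.
t_stat, t_p = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p = {t_p:.4f}")

# z-test scenario (synthetic): population SD known to be 0.2 mm, n = 100 bolts.
sigma, mu0 = 0.2, 50.0
bolts = rng.normal(loc=50.03, scale=sigma, size=100)
z_stat = (bolts.mean() - mu0) / (sigma / np.sqrt(bolts.size))
z_p = 2 * stats.norm.sf(abs(z_stat))  # two-sided p-value from the standard normal
print(f"z = {z_stat:.3f}, p = {z_p:.4f}")
```

The only structural difference is the reference distribution. The t-test estimates the spread from the data, while the z-test plugs in the known \( \sigma \) and compares against the standard normal.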

In [None]:
#Q2: Differentiate between one-tailed and two-tailed tests.
A **one-tailed test** is used when the research hypothesis predicts the direction of the effect (e.g., "greater than" or "less than").
It tests for a statistically significant effect in one direction.

A **two-tailed test**, on the other hand, checks for an effect in either direction, without specifying whether it will be greater or less 
than a certain value. It tests for any significant difference, regardless of the direction.

**Example**:
- One-tailed: Testing if a new drug increases recovery rates.
- Two-tailed: Testing if the drug has any effect (increase or decrease) on recovery rates.
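To see how the choice of tail changes the result, this sketch runs `scipy.stats.ttest_1samp` twice on the same hypothetical recovery-score improvements, once with `alternative="two-sided"` and once with `alternative="greater"`. The data are synthetic, and the `alternative` argument requires SciPy 1.6 or later. When the t statistic is positive, the one-tailed p-value is half the two-tailed one, so a one-tailed test reaches significance more easily in the predicted direction.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic improvement in recovery score for 25 patients (illustrative only).
improvement = rng.normal(loc=0.4, scale=1.0, size=25)

# H0: mean improvement = 0
two_sided = stats.ttest_1samp(improvement, popmean=0, alternative="two-sided")
one_sided = stats.ttest_1samp(improvement, popmean=0, alternative="greater")

print(f"t = {two_sided.statistic:.3f}")
print(f"two-tailed p = {two_sided.pvalue:.4f}")
print(f"one-tailed p (H_a: mean > 0) = {one_sided.pvalue:.4f}")
```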

In [None]:
#Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.
In hypothesis testing:

- A **Type 1 error** occurs when the null hypothesis is rejected when it is actually true (false positive). This means concluding that there
is an effect when none exists.  
  **Example**: Concluding a new drug works when it actually does not.

- A **Type 2 error** occurs when the null hypothesis is not rejected when it is false (false negative). This means failing to detect an
effect that is actually present.  
  **Example**: Concluding a new drug does not work when it actually does.

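Type 1 errors are controlled by the significance level (\(\alpha\)), which is the probability of rejecting \(H_0\) when it is true. The probability of a Type 2 error is denoted \(\beta\), and the power of the test is \(1 - \beta\). For a fixed \(\alpha\), power increases (and \(\beta\) decreases) with a larger sample size or a larger true effect.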
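Both error rates can be estimated with a small Monte Carlo sketch. It assumes normally distributed data with standard deviation 1, samples of \( n = 30 \), a two-sided one-sample t-test at \( \alpha = 0.05 \), and, for the Type 2 case, an assumed true mean of 0.3. These settings are illustrative choices and do not come from the text. When \(H_0\) is true, the rejection rate should land near \( \alpha \). When \(H_0\) is false, the non-rejection rate estimates \( \beta \).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, n, n_sims = 0.05, 30, 5000

# Type 1: H0 (mean = 0) is TRUE, so every rejection is a false positive.
null_samples = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n))
p_null = stats.ttest_1samp(null_samples, popmean=0, axis=1).pvalue
type1_rate = np.mean(p_null < alpha)

# Type 2: H0 is FALSE (assumed true mean = 0.3), so every non-rejection is a miss.
alt_samples = rng.normal(loc=0.3, scale=1.0, size=(n_sims, n))
p_alt = stats.ttest_1samp(alt_samples, popmean=0, axis=1).pvalue
type2_rate = np.mean(p_alt >= alpha)

print(f"Estimated Type 1 error rate: {type1_rate:.3f} (alpha = {alpha})")
print(f"Estimated Type 2 error rate: {type2_rate:.3f}, power = {1 - type2_rate:.3f}")
```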

In [None]:
#Q4: Explain Bayes's theorem with an example.
**Bayes's theorem** is a mathematical formula that describes the probability of an event, based on prior knowledge of conditions related 
to the event. It is expressed as:

\[
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
\]

Where:
- \( P(A|B) \) is the probability of event A given B has occurred,
- \( P(B|A) \) is the probability of event B given A,
- \( P(A) \) and \( P(B) \) are the marginal probabilities of A and B; \( P(A) \) is the prior probability of A before B is observed.

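**Example**: Let A be "has the disease" and B be "tests positive". If 1% of people have the disease (\( P(A) = 0.01 \)) and the test detects the disease in 95% of people who have it (\( P(B|A) = 0.95 \)), Bayes's theorem gives \( P(A|B) \), the probability of having the disease given a positive test result. Computing \( P(B) \) also requires the test's false positive rate \( P(B|\neg A) \), because \( P(B) = P(B|A)P(A) + P(B|\neg A)P(\neg A) \).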
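Here is a minimal Python sketch of the calculation. The 1% prevalence and 95% detection rate come from the example, but the 5% false positive rate is an **assumption**, because the example does not state one. The helper name `posterior` is hypothetical.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(A|B) for A = has disease, B = positive test (hypothetical helper).

    P(B) is expanded with the law of total probability:
    P(B) = P(B|A) * P(A) + P(B|not A) * P(not A).
    """
    p_b = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_b


# 1% prevalence and 95% sensitivity from the text; 5% false positive rate is assumed.
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(f"P(disease | positive) = {p:.3f}")  # prints 0.161
```

Even with a fairly accurate test, the posterior stays low because the prior is small. A positive result raises the probability from 1% to about 16% under these assumptions.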

In [None]:
#Q5: What is a confidence interval? How do you calculate a confidence interval? Explain with an example.
A **confidence interval** (CI) is a range of values, derived from a sample, that is likely to contain the true population parameter with 
a certain level of confidence (e.g., 95%). It provides an estimate of uncertainty around the sample mean.

### Formula:
\[
CI = \bar{x} \pm Z \left(\frac{\sigma}{\sqrt{n}}\right)
\]
Where:
- \( \bar{x} \) is the sample mean,
- \( Z \) is the Z-score corresponding to the desired confidence level,
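- \( \sigma \) is the population standard deviation. If \( \sigma \) is unknown (especially with small samples), replace it with the sample standard deviation \( s \) and replace \( Z \) with the critical value from the t-distribution with \( n - 1 \) degrees of freedom,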
- \( n \) is the sample size.

### Example:
For a sample mean height of 170 cm, with a standard deviation of 5 cm, and \(n = 100\), the 95% CI is:

\[
CI = 170 \pm 1.96 \left(\frac{5}{\sqrt{100}}\right) = 170 \pm 0.98
\]

Thus, the 95% CI is \( [169.02, 170.98] \).
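The same calculation in Python, as a short sketch that treats the 5 cm standard deviation as a known population value. The helper name `z_confidence_interval` is hypothetical. `scipy.stats.norm.ppf` supplies the critical value (1.96 for 95%) instead of hard-coding it.

```python
import math
from scipy import stats

def z_confidence_interval(mean, sigma, n, confidence=0.95):
    """Two-sided z-interval for a mean when the population SD sigma is known."""
    z = stats.norm.ppf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return mean - margin, mean + margin

low, high = z_confidence_interval(mean=170, sigma=5, n=100)
print(f"95% CI: [{low:.2f}, {high:.2f}]")  # prints [169.02, 170.98]
```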

In [None]:
**Problem**: A certain disease affects 1% of a population. A test for the disease correctly detects 90% of people who have it (true positive rate) and has a 5% false positive rate. If someone tests positive, what is the probability they actually have the disease?

### Solution using **Bayes' Theorem**:
Let:
- \(P(D)\) = 0.01 (probability of having the disease),
- \(P(T|D)\) = 0.90 (probability of testing positive given the disease),
- \(P(T|\neg D)\) = 0.05 (probability of a false positive),
- \(P(\neg D)\) = 0.99 (probability of not having the disease).

Using Bayes' Theorem:

\[
P(D|T) = \frac{P(T|D) \cdot P(D)}{P(T|D) \cdot P(D) + P(T|\neg D) \cdot P(\neg D)}
\]

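\[
P(D|T) = \frac{0.90 \cdot 0.01}{(0.90 \cdot 0.01) + (0.05 \cdot 0.99)} = \frac{0.009}{0.009 + 0.0495} = \frac{0.009}{0.0585} \approx 0.154
\]

Thus, the probability that the person has the disease given a positive test result is only about **15.4%**. The reason is that false positives from the large healthy group (0.0495) far outnumber true positives from the small diseased group (0.009).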
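As a sanity check on the formula, this sketch simulates a hypothetical population of one million people with the stated rates and measures the fraction of positive testers who are actually diseased. The population size and random seed are arbitrary choices. The simulated value should be close to the exact Bayes result.

```python
import numpy as np

rng = np.random.default_rng(7)
n_people = 1_000_000

has_disease = rng.random(n_people) < 0.01        # P(D) = 0.01
# Each person tests positive with prob 0.90 if diseased, 0.05 otherwise.
p_positive = np.where(has_disease, 0.90, 0.05)
tests_positive = rng.random(n_people) < p_positive

estimate = has_disease[tests_positive].mean()     # share of positives who are diseased
exact = 0.90 * 0.01 / (0.90 * 0.01 + 0.05 * 0.99)
print(f"simulated P(D|T) = {estimate:.3f}, exact = {exact:.3f}")
```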

In [None]:
#Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.
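The **95% confidence interval** for the sample data with a mean of 50 and a standard deviation of 5 is approximately **(48.21, 51.79)**. The question does not give a sample size, so this assumes \( n = 30 \) and uses the z critical value 1.96: \( 50 \pm 1.96 \cdot 5/\sqrt{30} = 50 \pm 1.79 \). If 5 is the sample standard deviation, the t critical value (about 2.045 with 29 degrees of freedom) gives a slightly wider interval of about (48.13, 51.87).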

### Interpretation:
This means we can be 95% confident that the true population mean lies between 48.21 and 51.79. In practical terms, if we were to take many 
samples and calculate their confidence intervals, about 95% of those intervals would contain the actual population mean.
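A short sketch that computes both versions of the interval. It keeps the **assumed** sample size of 30, since the question does not provide one. The z-interval treats 5 as the known population standard deviation. The t-interval treats it as the sample standard deviation, with \( n - 1 = 29 \) degrees of freedom.

```python
import math
from scipy import stats

mean, sd, n = 50, 5, 30   # n = 30 is an assumption; the question does not give it
se = sd / math.sqrt(n)

z_crit = stats.norm.ppf(0.975)          # about 1.960
t_crit = stats.t.ppf(0.975, df=n - 1)   # about 2.045 with 29 degrees of freedom

z_ci = (mean - z_crit * se, mean + z_crit * se)
t_ci = (mean - t_crit * se, mean + t_crit * se)
print(f"z-interval: ({z_ci[0]:.2f}, {z_ci[1]:.2f})")  # (48.21, 51.79)
print(f"t-interval: ({t_ci[0]:.2f}, {t_ci[1]:.2f})")  # (48.13, 51.87)
```

The t-interval is wider because the t-distribution's heavier tails reflect the extra uncertainty from estimating the standard deviation.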

In [None]:
#Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error?
#Provide an example of a scenario where a larger sample size would result in a smaller margin of error.
The **margin of error** in a confidence interval is the amount added to and subtracted from the point estimate, so it equals half the width of the interval. It is calculated as the product of the critical value (from the Z or t distribution) and the standard error of the estimate.

### Effect of Sample Size:
A larger sample size decreases the margin of error because the standard error (\( \frac{\sigma}{\sqrt{n}} \)) decreases as \( n \) increases. This results in a more precise estimate of the population parameter.

### Example Scenario:
In a survey measuring the average height of adults in a city, a sample of 30 individuals may yield a margin of error of ±3 cm. If the sample size is increased to 300, the standard error shrinks by a factor of \( \sqrt{300/30} = \sqrt{10} \approx 3.16 \), so the margin of error falls to roughly ±1 cm. The larger sample therefore gives a more precise estimate of the average height.
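A sketch of how the margin of error scales with \( n \). The population standard deviation of 8.4 cm is a **hypothetical** value, chosen so that \( n = 30 \) gives roughly ±3 cm as in the scenario. The helper `margin_of_error` is also hypothetical.

```python
import math
from scipy import stats

def margin_of_error(sigma, n, confidence=0.95):
    """z critical value times the standard error sigma / sqrt(n)."""
    z = stats.norm.ppf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)

sigma = 8.4  # hypothetical population SD of adult height, in cm
for n in (30, 300, 1200):
    print(f"n = {n:5d}: margin of error = ±{margin_of_error(sigma, n):.2f} cm")
```

Because the margin is proportional to \( 1/\sqrt{n} \), a tenfold larger sample cuts it by about 3.16, and halving it requires quadrupling the sample size.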

In [None]:
#Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.

The **z-score** for the data point with a value of 75, a population mean of 70, and a population standard deviation of 5 is **1.0**.

### Interpretation:
A z-score of 1.0, from \( z = (75 - 70)/5 \), indicates that the data point (75) is **1 standard deviation above the mean** (70). If the population is normally distributed, about 84% of values lie below 75. The value is therefore above average but not extremely high.
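The z-score and the "about 84%" figure can be reproduced in a few lines. The percentile step assumes the population is normally distributed, which the question does not state. `scipy.stats.norm.cdf` gives the share of a standard normal distribution below \( z \).

```python
from scipy import stats

x, mu, sigma = 75, 70, 5
z = (x - mu) / sigma               # number of standard deviations above the mean
share_below = stats.norm.cdf(z)    # assumes a normally distributed population
print(f"z = {z:.1f}; about {share_below:.1%} of values lie below {x}")
# prints: z = 1.0; about 84.1% of values lie below 75
```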

In [None]:
#Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.
To conduct a hypothesis test for the effectiveness of the weight loss drug, we follow these steps:

### Hypotheses:
- **Null Hypothesis (\(H_0\))**: The drug has no effect on weight loss (mean loss = 0).
- **Alternative Hypothesis (\(H_a\))**: The drug is effective (mean loss > 0).

### Given Data:
- Sample size (\(n\)) = 50
- Sample mean (\(\bar{x}\)) = 6 pounds
- Sample standard deviation (\(s\)) = 2.5 pounds
- Significance level (\(\alpha\)) = 0.05

### Step 1: Calculate the t-score
\[
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{6 - 0}{2.5 / \sqrt{50}} = \frac{6}{0.3536} \approx 16.97
\]

### Step 2: Determine the critical t-value
For a one-tailed test with \(n - 1 = 49\) degrees of freedom at the 0.05 significance level, the critical t-value is approximately **1.676**.

### Step 3: Compare t-score and critical value
Since \(16.97 > 1.676\), we reject the null hypothesis.

### Conclusion:
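Since \(t \approx 16.97\) far exceeds the one-tailed critical value, there is significant evidence at the 5% significance level (95% confidence level) to conclude that the weight loss drug is effective. The mean weight loss of 6 pounds is significantly greater than zero.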
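The test can be run from the summary statistics alone. This sketch reproduces the t-score, obtains the one-tailed critical value from `scipy.stats.t.ppf`, and also reports the p-value from `scipy.stats.t.sf`, which the hand calculation above does not show. Like the worked solution, it treats the reported 6-pound figure as the mean within-participant weight loss tested against \( \mu_0 = 0 \).

```python
import math
from scipy import stats

n, x_bar, s, mu0, alpha = 50, 6.0, 2.5, 0.0, 0.05

se = s / math.sqrt(n)                 # standard error of the mean
t_stat = (x_bar - mu0) / se
df = n - 1
t_crit = stats.t.ppf(1 - alpha, df)   # one-tailed critical value
p_value = stats.t.sf(t_stat, df)      # P(T >= t) under H0

print(f"t = {t_stat:.2f}, critical t = {t_crit:.3f}, p = {p_value:.2e}")
print("Reject H0" if t_stat > t_crit else "Fail to reject H0")
```

A p-value far below 0.05 leads to the same decision as the critical-value comparison.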

In [None]:
#Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95%
# confidence interval for the true proportion of people who are satisfied with their job.

from statsmodels.stats.proportion import proportion_confint

# Given data
n = 500  # Sample size
p_hat = 0.65  # Sample proportion
count = round(p_hat * n)  # Number satisfied (325); round() avoids float truncation errors

# Normal-approximation (Wald) 95% confidence interval for a proportion
confidence_interval = proportion_confint(count=count, nobs=n, alpha=0.05, method='normal')
confidence_interval
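For comparison, the normal-approximation (Wald) interval that `method='normal'` refers to can be computed directly as \( \hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n} \). This sketch uses only `math` and SciPy, and it should agree with the `proportion_confint` result to floating-point precision.

```python
import math
from scipy import stats

n, p_hat = 500, 0.65
z = stats.norm.ppf(0.975)                  # about 1.96
se = math.sqrt(p_hat * (1 - p_hat) / n)    # standard error of a sample proportion
low, high = p_hat - z * se, p_hat + z * se
print(f"95% CI for the proportion: ({low:.4f}, {high:.4f})")  # (0.6082, 0.6918)
```

Interpretation: we can be 95% confident that between roughly 60.8% and 69.2% of the population is satisfied with their job.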
