### **Q1. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.**

In Python, we can calculate a confidence interval for the population mean using the formula:
\[
\text{CI} = \bar{x} \pm Z_{\alpha/2} \times \frac{s}{\sqrt{n}}
\]
Where:
- \(\bar{x} = 50\) is the sample mean,
- \(s = 5\) is the sample standard deviation,
- \(n\) is the sample size (not given in the question, so a value must be assumed),
- \(Z_{\alpha/2}\) is the critical Z-value for the 95% confidence level (approximately 1.96).

```python
import scipy.stats as stats
import numpy as np

# Given data
mean = 50
std_dev = 5
n = 30  # Assume sample size is 30
confidence_level = 0.95

# Calculate Z value
z_value = stats.norm.ppf(1 - (1 - confidence_level) / 2)

# Calculate margin of error
margin_of_error = z_value * (std_dev / np.sqrt(n))

# Confidence Interval
ci_lower = mean - margin_of_error
ci_upper = mean + margin_of_error

(ci_lower, ci_upper)
```

**Interpretation:**
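With the assumed sample size of \(n = 30\), the margin of error is about 1.79, so the interval is roughly (48.21, 51.79). We are 95% confident that the true population mean lies in this range, in the sense that intervals built this way capture the true mean in about 95% of repeated samples. Because \(n\) is an assumption, the width changes with it: a larger sample gives a narrower interval.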
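Since the question does not give a sample size, it can help to see how the interval reacts to different choices. The sketch below wraps the same formula in a helper (the name `z_interval` and the sample sizes 10, 30 and 100 are illustrative assumptions, not part of the question):

```python
import numpy as np
from scipy import stats


def z_interval(mean, std_dev, n, confidence=0.95):
    """Return a two-sided z-based confidence interval for the mean."""
    z_value = stats.norm.ppf(1 - (1 - confidence) / 2)
    margin = z_value * std_dev / np.sqrt(n)
    return mean - margin, mean + margin


# Hypothetical sample sizes, chosen only to show the trend
for n_assumed in (10, 30, 100):
    lower, upper = z_interval(50, 5, n_assumed)
    print(f"n = {n_assumed:>3}: ({lower:.2f}, {upper:.2f})")
```

The interval narrows as the assumed sample size grows, because the standard error shrinks with \(\sqrt{n}\).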

---

### **Q2. Chi-Square Goodness of Fit Test for M&Ms Color Distribution**

Let's use Python to conduct a chi-square goodness of fit test to see if the observed distribution of M&Ms colors matches the expected distribution.

Expected proportions: 
- 20% blue, 20% orange, 20% green, 10% yellow, 10% red, 20% brown

Assume the observed counts are:

```python
observed = [23, 19, 21, 10, 11, 16]  # Sample data for blue, orange, green, yellow, red, brown
expected_proportions = [0.2, 0.2, 0.2, 0.1, 0.1, 0.2]
n = sum(observed)  # Total number of M&Ms

# Expected counts
expected = [p * n for p in expected_proportions]

# Chi-square test
chi_square_stat, p_value = stats.chisquare(observed, expected)

(chi_square_stat, p_value)
```

**Interpretation:**
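- If the p-value is less than 0.05, we reject the null hypothesis: the observed distribution differs significantly from the expected one.
- If the p-value is 0.05 or greater, we fail to reject the null hypothesis: the data are consistent with the expected distribution, although this does not prove that the distributions match.

For the assumed counts above, the statistic is 1.45 with 5 degrees of freedom and the p-value is about 0.92, so we fail to reject the null hypothesis.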
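To make the test less of a black box, the statistic can also be computed by hand as \(\sum (O - E)^2 / E\) and compared against the chi-square distribution with (number of categories − 1) degrees of freedom. A minimal sketch using the same assumed counts:

```python
import numpy as np
from scipy import stats

observed = np.array([23, 19, 21, 10, 11, 16])
expected = np.array([0.2, 0.2, 0.2, 0.1, 0.1, 0.2]) * observed.sum()

# Sum of squared deviations, each scaled by its expected count
chi_square_manual = np.sum((observed - expected) ** 2 / expected)

# Degrees of freedom: number of categories minus one
dof = len(observed) - 1

# Right-tail probability of the chi-square distribution
p_value_manual = stats.chi2.sf(chi_square_manual, dof)

print(chi_square_manual, dof, p_value_manual)
```

These values should agree with what `stats.chisquare` returns for the same inputs.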

---

### **Q3. Chi-Square Test for a Contingency Table**

Let’s say we have a contingency table like this:
```
              Group A    Group B
Outcome 1:      20         15
Outcome 2:      10         25
Outcome 3:      15         20
```
We will calculate the chi-square statistic and p-value to test whether the outcome is associated with the group.

```python
# Contingency table data
data = [[20, 15], [10, 25], [15, 20]]

# Perform chi-square test
chi_square_stat, p_value, dof, expected = stats.chi2_contingency(data)

(chi_square_stat, p_value, expected)
```

**Interpretation:**
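The chi-square test checks whether the observed frequencies in the contingency table differ significantly from the frequencies expected under independence. If the p-value is less than the significance level (0.05), we conclude that the variables are associated. For this table, the statistic is about 5.83 with 2 degrees of freedom and the p-value is about 0.054, so at the 0.05 level we narrowly fail to reject independence.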
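The expected frequencies that `chi2_contingency` returns come from the row and column totals: each cell's expected count is (row total × column total) / grand total. A short sketch reproducing them for the example table:

```python
import numpy as np

data = np.array([[20, 15], [10, 25], [15, 20]])

row_totals = data.sum(axis=1)   # one total per outcome
col_totals = data.sum(axis=0)   # one total per group
grand_total = data.sum()

# Expected count under independence for every cell
expected_manual = np.outer(row_totals, col_totals) / grand_total

# Degrees of freedom: (rows - 1) * (columns - 1)
dof_manual = (data.shape[0] - 1) * (data.shape[1] - 1)

print(expected_manual)
print(dof_manual)
```

Because every row here has the same total, each row gets the same expected counts.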

---

### **Q4. Confidence Interval for a Proportion of Smokers**

We want to calculate the 95% confidence interval for the true proportion of smokers, given that 60 out of 500 individuals smoke.

```python
# Given data
p_hat = 60 / 500  # Proportion of smokers
n = 500
z_value = stats.norm.ppf(1 - 0.05 / 2)

# Standard error of the proportion
se = np.sqrt((p_hat * (1 - p_hat)) / n)

# Confidence Interval
ci_lower = p_hat - z_value * se
ci_upper = p_hat + z_value * se

(ci_lower, ci_upper)
```

**Interpretation:**
The interval runs from about 0.092 to 0.148: we are 95% confident that the true proportion of smokers lies between roughly 9.2% and 14.8%.
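The calculation above is the normal-approximation (Wald) interval. As a point of comparison, and not something the question asks for, the Wilson score interval is a common alternative that behaves better for small samples or proportions near 0 or 1. A sketch written out by hand from the standard Wilson formula:

```python
import numpy as np
from scipy import stats

successes, n_obs = 60, 500
p_hat = successes / n_obs
z = stats.norm.ppf(1 - 0.05 / 2)

# Wilson score interval: shrink the centre toward 0.5 and adjust the width
denominator = 1 + z**2 / n_obs
centre = (p_hat + z**2 / (2 * n_obs)) / denominator
half_width = z * np.sqrt(p_hat * (1 - p_hat) / n_obs + z**2 / (4 * n_obs**2)) / denominator

wilson_lower = centre - half_width
wilson_upper = centre + half_width

print(wilson_lower, wilson_upper)
```

With 500 observations the two intervals are close, which suggests the normal approximation is reasonable here.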

---

### **Q5. Calculate the 90% Confidence Interval for a Sample Mean**

For a sample with mean 75, standard deviation 12, and assuming \(n = 40\), we calculate the 90% confidence interval:

```python
mean = 75
std_dev = 12
n = 40
confidence_level = 0.90

# Z value for 90% confidence
z_value = stats.norm.ppf(1 - (1 - confidence_level) / 2)

# Margin of error
margin_of_error = z_value * (std_dev / np.sqrt(n))

# Confidence Interval
ci_lower = mean - margin_of_error
ci_upper = mean + margin_of_error

(ci_lower, ci_upper)
```

**Interpretation:**
The interval is roughly (71.88, 78.12): we are 90% confident that the true population mean falls in this range, given the assumed sample size of 40.
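When the standard deviation is estimated from the sample rather than known for the population, a t-based interval is often preferred over the z-based one. The sketch below assumes the 12 is a sample standard deviation and keeps the assumed \(n = 40\); it uses `stats.t.interval` with \(n - 1\) degrees of freedom:

```python
import numpy as np
from scipy import stats

mean, std_dev, n = 75, 12, 40   # n = 40 is the same assumption as above
standard_error = std_dev / np.sqrt(n)

# 90% two-sided interval from the t distribution with n - 1 degrees of freedom
t_lower, t_upper = stats.t.interval(0.90, n - 1, loc=mean, scale=standard_error)

print(t_lower, t_upper)
```

The t-based interval is slightly wider than the z-based one, and the gap shrinks as the sample size grows.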

---


### **Q6. Plot the Chi-Square Distribution with 10 Degrees of Freedom**

We can plot the chi-square distribution and shade the area corresponding to a chi-square statistic of 15.

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Parameters
df = 10  # degrees of freedom
x = np.linspace(0, 30, 1000)
chi2_pdf = stats.chi2.pdf(x, df)

# Plot the chi-square distribution
plt.figure(figsize=(10, 6))
plt.plot(x, chi2_pdf, label='Chi-Square Distribution (df=10)', color='blue')

# Shade the area for chi-square statistic of 15
plt.fill_between(x, chi2_pdf, where=(x >= 15), color='lightblue', alpha=0.5, label='Area > 15')

# Add labels and legend
plt.title('Chi-Square Distribution with 10 Degrees of Freedom')
plt.xlabel('Chi-Square Statistic')
plt.ylabel('Probability Density Function')
plt.axvline(15, color='red', linestyle='--', label='Chi-Square Statistic = 15')
plt.legend()
plt.grid()
plt.show()
```

**Interpretation:**
The plot shows the chi-square distribution for 10 degrees of freedom. The shaded area covers values greater than the chi-square statistic of 15, which is the right-tail probability (the p-value) for that statistic, and the dashed line marks the statistic's position on the distribution.
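The size of the shaded region can be computed directly from the survival function of the chi-square distribution. A brief sketch:

```python
from scipy import stats

df = 10
chi_square_value = 15

# Survival function = 1 - CDF = area to the right of the statistic
tail_area = stats.chi2.sf(chi_square_value, df)

print(f"P(X >= {chi_square_value}) with df={df}: {tail_area:.4f}")
```

The result is about 0.132, so a statistic of 15 with 10 degrees of freedom would not be significant at the 0.05 level.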

---

### **Q7. 99% Confidence Interval for the Proportion of Coke Preference**

For the survey of 1000 people where 520 prefer Coke, we calculate the 99% confidence interval.

```python
# Given data
coke_preference = 520 / 1000  # Proportion preferring Coke
n = 1000
confidence_level = 0.99

# Z value for 99% confidence
z_value = stats.norm.ppf(1 - (1 - confidence_level) / 2)

# Standard error of the proportion
se = np.sqrt((coke_preference * (1 - coke_preference)) / n)

# Confidence Interval
ci_lower = coke_preference - z_value * se
ci_upper = coke_preference + z_value * se

(ci_lower, ci_upper)
```

**Interpretation:**
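The interval is roughly (0.479, 0.561): we are 99% confident that the true proportion of the population preferring Coke lies in this range. Because the interval includes 0.5, the survey does not establish a majority preference at this confidence level.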

---

### **Q8. Chi-Square Goodness of Fit Test for a Biased Coin**

For the coin that was flipped 100 times with 45 tails observed, we will conduct a chi-square goodness of fit test.

Expected frequencies for a fair coin (50 tails, 50 heads):

```python
# Observed frequencies
observed = [45, 55]  # 45 tails, 55 heads
expected = [50, 50]  # Expected for a fair coin

# Chi-square test
chi_square_stat, p_value = stats.chisquare(observed, expected)

(chi_square_stat, p_value)
```

**Interpretation:**
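If the p-value is less than 0.05, we reject the null hypothesis that the coin is fair, indicating it may be biased. Otherwise, we fail to reject it: the data are consistent with a fair coin, though this does not prove fairness. Here the statistic is 1.0 with 1 degree of freedom and the p-value is about 0.32, so there is no significant evidence of bias.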
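With only two categories, the goodness of fit test is equivalent to a two-sided one-proportion z-test: the chi-square statistic equals the square of the z-statistic, and the p-values coincide. A sketch that checks this for the coin data:

```python
import numpy as np
from scipy import stats

n_flips, tails = 100, 45
p_null = 0.5   # probability of tails for a fair coin

# One-proportion z-statistic under the null hypothesis
z_stat = (tails - n_flips * p_null) / np.sqrt(n_flips * p_null * (1 - p_null))
p_value_z = 2 * stats.norm.sf(abs(z_stat))

# Chi-square goodness of fit on the same data
chi_square_stat, p_value_chi = stats.chisquare([tails, n_flips - tails], [50, 50])

print(z_stat**2, chi_square_stat)
print(p_value_z, p_value_chi)
```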

---

### **Q9. Chi-Square Test for Independence for Smoking Status and Lung Cancer Diagnosis**

Given the contingency table:

```
                Lung Cancer: Yes   Lung Cancer: No
Smoker               60                  140
Non-smoker           30                  170
```

Let's perform the chi-square test for independence.

```python
# Contingency table data
contingency_table = [[60, 140], [30, 170]]

# Chi-square test for independence
chi_square_stat, p_value, dof, expected = stats.chi2_contingency(contingency_table)

(chi_square_stat, p_value, expected)
```

**Interpretation:**
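If the p-value is less than 0.05, we conclude that there is a significant association between smoking status and lung cancer diagnosis. Otherwise, we fail to reject independence. For a 2×2 table, `chi2_contingency` applies Yates' continuity correction by default; here the corrected statistic is about 12.06 with 1 degree of freedom and the p-value is well below 0.05, so the association is significant.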
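For 2×2 tables, `stats.chi2_contingency` applies Yates' continuity correction unless `correction=False` is passed, so a hand calculation with the plain Pearson formula will give a slightly larger statistic. A sketch comparing the two on the same table:

```python
from scipy import stats

contingency_table = [[60, 140], [30, 170]]

# Default behaviour: Yates' continuity correction for 2x2 tables
chi2_corrected, p_corrected, dof, _ = stats.chi2_contingency(contingency_table)

# Plain Pearson chi-square, without the correction
chi2_plain, p_plain, _, _ = stats.chi2_contingency(contingency_table, correction=False)

print(f"corrected:   chi2={chi2_corrected:.3f}, p={p_corrected:.5f}")
print(f"uncorrected: chi2={chi2_plain:.3f}, p={p_plain:.5f}")
```

Both versions lead to the same conclusion here; the correction matters more when cell counts are small.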

---

### **Q10. Chi-Square Test for Independence for Chocolate Preference**

For the chocolate preference survey in the U.S. vs. U.K., let’s say we have the following contingency table:

| Preference      | U.S. | U.K. |
|-----------------|------|------|
| Milk Chocolate  | 200  | 150  |
| Dark Chocolate  | 150  | 180  |
| White Chocolate | 50   | 70   |

We will conduct a chi-square test for independence.

```python
# Contingency table data
chocolate_table = [[200, 150], [150, 180], [50, 70]]

# Chi-square test for independence
chi_square_stat, p_value, dof, expected = stats.chi2_contingency(chocolate_table)

(chi_square_stat, p_value, expected)
```

**Interpretation:**
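Using a significance level of 0.01, if the p-value is less than 0.01, we conclude that there is a significant association between chocolate preference and country. For the assumed table, the statistic is about 13.20 with 2 degrees of freedom and the p-value is about 0.0014, so the association is significant at the 0.01 level.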
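The same decision can be reached by comparing the statistic with the critical value of the chi-square distribution instead of comparing the p-value with the significance level. A sketch for the assumed table at the 0.01 level:

```python
from scipy import stats

chocolate_table = [[200, 150], [150, 180], [50, 70]]
alpha = 0.01

chi_square_stat, p_value, dof, _ = stats.chi2_contingency(chocolate_table)

# Critical value: the point with alpha probability to its right
critical_value = stats.chi2.ppf(1 - alpha, dof)

reject_null = chi_square_stat > critical_value
print(f"statistic={chi_square_stat:.3f}, critical={critical_value:.3f}, reject={reject_null}")
```

The two approaches always agree: the statistic exceeds the critical value exactly when the p-value is below the significance level.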

---

### **Q11. Hypothesis Test for Population Mean**

For a random sample of 30 people with a mean of 72 and standard deviation of 10, we will conduct a hypothesis test to see if the population mean is significantly different from 70.

```python
# Given data
sample_mean = 72
sample_std_dev = 10
n = 30
population_mean = 70
significance_level = 0.05

# Calculate the t-statistic
t_statistic = (sample_mean - population_mean) / (sample_std_dev / np.sqrt(n))

# Calculate the two-tailed p-value from the survival function (1 - CDF)
p_value = 2 * stats.t.sf(np.abs(t_statistic), df=n - 1)

(t_statistic, p_value)
```

**Interpretation:**
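If the p-value is less than 0.05, we reject the null hypothesis, suggesting the population mean is significantly different from 70. Otherwise, we do not reject it. Here the t-statistic is about 1.10 with 29 degrees of freedom and the two-tailed p-value is roughly 0.28, so we do not reject the null hypothesis.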
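An equivalent way to reach the decision is to compare the absolute t-statistic with the two-tailed critical value, or to check whether the hypothesized mean falls inside the matching confidence interval. A sketch using the same summary statistics:

```python
import numpy as np
from scipy import stats

sample_mean, sample_std_dev, n = 72, 10, 30
population_mean = 70
significance_level = 0.05

standard_error = sample_std_dev / np.sqrt(n)
t_statistic = (sample_mean - population_mean) / standard_error

# Two-tailed critical value for n - 1 degrees of freedom
t_critical = stats.t.ppf(1 - significance_level / 2, df=n - 1)
reject_null = abs(t_statistic) > t_critical

# Matching 95% confidence interval for the population mean
ci_lower = sample_mean - t_critical * standard_error
ci_upper = sample_mean + t_critical * standard_error

print(f"t={t_statistic:.3f}, critical={t_critical:.3f}, reject={reject_null}")
print(f"95% CI: ({ci_lower:.2f}, {ci_upper:.2f})")
```

The hypothesized mean of 70 lies inside the interval, which matches the decision not to reject the null hypothesis.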
