Q1. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation
of 5 using Python. Interpret the results.

ANS-1

To calculate the 95% confidence interval for a sample with a mean of 50 and a standard deviation of 5, you can use the `scipy.stats` module. The code below takes the critical value from `norm.ppf()` and builds the interval as mean ± critical value × standard error; `norm.interval()` returns the same bounds when given the mean as `loc` and the standard error as `scale`.

Here's how you can do it:

```python
import scipy.stats as stats

# Given data
sample_mean = 50
sample_std_dev = 5
sample_size = 30  # ASSUMPTION: the question gives no sample size; 30 is only a placeholder, replace it with the actual n

# Calculate the 95% confidence interval
confidence_level = 0.95
margin_of_error = stats.norm.ppf((1 + confidence_level) / 2) * (sample_std_dev / (sample_size ** 0.5))
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

# Display the results
print(f"95% Confidence Interval: ({lower_bound}, {upper_bound})")
```

The question does not state the sample size, and the interval cannot be computed without it: the margin of error is z × s / √n, so the interval narrows as `sample_size` grows. The calculation uses the normal critical value (about 1.96 for 95%), which is appropriate when the sample is large or the population standard deviation is known. For a small sample with an estimated standard deviation, the t distribution with n − 1 degrees of freedom gives a slightly wider interval.

Interpretation:
The 95% confidence level describes the procedure: if you repeatedly drew samples from the same population and computed an interval from each, approximately 95% of those intervals would contain the true population mean. For this one sample, we say we are 95% confident that the population mean lies between lower_bound and upper_bound.
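
The repeated-sampling interpretation can be checked with a small simulation. This is a minimal sketch, not part of the original answer: the population mean and standard deviation are taken from the question, while the sample size of 30, the number of trials, and the normality of the population are assumptions made only for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# ASSUMPTION: a normal population with mean 50 and std 5, and samples of size 30
true_mean, true_std, n = 50, 5, 30
n_trials = 10_000
z = stats.norm.ppf(0.975)  # two-sided 95% critical value

# Draw n_trials samples of size n and build a z-interval around each sample mean
samples = rng.normal(true_mean, true_std, size=(n_trials, n))
means = samples.mean(axis=1)
margin = z * true_std / np.sqrt(n)  # population std treated as known
covered = (means - margin <= true_mean) & (true_mean <= means + margin)

coverage = covered.mean()
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should land close to 0.95, which is what the confidence level promises about the procedure rather than about any single interval.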




Q2. Conduct a chi-square goodness of fit test to determine if the distribution of colors of M&Ms in a bag
matches the expected distribution of 20% blue, 20% orange, 20% green, 10% yellow, 10% red, and 20%
brown. Use Python to perform the test with a significance level of 0.05.


ANS-2

To conduct a chi-square goodness of fit test in Python to determine if the distribution of colors of M&Ms in a bag matches the expected distribution, you can use the `scipy.stats` module. Specifically, you'll use the `chisquare()` function, which performs the chi-square test for the goodness of fit.

Here's how you can do it:

```python
import numpy as np
from scipy.stats import chisquare

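# Observed counts per color.
# ASSUMPTION: hypothetical values, since the question gives none; replace them with the counts from your bag
blue_freq, orange_freq, green_freq, yellow_freq, red_freq, brown_freq = 18, 22, 21, 9, 12, 18
observed_frequencies = np.array([blue_freq, orange_freq, green_freq, yellow_freq, red_freq, brown_freq])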
total_observed = observed_frequencies.sum()

# Expected frequencies based on the given distribution
expected_frequencies = np.array([0.2, 0.2, 0.2, 0.1, 0.1, 0.2]) * total_observed

# Perform the chi-square goodness of fit test
chi2_stat, p_value = chisquare(f_obs=observed_frequencies, f_exp=expected_frequencies)

# Define the significance level
alpha = 0.05

# Print the results
print("Chi-square statistic:", chi2_stat)
print("P-value:", p_value)

if p_value < alpha:
    print("Reject H0: the distribution of colors in the bag differs from the expected distribution.")
else:
    print("Fail to reject H0: the observed colors are consistent with the expected distribution.")
```

Before running the code, make sure you have the observed frequencies for each color (e.g., `blue_freq`, `orange_freq`, etc.). Replace the placeholders with the actual observed frequencies in the `observed_frequencies` array.

The chi-square goodness of fit test compares the observed frequencies with the frequencies expected under the null hypothesis that the colors follow the stated distribution. With six colors, the test statistic has 6 − 1 = 5 degrees of freedom. If the p-value is less than the chosen significance level (0.05 here), we reject the null hypothesis and conclude that the distribution of colors in the bag differs from the expected distribution. If the p-value is greater than or equal to the significance level, we fail to reject the null hypothesis: the data are consistent with the expected distribution, although this does not prove that the distribution matches exactly.
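
To make the comparison of observed and expected frequencies concrete, the statistic can also be computed by hand and checked against `chisquare()`. The sketch below uses hypothetical counts for a bag of 100 candies, since the question supplies none; only the expected percentages come from the question.

```python
import numpy as np
from scipy.stats import chisquare, chi2

# ASSUMPTION: hypothetical counts (blue, orange, green, yellow, red, brown)
observed = np.array([18, 22, 21, 9, 12, 18])
expected = np.array([0.2, 0.2, 0.2, 0.1, 0.1, 0.2]) * observed.sum()

# Statistic by hand: sum of (O - E)^2 / E over the six categories
manual_stat = ((observed - expected) ** 2 / expected).sum()
dof = len(observed) - 1                # categories minus one
manual_p = chi2.sf(manual_stat, dof)   # right-tail probability

# Same test through the library call
lib_stat, lib_p = chisquare(f_obs=observed, f_exp=expected)

print("Manual :", manual_stat, manual_p)
print("Library:", lib_stat, lib_p)
```

Both lines should print the same statistic and p-value, which shows that `chisquare()` is doing nothing more than this sum followed by a chi-square tail probability.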






Q3. Use Python to calculate the chi-square statistic and p-value for a contingency table with the following
data:

ANS-3

To calculate the chi-square statistic and p-value for a contingency table in Python, you can use the `chi2_contingency()` function from `scipy.stats`, which performs the chi-square test of independence on a contingency table.

Here's how you can do it:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Given data (contingency table)
observed_data = np.array([[20, 15],
                          [10, 25],
                          [15, 20]])

# Perform the chi-square test for independence
chi2_stat, p_value, dof, expected = chi2_contingency(observed_data)

# Print the results
print("Chi-square statistic:", chi2_stat)
print("P-value:", p_value)
print("Degrees of freedom:", dof)
print("Expected frequencies:\n", expected)
```

The `observed_data` matrix represents the contingency table with rows as the different outcomes and columns as the groups. Replace the given data with the actual values in the matrix.

The chi-square test for independence determines whether there is a statistically significant association between the row variable and the column variable of the contingency table. If the p-value is less than the chosen significance level (commonly 0.05), we reject the null hypothesis of independence and conclude that the outcomes and the groups are associated. Otherwise we fail to reject the null hypothesis: the data do not provide sufficient evidence of an association. For the table coded above, the statistic is about 5.83 with 2 degrees of freedom and the p-value is about 0.054, so independence is not rejected at the 0.05 level, though only narrowly.

The `expected` matrix shows the expected frequencies under the assumption of independence. If there is no significant association, the observed frequencies will be close to the expected frequencies.
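
The expected frequencies follow directly from the table margins: each cell is (row total × column total) / grand total. The sketch below recomputes them by hand for the same `observed_data` values used above and compares the result with what `chi2_contingency()` returns; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[20, 15],
                     [10, 25],
                     [15, 20]])

# Margins of the table
row_totals = observed.sum(axis=1, keepdims=True)   # shape (3, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 2)
grand_total = observed.sum()

# Expected count under independence, cell by cell
manual_expected = row_totals * col_totals / grand_total
manual_dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

_, _, dof, expected = chi2_contingency(observed)

print(manual_expected)
print("Matches scipy:", np.allclose(manual_expected, expected), "| dof:", manual_dof)
```

Because every row of this table has the same total, the expected counts are identical in each row.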




Q4. A study of the prevalence of smoking in a population of 500 individuals found that 60 individuals
smoked. Use Python to calculate the 95% confidence interval for the true proportion of individuals in the
population who smoke.

ANS-4

To calculate the 95% confidence interval for the true proportion of individuals in the population who smoke, you can use the formula for the confidence interval of a proportion. In this case, you have the sample proportion (p̂) from the study, which is the proportion of individuals who smoke in the sample.

The formula for the confidence interval for a proportion is:

\[ \text{CI} = \left( \hat{p} - z \times \sqrt{\frac{\hat{p} \times (1 - \hat{p})}{n}}, \ \hat{p} + z \times \sqrt{\frac{\hat{p} \times (1 - \hat{p})}{n}} \right) \]

where:
- \(\hat{p}\) is the sample proportion (number of individuals who smoke / total sample size).
- \(n\) is the sample size (total number of individuals in the sample).
- \(z\) is the critical value from the standard normal distribution corresponding to the desired confidence level. For a 95% confidence level, \(z \approx 1.96\).

Now, let's calculate the confidence interval in Python:

```python
import scipy.stats as stats
import math

# Given data
total_population = 500
num_smokers = 60

# Calculate the sample proportion (p̂)
sample_proportion = num_smokers / total_population

# Calculate the critical value (z) for a 95% confidence level
confidence_level = 0.95
z_critical = stats.norm.ppf(1 - (1 - confidence_level) / 2)

# Calculate the standard error of the proportion
standard_error = math.sqrt((sample_proportion * (1 - sample_proportion)) / total_population)

# Calculate the lower and upper bounds of the confidence interval
lower_bound = sample_proportion - z_critical * standard_error
upper_bound = sample_proportion + z_critical * standard_error

# Display the results
print(f"95% Confidence Interval for the proportion of smokers: ({lower_bound:.4f}, {upper_bound:.4f})")
```

With 60 smokers out of 500, the sample proportion is 0.12 and the 95% confidence interval is approximately (0.0915, 0.1485). In other words, we are 95% confident that between about 9.2% and 14.8% of the population smokes.
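
Since the same formula is reused for other proportions, it can be wrapped in a small function. This is a sketch only: `proportion_ci` is a hypothetical helper name, not a SciPy function, and it implements the normal-approximation interval from the formula above.

```python
import math
from scipy import stats

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation interval for a proportion (hypothetical helper)."""
    p_hat = successes / n
    z = stats.norm.ppf(1 - (1 - confidence) / 2)    # two-sided critical value
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)  # z times the standard error
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(60, 500)
print(f"95% CI for the proportion of smokers: ({low:.4f}, {high:.4f})")
```

The normal approximation is usually considered reasonable when both the number of successes and the number of failures are at least about 10, which holds here (60 and 440).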




Q5. Calculate the 90% confidence interval for a sample of data with a mean of 75 and a standard deviation
of 12 using Python. Interpret the results.

ANS-5


To calculate the 90% confidence interval for a sample with a mean of 75 and a standard deviation of 12, you can use the `scipy.stats` module. As in Q1, the code below takes the critical value from `norm.ppf()` and builds the interval as mean ± critical value × standard error; `norm.interval()` returns the same bounds when given the mean as `loc` and the standard error as `scale`.

Here's how you can do it:

```python
import scipy.stats as stats

# Given data
sample_mean = 75
sample_std_dev = 12
sample_size = 30  # ASSUMPTION: the question gives no sample size; 30 is only a placeholder, replace it with the actual n

# Calculate the 90% confidence interval
confidence_level = 0.90
margin_of_error = stats.norm.ppf((1 + confidence_level) / 2) * (sample_std_dev / (sample_size ** 0.5))
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

# Display the results
print(f"90% Confidence Interval: ({lower_bound}, {upper_bound})")
```

Please replace the `sample_size` placeholder with the actual sample size; the question does not provide one, and the margin of error z × s / √n depends on it. For a 90% confidence level the normal critical value is about 1.645, so at the same sample size a 90% interval is narrower than a 95% interval.

Interpretation:
The 90% confidence level describes the procedure: if you repeatedly drew samples from the same population and computed an interval from each, approximately 90% of those intervals would contain the true population mean. For this sample, we say we are 90% confident that the population mean lies between lower_bound and upper_bound.
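
Because the sample size is not given, it can help to see how the interval changes with n, and how the normal-based interval compares with a t-based one. The sketch below uses `norm.interval()` and `t.interval()` from `scipy.stats`; the three sample sizes are hypothetical values chosen for illustration.

```python
import math
from scipy import stats

sample_mean, sample_std_dev = 75, 12
confidence_level = 0.90

results = []
for n in (10, 30, 100):  # ASSUMPTION: hypothetical sample sizes
    sem = sample_std_dev / math.sqrt(n)  # standard error of the mean
    z_interval = stats.norm.interval(confidence_level, loc=sample_mean, scale=sem)
    # t interval with n - 1 degrees of freedom, for an estimated standard deviation
    t_interval = stats.t.interval(confidence_level, n - 1, loc=sample_mean, scale=sem)
    results.append((n, z_interval, t_interval))
    print(f"n={n:3d}  z: ({z_interval[0]:.2f}, {z_interval[1]:.2f})"
          f"  t: ({t_interval[0]:.2f}, {t_interval[1]:.2f})")
```

The intervals narrow as n grows, and the t interval is always somewhat wider than the z interval, with the gap shrinking for larger samples.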



Q6. Use Python to plot the chi-square distribution with 10 degrees of freedom. Label the axes and shade the
area corresponding to a chi-square statistic of 15.



ANS-6


To plot the chi-square distribution with 10 degrees of freedom in Python and shade the area corresponding to a chi-square statistic of 15, you can use the `scipy.stats` module and the `matplotlib` library. We'll calculate the probability density function (PDF) for the chi-square distribution and then shade the area under the curve corresponding to the chi-square statistic of 15.

Here's how you can do it:

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Degrees of freedom for the chi-square distribution
df = 10

# Define the range of x values for the plot
x = np.linspace(0, 30, 500)

# Calculate the chi-square probability density function (PDF) for df degrees of freedom
chi2_pdf = stats.chi2.pdf(x, df)

# Plot the chi-square distribution
plt.plot(x, chi2_pdf, label=f'Chi-square (df={df})')

# Shade the right-tail area, i.e. the region where the statistic is 15 or larger
shade_x = np.linspace(15, 30, 500)
shade_y = stats.chi2.pdf(shade_x, df)
plt.fill_between(shade_x, shade_y, color='lightcoral', alpha=0.7, label='Area to the right of 15')

# Label the axes and add a legend
plt.xlabel('Chi-square Statistic')
plt.ylabel('Probability Density Function')
plt.legend()

# Show the plot
plt.grid()
plt.title('Chi-square Distribution')
plt.show()
```

This code plots the chi-square distribution with 10 degrees of freedom and shades the region to the right of 15, that is, the probability of observing a chi-square statistic of 15 or more. That right-tail area is the p-value for a statistic of 15 and is approximately 0.132. The `stats.chi2.pdf()` function computes the density, and `plt.fill_between()` shades the area between the curve and the x-axis over the chosen range.
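
The size of the shaded region can be computed numerically instead of being read off the plot. A minimal sketch using the survival function `chi2.sf()` (right tail), `chi2.cdf()` (left of the statistic), and `chi2.ppf()` for the 5% critical value; the variable names are illustrative.

```python
from scipy import stats

df = 10
chi2_value = 15

right_tail = stats.chi2.sf(chi2_value, df)   # P(X >= 15), the shaded area
left_area = stats.chi2.cdf(chi2_value, df)   # P(X <= 15), the unshaded area
critical_05 = stats.chi2.ppf(0.95, df)       # cutoff that leaves 5% in the right tail

print(f"Right-tail area beyond 15: {right_tail:.4f}")
print(f"Area to the left of 15:    {left_area:.4f}")
print(f"Critical value at alpha = 0.05: {critical_05:.3f}")
```

Since 15 is below the 5% critical value, a statistic of 15 with 10 degrees of freedom would not be significant at the 0.05 level.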




Q7. A random sample of 1000 people was asked if they preferred Coke or Pepsi. Of the sample, 520
preferred Coke. Calculate a 99% confidence interval for the true proportion of people in the population who
prefer Coke.


ANS-7


To calculate a 99% confidence interval for the true proportion of people in the population who prefer Coke, you can use the same formula as in Q4 for calculating confidence intervals for proportions.

The formula for the confidence interval for a proportion is:

\[ \text{CI} = \left( \hat{p} - z \times \sqrt{\frac{\hat{p} \times (1 - \hat{p})}{n}}, \ \hat{p} + z \times \sqrt{\frac{\hat{p} \times (1 - \hat{p})}{n}} \right) \]

where:
- \(\hat{p}\) is the sample proportion (number of people who prefer Coke / total sample size).
- \(n\) is the sample size (total number of people in the sample).
- \(z\) is the critical value from the standard normal distribution corresponding to the desired confidence level. For a 99% confidence level, \(z \approx 2.576\).

Let's calculate the confidence interval in Python:

```python
import scipy.stats as stats
import math

# Given data
total_sample_size = 1000
num_preferred_coke = 520

# Calculate the sample proportion (p̂)
sample_proportion = num_preferred_coke / total_sample_size

# Calculate the critical value (z) for a 99% confidence level
confidence_level = 0.99
z_critical = stats.norm.ppf(1 - (1 - confidence_level) / 2)

# Calculate the standard error of the proportion
standard_error = math.sqrt((sample_proportion * (1 - sample_proportion)) / total_sample_size)

# Calculate the lower and upper bounds of the confidence interval
lower_bound = sample_proportion - z_critical * standard_error
upper_bound = sample_proportion + z_critical * standard_error

# Display the results
print(f"99% Confidence Interval for the proportion of people who prefer Coke: ({lower_bound:.4f}, {upper_bound:.4f})")
```

With 520 of 1000 respondents preferring Coke, the sample proportion is 0.52 and the 99% confidence interval is approximately (0.4793, 0.5607). We are 99% confident that the true proportion of people who prefer Coke lies in this range. Because the interval includes 0.5, this sample alone does not establish that a majority of the population prefers Coke.
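
The choice of confidence level trades certainty against precision. The sketch below recomputes the interval for the same data at 90%, 95%, and 99% using the formula above; the dictionary name `intervals` is just an illustrative choice.

```python
import math
from scipy import stats

n, successes = 1000, 520
p_hat = successes / n
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion

intervals = {}
for level in (0.90, 0.95, 0.99):
    z = stats.norm.ppf(1 - (1 - level) / 2)  # two-sided critical value
    intervals[level] = (p_hat - z * se, p_hat + z * se)
    low, high = intervals[level]
    print(f"{level:.0%} CI: ({low:.4f}, {high:.4f})  width = {high - low:.4f}")
```

Higher confidence levels use a larger critical value and therefore produce wider intervals around the same sample proportion.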




Q8. A researcher hypothesizes that a coin is biased towards tails. They flip the coin 100 times and observe
45 tails. Conduct a chi-square goodness of fit test to determine if the observed frequencies match the
expected frequencies of a fair coin. Use a significance level of 0.05.


ANS-8


To conduct a chi-square goodness of fit test on the coin flips, you can use the `chisquare()` function from `scipy.stats`. The null hypothesis is that the coin is fair (tails and heads each have probability 0.5), and the alternative is that it is not fair. Note that the chi-square test is non-directional: it detects any departure from 50/50 and does not by itself test the researcher's specific claim of a bias towards tails.

Here's how you can perform the chi-square goodness of fit test:

```python
import numpy as np
from scipy.stats import chisquare

# Given data
total_flips = 100
observed_tails = 45
expected_tails = total_flips * 0.5  # For a fair coin, expected frequency of tails is 50%

# Create the observed and expected frequencies arrays for the chi-square test
observed_frequencies = np.array([observed_tails, total_flips - observed_tails])
expected_frequencies = np.array([expected_tails, total_flips - expected_tails])

# Perform the chi-square goodness of fit test
chi2_stat, p_value = chisquare(f_obs=observed_frequencies, f_exp=expected_frequencies)

# Define the significance level
alpha = 0.05

# Print the results
print("Chi-square statistic:", chi2_stat)
print("P-value:", p_value)

if p_value < alpha:
    print("Reject H0: the observed frequencies differ from those expected for a fair coin.")
else:
    print("Fail to reject H0: the observed frequencies are consistent with a fair coin.")
```

In this code, we first calculate the expected frequency of tails assuming a fair coin (i.e., 50% chance of tails). Then, we use the `chisquare()` function from `scipy.stats` to perform the chi-square goodness of fit test. The `chisquare()` function compares the observed and expected frequencies and returns the chi-square statistic and the p-value.

The p-value tells us whether there is enough evidence to reject the null hypothesis of a fair coin. Here the statistic is (45 − 50)²/50 + (55 − 50)²/50 = 1.0 with 1 degree of freedom, giving a p-value of about 0.317. Since this is greater than 0.05, we fail to reject the null hypothesis: the observed frequencies are consistent with a fair coin. The sample also shows fewer tails than heads (45 versus 55), so it offers no support for a bias towards tails.
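
The statistic can be reproduced by hand, and the directional hypothesis can be examined separately. The sketch below computes the chi-square statistic from its definition and then, as a swapped-in technique that is not part of the original answer, uses the exact binomial tail probability from `scipy.stats.binom` to ask how likely 45 or more tails would be under a fair coin.

```python
import numpy as np
from scipy import stats

observed = np.array([45, 55])   # tails, heads
expected = np.array([50, 50])   # fair coin, 100 flips

# Chi-square statistic from its definition, with 2 - 1 = 1 degree of freedom
chi2_stat = ((observed - expected) ** 2 / expected).sum()
p_two_sided = stats.chi2.sf(chi2_stat, 1)

# One-sided check of "biased towards tails": P(tails >= 45) for a fair coin
p_one_sided_tails = stats.binom.sf(observed[0] - 1, observed.sum(), 0.5)

print(f"Chi-square statistic: {chi2_stat:.3f}")
print(f"Two-sided p-value:    {p_two_sided:.4f}")
print(f"P(tails >= 45 | fair coin): {p_one_sided_tails:.4f}")
```

The one-sided probability is large because 45 tails is below the 50 expected from a fair coin, so the data point away from a tails bias rather than towards it.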




