Q1: What is Estimation Statistics? Explain point estimate and interval estimate.

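Estimation statistics is the part of inferential statistics concerned with estimating population parameters, such as a mean or a proportion, from sample data. There are two main types of estimation:

Point estimate: A single value computed from the sample that serves as the best guess for the parameter. For example, the sample mean x̄ is a point estimate of the population mean μ.
Interval estimate: A range of values, computed from the sample, that is expected to contain the parameter with a stated level of confidence. For example, a 95% confidence interval for μ has the form x̄ ± margin of error.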
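As a minimal sketch of the two kinds of estimate, the snippet below computes both from raw data. The sample values are made up for illustration, and the interval uses the t-distribution because the population standard deviation is assumed to be unknown.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of 8 measurements (made-up values for illustration)
sample = np.array([12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7])

# Point estimate: a single best guess for the population mean
point_estimate = sample.mean()

# Interval estimate: a 95% confidence interval based on the t-distribution
standard_error = stats.sem(sample)  # standard error of the mean (ddof=1 by default)
lower, upper = stats.t.interval(0.95, df=len(sample) - 1,
                                loc=point_estimate, scale=standard_error)

print(f"Point estimate: {point_estimate:.3f}")
print(f"95% interval estimate: ({lower:.3f}, {upper:.3f})")
```

The point estimate is one number, while the interval also conveys how much uncertainty the sample size and spread leave around it.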

Q2. Write a Python function to estimate the population mean using a sample mean and standard
deviation.

In [None]:
import numpy as np

def population_mean_estimate(sample_mean, sample_std, sample_size):
    # Calculate the standard error of the mean
    standard_error = sample_std / np.sqrt(sample_size)

    # Calculate the margin of error (using a 95% confidence interval)
    margin_of_error = 1.96 * standard_error  # for a 95% confidence interval

    # Calculate the lower and upper bounds of the confidence interval
    lower_bound = sample_mean - margin_of_error
    upper_bound = sample_mean + margin_of_error

    # Return the interval estimate
    return lower_bound, upper_bound

Q3: What is Hypothesis testing? Why is it used? State the importance of Hypothesis testing.

Hypothesis testing is a statistical method used to make inferences about population parameters based on sample data. It involves setting up two competing hypotheses, the null hypothesis (H0) and the alternative hypothesis (H1), and using sample data to decide whether there is enough evidence to reject H0 in favor of H1. It is used to evaluate the strength of evidence against the null hypothesis and to make decisions about population parameters or the underlying data-generating process. Its importance lies in giving an objective, repeatable decision rule under uncertainty, with the probability of wrongly rejecting a true null hypothesis controlled by the significance level.

Q4. Create a hypothesis that states whether the average weight of male college students is greater than
the average weight of female college students.

Hypothesis:
Null Hypothesis (H0): The average weight of male college students is equal to or less than the average weight of female college students.
Alternative Hypothesis (H1): The average weight of male college students is greater than the average weight of female college students.

Q5. Write a Python script to conduct a hypothesis test on the difference between two population means,
given a sample from each population.

In [None]:
import numpy as np
from scipy import stats

# Sample data for male and female college students
male_weights = np.array([70, 72, 68, 75, 71])  # Sample weights of male students
female_weights = np.array([65, 68, 64, 70, 67])  # Sample weights of female students

# Calculate sample statistics
mean_male = np.mean(male_weights)
mean_female = np.mean(female_weights)
std_male = np.std(male_weights, ddof=1)  # Using Bessel's correction for sample standard deviation
std_female = np.std(female_weights, ddof=1)

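# Perform a one-sided Welch's t-test (H1: mean weight of males > mean weight of females)
# Note: the `alternative` argument requires SciPy 1.6 or later
t_stat, p_value = stats.ttest_ind(male_weights, female_weights, equal_var=False, alternative='greater')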

# Print results
print("Sample mean weight of male students:", mean_male)
print("Sample mean weight of female students:", mean_female)
print("t-statistic:", t_stat)
print("p-value:", p_value)

# Check significance level
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. There is sufficient evidence to conclude that the average weight of male college students is greater than the average weight of female college students.")
else:
    print("Fail to reject the null hypothesis. There is not sufficient evidence to conclude that the average weight of male college students is greater than the average weight of female college students.")

Q6: What is a null and alternative hypothesis? Give some examples.

Null Hypothesis (H0): The default claim of no effect or no difference, which is assumed to be true unless the data provide strong evidence against it. For example, H0: μ = 70 (the population mean weight is 70 kg).
Alternative Hypothesis (H1): The claim that contradicts the null hypothesis and that the test seeks evidence for. For example, H1: μ ≠ 70 (the population mean weight is not equal to 70 kg).
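The example H0: μ = 70 can be tested against either a two-sided or a one-sided alternative. The sketch below assumes a small made-up sample of weights and uses `scipy.stats.ttest_1samp`; the `alternative` argument requires SciPy 1.6 or later.

```python
import numpy as np
from scipy import stats

# Hypothetical weights in kg (made-up values for illustration)
weights = np.array([68.5, 72.1, 69.8, 71.4, 70.2, 73.0, 67.9, 70.6])

# H0: mu = 70 versus H1: mu != 70 (two-sided alternative)
t_stat, p_two_sided = stats.ttest_1samp(weights, popmean=70)

# H0: mu = 70 versus H1: mu > 70 (one-sided alternative)
_, p_greater = stats.ttest_1samp(weights, popmean=70, alternative='greater')

print("t-statistic:", t_stat)
print("two-sided p-value:", p_two_sided)
print("one-sided p-value:", p_greater)
```

The null hypothesis is the same in both tests; only the alternative changes which tail (or tails) of the distribution count as evidence against it.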

Q7: Write down the steps involved in hypothesis testing.

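-Formulate Hypotheses:
Null Hypothesis (H0): Assumes no effect or no difference.
Alternative Hypothesis (H1 or Ha): Assumes there is an effect or a difference.
-Choose Significance Level (α):
The significance level (α) determines the threshold for rejecting the null hypothesis.
Commonly used significance levels are 0.05 (5%) and 0.01 (1%).
-Select a Test Statistic:
Choose a statistical test appropriate for the data and hypothesis being tested.
Common tests include t-tests, chi-square tests, ANOVA, etc.
-Compute the Test Statistic and p-value:
Calculate the test statistic from the sample data and find the corresponding p-value (or the critical value).
-Make a Decision:
Reject H0 if the p-value is less than α (equivalently, if the test statistic falls in the rejection region); otherwise, fail to reject H0.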
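
One way to walk through these steps in code, continuing through to the p-value and the decision, is sketched below. The sample values and the hypothesized mean `mu_0` are made up for illustration, and a one-sample t-test is assumed as the chosen test.

```python
import numpy as np
from scipy import stats

# Hypothetical sample (made-up values for illustration)
sample = np.array([5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.4, 5.1, 5.0, 5.2])

# Step 1: formulate hypotheses -> H0: mu = 4.9, H1: mu != 4.9
mu_0 = 4.9

# Step 2: choose the significance level
alpha = 0.05

# Step 3: select and compute the test statistic (one-sample t)
n = len(sample)
t_stat = (sample.mean() - mu_0) / (sample.std(ddof=1) / np.sqrt(n))

# Step 4: compute the two-sided p-value from the t-distribution
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

# Step 5: make a decision
decision = "reject H0" if p_value < alpha else "fail to reject H0"

print("t-statistic:", t_stat)
print("p-value:", p_value)
print("Decision:", decision)
```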

Q8. Define p-value and explain its significance in hypothesis testing.

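The p-value is the probability of observing a test statistic as extreme as or more extreme than the one observed in the sample data, under the assumption that the null hypothesis is true. It measures the strength of the evidence against the null hypothesis: the smaller the p-value, the stronger the evidence. In practice, H0 is rejected when the p-value is less than the chosen significance level α. Note that the p-value is not the probability that the null hypothesis is true.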
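
As a small illustration of this definition, a p-value can be read off the tail of the test statistic's distribution under H0. The observed t-statistic and the degrees of freedom below are hypothetical values chosen for the example.

```python
from scipy.stats import t

t_observed = 2.5  # hypothetical observed t-statistic
df = 20           # hypothetical degrees of freedom

# Right-tailed p-value: P(T >= t_observed) under H0
p_right = t.sf(t_observed, df)

# Two-tailed p-value: P(|T| >= |t_observed|) under H0
p_two_tailed = 2 * t.sf(abs(t_observed), df)

print("right-tailed p-value:", p_right)
print("two-tailed p-value:", p_two_tailed)
```

Here `t.sf` is the survival function (1 minus the CDF), so it returns the area in the upper tail beyond the observed statistic.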

Q9. Generate a Student's t-distribution plot using Python's matplotlib library, with the degrees of freedom
parameter set to 10.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t

# Generate data points for t-distribution
df = 10  # Degrees of freedom
x = np.linspace(t.ppf(0.001, df), t.ppf(0.999, df), 1000)  # From the 0.1th to the 99.9th percentile (central 99.8%)

# Plot t-distribution
plt.plot(x, t.pdf(x, df), 'r-', lw=2, label='t-distribution (df=10)')
plt.title("Student's t-distribution")
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.show()

Q10. Write a Python program to calculate the two-sample t-test for independent samples, given two
random samples of equal size and a null hypothesis that the population means are equal.

In [None]:
import numpy as np
from scipy.stats import ttest_ind

# Generate two random samples
sample1 = np.random.normal(loc=10, scale=2, size=100)
sample2 = np.random.normal(loc=12, scale=2, size=100)

# Perform two-sample t-test
t_statistic, p_value = ttest_ind(sample1, sample2)

# Print results
print("t-statistic:", t_statistic)
print("p-value:", p_value)

# Interpret results
alpha = 0.05  # Significance level
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between the population means.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference between the population means.")

Q11: What is Student’s t distribution? When to use the t-Distribution.

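Student’s t distribution is a probability distribution that arises when estimating the mean of a normally distributed population whose standard deviation is unknown and must be estimated from the sample. It is used in place of the normal distribution in that situation, and the difference matters most when the sample size is small (typically less than 30). As the degrees of freedom increase, the t distribution approaches the standard normal distribution.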
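
To see why the sample size matters, the sketch below compares two-sided 95% critical values of the t distribution with the normal critical value. The degrees of freedom listed are arbitrary choices for illustration.

```python
from scipy.stats import norm, t

# Two-sided 95% critical value from the standard normal distribution
z_crit = norm.ppf(0.975)

# Corresponding critical values from the t distribution for several degrees of freedom
dfs = [5, 10, 30, 100]
t_crits = [t.ppf(0.975, df) for df in dfs]

print(f"normal: {z_crit:.3f}")
for df, crit in zip(dfs, t_crits):
    print(f"t (df={df}): {crit:.3f}")
```

The t critical values are larger than the normal one and shrink toward it as the degrees of freedom grow, which reflects the heavier tails of the t distribution for small samples.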

Q12: What is t-statistic? State the formula for t-statistic.

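The t-statistic measures how far a sample estimate lies from the value stated in the null hypothesis, in units of its estimated standard error. In a two-sample t-test for independent samples (without assuming equal variances), the formula is:

t = (x̄1 − x̄2) / sqrt(s1²/n1 + s2²/n2)

where x̄1 and x̄2 are the sample means, s1 and s2 are the sample standard deviations, and n1 and n2 are the sample sizes. If equal population variances are assumed, the denominator becomes sp * sqrt(1/n1 + 1/n2), where sp is the pooled standard deviation.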
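
As a check on the two-sample formula, the sketch below computes the statistic by hand as the difference in sample means divided by sqrt(s1²/n1 + s2²/n2), then compares it with `scipy.stats.ttest_ind` using `equal_var=False`. The two samples are made-up values for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical test scores for two independent groups (made-up values)
group1 = np.array([80.0, 85.0, 78.0, 90.0, 84.0, 88.0])
group2 = np.array([75.0, 79.0, 72.0, 81.0, 77.0, 74.0])

n1, n2 = len(group1), len(group2)
mean1, mean2 = group1.mean(), group2.mean()
var1, var2 = group1.var(ddof=1), group2.var(ddof=1)  # sample variances

# t = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2)
t_manual = (mean1 - mean2) / np.sqrt(var1 / n1 + var2 / n2)

# Same statistic computed by SciPy (unequal-variance form)
t_scipy, _ = stats.ttest_ind(group1, group2, equal_var=False)

print("manual t-statistic:", t_manual)
print("SciPy t-statistic:", t_scipy)
```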

Q13. A coffee shop owner wants to estimate the average daily revenue for their shop. They take a random
sample of 50 days and find the sample mean revenue to be $500 with a standard deviation of $50.
Estimate the population mean revenue with a 95% confidence interval.

In [None]:
import numpy as np

# Given data
sample_mean = 500
sample_std = 50
sample_size = 50
confidence_level = 0.95

# Calculate the margin of error
z_score = 1.96  # Z-score for 95% confidence level
margin_of_error = z_score * (sample_std / np.sqrt(sample_size))

# Calculate the confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print("95% Confidence Interval for the population mean revenue: (${:.2f}, ${:.2f})".format(lower_bound, upper_bound))

Q14. A researcher hypothesizes that a new drug will decrease blood pressure by 10 mmHg. They conduct a
clinical trial with 100 patients and find that the sample mean decrease in blood pressure is 8 mmHg with a
standard deviation of 3 mmHg. Test the hypothesis with a significance level of 0.05.

In [None]:
import numpy as np
from scipy.stats import t

# Given data
sample_mean = 8  # Sample mean decrease in blood pressure
population_mean = 10  # Hypothesized population mean decrease in blood pressure
sample_std = 3  # Sample standard deviation
sample_size = 100  # Sample size
alpha = 0.05  # Significance level

# Calculate the t-statistic
t_statistic = (sample_mean - population_mean) / (sample_std / np.sqrt(sample_size))

# Calculate the critical t-value
df = sample_size - 1  # Degrees of freedom
critical_t_value = t.ppf(1 - alpha / 2, df)

# Perform the hypothesis test
if abs(t_statistic) > critical_t_value:
    print("Reject the null hypothesis: There is sufficient evidence to conclude that the mean decrease in blood pressure differs from 10 mmHg.")
else:
    print("Fail to reject the null hypothesis: There is not enough evidence to conclude that the mean decrease in blood pressure differs from 10 mmHg.")

Q15. An electronics company produces a certain type of product with a mean weight of 5 pounds and a
standard deviation of 0.5 pounds. A random sample of 25 products is taken, and the sample mean weight
is found to be 4.8 pounds. Test the hypothesis that the true mean weight of the products is less than 5
pounds with a significance level of 0.01.

In [None]:
import numpy as np
from scipy.stats import norm

sample_mean = 4.8  # Sample mean weight of the products
population_mean = 5  # Hypothesized population mean weight of the products
population_std = 0.5  # Population standard deviation (given in the problem)
sample_size = 25  # Sample size
alpha = 0.01  # Significance level

# Calculate the z-statistic (the population standard deviation is known, so a z-test applies)
z_statistic = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))

# Calculate the critical z-value for a left-tailed test
critical_z_value = norm.ppf(alpha)

# Perform the hypothesis test
if z_statistic < critical_z_value:
    print("Reject the null hypothesis: There is sufficient evidence to conclude that the true mean weight of the products is less than 5 pounds.")
else:
    print("Fail to reject the null hypothesis: There is not enough evidence to conclude that the true mean weight of the products is less than 5 pounds.")

Q16. Two groups of students are given different study materials to prepare for a test. The first group (n1 =
30) has a mean score of 80 with a standard deviation of 10, and the second group (n2 = 40) has a mean
score of 75 with a standard deviation of 8. Test the hypothesis that the population means for the two
groups are equal with a significance level of 0.01.

In [None]:
import numpy as np
from scipy.stats import t

# Given data for Group 1
n1 = 30
mean1 = 80
std1 = 10

# Given data for Group 2
n2 = 40
mean2 = 75
std2 = 8

# Significance level
alpha = 0.01

# Calculate the pooled standard deviation
pooled_std = np.sqrt(((n1 - 1) * std1 ** 2 + (n2 - 1) * std2 ** 2) / (n1 + n2 - 2))

# Calculate the t-statistic
t_statistic = (mean1 - mean2) / (pooled_std * np.sqrt(1 / n1 + 1 / n2))

# Calculate the degrees of freedom
df = n1 + n2 - 2

# Calculate the critical t-value
critical_t_value = t.ppf(1 - alpha / 2, df)

# Perform the hypothesis test
if abs(t_statistic) > critical_t_value:
    print("Reject the null hypothesis: There is sufficient evidence to conclude that the population means for the two groups are not equal.")
else:
    print("Fail to reject the null hypothesis: There is not enough evidence to conclude that the population means for the two groups are not equal.")

Q17. A marketing company wants to estimate the average number of ads watched by viewers during a TV
program. They take a random sample of 50 viewers and find that the sample mean is 4 with a standard
deviation of 1.5. Estimate the population mean with a 99% confidence interval.

In [None]:
import numpy as np

# Given data
sample_mean = 4
sample_std = 1.5
sample_size = 50
confidence_level = 0.99

# Calculate the margin of error
z_score = 2.576  # Z-score for 99% confidence level
margin_of_error = z_score * (sample_std / np.sqrt(sample_size))

# Calculate the confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print("99% Confidence Interval for the population mean number of ads watched: ({:.2f}, {:.2f})".format(lower_bound, upper_bound))