Q1: What is Estimation Statistics? Explain point estimate and interval estimate.
Estimation Statistics is the process of making inferences about a population based on a sample. It is used to estimate the parameters of the population (e.g., mean, variance).
•	Point Estimate: A single-value estimate of a population parameter. For example, using the sample mean (\(\bar{x}\)) as an estimate for the population mean (\(\mu\)).
•	Interval Estimate: A range of values (an interval) within which the population parameter is expected to lie. For example, a confidence interval for the population mean.
________________________________________
Q2: Write a Python function to estimate the population mean using a sample mean and standard deviation.

import scipy.stats as stats
import numpy as np

def estimate_population_mean(sample_data, confidence_level=0.95):
    sample_mean = np.mean(sample_data)
    sample_std = np.std(sample_data, ddof=1)
    sample_size = len(sample_data)
    
    # Calculate confidence interval using the t critical value (df = n - 1)
    t_critical = stats.t.ppf((1 + confidence_level) / 2, sample_size - 1)
    margin_of_error = t_critical * (sample_std / np.sqrt(sample_size))
    
    lower_bound = sample_mean - margin_of_error
    upper_bound = sample_mean + margin_of_error
    
    return sample_mean, (lower_bound, upper_bound)

# Example
sample_data = [500, 520, 480, 510, 530]
mean, conf_interval = estimate_population_mean(sample_data)
print(f"Sample mean: {mean}, 95% Confidence Interval: {conf_interval}")
________________________________________
Q3: What is Hypothesis Testing? Why is it used? State the importance of Hypothesis Testing.
Hypothesis Testing is a statistical method used to make decisions or inferences about population parameters based on sample data. It involves testing an assumption (hypothesis) about a population parameter (e.g., mean, proportion).
•	Importance: It lets researchers assess whether an effect observed in the sample is statistically significant or could plausibly have arisen by random chance alone.
________________________________________
Q4: Create a hypothesis that states whether the average weight of male college students is greater than the average weight of female college students.
Null Hypothesis (\(H_0\)): The average weight of male college students is equal to the average weight of female college students.
Alternative Hypothesis (\(H_a\)): The average weight of male college students is greater than the average weight of female college students.
________________________________________
Q5: Write a Python script to conduct a hypothesis test on the difference between two population means, given a sample from each population.
import numpy as np
from scipy import stats

# Hypothesis test for two independent samples
def hypothesis_test_two_means(sample1, sample2, alpha=0.05):
    mean1 = np.mean(sample1)
    mean2 = np.mean(sample2)
    std1 = np.std(sample1, ddof=1)
    std2 = np.std(sample2, ddof=1)
    n1 = len(sample1)
    n2 = len(sample2)
    
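    # Standard error of the difference (Welch's form, variances not pooled)
    std_error = np.sqrt((std1**2 / n1) + (std2**2 / n2))
    
    # t-statistic
    t_stat = (mean1 - mean2) / std_error
    
    # Degrees of freedom (Welch-Satterthwaite approximation)
    df = (std1**2 / n1 + std2**2 / n2)**2 / (
        (std1**2 / n1)**2 / (n1 - 1) + (std2**2 / n2)**2 / (n2 - 1)
    )
    
    # Two-sided p-value
    p_value = 2 * stats.t.sf(abs(t_stat), df)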
    
    # Decision
    if p_value < alpha:
        return f"Reject the null hypothesis (p-value = {p_value})"
    else:
        return f"Fail to reject the null hypothesis (p-value = {p_value})"

# Example
sample1 = [80, 82, 78, 85, 79]
sample2 = [75, 78, 72, 77, 76]
print(hypothesis_test_two_means(sample1, sample2))
________________________________________
Q6: What is a null and alternative hypothesis? Give some examples.
•	Null Hypothesis (\(H_0\)): A statement that there is no effect or no difference in the population. It is assumed to be true until the evidence suggests otherwise.
•	Alternative Hypothesis (\(H_a\)): A statement that there is an effect or a difference. It is the claim the researcher seeks evidence for.
Example:
•	\(H_0\): The average height of men is equal to the average height of women.
•	\(H_a\): The average height of men is different from the average height of women.
________________________________________
Q7: Write down the steps involved in hypothesis testing.
1.	State the null and alternative hypotheses.
2.	Choose the significance level (\(\alpha\)), usually 0.05 or 0.01.
3.	Collect data and calculate the test statistic (e.g., t-statistic or z-statistic).
4.	Determine the p-value corresponding to the test statistic.
5.	Make a decision:
o	If \(p \leq \alpha\), reject the null hypothesis.
o	If \(p > \alpha\), fail to reject the null hypothesis.
6.	Interpret the result in context.
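The steps above can be sketched in Python for a one-sample, two-sided t-test. The sample values, the hypothesized mean `mu_0`, and the helper name `run_one_sample_test` are hypothetical and chosen only for illustration; `scipy.stats.ttest_1samp` handles steps 3 and 4.

```python
from scipy import stats

def run_one_sample_test(sample, mu_0, alpha=0.05):
    """Two-sided one-sample t-test following the steps listed above."""
    # Step 1: H0: mu = mu_0, Ha: mu != mu_0
    # Step 2: the significance level is passed in as alpha
    # Steps 3-4: compute the test statistic and its p-value
    t_stat, p_value = stats.ttest_1samp(sample, popmean=mu_0)
    # Step 5: decide (reject when p <= alpha)
    reject = p_value <= alpha
    return t_stat, p_value, reject

# Hypothetical data: is the population mean different from 50?
sample = [52, 55, 49, 58, 54, 53, 57, 51]
t_stat, p_value, reject = run_one_sample_test(sample, mu_0=50)

# Step 6: interpret the result in context
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, reject H0: {reject}")
```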
________________________________________
Q8: Define p-value and explain its significance in hypothesis testing.
The p-value is the probability of obtaining test results at least as extreme as the observed results, under the assumption that the null hypothesis is true. It measures the strength of the evidence against the null hypothesis; it is not the probability that the null hypothesis is true.
•	Significance: A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, leading to its rejection.
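As a rough illustration of the definition, the p-value of a t statistic can be read off the tails of the t-distribution. The helper name `p_value_from_t` and the example values are assumptions made for this sketch.

```python
from scipy import stats

def p_value_from_t(t_stat, df, tail="two-sided"):
    """Return the p-value of a t statistic under the null hypothesis."""
    if tail == "two-sided":
        return 2 * stats.t.sf(abs(t_stat), df)   # area in both tails
    if tail == "less":
        return stats.t.cdf(t_stat, df)           # area in the lower tail
    if tail == "greater":
        return stats.t.sf(t_stat, df)            # area in the upper tail
    raise ValueError("tail must be 'two-sided', 'less', or 'greater'")

# Illustrative values: t = 2.1 with 20 degrees of freedom
for tail in ("two-sided", "less", "greater"):
    print(tail, round(p_value_from_t(2.1, 20, tail), 4))
```

Which tail to use follows from the alternative hypothesis: "different from" is two-sided, "less than" uses the lower tail, and "greater than" uses the upper tail.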
________________________________________
Q9: Generate a Student's t-distribution plot using Python's matplotlib library, with the degrees of freedom parameter set to 10.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t

# Degrees of freedom
df = 10

# Generate x values
x = np.linspace(-5, 5, 1000)

# Generate t-distribution y values
y = t.pdf(x, df)

# Plotting the t-distribution
plt.plot(x, y, label=f't-distribution (df={df})')
plt.title("Student's t-distribution")
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.show()
________________________________________
Q10: Write a Python program to calculate the two-sample t-test for independent samples, given two random samples of equal size and a null hypothesis that the population means are equal.
from scipy import stats

# Two independent samples
sample1 = [82, 85, 87, 86, 84]
sample2 = [78, 80, 81, 79, 82]

# Perform t-test
t_stat, p_value = stats.ttest_ind(sample1, sample2)

print(f"T-statistic: {t_stat}, P-value: {p_value}")

# Interpretation
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
________________________________________
Q11: What is Student’s t-distribution? When to use the t-Distribution?
Student’s t-distribution is used to estimate population parameters when the sample size is small, and the population standard deviation is unknown. It is symmetric and bell-shaped but has heavier tails than the normal distribution.
•	When to use: When dealing with small sample sizes (typically \(n < 30\)) and when the population standard deviation is unknown.
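The heavier tails can be seen by comparing critical values. The sketch below prints the two-sided 95% critical value of the t-distribution for a few degrees of freedom (chosen arbitrarily) next to the normal value; the t value is larger for small samples and approaches the normal value as the degrees of freedom grow.

```python
from scipy import stats

# Two-sided 95% critical values: t approaches z as df grows
z_crit = stats.norm.ppf(0.975)
for df in (5, 10, 30, 100, 1000):
    t_crit = stats.t.ppf(0.975, df)
    print(f"df={df:>4}: t = {t_crit:.3f}  (z = {z_crit:.3f})")
```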
________________________________________
Q12: What is t-statistic? State the formula for t-statistic.
The t-statistic measures how far the sample mean is from the hypothesized population mean, in units of the standard error. It is used in hypothesis testing when the population standard deviation is unknown.
•	Formula:
\[t = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}}\]
Where:
o	\(\bar{x}\) = sample mean,
o	\(\mu\) = population mean,
o	\(s\) = sample standard deviation,
o	\(n\) = sample size.
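A minimal sketch of the formula as a Python function. The function name `t_statistic` and the example numbers are hypothetical.

```python
import math

def t_statistic(sample_mean, mu, sample_std, n):
    """One-sample t statistic: (x_bar - mu) / (s / sqrt(n))."""
    standard_error = sample_std / math.sqrt(n)
    return (sample_mean - mu) / standard_error

# Illustrative values: x_bar = 52, mu = 50, s = 4, n = 16
print(t_statistic(52, 50, 4, 16))  # (52 - 50) / (4 / 4) = 2.0
```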
________________________________________
Q13: Estimate the population mean revenue with a 95% confidence interval.
We are given:
•	Sample mean (\(\bar{x}\)) = $500
•	Sample standard deviation (\(s\)) = $50
•	Sample size (\(n\)) = 50
•	Confidence level = 95%
To calculate the confidence interval, we use the formula:
\[CI = \bar{x} \pm z_{\alpha/2} \times \frac{s}{\sqrt{n}}\]
Where:
•	\(z_{\alpha/2}\) is the critical value for a 95% confidence interval (1.96 for a normal distribution).
•	\(\frac{s}{\sqrt{n}}\) is the standard error.
Solution:
\[CI = 500 \pm 1.96 \times \frac{50}{\sqrt{50}} = 500 \pm 1.96 \times 7.071 = 500 \pm 13.86\]
\[CI = (486.14, 513.86)\]
The 95% confidence interval for the population mean revenue is ($486.14, $513.86).
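The calculation can be checked with a short script. This is a sketch using the normal critical value from `scipy.stats.norm.ppf`; the helper name `z_confidence_interval` is an assumption of this example.

```python
import math
from scipy import stats

def z_confidence_interval(sample_mean, sample_std, n, confidence_level):
    """Normal-approximation confidence interval from summary statistics."""
    z = stats.norm.ppf((1 + confidence_level) / 2)   # z_{alpha/2}
    margin = z * sample_std / math.sqrt(n)           # margin of error
    return sample_mean - margin, sample_mean + margin

lower, upper = z_confidence_interval(500, 50, 50, 0.95)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```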
________________________________________
Q14: Hypothesis test for decrease in blood pressure (significance level of 0.05).
We are given:
•	Sample mean (\(\bar{x}\)) = 8 mmHg
•	Population mean (\(\mu\)) = 10 mmHg (hypothesized)
•	Sample standard deviation (\(s\)) = 3 mmHg
•	Sample size (\(n\)) = 100
•	Significance level (\(\alpha\)) = 0.05
Null Hypothesis (\(H_0\)): The new drug decreases blood pressure by 10 mmHg. (\(\mu = 10\))
Alternative Hypothesis (\(H_a\)): The new drug decreases blood pressure by less than 10 mmHg. (\(\mu < 10\))
We perform a one-sample t-test:
\[t = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}} = \frac{8 - 10}{\frac{3}{\sqrt{100}}} = \frac{-2}{0.3} = -6.67\]
Next, we find the critical value for a one-tailed test at \(\alpha = 0.05\) with 99 degrees of freedom. Using a t-distribution table, the critical value is approximately \(-1.66\).
Since \(t = -6.67\) is less than \(-1.66\), we reject the null hypothesis. There is significant evidence that the drug decreases blood pressure by less than 10 mmHg.
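The critical-value comparison above can be reproduced in Python. This sketch uses `scipy.stats.t.ppf` instead of a table lookup; the variable names are chosen for this example.

```python
import math
from scipy import stats

x_bar, mu_0, s, n, alpha = 8, 10, 3, 100, 0.05

t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
t_crit = stats.t.ppf(alpha, df=n - 1)   # lower-tail critical value

print(f"t = {t_stat:.2f}, critical value = {t_crit:.3f}")
print("Reject H0" if t_stat < t_crit else "Fail to reject H0")
```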
________________________________________
Q15: Hypothesis test for product weight (significance level of 0.01).
We are given:
•	Sample mean (\(\bar{x}\)) = 4.8 pounds
•	Population mean (\(\mu\)) = 5 pounds (hypothesized)
•	Sample standard deviation (\(s\)) = 0.5 pounds
•	Sample size (\(n\)) = 25
•	Significance level (\(\alpha\)) = 0.01
Null Hypothesis (\(H_0\)): The true mean weight is 5 pounds. (\(\mu = 5\))
Alternative Hypothesis (\(H_a\)): The true mean weight is less than 5 pounds. (\(\mu < 5\))
We perform a one-sample t-test:
\[t = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}} = \frac{4.8 - 5}{\frac{0.5}{\sqrt{25}}} = \frac{-0.2}{0.1} = -2\]
Using a t-distribution table, the critical value for a one-tailed test at \(\alpha = 0.01\) with 24 degrees of freedom is approximately \(-2.492\).
Since \(t = -2\) is greater than \(-2.492\), we fail to reject the null hypothesis. There is not enough evidence to conclude that the true mean weight is less than 5 pounds.
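The same decision can be reached through the p-value instead of the critical value. The sketch below computes the lower-tail p-value with `scipy.stats.t.cdf` and compares it to \(\alpha\); the variable names are chosen for this example.

```python
import math
from scipy import stats

x_bar, mu_0, s, n, alpha = 4.8, 5, 0.5, 25, 0.01

t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
p_value = stats.t.cdf(t_stat, df=n - 1)   # lower-tail p-value

print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

The p-value exceeds 0.01, which matches the conclusion from the critical-value comparison.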
________________________________________
Q16: Hypothesis test for difference in test scores (significance level of 0.01).
We are given:
•	\(n_1 = 30\), \(\bar{x}_1 = 80\), \(s_1 = 10\)
•	\(n_2 = 40\), \(\bar{x}_2 = 75\), \(s_2 = 8\)
•	Significance level (\(\alpha\)) = 0.01
Null Hypothesis (\(H_0\)): The population means are equal (\(\mu_1 = \mu_2\)).
Alternative Hypothesis (\(H_a\)): The population means are not equal (\(\mu_1 \neq \mu_2\)).
We perform a two-sample t-test:
\[t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{80 - 75}{\sqrt{\frac{10^2}{30} + \frac{8^2}{40}}} = \frac{5}{\sqrt{3.33 + 1.6}} = \frac{5}{2.22} = 2.25\]
Degrees of freedom can be calculated using the Welch-Satterthwaite formula for unequal variances (about 54 here) or approximated as 30 + 40 - 2 = 68.
Using a t-distribution table, the critical value for a two-tailed test at \(\alpha = 0.01\) with 68 degrees of freedom is approximately \(\pm 2.65\).
Since \(t = 2.25\) is less than \(2.65\), we fail to reject the null hypothesis. There is not enough evidence at the 0.01 level to conclude that the population means differ. The conclusion is the same with the smaller Welch degrees of freedom, since the critical value only increases.
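The test can also be run directly from the summary statistics with `scipy.stats.ttest_ind_from_stats`. This sketch sets `equal_var=False`, which selects Welch's test and lets SciPy compute the unequal-variance degrees of freedom; that choice is an assumption of this example, not something the question specifies.

```python
from scipy import stats

# Welch's test from summary statistics (no equal-variance assumption)
welch = stats.ttest_ind_from_stats(
    mean1=80, std1=10, nobs1=30,
    mean2=75, std2=8, nobs2=40,
    equal_var=False,
)

print(f"t = {welch.statistic:.3f}, p = {welch.pvalue:.4f}")
print("Reject H0" if welch.pvalue <= 0.01 else "Fail to reject H0")
```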
________________________________________
Q17: Estimate the population mean with a 99% confidence interval.
We are given:
•	Sample mean (\(\bar{x}\)) = 4
•	Sample standard deviation (\(s\)) = 1.5
•	Sample size (\(n\)) = 50
•	Confidence level = 99%
The confidence interval is calculated as:
\[CI = \bar{x} \pm z_{\alpha/2} \times \frac{s}{\sqrt{n}}\]
For a 99% confidence level, the critical value \(z_{\alpha/2}\) is 2.576.
\[CI = 4 \pm 2.576 \times \frac{1.5}{\sqrt{50}} = 4 \pm 2.576 \times 0.212 = 4 \pm 0.546\]
\[CI = (3.454, 4.546)\]
The 99% confidence interval for the population mean is (3.454, 4.546).
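Because the standard deviation here is a sample estimate, a t-based interval is a reasonable alternative to the z-based one. The sketch below computes both with `scipy.stats.norm.interval` and `scipy.stats.t.interval`; with \(n = 50\) the t interval comes out only slightly wider.

```python
import math
from scipy import stats

x_bar, s, n, level = 4, 1.5, 50, 0.99
se = s / math.sqrt(n)   # standard error of the mean

z_interval = stats.norm.interval(level, loc=x_bar, scale=se)
t_interval = stats.t.interval(level, n - 1, loc=x_bar, scale=se)

print(f"z-based 99% CI: ({z_interval[0]:.3f}, {z_interval[1]:.3f})")
print(f"t-based 99% CI: ({t_interval[0]:.3f}, {t_interval[1]:.3f})")
```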

