In [None]:
Ans 1 
Estimation statistics is a branch of statistics that deals with the process of estimating unknown population parameters based on sample data.
It involves making inferences about the characteristics of a population by analyzing information collected from a smaller subset, known as a sample.

There are two common types of estimates in estimation statistics: point estimates and interval estimates.

Point Estimate: A point estimate is a single value that is calculated from the sample data and is used to estimate the value of an unknown
population parameter. It provides a best guess or approximation of the parameter. For example, if you want to estimate the average height of 
all students in a school, you can take a sample of students, calculate the mean height of that sample, and use it as a point estimate for 
the average height of all students in the school.

Interval Estimate: An interval estimate, also known as a confidence interval, is a range of values within which the population parameter is 
likely to fall. It provides a range of plausible values instead of a single point. The interval estimate is constructed from the sample data
and is accompanied by a confidence level that describes how reliable the construction procedure is. For example, a 95% confidence interval for
the average height of students might be [160 cm, 170 cm]. This means that if the sampling were repeated many times, about 95% of the intervals
built this way would contain the true average height of all students. It does not mean that this one interval has a 95% probability of containing it.
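
The repeated-sampling reading of a confidence level can be checked with a small simulation. This is only a sketch: the population mean of 165 cm, the standard deviation of 10 cm, the sample size, and the number of repetitions are all assumed values chosen for illustration, not figures from the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical population of student heights (cm); these values are assumptions
true_mean, true_sd = 165.0, 10.0
n, n_repeats = 50, 2000
z = 1.96  # critical value of the standard normal for 95% confidence

covered = 0
for _ in range(n_repeats):
    sample = rng.normal(true_mean, true_sd, n)
    point_estimate = sample.mean()                # point estimate of the mean
    margin = z * sample.std(ddof=1) / np.sqrt(n)  # half-width of the interval
    if point_estimate - margin <= true_mean <= point_estimate + margin:
        covered += 1

coverage = covered / n_repeats
print("Share of intervals containing the true mean:", coverage)
```

The printed share should land close to 0.95, which is what the confidence level describes: the long-run success rate of the procedure, not the probability for any single interval.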

In [None]:
# Ans 2
import math

def estimate_population_mean(sample_mean, sample_std_deviation, sample_size):
    # Calculate the standard error (standard deviation of the sampling distribution)
    standard_error = sample_std_deviation / math.sqrt(sample_size)
    
    # Calculate the margin of error (usually multiplied by a z-score for a desired confidence level)
    margin_of_error = 1.96 * standard_error  # Assuming a 95% confidence level (z-score of 1.96)
    
    # Calculate the lower and upper bounds of the confidence interval
    lower_bound = sample_mean - margin_of_error
    upper_bound = sample_mean + margin_of_error
    
    # Return the estimated population mean and the confidence interval
    return sample_mean, (lower_bound, upper_bound)

sample_mean = 75.2
sample_std_deviation = 5.6
sample_size = 100

population_mean, confidence_interval = estimate_population_mean(sample_mean, sample_std_deviation, sample_size)

print("Estimated population mean:", population_mean)
print("Confidence interval:", confidence_interval)


In [None]:
Ans 3
Hypothesis testing is a statistical technique used to make inferences or draw conclusions about a population based on sample data.
It involves formulating two competing hypotheses, the null hypothesis (H0) and the alternative hypothesis (HA), and assessing the evidence
from the sample to determine which hypothesis is more supported by the data.

The null hypothesis (H0) represents the status quo or the absence of an effect, while the alternative hypothesis (HA) represents the claim
or the presence of an effect. Hypothesis testing helps to evaluate the validity of the alternative hypothesis by testing it against the
null hypothesis.

The importance of hypothesis testing lies in its ability to provide a systematic framework for making decisions and drawing conclusions based on data.
Here are some key reasons why hypothesis testing is valuable:

Objective Decision Making: Hypothesis testing provides an objective and systematic approach to decision making. It allows researchers and
analysts to evaluate the evidence from the data and make an informed decision about whether to reject or fail to reject the null hypothesis.

Inference about Populations: Hypothesis testing enables inference about population parameters based on sample data. It allows us to draw conclusions
about characteristics of a larger population by analyzing a smaller representative sample.

In [None]:
Ans 4 
Null hypothesis (H0): The average weight of male college students is equal to or less than the average weight of female college students.

Alternative hypothesis (HA): The average weight of male college students is greater than the average weight of female college students.

Symbolically:

H0: μm ≤ μf

HA: μm > μf

In this hypothesis, μm represents the population mean weight of male college students, and μf represents the population mean weight of 
female college students. The null hypothesis states that male students are, on average, no heavier than female students, 
while the alternative hypothesis states that the average weight of male college students is 
greater than that of female college students. Because the alternative specifies a direction, this is a one-tailed (right-tailed) test.

This hypothesis can be tested by collecting data on the weights of male and female college students, calculating the sample means for 
each group, and performing statistical tests such as a t-test or a z-test to determine if the observed difference in means is statistically 
significant.
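
A one-sided test of this kind could be sketched as follows. The weights are simulated, hypothetical data (means of 75 kg and 65 kg are assumptions), and the choice of Welch's unequal-variance t-test is also an assumption rather than something the question specifies.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical samples of weights in kg (simulated, not real data)
male_weights = rng.normal(75, 10, 40)
female_weights = rng.normal(65, 9, 40)

# alternative="greater" tests HA: mean(male) > mean(female)
t_stat, p_value = stats.ttest_ind(
    male_weights, female_weights, equal_var=False, alternative="greater"
)

print("t-statistic:", t_stat)
print("one-sided p-value:", p_value)
```

The `alternative` argument is what makes the test one-tailed; leaving it at its default would test for a difference in either direction.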

In [None]:
# Ans 5
import numpy as np
from scipy import stats

# Sample data from two populations
sample1 = [75, 80, 85, 90, 95]  # Sample from population 1
sample2 = [70, 75, 80, 85, 90]  # Sample from population 2

# Set the significance level (alpha)
alpha = 0.05

# Perform a two-sample t-test
t_statistic, p_value = stats.ttest_ind(sample1, sample2)

# Print the results
print("Sample 1 mean:", np.mean(sample1))
print("Sample 2 mean:", np.mean(sample2))
print("t-statistic:", t_statistic)
print("p-value:", p_value)

# Compare the p-value with the significance level to make a decision
if p_value < alpha:
    print("Reject the null hypothesis. There is a significant difference in population means.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference in population means.")


In [None]:
Ans 6 
In statistical hypothesis testing, the null hypothesis (H0) and alternative hypothesis (HA) are two competing statements about
a population parameter or the relationship between variables. These hypotheses are formulated based on the research question or
claim being investigated. Here are some examples to illustrate the concepts of null and alternative hypotheses:

Example: Testing the Effectiveness of a New Drug
Null hypothesis (H0): The new drug has no effect on reducing blood pressure.
Alternative hypothesis (HA): The new drug is effective in reducing blood pressure.
In this example, the null hypothesis suggests that the new drug has no effect, while the alternative hypothesis suggests that 
the new drug is effective in reducing blood pressure. The study would aim to gather evidence to support one of these hypotheses.

Example: Gender Differences in Exam Performance
Null hypothesis (H0): There is no difference in average exam scores between males and females.
Alternative hypothesis (HA): There is a difference in average exam scores between males and females.

In [None]:
Ans 7
Hypothesis testing involves a systematic set of steps to assess the evidence from data and make decisions about the validity of a hypothesis. The general steps in hypothesis testing are as follows:

1. Formulate the Null and Alternative Hypotheses: Clearly state the null hypothesis (H0) and the alternative hypothesis (HA) based on 
the research question or claim being investigated. The null hypothesis typically represents the absence of an effect or no difference, 
while the alternative hypothesis represents the presence of an effect or a difference.

2. Set the Significance Level: Determine the significance level (alpha) for the test, which represents the threshold for deciding whether
to reject the null hypothesis. Commonly used values for alpha are 0.05 (5%) and 0.01 (1%), representing a 95% and 99% confidence level, respectively.

3. Select the Test Statistic: Choose an appropriate test statistic based on the nature of the data and the hypothesis being tested.
The test statistic should be capable of measuring the observed difference or relationship between variables.

4. Determine the Sampling Distribution: Determine the sampling distribution for the chosen test statistic under the assumption that 
the null hypothesis is true. This distribution is used to calculate the probability of obtaining the observed test statistic or a
more extreme value if the null hypothesis were true.

5. Collect and Analyze the Data: Collect the relevant data and perform the necessary calculations or statistical analyses to obtain the test statistic.
Calculate any additional statistics required for the chosen test, such as degrees of freedom or standard error.

6. Calculate the p-value: Calculate the p-value, which is the probability of obtaining the observed test statistic or a more
extreme value under the assumption that the null hypothesis is true. The p-value represents the strength of the evidence against the null hypothesis.

7. Make a Decision: Compare the p-value with the significance level (alpha). If the p-value is less than or equal to alpha, reject 
the null hypothesis in favor of the alternative hypothesis. If the p-value is greater than alpha, fail to reject the null hypothesis, 
suggesting insufficient evidence to support the alternative hypothesis.

8. Draw Conclusions: Interpret the results of the hypothesis test in the context of the research question or claim being investigated. 
Provide conclusions and any relevant insights based on the evidence obtained from the data.

It is important to note that hypothesis testing is a statistical inference procedure and that the conclusions drawn from hypothesis testing
are probabilistic in nature. The results provide evidence for or against a hypothesis, but they do not provide definitive proof or establish causation.
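
The steps above can be sketched in code for a one-sample t-test. The hypothesized mean of 50 and the ten data values are made up for illustration only.

```python
import numpy as np
from scipy import stats

# Step 1: H0: mu = 50, HA: mu != 50 (hypothetical claim)
mu_0 = 50

# Step 2: significance level
alpha = 0.05

# Step 5: collect the data (hypothetical measurements)
sample = np.array([52, 48, 55, 51, 49, 53, 54, 50, 56, 47])

# Steps 3, 4 and 6: the t-statistic, its t-distribution with n - 1
# degrees of freedom, and the two-sided p-value
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu_0)

# Step 7: decision
decision = "reject H0" if p_value <= alpha else "fail to reject H0"

# Step 8: report in context
print("t-statistic:", t_stat)
print("p-value:", p_value)
print("Decision:", decision)
```

For this made-up sample the mean is 51.5 and the t-statistic is about 1.57, so the test fails to reject H0 at the 5% level.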

In [None]:
Ans 8
In hypothesis testing, the p-value is a measure of the strength of evidence against the null hypothesis (H0).
It is the probability of obtaining a test statistic at least as extreme as the one actually observed, assuming that the null hypothesis is true.
The p-value is used to make decisions about whether to reject or fail to reject the null hypothesis.

The significance of the p-value lies in its interpretation and its role in hypothesis testing:

Interpreting the p-value:

A small p-value (typically below the chosen significance level, alpha) indicates that data at least as extreme as those observed would be
unlikely if the null hypothesis were true. It suggests strong evidence against the null hypothesis and supports the alternative hypothesis.

A large p-value (typically above the significance level) indicates that data like those observed would not be unusual if 
the null hypothesis were true. It indicates weak evidence against the null hypothesis and does not support the alternative hypothesis.

Decision-Making:

If the p-value is less than or equal to the significance level (p ≤ alpha), the result is considered statistically significant.
In this case, the null hypothesis is rejected in favor of the alternative hypothesis. It is concluded that there is sufficient evidence 
to support the claim or effect stated in the alternative hypothesis.

If the p-value is greater than the significance level (p > alpha), we fail to reject the null hypothesis. This means the evidence is
insufficient to support the alternative hypothesis, not that the null hypothesis has been shown to be true.
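
As a rough illustration, a p-value can be computed directly from a test statistic and its sampling distribution. The t-statistic of 2.1 and the 24 degrees of freedom below are hypothetical inputs.

```python
from scipy import stats

t_stat = 2.1   # hypothetical observed t-statistic
df = 24        # hypothetical degrees of freedom
alpha = 0.05

# Two-sided p-value: probability of a statistic at least this extreme
# in either tail, assuming H0 is true
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)

# One-sided (right-tailed) p-value: upper tail only
p_right = stats.t.sf(t_stat, df)

print("two-sided p-value:", p_two_sided)
print("right-tailed p-value:", p_right)
print("significant at alpha (two-sided):", p_two_sided <= alpha)
```

Here `sf` is the survival function (1 minus the CDF), which gives the upper-tail probability.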

In [None]:
# Ans 9
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t

# Set the degrees of freedom
df = 10

# Generate x-values for the plot
x = np.linspace(-4, 4, 1000)

# Calculate the corresponding y-values from the t-distribution
y = t.pdf(x, df)

# Plot the t-distribution
plt.plot(x, y, label='t-distribution (df=10)')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.title("Student's t-Distribution (df=10)")
plt.legend()
plt.grid(True)
plt.show()


In [None]:
# Ans 10
import numpy as np
from scipy import stats

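# Generate two random samples (seeded so the results are reproducible)
rng = np.random.default_rng(42)
sample1 = rng.normal(5, 2, 100)  # Sample 1: mean 5, standard deviation 2
sample2 = rng.normal(6, 2, 100)  # Sample 2: mean 6, standard deviation 2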

# Set the significance level (alpha)
alpha = 0.05

# Perform the two-sample t-test
t_statistic, p_value = stats.ttest_ind(sample1, sample2)

# Print the results
print("Sample 1 mean:", np.mean(sample1))
print("Sample 2 mean:", np.mean(sample2))
print("t-statistic:", t_statistic)
print("p-value:", p_value)

# Compare the p-value with the significance level to make a decision
if p_value < alpha:
    print("Reject the null hypothesis. There is a significant difference in population means.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference in population means.")


In [None]:
Ans 11 
The Student's t-distribution, often referred to as the t-distribution, is a probability distribution that arises when estimating
the population mean of a normally distributed variable with a small sample size or when the population standard deviation is unknown. 
It is a fundamental distribution in statistics, widely used in hypothesis testing and confidence interval estimation.

The t-distribution is similar in shape to the standard normal distribution (Z-distribution), but it has heavier tails. The shape of the t-distribution
depends on the degrees of freedom (df), which is determined by the sample size. As the degrees of freedom increase, the t-distribution approaches
the shape of the standard normal distribution.

The t-distribution is used in situations where the underlying population follows a normal distribution, and one of the following conditions is met:

Small Sample Size: When the sample size is small (a common rule of thumb is fewer than 30 observations), the sample standard deviation is 
a noisy estimate of the population standard deviation. The t-distribution accounts for this additional uncertainty
by having heavier tails than the normal distribution.

Unknown Population Standard Deviation: When the population standard deviation is unknown, and it needs to be estimated from the sample, 
the t-distribution is used. In such cases, the sample standard deviation is used as an estimate, and the t-distribution accounts for 
the additional uncertainty in the estimation process.
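
One way to see the heavier tails, and the convergence to the normal distribution, is to compare 95% two-sided critical values. The particular degrees of freedom listed are arbitrary choices for illustration.

```python
from scipy import stats

# Two-sided 95% critical value of the standard normal distribution
z_crit = stats.norm.ppf(0.975)

# The same quantile of the t-distribution for increasing degrees of freedom
t_crits = {df: stats.t.ppf(0.975, df) for df in (2, 5, 10, 30, 100, 1000)}

print("z critical value:", round(z_crit, 4))
for df, value in t_crits.items():
    print(f"t critical value (df={df}):", round(value, 4))
```

The t critical values are always larger than the z value and shrink toward it as the degrees of freedom grow.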

In [None]:
Ans 12 
The t-statistic is a measure used in hypothesis testing to determine the significance of the difference between sample means or between a 
sample mean and a population mean when the population standard deviation is unknown or estimated from the sample data. It quantifies the 
difference between the observed sample mean and the hypothesized population mean in terms of the standard error of the sampling distribution.

The formula for the t-statistic depends on the specific scenario and hypothesis being tested. Here are the commonly used formulas for two scenarios:

One-sample t-test:

Hypothesis: Testing the difference between a sample mean (x̄) and a hypothesized population mean (μ).
Formula: t = (x̄ - μ) / (s / √n)
In this formula, x̄ represents the sample mean, μ represents the hypothesized population mean, s represents the sample standard deviation, and n represents the sample size.

Independent two-sample t-test:

Hypothesis: Testing the difference between two independent sample means (x̄1 and x̄2).
Formula: t = (x̄1 - x̄2) / √((s1^2 / n1) + (s2^2 / n2))
In this formula, x̄1 and x̄2 represent the sample means of the two samples, s1 and s2 represent the sample standard deviations, 
and n1 and n2 represent the sample sizes of the respective samples. This is the unpooled (Welch) form, which does not assume equal population variances.

In both cases, the t-statistic is the observed difference scaled by its standard error. 
It measures how many standard errors the sample mean lies from the hypothesized mean, or how many standard errors separate the two sample means.

The t-statistic is used to calculate the p-value, which is then compared to the chosen significance level (alpha) to make decisions 
about the hypotheses being tested. An absolute t-statistic that is large relative to the critical value indicates a statistically significant
difference between means, supporting the rejection of the null hypothesis in favor of the alternative hypothesis.
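
Both formulas can be checked against SciPy on a small example. The two arrays and the hypothesized mean of 5.0 are made-up values, and `equal_var=False` is passed so that SciPy uses the same unpooled formula as the one written above.

```python
import numpy as np
from scipy import stats

# Hypothetical samples for illustration
a = np.array([5.1, 4.9, 5.6, 5.8, 5.2, 5.5])
b = np.array([4.7, 5.0, 4.6, 5.1, 4.8, 4.5, 4.9])
mu_0 = 5.0  # hypothesized population mean

# One-sample: t = (x̄ - μ) / (s / √n), with s the sample standard deviation
t_one = (a.mean() - mu_0) / (a.std(ddof=1) / np.sqrt(len(a)))

# Two-sample (unpooled): t = (x̄1 - x̄2) / √(s1²/n1 + s2²/n2)
t_two = (a.mean() - b.mean()) / np.sqrt(
    a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b)
)

# The same statistics computed by SciPy
scipy_one = stats.ttest_1samp(a, mu_0).statistic
scipy_two = stats.ttest_ind(a, b, equal_var=False).statistic

print("one-sample t:", t_one, "SciPy:", scipy_one)
print("two-sample t:", t_two, "SciPy:", scipy_two)
```

Note the `ddof=1` arguments: NumPy's default divides by n, while the t-test formulas use the sample standard deviation, which divides by n - 1.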

In [None]:
Ans 13 
Given:

- Sample size (n) = 50
- Sample mean (x̄) = $500
- Sample standard deviation (s) = $50

The formula for calculating the confidence interval is:

Confidence Interval = Sample Mean ± Margin of Error

The margin of error depends on the desired confidence level and the standard error of the mean. Since the sample size is relatively large (n > 30),
we can use the z-score instead of the t-score to determine the critical value for a 95% confidence level.

The critical value for a 95% confidence level (two-tailed test) is approximately 1.96.

First, calculate the standard error of the mean (SE):

SE = s / √n

SE = $50 / √50 ≈ $7.07

Next, calculate the margin of error (ME):

ME = Critical Value * SE

ME = 1.96 * $7.07 ≈ $13.86

Finally, calculate the confidence interval:

Confidence Interval = Sample Mean ± Margin of Error

Confidence Interval = $500 ± $13.86

Therefore, the 95% confidence interval estimate for the population mean revenue is approximately $486.14 to $513.86.

In summary, we can estimate with 95% confidence that the average daily revenue for the coffee shop falls within the range of
approximately $486.14 to $513.86.
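
The calculation above can be reproduced in a few lines. As a sketch, the t-based interval is computed alongside the z-based one to show how little the choice matters at n = 50; the variable names are illustrative.

```python
import math
from scipy import stats

n, mean, sd = 50, 500.0, 50.0   # values given in the question
se = sd / math.sqrt(n)          # standard error of the mean

# z-based interval (the approach used above)
z_crit = stats.norm.ppf(0.975)
z_interval = (mean - z_crit * se, mean + z_crit * se)

# t-based interval with n - 1 degrees of freedom, for comparison
t_crit = stats.t.ppf(0.975, df=n - 1)
t_interval = (mean - t_crit * se, mean + t_crit * se)

print("z-based 95% CI:", tuple(round(v, 2) for v in z_interval))
print("t-based 95% CI:", tuple(round(v, 2) for v in t_interval))
```

The t-based interval is slightly wider because the t critical value exceeds 1.96 for any finite number of degrees of freedom.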

In [None]:
Ans 14 
To test the hypothesis that the new drug decreases blood pressure by 10 mmHg, we can perform a one-sample t-test. 
The null hypothesis (H0) assumes that the mean decrease in blood pressure is equal to 10 mmHg, while the alternative hypothesis (H1)
suggests that the mean decrease is different from 10 mmHg.

Given:
- Sample size (n) = 100
- Sample mean decrease in blood pressure (x̄) = 8 mmHg
- Sample standard deviation (s) = 3 mmHg
- Significance level (alpha) = 0.05

The formula for the t-statistic in a one-sample t-test is:

t = (x̄ - μ) / (s / √n)

where μ represents the hypothesized mean decrease in blood pressure.

Let's perform the calculations:

H0: μ = 10 (Assumed mean decrease of 10 mmHg)
H1: μ ≠ 10 (Mean decrease is different from 10 mmHg)

First, calculate the t-statistic:

t = (x̄ - μ) / (s / √n)
t = (8 - 10) / (3 / √100)
t = -2 / (3 / 10)
t = -20 / 3 ≈ -6.67

Next, determine the critical value(s) based on the significance level and degrees of freedom. Since we have a large sample size (n = 100),
we can approximate the critical value using the standard normal distribution.

For a two-tailed test with alpha = 0.05, the critical values are approximately ±1.96.

Finally, compare the absolute value of the t-statistic to the critical value(s) and make a decision:

|t| > critical value(s) → Reject H0
|t| ≤ critical value(s) → Fail to reject H0

Since |t| = |-6.67| = 6.67 > 1.96, we reject the null hypothesis.

Therefore, based on the given data and a significance level of 0.05, there is sufficient evidence to conclude that the new drug has
a mean decrease in blood pressure different from 10 mmHg.
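
The same test can be sketched from the summary statistics alone. Here the exact t critical value with 99 degrees of freedom is used instead of the normal approximation; that is a choice made for this sketch, and it leads to the same decision.

```python
import math
from scipy import stats

# Values given in the question
n, xbar, s = 100, 8.0, 3.0
mu_0, alpha = 10.0, 0.05

# t = (x̄ - μ) / (s / √n)
t_stat = (xbar - mu_0) / (s / math.sqrt(n))
df = n - 1

# Two-tailed critical value and p-value from the t-distribution
t_crit = stats.t.ppf(1 - alpha / 2, df)
p_value = 2 * stats.t.sf(abs(t_stat), df)

reject = abs(t_stat) > t_crit
print("t-statistic:", round(t_stat, 2))
print("critical value: ±", round(t_crit, 3))
print("p-value:", p_value)
print("Reject H0:", reject)
```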

In [None]:
Ans 15 
To test the hypothesis that the true mean weight of the products is less than 5 pounds, we can perform a one-sample test of the mean. The null hypothesis (H0) assumes that the mean weight is equal to 5 pounds, while the alternative hypothesis (H1) suggests that the mean weight is less than 5 pounds. Because the population standard deviation is given, this is strictly a z-test; the calculation below keeps the t-test framing, with σ used in place of the sample standard deviation.

Given:
- Hypothesized population mean weight (μ) = 5 pounds
- Population standard deviation (σ) = 0.5 pounds
- Sample size (n) = 25
- Sample mean weight (x̄) = 4.8 pounds
- Significance level (alpha) = 0.01

The formula for the t-statistic in a one-sample t-test is:

t = (x̄ - μ) / (s / √n)

where x̄ represents the sample mean, μ represents the hypothesized population mean, s represents the standard deviation 
(here the given σ = 0.5 pounds), and n represents the sample size.

Let's perform the calculations:

H0: μ = 5 (True mean weight is 5 pounds)
H1: μ < 5 (True mean weight is less than 5 pounds)

First, calculate the t-statistic:

t = (x̄ - μ) / (s / √n)
t = (4.8 - 5) / (0.5 / √25)
t = -0.2 / (0.5 / 5)
t = -0.2 / 0.1
t = -2

Next, determine the critical value based on the significance level and degrees of freedom. Since we have a sample size of 25, we have 25 - 1 = 24 
degrees of freedom. For a one-tailed (left-tailed) test with alpha = 0.01 and 24 degrees of freedom, the critical value is approximately -2.492.

Finally, compare the t-statistic to the critical value and make a decision:

t < critical value → Reject H0
t ≥ critical value → Fail to reject H0

Since -2 > -2.492, the test statistic does not fall in the rejection region, so we fail to reject the null hypothesis. (Using the z critical value
of approximately -2.326, which applies when σ is known, leads to the same decision.)

Therefore, based on the given data and a significance level of 0.01, there is insufficient evidence to conclude that the true
mean weight of the products is less than 5 pounds.
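
A short check of this left-tailed test, written as a z-test because σ is given. The t critical value is computed as well for comparison; the variable names are illustrative.

```python
import math
from scipy import stats

# Values given in the question
mu_0, sigma, n, xbar, alpha = 5.0, 0.5, 25, 4.8, 0.01

# Test statistic: (x̄ - μ) / (σ / √n)
z_stat = (xbar - mu_0) / (sigma / math.sqrt(n))

# Left-tailed critical values and p-value
z_crit = stats.norm.ppf(alpha)         # normal, appropriate when σ is known
t_crit = stats.t.ppf(alpha, df=n - 1)  # t with 24 degrees of freedom
p_value = stats.norm.cdf(z_stat)       # lower-tail probability

reject = z_stat < z_crit
print("test statistic:", round(z_stat, 2))
print("z critical value:", round(z_crit, 3))
print("t critical value:", round(t_crit, 3))
print("p-value:", round(p_value, 4))
print("Reject H0:", reject)
```

The p-value of roughly 0.023 is above alpha = 0.01, which matches the decision to fail to reject H0.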

In [None]:
Ans 16
To test the hypothesis that the population means for the two groups are equal, we can perform an independent two-sample t-test. The null hypothesis (H0) assumes that the population means are equal, while the alternative hypothesis (H1) suggests that the population means are different.

Given:
- Group 1: Sample size (n1) = 30, Sample mean (x̄1) = 80, Sample standard deviation (s1) = 10
- Group 2: Sample size (n2) = 40, Sample mean (x̄2) = 75, Sample standard deviation (s2) = 8
- Significance level (alpha) = 0.01

The formula for the t-statistic in an independent two-sample t-test is:

t = (x̄1 - x̄2) / √((s1^2 / n1) + (s2^2 / n2))

where x̄1 and x̄2 represent the sample means of the two groups, s1 and s2 represent the sample standard deviations,
n1 and n2 represent the sample sizes of the respective groups.

Let's perform the calculations:

H0: μ1 = μ2 (Population means of the two groups are equal)
H1: μ1 ≠ μ2 (Population means of the two groups are different)

First, calculate the t-statistic:

t = (x̄1 - x̄2) / √((s1^2 / n1) + (s2^2 / n2))
t = (80 - 75) / √((10^2 / 30) + (8^2 / 40))
t = 5 / √((100/30) + (64/40))
t = 5 / √(10/3 + 8/5)
t = 5 / √(50/15 + 24/15)
t = 5 / √(74/15)
t ≈ 5 / √(4.9333)
t ≈ 5 / 2.221
t ≈ 2.25

Next, determine the critical value based on the significance level and degrees of freedom. A simple choice for two independent samples
is df = (n1 + n2) - 2 = (30 + 40) - 2 = 68. Strictly, that formula belongs to the pooled-variance test; for the unpooled formula used here,
the Welch-Satterthwaite approximation gives about 54 degrees of freedom. For a two-tailed test with alpha = 0.01, the critical value is
approximately ±2.65 with 68 degrees of freedom (and approximately ±2.67 with 54).

Finally, compare the absolute value of the t-statistic to the critical value and make a decision:

|t| > critical value → Reject H0
|t| ≤ critical value → Fail to reject H0

Since |t| ≈ 2.25 ≤ 2.65, we fail to reject the null hypothesis.

Therefore, based on the given data and a significance level of 0.01, there is insufficient evidence to conclude that 
the population means for the two groups are different.
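
SciPy can run this test directly from the summary statistics. The sketch below runs both the unpooled (Welch) and the pooled version, since the question does not say whether equal variances should be assumed.

```python
from scipy import stats

alpha = 0.01

# Unpooled (Welch) test, matching the formula used above
welch = stats.ttest_ind_from_stats(
    mean1=80, std1=10, nobs1=30,
    mean2=75, std2=8, nobs2=40,
    equal_var=False,
)

# Pooled-variance test, which uses df = n1 + n2 - 2 = 68
pooled = stats.ttest_ind_from_stats(
    mean1=80, std1=10, nobs1=30,
    mean2=75, std2=8, nobs2=40,
    equal_var=True,
)

print("Welch  t:", round(welch.statistic, 3), "p:", round(welch.pvalue, 4))
print("Pooled t:", round(pooled.statistic, 3), "p:", round(pooled.pvalue, 4))
print("Reject H0 (Welch): ", welch.pvalue <= alpha)
print("Reject H0 (pooled):", pooled.pvalue <= alpha)
```

Both versions give a p-value above 0.01, so the conclusion does not depend on the equal-variance assumption.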

In [None]:
Ans 17
To estimate the population mean with a 99% confidence interval, we can use the sample mean and standard deviation along with the appropriate formula.

Given:
- Sample size (n) = 50
- Sample mean (x̄) = 4
- Sample standard deviation (s) = 1.5
- Confidence level = 99%

The formula for calculating the confidence interval is:

Confidence Interval = Sample Mean ± Margin of Error

The margin of error depends on the desired confidence level and the standard error of the mean.

Since the sample size is relatively large (n > 30), we can use the z-score instead of the t-score to determine the
critical value for a 99% confidence level.

The critical value for a 99% confidence level (two-tailed test) is approximately 2.576.

First, calculate the standard error of the mean (SE):

SE = s / √n
SE = 1.5 / √50 ≈ 0.2121

Next, calculate the margin of error (ME):

ME = Critical Value * SE
ME = 2.576 * 0.2121 ≈ 0.5464

Finally, calculate the confidence interval:

Confidence Interval = Sample Mean ± Margin of Error
Confidence Interval = 4 ± 0.5464

Therefore, the 99% confidence interval estimate for the population mean is approximately 3.4536 to 4.5464.

In summary, we can estimate with 99% confidence that the average number of ads watched by viewers during the TV program falls within
the range of approximately 3.4536 to 4.5464.
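
The interval can also be obtained from SciPy's `interval` method, which returns the two endpoints directly. The t-based interval is included only for comparison and is not part of the calculation above.

```python
import math
from scipy import stats

n, mean, sd = 50, 4.0, 1.5   # values given in the question
se = sd / math.sqrt(n)       # standard error of the mean

# 99% interval based on the normal distribution (as in the calculation above)
lower, upper = stats.norm.interval(0.99, loc=mean, scale=se)

# 99% interval based on the t-distribution with n - 1 degrees of freedom
t_lower, t_upper = stats.t.interval(0.99, n - 1, loc=mean, scale=se)

print("z-based 99% CI:", (round(lower, 4), round(upper, 4)))
print("t-based 99% CI:", (round(t_lower, 4), round(t_upper, 4)))
```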