Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would
use each type of test.

Difference between t-test and z-test:

Type of Data:

Z-Test: Typically used when the population standard deviation is known.
T-Test: Used when the population standard deviation is unknown and estimated from the sample.

Sample Size:

Z-Test: Suitable for large sample sizes (typically n > 30).
T-Test: Appropriate for small to moderate sample sizes.

Distribution:

Z-Test: Assumes a normally distributed population, or a sample large enough for the central limit theorem to apply, so the test statistic follows the standard normal distribution.
T-Test: Assumes an approximately normal population; the test is fairly robust to deviations from normality, especially with larger sample sizes.

Critical Values:

Z-Test: Critical values are obtained from the standard normal distribution.
T-Test: Critical values depend on the degrees of freedom and are obtained from the t-distribution.

Example Scenario:

Z-Test: Imagine you are testing whether the average height of students in a large university is different from a known national average height. If the national standard deviation of height is known, you might use a z-test.
T-Test: Now, suppose you want to test whether a new teaching method significantly improves student performance. You take a small sample of students, and the population standard deviation is unknown. In this case, you would use a t-test.

Code Example:

Let's consider a scenario where we are comparing the average test scores of two groups:

If the population standard deviation of test scores is known, you might use a z-test.
If the population standard deviation is unknown and you have small sample sizes, a t-test would be more appropriate.

In [1]:
import numpy as np
from scipy.stats import norm, ttest_ind

# Generate two groups of test scores
np.random.seed(42)
sigma1, sigma2 = 10, 12  # Population standard deviations, treated as known for the z-test
group1_scores = np.random.normal(loc=75, scale=sigma1, size=30)
group2_scores = np.random.normal(loc=80, scale=sigma2, size=25)

# Perform two-sample z-test (population standard deviations are known)
standard_error = np.sqrt(sigma1**2 / len(group1_scores) + sigma2**2 / len(group2_scores))
z_stat = (np.mean(group1_scores) - np.mean(group2_scores)) / standard_error
p_value_z = 2 * norm.sf(abs(z_stat))  # Two-tailed p-value
z_test_result = p_value_z < 0.05  # Using a 5% significance level

# Perform t-test (population standard deviation is unknown)
t_stat, p_value_t = ttest_ind(group1_scores, group2_scores)
t_test_result = p_value_t < 0.05  # Using a 5% significance level

print(f"Z-Test Result: {'Reject' if z_test_result else 'Fail to reject'} the null hypothesis")
print(f"T-Test Result: {'Reject' if t_test_result else 'Fail to reject'} the null hypothesis")


Z-Test Result: Fail to reject the null hypothesis
T-Test Result: Fail to reject the null hypothesis


Q2: Differentiate between one-tailed and two-tailed tests.

**One-Tailed Test vs. Two-Tailed Test:**

**1. Definition:**
   - **One-Tailed Test:** In a one-tailed test, the critical region is on only one side of the distribution, either the right side or the left side.
   - **Two-Tailed Test:** In a two-tailed test, the critical region is on both sides of the distribution, allowing for the detection of differences in either direction.

**2. Hypotheses:**
   - **One-Tailed Test:** The null hypothesis (\(H_0\)) and alternative hypothesis (\(H_a\)) are defined in terms of a specific direction of effect (e.g., greater than or less than).
   - **Two-Tailed Test:** The null hypothesis assumes no effect, and the alternative hypothesis states that there is a significant effect, without specifying the direction.

**3. Critical Region:**
   - **One-Tailed Test:** The critical region is located in one tail of the distribution, either the right tail (greater than) or the left tail (less than).
   - **Two-Tailed Test:** The critical region is split between the two tails of the distribution.

**4. Significance Level:**
   - **One-Tailed Test:** The full significance level (\(\alpha\)) is placed in a single tail, so the critical region has area \(\alpha\). Common values include 0.05 or 0.01.
   - **Two-Tailed Test:** The significance level is split evenly between the two tails, so each tail has area \(\alpha/2\). Common values include 0.05 or 0.01.

**5. Example:**
   - **One-Tailed Test:** Suppose you are testing whether a new drug increases the average response time. Your null hypothesis (\(H_0\)) might be "The drug has no effect or decreases response time," and the alternative hypothesis (\(H_a\)) could be "The drug increases response time."
   - **Two-Tailed Test:** Suppose you are testing whether a new teaching method has any effect on student performance. Your null hypothesis might be "The teaching method has no effect," and the alternative hypothesis could be "The teaching method has a significant effect, whether positive or negative."

**6. Decision Rule:**
   - **One-Tailed Test:** You reject the null hypothesis if the test statistic falls into the critical region in the specified direction.
   - **Two-Tailed Test:** You reject the null hypothesis if the test statistic falls into either of the two critical regions.

**7. Sensitivity:**
   - **One-Tailed Test:** More sensitive to detecting effects in a specific direction.
   - **Two-Tailed Test:** More flexible, capable of detecting effects in either direction.

**8. Directionality:**
   - **One-Tailed Test:** Assumes and tests for an effect in a specific direction.
   - **Two-Tailed Test:** Tests for a significant effect without assuming a specific direction.

In summary, the choice between a one-tailed and two-tailed test depends on the research question and the expected direction of the effect. A one-tailed test is used when the research hypothesis specifies a direction, while a two-tailed test is more conservative and detects effects in either direction.
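
As a minimal sketch of the difference, the snippet below compares critical values and p-values for a right-tailed and a two-tailed z-test. The observed statistic `z_observed = 1.80` and `alpha = 0.05` are hypothetical values chosen for illustration, not taken from the text.

```python
from scipy.stats import norm

alpha = 0.05        # significance level (assumed for illustration)
z_observed = 1.80   # hypothetical test statistic

# One-tailed (right-tailed) test: all of alpha sits in the upper tail.
critical_one_tailed = norm.ppf(1 - alpha)
p_one_tailed = norm.sf(z_observed)

# Two-tailed test: alpha is split evenly, alpha / 2 in each tail.
critical_two_tailed = norm.ppf(1 - alpha / 2)
p_two_tailed = 2 * norm.sf(abs(z_observed))

print(f"One-tailed: critical z = {critical_one_tailed:.3f}, p = {p_one_tailed:.4f}")
print(f"Two-tailed: critical z = +/-{critical_two_tailed:.3f}, p = {p_two_tailed:.4f}")
```

With these inputs the same statistic is significant in the one-tailed test but not in the two-tailed test, which is why the direction must be fixed before looking at the data.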

Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for
each type of error.

**Type I Error and Type II Error in Hypothesis Testing:**

**1. Type I Error (False Positive):**
   - **Definition:** Type I error occurs when the null hypothesis (\(H_0\)) is incorrectly rejected when it is actually true. In other words, it's a false positive or an incorrect claim of a significant effect.
   - **Symbol:** \(\alpha\) (alpha)
   - **Probability:** The probability of committing a Type I error is denoted by \(\alpha\), the significance level.
   - **Example Scenario:** Suppose a medical test is designed to detect a rare disease, and the null hypothesis is "The patient does not have the disease." If a healthy person receives a positive result (rejecting the null hypothesis), it would be a Type I error.

**2. Type II Error (False Negative):**
   - **Definition:** Type II error occurs when the null hypothesis (\(H_0\)) is incorrectly not rejected when it is actually false. In other words, it's a false negative or a failure to detect a true effect.
   - **Symbol:** \(\beta\) (beta)
   - **Probability:** The probability of committing a Type II error is denoted by \(\beta\).
   - **Example Scenario:** Consider the same medical test, with the null hypothesis "The patient does not have the disease." If a person with the disease receives a negative result (failing to reject the null hypothesis), it would be a Type II error.

**Example Scenarios:**

1. **Type I Error (False Positive):**
   - **Scenario:** A company claims that a new drug increases productivity in the workplace. They conduct a study, and the null hypothesis (\(H_0\)) is "The new drug has no effect on productivity."
   - **Type I Error:** If, based on the study, they incorrectly reject the null hypothesis and claim that the drug significantly increases productivity (when it actually doesn't), it would be a Type I error.

2. **Type II Error (False Negative):**
   - **Scenario:** A quality control process is implemented to detect defective products in a manufacturing line. The null hypothesis (\(H_0\)) is "The product is not defective."
   - **Type II Error:** If, based on the quality control process, they fail to reject the null hypothesis and conclude that the product is not defective (when it actually is defective), it would be a Type II error.

**Trade-Off:**
- Adjusting the significance level (\(\alpha\)) can influence the rates of Type I and Type II errors. Lowering \(\alpha\) decreases the probability of Type I error but increases the probability of Type II error, and vice versa.
- There is often a trade-off between Type I and Type II errors, and researchers need to choose an appropriate balance based on the context and consequences of errors in a specific study.
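
The trade-off can be sketched numerically. The helper below is a hypothetical illustration for a right-tailed one-sample z-test with known standard deviation; the effect size (0.3 standard deviations) and sample size (50) are assumed values, not from the text.

```python
from scipy.stats import norm

def type_ii_error(alpha, effect_size, n):
    """Probability of a Type II error (beta) for a right-tailed one-sample z-test.

    effect_size is the true mean shift in units of the population
    standard deviation (a hypothetical input for illustration).
    """
    z_critical = norm.ppf(1 - alpha)
    # Under the alternative, the test statistic is centred at effect_size * sqrt(n).
    return norm.cdf(z_critical - effect_size * n ** 0.5)

for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, effect_size=0.3, n=50)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Holding the effect size and sample size fixed, each reduction in \(\alpha\) raises \(\beta\); increasing the sample size is the usual way to lower both at once.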

Q4: Explain Bayes's theorem with an example.


Bayes's Theorem:

Bayes's Theorem is a mathematical formula that describes the probability of an event based on prior knowledge of conditions that might be related to the event. It is named after Thomas Bayes, an 18th-century statistician and theologian. The theorem is particularly useful in updating probabilities as new evidence becomes available.

The formula for Bayes's Theorem is given by:

\[ P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)} \]

where:

- \( P(A \mid B) \) is the probability of event A occurring given that event B has occurred (posterior probability).
- \( P(B \mid A) \) is the probability of event B occurring given that event A has occurred (likelihood).
- \( P(A) \) is the prior probability of event A.
- \( P(B) \) is the marginal probability of event B (the evidence).

Example Scenario: Medical Diagnosis

Let's consider a medical diagnosis scenario where:

- Event A: The patient has a certain medical condition (e.g., a rare disease).
- Event B: The patient tests positive for the condition.

Given Information:

- The prevalence of the medical condition in the general population is low (\( P(A) \)).
- The sensitivity of the medical test (the probability of testing positive given that the patient has the condition) is known (\( P(B \mid A) \)).
- The false positive rate of the test (the probability of testing positive given that the patient does not have the condition) is also known (\( P(B \mid \neg A) \)).

Using Bayes's Theorem:

\[ P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)} \]

- \( P(A \mid B) \): Probability that the patient has the condition given a positive test result.
- \( P(B \mid A) \): Sensitivity of the test (probability of a positive test given the condition).
- \( P(A) \): Prior probability of the patient having the condition.
- \( P(B) \): Total probability of testing positive (considering both true positives and false positives).

Calculation Steps:

1. Calculate the prior probability of the patient having the condition (\( P(A) \)).
2. Calculate the total probability of testing positive, \( P(B) = P(B \mid A) \cdot P(A) + P(B \mid \neg A) \cdot P(\neg A) \).
3. Apply Bayes's Theorem to calculate the posterior probability of having the condition given a positive test result (\( P(A \mid B) \)).

In [2]:
# Given information
prevalence_condition = 0.01  # Example: 1% prevalence in the general population
sensitivity_test = 0.95  # Example: 95% sensitivity
false_positive_rate = 0.05  # Example: 5% false positive rate

# Calculation
prior_probability_condition = prevalence_condition
prior_probability_no_condition = 1 - prior_probability_condition

# Total probability of testing positive
total_probability_positive = (sensitivity_test * prior_probability_condition) + (false_positive_rate * prior_probability_no_condition)

# Bayes's Theorem
posterior_probability_condition_given_positive = (sensitivity_test * prior_probability_condition) / total_probability_positive

print(f"Posterior probability of having the condition given a positive test: {posterior_probability_condition_given_positive:.4f}")


Posterior probability of having the condition given a positive test: 0.1610
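
To see how strongly the result depends on the prior, the calculation above can be wrapped in a small function and evaluated at several prevalence values. This is a sketch: the function name `posterior_given_positive` and the prevalence values other than 0.01 are hypothetical, chosen for illustration.

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes's theorem."""
    true_positives = sensitivity * prevalence
    false_positives = false_positive_rate * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# Prevalence values other than 0.01 are hypothetical, chosen for illustration.
for prevalence in (0.001, 0.01, 0.10):
    posterior = posterior_given_positive(
        prevalence, sensitivity=0.95, false_positive_rate=0.05
    )
    print(f"prevalence = {prevalence:.3f} -> posterior = {posterior:.4f}")
```

With the test characteristics held fixed, a rarer condition gives a lower posterior, because false positives from the large healthy group outnumber true positives.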


Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.

Confidence Interval:

A confidence interval is a statistical range that provides an estimated range of values which is likely to include the true value of an unknown parameter, with a certain level of confidence. It is a way to quantify the uncertainty or precision associated with a sample estimate.

Calculating a Confidence Interval:

The formula for a confidence interval for a population mean (\( \mu \)) is given by:

\[ \text{Confidence Interval} = \bar{x} \pm Z \left( \frac{s}{\sqrt{n}} \right) \]

where:

- \( \bar{x} \) is the sample mean,
- \( s \) is the sample standard deviation,
- \( n \) is the sample size,
- \( Z \) is the Z-score corresponding to the desired confidence level.

Example:

Suppose you want to estimate the average height of a population using a sample of 50 individuals. You collect the sample and find that the sample mean height (\( \bar{x} \)) is 65 inches, and the sample standard deviation (\( s \)) is 3 inches.

Assuming a normal distribution and a 95% confidence level, you need to find the Z-score corresponding to a 95% confidence level, which is approximately 1.96.

\[ \text{Confidence Interval} = 65 \pm 1.96 \left( \frac{3}{\sqrt{50}} \right) \]

Now, calculate the lower and upper bounds of the confidence interval:

\[ \text{Lower Bound} = 65 - 1.96 \left( \frac{3}{\sqrt{50}} \right) \]

\[ \text{Upper Bound} = 65 + 1.96 \left( \frac{3}{\sqrt{50}} \right) \]

In [3]:
import numpy as np
from scipy.stats import norm

# Given values
sample_mean = 65
sample_std = 3
sample_size = 50
confidence_level = 0.95

# Calculate the Z-score for a 95% confidence level
z_score = norm.ppf((1 + confidence_level) / 2)

# Calculate the margin of error (ME)
margin_of_error = z_score * (sample_std / np.sqrt(sample_size))

# Calculate the confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print(f"95% Confidence Interval for the population mean height: {lower_bound:.2f} to {upper_bound:.2f}")


95% Confidence Interval for the population mean height: 64.17 to 65.83
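
Because the standard deviation here is estimated from the sample, a t-based interval is a common alternative to the Z-based one. The sketch below compares the two margins of error for the same inputs; it is an illustration of the alternative, not the method used in the cell above.

```python
import numpy as np
from scipy.stats import norm, t

sample_mean, sample_std, sample_size = 65, 3, 50
confidence_level = 0.95
standard_error = sample_std / np.sqrt(sample_size)

# Critical values: standard normal versus t with n - 1 degrees of freedom
z_critical = norm.ppf((1 + confidence_level) / 2)
t_critical = t.ppf((1 + confidence_level) / 2, df=sample_size - 1)

margin_z = z_critical * standard_error
margin_t = t_critical * standard_error

print(f"Z-based interval: {sample_mean - margin_z:.2f} to {sample_mean + margin_z:.2f}")
print(f"t-based interval: {sample_mean - margin_t:.2f} to {sample_mean + margin_t:.2f}")
```

The t-based interval is slightly wider, and the gap shrinks as the sample size grows.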


Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the
event's probability and new evidence. Provide a sample problem and solution.


Let's consider a classic example involving a medical diagnostic test.

Example: Medical Diagnostic Test

Suppose a medical diagnostic test is designed to detect a certain disease, and you want to calculate the probability that a person actually has the disease given the positive test result.

Given information:

- The prevalence of the disease in the population is 1% (\( P(\text{Disease}) = 0.01 \)).
- The sensitivity of the test is 90% (\( P(\text{Positive Test} \mid \text{Disease}) = 0.90 \)).
- The false positive rate of the test is 5% (\( P(\text{Positive Test} \mid \text{No Disease}) = 0.05 \)).

We want to find \( P(\text{Disease} \mid \text{Positive Test}) \), the probability that a person has the disease given a positive test result.

Applying Bayes' Theorem:

\[ P(\text{Disease} \mid \text{Positive Test}) = \frac{P(\text{Positive Test} \mid \text{Disease}) \cdot P(\text{Disease})}{P(\text{Positive Test})} \]

\[ P(\text{Positive Test}) = P(\text{Positive Test} \mid \text{Disease}) \cdot P(\text{Disease}) + P(\text{Positive Test} \mid \text{No Disease}) \cdot P(\text{No Disease}) \]

\[ P(\text{No Disease}) = 1 - P(\text{Disease}) \]

In [4]:
# Given information
prevalence_disease = 0.01
sensitivity_test = 0.90
false_positive_rate = 0.05

# Calculate complement probabilities
prevalence_no_disease = 1 - prevalence_disease

# Calculate denominator of Bayes' Theorem
prob_positive_test = (sensitivity_test * prevalence_disease) + (false_positive_rate * prevalence_no_disease)

# Apply Bayes' Theorem
prob_disease_given_positive = (sensitivity_test * prevalence_disease) / prob_positive_test

print(f"Probability of having the disease given a positive test result: {prob_disease_given_positive:.4f}")


Probability of having the disease given a positive test result: 0.1538


Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation
of 5. Interpret the results.

To calculate the 95% confidence interval for a sample with a mean of 50 and a standard deviation of 5, we will use the formula for the confidence interval:

\[ \text{Confidence Interval} = \bar{x} \pm Z \left( \frac{s}{\sqrt{n}} \right) \]

where:

- \( \bar{x} \) is the sample mean,
- \( s \) is the sample standard deviation,
- \( n \) is the sample size,
- \( Z \) is the Z-score corresponding to the desired confidence level.

For a 95% confidence interval, \( Z \approx 1.96 \) (assuming a normal distribution).

Calculation:

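\[ \text{Confidence Interval} = 50 \pm 1.96 \left( \frac{5}{\sqrt{n}} \right) \]

The question does not state the sample size (\( n \)), so the interval can only be evaluated once a value is chosen. Assuming a reasonably large sample size, we can use 1.96 as the Z-score.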

Interpretation:

A 95% confidence interval comes from a procedure that, over repeated sampling, captures the true population mean about 95% of the time. In this context:

- The lower bound of the interval is the smallest value of the population mean that is plausible given the sample.
- The upper bound of the interval is the largest value of the population mean that is plausible given the sample.

In [5]:
import scipy.stats as stats

# Given values
sample_mean = 50
sample_std = 5
sample_size = 30  # Assumed: the question does not state the sample size

# Calculate the Z-score for a 95% confidence level
z_score = stats.norm.ppf(0.975)  # 0.975 for a two-tailed test

# Calculate the margin of error (ME)
margin_of_error = z_score * (sample_std / (sample_size ** 0.5))

# Calculate the confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print(f"95% Confidence Interval: {lower_bound:.2f} to {upper_bound:.2f}")


95% Confidence Interval: 48.21 to 51.79


Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error?
Provide an example of a scenario where a larger sample size would result in a smaller margin of error.

Margin of Error in a Confidence Interval:

The margin of error (ME) in a confidence interval is the range of values above and below the sample estimate that is likely to contain the true population parameter with a certain level of confidence. It quantifies the precision or uncertainty associated with the sample estimate.

The formula for the margin of error in a confidence interval for a population mean is given by:

\[ \text{Margin of Error (ME)} = Z \left( \frac{s}{\sqrt{n}} \right) \]

where:

- \( Z \) is the Z-score corresponding to the desired confidence level,
- \( s \) is the sample standard deviation,
- \( n \) is the sample size.

Effect of Sample Size on Margin of Error:

As Sample Size Increases:

- The margin of error decreases.
- Larger sample sizes result in more precise estimates of the population parameter.
- Because \( n \) appears under a square root in the denominator, the standard error \( s / \sqrt{n} \) shrinks as \( n \) grows, narrowing the confidence interval. Quadrupling the sample size halves the margin of error.

As Sample Size Decreases:

- The margin of error increases.
- Smaller sample sizes result in less precise estimates and wider confidence intervals.
- The standard error \( s / \sqrt{n} \) grows as \( n \) shrinks, widening the confidence interval.

In [6]:
import scipy.stats as stats

# Given values
sample_mean = 50
sample_std = 5
confidence_level = 0.95

# Sample sizes
sample_size_1 = 100
sample_size_2 = 500

# Calculate Z-scores for the confidence level
z_score = stats.norm.ppf((1 + confidence_level) / 2)

# Calculate margins of error for both sample sizes
margin_of_error_1 = z_score * (sample_std / (sample_size_1 ** 0.5))
margin_of_error_2 = z_score * (sample_std / (sample_size_2 ** 0.5))

print(f"Margin of Error for Sample Size {sample_size_1}: {margin_of_error_1:.2f}")
print(f"Margin of Error for Sample Size {sample_size_2}: {margin_of_error_2:.2f}")


Margin of Error for Sample Size 100: 0.98
Margin of Error for Sample Size 500: 0.44


Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population
standard deviation of 5. Interpret the results.

The Z-score, also known as the standard score, measures how many standard deviations a data point is from the mean of a distribution. The formula for calculating the Z-score is given by:

\[ Z = \frac{X - \mu}{\sigma} \]

where:
- \( X \) is the data point's value,
- \( \mu \) is the population mean,
- \( \sigma \) is the population standard deviation.

**Calculation:**

For the given values:
- Data point value (\(X\)) = 75
- Population mean (\(\mu\)) = 70
- Population standard deviation (\(\sigma\)) = 5

\[ Z = \frac{75 - 70}{5} \]

\[ Z = 1 \]

**Interpretation:**

A Z-score of 1 indicates that the data point (value 75) is 1 standard deviation above the mean in the distribution. The sign of the Z-score (+1 in this case) indicates the direction from the mean. Positive values indicate a data point above the mean, while negative values indicate a data point below the mean.

In practical terms, a Z-score helps assess how unusual or typical a data point is within a distribution. A Z-score of 1 suggests that the data point is moderately higher than the average in the context of the population's distribution.
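
The calculation can be sketched in a few lines. The helper name `z_score` is hypothetical, and the percentile step assumes the population is normally distributed, which the question does not state.

```python
from scipy.stats import norm

def z_score(x, population_mean, population_std):
    """Number of standard deviations that x lies from the population mean."""
    return (x - population_mean) / population_std

z = z_score(75, 70, 5)

# Assumption: the population is normal, so the z-score maps to a percentile.
fraction_below = norm.cdf(z)

print(f"z = {z:.2f}, fraction of the population below this value = {fraction_below:.4f}")
```

Under that normality assumption, a value one standard deviation above the mean sits at roughly the 84th percentile.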

Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average
of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is
significantly effective at a 95% confidence level using a t-test.

To conduct a hypothesis test using a t-test, we need to specify the null hypothesis (\( H_0 \)) and the alternative hypothesis (\( H_a \)):

- Null Hypothesis (\( H_0 \)): The new weight loss drug is not effective, meaning the mean weight loss is zero (\( \mu = 0 \)).
- Alternative Hypothesis (\( H_a \)): The new weight loss drug is effective, meaning the mean weight loss is different from zero (\( \mu \neq 0 \)).

We'll use a one-sample t-test because we have a single sample mean and the population standard deviation is unknown, so it is estimated by the sample standard deviation. The formula for the t-statistic is given by:

\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]

where:

- \( \bar{x} \) is the sample mean,
- \( \mu_0 \) is the hypothesized population mean under the null hypothesis,
- \( s \) is the sample standard deviation,
- \( n \) is the sample size.

For a two-tailed test with a 95% confidence level, we'll use a significance level (\( \alpha \)) of 0.05. The critical t-value for a 95% confidence level with 49 degrees of freedom (50 participants - 1) is approximately \( \pm 2.010 \).

In [7]:
import scipy.stats as stats

# Given values
sample_mean = 6
sample_std = 2.5
sample_size = 50
population_mean_null = 0  # Null hypothesis: no weight loss

# Calculate the t-statistic
t_statistic = (sample_mean - population_mean_null) / (sample_std / (sample_size ** 0.5))

# Degrees of freedom
degrees_of_freedom = sample_size - 1

# Critical t-value for a two-tailed test at a 95% confidence level
critical_t_value = stats.t.ppf(0.975, df=degrees_of_freedom)

# Calculate p-value
p_value = 2 * (1 - stats.t.cdf(abs(t_statistic), df=degrees_of_freedom))

# Hypothesis testing
alpha = 0.05
result = "reject" if p_value < alpha else "fail to reject"

print(f"t-statistic: {t_statistic:.4f}")
print(f"Critical t-value: {critical_t_value:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Result: We {result} the null hypothesis at a {alpha*100}% significance level.")


t-statistic: 16.9706
Critical t-value: 2.0096
P-value: 0.0000
Result: We reject the null hypothesis at a 5.0% significance level.


Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95%
confidence interval for the true proportion of people who are satisfied with their job.
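
The cell below appears to use the normal-approximation (Wald) interval for a proportion, which is reasonable here because both \( n\hat{p} \) and \( n(1 - \hat{p}) \) are large. Written out, with \( \hat{p} \) the sample proportion, \( n \) the sample size, and \( Z \approx 1.96 \) for 95% confidence:

\[ \hat{p} \pm Z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.65 \pm 1.96 \sqrt{\frac{0.65 \times 0.35}{500}} \approx 0.65 \pm 0.0418 \]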

In [8]:
import math

# Given values
sample_proportion = 0.65
sample_size = 500
confidence_level = 0.95

# Calculate the Z-score for a 95% confidence level
z_score = stats.norm.ppf((1 + confidence_level) / 2)

# Calculate the margin of error (ME)
margin_of_error = z_score * math.sqrt((sample_proportion * (1 - sample_proportion)) / sample_size)

# Calculate the confidence interval
lower_bound = sample_proportion - margin_of_error
upper_bound = sample_proportion + margin_of_error

print(f"95% Confidence Interval: {lower_bound:.4f} to {upper_bound:.4f}")


95% Confidence Interval: 0.6082 to 0.6918


Q12. A researcher is testing the effectiveness of two different teaching methods on student performance.
Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82
with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a
significant difference in student performance using a t-test with a significance level of 0.01.

To conduct a hypothesis test for the difference in means between two independent samples, we can use a two-sample t-test. The null hypothesis (\( H_0 \)) and the alternative hypothesis (\( H_a \)) are stated as follows:

- Null Hypothesis (\( H_0 \)): There is no significant difference in student performance between the two teaching methods (\( \mu_A - \mu_B = 0 \)).
- Alternative Hypothesis (\( H_a \)): There is a significant difference in student performance between the two teaching methods (\( \mu_A - \mu_B \neq 0 \)).

The formula for the t-statistic for independent samples is given by:

\[ t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}} \]

where:

- \( \bar{x}_A \) and \( \bar{x}_B \) are the sample means,
- \( s_A \) and \( s_B \) are the sample standard deviations,
- \( n_A \) and \( n_B \) are the sample sizes.

For a two-tailed test with a significance level (\( \alpha \)) of 0.01, the critical t-value is obtained from the t-distribution.

In [9]:
# Given values for Sample A
mean_A = 85
std_dev_A = 6
sample_size_A = 30  # Assumed: the question does not state the sample size

# Given values for Sample B
mean_B = 82
std_dev_B = 5
sample_size_B = 30  # Assumed: the question does not state the sample size

# Calculate the t-statistic (with equal sample sizes this matches the pooled-variance statistic)
t_statistic = (mean_A - mean_B) / math.sqrt((std_dev_A**2 / sample_size_A) + (std_dev_B**2 / sample_size_B))

# Degrees of freedom for a pooled two-sample t-test
degrees_of_freedom = sample_size_A + sample_size_B - 2

# Critical t-value for a two-tailed test at a 0.01 significance level
critical_t_value = stats.t.ppf(1 - 0.005, df=degrees_of_freedom)

# Calculate p-value
p_value = 2 * (1 - stats.t.cdf(abs(t_statistic), df=degrees_of_freedom))

# Hypothesis testing
alpha = 0.01
result = "reject" if p_value < alpha else "fail to reject"

print(f"t-statistic: {t_statistic:.4f}")
print(f"Critical t-value: {critical_t_value:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Result: We {result} the null hypothesis at a {alpha*100}% significance level.")


t-statistic: 2.1039
Critical t-value: 2.6633
P-value: 0.0397
Result: We fail to reject the null hypothesis at a 1.0% significance level.
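
As a cross-check, SciPy's `ttest_ind_from_stats` runs the same test directly from summary statistics. This is a sketch: the sample size of 30 per group is an assumption, since the question does not state it, and Welch's variant is included only for comparison.

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics from the question; n = 30 per group is an assumption.
mean_a, std_a, n_a = 85, 6, 30
mean_b, std_b, n_b = 82, 5, 30

# Pooled-variance (Student) t-test: df = n_a + n_b - 2
pooled = ttest_ind_from_stats(mean_a, std_a, n_a, mean_b, std_b, n_b, equal_var=True)

# Welch's t-test: does not assume equal population variances
welch = ttest_ind_from_stats(mean_a, std_a, n_a, mean_b, std_b, n_b, equal_var=False)

print(f"Pooled: t = {pooled.statistic:.4f}, p = {pooled.pvalue:.4f}")
print(f"Welch:  t = {welch.statistic:.4f}, p = {welch.pvalue:.4f}")
```

With equal group sizes the two statistics coincide and only the degrees of freedom differ, so both variants lead to the same decision at the 0.01 level under these assumed sizes.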


Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean
of 65. Calculate the 90% confidence interval for the true population mean.

To calculate the 90% confidence interval for the true population mean, we will use the formula for the confidence interval for a population mean:

\[ \text{Confidence Interval} = \bar{x} \pm Z \left( \frac{\sigma}{\sqrt{n}} \right) \]

where:

- \( \bar{x} \) is the sample mean,
- \( Z \) is the Z-score corresponding to the desired confidence level,
- \( \sigma \) is the population standard deviation, which is known here,
- \( n \) is the sample size.

For a 90% confidence interval, \( Z \approx 1.645 \) (assuming a normal distribution).

In [10]:
# Given values
sample_mean = 65
sample_size = 50
population_mean = 60
population_std_dev = 8

# Calculate the Z-score for a 90% confidence level
z_score = stats.norm.ppf((1 + 0.90) / 2)

# Calculate the margin of error (ME)
margin_of_error = z_score * (population_std_dev / math.sqrt(sample_size))

# Calculate the confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print(f"90% Confidence Interval: {lower_bound:.4f} to {upper_bound:.4f}")


90% Confidence Interval: 63.1391 to 66.8609


Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average
reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to
determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

In [11]:
# Given values
sample_mean = 0.25
sample_std = 0.05
sample_size = 30
# Assumption: the question gives no baseline reaction time, so the null value is set
# equal to the sample mean; substitute the known no-caffeine mean when it is available
population_mean_null = 0.25  # Null hypothesis: no effect of caffeine

# A 90% confidence level corresponds to a significance level of 0.10
alpha = 0.10

# Calculate the t-statistic
t_statistic = (sample_mean - population_mean_null) / (sample_std / math.sqrt(sample_size))

# Degrees of freedom for a one-sample t-test
degrees_of_freedom = sample_size - 1

# Critical t-values for a two-tailed test at a 90% confidence level
critical_t_value_lower = stats.t.ppf(alpha / 2, df=degrees_of_freedom)
critical_t_value_upper = stats.t.ppf(1 - alpha / 2, df=degrees_of_freedom)

# Calculate p-value
p_value = 2 * (1 - stats.t.cdf(abs(t_statistic), df=degrees_of_freedom))

# Hypothesis testing
result = "reject" if p_value < alpha else "fail to reject"

print(f"t-statistic: {t_statistic:.4f}")
print(f"Critical t-values: {critical_t_value_lower:.4f} and {critical_t_value_upper:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Result: We {result} the null hypothesis at a {alpha*100}% significance level.")


t-statistic: 0.0000
Critical t-values: -1.6991 and 1.6991
P-value: 1.0000
Result: We fail to reject the null hypothesis at a 10.0% significance level.
