## Statistical Tests

One-sample and two-sample tests are two common types of statistical hypothesis tests.
### 1. One-Sample Test:
1. A one-sample test is used when you have a single sample and want to compare it to a known value or a hypothesized population parameter (μ₀).
2. Null hypothesis (H₀): μ = μ₀ (equal to)
3. Alternative hypothesis (Ha): μ > μ₀ (right-tailed), or μ < μ₀ (left-tailed), or μ ≠ μ₀ (two-tailed)

### 2. Two-Sample Test:
1. A two-sample test compares two independent samples to each other.
2. Null hypothesis (H₀): μ₁ = μ₂ (equal to)
3. Alternative hypothesis (Ha): μ₁ > μ₂ (right-tailed), or μ₁ < μ₂ (left-tailed), or μ₁ ≠ μ₂ (two-tailed)
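
The right-tailed, left-tailed, and two-tailed alternatives listed above each correspond to a different tail area of the test statistic's distribution. As a minimal sketch, assuming a z statistic has already been computed (the function name `p_value_for_alternative` and its `alternative` labels are illustrative and not part of the original notebook), the mapping might look like this:

```python
from scipy.stats import norm

def p_value_for_alternative(z, alternative):
    """Return the p-value of a z statistic for the given alternative hypothesis."""
    if alternative == "greater":      # right-tailed alternative
        return norm.sf(z)             # area to the right of z
    if alternative == "less":         # left-tailed alternative
        return norm.cdf(z)            # area to the left of z
    if alternative == "two-sided":    # two-tailed alternative
        return 2 * norm.sf(abs(z))    # area in both tails beyond |z|
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

for alt in ("greater", "less", "two-sided"):
    print(alt, p_value_for_alternative(1.5, alt))
```

The same statistic can therefore lead to different conclusions depending on which alternative was stated before looking at the data.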

# Z test

1. The population is assumed to be normally distributed (or the sample is large enough for the sample mean to be approximately normal).
2. The population standard deviation (σ) is known.
3. Sample sizes are typically large (30 or more). For large samples, the central limit theorem often allows for approximate normality of the sample mean, even if the population is not perfectly normal.
4. For the independent two-sample z-test, both population standard deviations (σ₁ and σ₂) are known; they do not have to be equal.
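
Under these assumptions, the one-sample test statistic is the distance between the sample mean and the hypothesized mean, measured in standard errors. The sketch below is one way to write that down; the helper name `one_sample_z_statistic` and the input numbers are hypothetical and chosen only for illustration.

```python
import math
from scipy.stats import norm

def one_sample_z_statistic(sample_mean, mu0, sigma, n):
    """z = (sample mean - hypothesized mean) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

# Hypothetical numbers, not taken from the examples in this notebook
z = one_sample_z_statistic(sample_mean=52.0, mu0=50.0, sigma=8.0, n=64)
print("z statistic:", z)
print("two-tailed p-value:", 2 * norm.sf(abs(z)))
```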

### Example 1:
Write a Python function that estimates the population mean, with a 95% confidence interval, from a sample mean, sample standard deviation, and sample size.

In [13]:
from scipy.stats import norm

# Estimate the population mean and a 95% confidence interval from sample statistics
def estimate_population_mean(sample_mean, sample_std, sample_size):
    # Standard error of the mean
    standard_error = sample_std / (sample_size ** 0.5)
    # Margin of error for a 95% confidence interval (1.96 is approximately norm.ppf(0.975))
    margin_of_error = 1.96 * standard_error
    # Lower and upper bounds of the confidence interval
    lower_bound = sample_mean - margin_of_error
    upper_bound = sample_mean + margin_of_error
    # The point estimate of the population mean is the sample mean itself
    estimated_population_mean = sample_mean

    # Return the point estimate and the confidence interval
    return estimated_population_mean, (lower_bound, upper_bound)

In [14]:
#use above function to get values
sample_mean = 75.2
sample_std = 8.6
sample_size = 100

estimated_mean, confidence_interval = estimate_population_mean(sample_mean, sample_std, sample_size)

print(f"Estimated Population Mean: {estimated_mean:.3f}")
print(f"95% Confidence Interval: ({confidence_interval[0]:.3f}, {confidence_interval[1]:.3f})")

Estimated Population Mean: 75.200
95% Confidence Interval: (73.514, 76.886)
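
The function above hard-codes 1.96, which ties it to a 95% confidence level. A possible generalization, sketched here under the assumption that a normal-based interval is still appropriate, looks up the critical value with `norm.ppf`. The function name `confidence_interval` and the `confidence` parameter are illustrative additions.

```python
from scipy.stats import norm

def confidence_interval(sample_mean, sample_std, sample_size, confidence=0.95):
    """Normal-based confidence interval for the population mean."""
    # Two-sided critical value, e.g. about 1.96 for confidence=0.95
    z_critical = norm.ppf(1 - (1 - confidence) / 2)
    margin = z_critical * sample_std / sample_size ** 0.5
    return sample_mean - margin, sample_mean + margin

for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(75.2, 8.6, 100, confidence=level)
    print(f"{level:.0%} confidence interval: ({low:.3f}, {high:.3f})")
```

A higher confidence level gives a wider interval, because a larger critical value multiplies the same standard error.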


### Example 2: One-tailed right-sided test

Suppose we have a sample of 50 students, and we want to test whether their average height, based on the sample mean of 170 cm, is greater than the population average height of 165 cm. We know the population standard deviation is 10 cm. We will use a significance level (alpha) of 0.05.

In [22]:
import numpy as np
from scipy.stats import norm

#sample statistics
n = 50
popl_mean=165
sample_mean= 170
std=10

alpha=0.05

# state hypothesis
# H0: mu <= 165
# H1: mu > 165

# calculate z statistics
z_statistic = (sample_mean-popl_mean)/(std/np.sqrt(n))

# calculate critical value
# Since it is a one-tailed right-sided test, we use the upper-tail z_alpha critical value.
critical_value = norm.isf(alpha)

# calculate p value
p_value = 1 - norm.cdf(z_statistic)

# Print the results
print("Test Statistic (z-score):", z_statistic)
print("Critical Value:", critical_value)
print("P-value:", p_value)

# Make a conclusion (right-sided: reject only for large positive z)
if z_statistic > critical_value:
    print("z statistic is greater than the critical value, reject the null hypothesis")
else:
    print("z statistic is not greater than the critical value, fail to reject the null hypothesis")

# Compare p-value with the significance level
if p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")

Test Statistic (z-score): 3.5355339059327378
Critical Value: 1.6448536269514729
P-value: 0.00020347600872250293
z statistic is greater than the critical value, reject the null hypothesis
Reject the null hypothesis.


### Example 3: One-tailed left-sided test

Suppose we want to test whether students taught with a new teaching method score lower than the population average. We have a sample of 40 students who were taught using the new method, with a sample mean of 72 (the value used in the code below), and we want to determine if their average test score is significantly lower than the population average test score of 75. We know that the population standard deviation is 5, and we use a significance level (alpha) of 0.05.

In [23]:
import numpy as np
from scipy.stats import norm

#sample statistics
n = 40
popl_mean=75
sample_mean= 72
std=5
alpha = 0.05

# state hypothesis
# H0: mu >= 75
# H1: mu < 75

# calculate z statistics
z_statistic = (sample_mean-popl_mean)/(std/np.sqrt(n))

# calculate critical value
# Since it is a one-tailed left-sided test, we use the negative of the z_alpha critical value.
critical_value = -norm.isf(alpha)

# calculate p value
p_value = norm.cdf(z_statistic)

# Print the results
print("Test Statistic (z-score):", z_statistic)
print("Critical Value:", critical_value)
print("P-value:", p_value)

# Make a conclusion (left-sided: reject only for large negative z)
if z_statistic < critical_value:
    print("z statistic is less than the critical value, reject the null hypothesis")
else:
    print("z statistic is not less than the critical value, fail to reject the null hypothesis")

# Compare p-value with the significance level
if p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")

Test Statistic (z-score): -3.7947331922020555
Critical Value: -1.6448536269514729
P-value: 7.390115516722686e-05
z statistic is less than the critical value, reject the null hypothesis
Reject the null hypothesis.


### Example 4: Two-tailed test

A school wants to know whether its students' average IQ score differs from that of the general population. The population mean IQ score is known to be 100 with a standard deviation of 15. The school collects a sample of 50 students and calculates their average IQ score to be 105. We use a significance level (alpha) of 0.05.

In [1]:
import numpy as np
from scipy.stats import norm

#sample statistics
n = 50
popl_mean=100
sample_mean= 105
std=15
alpha = 0.05

# state hypothesis
# H0: mu = 100
# H1: mu != 100

# calculate z statistics
z_statistic = (sample_mean-popl_mean)/(std/np.sqrt(n))

# calculate critical value
# Since it is a two-tailed test, we use the z_alpha/2 critical value.
critical_value = norm.isf(alpha/2)

# calculate p value
p_value =  2*(1-norm.cdf(abs(z_statistic)))

# Print the results
print("Test Statistic (z-score):", z_statistic)
print("Critical Value:", critical_value)
print("P-value:", p_value)

# Make a conclusion (two-tailed: compare |z| with the critical value)
if abs(z_statistic) > critical_value:
    print("|z| is greater than the critical value, reject the null hypothesis")
else:
    print("|z| is not greater than the critical value, fail to reject the null hypothesis")

# Compare p-value with the significance level
if p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")

Test Statistic (z-score): 2.3570226039551585
Critical Value: 1.9599639845400545
P-value: 0.018422125454099048
|z| is greater than the critical value, reject the null hypothesis
Reject the null hypothesis.
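
A two-tailed z-test at level alpha and a (1 - alpha) confidence interval carry the same information: the null value is rejected exactly when it falls outside the interval. The sketch below illustrates this with the IQ numbers from this example; the variable names `ci_low`, `ci_high`, `reject_by_z`, and `reject_by_ci` are illustrative.

```python
import numpy as np
from scipy.stats import norm

n, popl_mean, sample_mean, std, alpha = 50, 100, 105, 15, 0.05

# (1 - alpha) confidence interval around the sample mean
margin = norm.isf(alpha / 2) * std / np.sqrt(n)
ci_low, ci_high = sample_mean - margin, sample_mean + margin

# Two-tailed z-test decision
z_statistic = (sample_mean - popl_mean) / (std / np.sqrt(n))
reject_by_z = abs(z_statistic) > norm.isf(alpha / 2)

# Equivalent decision: is the null value outside the interval?
reject_by_ci = not (ci_low <= popl_mean <= ci_high)

print(f"95% confidence interval: ({ci_low:.3f}, {ci_high:.3f})")
print("Reject by z statistic:", reject_by_z)
print("Reject by confidence interval:", reject_by_ci)
```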


### Example 5: One-sided right two-sample test

We will test whether the average heights of two populations, A and B, suggest that population B has a greater average height than population A.

In [7]:
import numpy as np
from scipy.stats import norm

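# Given data
x1 = 170      # sample mean, population A
x2 = 175      # sample mean, population B
sigma1 = 5    # standard deviation, population A
sigma2 = 6    # standard deviation, population B
n1 = 50       # sample size, population A
n2 = 60       # sample size, population B
alpha = 0.05  # significance level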

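# state hypothesis
# H0: mu2 <= mu1
# H1: mu2 > mu1

# Calculate the pooled standard deviation
# (pooling treats the two populations as sharing one standard deviation;
# Example 6 uses the unpooled form sqrt(sigma1**2/n1 + sigma2**2/n2) instead)
pooled_std = ((n1 - 1) * sigma1 ** 2 + (n2 - 1) * sigma2 ** 2) / (n1 + n2 - 2)
pooled_std = np.sqrt(pooled_std)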

# Calculate the test statistic (z-score)
z = (x2 - x1) / (pooled_std * np.sqrt(1 / n1 + 1 / n2))

# calculate critical value
# Since it is a one-tailed right-sided two sample test, we use the upper-tail z_alpha critical value.
critical_value = norm.isf(alpha)

# Calculate the p-value for a one-sided right-tailed test
p_value = 1 - norm.cdf(z)

# Print the results
print("Test Statistic (z-score):", z)
print("Critical Value:", critical_value)
print("P-value:", p_value)

# Make a conclusion (right-sided: reject only for large positive z)
if z > critical_value:
    print("z statistic is greater than the critical value, reject the null hypothesis")
else:
    print("z statistic is not greater than the critical value, fail to reject the null hypothesis")

# Compare p-value with the significance level
if p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")

Test Statistic (z-score): 4.689090266090898
Critical Value: 1.6448536269514729
P-value: 1.3721116206566464e-06
z statistic is greater than the critical value, reject the null hypothesis
Reject the null hypothesis.
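
The right-tailed examples compute the p-value as `1 - norm.cdf(z)`. For very large z this subtracts two numbers that are both close to 1 and can lose precision. SciPy also provides the survival function `norm.sf`, which computes the upper-tail area directly. The following sketch compares the two; the z values are arbitrary illustrations.

```python
from scipy.stats import norm

for z in (2.0, 5.0, 10.0):
    by_cdf = 1 - norm.cdf(z)   # subtracts two numbers that are both close to 1
    by_sf = norm.sf(z)         # computes the upper-tail area directly
    print(f"z = {z}: 1 - cdf = {by_cdf:.6e}, sf = {by_sf:.6e}")
```

For moderate z the two agree closely, so the choice mainly matters when the test statistic is extreme.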


### Example 6: One-sided left two-sample test

We will test whether the average heights of two populations, A and B, suggest that population B has a smaller average height than population A.

In [21]:
import numpy as np
from scipy.stats import norm
import scipy.stats as stats 
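# Given data
x1 = 175      # sample mean, population A
x2 = 160      # sample mean, population B
sigma1 = 5    # standard deviation, population A
sigma2 = 6    # standard deviation, population B
n1 = 50       # sample size, population A
n2 = 60       # sample size, population B
alpha = 0.05  # significance level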

# state hypothesis
# H0: mu1 = mu2 
# H1: mu2 < mu1

# Calculate the test statistic (z-score)
z = ((x2- x1))/ np.sqrt((sigma1 ** 2 / n1) + (sigma2 ** 2 / n2))

# calculate critical value
# Since it is a one-tailed left-sided two sample test, we use the negative of the z_alpha critical value.
critical_value = -norm.isf(alpha)

# Calculate the p-value for a one-sided left-tailed test
p_value = stats.norm.cdf(z)

# Print the results
print("Test Statistic (z-score):", z)
print("Critical Value:", critical_value)
print("P-value:", p_value)

# Make a conclusion (left-sided: reject only for large negative z)
if z < critical_value:
    print("z statistic is less than the critical value, reject the null hypothesis")
else:
    print("z statistic is not less than the critical value, fail to reject the null hypothesis")

# Compare p-value with the significance level
if p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")

Test Statistic (z-score): -14.301938838683883
Critical Value: -1.6448536269514729
P-value: 1.0639866372752082e-46
z statistic is less than the critical value, reject the null hypothesis
Reject the null hypothesis.
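
The two-sample calculation can be wrapped in a reusable function. This is a sketch rather than a definitive implementation: it assumes known population standard deviations, uses the unpooled standard error from this example, and the name `two_sample_z_test` and its `alternative` labels are illustrative.

```python
import numpy as np
from scipy.stats import norm

def two_sample_z_test(x1, x2, sigma1, sigma2, n1, n2, alternative="two-sided"):
    """z-test for mu2 - mu1 with known population standard deviations."""
    # Standard error of the difference between the two sample means
    standard_error = np.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = (x2 - x1) / standard_error
    if alternative == "greater":      # H1: mu2 > mu1
        p_value = norm.sf(z)
    elif alternative == "less":       # H1: mu2 < mu1
        p_value = norm.cdf(z)
    else:                             # H1: mu2 != mu1
        p_value = 2 * norm.sf(abs(z))
    return z, p_value

# Same inputs as this example
z, p_value = two_sample_z_test(175, 160, 5, 6, 50, 60, alternative="less")
print("z:", z, "p-value:", p_value)
```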


# T Test

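1. The population is assumed to be approximately normally distributed.
2. The population standard deviation (σ) is unknown and is estimated by the sample standard deviation (s).
3. Sample sizes are often small (typically, less than 30), although the t-test remains valid for larger samples.
4. Homogeneity of variance (for the pooled two-sample t-test): the variances of the two populations being compared are assumed to be equal.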
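
Because σ is estimated from the sample, the t distribution has heavier tails than the normal distribution, and its critical values are larger for small samples. The short sketch below, using an arbitrary set of degrees of freedom, shows the t critical value approaching the z critical value as the degrees of freedom grow.

```python
from scipy.stats import norm, t

alpha = 0.05
print("z critical value:", norm.isf(alpha))
for df in (5, 10, 29, 100, 1000):
    # Upper-tail critical value of the t distribution with df degrees of freedom
    print(f"t critical value (df={df}):", t.isf(alpha, df))
```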

### Example 1: Right-sided one-sample test

Problem Statement:
A researcher wants to investigate whether a new teaching method improves student performance in a specific subject. The researcher selects a random sample of 30 students and provides them with the new teaching method. The average score of the sample is calculated to be 82. The researcher wants to determine if there is sufficient evidence to conclude that the new teaching method leads to a higher average score compared to the population mean score of 78. The code below assumes a sample standard deviation of 10 and uses a significance level of 0.05.

In [23]:
import scipy.stats as stats
import numpy as np

# Given data
sample_mean = 82  # Sample mean
sample_size = 30  # Sample size
sample_std = 10  # Sample standard deviation (assumed)

#Null Hypothesis (H0):(µ <= 78)
#Alternative Hypothesis (H1):(µ > 78)

# Perform one-tailed right-sided t-test
null_mean = 78  # Null hypothesis: true population mean <= 78

# Calculate the standard error of the mean
standard_error = sample_std / np.sqrt(sample_size)

# Calculate the t-statistic
t_statistic = (sample_mean - null_mean) / standard_error

# Calculate the degrees of freedom
df = sample_size - 1

# Calculate the p-value
p_value = 1 - stats.t.cdf(t_statistic, df)

# Calculate the critical value for the given alpha level (significance level)
alpha = 0.05
critical_value = stats.t.isf(alpha, df)

print("Sample mean:", sample_mean)
print("Null hypothesis (population mean):", null_mean)
print("Test statistic (t-score):", t_statistic)
print("Critical value:", critical_value)
print("p-value:", p_value)

if t_statistic > critical_value and p_value < alpha:
    print("Reject the null hypothesis")
    print("There is sufficient evidence to conclude that the new teaching method leads to a higher average score.")
else:
    print("Fail to reject the null hypothesis")
    print("There is not enough evidence to conclude that the new teaching method leads to a higher average score.")


Sample mean: 82
Null hypothesis (population mean): 78
Test statistic (t-score): 2.1908902300206643
Critical value: 1.6991270265334977
p-value: 0.018322341974252643
Reject the null hypothesis
There is sufficient evidence to conclude that the new teaching method leads to a higher average score.
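
When the raw observations are available rather than only summary statistics, SciPy's `stats.ttest_1samp` can run the same test directly. The sketch below uses hypothetical scores invented purely for illustration (they are not the researcher's data) and checks the built-in result against the manual steps. It assumes SciPy 1.6 or later, where the `alternative` argument is available.

```python
import numpy as np
from scipy import stats

# Hypothetical raw scores, invented purely for illustration
scores = np.array([85, 79, 90, 76, 88, 81, 84, 77, 92, 80])
null_mean = 78

# Manual calculation, following the same steps as the example above
standard_error = scores.std(ddof=1) / np.sqrt(len(scores))
t_manual = (scores.mean() - null_mean) / standard_error
p_manual = stats.t.sf(t_manual, len(scores) - 1)

# SciPy's built-in one-sample t-test, right-sided alternative
result = stats.ttest_1samp(scores, popmean=null_mean, alternative="greater")

print("manual:", t_manual, p_manual)
print("scipy: ", result.statistic, result.pvalue)
```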


### Example 2: Left-sided one-sample test

In [24]:
import scipy.stats as stats
import numpy as np

# Given data
sample_mean = 78.5  # Sample mean
sample_size = 40  # Sample size
sample_std = 8.2  # Sample standard deviation (assumed)

# Perform one-tailed left-sided t-test
null_mean = 80  # Null hypothesis: true population mean >= 80

# Calculate the standard error of the mean
standard_error = sample_std / np.sqrt(sample_size)

# Calculate the t-statistic
t_statistic = (sample_mean - null_mean) / standard_error

# Calculate the degrees of freedom
df = sample_size - 1

# Calculate the p-value
p_value = stats.t.cdf(t_statistic, df)

# Calculate the critical value for the given alpha level (significance level)
alpha = 0.05
critical_value = stats.t.isf(1-alpha, df)

print("Sample mean:", sample_mean)
print("Null hypothesis (population mean):", null_mean)
print("Test statistic (t-score):", t_statistic)
print("Critical value:", critical_value)
print("p-value:", p_value)

if t_statistic < critical_value and p_value < alpha:
    print("Reject the null hypothesis")
    print("There is sufficient evidence to conclude that the new treatment has a lower effect.")
else:
    print("Fail to reject the null hypothesis")
    print("There is not enough evidence to conclude that the new treatment has a lower effect.")


Sample mean: 78.5
Null hypothesis (population mean): 80
Test statistic (t-score): -1.1569308512811145
Critical value: -1.6848751194973992
p-value: 0.12716896741350125
Fail to reject the null hypothesis
There is not enough evidence to conclude that the new treatment has a lower effect.


### Example 3: Two-tailed one-sample test

A manufacturer claims that the average weight of their cereal boxes is 500 grams. To test this claim, a random sample of 40 cereal boxes is selected, and their weights are recorded; the sample mean is 492 grams and the sample standard deviation is 12 grams (the values used in the code below). Perform a two-tailed one-sample t-test to determine whether there is sufficient evidence against the manufacturer's claim. Use a significance level of 0.05.

In [26]:
import scipy.stats as stats
import numpy as np

# Given data
sample_mean = 492  # Sample mean
sample_size = 40  # Sample size
sample_std = 12  # Sample standard deviation

# Perform two-tailed one-sample t-test
null_mean = 500  # Null hypothesis: true population mean = 500

# Calculate the standard error of the mean
standard_error = sample_std / np.sqrt(sample_size)

# Calculate the t-statistic
t_statistic = (sample_mean - null_mean) / standard_error

# Calculate the degrees of freedom
df = sample_size - 1

# Calculate the p-value
p_value = 2 * (1 - stats.t.cdf(abs(t_statistic), df))

# Calculate the critical values for the given alpha level (significance level)
alpha = 0.05
critical_value_right = stats.t.isf(alpha / 2, df)
critical_value_left = -critical_value_right

print("Sample mean:", sample_mean)
print("Null hypothesis (population mean):", null_mean)
print("Test statistic (t-score):", t_statistic)
print("Critical values:", critical_value_left, ",", critical_value_right)
print("p-value:", p_value)

if t_statistic > critical_value_right or t_statistic < critical_value_left:
    print("Reject the null hypothesis")
    print("There is sufficient evidence to conclude that the average weight of the cereal boxes is different from 500 grams.")
else:
    print("Fail to reject the null hypothesis")
    print("There is not enough evidence to conclude that the average weight of the cereal boxes is different from 500 grams.")


Sample mean: 492
Null hypothesis (population mean): 500
Test statistic (t-score): -4.216370213557839
Critical values: -2.0226909117347285 , 2.0226909117347285
p-value: 0.00014263327560937178
Reject the null hypothesis
There is sufficient evidence to conclude that the average weight of the cereal boxes is different from 500 grams.
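
The three one-sample t-test examples differ only in which tail is used for the p-value. As a sketch of how they could share one helper (the name `one_sample_t_test` and its `alternative` labels are illustrative, not part of the original notebook):

```python
import numpy as np
from scipy import stats

def one_sample_t_test(sample_mean, sample_std, sample_size, null_mean,
                      alternative="two-sided"):
    """One-sample t-test computed from summary statistics."""
    standard_error = sample_std / np.sqrt(sample_size)
    t_statistic = (sample_mean - null_mean) / standard_error
    df = sample_size - 1
    if alternative == "greater":      # right-sided
        p_value = stats.t.sf(t_statistic, df)
    elif alternative == "less":       # left-sided
        p_value = stats.t.cdf(t_statistic, df)
    else:                             # two-tailed
        p_value = 2 * stats.t.sf(abs(t_statistic), df)
    return t_statistic, p_value

# Inputs taken from Examples 1, 2, and 3 of this section
print(one_sample_t_test(82, 10, 30, 78, alternative="greater"))
print(one_sample_t_test(78.5, 8.2, 40, 80, alternative="less"))
print(one_sample_t_test(492, 12, 40, 500))
```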


### Example 4: Two-sample two-tailed test

Problem Statement:
A researcher wants to compare the mean scores of two groups, Group A and Group B, on a standardized test. A random sample of 40 students is selected from each group, and their test scores are recorded. Perform a two-sample t-test to determine if there is a significant difference in the mean scores between Group A and Group B. Use a significance level of 0.05.

In [2]:
import scipy.stats as stats
import numpy as np

n1 = 40
x1bar = 75
s1 = 8.5
n2 = 40
x2bar = 81
s2 = 7.8

#Null hypothesis (H₀): Mean score for Group A (µ₁) = Mean score for Group B (µ₂)
#Alternative hypothesis (H₁): Mean score for Group A (µ₁) ≠ Mean score for Group B (µ₂)

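# Calculate the standard error of the difference in means
standard_error = np.sqrt((s1**2 / n1) + (s2**2 / n2))

# Calculate the t-statistic
t_statistic = (x1bar - x2bar) / standard_error

# Calculate the degrees of freedom (pooled, equal-variance form; because n1 == n2,
# the standard error above equals the pooled standard error)
df = n1 + n2 - 2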

# Calculate the p-value
p_value = 2 * (1 - stats.t.cdf(abs(t_statistic), df))

# Calculate the critical values for the given alpha level (significance level)
alpha = 0.05
critical_value_right = stats.t.isf(alpha / 2, df)
critical_value_left = -critical_value_right

print("Test statistic (t-score):", t_statistic)
print("Critical values:", critical_value_left, ",", critical_value_right)
print("p-value:", p_value)

if t_statistic > critical_value_right or t_statistic < critical_value_left:
    print("Reject the null hypothesis")
    print("There is sufficient evidence to conclude that there is a significant difference in the mean scores between Group A and Group B")
else:
    print("Fail to reject the null hypothesis")
    print("There is not enough evidence to conclude that there is a significant difference in the mean scores between Group A and Group B.")




Test statistic (t-score): -3.2893382368672035
Critical values: -1.9908470685550523 , 1.9908470685550523
p-value: 0.001508523763473324
Reject the null hypothesis
There is sufficient evidence to conclude that there is a significant difference in the mean scores between Group A and Group B
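
SciPy can run the same comparison directly from summary statistics with `stats.ttest_ind_from_stats`. The sketch below reuses the numbers from this example and also shows Welch's t-test (`equal_var=False`), which drops the equal-variance assumption. Treat it as a cross-check of the manual calculation rather than the notebook's own method.

```python
from scipy import stats

# Pooled (equal-variance) two-sample t-test from the summary statistics above
pooled = stats.ttest_ind_from_stats(75, 8.5, 40, 81, 7.8, 40, equal_var=True)

# Welch's t-test, which does not assume equal variances
welch = stats.ttest_ind_from_stats(75, 8.5, 40, 81, 7.8, 40, equal_var=False)

print("pooled:", pooled.statistic, pooled.pvalue)
print("Welch: ", welch.statistic, welch.pvalue)
```

With equal sample sizes the two statistics coincide; only the degrees of freedom, and therefore the p-value, can differ slightly.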
