# Statistics Advance-7
Assignment Questions

Python function that takes in two arrays of data and calculates the F-value for a variance ratio test:

In [1]:
import numpy as np
from scipy.stats import f

def variance_ratio_test(data1, data2):
    n1 = len(data1)
    n2 = len(data2)
    var1 = np.var(data1, ddof=1)
    var2 = np.var(data2, ddof=1)
    
    if var1 > var2:
        f_value = var1 / var2
        df1 = n1 - 1
        df2 = n2 - 1
    else:
        f_value = var2 / var1
        df1 = n2 - 1
        df2 = n1 - 1
    
    p_value = f.sf(f_value, df1, df2)
    
    return f_value, p_value

This function first calculates the sample sizes and the sample variances (using ddof=1) for each dataset. It then places the larger variance in the numerator, so the F-value is always at least 1, and assigns the numerator and denominator degrees of freedom to match. Finally, it uses scipy.stats.f.sf() to compute the upper-tail probability of that F-value and returns both the F-value and the p-value. Because only the upper tail is used, the returned p-value is one-sided; for a two-sided test of equal variances it is commonly doubled.

In [2]:
data1 = [1, 2, 3, 4, 5]
data2 = [2, 4, 6, 8, 10]
f_value, p_value = variance_ratio_test(data1, data2)
print("F-value:", f_value)
print("p-value:", p_value)

F-value: 4.0
p-value: 0.10400000000000002


Note that the p-value is greater than the typical alpha level of 0.05, so we would fail to reject the null hypothesis that the variances of the two datasets are equal.
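The function above reports an upper-tail probability. As a minimal sketch of a two-sided variant (the helper name two_sided_variance_ratio_test is an assumption for illustration, not part of the original answer), one can keep the ratio in its natural order, take the smaller tail probability, and double it:

```python
import numpy as np
from scipy.stats import f

def two_sided_variance_ratio_test(data1, data2):
    """Hypothetical helper: two-sided F-test for equal variances."""
    var1 = np.var(data1, ddof=1)
    var2 = np.var(data2, ddof=1)
    df1, df2 = len(data1) - 1, len(data2) - 1
    f_value = var1 / var2
    # Take the smaller of the two tail probabilities and double it, capped at 1
    tail = min(f.cdf(f_value, df1, df2), f.sf(f_value, df1, df2))
    return f_value, min(1.0, 2 * tail)

f_two, p_two = two_sided_variance_ratio_test([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print("F-value:", f_two)
print("two-sided p-value:", p_two)
```

For this data the two-sided p-value is twice the one-sided 0.104, about 0.208, so the conclusion (fail to reject) is unchanged.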

Python function that takes the degrees of freedom for the numerator and denominator and returns the critical F-value for a two-tailed test with a significance level of 0.05:

In [3]:
from scipy.stats import f

def critical_f(num_df, denom_df):
    return f.ppf(q=1-0.025, dfn=num_df, dfd=denom_df)


In this function, f.ppf() (the percent point function, i.e. the inverse of the CDF) is used to calculate the critical F-value. The q argument is the cumulative (lower-tail) probability, so q = 1-0.025 = 0.975 leaves 0.025 in the upper tail, which gives the upper critical value for a two-tailed test with a significance level of 0.05. The dfn argument is the degrees of freedom for the numerator, and the dfd argument is the degrees of freedom for the denominator. The function returns the upper critical F-value.
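A two-tailed test also has a lower critical value. The sketch below returns both bounds; the function name critical_f_bounds and the alpha parameter are assumptions added for illustration.

```python
from scipy.stats import f

def critical_f_bounds(num_df, denom_df, alpha=0.05):
    """Hypothetical helper: lower and upper critical F-values for a two-tailed test."""
    lower = f.ppf(alpha / 2, num_df, denom_df)      # alpha/2 in the lower tail
    upper = f.ppf(1 - alpha / 2, num_df, denom_df)  # alpha/2 in the upper tail
    return lower, upper

lower, upper = critical_f_bounds(6, 5)
print("Lower critical F-value:", lower)
print("Upper critical F-value:", upper)
```

The null hypothesis is rejected when the F-statistic falls below the lower bound or above the upper bound. The lower bound equals the reciprocal of the upper bound with the degrees of freedom swapped.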

Python program that generates random samples from two normal distributions with known variances and performs an F-test to determine if the variances are equal:

In [4]:
import numpy as np
from scipy.stats import f

# Generate random samples from two normal distributions with known variances
np.random.seed(123)
sample1 = np.random.normal(loc=0, scale=1, size=50)
sample2 = np.random.normal(loc=0, scale=1.2, size=50)

# Calculate the F-value and p-value for the F-test
f_value = np.var(sample1, ddof=1) / np.var(sample2, ddof=1)
dfn = len(sample1) - 1
dfd = len(sample2) - 1
p_value = f.sf(f_value, dfn, dfd)

# Output the results
print("F-value:", f_value)
print("Degrees of freedom (numerator, denominator):", dfn, dfd)
print("p-value:", p_value)

F-value: 0.8695103432276363
Degrees of freedom (numerator, denominator): 49 49
p-value: 0.686764615191469


In this program, we use NumPy to generate random samples from two normal distributions with means of 0 and standard deviations of 1 and 1.2 (variances of 1 and 1.44), respectively. We then calculate the F-value using the formula F = s1^2 / s2^2, where s1^2 and s2^2 are the sample variances of the two samples. The degrees of freedom for the numerator and denominator are one less than the respective sample sizes. We then use the f.sf() function from SciPy to calculate the upper-tail p-value for the F-test. Finally, we output the results. With a p-value of about 0.687, the test does not reject equal variances at the 0.05 level, even though the true variances differ.
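A single run says little about how the test behaves in general. As a rough sketch (the function name rejection_rate, the number of repetitions, and the use of a two-sided p-value are all assumptions made for illustration), the experiment can be repeated many times to estimate how often the test rejects at the 0.05 level:

```python
import numpy as np
from scipy.stats import f

def rejection_rate(scale2, n=50, reps=2000, alpha=0.05, seed=123):
    """Hypothetical helper: fraction of simulated F-tests that reject equal variances."""
    rng = np.random.default_rng(seed)
    s1 = rng.normal(0, 1, size=(reps, n))       # first population, std 1
    s2 = rng.normal(0, scale2, size=(reps, n))  # second population, std scale2
    f_values = np.var(s1, axis=1, ddof=1) / np.var(s2, axis=1, ddof=1)
    cdf = f.cdf(f_values, n - 1, n - 1)
    p_values = 2 * np.minimum(cdf, 1 - cdf)     # two-sided p-values
    return np.mean(p_values < alpha)

rate_equal = rejection_rate(1.0)    # null hypothesis true
rate_unequal = rejection_rate(1.2)  # variances 1 and 1.44
print("Rejection rate, equal variances:", rate_equal)
print("Rejection rate, std 1 vs 1.2:", rate_unequal)
```

When the variances are truly equal, the rejection rate should be close to alpha; with standard deviations of 1 and 1.2 it is higher, but the test still misses the difference in many runs at this sample size.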

To perform the F-test, we first need to define the null and alternative hypotheses:

Null hypothesis: The variances of the two populations are equal.
Alternative hypothesis: The variances of the two populations are not equal.

We can use the F-test for the ratio of variances to test this hypothesis. The test statistic is given by:

F = s1^2 / s2^2

where s1^2 and s2^2 are the sample variances of the samples drawn from the two populations. Under the null hypothesis (and assuming both populations are normal), the F-statistic follows an F-distribution with (n1-1) and (n2-1) degrees of freedom.

To conduct the test in Python, we can use the scipy.stats.f distribution object, which provides the CDF, survival function, and quantile function of the F-distribution.

In [5]:
import scipy.stats as stats

# Sample variances
s1_sq = 10
s2_sq = 15

# Sample sizes
n1 = 12
n2 = 12

# Calculate the F-statistic
F = s1_sq / s2_sq

# Calculate the p-value
p_value = 1 - stats.f.cdf(F, n1-1, n2-1)

# Compare with the significance level
alpha = 0.05
if p_value < alpha:
    print("Reject null hypothesis. Variances are significantly different.")
else:
    print("Fail to reject null hypothesis. Variances are not significantly different.")

# Output the F-statistic and p-value
print("F-statistic: ", F)
print("p-value: ", p_value)


Fail to reject null hypothesis. Variances are not significantly different.
F-statistic:  0.6666666666666666
p-value:  0.7438051006321003


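Since the p-value (about 0.744) is greater than the significance level of 0.05, we fail to reject the null hypothesis: there is not enough evidence to conclude that the variances of the two populations differ. Note that 1 - cdf gives the upper-tail probability only; because F is below 1 here, a two-sided p-value would instead be twice the lower-tail probability, about 0.512, which leads to the same conclusion.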
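As a sketch of how the same calculation could be packaged for summary statistics (the function name f_test_from_summary is an assumption for illustration), the helper below returns the F-statistic together with both the upper-tail and the two-sided p-value:

```python
from scipy.stats import f

def f_test_from_summary(s1_sq, s2_sq, n1, n2):
    """Hypothetical helper: F-test from sample variances and sample sizes."""
    F = s1_sq / s2_sq
    dfn, dfd = n1 - 1, n2 - 1
    p_upper = f.sf(F, dfn, dfd)   # P(F' >= F), same as 1 - cdf
    p_lower = f.cdf(F, dfn, dfd)  # P(F' <= F)
    p_two_sided = min(1.0, 2 * min(p_upper, p_lower))
    return F, p_upper, p_two_sided

F_stat, p_upper, p_two_sided = f_test_from_summary(10, 15, 12, 12)
print("F-statistic:", F_stat)
print("Upper-tail p-value:", p_upper)
print("Two-sided p-value:", p_two_sided)
```

Both p-values are well above 0.05 for these inputs, so the decision does not depend on which one is used.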

We can use an F-test to determine if the sample variance is significantly different from the claimed population variance.

The null hypothesis is that the population variance equals the claimed value (σ^2 = 0.005), while the alternative hypothesis is that the population variance is greater than the claimed value (σ^2 > 0.005).

We can use the formula F = s^2/σ^2 to calculate the F-statistic, where s^2 is the sample variance and σ^2 is the claimed population variance. We then compare the F-statistic to the critical F-value at the 1% significance level with 24 degrees of freedom for the numerator (since we have 25 observations in the sample) and 1 degree of freedom for the denominator. Note that a claimed variance is a fixed number rather than an estimate, which corresponds to a very large denominator degrees of freedom; using 1 makes the critical value enormous and the test extremely conservative. The standard choice for this problem is the chi-square test with statistic (n-1)s^2/σ^2 and n-1 degrees of freedom.

If the F-statistic is greater than the critical F-value, we reject the null hypothesis and conclude that the population variance is greater than the claimed value.

In [6]:
import scipy.stats as stats

# Set the significance level and degrees of freedom
alpha = 0.01
dfn = 24
dfd = 1

# Set the claimed population variance and sample variance
pop_var = 0.005
sample_var = 0.006

# Calculate the F-statistic
F = sample_var / pop_var

# Calculate the critical F-value
crit_F = stats.f.ppf(q=1-alpha, dfn=dfn, dfd=dfd)

# Print the results
print("F-statistic: ", F)
print("Critical F-value: ", crit_F)

if F > crit_F:
    print("Reject the null hypothesis")
    print("The sample variance is significantly different from the population variance")
else:
    print("Fail to reject the null hypothesis")
    print("The sample variance is not significantly different from the population variance")


F-statistic:  1.2
Critical F-value:  6234.6308935330835
Fail to reject the null hypothesis
The sample variance is not significantly different from the population variance


Since the F-statistic (1.2) is less than the critical F-value (about 6234.63), we fail to reject the null hypothesis: the sample does not provide evidence that the variance exceeds the claimed population variance. The data are therefore consistent with the manufacturer's claim at the 1% significance level.
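For comparison, a test of a single sample variance against a claimed value is usually done with the chi-square distribution. The sketch below uses the same numbers as the cell above; it swaps in the chi-square test in place of the F-test, and the variable names are assumptions for illustration.

```python
from scipy.stats import chi2

n = 25              # sample size
pop_var = 0.005     # claimed population variance
sample_var = 0.006  # observed sample variance
alpha = 0.01

# Under H0, (n - 1) * s^2 / sigma^2 follows a chi-square distribution with n - 1 df
chi2_stat = (n - 1) * sample_var / pop_var
chi2_crit = chi2.ppf(1 - alpha, df=n - 1)  # upper-tail critical value
p_value = chi2.sf(chi2_stat, df=n - 1)     # upper-tail p-value
reject = chi2_stat > chi2_crit

print("Chi-square statistic:", chi2_stat)
print("Critical value:", chi2_crit)
print("p-value:", p_value)
print("Reject H0:", reject)
```

The statistic (28.8) is below the critical value (about 42.98), so this test also fails to reject the null hypothesis.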

Python function that takes in the degrees of freedom for the numerator (dfn) and denominator (dfd) of an F-distribution and calculates the mean and variance of the distribution:

In [7]:
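def f_dist_mean_var(dfn, dfd):
    # The mean exists only for dfd > 2 and the variance only for dfd > 4
    if dfd <= 4:
        raise ValueError("dfd must be greater than 4 for the variance to exist")
    mean = dfd / (dfd - 2)
    variance = (2 * dfd ** 2 * (dfn + dfd - 2)) / (dfn * (dfd - 2) ** 2 * (dfd - 4))
    return mean, variance


This function does not need scipy.stats; it applies the standard closed-form formulas for the F-distribution directly. The mean is dfd / (dfd - 2), which is defined only for dfd > 2 and does not depend on dfn. The variance is 2 * dfd^2 * (dfn + dfd - 2) / (dfn * (dfd - 2)^2 * (dfd - 4)), which is defined only for dfd > 4.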
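As a quick sanity check, the closed-form values can be compared with the moments that SciPy reports through scipy.stats.f.stats(). This is a minimal sketch; the helper name f_mean_var_formula and the example degrees of freedom (5 and 10) are assumptions for illustration.

```python
import numpy as np
from scipy.stats import f

def f_mean_var_formula(dfn, dfd):
    """Closed-form mean and variance of the F-distribution (requires dfd > 4)."""
    mean = dfd / (dfd - 2)
    variance = (2 * dfd**2 * (dfn + dfd - 2)) / (dfn * (dfd - 2)**2 * (dfd - 4))
    return mean, variance

dfn, dfd = 5, 10
formula_mean, formula_var = f_mean_var_formula(dfn, dfd)
scipy_mean, scipy_var = f.stats(dfn, dfd, moments="mv")

print("Formula:", formula_mean, formula_var)
print("SciPy:  ", float(scipy_mean), float(scipy_var))
print("Match:", np.isclose(formula_mean, scipy_mean) and np.isclose(formula_var, scipy_var))
```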

To conduct the F-test for comparing the variances of two populations, we use the following hypotheses:

Null Hypothesis (H0): The population variances are equal.
Alternative Hypothesis (Ha): The population variances are not equal.

The test statistic for the F-test is given by:

F = s1^2 / s2^2

where s1^2 and s2^2 are the sample variances of the samples drawn from the two populations.

Under the null hypothesis, the test statistic follows an F-distribution with degrees of freedom df1 = n1 - 1 and df2 = n2 - 1, where n1 and n2 are the sample sizes of the two populations.

To conduct the F-test at the 10% significance level, we need the upper critical F-value from the F-distribution with df1 = 9 and df2 = 14 at an upper-tail probability of 5% (because it is a two-tailed test, the 10% is split equally between the two tails).

Using statistical software or an F-table, the critical F-value for df1 = 9 and df2 = 14 at the 5% upper-tail level is approximately 2.65.

Now, let's calculate the test statistic and compare it with the critical value to make a decision.

F = s1^2 / s2^2 = 25 / 20 = 1.25

Since the calculated F-value (1.25) is less than the critical F-value (about 2.65), and being greater than 1 it cannot fall in the lower rejection region either, we fail to reject the null hypothesis. Therefore, we do not have enough evidence to conclude that the population variances are different at the 10% level of significance.
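The table lookup can be reproduced with SciPy. This is a minimal sketch using the values from the text (variances 25 and 20, degrees of freedom 9 and 14); the variable names are assumptions for illustration.

```python
from scipy.stats import f

alpha = 0.10
df1, df2 = 9, 14
s1_sq, s2_sq = 25, 20

F = s1_sq / s2_sq
# Two-tailed test: alpha/2 in each tail
upper_crit = f.ppf(1 - alpha / 2, df1, df2)
lower_crit = f.ppf(alpha / 2, df1, df2)
reject = F > upper_crit or F < lower_crit

print("F-statistic:", F)
print("Lower critical F-value:", lower_crit)
print("Upper critical F-value:", upper_crit)
print("Reject H0:", reject)
```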

We can conduct an F-test for the null hypothesis that the variances of the two populations are equal.

In [8]:
import numpy as np

# Data
A = np.array([24, 25, 28, 23, 22, 20, 27])
B = np.array([31, 33, 35, 30, 32, 36])

# Sample variances
var_A = np.var(A, ddof=1)
var_B = np.var(B, ddof=1)
print("Sample variance of A:", var_A)
print("Sample variance of B:", var_B)


Sample variance of A: 7.80952380952381
Sample variance of B: 5.366666666666667


Next, we calculate the F-value:

In [9]:
# F-value
F = var_A/var_B
print("F-value:", F)

F-value: 1.4551907719609583


We can find the critical F-value using the scipy.stats module:



In [10]:
from scipy.stats import f

# Degrees of freedom
df1 = len(A)-1
df2 = len(B)-1

# Critical F-value
alpha = 0.05
F_crit = f.ppf(1-alpha/2, df1, df2)
print("Critical F-value:", F_crit)

Critical F-value: 6.977701858535566


Since the calculated F-value (about 1.46) is less than the critical F-value (about 6.98), we fail to reject the null hypothesis that the variances are equal. Therefore, we do not have sufficient evidence to conclude that the variances of the waiting times at the two restaurants are different at the 5% significance level.
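The same decision can be reached through a p-value instead of a critical value. The sketch below recomputes the statistic from the data and derives a two-sided p-value by doubling the smaller tail probability; that doubling convention is an assumption, since the original cells only use the critical-value approach.

```python
import numpy as np
from scipy.stats import f

A = np.array([24, 25, 28, 23, 22, 20, 27])
B = np.array([31, 33, 35, 30, 32, 36])

F = np.var(A, ddof=1) / np.var(B, ddof=1)
df1, df2 = len(A) - 1, len(B) - 1

# Two-sided p-value: double the smaller tail probability, capped at 1
tail = min(f.cdf(F, df1, df2), f.sf(F, df1, df2))
p_value = min(1.0, 2 * tail)

print("F-value:", F)
print("Two-sided p-value:", p_value)
print("Reject H0 at 5%:", p_value < 0.05)
```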

To conduct an F-test to determine if the variances of two groups are significantly different, we need to calculate the ratio of the sample variances and compare it to the critical F-value. The null hypothesis is that the variances are equal.

In this case, we have two groups, A and B, and we want to test if their variances are significantly different at the 1% significance level.

The sample variances can be calculated as follows:

In [11]:
import numpy as np

group_a = np.array([80, 85, 90, 92, 87, 83])
group_b = np.array([75, 78, 82, 79, 81, 84])

var_a = np.var(group_a, ddof=1)
var_b = np.var(group_b, ddof=1)

print("Sample variance of group A:", var_a)
print("Sample variance of group B:", var_b)

Sample variance of group A: 19.76666666666667
Sample variance of group B: 10.166666666666666


The ratio of the sample variances can be calculated as follows:

In [12]:
f_ratio = var_a / var_b
print("F-ratio:", f_ratio)

F-ratio: 1.9442622950819677


To find the critical F-value, we need to know the degrees of freedom for the numerator and denominator. We can calculate these as follows:

In [13]:
n_a = len(group_a)
n_b = len(group_b)

dfn = n_a - 1
dfd = n_b - 1

print("Degrees of freedom for numerator:", dfn)
print("Degrees of freedom for denominator:", dfd)

Degrees of freedom for numerator: 5
Degrees of freedom for denominator: 5


Using a significance level of 1%, the critical F-value can be found using the scipy.stats module:

In [14]:
from scipy.stats import f

alpha = 0.01
f_crit = f.ppf(1-alpha/2, dfn, dfd)

print("Critical F-value:", f_crit)

Critical F-value: 14.939605459912224


Since the F-ratio (about 1.944) is less than the critical F-value (about 14.940), we fail to reject the null hypothesis. We can conclude that there is not enough evidence to suggest that the variances of the two groups are different at the 1% significance level.
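The steps in cells 11 to 14 can be collected into one reusable function. This is a sketch only; the function name f_test_equal_variances and its return layout are assumptions for illustration.

```python
import numpy as np
from scipy.stats import f

def f_test_equal_variances(x, y, alpha=0.05):
    """Hypothetical helper: two-tailed F-test for equal variances via critical values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dfn, dfd = len(x) - 1, len(y) - 1
    f_ratio = np.var(x, ddof=1) / np.var(y, ddof=1)
    lower = f.ppf(alpha / 2, dfn, dfd)      # lower critical value
    upper = f.ppf(1 - alpha / 2, dfn, dfd)  # upper critical value
    reject = bool(f_ratio < lower or f_ratio > upper)
    return f_ratio, lower, upper, reject

result = f_test_equal_variances(
    [80, 85, 90, 92, 87, 83],
    [75, 78, 82, 79, 81, 84],
    alpha=0.01,
)
print("F-ratio, lower critical, upper critical, reject:", result)
```

Checking both critical values means the function gives the right decision regardless of which group is placed in the numerator.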