# [Statistics] F-test with Python library

## F-test introduction

< Introduction >

In statistics, many tests are used to compare samples or groups and to draw conclusions about the populations they come from. These techniques are commonly known as statistical tests or hypothesis tests.

A hypothesis test assesses how likely the observed data would be if a specific assumption (hypothesis) about the population were true. It provides a framework for weighing the evidence for or against that hypothesis.



< Hypothesis >

A statistical test begins with the formation of a null hypothesis (H0) and an alternative hypothesis (Ha).

- The null hypothesis represents the default or no-effect assumption.
- The alternative hypothesis suggests a specific relationship or effect.

< Statistical method : p-value >

Different statistical tests are available to calculate the probability, typically reported as a p-value, of obtaining the observed data. Based on the calculated p-value and a predetermined significance level, researchers decide whether to reject or fail to reject the null hypothesis.

- The p-value is the probability of observing results at least as extreme as the observed data, assuming the null hypothesis is true.
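
The decision rule can be sketched in a few lines. The helper name `decide` and the default significance level of 0.05 are illustrative assumptions, not part of any library.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the test decision for a p-value at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    # Reject H0 only when the p-value is at or below the significance level
    return "reject H0" if p_value <= alpha else "fail to reject H0"

print(decide(0.03))
print(decide(0.48))
```

Note that the outcome is phrased as "fail to reject" rather than "accept": a large p-value means the data do not contradict the null hypothesis, not that the null hypothesis has been shown to be true.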


< F-test >

The F-test is a statistical test used to compare the variances of two or more samples or populations and to determine whether they differ significantly. The F-statistic is the ratio of two variance estimates. When comparing two samples, it is one sample variance divided by the other; in ANOVA, it is the variance between the group means divided by the variance within the groups.
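
For the two-sample case, the ratio can be written out as follows. The symbols $s_i^2$, $n_i$, and $\bar{x}_i$ (sample variance, sample size, and sample mean of group $i$) are notation introduced here for illustration.

$$
F = \frac{s_1^2}{s_2^2},
\qquad
s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} \left( x_{ij} - \bar{x}_i \right)^2
$$

If both populations are normally distributed and the null hypothesis $H_0: \sigma_1^2 = \sigma_2^2$ holds, $F$ follows an F-distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom.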

The F-test is used in statistics and machine learning for comparing variances or testing the overall significance of a statistical model, such as in the analysis of variance (ANOVA) or regression analysis.
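
As a minimal sketch of the ANOVA use case, `scipy.stats.f_oneway` runs a one-way ANOVA F-test on two or more groups. The three groups below are simulated data with arbitrarily chosen means, spread, and seed; they are assumptions made for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Three hypothetical groups: different true means, same spread
a = rng.normal(loc=50, scale=5, size=30)
b = rng.normal(loc=55, scale=5, size=30)
c = rng.normal(loc=60, scale=5, size=30)

# H0: all group means are equal
f_stat, p_value = stats.f_oneway(a, b, c)
print("F-statistic:", f_stat)
print("p-value:", p_value)
```

Here the F-test compares group means through a ratio of variances, which is a different question from comparing the variances of two samples directly.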

![2024-01-23%2022_40_33-How%20F-tests%20work%20in%20Analysis%20of%20Variance%20%28ANOVA%29%20-%20Statistics%20By%20Jim.png](attachment:2024-01-23%2022_40_33-How%20F-tests%20work%20in%20Analysis%20of%20Variance%20%28ANOVA%29%20-%20Statistics%20By%20Jim.png)

To perform the F-test, we compare the calculated F-statistic to a critical value, or equivalently compare the p-value to the chosen significance level. If the F-statistic falls in the rejection region (for an upper-tailed test, it exceeds the critical value), the result is statistically significant and we reject the null hypothesis, which assumes equal variances.

On the other hand, if the result is not statistically significant, we fail to reject the null hypothesis: there is not enough evidence to conclude that the variances differ.
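
Critical values can be obtained from the `ppf` (inverse CDF) method of `scipy.stats.f`. The sketch below assumes a significance level of 0.05 and uses 24 and 19 degrees of freedom as example values.

```python
from scipy import stats

alpha = 0.05        # significance level (assumed)
df1, df2 = 24, 19   # numerator and denominator degrees of freedom (example values)

# Upper-tailed test: reject H0 when F exceeds this value
f_crit_upper = stats.f.ppf(1 - alpha, df1, df2)

# Two-sided test: split alpha across both tails
f_crit_lo = stats.f.ppf(alpha / 2, df1, df2)
f_crit_hi = stats.f.ppf(1 - alpha / 2, df1, df2)

print("Upper-tailed critical value:", f_crit_upper)
print("Two-sided rejection region: F <", f_crit_lo, "or F >", f_crit_hi)
```

For a two-sided test the rejection region has two parts, because a ratio far below 1 is as much evidence of unequal variances as a ratio far above 1.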

In ANOVA, the F-statistic combines two measures of variability: the variability between the group means and the variability within each group. Let's take a look at how these measures work together to produce low and high F-values. Look at the graphs below and compare the width of the spread of the group means to the width of the spread within each group.

![222.png](attachment:222.png)
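
The same comparison can be sketched numerically. The helper `anova_f` below is a hypothetical function written for this illustration; it computes the one-way ANOVA F-statistic as the between-group mean square divided by the within-group mean square. The simulated groups, their means, and the seed are assumptions.

```python
import numpy as np
from scipy import stats

def anova_f(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = np.concatenate(groups).mean()
    # Spread of the group means around the grand mean
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Spread of the observations around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

rng = np.random.default_rng(1)  # fixed seed, chosen arbitrarily
# Same within-group spread, different spread of the group means
close_means = [rng.normal(m, 5, 30) for m in (50, 51, 52)]
far_means = [rng.normal(m, 5, 30) for m in (40, 50, 60)]

print("F, group means close together:", anova_f(close_means))
print("F, group means far apart:", anova_f(far_means))
# Cross-check the manual computation against SciPy
print("Matches scipy.stats.f_oneway:",
      np.isclose(anova_f(far_means), stats.f_oneway(*far_means).statistic))
```

With the within-group spread held fixed, moving the group means further apart increases the numerator and therefore the F-value.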

< F-statistics using Python >

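The required library is SciPy, specifically the `scipy.stats` module. \
`scipy.stats.f` is the F-distribution object. Its methods, such as `cdf` and `sf`, take the F-statistic and the two degrees of freedom and return the tail probabilities needed to compute the p-value.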

### F-test

In [6]:
import numpy as np
import scipy.stats as stats
 
# Create the data for two groups
group1 = np.round(np.random.rand(25)* 100)
group2 = np.round(np.random.rand(20)* 100)
 
# Calculate the sample variances
variance1 = np.var(group1, ddof=1)
variance2 = np.var(group2, ddof=1)
 
# Calculate the F-statistic
f_value = variance1 / variance2
 
# Calculate the degrees of freedom
df1 = len(group1) - 1
df2 = len(group2) - 1
 
# Calculate the two-sided p-value (H0: the two variances are equal)
# cdf gives P(F <= f_value); sf gives P(F >= f_value)
p_value = 2 * min(stats.f.cdf(f_value, df1, df2), stats.f.sf(f_value, df1, df2))
 
# Print the results
print('Degree of freedom 1:',df1)
print('Degree of freedom 2:',df2)
print("F-statistic:", f_value)
print("p-value:", round(p_value, 4))

Degree of freedom 1: 24
Degree of freedom 2: 19
F-statistic: 1.377440245784886
p-value: 0.4802
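
The steps above can be wrapped in a reusable function. This is a minimal sketch: `f_test_variances` and its `alternative` parameter are hypothetical names, not a SciPy API, and the simulated groups and seed are assumptions. It uses `stats.f.cdf` for the lower tail and `stats.f.sf` for the upper tail.

```python
import numpy as np
from scipy import stats

def f_test_variances(x, y, alternative="two-sided"):
    """F-test for equality of two variances (assumes normally distributed samples).

    Hypothetical helper written for illustration; not part of SciPy.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    f_value = np.var(x, ddof=1) / np.var(y, ddof=1)
    df1, df2 = x.size - 1, y.size - 1
    lower = stats.f.cdf(f_value, df1, df2)  # P(F <= f_value)
    upper = stats.f.sf(f_value, df1, df2)   # P(F >= f_value)
    if alternative == "two-sided":          # Ha: variances differ
        p_value = min(1.0, 2 * min(lower, upper))
    elif alternative == "greater":          # Ha: var(x) > var(y)
        p_value = upper
    elif alternative == "less":             # Ha: var(x) < var(y)
        p_value = lower
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return f_value, p_value

rng = np.random.default_rng(42)  # fixed seed, chosen arbitrarily
group1 = rng.normal(0, 1, 25)    # hypothetical sample with standard deviation 1
group2 = rng.normal(0, 3, 20)    # hypothetical sample with standard deviation 3
f_value, p_value = f_test_variances(group1, group2)
print("F-statistic:", f_value)
print("p-value (two-sided):", p_value)
```

Keep in mind that the F-test for variances is sensitive to departures from normality, so the p-value is only reliable when both samples are approximately normal.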
