# One-Way ANOVA
Example: Three varieties of wheat are sown in four plots each, and the yields are recorded as shown below. Conduct an ANOVA to analyze the differences between the wheat varieties.

Null Hypothesis: The mean yields of the three wheat varieties do not differ significantly.
![image-3.png](attachment:image-3.png)
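
In symbols, the hypothesis and the test statistic can be written as follows. The symbols $\mu_A, \mu_B, \mu_C$ (mean yield per variety), $k$ (number of varieties) and $N$ (total number of plots) are introduced here for illustration and do not appear in the original example.

```latex
H_0:\ \mu_A = \mu_B = \mu_C
\qquad
H_1:\ \text{at least one mean differs}

F = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)},
\qquad k = 3,\quad N = 12
```

Under $H_0$, $F$ follows an F distribution with $(k-1,\ N-k) = (2,\ 9)$ degrees of freedom, so a large $F$ (small p-value) is evidence against equal means.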

# Method 1 (One-Way ANOVA with scipy.stats.f_oneway)

In [1]:
from scipy import stats

In [2]:
# Data
wheat_variety_A = [5, 3, 4, 2]
wheat_variety_B = [4, 4, 3, 3]
wheat_variety_C = [3, 2, 5, 2]

In [3]:
# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(wheat_variety_A, wheat_variety_B, wheat_variety_C)

In [4]:
# Print results
print(f"F-statistic: {f_statistic}")
print(f"P-value: {p_value}")

F-statistic: 0.25000000000000006
P-value: 0.7840343108765091


In [5]:
# Check if the difference is significant at a 5% significance level
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. The difference between wheat varieties is significant.")
else:
    print("Fail to reject the null hypothesis. The difference between wheat varieties is not significant.")

Fail to reject the null hypothesis. The difference between wheat varieties is not significant.
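
To see where the F-statistic comes from, the same numbers can be reproduced by hand from the between-group and within-group sums of squares. This is a minimal sketch using only the standard library; the variable names are illustrative. The p-value line relies on a closed form that holds only when the numerator has exactly 2 degrees of freedom, as it does here; for other designs use `scipy.stats.f.sf`.

```python
# Yields per variety, copied from the data cell above
groups = {
    "A": [5, 3, 4, 2],
    "B": [4, 4, 3, 3],
    "C": [3, 2, 5, 2],
}

all_values = [y for ys in groups.values() for y in ys]
n_total = len(all_values)          # N = 12 plots
k = len(groups)                    # k = 3 varieties
grand_mean = sum(all_values) / n_total

# Between-group sum of squares: group size times squared distance
# of each group mean from the grand mean
ss_between = sum(
    len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values()
)

# Within-group sum of squares: squared distance of each value from its group mean
ss_within = sum(
    (y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys
)

df_between = k - 1                 # 2
df_within = n_total - k            # 9

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_manual = ms_between / ms_within

# Closed-form upper-tail probability of F(2, df_within); valid only for df_between == 2
p_manual = (1 + 2 * f_manual / df_within) ** (-df_within / 2)

print(f"SS between: {ss_between:.6f}, SS within: {ss_within:.6f}")
print(f"F-statistic: {f_manual:.6f}")
print(f"P-value: {p_manual:.6f}")
```

The values agree with the `stats.f_oneway` output above up to floating-point rounding.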


# Method 2 (One-Way ANOVA with statsmodels)

In [67]:
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

In [68]:
df=pd.read_csv("1Way.csv")

In [69]:
df

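| | Varieties | Plot | yields |
|---|---|---|---|
| 0 | P1 | A | 5 |
| 1 | P1 | B | 4 |
| 2 | P1 | C | 3 |
| 3 | P2 | A | 3 |
| 4 | P2 | B | 4 |
| 5 | P2 | C | 2 |
| 6 | P3 | A | 4 |
| 7 | P3 | B | 3 |
| 8 | P3 | C | 5 |
| 9 | P4 | A | 2 |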
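
The contents of `1Way.csv` are not included here, so the following is only a sketch of how a long-format file with this layout might be produced from the Method 1 lists. It assumes the column names seen in the displayed frame (where `Varieties` holds the plot labels P1 to P4 and `Plot` holds the variety labels A to C) and writes to an in-memory buffer rather than to disk.

```python
import csv
import io

# Yields per variety, copied from the Method 1 data cell
yields_by_variety = {
    "A": [5, 3, 4, 2],
    "B": [4, 4, 3, 3],
    "C": [3, 2, 5, 2],
}

# One row per (plot, variety) pair, in the same order as the displayed frame
rows = [
    {"Varieties": f"P{plot + 1}", "Plot": variety, "yields": values[plot]}
    for plot in range(4)
    for variety, values in yields_by_variety.items()
]

# io.StringIO stands in for a real file; swap in open("1Way.csv", "w", newline="")
# to write to disk instead
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["Varieties", "Plot", "yields"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

print(csv_text)
```

Long format (one observation per row, with the group label in its own column) is what the formula interface `ols("yields~Plot", ...)` expects.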


In [75]:
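# In this file the column named "Plot" holds the variety labels (A, B, C),
# so "yields~Plot" models yield as a function of wheat variety.
Anova_model=ols("yields~Plot",data=df).fit()

# typ selects the type of sum of squares, i.e. how the variance is partitioned.
# The keyword is `typ`; a misspelled `type=` is silently ignored and the default typ=1 is used.
# typ=1 (sequential) produces the table layout shown below; with a single factor,
# Type I and Type II give the same sums of squares.
One_way_Anova=sm.stats.anova_lm(Anova_model,typ=1)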

In [76]:
One_way_Anova

| | df | sum_sq | mean_sq | F | PR(>F) |
|---|---|---|---|---|---|
| Plot | 2.0 | 0.666667 | 0.333333 | 0.25 | 0.784034 |
| Residual | 9.0 | 12.0 | 1.333333 | | |


# Two-Way ANOVA
Four workers work alternately on four machines. The number of defective items produced by each worker on each machine is shown in the table below. Use ANOVA to examine the differences between the machines and the differences between the workers.

Null Hypotheses:

1. The difference between the four machines is not significant.

2. The difference between the four workers is not significant.

![image-2.png](attachment:image-2.png)
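
Stated as a model, and assuming the additive form without an interaction term (which is what the formula `def_items~machine+workers` used in the cells that follow specifies), the two hypotheses can be sketched as below. The symbols $\alpha_i$ (machine effect), $\beta_j$ (worker effect) and $\varepsilon_{ij}$ (random error) are introduced here for illustration.

```latex
y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij},
\qquad i = 1,\dots,4 \ (\text{machines}),\quad j = 1,\dots,4 \ (\text{workers})

H_0^{\text{machine}}:\ \alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 0
\qquad
H_0^{\text{worker}}:\ \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0
```

With one observation per worker and machine pair, each factor has $4 - 1 = 3$ degrees of freedom and the residual has $(4-1)(4-1) = 9$.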

In [66]:
import pandas as pd
import statsmodels.api as sm
# OLS (ordinary least squares) estimates the parameters of a linear regression model
# by minimizing the sum of squared differences between observed and predicted values.
from statsmodels.formula.api import ols

In [59]:
df=pd.read_csv("2Way.csv")

In [60]:
df

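| | workers | machine | def_items |
|---|---|---|---|
| 0 | W1 | M1 | 10 |
| 1 | W1 | M2 | 12 |
| 2 | W1 | M3 | 14 |
| 3 | W1 | M4 | 8 |
| 4 | W2 | M1 | 15 |
| 5 | W2 | M2 | 20 |
| 6 | W2 | M3 | 7 |
| 7 | W2 | M4 | 16 |
| 8 | W3 | M1 | 7 |
| 9 | W3 | M2 | 10 |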


In [65]:
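# Additive model: defective items explained by machine and worker, with no interaction term
Anova_model=ols("def_items~machine+workers",data=df).fit()

# typ selects the type of sum of squares, i.e. how the variance is partitioned.
# The keyword is `typ`; a misspelled `type=` is silently ignored and the default typ=1 is used.
# typ=1 (sequential) produces the table layout shown below; in a balanced design like this one
# (each worker uses each machine once), Type I and Type II give the same sums of squares.
Two_way_Anova=sm.stats.anova_lm(Anova_model,typ=1)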

In [64]:
Two_way_Anova

| | df | sum_sq | mean_sq | F | PR(>F) |
|---|---|---|---|---|---|
| machine | 3.0 | 48.75 | 16.25 | 0.745223 | 0.55174 |
| workers | 3.0 | 30.75 | 10.25 | 0.470064 | 0.710521 |
| Residual | 9.0 | 196.25 | 21.805556 | | |
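
The table can be read the same way as in the one-way case: each mean square is the sum of squares divided by its degrees of freedom, and each F is the factor's mean square divided by the residual mean square. The sketch below rechecks that arithmetic and applies the 5% significance level from the one-way example to the reported p-values. The numbers are copied from the table above; the variable names are illustrative.

```python
# (degrees of freedom, sum of squares, reported p-value), copied from the ANOVA table
table = {
    "machine": (3, 48.75, 0.55174),
    "workers": (3, 30.75, 0.710521),
}
resid_dof, resid_ss = 9, 196.25

alpha = 0.05                       # same significance level as the one-way example
ms_resid = resid_ss / resid_dof    # residual mean square

results = {}
for factor, (dof, ss, p_value) in table.items():
    mean_sq = ss / dof
    f_stat = mean_sq / ms_resid
    decision = "reject" if p_value < alpha else "fail to reject"
    results[factor] = (mean_sq, f_stat, decision)
    print(f"{factor}: mean_sq={mean_sq:.2f}, F={f_stat:.6f}, p={p_value} -> {decision} H0")
```

Both p-values are well above 0.05, so neither null hypothesis is rejected: the data do not show a significant difference between the machines or between the workers.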
