## Chi-Square Test (χ² Test)

**Use:**  
- To test the association/independence between categorical variables.  
- To compare observed vs expected frequencies.  

**Test statistic:**

$$
\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}
$$

where:  
- \( O_i \) = Observed frequency  
- \( E_i \) = Expected frequency  
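
As a minimal sketch of the formula above, the statistic can be computed directly with NumPy. The observed counts are the sex-by-survival table built later in this notebook, and the expected counts are the values SciPy reports for it; the variable names are illustrative only.

```python
import numpy as np

# Observed counts (rows: female, male; columns: survived = 0, 1)
observed = np.array([[81.0, 233.0],
                     [468.0, 109.0]])

# Expected counts under independence, copied from the notebook output
expected = np.array([[193.47474747, 120.52525253],
                     [355.52525253, 221.47474747]])

# Sum of (O - E)^2 / E over every cell of the table
chi2_manual = np.sum((observed - expected) ** 2 / expected)
print(chi2_manual)
```

This gives roughly 263.05. It is the uncorrected statistic, so it need not match `chi2_contingency` exactly on a 2×2 table, where SciPy applies a continuity correction by default.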

**Distribution:**  
- Under the null hypothesis of independence, the statistic approximately follows a chi-square distribution with degrees of freedom:  

$$
df = (r - 1)(c - 1)
$$

where \( r \) = number of rows, \( c \) = number of columns (in contingency table).
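
The degrees of freedom can be read off the shape of the table, as in the sketch below. It uses the 2×2 sex-by-survival table from later in this notebook; the 0.05 significance level is an assumption chosen to match the final cell, and `scipy.stats.chi2.ppf` is used to look up the corresponding critical value.

```python
import numpy as np
from scipy.stats import chi2

# 2x2 contingency table (rows: female, male; columns: survived = 0, 1)
table = np.array([[81, 233],
                  [468, 109]])

r, c = table.shape
dof = (r - 1) * (c - 1)

alpha = 0.05  # assumed significance level
# Critical value: the statistic must exceed this to reject at level alpha
critical_value = chi2.ppf(1 - alpha, dof)
print(dof, critical_value)
```

For one degree of freedom the critical value is about 3.84, so any statistic larger than that leads to rejection at the 5% level.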


In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
from scipy.stats import chi2_contingency

In [4]:
df = sns.load_dataset('titanic')
df

,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.2500,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.0500,S,Third,man,True,,Southampton,no,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,Second,man,True,,Southampton,no,True
887,1,1,female,19.0,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True
888,0,3,female,,1,2,23.4500,S,Third,woman,False,,Southampton,no,False
889,1,1,male,26.0,0,0,30.0000,C,First,man,True,C,Cherbourg,yes,True


In [6]:
contigency_table = pd.crosstab(df['sex'], df['survived'])
contigency_table

sex,survived=0,survived=1
female,81,233
male,468,109


In [11]:
chi2, p_value, dof, expected = chi2_contingency(contigency_table)

In [14]:
expected  # expected counts under independence: (row total * column total) / grand total

array([[193.47474747, 120.52525253],
       [355.52525253, 221.47474747]])
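
The expected counts above can be reproduced by hand from the table's margins. The following is a small sketch using `np.outer`; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np

# Observed counts (rows: female, male; columns: survived = 0, 1)
observed = np.array([[81, 233],
                     [468, 109]])

row_totals = observed.sum(axis=1)   # [314, 577]
col_totals = observed.sum(axis=0)   # [549, 342]
grand_total = observed.sum()        # 891

# E_ij = (row total i * column total j) / grand total
expected_manual = np.outer(row_totals, col_totals) / grand_total
print(expected_manual)
```

The result should agree with the `expected` array returned by `chi2_contingency`.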

In [15]:
dof  # degrees of freedom: (rows - 1) * (cols - 1)

1

In [13]:
chi2 # chi square value

260.71702016732104
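
The value above is slightly smaller than what the plain formula gives, because `chi2_contingency` applies Yates' continuity correction by default when the table has one degree of freedom. A short sketch comparing the two, using the `correction` parameter of `chi2_contingency`:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts (rows: female, male; columns: survived = 0, 1)
observed = np.array([[81, 233],
                     [468, 109]])

# Default behaviour: Yates' correction is applied for 2x2 tables
chi2_corrected, _, _, _ = chi2_contingency(observed)

# correction=False gives the plain sum of (O - E)^2 / E
chi2_uncorrected, _, _, _ = chi2_contingency(observed, correction=False)

print(chi2_corrected, chi2_uncorrected)
```

With counts this large the two statistics differ only a little (about 260.72 versus 263.05), and the conclusion of the test is the same either way.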

In [16]:
p_value  # compare with the significance level (alpha)

1.1973570627755645e-58
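
The p-value is the upper-tail probability of the chi-square distribution at the observed statistic. As a sketch, it can be recovered from the statistic and degrees of freedom reported above with the survival function `scipy.stats.chi2.sf`:

```python
from scipy.stats import chi2

chi2_stat = 260.71702016732104  # statistic reported by chi2_contingency above
dof = 1                         # degrees of freedom for a 2x2 table

# Survival function: P(X >= chi2_stat) for X ~ chi-square(dof)
p_manual = chi2.sf(chi2_stat, dof)
print(p_manual)
```

Using `sf` rather than `1 - chi2.cdf(...)` matters here: a probability this small would round to zero in the subtraction.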

In [18]:
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: there is a statistically significant association between sex and survival.")
else:
    print("Fail to reject the null hypothesis: there is no statistically significant association between sex and survival.")

Reject the null hypothesis: there is a statistically significant association between sex and survival.


## **Created By:** *Hafiz Muhammad Talal*