# Chi-Square Tests in Machine Learning

This notebook demonstrates how to perform and interpret the two main types of Chi-Square tests:

- **Goodness-of-Fit Test**
- **Test of Independence**

Each test includes explanations, code, and real-world ML use cases.

## 1. Goodness-of-Fit Test

**Use Case:** Suppose you run an e-commerce platform and expect sales to be split equally across three regions, but the observed counts differ. Is the deviation larger than chance alone would explain?


In [None]:
import numpy as np
from scipy.stats import chisquare

# Observed sales across 3 regions
observed_sales = np.array([50, 30, 20])

# Expected sales under the null hypothesis: an equal share of the total per region
expected_sales = np.full(observed_sales.shape, observed_sales.sum() / observed_sales.size)

# Perform Goodness-of-Fit Test
chi_statistic_gof, p_value_gof = chisquare(f_obs=observed_sales, f_exp=expected_sales)

print(f"Chi-Square Statistic: {chi_statistic_gof:.4f}")
print(f"p-value: {p_value_gof:.4f}")

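### Interpretation:
- The null hypothesis is that sales are evenly distributed across the three regions. A **low p-value (< 0.05)** means we reject it at the 5% significance level.
- For these counts the statistic is about 14 with 2 degrees of freedom, giving a p-value of about 0.0009, so the deviation from equal proportions is statistically significant.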
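
To make the statistic less of a black box, the sketch below recomputes it by hand from the Pearson formula, the sum over categories of (observed - expected)² / expected, with degrees of freedom equal to the number of categories minus one. The variable names (`observed`, `expected`, `chi_stat_manual`, and so on) are illustrative and not part of the notebook's earlier cells; the expected counts assume an exactly equal split of the total.

```python
import numpy as np
from scipy.stats import chi2, chisquare

observed = np.array([50, 30, 20])

# Assumption: under the null hypothesis each region gets an equal share of the total
expected = np.full(observed.shape, observed.sum() / observed.size)

# Pearson statistic: sum over categories of (O - E)^2 / E
chi_stat_manual = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: number of categories minus one
dof_manual = observed.size - 1

# p-value: upper-tail probability of the chi-square distribution
p_manual = chi2.sf(chi_stat_manual, dof_manual)

# Cross-check against SciPy (f_exp defaults to equal expected frequencies)
chi_stat_scipy, p_scipy = chisquare(observed)

print(f"Manual: chi2 = {chi_stat_manual:.4f}, dof = {dof_manual}, p = {p_manual:.6f}")
print(f"SciPy:  chi2 = {chi_stat_scipy:.4f}, p = {p_scipy:.6f}")
```

Both lines should report the same statistic and p-value, which confirms that `chisquare` is doing nothing more than this formula followed by a chi-square tail lookup.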

## 2. Chi-Square Test of Independence

**Use Case:** You're analyzing whether churn behavior differs between male and female customers.

In [None]:
import pandas as pd
from scipy.stats import chi2_contingency

data = {
    'Gender': ['Male', 'Male', 'Female', 'Female'],
    'Churn': ['Yes', 'No', 'Yes', 'No'],
    'Count': [40, 60, 30, 70]
}
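df = pd.DataFrame(data)
# Reshape the long-format counts into a 2x2 table (rows: Gender, columns: Churn)
contingency_table = df.pivot(index='Gender', columns='Churn', values='Count')

# For a 2x2 table, chi2_contingency applies Yates' continuity correction by default
chi_statistic_indep, p_value_indep, dof, expected = chi2_contingency(contingency_table)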

print("Contingency Table:")
print(contingency_table)
print(f"\nChi-Square Statistic: {chi_statistic_indep:.4f}")
print(f"p-value: {p_value_indep:.4f}")

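### Interpretation:
- A **high p-value (> 0.05)** means we fail to reject the null hypothesis that churn and gender are **independent**. This is not proof of independence, only a lack of evidence for an association.
- A **low p-value (< 0.05)** would suggest an association between churn and gender.
- For these counts the p-value is about 0.18, so the difference in churn rates (40% of male vs. 30% of female customers) is not statistically significant at the 5% level.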
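
The test compares the observed table with the counts expected under independence, and for a 2x2 table `chi2_contingency` applies Yates' continuity correction unless told otherwise. The sketch below is a minimal illustration of both points. It rebuilds the same counts as a plain NumPy array (the name `table` and its row/column order are assumptions made here for clarity) and runs the test with and without the correction.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Same counts as the churn example.
# Rows: gender (Female, Male); columns: churn (No, Yes)
table = np.array([[70, 30],
                  [60, 40]])

# Default call: Yates' continuity correction is applied because the table is 2x2
chi2_yates, p_yates, dof, expected = chi2_contingency(table)

# Uncorrected Pearson statistic for comparison
chi2_plain, p_plain, _, _ = chi2_contingency(table, correction=False)

print("Expected counts under independence:")
print(expected)
print(f"Degrees of freedom: {dof}")
print(f"With Yates' correction:    chi2 = {chi2_yates:.4f}, p = {p_yates:.4f}")
print(f"Without Yates' correction: chi2 = {chi2_plain:.4f}, p = {p_plain:.4f}")

# Common rule of thumb: the chi-square approximation is considered
# reasonable when every expected count is at least 5
print(f"Smallest expected count: {expected.min():.1f}")
```

The correction shrinks the statistic and raises the p-value slightly. With these counts both versions stay above 0.05, so the conclusion does not depend on the choice.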