##### How Binning Helps

1. Non-linear relationship with target
   - Binning can capture non-linear patterns that a linear model might miss.

2. Skewed distribution
   - Extreme values land in the same bin as their neighbours, which limits their influence; quantile-based bins also spread a skewed column evenly across categories.

3. Interpretability is key
   - Easier for business users to understand ranges like "age 18–25" than a specific value like "age = 23."

4. Model prone to overfitting
   - Binning reduces granularity, so tree-based models have fewer candidate split points and less room to overfit.

5. Need to reduce cardinality
   - Helps when a numeric column has too many unique values.

6. Sparse or noisy data
   - Binning groups rare or noisy values to improve signal strength.
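Points 2 and 5 are easy to see on a toy column. This is a minimal sketch on synthetic log-normal data (an assumption for illustration, not the churn dataset) that compares equal-width bins from `pd.cut` with equal-frequency bins from `pd.qcut`:

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed column (illustrative only, not the churn data)
rng = np.random.default_rng(0)
values = pd.Series(rng.lognormal(mean=10, sigma=1, size=1_000))

# Equal-width bins: the long right tail pushes most rows into the first bin
equal_width = pd.cut(values, bins=5)

# Equal-frequency (quantile) bins: each bin holds roughly 200 rows
equal_freq = pd.qcut(values, q=5)

print(equal_width.value_counts(sort=False))
print(equal_freq.value_counts(sort=False))

# About 1,000 unique floats collapse into 5 categories (lower cardinality)
print(values.nunique(), "->", equal_freq.nunique())
```

Equal-width bins preserve the skew, while quantile bins give balanced counts at the cost of less "round" edges.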

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

#### 2. Basic Processing

In [4]:
df = pd.read_csv("data/processed/CEHHbInToW_calculated_outliers.csv")
df.head()

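| Unnamed: 0 | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 619 | France | Female | 42.0 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 608 | Spain | Female | 41.0 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 502 | France | Female | 42.0 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 699 | France | Female | 38.91 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 850 | Spain | Female | 43.0 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |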


- We are going to bin the credit scores. Binning the Age and Tenure columns would also be reasonable, but it might not suit EstimatedSalary.

| Score range | Bin |
|---|---|
| 350 ≤ score < 580 | 'poor' |
| 580 ≤ score < 670 | 'fair' |
| 670 ≤ score < 740 | 'good' |
| 740 ≤ score < 800 | 'very good' |
| 800 ≤ score ≤ 850 | 'excellent' |
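The ranges above can also be stored as data instead of an if-chain. A minimal sketch using the standard-library `bisect` module; `BREAKS`, `LABELS`, and `bin_credit_score` are hypothetical names, and each bin's lower edge is treated as inclusive. It raises `ValueError` rather than using `assert`, because assertions are skipped when Python runs with `-O`.

```python
from bisect import bisect_right

# Lower edge of each bin (inclusive); LABELS[i] covers BREAKS[i] up to the next edge
BREAKS = [350, 580, 670, 740, 800]
LABELS = ['poor', 'fair', 'good', 'very good', 'excellent']


def bin_credit_score(score):
    """Hypothetical helper: map a score in [350, 850] to its bin label."""
    if not 350 <= score <= 850:
        raise ValueError(f"credit score {score} is outside the 350-850 range")
    # bisect_right counts how many lower edges are <= score
    return LABELS[bisect_right(BREAKS, score) - 1]


for s in (350, 579, 580, 739, 740, 850):
    print(s, bin_credit_score(s))
```

Adding or moving a bin then means editing two lists, not the branching logic.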

In [None]:
def custom_binning_credit_score(score):
    # Valid scores run from 350 to 850; anything else is bad data
    assert 350 <= score <= 850, f'credit score {score} is outside the 350-850 range'
    if score < 580:
        return 'poor'
    if score < 670:
        return 'fair'
    if score < 740:
        return 'good'
    if score < 800:
        return 'very good'
    # 800 to 850 inclusive, so the maximum score of 850 is binned as well
    return 'excellent'


# Series.apply calls the function once for each element of the column
df['CreditScoreBins'] = df['CreditScore'].apply(custom_binning_credit_score)

# The raw score is now represented by its bin, so drop the original column
del df['CreditScore']
df
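The same bins can be computed without a Python-level loop using `pd.cut`. This is a minimal sketch on the five scores from the `df.head()` output above; in the notebook, `df['CreditScore']` would take the place of `scores`. The last edge of 851 assumes whole-number scores so that 850 is included when `right=False` makes each bin `[lower, upper)`.

```python
import pandas as pd

# The five CreditScore values shown in df.head()
scores = pd.Series([619, 608, 502, 699, 850])

# Same edges as the table; right=False makes each bin [lower, upper).
# 851 as the final edge assumes integer scores so that 850 falls in 'excellent'.
bins = [350, 580, 670, 740, 800, 851]
labels = ['poor', 'fair', 'good', 'very good', 'excellent']

credit_bins = pd.cut(scores, bins=bins, labels=labels, right=False)
print(credit_bins.tolist())
```

Unlike the function, `pd.cut` does not fail on out-of-range scores; it returns `NaN` for them, so a check such as `credit_bins.isna().any()` is worth adding.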

##### What `assert` does

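- `assert` is used in Python to **check whether a condition is True**.  
- If the condition is **False**, it raises an `AssertionError` with the given message.  
- Assertions are skipped entirely when Python runs with the `-O` flag, so they suit sanity checks rather than guaranteed data validation.  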
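Here is a minimal sketch of both outcomes, reusing the 350 to 850 range check; `check_score` is a hypothetical helper and not part of the notebook.

```python
def check_score(score):
    # Passes silently when the condition is True
    assert 350 <= score <= 850, f"credit score {score} is outside the 350-850 range"
    return score


print(check_score(700))

try:
    check_score(900)
except AssertionError as err:
    # The text after the comma becomes the exception message
    print("AssertionError:", err)
```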

**Syntax:**  

```python
assert condition, "error message if condition is False"
```

In [8]:
df.to_csv("data/processed/CEHHbInToW_binning_applied.csv", index=False)