# Data Description

- Background:
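  This is the customer churn dataset for ABC Multistate Bank. It contains each customer's basic profile, financial status, and engagement behavior, and is used to train and evaluate churn prediction models that determine whether a customer left the bank during the observation period.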

- Date received: 20260112  

- Path to data:
  https://www.kaggle.com/datasets/gauravtopre/bank-customer-churn-dataset  

- Unit of observation:
  Each record corresponds to one bank customer.

- Sample period:
  The dataset does not state an explicit time range; it is a cross-sectional snapshot of customers within a single observation period.

- Known issues:
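  - `country` and `gender` are text categorical columns and must be converted to numeric encodings before model training and SHAP analysis.  
  - `customer_id` is an identifier with no predictive meaning and should be dropped before modeling.  
  - `balance`, `estimated_salary`, and `credit_score` differ widely in scale and should be standardized (scaling) before modeling.  
  - `churn` is the target variable (label), encoded as 1 = churned, 0 = not churned.  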


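- Definition for each variable:

  - customer_id: Unique customer identifier, used to tell customers apart (not a model input).  
  - credit_score: Customer credit score (numeric), reflecting credit risk and financial reliability.  
  - country: Customer's country (categorical, e.g. France, Germany, Spain).  
  - gender: Customer's gender (categorical, e.g. Male, Female).  
  - age: Customer's age (numeric).  
  - tenure: Number of years the customer has been with the bank (numeric), i.e. the length of the relationship.  
  - balance: Balance in the customer's bank account (numeric), indicating the size of their funds.  
  - products_number: Number of bank products the customer holds (numeric, e.g. accounts, credit cards, loans).  
  - credit_card: Whether the customer holds a credit card from the bank (0 = no, 1 = yes).  
  - active_member: Whether the customer is an active member (0 = inactive, 1 = active).  
  - estimated_salary: Customer's estimated annual salary (numeric), reflecting income level.  
  - churn: Churn label (target variable); 1 = the customer has left the bank, 0 = the customer is still with the bank.  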
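The preprocessing steps listed under Known issues could be sketched roughly as follows. This is only an illustration, not the notebook's actual pipeline: `preprocess` is a hypothetical helper, the column names are taken from the sample output, one-hot encoding via `pd.get_dummies` is an assumed choice of numeric encoding, and the z-score scaling is done by hand with pandas instead of a dedicated scaler.

```python
import pandas as pd

# Four rows copied from the df.head(10) output (indices 0, 1, 5, 7),
# used only so the sketch runs without the CSV file.
sample = pd.DataFrame({
    "customer_id": [15634602, 15647311, 15574012, 15656148],
    "credit_score": [619, 608, 645, 376],
    "country": ["France", "Spain", "Spain", "Germany"],
    "gender": ["Female", "Female", "Male", "Female"],
    "age": [42, 41, 44, 29],
    "tenure": [2, 1, 8, 4],
    "balance": [0.0, 83807.86, 113755.78, 115046.74],
    "products_number": [1, 1, 2, 4],
    "credit_card": [1, 0, 1, 1],
    "active_member": [1, 1, 0, 0],
    "estimated_salary": [101348.88, 112542.58, 149756.71, 119346.88],
    "churn": [1, 0, 1, 1],
})


def preprocess(df):
    """Hypothetical helper: split features/label and apply the listed fixes."""
    y = df["churn"]                                  # target variable
    X = df.drop(columns=["customer_id", "churn"])    # drop the ID and the label
    # Assumed encoding: one-hot columns for the two text categorical fields.
    X = pd.get_dummies(X, columns=["country", "gender"], dtype=int)
    # Z-score the wide-range numeric columns (population std, ddof=0).
    for col in ["credit_score", "balance", "estimated_salary"]:
        std = X[col].std(ddof=0)
        X[col] = (X[col] - X[col].mean()) / std if std > 0 else 0.0
    return X, y


X, y = preprocess(sample)
print(X.columns.tolist())
```

In a real train/test workflow the scaling statistics would normally be computed on the training split only and then reused on the test split; the sketch skips that for brevity.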



In [48]:
import pandas as pd

# Change for Your Own Data

In [49]:
input_data_file = "/Users/chengxianghuang/Downloads/Bank Customer Churn Prediction.csv"

# Summary

In [50]:
df = pd.read_csv(input_data_file)
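
The description states that each record corresponds to one customer. A quick way to check that claim, and to look for missing values, might look like the sketch below. `summarize_integrity` is a hypothetical helper and `demo` is a made-up frame with a deliberate duplicate and a missing value; on the real data one would call `summarize_integrity(df)` instead.

```python
import pandas as pd


def summarize_integrity(df, id_col="customer_id"):
    """Hypothetical helper: basic checks on the unit of observation."""
    return {
        "n_rows": len(df),
        "n_unique_ids": df[id_col].nunique(),
        "n_duplicate_ids": int(df[id_col].duplicated().sum()),
        "n_missing_cells": int(df.isna().sum().sum()),
    }


# Made-up frame: the second ID is repeated and one age is missing.
demo = pd.DataFrame({
    "customer_id": [15634602, 15647311, 15647311],
    "age": [42, 41, None],
})
report = summarize_integrity(demo)
print(report)
```

If every record really is a distinct customer, `n_rows` equals `n_unique_ids` and `n_duplicate_ids` is zero.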

# Sample

In [51]:
df.head(10)

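|  | customer_id | credit_score | country | gender | age | tenure | balance | products_number | credit_card | active_member | estimated_salary | churn |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 15634602 | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 15647311 | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 15619304 | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 15701354 | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 15737888 | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |
| 5 | 15574012 | 645 | Spain | Male | 44 | 8 | 113755.78 | 2 | 1 | 0 | 149756.71 | 1 |
| 6 | 15592531 | 822 | France | Male | 50 | 7 | 0.0 | 2 | 1 | 1 | 10062.8 | 0 |
| 7 | 15656148 | 376 | Germany | Female | 29 | 4 | 115046.74 | 4 | 1 | 0 | 119346.88 | 1 |
| 8 | 15792365 | 501 | France | Male | 44 | 4 | 142051.07 | 2 | 0 | 1 | 74940.5 | 0 |
| 9 | 15592389 | 684 | France | Male | 27 | 2 | 134603.88 | 1 | 1 | 1 | 71725.73 | 0 |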


# Summary Stats

In [52]:
df.describe()

|  | customer_id | credit_score | age | tenure | balance | products_number | credit_card | active_member | estimated_salary | churn |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 |
| mean | 15690940.0 | 650.5288 | 38.9218 | 5.0128 | 76485.889288 | 1.5302 | 0.7055 | 0.5151 | 100090.239881 | 0.2037 |
| std | 71936.19 | 96.653299 | 10.487806 | 2.892174 | 62397.405202 | 0.581654 | 0.45584 | 0.499797 | 57510.492818 | 0.402769 |
| min | 15565700.0 | 350.0 | 18.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 11.58 | 0.0 |
| 25% | 15628530.0 | 584.0 | 32.0 | 3.0 | 0.0 | 1.0 | 0.0 | 0.0 | 51002.11 | 0.0 |
| 50% | 15690740.0 | 652.0 | 37.0 | 5.0 | 97198.54 | 1.0 | 1.0 | 1.0 | 100193.915 | 0.0 |
| 75% | 15753230.0 | 718.0 | 44.0 | 7.0 | 127644.24 | 2.0 | 1.0 | 1.0 | 149388.2475 | 0.0 |
| max | 15815690.0 | 850.0 | 92.0 | 10.0 | 250898.09 | 4.0 | 1.0 | 1.0 | 199992.48 | 1.0 |
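
Because `credit_card`, `active_member`, and `churn` are 0/1 columns, their means in the summary statistics are the shares of rows equal to 1. A minimal sketch of that arithmetic, using the means and the row count of 10000 copied from the `df.describe()` output (the variable names are illustrative only):

```python
# Means of the 0/1 columns, copied from the df.describe() output.
binary_means = {"credit_card": 0.7055, "active_member": 0.5151, "churn": 0.2037}
n_rows = 10000  # the "count" row of the same output

# share of ones * number of rows = number of rows equal to 1
counts = {col: round(mean * n_rows) for col, mean in binary_means.items()}

for col, n in counts.items():
    print(f"{col}: {n} of {n_rows} rows equal 1 ({binary_means[col]:.2%})")
```

Read this way, roughly one customer in five is labeled as churned, so the two classes are not balanced. On the loaded frame, `df["churn"].mean()` gives the same share directly.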
