# Background
- **Author**: `<郭伊軒>`
- **Created At**: `<2025-11-1>`
- **Path to Training Data**: `discount-timing-DE.csv` (rows with `Date` before 2025-01-01)
- **Path to Testing Data**: `discount-timing-DE.csv` (rows with `Date` on or after 2025-01-01)
- **Model Specification**
    - Method: logistic regression
    - Variables:  
    ['Age', 'AccumulatedPositiveRate', 'MultiPlayer', 'PlayerGrowthRate1W', 'FollowersGrowthRate1W', 'PositiveRateGrowthRate1W', 'SalePeriod', 'DiscountFreq3M', 'DLC_sum_1W', 'Sequel_sum_1W']
    - Tuning Parameters: whether to apply SMOTE
    - Optimization Method: class balancing with SMOTE
- **Main Findings and Takeaways:**
    - In-sample `<Accuracy, F1, AUC>` (SMOTE-balanced models, 1W features):
        - DiscountOrNot: (0.7560, 0.1088, 0.8124)
        - DiscountDuringSale: (0.8806, 0.1470, 0.9640)
        - DiscountOutOfSale: (0.7109, 0.0514, 0.8241)
    - Out-of-sample `<Accuracy, F1, AUC>` (SMOTE-balanced models, 1W features):
        - DiscountOrNot: (0.7293, 0.0788, 0.7399)
        - DiscountDuringSale: (0.9044, 0.0791, 0.9756)
        - DiscountOutOfSale: (0.7875, 0.0705, 0.7597)
    - Individual (game-level) differences are not significant
- **Future Direction:**

In [64]:
# Load packages here
import pandas as pd
import numpy as np
from imblearn.over_sampling import SMOTE
import statsmodels.api as sm
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, confusion_matrix
from statsmodels.stats.outliers_influence import variance_inflation_factor
from sklearn.preprocessing import StandardScaler



In [65]:
# Load the TRAINING data here and please finish all the data manipulation here.
input_data_file = "/Users/10610/Desktop/114-1 資料/steam-project/discount-timing-DE.csv"
#input_data_file = "/Users/user/Desktop/114-1 資料/steam-project/discount-timing-DE.csv"
df = pd.read_csv(input_data_file)
df_dummies = pd.get_dummies(df, columns=['GameID'], drop_first=True)

train = df_dummies[df_dummies['Date'] < '2025-01-01']
test = df_dummies[df_dummies['Date'] >= '2025-01-01']

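def prepare_xy(df, feature_cols, target_col):
    X = df[feature_cols].copy()
    y = df[target_col].copy()

    # Convert boolean (dummy) columns to int
    X = X.astype({col: 'int' for col in X.select_dtypes(bool).columns})
    # Standardize features. Note: the scaler is fit on the df passed in, so the
    # training and testing sets are each standardized with their own mean and SD.
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    X_scaled_df = pd.DataFrame(X_scaled, columns=X.columns, index=X.index)
    # Add the intercept column 'const' expected by sm.Logit
    X_scaled_df = sm.add_constant(X_scaled_df)

    return X_scaled_df, y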
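Because `prepare_xy` fits a new `StandardScaler` on whatever frame it receives, the test set is standardized with its own mean and SD rather than the training set's. A minimal sketch of the more common alternative, assuming the goal is to reuse training statistics: fit the scaler once on the training rows and only `transform` the test rows. The helper `split_and_scale` and the toy data are hypothetical; the 2025-01-01 cutoff mirrors the split above, but `Date` is parsed with `pd.to_datetime` so the comparison does not depend on the string format.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def split_and_scale(df, feature_cols, target_col, cutoff="2025-01-01"):
    """Hypothetical helper: split by date, then fit the scaler on train only."""
    dates = pd.to_datetime(df["Date"])
    train = df[dates < pd.Timestamp(cutoff)]
    test = df[dates >= pd.Timestamp(cutoff)]

    # Means and SDs come from the training rows only
    scaler = StandardScaler().fit(train[feature_cols].astype(float))
    X_train = pd.DataFrame(scaler.transform(train[feature_cols].astype(float)),
                           columns=feature_cols, index=train.index)
    X_test = pd.DataFrame(scaler.transform(test[feature_cols].astype(float)),
                          columns=feature_cols, index=test.index)
    return X_train, train[target_col], X_test, test[target_col], scaler


# Toy data (not the project's data): the feature level shifts after the cutoff
toy = pd.DataFrame({
    "Date": ["2024-06-01", "2024-09-01", "2024-12-01", "2025-02-01", "2025-03-01"],
    "Age": [1.0, 2.0, 3.0, 10.0, 12.0],
    "DiscountOrNot": [0, 1, 0, 1, 0],
})
X_tr, y_tr, X_te, y_te, scaler = split_and_scale(toy, ["Age"], "DiscountOrNot")
print(X_te["Age"].round(3).tolist())
```

With this variant, test rows far from the training range show up as large z-scores instead of being re-centred to mean 0; `sm.add_constant` can then be applied to both frames as before.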


In [66]:
df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| GameID | 23938.0 | 461376.742 | 298559.181056 | 10.0 | 244850.0 | 431730.0 | 644930.0 | 1145360.0 |
| MultiPlayer | 23938.0 | 0.464241 | 0.49873 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| ConstantDiscount | 23938.0 | 0.214387 | 0.410405 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| DiscountOrNot | 23938.0 | 0.019885 | 0.139607 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| DiscountDuration | 23938.0 | 0.221196 | 1.715483 | 0.0 | 0.0 | 0.0 | 0.0 | 32.0 |
| DiscountFreq3M | 23938.0 | 1.797644 | 1.043279 | 0.0 | 1.0 | 2.0 | 3.0 | 6.0 |
| Age | 23938.0 | 7.634427 | 4.458471 | 2.389041 | 4.95137 | 6.323288 | 8.479452 | 24.84658 |
| AccumulatedPositiveRate | 23938.0 | 0.928061 | 0.064186 | 0.738751 | 0.905517 | 0.953165 | 0.972651 | 0.9929734 |
| SalePeriod | 23938.0 | 0.14642 | 0.353534 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| DiscountDuringSale | 23938.0 | 0.008647 | 0.09259 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
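`DiscountOrNot` has a mean of 0.019885, so only about 2% of rows are discount events. A worked check of what that imbalance implies, using the `count` and `mean` from the table: a classifier that always predicts 0 already reaches about 98% accuracy with an F1 of 0, which is close to what the unbalanced DiscountOrNot model reports further down (train accuracy 0.9793, F1 0.0).

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

n_rows = 23938        # count from df.describe()
pos_rate = 0.019885   # mean of DiscountOrNot from df.describe()

n_pos = round(n_rows * pos_rate)   # approximate number of discount rows
baseline_acc = 1 - pos_rate        # accuracy of "always predict 0"

# Same idea on an explicit label vector: predict 0 for every row
y_true = np.array([1] * n_pos + [0] * (n_rows - n_pos))
y_pred = np.zeros_like(y_true)
acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, zero_division=0)  # no predicted positives -> F1 = 0
print(n_pos, round(baseline_acc, 4), round(acc, 4), f1)
```

This is why accuracy alone is not informative here and the comparisons below lean on F1 and AUC.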


### The actual modeling starts below
For the remaining blocks, make sure you have followed the guidelines as specified in [Project folder structure, file naming, and documentation guidelines (專案資料夾結構、檔案命名與文件規範)](https://docs.google.com/document/d/1sl6gEFMdmiGsiNjLe17UmZ30xKxq15U0Mb2B-Jvusxg/edit?tab=t.33iie8ybx7s4).


In [None]:
def evaluate_model(name, model, X_train, y_train, X_test, y_test):
    """Return train/test Accuracy, F1 and AUC (0.5 threshold); print the test confusion matrix."""
    y_prob_train = model.predict(X_train)
    y_pred_train = (y_prob_train >= 0.5).astype(int)

    y_prob_test = model.predict(X_test)
    y_pred_test = (y_prob_test >= 0.5).astype(int)

    acc_train = accuracy_score(y_train, y_pred_train)
    f1_train = f1_score(y_train, y_pred_train)
    auc_train = roc_auc_score(y_train, y_prob_train)

    acc_test = accuracy_score(y_test, y_pred_test)
    f1_test = f1_score(y_test, y_pred_test)
    auc_test = roc_auc_score(y_test, y_prob_test)
    cm = confusion_matrix(y_test, y_pred_test)

    results = {
        'Accuracy': [round(acc_train, 4), round(acc_test, 4)],
        'F1 score': [round(f1_train, 4), round(f1_test, 4)],
        'AUC': [round(auc_train, 4), round(auc_test, 4)]
    }

    row_names = ['train', 'test']

    result = pd.DataFrame(results, index=row_names)

    print(f"\n=== {name} ===")
    print("Confusion matrix:\n", cm)
    return result
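scikit-learn's `confusion_matrix` puts true labels on rows and predicted labels on columns, so for 0/1 targets the matrices printed by `evaluate_model` read `[[TN, FP], [FN, TP]]`. For example, `[[6701 0] [121 0]]` further down means the model predicted no discounts at all. A small sketch with made-up labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 0, 0, 1, 1, 1])
y_prob = np.array([0.1, 0.6, 0.3, 0.7, 0.8, 0.4, 0.9])
y_pred = (y_prob >= 0.5).astype(int)  # same 0.5 threshold as evaluate_model

cm = confusion_matrix(y_true, y_pred)
print(cm)

# Rows = true labels (0, 1), columns = predicted labels (0, 1)
tn, fp, fn, tp = cm.ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```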

# 1W

### All discounts

In [68]:
feature_cols_gameid = [
    'Age', 'PlayerGrowthRate1W', 'FollowersGrowthRate1W', 'PositiveRateGrowthRate1W', 
    'SalePeriod', 'DLC_sum_1W', 'Sequel_sum_1W'
] + [col for col in df_dummies.columns if col.startswith('GameID_')]

feature_cols = [
    'Age','AccumulatedPositiveRate', "MultiPlayer", 'PlayerGrowthRate1W', 'FollowersGrowthRate1W', 'PositiveRateGrowthRate1W', 
    'SalePeriod', 'DiscountFreq3M', 'DLC_sum_1W', 'Sequel_sum_1W'
]


#### Testing for individual (game-level) differences

In [69]:
X_train, y_train = prepare_xy(train, feature_cols_gameid, 'DiscountOrNot')
X_test, y_test = prepare_xy(test, feature_cols_gameid, 'DiscountOrNot') 
logit_model = sm.Logit(y_train, X_train).fit(method='bfgs', maxiter=100)
print(logit_model.summary())

         Current function value: 0.087856
         Iterations: 100
         Function evaluations: 101
         Gradient evaluations: 101
                           Logit Regression Results                           
Dep. Variable:          DiscountOrNot   No. Observations:                17116
Model:                          Logit   Df Residuals:                    17081
Method:                           MLE   Df Model:                           34
Date:                Sun, 16 Nov 2025   Pseudo R-squ.:                  0.1293
Time:                        17:12:10   Log-Likelihood:                -1503.7
converged:                      False   LL-Null:                       -1727.1
Covariance Type:            nonrobust   LLR p-value:                 1.875e-73
                               coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------------
const                       -4.6824      0.16



##### Collinearity

In [70]:
# Check collinearity: AccumulatedPositiveRate and Age are collinear
vif_data = pd.DataFrame()
vif_data["feature"] = X_train.columns[1:]  # skip the constant term 'const'
vif_data["VIF"] = [
    variance_inflation_factor(X_train.iloc[:, 1:].values, i)
    for i in range(X_train.shape[1] - 1)
]
print(vif_data)

                     feature        VIF
0                        Age  87.198322
1         PlayerGrowthRate1W   1.219776
2      FollowersGrowthRate1W   2.350926
3   PositiveRateGrowthRate1W   1.510173
4                 SalePeriod   1.037660
5                 DLC_sum_1W   1.111978
6              Sequel_sum_1W   1.015323
7                GameID_3590  12.664379
8                GameID_4000   7.536524
9              GameID_108600  27.180471
10             GameID_233860  51.572844
11             GameID_242760  48.961063
12             GameID_244210  31.592986
13             GameID_244850  53.400790
14             GameID_294100  51.125701
15             GameID_323190  48.764740
16             GameID_367520  42.324516
17             GameID_376210  36.451853
18             GameID_381210  39.412920
19             GameID_413150  37.198470
20             GameID_431730  36.973992
21             GameID_431960  50.921457
22             GameID_457140  55.600975
23             GameID_477160  40.048350
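The variance inflation factor of feature j is VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on all the other features. The very high values for `Age` and the GameID dummies mean each is largely explained by the others, likely because Age varies mostly between games. A sketch on simulated data (column roles made up) that reproduces `variance_inflation_factor` by hand; as in the cell above, the columns are standardized and the constant is dropped, so the auxiliary regressions have no intercept.

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 1000
a = rng.normal(size=n)
b = 0.9 * a + 0.1 * rng.normal(size=n)   # b is almost a copy of a
c = rng.normal(size=n)                    # c is unrelated to a and b
X = np.column_stack([a, b, c])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardized, no constant column


def vif_by_hand(X, j):
    """VIF_j = 1 / (1 - R^2_j) from an OLS fit of column j on the other columns."""
    others = np.delete(X, j, axis=1)
    coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    resid = X[:, j] - others @ coef
    r2 = 1 - (resid @ resid) / (X[:, j] @ X[:, j])
    return 1 / (1 - r2)


for j in range(X.shape[1]):
    print(j, round(variance_inflation_factor(X, j), 3), round(vif_by_hand(X, j), 3))
```

A common rule of thumb treats VIF above 10 as problematic, which the `Age` and GameID rows exceed by a wide margin.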


##### Wald test

In [71]:
# 1. Collect the names of all GameID dummy variables
game_cols = [col for col in df_dummies.columns if col.startswith('GameID_')]
game_cnt = len(game_cols)
variable_cnt = len(logit_model.params)  # total number of parameters, including the constant

# 2. Initialize the restriction matrix R (one row per dummy)
R_matrix = np.zeros([game_cnt, variable_cnt])

# 3. Locate each dummy in the model's parameter list and set its entry in R to 1
for i, var_name in enumerate(game_cols):
    # index of this variable in logit_model.params
    param_index = logit_model.params.index.get_loc(var_name)
    R_matrix[i, param_index] = 1

# Joint Wald test of H0: all GameID dummy coefficients are zero
print('\n unbalance')
print(logit_model.wald_test(R_matrix))


 unbalance
<Wald test (chi2): statistic=[[74.56627923]], p-value=2.445621130060853e-06, df_denom=27>




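Conclusion drawn here: no significant individual (game-level) differences. Caveat: the Wald p-value printed above (about 2.4e-06 with 27 restrictions) is far below 0.05, which on its face rejects the joint null that all GameID coefficients are zero; the GameID model also stopped without converging (`converged: False`) and its dummies have high VIFs, so this conclusion should be re-checked.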

#### model summary

In [72]:
X_train, y_train = prepare_xy(train, feature_cols, 'DiscountOrNot')
X_test, y_test = prepare_xy(test, feature_cols, 'DiscountOrNot') 

logit_model = sm.Logit(y_train, X_train).fit(method='bfgs', maxiter=100)
print(logit_model.summary())

Optimization terminated successfully.
         Current function value: 0.087297
         Iterations: 83
         Function evaluations: 85
         Gradient evaluations: 85
                           Logit Regression Results                           
Dep. Variable:          DiscountOrNot   No. Observations:                17116
Model:                          Logit   Df Residuals:                    17105
Method:                           MLE   Df Model:                           10
Date:                Sun, 16 Nov 2025   Pseudo R-squ.:                  0.1349
Time:                        17:12:13   Log-Likelihood:                -1494.2
converged:                       True   LL-Null:                       -1727.1
Covariance Type:            nonrobust   LLR p-value:                 8.318e-94
                               coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------------
const     

Significant: PlayerGrowthRate1W, FollowersGrowthRate1W, SalePeriod, DiscountFreq3M.

##### Collinearity

In [73]:
vif_data = pd.DataFrame()
vif_data["feature"] = X_train.columns[1:]  # skip the constant term 'const'
vif_data["VIF"] = [
    variance_inflation_factor(X_train.iloc[:, 1:].values, i)
    for i in range(X_train.shape[1] - 1)
]
print(vif_data)

                    feature       VIF
0                       Age  1.396897
1   AccumulatedPositiveRate  1.318244
2               MultiPlayer  1.428010
3        PlayerGrowthRate1W  1.099575
4     FollowersGrowthRate1W  1.153060
5  PositiveRateGrowthRate1W  1.069457
6                SalePeriod  1.075514
7            DiscountFreq3M  1.192157
8                DLC_sum_1W  1.039151
9             Sequel_sum_1W  1.006282


#### Model performance

In [74]:
smote = SMOTE(random_state=42)
X_train_sm, y_train_sm = smote.fit_resample(X_train, y_train)
logit_model_sm = sm.Logit(y_train_sm, X_train_sm).fit(method='bfgs', maxiter=100)
result1 = evaluate_model('unbalance', logit_model, X_train, y_train, X_test, y_test)
result2 = evaluate_model('balance', logit_model_sm, X_train, y_train, X_test, y_test)

combined_results = pd.concat([result1, result2], keys=['unbalance', 'balance'])
print("\nModel comparison:")
print(combined_results)


Optimization terminated successfully.
         Current function value: 0.517634
         Iterations: 51
         Function evaluations: 52
         Gradient evaluations: 52

       Accuracy  F1 score     AUC
train    0.9793       0.0  0.8088
test     0.9823       0.0  0.7332
Confusion matrix:
 [[6701    0]
 [ 121    0]]

       Accuracy  F1 score     AUC
train    0.7560    0.1088  0.8124
test     0.7293    0.0788  0.7399
Confusion matrix:
 [[4896 1805]
 [  42   79]]

Model comparison:
                 Accuracy  F1 score     AUC
unbalance train    0.9793    0.0000  0.8088
          test     0.9823    0.0000  0.7332
balance   train    0.7560    0.1088  0.8124
          test     0.7293    0.0788  0.7399


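The SMOTE-balanced model performs better: it is the only one that predicts any discounts (test F1 0.0788 vs. 0.0000; 79 of 121 test discounts found), at the cost of lower test accuracy (0.7293 vs. 0.9823), while test AUC is similar (0.7399 vs. 0.7332).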

### Discounts during seasonal sales

#### model summary

In [75]:
X_train, y_train = prepare_xy(train, feature_cols, 'DiscountDuringSale')
X_test, y_test = prepare_xy(test, feature_cols, 'DiscountDuringSale')

logit_model = sm.Logit(y_train, X_train).fit_regularized(alpha=1)
print(logit_model.summary())

Optimization terminated successfully    (Exit mode 0)
            Current function value: 0.03545034833550609
            Iterations: 152
            Function evaluations: 153
            Gradient evaluations: 152
                           Logit Regression Results                           
Dep. Variable:     DiscountDuringSale   No. Observations:                17116
Model:                          Logit   Df Residuals:                    17106
Method:                           MLE   Df Model:                            9
Date:                Sun, 16 Nov 2025   Pseudo R-squ.:                  0.4042
Time:                        17:12:14   Log-Likelihood:                -592.48
converged:                       True   LL-Null:                       -994.37
Covariance Type:            nonrobust   LLR p-value:                3.263e-167
                               coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------

Significant: PlayerGrowthRate1W, PositiveRateGrowthRate1W, SalePeriod; also DiscountFreq3M (p = 0.018).

#### Model performance

In [76]:
smote = SMOTE(random_state=42)
X_train_sm, y_train_sm = smote.fit_resample(X_train, y_train)
logit_model_sm = sm.Logit(y_train_sm, X_train_sm).fit()
result1 = evaluate_model('unbalance', logit_model, X_train, y_train, X_test, y_test)
result2 = evaluate_model('balance', logit_model_sm, X_train, y_train, X_test, y_test)

combined_results = pd.concat([result1, result2], keys=['unbalance', 'balance'])
print("\nModel comparison:")
print(combined_results)


         Current function value: 0.184348
         Iterations: 35

       Accuracy  F1 score     AUC
train    0.9895    0.0219  0.9640
test     0.9811    0.1783  0.9762
Confusion matrix:
 [[6679  115]
 [  14   14]]

       Accuracy  F1 score     AUC
train    0.8806    0.1470  0.9640
test     0.9044    0.0791  0.9756
Confusion matrix:
 [[6142  652]
 [   0   28]]

Model comparison:
                 Accuracy  F1 score     AUC
unbalance train    0.9895    0.0219  0.9640
          test     0.9811    0.1783  0.9762
balance   train    0.8806    0.1470  0.9640
          test     0.9044    0.0791  0.9756




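The balanced model is preferred here because it identifies all 28 discounts in the test set (vs. 14 for the unbalanced model). Its test F1 (0.0791) and accuracy (0.9044) are nonetheless lower than the unbalanced model's (0.1783 and 0.9811) because of its 652 false positives; test AUC is nearly identical (0.9756 vs. 0.9762).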
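Why the balanced model can have higher recall but lower F1 follows directly from the two test confusion matrices printed above (layout `[[TN, FP], [FN, TP]]`). A worked check that recomputes the reported accuracy and F1 from those counts:

```python
def metrics_from_counts(tn, fp, fn, tp):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)  # equivalent to 2PR / (P + R)
    return accuracy, precision, recall, f1


# Test confusion matrices for DiscountDuringSale (1W features), copied from the output above
unbalanced = metrics_from_counts(tn=6679, fp=115, fn=14, tp=14)
balanced = metrics_from_counts(tn=6142, fp=652, fn=0, tp=28)

for name, m in [("unbalanced", unbalanced), ("balanced", balanced)]:
    print(name, [round(v, 4) for v in m])
```

Recall doubles from 0.5 to 1.0, but precision falls from 14/129 to 28/680, and F1 weighs both.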

### Discounts outside seasonal sales

#### model summary

In [77]:
X_train, y_train = prepare_xy(train, feature_cols, 'DiscountOutOfSale')
X_test, y_test = prepare_xy(test, feature_cols, 'DiscountOutOfSale')
logit_model = sm.Logit(y_train, X_train).fit(method='bfgs', maxiter=100)
print(logit_model.summary())

Optimization terminated successfully.
         Current function value: 0.049662
         Iterations: 84
         Function evaluations: 85
         Gradient evaluations: 85
                           Logit Regression Results                           
Dep. Variable:      DiscountOutOfSale   No. Observations:                17116
Model:                          Logit   Df Residuals:                    17105
Method:                           MLE   Df Model:                           10
Date:                Sun, 16 Nov 2025   Pseudo R-squ.:                  0.1332
Time:                        17:12:15   Log-Likelihood:                -850.02
converged:                       True   LL-Null:                       -980.69
Covariance Type:            nonrobust   LLR p-value:                 2.224e-50
                               coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------------
const     

Significant (p-values in parentheses): DLC_sum_1W (0.010), DiscountFreq3M (0.000), FollowersGrowthRate1W (0.006).

#### Model performance

In [78]:
smote = SMOTE(random_state=42)
X_train_sm, y_train_sm = smote.fit_resample(X_train, y_train)
logit_model_sm = sm.Logit(y_train_sm, X_train_sm).fit(method='bfgs', maxiter=100)
result1 = evaluate_model('unbalance', logit_model, X_train, y_train, X_test, y_test)
result2 = evaluate_model('balance', logit_model_sm, X_train, y_train, X_test, y_test)


combined_results = pd.concat([result1, result2], keys=['unbalance', 'balance'])
print("\nModel comparison:")
print(combined_results)

Optimization terminated successfully.
         Current function value: 0.488506
         Iterations: 50
         Function evaluations: 51
         Gradient evaluations: 51

       Accuracy  F1 score     AUC
train    0.9897       0.0  0.8235
test     0.9864       0.0  0.7605
Confusion matrix:
 [[6729    0]
 [  93    0]]

       Accuracy  F1 score     AUC
train    0.7109    0.0514  0.8241
test     0.7875    0.0705  0.7597
Confusion matrix:
 [[5317 1412]
 [  38   55]]

Model comparison:
                 Accuracy  F1 score     AUC
unbalance train    0.9897    0.0000  0.8235
          test     0.9864    0.0000  0.7605
balance   train    0.7109    0.0514  0.8241
          test     0.7875    0.0705  0.7597


# 2W

In [79]:
feature_cols = [
    'Age', 'AccumulatedPositiveRate', "MultiPlayer", 'PlayerGrowthRate2W', 'FollowersGrowthRate2W', 'PositiveRateGrowthRate2W', 
    'SalePeriod', 'DiscountFreq3M', 'DLC_sum_2W', 'Sequel_sum_2W'
]

### All discounts

#### model summary

In [80]:
X_train, y_train = prepare_xy(train, feature_cols, 'DiscountOrNot')
X_test, y_test = prepare_xy(test, feature_cols, 'DiscountOrNot')
logit_model = sm.Logit(y_train, X_train).fit(method='bfgs', maxiter=100)
print(logit_model.summary())


Optimization terminated successfully.
         Current function value: 0.088254
         Iterations: 83
         Function evaluations: 85
         Gradient evaluations: 85
                           Logit Regression Results                           
Dep. Variable:          DiscountOrNot   No. Observations:                17116
Model:                          Logit   Df Residuals:                    17105
Method:                           MLE   Df Model:                           10
Date:                Sun, 16 Nov 2025   Pseudo R-squ.:                  0.1254
Time:                        17:12:16   Log-Likelihood:                -1510.6
converged:                       True   LL-Null:                       -1727.1
Covariance Type:            nonrobust   LLR p-value:                 8.050e-87
                               coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------------
const     

Significant: PlayerGrowthRate2W, FollowersGrowthRate2W, SalePeriod.

#### Model performance

In [81]:
smote = SMOTE(random_state=42)
X_train_sm, y_train_sm = smote.fit_resample(X_train, y_train)
logit_model_sm = sm.Logit(y_train_sm, X_train_sm).fit()
result1 = evaluate_model('unbalance', logit_model, X_train, y_train, X_test, y_test)
result2 = evaluate_model('balance', logit_model_sm, X_train, y_train, X_test, y_test)

combined_results = pd.concat([result1, result2], keys=['unbalance', 'balance'])
print("\nModel comparison:")
print(combined_results)


         Current function value: 0.519164
         Iterations: 35

       Accuracy  F1 score     AUC
train    0.9793    0.0056  0.8118
test     0.9823    0.0000  0.7393
Confusion matrix:
 [[6701    0]
 [ 121    0]]

       Accuracy  F1 score     AUC
train    0.7513    0.1066  0.8128
test     0.7262    0.0798  0.7402
Confusion matrix:
 [[4873 1828]
 [  40   81]]

Model comparison:
                 Accuracy  F1 score     AUC
unbalance train    0.9793    0.0056  0.8118
          test     0.9823    0.0000  0.7393
balance   train    0.7513    0.1066  0.8128
          test     0.7262    0.0798  0.7402




### Discounts during seasonal sales

#### model summary

In [82]:
X_train, y_train = prepare_xy(train, feature_cols, 'DiscountDuringSale')
X_test, y_test = prepare_xy(test, feature_cols, 'DiscountDuringSale')
logit_model = sm.Logit(y_train, X_train).fit(method='bfgs', maxiter=100)
print(logit_model.summary())

Optimization terminated successfully.
         Current function value: 0.036801
         Iterations: 87
         Function evaluations: 88
         Gradient evaluations: 88
                           Logit Regression Results                           
Dep. Variable:     DiscountDuringSale   No. Observations:                17116
Model:                          Logit   Df Residuals:                    17105
Method:                           MLE   Df Model:                           10
Date:                Sun, 16 Nov 2025   Pseudo R-squ.:                  0.3665
Time:                        17:12:16   Log-Likelihood:                -629.89
converged:                       True   LL-Null:                       -994.37
Covariance Type:            nonrobust   LLR p-value:                3.784e-150
                               coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------------
const     

Significant: PlayerGrowthRate2W; also FollowersGrowthRate2W (p = 0.004) and DiscountFreq3M (p = 0.006).

#### Model performance

In [83]:
smote = SMOTE(random_state=42)
X_train_sm, y_train_sm = smote.fit_resample(X_train, y_train)
logit_model_sm = sm.Logit(y_train_sm, X_train_sm).fit()
result1 = evaluate_model('unbalance', logit_model, X_train, y_train, X_test, y_test)
result2 = evaluate_model('balance', logit_model_sm, X_train, y_train, X_test, y_test)

combined_results = pd.concat([result1, result2], keys=['unbalance', 'balance'])
print("\nModel comparison:")
print(combined_results)

         Current function value: 0.204355
         Iterations: 35

       Accuracy  F1 score     AUC
train    0.9895     0.000  0.9499
test     0.9252     0.086  0.9721
Confusion matrix:
 [[6288  506]
 [   4   24]]

       Accuracy  F1 score     AUC
train    0.8611    0.1290  0.9492
test     0.9018    0.0771  0.9727
Confusion matrix:
 [[6124  670]
 [   0   28]]

Model comparison:
                 Accuracy  F1 score     AUC
unbalance train    0.9895    0.0000  0.9499
          test     0.9252    0.0860  0.9721
balance   train    0.8611    0.1290  0.9492
          test     0.9018    0.0771  0.9727




### Discounts outside seasonal sales

#### model summary

In [84]:
X_train, y_train = prepare_xy(train, feature_cols, 'DiscountOutOfSale')
X_test, y_test = prepare_xy(test, feature_cols, 'DiscountOutOfSale')
logit_model = sm.Logit(y_train, X_train).fit(method='bfgs', maxiter=100)
print(logit_model.summary())

Optimization terminated successfully.
         Current function value: 0.049617
         Iterations: 96
         Function evaluations: 97
         Gradient evaluations: 97
                           Logit Regression Results                           
Dep. Variable:      DiscountOutOfSale   No. Observations:                17116
Model:                          Logit   Df Residuals:                    17105
Method:                           MLE   Df Model:                           10
Date:                Sun, 16 Nov 2025   Pseudo R-squ.:                  0.1340
Time:                        17:12:17   Log-Likelihood:                -849.24
converged:                       True   LL-Null:                       -980.69
Covariance Type:            nonrobust   LLR p-value:                 1.040e-50
                               coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------------
const     

Significant (p-values in parentheses): PlayerGrowthRate2W (0.039), DiscountFreq3M (0.000), DLC_sum_2W (0.033).

#### Model performance

In [85]:
smote = SMOTE(random_state=42)
X_train_sm, y_train_sm = smote.fit_resample(X_train, y_train)
logit_model_sm = sm.Logit(y_train_sm, X_train_sm).fit(method='bfgs', maxiter=100)
result1 = evaluate_model('unbalance', logit_model, X_train, y_train, X_test, y_test)
result2 = evaluate_model('balance', logit_model_sm, X_train, y_train, X_test, y_test)


combined_results = pd.concat([result1, result2], keys=['unbalance', 'balance'])
print("\nModel comparison:")
print(combined_results)


Optimization terminated successfully.
         Current function value: 0.483764
         Iterations: 52
         Function evaluations: 53
         Gradient evaluations: 53

       Accuracy  F1 score     AUC
train    0.9897       0.0  0.8259
test     0.9864       0.0  0.7735
Confusion matrix:
 [[6729    0]
 [  93    0]]

       Accuracy  F1 score    AUC
train    0.7241    0.0563  0.828
test     0.7986    0.0741  0.773
Confusion matrix:
 [[5393 1336]
 [  38   55]]

Model comparison:
                 Accuracy  F1 score     AUC
unbalance train    0.9897    0.0000  0.8259
          test     0.9864    0.0000  0.7735
balance   train    0.7241    0.0563  0.8280
          test     0.7986    0.0741  0.7730
