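# Introduction
After data cleaning, the models need hyperparameter tuning. This notebook is only a demonstration, so to keep the run time short it uses a small subset of the data and 3-fold cross-validation to search for the best parameters.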


In practice, use more data and a higher number of folds.

In [1]:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
import xgboost as xgb
from sklearn.model_selection import GridSearchCV
import sys
from utils import save_model_params_to_yml

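# Basic settings; adjust these for real use
data_ratio = 0.001       # fraction of the data used for the parameter search
cv_number = 3           # number of cross-validation folds

# Forward slashes avoid invalid escape sequences such as '\d' and work on Windows too
data = pd.read_csv('data/data_after_clean.csv', index_col=[0])
yml_path = 'configs/model_best_params.yml'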
columns_list = list(data.columns)
label_column = 'isDefault'
columns_list.remove(label_column)
features = data[columns_list].copy()
labels = data[label_column].copy()
seed = 1466
X = features[0:int(data_ratio*len(features))]
y = labels[0:int(data_ratio*len(features))] 
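
The cell above takes the first rows of the table, so the subset keeps whatever class balance those rows happen to have. One possible alternative is a stratified random sample. The following is a minimal sketch, assuming scikit-learn's `train_test_split`; the synthetic `demo` frame, the `feature_a`/`feature_b` columns, and the roughly 20% positive rate are made up for illustration and are not the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned data (assumption: binary label, ~20% positives).
rng = np.random.default_rng(1466)
demo = pd.DataFrame({
    "feature_a": rng.normal(size=10_000),
    "feature_b": rng.normal(size=10_000),
    "isDefault": (rng.random(10_000) < 0.2).astype(int),
})

demo_features = demo.drop(columns="isDefault")
demo_labels = demo["isDefault"]

# Keep about 1% of the rows while preserving the class ratio of the label.
X_small, _, y_small, _ = train_test_split(
    demo_features,
    demo_labels,
    train_size=0.01,
    stratify=demo_labels,
    random_state=1466,
)

# Subset size, positive rate in the subset, positive rate in the full frame.
print(len(X_small), round(y_small.mean(), 3), round(demo_labels.mean(), 3))
```

With a very small `data_ratio`, a stratified sample at least guarantees that both classes appear in the subset in roughly their original proportions, which matters for a `roc_auc` score.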

### 1. Decision Tree

In [None]:
decision_tree = DecisionTreeClassifier(random_state=seed)  # fix the seed so the search is reproducible
param_grid = {
    'max_depth': [15, 30, 45, 60, 80], 
    'min_samples_split': [100, 200, 400, 600],
    'min_samples_leaf': [100, 200, 400, 600],
    'criterion': ['gini', 'entropy'],
    'max_features': [None, 'sqrt', 'log2']
}
grid_search = GridSearchCV(estimator=decision_tree, param_grid=param_grid, scoring='roc_auc', cv=cv_number)
grid_search.fit(X, y)
print("Best Parameters: ", grid_search.best_params_)
print("Best AUC Score: ", grid_search.best_score_)
save_model_params_to_yml(model_name='decision_tree', model_params=grid_search.best_params_, yml_path=yml_path)
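
To get a feel for what a higher fold count costs, the number of model fits in a grid search can be estimated before running it. The sketch below uses only the standard library and the decision-tree grid from the cell above; the helper name `count_cv_fits` is hypothetical and is not part of scikit-learn.

```python
from math import prod

# Decision-tree grid copied from the cell above.
param_grid = {
    'max_depth': [15, 30, 45, 60, 80],
    'min_samples_split': [100, 200, 400, 600],
    'min_samples_leaf': [100, 200, 400, 600],
    'criterion': ['gini', 'entropy'],
    'max_features': [None, 'sqrt', 'log2'],
}


def count_cv_fits(grid, folds):
    """Fits run during the search: one per parameter combination per fold.

    The final refit on the full data is not counted.
    """
    n_combinations = prod(len(values) for values in grid.values())
    return n_combinations * folds


for folds in (3, 5, 10):
    print(folds, count_cv_fits(param_grid, folds))
# prints:
# 3 1440
# 5 2400
# 10 4800
```

The grid has 480 combinations, so the fit count grows linearly with the number of folds, on top of the extra time each fit takes when more data is used.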

### 2. Random Forest

In [None]:
rf_classifier = RandomForestClassifier(random_state=seed)  # fix the seed so the search is reproducible
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [15, 25, 35, 50],
    'min_samples_split': [50, 100, 200]
}
grid_search = GridSearchCV(estimator=rf_classifier, param_grid=param_grid, scoring='roc_auc', cv=cv_number)
grid_search.fit(X, y)
print("Best Parameters: ", grid_search.best_params_)
print("Best AUC Score: ", grid_search.best_score_)
save_model_params_to_yml(model_name='random_forest', model_params=grid_search.best_params_, yml_path=yml_path)

### 3. XGBoost

In [None]:
XGBoost_model = xgb.XGBClassifier(random_state=seed)  # fix the seed so the search is reproducible
param_grid = {
    'max_depth': [20, 40, 60],
    'learning_rate': [0.1, 0.01, 0.001],
    'n_estimators': [100, 200, 300],
    'gamma': [0, 0.1, 0.2]
}
grid_search = GridSearchCV(estimator=XGBoost_model, param_grid=param_grid, scoring='roc_auc', cv=cv_number)
grid_search.fit(X, y)
print("Best Parameters: ", grid_search.best_params_)
print("Best AUC Score: ", grid_search.best_score_)
save_model_params_to_yml(model_name='XGBoost', model_params=grid_search.best_params_, yml_path=yml_path)

### 4. SVM

In [3]:
SVM = SVC()
param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf', 'poly'],
    'gamma': ['scale', 'auto', 0.1],
    'degree': [2, 3, 4],
    'coef0': [0.0, 0.5, 1.0]
}
grid_search = GridSearchCV(estimator=SVM, param_grid=param_grid, scoring='roc_auc', cv=cv_number)
grid_search.fit(X, y)
print("Best Parameters: ", grid_search.best_params_)
print("Best AUC Score: ", grid_search.best_score_)
save_model_params_to_yml(model_name='SVM', model_params=grid_search.best_params_, yml_path=yml_path)
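
The single-dict grid above crosses every parameter with every kernel. According to the scikit-learn `SVC` documentation, `degree` is ignored by all kernels other than `'poly'`, `coef0` is only significant for `'poly'` and `'sigmoid'`, and `gamma` is not used by `'linear'`, so many of those combinations train the same model repeatedly. `GridSearchCV` also accepts a list of dicts as `param_grid`, which allows one sub-grid per kernel. The sketch below shows one possible split and counts the candidates with the standard library; the helper `n_candidates` is hypothetical.

```python
from math import prod

# Grid from the cell above: every parameter crossed with every kernel.
full_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf', 'poly'],
    'gamma': ['scale', 'auto', 0.1],
    'degree': [2, 3, 4],
    'coef0': [0.0, 0.5, 1.0],
}

# One sub-grid per kernel, listing only the parameters that kernel uses.
split_grid = [
    {'kernel': ['linear'], 'C': [0.1, 1, 10]},
    {'kernel': ['rbf'], 'C': [0.1, 1, 10], 'gamma': ['scale', 'auto', 0.1]},
    {
        'kernel': ['poly'],
        'C': [0.1, 1, 10],
        'gamma': ['scale', 'auto', 0.1],
        'degree': [2, 3, 4],
        'coef0': [0.0, 0.5, 1.0],
    },
]


def n_candidates(grid):
    """Count parameter combinations in a dict grid or a list of dict grids."""
    sub_grids = grid if isinstance(grid, list) else [grid]
    return sum(prod(len(values) for values in g.values()) for g in sub_grids)


print(n_candidates(full_grid), n_candidates(split_grid))
# prints: 243 93
```

Passing `split_grid` as `param_grid` would search the same distinct models with 93 candidates instead of 243.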

### 5. MLP
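This model has many parameters and hyperparameters. A set of parameters that performs reasonably well was found through experimentation; the resulting network structure is shown in the figure below.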

![Image](deep_learn_model_save/MLP_structure.png)

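# Model Results
After the two steps above (data cleaning and parameter tuning), the cleaned data and the tuned model parameters are available. Run main.py to produce the model results. The best results for each model are saved in the outputs folder, along with several common metrics.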