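## [Assignment Focus]
Learn how to use hyper-parameter search in scikit-learn to find the best hyperparameters.

### Assignment
Use different datasets and apply hyper-parameter search to see whether a better combination of hyperparameters can be found.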

In [1]:
from sklearn import datasets, metrics
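# load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so this line needs scikit-learn < 1.2
boston=datasets.load_boston()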
wine=datasets.load_wine()
digits=datasets.load_digits()
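
Because `load_boston` is no longer available in scikit-learn 1.2 and later, the regression part of this notebook needs another dataset on newer installs. The sketch below uses `load_diabetes` as a stand-in; that choice is an assumption made for illustration, and the names `diabetes`, `X_reg`, and `y_reg` are not part of the original notebook.

```python
from sklearn import datasets

# Assumption: load_diabetes stands in for the Boston data on scikit-learn >= 1.2.
diabetes = datasets.load_diabetes()
X_reg, y_reg = diabetes.data, diabetes.target

# Feature matrix and target vector, ready for train_test_split
print(X_reg.shape, y_reg.shape)
```

The numbers recorded in the Boston cells below will not carry over to a different dataset.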

In [25]:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split, KFold, GridSearchCV

# boston

In [22]:
x_train,x_test,y_train,y_test=train_test_split(boston.data,boston.target,test_size=0.2,random_state=4)

In [23]:
clf=GradientBoostingRegressor()
clf.fit(x_train,y_train)
y_pred=clf.predict(x_test)
mse=metrics.mean_squared_error(y_test,y_pred)
print(f"MSE:{mse:.4f}")

MSE:10.8713


In [27]:
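# Define the hyperparameter combinations to search
n_estimators = [100, 200, 300]
max_depth = [1, 3, 5]
param_grid = dict(n_estimators=n_estimators, max_depth=max_depth)

# Build the search object from the model and the parameter grid (n_jobs=-1 uses all CPU cores in parallel)
grid_search = GridSearchCV(clf, param_grid, scoring="neg_mean_squared_error", n_jobs=-1, verbose=1)

# Run the search for the best parameters
grid_result = grid_search.fit(x_train, y_train)

# This scikit-learn version defaults to 3-fold cross-validation (the default is 5-fold since 0.22).
# With 9 parameter combinations, that is 27 model fits in total.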

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.


Fitting 3 folds for each of 9 candidates, totalling 27 fits


[Parallel(n_jobs=-1)]: Done  27 out of  27 | elapsed:    1.4s finished
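
The "27 fits" in the log is the number of grid candidates times the number of folds. A small sketch of that arithmetic, using `ParameterGrid` to enumerate the candidates; `n_folds = 3` is taken from the log above and would be 5 under the newer default.

```python
from sklearn.model_selection import ParameterGrid

param_grid = dict(n_estimators=[100, 200, 300], max_depth=[1, 3, 5])

# ParameterGrid expands the dict into every combination of values
n_candidates = len(ParameterGrid(param_grid))

# Taken from the log above; scikit-learn >= 0.22 would use 5 by default
n_folds = 3

n_fits = n_candidates * n_folds
print(n_candidates, n_fits)  # 9 27
```

Adding a third hyperparameter with three values would triple the count to 81 fits, which is why grids grow expensive quickly.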


In [28]:
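# Print the best score and best parameters
# (with scoring="neg_mean_squared_error", best_score_ is the negated mean cross-validated MSE, not an accuracy)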
print("Best Accuracy: %f using %s" % (
    grid_result.best_score_, grid_result.best_params_))

Best Accuracy: -10.612238 using {'max_depth': 3, 'n_estimators': 200}


In [29]:
grid_result.best_params_

{'max_depth': 3, 'n_estimators': 200}
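
The best score printed above is negative because scikit-learn scorers follow a "higher is better" convention, so MSE is negated. The sketch below flips the sign back and lists the cross-validated MSE of every candidate through `cv_results_`. It assumes `load_diabetes` as a stand-in dataset and a deliberately small grid so that it runs quickly; neither comes from the original notebook.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Assumption: stand-in dataset and a small grid, chosen for speed
X, y = load_diabetes(return_X_y=True)
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    {"n_estimators": [20, 40], "max_depth": [1, 2]},
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X, y)

# Negate the score to recover a positive MSE, then take the root for RMSE
best_mse = -search.best_score_
best_rmse = np.sqrt(best_mse)

# One mean validation score per candidate, in the same order as "params"
mean_scores = search.cv_results_["mean_test_score"]
for params, score in zip(search.cv_results_["params"], mean_scores):
    print(params, f"CV MSE: {-score:.1f}")
print(f"best CV MSE: {best_mse:.1f}, RMSE: {best_rmse:.1f}")
```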

In [31]:
clf_best_params=GradientBoostingRegressor(max_depth=grid_result.best_params_["max_depth"],n_estimators=grid_result.best_params_["n_estimators"])
clf_best_params.fit(x_train,y_train)
y_pred=clf_best_params.predict(x_test)
mse=metrics.mean_squared_error(y_test,y_pred)
print(f"MSE:{mse:.4f}")

MSE:10.2847
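
Rebuilding the model by hand from `best_params_` works, but `GridSearchCV` has `refit=True` by default, so `best_estimator_` is already trained on the full training split with the winning parameters. A minimal sketch of both routes follows, again assuming `load_diabetes` and a small grid for illustration only.

```python
from sklearn import metrics
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: stand-in dataset and a small grid, chosen for speed
X, y = load_diabetes(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4)

search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    {"n_estimators": [20, 40], "max_depth": [1, 2]},
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(x_train, y_train)

# Route 1: reuse the estimator that GridSearchCV already refit on x_train
y_pred = search.best_estimator_.predict(x_test)
mse = metrics.mean_squared_error(y_test, y_pred)

# Route 2: rebuild by unpacking best_params_ instead of naming each key
manual = GradientBoostingRegressor(random_state=0, **search.best_params_)
manual.fit(x_train, y_train)
mse_manual = metrics.mean_squared_error(y_test, manual.predict(x_test))

print(f"MSE:{mse:.4f} manual:{mse_manual:.4f}")
```

Unpacking with `**search.best_params_` also keeps the rebuild correct if more hyperparameters are added to the grid later.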


# wine

In [38]:
x_train,x_test,y_train,y_test=train_test_split(wine.data,wine.target,test_size=0.2,random_state=4)

In [39]:
clf=GradientBoostingClassifier()
clf.fit(x_train,y_train)
y_pred=clf.predict(x_test)
acc=metrics.accuracy_score(y_test,y_pred)
print(f"Accuracy:{acc:.4f}")

Accuracy:1.0000


In [40]:
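# Define the hyperparameter combinations to search
n_estimators = [100, 200, 300]
max_depth = [1, 3, 5]
param_grid = dict(n_estimators=n_estimators, max_depth=max_depth)

# Build the search object from the model and the parameter grid (n_jobs=-1 uses all CPU cores in parallel)
# Note: neg_mean_squared_error treats the class labels as numbers; scoring="accuracy" is the usual choice for a classifier
grid_search = GridSearchCV(clf, param_grid, scoring="neg_mean_squared_error", n_jobs=-1, verbose=1)

# Run the search for the best parameters
grid_result = grid_search.fit(x_train, y_train)

# This scikit-learn version defaults to 3-fold cross-validation (the default is 5-fold since 0.22).
# With 9 parameter combinations, that is 27 model fits in total.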

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.


Fitting 3 folds for each of 9 candidates, totalling 27 fits


[Parallel(n_jobs=-1)]: Done  27 out of  27 | elapsed:    4.3s finished


In [42]:
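# Print the best score and best parameters
# (with scoring="neg_mean_squared_error", best_score_ is the negated mean cross-validated MSE, not an accuracy)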
print("Best Accuracy: %f using %s" % (
    grid_result.best_score_, grid_result.best_params_))

Best Accuracy: -0.035211 using {'max_depth': 1, 'n_estimators': 200}


In [43]:
grid_result.best_params_

{'max_depth': 1, 'n_estimators': 200}

In [45]:
clf_best_params=GradientBoostingClassifier(max_depth=grid_result.best_params_["max_depth"],n_estimators=grid_result.best_params_["n_estimators"])
clf_best_params.fit(x_train,y_train)
y_pred=clf_best_params.predict(x_test)
acc=metrics.accuracy_score(y_test,y_pred)
print(f"Accuracy:{acc:.4f}")

Accuracy:0.9722
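
With a 20% split, the wine test set holds only 36 samples, so 0.9722 is 35 of 36 correct and the drop from 1.0000 is a single misclassification. For a classifier it may also be more natural to let the search optimise accuracy directly. The sketch below does that with `scoring="accuracy"`; the reduced grid and the fixed `random_state=0` are assumptions added for speed and reproducibility.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_wine(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4)

# Assumption: a smaller grid than in the notebook, to keep the run short
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    {"n_estimators": [20, 40], "max_depth": [1, 2]},
    scoring="accuracy",
    cv=3,
)
search.fit(x_train, y_train)

# search.score applies the same scorer (accuracy) to the refit best model
test_acc = search.score(x_test, y_test)
print(f"CV accuracy: {search.best_score_:.4f} using {search.best_params_}")
print(f"Test accuracy: {test_acc:.4f} on {len(y_test)} samples")
```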


# digits

In [20]:
x_train,x_test,y_train,y_test=train_test_split(digits.data,digits.target,test_size=0.2,random_state=4)

In [21]:
clf=GradientBoostingRegressor()
clf.fit(x_train,y_train)
y_pred=clf.predict(x_test)
mse=metrics.mean_squared_error(y_test,y_pred)
print(f"MSE:{mse:.4f}")

MSE:1.4597


In [46]:
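# Define the hyperparameter combinations to search
n_estimators = [100, 200, 300]
max_depth = [1, 3, 5]
param_grid = dict(n_estimators=n_estimators, max_depth=max_depth)

# Build the search object from the model and the parameter grid (n_jobs=-1 uses all CPU cores in parallel)
grid_search = GridSearchCV(clf, param_grid, scoring="neg_mean_squared_error", n_jobs=-1, verbose=1)

# Run the search for the best parameters
grid_result = grid_search.fit(x_train, y_train)

# This scikit-learn version defaults to 3-fold cross-validation (the default is 5-fold since 0.22).
# With 9 parameter combinations, that is 27 model fits in total.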

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.


Fitting 3 folds for each of 9 candidates, totalling 27 fits


[Parallel(n_jobs=-1)]: Done  27 out of  27 | elapsed:    4.4s finished


In [47]:
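# Print the best score and best parameters
# (with scoring="neg_mean_squared_error", best_score_ is the negated mean cross-validated MSE, not an accuracy)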
print("Best Accuracy: %f using %s" % (
    grid_result.best_score_, grid_result.best_params_))

Best Accuracy: -0.035211 using {'max_depth': 1, 'n_estimators': 200}


In [48]:
grid_result.best_params_

{'max_depth': 1, 'n_estimators': 200}

In [None]:
clf_best_params=GradientBoostingClassifier(max_depth=grid_result.best_params_["max_depth"],n_estimators=grid_result.best_params_["n_estimators"])
clf_best_params.fit(x_train,y_train)
y_pred=clf_best_params.predict(x_test)
acc=metrics.accuracy_score(y_test,y_pred)
print(f"Accuracy:{acc:.4f}")
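
Grid search on digits is comparatively slow because a gradient boosting classifier fits one tree per class at every boosting stage. One alternative hyper-parameter search in scikit-learn is `RandomizedSearchCV`, which samples a fixed number of candidates instead of trying every combination. The sketch below is illustrative only: the 500-sample subset, the sampling ranges, and `n_iter=3` are assumptions chosen to keep the run short, not settings from the notebook.

```python
from scipy.stats import randint
from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_digits(return_X_y=True)
X, y = X[:500], y[:500]  # assumption: subsample so the sketch runs quickly
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4, stratify=y)

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    # randint(low, high) samples integers from low up to high - 1
    param_distributions={
        "n_estimators": randint(10, 31),
        "max_depth": randint(1, 4),
    },
    n_iter=3,          # number of sampled candidates
    scoring="accuracy",
    cv=3,
    random_state=0,    # makes the sampled candidates reproducible
)
search.fit(x_train, y_train)

test_acc = search.score(x_test, y_test)
print(f"CV accuracy: {search.best_score_:.4f} using {search.best_params_}")
print(f"Test accuracy: {test_acc:.4f}")
```

The cost is set by `n_iter` rather than by the size of the grid, so wider ranges can be explored within the same budget.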