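## [Assignment Focus]
Make sure you understand what each hyperparameter of the random forest model means, and observe how adjusting those hyperparameters affects the results.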
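
As a starting point for exploring the hyperparameters, the sketch below lists a few of them with their default values via `get_params()`. Which parameters to highlight is my own selection, not something the assignment prescribes, and the printed defaults depend on the installed scikit-learn version.

```python
from sklearn.ensemble import RandomForestClassifier

# Instantiate with defaults and inspect the hyperparameters.
clf = RandomForestClassifier()
params = clf.get_params()

# A hand-picked subset (an assumption about what is most useful to tune):
#   n_estimators      - number of trees in the forest
#   max_depth         - maximum depth of each tree (None = no explicit limit)
#   min_samples_split - minimum samples required to split an internal node
#   min_samples_leaf  - minimum samples required at a leaf node
#   max_features      - number of features considered at each split
selected = ["n_estimators", "max_depth", "min_samples_split",
            "min_samples_leaf", "max_features"]

for name in selected:
    print(f"{name:>18} = {params[name]!r}")
```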

## Assignment

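1. Try adjusting the parameters in RandomForestClassifier(...) and observe whether the results change.
2. Switch to other datasets (boston, wine) and compare the results with those of a regression model and a decision tree.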
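
One possible way to approach the second task on the wine dataset is sketched below. It assumes "regression model" means logistic regression here, since wine is a classification problem. The scaling step, `max_iter=1000`, and the `random_state` values are illustrative choices, not part of the original notebook.

```python
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Same split settings as the iris cells in this notebook.
wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, random_state=4)

# Three models to compare; hyperparameters are illustrative assumptions.
models = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(
        n_estimators=20, max_depth=4, random_state=0),
}

# Fit each model on the training split and score it on the test split.
scores = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    scores[name] = metrics.accuracy_score(y_test, model.predict(x_test))
    print(f"{name}: accuracy = {scores[name]:.3f}")
```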

In [1]:
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

In [2]:
# Load iris and hold out 25% of the samples as a test set (fixed seed for a reproducible split)
iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.25, random_state=4)
print(iris.feature_names)

['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']


In [3]:
for n in [2, 5, 10, 20, 50]:
    print('n_estimators =', n)
    clf = RandomForestClassifier(n_estimators=n, max_depth=4)
    clf.fit(x_train, y_train)
    y_pred = clf.predict(x_test)
    acc = metrics.accuracy_score(y_test, y_pred)
    print("Accuracy: ", acc)
    print("Feature importance: ", clf.feature_importances_)
    print()

n_estimators = 2
Accuracy:  0.9473684210526315
Feature importance:  [0.14502146 0.09488909 0.01741135 0.7426781 ]

n_estimators = 5
Accuracy:  0.9736842105263158
Feature importance:  [0.16688724 0.02263126 0.40251066 0.40797084]

n_estimators = 10
Accuracy:  0.9736842105263158
Feature importance:  [0.09903656 0.02150131 0.45199309 0.42746905]

n_estimators = 20
Accuracy:  0.9736842105263158
Feature importance:  [0.05173615 0.02808756 0.43628541 0.48389088]

n_estimators = 50
Accuracy:  0.9736842105263158
Feature importance:  [0.06414237 0.01521086 0.47883561 0.44181116]
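
The forests above are built without a `random_state`, so the feature importances change from run to run as well as with `n_estimators`. The sketch below fixes the seed to separate those two effects. It rebuilds the same iris split, and the seed values 0 and 1 are arbitrary.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4)

def fit_importances(seed):
    """Fit a 20-tree forest with a fixed seed and return its feature importances."""
    clf = RandomForestClassifier(n_estimators=20, max_depth=4, random_state=seed)
    clf.fit(x_train, y_train)
    return clf.feature_importances_

first = fit_importances(seed=0)
second = fit_importances(seed=0)   # same seed as `first`
other = fit_importances(seed=1)    # different seed

print("same seed gives identical importances:", np.array_equal(first, second))
print("largest gap between seeds 0 and 1:", np.abs(first - other).max())
print("importances sum to:", first.sum())
```

With the seed fixed, remaining differences between settings can be attributed to the hyperparameter being varied rather than to sampling noise.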



In [4]:
for d in [1, 2, 5, 10, 20]:
    print('max_depth =', d)
    clf = RandomForestClassifier(n_estimators=20, max_depth=d)
    clf.fit(x_train, y_train)
    y_pred = clf.predict(x_test)
    acc = metrics.accuracy_score(y_test, y_pred)
    print("Accuracy: ", acc)
    print("Feature importance: ", clf.feature_importances_)
    print()

max_depth = 1
Accuracy:  0.9473684210526315
Feature importance:  [0.2 0.  0.4 0.4]

max_depth = 2
Accuracy:  0.9736842105263158
Feature importance:  [0.11230579 0.02812713 0.41862691 0.44094017]

max_depth = 5
Accuracy:  0.9736842105263158
Feature importance:  [0.17121485 0.04180279 0.45127513 0.33570723]

max_depth = 10
Accuracy:  0.9736842105263158
Feature importance:  [0.05870944 0.01181273 0.60839308 0.32108475]

max_depth = 20
Accuracy:  0.9736842105263158
Feature importance:  [0.09751501 0.03478181 0.53029101 0.33741217]
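
With `test_size=0.25` the iris test set has only 38 samples, so a single misclassification moves the accuracy by about 0.026 (37/38 ≈ 0.974, 36/38 ≈ 0.947). That makes the effect of `max_depth` hard to read from one split. A minimal sketch using 5-fold cross-validation gives a less coarse comparison. The fold count and `random_state=0` are assumptions for illustration.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()

# Mean cross-validated accuracy for each max_depth value used above.
cv_means = {}
for d in [1, 2, 5, 10, 20]:
    clf = RandomForestClassifier(n_estimators=20, max_depth=d, random_state=0)
    fold_scores = cross_val_score(clf, iris.data, iris.target, cv=5)
    cv_means[d] = fold_scores.mean()
    print(f"max_depth = {d:>2}: mean accuracy = {cv_means[d]:.3f} "
          f"(std = {fold_scores.std():.3f})")
```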



In [5]:
# Requires scikit-learn < 1.2 (load_boston was removed in 1.2)
boston = datasets.load_boston()
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.25, random_state=4)
print(boston.feature_names)

['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']
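
For environments where `load_boston` is unavailable, the same kind of regression experiment can be sketched with another bundled dataset. The example below substitutes `load_diabetes`, which is my own swap and not the dataset used in this notebook, so its MSE values are not comparable with the Boston results shown here.

```python
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Stand-in regression dataset (assumption: substitutes for boston).
diabetes = datasets.load_diabetes()
x_train, x_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.25, random_state=4)
print(diabetes.feature_names)

# One forest with the same settings as the n_estimators=20 runs in this notebook;
# random_state=0 is added only to make the sketch repeatable.
reg = RandomForestRegressor(n_estimators=20, max_depth=4, random_state=0)
reg.fit(x_train, y_train)
mse = metrics.mean_squared_error(y_test, reg.predict(x_test))
print("MSE: ", mse)
print("Feature importance: ", reg.feature_importances_)
```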


In [6]:
for n in [2, 5, 10, 20, 50]:
    print('n_estimators =', n)
    reg = RandomForestRegressor(n_estimators=n, max_depth=4)
    reg.fit(x_train, y_train)
    y_pred = reg.predict(x_test)
    mse = metrics.mean_squared_error(y_test, y_pred)
    print("MSE: ", mse)
    print("Feature importance: ", reg.feature_importances_)
    print()

n_estimators = 2
MSE:  20.305594451388735
Feature importance:  [0.04149678 0.         0.         0.00509707 0.00570535 0.43479009
 0.00401518 0.01656708 0.         0.         0.02023916 0.
 0.47208929]

n_estimators = 5
MSE:  25.6776279200314
Feature importance:  [3.40305820e-02 0.00000000e+00 0.00000000e+00 1.43976468e-03
 1.46145738e-02 6.15506503e-01 1.04167168e-02 4.08111191e-02
 5.55187135e-04 8.58352410e-03 2.22871683e-02 1.36450503e-03
 2.50390356e-01]

n_estimators = 10
MSE:  19.12382749820557
Feature importance:  [0.0412714  0.         0.00568763 0.00315764 0.00526155 0.54298719
 0.00426109 0.04618436 0.00334847 0.00942845 0.00321039 0.00723725
 0.32796459]

n_estimators = 20
MSE:  17.308142755270364
Feature importance:  [0.04245871 0.         0.00099583 0.00164287 0.02164703 0.49537287
 0.00335432 0.04682527 0.00173868 0.01051351 0.01899312 0.00170271
 0.35475509]

n_estimators = 50
MSE:  18.699245494699884
Feature importance:  [0.04854767 0.00054693 0.00420166 0.0030371  0.0

In [7]:
for d in [1, 2, 5, 10, 20]:
    print('max_depth =', d)
    reg = RandomForestRegressor(n_estimators=20, max_depth=d)
    reg.fit(x_train, y_train)
    y_pred = reg.predict(x_test)
    mse = metrics.mean_squared_error(y_test, y_pred)
    print("MSE: ", mse)
    print("Feature importance: ", reg.feature_importances_)
    print()

max_depth = 1
MSE:  42.09351648109223
Feature importance:  [0.   0.   0.   0.   0.   0.55 0.   0.   0.   0.   0.   0.   0.45]

max_depth = 2
MSE:  28.45116762048483
Feature importance:  [0.00587395 0.         0.         0.         0.         0.49662136
 0.         0.         0.         0.         0.         0.
 0.49750469]

max_depth = 5
MSE:  15.211215186461859
Feature importance:  [5.56006406e-02 4.30039546e-04 5.42834938e-03 5.61720999e-04
 2.14266451e-02 4.88431674e-01 6.71384259e-03 5.17878715e-02
 2.78520275e-03 7.11515267e-03 1.62693980e-02 7.88779579e-03
 3.35561667e-01]

max_depth = 10
MSE:  15.162184904635023
Feature importance:  [0.07019364 0.00086256 0.00627345 0.00081137 0.01268776 0.40525597
 0.01431991 0.04147062 0.00277051 0.01408809 0.01771434 0.01195332
 0.40159846]

max_depth = 20
MSE:  14.983869283464566
Feature importance:  [5.80961598e-02 8.07442249e-04 8.29591095e-03 3.54573765e-04
 2.09479195e-02 3.98877438e-01 1.20811729e-02 6.33834350e-02
 2.82885892e-03 1.940