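## [Assignment Focus]
By now you should have a clear picture of what the data in a dataset looks like, including the features and the labels. Keep in mind that in any future project, the data must be cleaned into this same format before it can be fed to a model for training.
Today's assignment moves on to the decision tree, a very important model. Make sure you understand what each hyperparameter of the model means, and try adjusting them to see how they affect the final predictions.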
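To make the hyperparameters concrete, the following is a minimal sketch of how three of them constrain the shape of a fitted tree. The specific values (`max_depth=2`, `min_samples_split=20`, `min_samples_leaf=10`) and the `summary` list are assumptions chosen only for illustration; the split mirrors the one used in the cells of this notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4
)

# Hypothetical settings, picked only to make the effect visible.
settings = [
    {},                         # defaults: grow until the leaves are pure
    {"max_depth": 2},           # cap the number of split levels
    {"min_samples_split": 20},  # a node needs at least 20 samples to be split
    {"min_samples_leaf": 10},   # every leaf must keep at least 10 samples
]

summary = []
for params in settings:
    clf = DecisionTreeClassifier(random_state=0, **params).fit(x_train, y_train)
    # Leaves are the nodes without children (children_left == -1).
    is_leaf = clf.tree_.children_left == -1
    smallest_leaf = int(clf.tree_.n_node_samples[is_leaf].min())
    summary.append((params, clf.get_depth(), clf.get_n_leaves(), smallest_leaf))
    print(params, "depth:", clf.get_depth(),
          "leaves:", clf.get_n_leaves(), "smallest leaf:", smallest_leaf)
```

Tighter constraints generally give a smaller tree, which trades some training fit for less risk of overfitting.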

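## Assignment

1. Try adjusting the parameters of DecisionTreeClassifier(...) and observe whether the results change.
2. Switch to other datasets (boston, wine) and compare the results with those of the regression models.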
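For the second task, a comparison on the wine dataset might look like the sketch below. It assumes that "regression model" means logistic regression for this classification problem, and the scaling pipeline, the `models` dictionary, and `wine_scores` are illustrative choices rather than part of the original assignment.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

wine = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, random_state=4
)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    # Scaling helps logistic regression converge; trees do not need it.
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

wine_scores = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    wine_scores[name] = accuracy_score(y_test, model.predict(x_test))
    print(name, "accuracy:", wine_scores[name])
```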

In [1]:
from sklearn import datasets, metrics

# Use DecisionTreeClassifier for classification problems and DecisionTreeRegressor for regression problems
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.model_selection import train_test_split

### Effect of different min_samples_split values on DecisionTreeClassifier

In [8]:

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.25, random_state=4)

for split in range(3, 6):
    clf = DecisionTreeClassifier(min_samples_split=split)
    # Train the model
    clf.fit(x_train, y_train)
    # Predict on the test set
    y_pred = clf.predict(x_test)

    acc = metrics.accuracy_score(y_test, y_pred)
    print("Accuracy: ", acc)
    print("Feature importance: ", clf.feature_importances_)

Accuracy:  0.9736842105263158
Feature importance:  [0.         0.01796599 0.05992368 0.92211033]
Accuracy:  0.9736842105263158
Feature importance:  [0.         0.         0.06101997 0.93898003]
Accuracy:  0.9736842105263158
Feature importance:  [0.         0.         0.06230224 0.93769776]
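The importance array follows the column order of `iris.data`, so it is easier to read when paired with `iris.feature_names`. A minimal sketch, assuming `min_samples_split=3` and a fixed `random_state=0` for repeatability (the `ranked` list is a name introduced here for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4
)

clf = DecisionTreeClassifier(min_samples_split=3, random_state=0)
clf.fit(x_train, y_train)

# Pair each importance with its column name, most important first.
ranked = sorted(
    zip(iris.feature_names, clf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name:<20} {importance:.4f}")
```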


### Effect of different min_samples_leaf values on DecisionTreeClassifier

In [23]:
iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.25, random_state=4)

for leaf in range(1, 6):
    clf = DecisionTreeClassifier(min_samples_leaf=leaf, random_state=0)
    # Train the model
    clf.fit(x_train, y_train)
    # Predict on the test set
    y_pred = clf.predict(x_test)

    acc = metrics.accuracy_score(y_test, y_pred)
    print("Accuracy: ", acc)
    print("Feature importance: ", clf.feature_importances_)

Accuracy:  0.9736842105263158
Feature importance:  [0.01796599 0.         0.05992368 0.92211033]
Accuracy:  1.0
Feature importance:  [0.01341996 0.         0.06274172 0.92383832]
Accuracy:  0.9736842105263158
Feature importance:  [0.00882094 0.         0.06348814 0.92769091]
Accuracy:  0.9736842105263158
Feature importance:  [0.01140681 0.         0.05512761 0.93346558]
Accuracy:  0.9736842105263158
Feature importance:  [0.0050521 0.        0.0344831 0.9604648]
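With 38 test samples, 0.9737 corresponds to 37 correct predictions and 1.0 to 38, so the settings above differ by a single sample. Cross-validation usually gives a steadier comparison; the sketch below assumes 5 folds over the full iris data, and `cv_results` is a name introduced here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

cv_results = {}
for leaf in range(1, 6):
    clf = DecisionTreeClassifier(min_samples_leaf=leaf, random_state=0)
    # 5-fold cross-validation: every sample is used for testing exactly once.
    scores = cross_val_score(clf, iris.data, iris.target, cv=5)
    cv_results[leaf] = scores
    print(f"min_samples_leaf={leaf}: mean={scores.mean():.4f} std={scores.std():.4f}")
```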


### boston

In [15]:
# load_boston was removed in scikit-learn 1.2; this cell needs an older release
boston = datasets.load_boston()
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.25, random_state=4)
print(x_train.shape)
print(x_test.shape)

(379, 13)
(127, 13)
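On scikit-learn 1.2 or later, where `load_boston` is no longer available, the same split-and-inspect step can be tried on a bundled regression dataset. A minimal sketch, assuming `load_diabetes` as a stand-in (it is not the dataset used above, and the `dx_*`/`dy_*` names are introduced here only to avoid overwriting the notebook's variables):

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Stand-in regression dataset (an assumption, not the dataset of the original cell).
diabetes = load_diabetes()
dx_train, dx_test, dy_train, dy_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.25, random_state=4
)
print(dx_train.shape)
print(dx_test.shape)
```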


In [24]:
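rgr = DecisionTreeRegressor(random_state=0)
# Train the model
rgr.fit(x_train, y_train)
# Predict on the test set
y_pred = rgr.predict(x_test)

# mean_squared_error returns the MSE, not its square root
mse = metrics.mean_squared_error(y_test, y_pred)
print("mean_squared_error: ", mse)

mean_squared_error:  0.02631578947368421

Note: 0.0263... equals 1/38, and 38 is the size of the iris test split, whereas the Boston test split has 127 rows. The execution counts (In [15] for the Boston split, In [23] for the last iris split, In [24] here) suggest that x_train and y_train still held the iris data when this cell and the two regressor loops below were run, so these numbers are probably not Boston results. Re-running the Boston split immediately before this cell should give errors on the house-price scale instead.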
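`metrics.mean_squared_error` returns the mean squared error itself; a root mean squared error needs an extra square root. A small worked sketch with made-up numbers (`y_true` and `y_hat` are hypothetical values, not notebook data):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical targets and predictions, chosen so the arithmetic is easy to follow.
y_true = np.array([3.0, 5.0])
y_hat = np.array([1.0, 5.0])

mse_value = mean_squared_error(y_true, y_hat)  # ((3-1)^2 + (5-5)^2) / 2 = 2.0
rmse_value = np.sqrt(mse_value)                # sqrt(2.0), in the same units as the target
print("MSE: ", mse_value)
print("RMSE:", rmse_value)
```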


### Effect of different min_samples_split values on DecisionTreeRegressor

In [27]:
for split in range(3, 8):
    rgr = DecisionTreeRegressor(min_samples_split=split, random_state=0)
    # Train the model
    rgr.fit(x_train, y_train)
    # Predict on the test set
    y_pred = rgr.predict(x_test)
    # mean_squared_error returns the MSE, not its square root
    mse = metrics.mean_squared_error(y_test, y_pred)
    print("mean_squared_error: ", mse)

mean_squared_error:  0.02631578947368421
mean_squared_error:  0.02631578947368421
mean_squared_error:  0.01644736842105263
mean_squared_error:  0.01644736842105263
mean_squared_error:  0.01644736842105263


### Effect of different min_samples_leaf values on DecisionTreeRegressor

In [29]:
for leaf in range(1, 8):
    rgr = DecisionTreeRegressor(min_samples_leaf=leaf, random_state=0)
    # Train the model
    rgr.fit(x_train, y_train)
    # Predict on the test set
    y_pred = rgr.predict(x_test)
    # mean_squared_error returns the MSE, not its square root
    mse = metrics.mean_squared_error(y_test, y_pred)
    print("mean_squared_error: ", mse)

mean_squared_error:  0.02631578947368421
mean_squared_error:  0.019736842105263157
mean_squared_error:  0.022295321637426896
mean_squared_error:  0.021842105263157895
mean_squared_error:  0.016496710526315787
mean_squared_error:  0.01585343567251462
mean_squared_error:  0.01573408968850698
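Test error alone does not show whether a setting overfits; reporting the training error next to it makes the trade-off visible. The sketch below assumes the bundled `load_diabetes` data as a stand-in regression dataset, and the `x_tr`/`x_te` names and the `train_mse`/`test_mse` dictionaries are introduced here for illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

data = load_diabetes()  # stand-in dataset, an assumption for this sketch
x_tr, x_te, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.25, random_state=4
)

train_mse, test_mse = {}, {}
for leaf in range(1, 8):
    rgr = DecisionTreeRegressor(min_samples_leaf=leaf, random_state=0)
    rgr.fit(x_tr, y_tr)
    # Compare the error on data the tree has seen with the error on held-out data.
    train_mse[leaf] = mean_squared_error(y_tr, rgr.predict(x_tr))
    test_mse[leaf] = mean_squared_error(y_te, rgr.predict(x_te))
    print(f"min_samples_leaf={leaf}: train={train_mse[leaf]:.1f} test={test_mse[leaf]:.1f}")
```

A fully grown tree (`min_samples_leaf=1`) typically fits the training set almost perfectly, so a large gap between its training and test error is a sign of overfitting.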
