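## [Key Points]
Make sure you understand what each hyperparameter of the random forest model means, and observe how tuning the hyperparameters affects the results.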

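## Assignment

1. Try adjusting the parameters of RandomForestClassifier(...) and observe whether the results change.
2. Switch to other datasets (boston, wine) and compare the results with those of a regression model and a decision tree.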
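For the regression half of task 2, note that `load_boston` was removed in scikit-learn 1.2. The sketch below is one way the comparison might look, assuming the bundled `load_diabetes` dataset as a stand-in; the stand-in dataset, the hyperparameter values, and the `models` and `mse` names are illustrative choices, not part of the original assignment.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Assumption: load_diabetes stands in for the Boston data, which recent
# scikit-learn releases no longer ship.
data = datasets.load_diabetes()
x_train, x_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=4)

# Illustrative hyperparameters; random_state is fixed so reruns are comparable.
models = {
    'linear regression': LinearRegression(),
    'decision tree': DecisionTreeRegressor(max_depth=5, random_state=0),
    'random forest': RandomForestRegressor(
        n_estimators=100, max_depth=5, min_samples_leaf=2, random_state=0),
}

# Fit every model on the same split and record its test-set MSE.
mse = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    mse[name] = mean_squared_error(y_test, model.predict(x_test))
    print(f'{name:<18} MSE: {mse[name]:.2f}')
```

Using one shared split for all three models keeps the comparison fair; a single split is still noisy, so the ranking may change with a different `random_state`.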

In [1]:
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

In [2]:
# Load the iris dataset
iris = datasets.load_iris()

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.25, random_state=4)

# Build the random forest model
clf = RandomForestClassifier(n_estimators=10, criterion='entropy', max_depth=5, min_samples_leaf=2)

# Train the model
clf.fit(x_train, y_train)

# Predict on the test set
y_pred = clf.predict(x_test)

# Compute the accuracy score
acc = metrics.accuracy_score(y_test, y_pred)
print('Accuracy：', acc)

print('\n' + '-'*40 +'\n')

# Get the importance of each feature column
print(iris.feature_names)
print(clf.feature_importances_)

Accuracy： 0.9736842105263158

----------------------------------------

['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
[0.06635509 0.01112639 0.4112203  0.51129822]
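The two printed lists are easier to read when each feature name is paired with its importance. A minimal sketch, assuming the same split and hyperparameters as the cell above plus a fixed `random_state` (an addition, so the numbers will not exactly match the output shown); the `ranked` name is illustrative.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4)

# random_state=0 is an added assumption so the ranking is reproducible.
clf = RandomForestClassifier(n_estimators=10, criterion='entropy',
                             max_depth=5, min_samples_leaf=2, random_state=0)
clf.fit(x_train, y_train)

# Pair each feature name with its importance and sort from most to least important.
ranked = sorted(zip(iris.feature_names, clf.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked:
    print(f'{name:<20} {importance:.4f}')
```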


### **After adjusting the parameters, the accuracy score did not change**
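One way to check this observation more systematically is to sweep a few hyperparameter values on the same split. The sketch below is only an illustration: the grid values and the `results` dictionary are hypothetical, and `random_state=0` is added so that any difference comes from the hyperparameters rather than from the random seed. With only 38 test samples, each misclassification moves the accuracy by about 0.026, which may be why small adjustments show no visible effect.

```python
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4)

# Hypothetical grid, chosen only for illustration.
results = {}
for n_estimators in (1, 10, 100):
    for max_depth in (1, 3, 5):
        clf = RandomForestClassifier(
            n_estimators=n_estimators,
            criterion='entropy',
            max_depth=max_depth,
            min_samples_leaf=2,
            random_state=0,  # fixed seed: isolate the effect of the hyperparameters
        )
        clf.fit(x_train, y_train)
        acc = metrics.accuracy_score(y_test, clf.predict(x_test))
        results[(n_estimators, max_depth)] = acc
        print(f'n_estimators={n_estimators:>3}, max_depth={max_depth}: accuracy={acc:.4f}')
```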

In [6]:
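# Load the wine dataset
wine = datasets.load_wine()

# Split into training and test sets
x1_train, x1_test, y1_train, y1_test = train_test_split(wine.data, wine.target, test_size=0.25, random_state=4)

# Build the random forest regression model
# Note: criterion='mse' was renamed to 'squared_error' in scikit-learn 1.0, and the old name was removed in 1.2
regr = RandomForestRegressor(n_estimators=10, criterion='squared_error', max_depth=5, min_samples_leaf=2)

# Train the model
regr.fit(x1_train, y1_train)

# Predict on the test set
y1_pred = regr.predict(x1_test)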



In [7]:
from sklearn.metrics import mean_squared_error

# Use MSE to measure the gap between predicted and actual values
print('Mean Square Error：%.2f' %mean_squared_error(y1_test, y1_pred))

print('\n' + '-'*40 +'\n')

# Get the importance of each feature column
print(wine.feature_names)
print(regr.feature_importances_)

Mean Square Error：0.04

----------------------------------------

['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']
[0.05853428 0.00039127 0.0008126  0.00053688 0.01463738 0.
 0.31072241 0.         0.         0.11344023 0.00184335 0.26318097
 0.23590064]


In [8]:
# Load the wine dataset
wine = datasets.load_wine()

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.25, random_state=4)

# Build the random forest classification model
clf = RandomForestClassifier(n_estimators=10, criterion='entropy', max_depth=5, min_samples_leaf=2)

# Train the model
clf.fit(x_train, y_train)

# Predict on the test set
y_pred = clf.predict(x_test)

# Compute the accuracy score
acc = metrics.accuracy_score(y_test, y_pred)
print('Accuracy：', acc)

print('\n' + '-'*40 +'\n')

# Get the importance of each feature column
print(wine.feature_names)
print(clf.feature_importances_)

print('\n' + '-'*40 +'\n')

from sklearn.metrics import mean_squared_error
# Use MSE to measure the gap between predicted and actual values
print('Mean Square Error：%.2f' %mean_squared_error(y_test, y_pred))


Accuracy： 0.9777777777777777

----------------------------------------

['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']
[0.11391036 0.05946835 0.01234337 0.04033198 0.0202268  0.06184782
 0.13412136 0.00121739 0.02936317 0.13319502 0.02794775 0.11721985
 0.24880676]

----------------------------------------

Mean Square Error：0.02


In [9]:
from sklearn.tree import DecisionTreeClassifier

# Load the built-in wine dataset
wine = datasets.load_wine()

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.25, random_state=4)

# Build the classification model
clf = DecisionTreeClassifier(criterion='gini', random_state=0)

# Train the model on the training set
clf.fit(x_train, y_train)

# Predict on the test set
y_pred = clf.predict(x_test)

# Show the feature names
print(wine.feature_names)
print('\n' + '-'*40 +'\n')
# Feature importances; the larger the value, the more important the feature
print('Feature importance：', clf.feature_importances_)
print('\n' + '-'*40 +'\n')
# Compute the accuracy
acc = metrics.accuracy_score(y_test, y_pred)
print('Accuracy：', acc)
print('\n' + '-'*40 +'\n')
# Use MSE to measure the gap between predicted and actual values
print('Mean Square Error：%.2f' %mean_squared_error(y_test, y_pred))


['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']

----------------------------------------

Feature importance： [0.01364138 0.         0.         0.         0.         0.06142526
 0.08158611 0.         0.         0.41184168 0.         0.04285558
 0.38865   ]

----------------------------------------

Accuracy： 0.9111111111111111

----------------------------------------

Mean Square Error：0.22


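### **The decision tree (accuracy 0.911) performs somewhat worse than the random forest (accuracy 0.978). For the random forest, the regression model's mean squared error (0.04) is larger than the classification model's (0.02).**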


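### **Taking the wine dataset as the example, the random forest classification model performs best. By the mean squared error reported above, the random forest regression model (0.04) comes next and the decision tree classification model (0.22) comes last.**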
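Because accuracy is reported only for the classifiers, the three models are not directly comparable on that metric. The sketch below is one possible way to put them side by side: it assumes the regressor's output can be rounded to the nearest class label, which treats the three wine cultivars as ordered numbers and is a simplification. The `models` and `scores` names and the fixed `random_state=0` are illustrative additions, so the numbers will not exactly match the outputs above.

```python
import numpy as np
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, random_state=4)

models = {
    'random forest classifier': RandomForestClassifier(
        n_estimators=10, criterion='entropy', max_depth=5,
        min_samples_leaf=2, random_state=0),
    'decision tree classifier': DecisionTreeClassifier(
        criterion='gini', random_state=0),
    'random forest regressor': RandomForestRegressor(
        n_estimators=10, max_depth=5, min_samples_leaf=2, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    raw_pred = model.predict(x_test)
    # Assumption: round regression output to the nearest label.
    # For the classifiers this rounding leaves the predictions unchanged.
    labels = np.rint(raw_pred).astype(int)
    scores[name] = (
        metrics.accuracy_score(y_test, labels),
        metrics.mean_squared_error(y_test, raw_pred),
    )
    print(f'{name:<26} accuracy={scores[name][0]:.4f}  MSE={scores[name][1]:.4f}')
```

With only 45 test samples, a single split gives a noisy ranking; cross-validation would be a more reliable basis for comparing the models.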