# Evaluating Classification Models

## 郭耀仁

## K-fold Cross-Validation

- `from sklearn.model_selection import StratifiedKFold` (the older `sklearn.cross_validation` module was removed in Scikit-Learn 0.20)

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# pipe_lr, X_train and y_train (NumPy arrays) are assumed to be defined earlier
# shuffle=True is required for random_state to take effect
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=87)
scores = []
for k, (train, test) in enumerate(kfold.split(X_train, y_train)):
    pipe_lr.fit(X_train[train], y_train[train])
    score = pipe_lr.score(X_train[test], y_train[test])
    scores.append(score)
    print('Fold: %s, Class dist.: %s, Acc: %.3f' % (k+1, np.bincount(y_train[train]), score))

print('CV accuracy: %.3f +/- %.3f' % (np.mean(scores), np.std(scores)))
```
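The manual loop can usually be shortened with `cross_val_score()`. The following is a minimal sketch, not the original setup: the breast cancer dataset and the contents of `pipe_lr` (a `StandardScaler` followed by `LogisticRegression`) are assumptions made only so the example runs on its own.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed data set and pipeline, used here only for illustration
X, y = load_breast_cancer(return_X_y=True)
pipe_lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Same stratified 10-fold split as in the manual loop
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=87)

# cross_val_score fits a fresh copy of the pipeline on each fold and returns one score per fold
scores = cross_val_score(pipe_lr, X, y, cv=kfold)
print('CV accuracy: %.3f +/- %.3f' % (np.mean(scores), np.std(scores)))
```

Passing the `StratifiedKFold` object as `cv` keeps the split identical to the explicit loop, so the two approaches should report the same per-fold accuracy.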

## Evaluation Metrics for Classification Models

- Accuracy, precision, and recall can all be computed from the confusion matrix

```python
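from sklearn.metrics import confusion_matrix

# model, X_train, y_train, X_test and y_test are assumed to be defined earlier
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Rows are true labels, columns are predicted labels
conf_mat = confusion_matrix(y_true=y_test, y_pred=y_pred)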
print(conf_mat)
```
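To make the link between the confusion matrix and the three metrics concrete, here is a small worked sketch in plain Python. The matrix values are made up for illustration; the layout follows Scikit-Learn's binary convention, `[[TN, FP], [FN, TP]]`.

```python
# Hypothetical 2x2 confusion matrix (values invented for illustration).
# Layout as returned by confusion_matrix for labels 0 and 1:
# rows are true labels, columns are predicted labels.
conf_mat = [[50, 10],   # true 0: TN, FP
            [5, 35]]    # true 1: FN, TP

(tn, fp), (fn, tp) = conf_mat

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted positives that are truly positive
recall = tp / (tp + fn)                     # share of true positives that were found

print('Accuracy: %.3f' % accuracy)    # Accuracy: 0.850
print('Precision: %.3f' % precision)  # Precision: 0.778
print('Recall: %.3f' % recall)        # Recall: 0.875
```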

## Evaluation Metrics for Classification Models (2)

- Alternatively, compute the metrics directly with Scikit-Learn
    - `accuracy_score()` from `sklearn.metrics`
    - `precision_score()` from `sklearn.metrics`
    - `recall_score()` from `sklearn.metrics`

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# y_test and y_pred are assumed to come from the previous step
# precision_score and recall_score default to average='binary', so labels must be binary here
print('Accuracy: %.3f' % accuracy_score(y_true=y_test, y_pred=y_pred))
print('Precision: %.3f' % precision_score(y_true=y_test, y_pred=y_pred))
print('Recall: %.3f' % recall_score(y_true=y_test, y_pred=y_pred))
```
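A self-contained check of the three functions on toy labels may help. The label vectors below are invented for illustration; they contain 3 true positives, 1 false negative, 1 false positive and 5 true negatives, so the expected values can be verified by hand.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Toy binary labels (invented): TP=3, FN=1, FP=1, TN=5
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

acc = accuracy_score(y_true=y_true, y_pred=y_pred)    # (3 + 5) / 10
pre = precision_score(y_true=y_true, y_pred=y_pred)   # 3 / (3 + 1)
rec = recall_score(y_true=y_true, y_pred=y_pred)      # 3 / (3 + 1)

print('Accuracy: %.3f' % acc)   # Accuracy: 0.800
print('Precision: %.3f' % pre)  # Precision: 0.750
print('Recall: %.3f' % rec)     # Recall: 0.750
```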

## Evaluation Metrics for Binary Classification Models

- `roc_auc_score()` from `sklearn.metrics`

```python
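from sklearn.metrics import roc_auc_score

# ROC AUC is computed from scores, so pass the positive-class probability
# rather than hard labels (assumes model implements predict_proba)
y_score = model.predict_proba(X_test)[:, 1]
print('ROC AUC: %.3f' % roc_auc_score(y_true=y_test, y_score=y_score))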
```
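`roc_auc_score()` ranks samples by their score, so continuous scores and hard 0/1 predictions can give different results. The sketch below uses invented labels and scores to show the difference; the 0.5 threshold is an assumption chosen for illustration.

```python
from sklearn.metrics import roc_auc_score

# Toy labels and predicted positive-class scores (invented)
y_true = [0, 0, 1, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.7, 0.6]

# Hard predictions obtained with an assumed 0.5 threshold
y_label = [1 if s >= 0.5 else 0 for s in y_score]

# With scores: 7 of the 9 (positive, negative) pairs are ranked correctly
auc_from_scores = roc_auc_score(y_true=y_true, y_score=y_score)
# With hard labels the ROC curve has a single operating point
auc_from_labels = roc_auc_score(y_true=y_true, y_score=y_label)

print('ROC AUC from scores: %.3f' % auc_from_scores)  # ROC AUC from scores: 0.778
print('ROC AUC from labels: %.3f' % auc_from_labels)  # ROC AUC from labels: 0.667
```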

## Evaluation Metrics for Multiclass Classification Models

- `make_scorer()` from `sklearn.metrics`

```python
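from sklearn.metrics import make_scorer, precision_score

# pos_label only applies when average='binary', so it has no effect with average='micro'
pre_scorer = make_scorer(score_func=precision_score, pos_label=1, greater_is_better=True, average='micro')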
```
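A scorer built with `make_scorer()` can be passed to the `scoring` argument of `cross_val_score()`. The sketch below is one possible usage, assuming the iris dataset and a plain `LogisticRegression` as stand-ins for the real data and model.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score
from sklearn.model_selection import cross_val_score

# Assumed three-class data set and classifier, used only for illustration
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# Micro-averaged precision pools true and false positives over all classes
pre_scorer = make_scorer(score_func=precision_score, greater_is_better=True, average='micro')

# One micro-averaged precision value per fold
pre_scores = cross_val_score(clf, X, y, cv=5, scoring=pre_scorer)
# Plain accuracy on the same folds, for comparison
acc_scores = cross_val_score(clf, X, y, cv=5, scoring='accuracy')

print('Micro precision per fold:', pre_scores)
print('Accuracy per fold:', acc_scores)
```

For single-label multiclass problems, micro-averaged precision equals accuracy, so the two arrays should match. Use `average='macro'` or `average='weighted'` when per-class performance should be reflected in the score.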