# Ch 3. Supervised Learning: Classification
## 3-2. Evaluating Classifier Performance
 - [OvR vs OvO](#sec1)  
 - [Reliability of Classification Results](#sec2)
***

<a id='sec1'></a>
## OvR vs OvO

In [1]:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
X_train[:2, :]

array([[5.9, 3. , 4.2, 1.5],
       [5.8, 2.6, 4. , 1.2]])

In [2]:
from sklearn.linear_model import LogisticRegression

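# Build a logistic regression model. OvR was the default strategy in scikit-learn < 0.22;
# newer releases default to the multinomial formulation for multiclass data.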
logit = LogisticRegression()
logit.fit(X_train, y_train)
logit.score(X_test, y_test)

0.9473684210526315

In [3]:
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
logit = LogisticRegression()

# The OvR (one-vs-rest) strategy extends a binary classifier to multiclass problems; pass the base estimator at construction
ovr = OneVsRestClassifier(logit)
ovr.fit(X_train, y_train)
print('OvR:', ovr.score(X_test, y_test))

# The OvO (one-vs-one) strategy extends a binary classifier to multiclass problems; pass the base estimator at construction
ovo = OneVsOneClassifier(logit, n_jobs=-1)
ovo.fit(X_train, y_train)
print('OvO:', ovo.score(X_test, y_test))

OvR: 0.9473684210526315
OvO: 0.9736842105263158
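
The two wrappers differ in how many binary problems they create: OvR fits one classifier per class, while OvO fits one per pair of classes, i.e. K(K-1)/2 for K classes. With the three iris classes both counts happen to be 3, so the sketch below uses a synthetic five-class dataset (an assumption for illustration, not part of the original notebook) to make the difference visible. `max_iter=1000` is likewise an added setting to help the solver converge.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Synthetic 5-class problem (illustrative only; not the iris data used above).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_redundant=0,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

base = LogisticRegression(max_iter=1000)
ovr_demo = OneVsRestClassifier(base).fit(X_demo, y_demo)
ovo_demo = OneVsOneClassifier(base).fit(X_demo, y_demo)

n_ovr = len(ovr_demo.estimators_)  # one binary model per class
n_ovo = len(ovo_demo.estimators_)  # one binary model per pair of classes
print('OvR estimators:', n_ovr)
print('OvO estimators:', n_ovo)
```

OvO trains more models, but each one only sees the samples of its two classes, which is why it can still be practical for base estimators that scale poorly with sample count.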


In [4]:
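# Logistic regression using the multinomial formulation for multiclass problems
# (note: the `multi_class` argument is deprecated as of scikit-learn 1.5)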
lr2 = LogisticRegression(multi_class="multinomial", 
                         solver="newton-cg")
lr2.fit(X_train, y_train)
lr2.score(X_test, y_test)

0.9736842105263158
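
Under the multinomial formulation, a single model scores all classes jointly and the class probabilities are the softmax of those scores. The following sketch checks this numerically. It assumes scikit-learn 0.22 or later, where `LogisticRegression()` with the default `lbfgs` solver already fits the multinomial model on multiclass data; `max_iter=1000` is an added setting to help convergence.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Assumption: scikit-learn >= 0.22, so the default is the multinomial model.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

scores = clf.decision_function(X_test)  # shape (n_samples, 3)

# Numerically stable softmax over the class axis.
shifted = scores - scores.max(axis=1, keepdims=True)
softmax = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

matches = np.allclose(softmax, clf.predict_proba(X_test))
print('predict_proba equals softmax(decision_function):', matches)
```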

<a id='sec2'></a>
## Reliability of Classification Results

In [5]:
lr = LogisticRegression(multi_class='ovr')
lr.fit(X_train, y_train)

print('Decision function:\n', 
      lr.decision_function(X_test)[:6, :])

Decision function:
 [[ -7.13694157  -0.92795879   2.38161731]
 [ -4.0117161    0.92198955  -3.19335692]
 [  4.2153921   -3.44990648 -12.61342745]
 [ -9.7981228   -0.16248219   3.92061648]
 [  3.5833002   -1.7514933  -11.81133027]
 [ -9.0242711   -1.54802525   4.64420162]]


In [6]:
print('Predicted Prob.:\n', lr.predict_proba(X_test)[:6, :])

Predicted Prob.:
 [[6.62373131e-04 2.36204755e-01 7.63132872e-01]
 [2.30124494e-02 9.25972506e-01 5.10150451e-02]
 [9.69716326e-01 3.02803998e-02 3.27391545e-06]
 [3.85761023e-05 3.19057501e-01 6.80903922e-01]
 [8.68074646e-01 1.31918734e-01 6.62003188e-06]
 [1.03292902e-04 1.50408837e-01 8.49487870e-01]]
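
With `multi_class='ovr'`, these probabilities are not a softmax of the decision scores. As the scikit-learn documentation describes it, each score is passed through the logistic (sigmoid) function and the results are then normalized to sum to 1 across classes. The sketch below applies that to the first three rows of the decision function printed earlier and recovers the corresponding rows of `predict_proba`.

```python
import numpy as np

# First three rows of lr.decision_function(X_test), copied from the output above.
scores = np.array([
    [-7.13694157, -0.92795879,   2.38161731],
    [-4.0117161,   0.92198955,  -3.19335692],
    [ 4.2153921,  -3.44990648, -12.61342745],
])

# Per-class sigmoid: probability of "this class vs. the rest".
sigmoid = 1.0 / (1.0 + np.exp(-scores))

# Normalize each row so the three values sum to 1.
probs = sigmoid / sigmoid.sum(axis=1, keepdims=True)

print(probs)
print(probs.argmax(axis=1))  # index of the most probable class per row
```

The row-wise argmax gives the class indices 2, 1, 0, which agree with the first three labels returned by `lr.predict`.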


In [7]:
iris = datasets.load_iris()
y_pred = lr.predict(X_test)
print(y_pred[:6])                     # predicted class indices
print(iris.target_names[y_pred][:6])  # corresponding species names

[2 1 0 2 0 2]
['virginica' 'versicolor' 'setosa' 'virginica' 'setosa' 'virginica']
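
One way to use these probabilities as a reliability signal is to treat the largest class probability as a confidence score and flag predictions that fall below a cutoff. The sketch below does this on the same split. The 0.7 threshold and the names `confidence` and `uncertain` are arbitrary choices for illustration, and the model uses scikit-learn's current defaults rather than the `multi_class='ovr'` setting above.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)

# The predicted label is the class with the highest probability.
y_pred = clf.classes_[proba.argmax(axis=1)]
confidence = proba.max(axis=1)

# Hypothetical cutoff: predictions below it are flagged for review.
threshold = 0.7
uncertain = confidence < threshold

print('Agrees with predict():', np.array_equal(y_pred, clf.predict(X_test)))
print('Flagged as uncertain:', int(uncertain.sum()), 'of', len(X_test))
for name, conf in zip(iris.target_names[y_pred][:6], confidence[:6]):
    print(f'{name:<10s} {conf:.3f}')
```

A high maximum probability only reflects the model's own certainty. Whether it tracks actual accuracy depends on how well calibrated the classifier is.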
