### Task: Mushroom Classification
### Download mushrooms.csv from Kaggle
### Column Meanings

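| Variable | Meaning | Variable type |
|:------:|:----:|:----------:|
|cap-shape|Cap shape|Nominal / categorical|
|cap-surface|Cap surface|Nominal / categorical|
|cap-color|Cap color|Nominal / categorical|
|bruises|Bruises (present or not)|Nominal / binary|
|odor|Odor|Nominal / categorical|
|gill-attachment|How the gills attach to the stalk|Nominal / categorical|
|gill-spacing|Gill spacing (density)|Ordinal?|
|gill-size|Gill size|Ordinal? / binary|
|gill-color|Gill color|Nominal / categorical|
|stalk-shape|Stalk shape|Nominal / binary|
|stalk-root|Stalk root|Nominal / categorical|
|stalk-surface-above-ring|Stalk surface above the ring|Nominal / categorical|
|stalk-surface-below-ring|Stalk surface below the ring|Nominal / categorical|
|stalk-color-above-ring|Stalk color above the ring|Nominal / categorical|
|stalk-color-below-ring|Stalk color below the ring|Nominal / categorical|
|veil-type|Veil type|Nominal / binary|
|veil-color|Veil color|Nominal / categorical|
|ring-number|Number of rings|Ordinal?|
|ring-type|Ring type|Nominal / categorical|
|spore-print-color|Spore print color|Nominal / categorical|
|population|Population (how many grow together)|Ordinal?|
|habitat|Habitat|Nominal / categorical|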
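For the columns marked "Ordinal?", one alternative to dummy variables is to map the codes to ordered integers. A minimal sketch for `ring-number`, assuming the UCI codebook meanings (`n` = none, `o` = one, `t` = two) and that the ring count is meaningfully ordered; `RING_NUMBER_ORDER` and `encode_ring_number` are hypothetical names:

```python
import pandas as pd

# Assumption: UCI codes for ring-number are n = none, o = one, t = two,
# so the natural order is simply the number of rings.
RING_NUMBER_ORDER = {"n": 0, "o": 1, "t": 2}

def encode_ring_number(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with ring-number replaced by an ordered integer (0, 1, 2)."""
    out = df.copy()
    # Codes not in the mapping would become NaN
    out["ring-number"] = out["ring-number"].map(RING_NUMBER_ORDER)
    return out

sample = pd.DataFrame({"ring-number": ["o", "t", "n", "o"]})
encoded = encode_ring_number(sample)
print(encoded["ring-number"].tolist())  # [1, 2, 0, 1]
```

Whether this helps depends on the model: logistic regression would then assume equal steps between 0, 1 and 2 rings, whereas dummy variables make no such assumption.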

In [271]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

In [272]:
def replace_category_param_exclude_col(df_data, remove_columns):
    """
    Drop the columns in remove_columns and replace the remaining categorical variables with dummy variables.
    """
    # Build the list of columns to convert: every column except those in remove_columns
    col_list = df_data.columns.values.tolist()

    col_list_for_dummy = col_list[:]
    for remove_e in remove_columns:
        col_list_for_dummy.remove(remove_e)

    # Replace every categorical variable with dummy variables and join the result with class.
    # drop_first=True: the first category of each variable gets no dummy column,
    # to avoid multicollinearity.
    df_data_tmp = pd.concat([df_data['class'], pd.get_dummies(df_data[col_list_for_dummy], drop_first=True)], axis=1)
#    display(df_data_tmp)
    return df_data_tmp
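The `drop_first` reasoning can be checked on a tiny example: with every level kept, the dummy columns of one variable always sum to 1, which duplicates the model's intercept. A sketch on made-up values:

```python
import pandas as pd

# Toy data with one 3-level categorical column (values are illustrative only)
toy = pd.DataFrame({"class": [0, 1, 0, 1], "odor": ["a", "n", "p", "n"]})

full = pd.get_dummies(toy[["odor"]])                      # odor_a, odor_n, odor_p
reduced = pd.get_dummies(toy[["odor"]], drop_first=True)  # odor_n, odor_p

# With every level kept, each row has exactly one 1, so the columns are
# perfectly collinear with a constant (intercept) column.
print(full.astype(int).sum(axis=1).tolist())  # [1, 1, 1, 1]
print(list(reduced.columns))                  # ['odor_n', 'odor_p']
```

Dropping one level (here `a`, the first in sorted order) removes that redundancy; the dropped level becomes the baseline that the remaining coefficients are compared against.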

In [273]:
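def replace_category_param_include_col(df_data, include_columns):
    """
    Keep only the columns in include_columns and replace those categorical variables with dummy variables.
    """
    # Replace the selected categorical variables with dummy variables and join the result with class.
    # drop_first=True: the first category of each variable gets no dummy column,
    # to avoid multicollinearity.
    df_data_tmp = pd.concat([df_data['class'], pd.get_dummies(df_data[include_columns], drop_first=True)], axis=1)
#    display(df_data_tmp)
    return df_data_tmp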

In [274]:
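def display_score(y_test, y_pred):
    """
    Print evaluation metrics for the predictions, treating "e" (label 0) as the positive class.
    """
    # Build the confusion matrix (rows: true labels, columns: predicted labels, order 0 = e, 1 = p)
    conf_matrix = confusion_matrix(y_test, y_pred)

    # Total number of samples
    data_sum = np.sum(conf_matrix)

    # TP/TN/FP/FN
    tp = conf_matrix[0][0]
    fn = conf_matrix[0][1]
    fp = conf_matrix[1][0]
    tn = conf_matrix[1][1]

    # Header: "Metrics computed from the confusion matrix"
    print("##### 混同行列から算出した評価 #####")

    # Accuracy: share of samples whose "e" / "p" label was predicted correctly
    accuracy = (tp + tn) / data_sum
    print('accuracy:', accuracy)

    # Precision(e): of the samples predicted as "e", the share that really are "e"
    precision_e = tp / (tp + fp)
    print('precision(e):', precision_e)

    # Precision(p): of the samples predicted as "p", the share that really are "p"
    precision_p = tn / (tn + fn)
    print('precision(p):', precision_p)

    # Recall(e): of the samples that really are "e", the share predicted as "e"
    recall_e = tp / (tp + fn)
    print('recall(e):', recall_e)

    # Recall(p): of the samples that really are "p", the share predicted as "p"
    recall_p = tn / (fp + tn)
    print('recall(p):', recall_p)

    # Header: "Metrics computed with the sklearn library"
    print("\n##### sklearn ライブラリを用いて算出した評価 #####")
    # sklearn.metrics.accuracy_score computes the accuracy
    print('accuracy_score:', accuracy_score(y_test, y_pred))

    # sklearn.metrics.classification_report computes precision / recall / F1 score per class
    print(classification_report(y_test, y_pred))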
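The TP/FN/FP/TN indexing above relies on sklearn's layout: true labels on rows, predictions on columns, labels sorted so that 0 (`e`) comes first. A small sketch that cross-checks hand-computed values against `precision_score` / `recall_score` with an explicit `pos_label`; `manual_metrics` is a hypothetical helper and the labels are toy values:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

def manual_metrics(y_true, y_pred):
    """Per-class precision/recall from a 2x2 confusion matrix, treating 0 ("e") as positive."""
    # ravel() of [[cm00, cm01], [cm10, cm11]] gives tp, fn, fp, tn when 0 is the positive label
    tp, fn, fp, tn = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "precision_e": tp / (tp + fp),
        "recall_e": tp / (tp + fn),
        "precision_p": tn / (tn + fn),
        "recall_p": tn / (tn + fp),
    }

y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 0]
m = manual_metrics(y_true, y_pred)

# The same numbers via sklearn, choosing which label counts as "positive"
assert abs(m["precision_e"] - precision_score(y_true, y_pred, pos_label=0)) < 1e-12
assert abs(m["recall_p"] - recall_score(y_true, y_pred, pos_label=1)) < 1e-12
print(m)
```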

In [275]:
# Load mushrooms.csv downloaded from Kaggle
input_file = "./mushrooms.csv"
df_data = pd.read_csv(input_file)

In [276]:
# Maximum number of columns to display
pd.set_option('display.max_columns', 100)

# Check the columns of the loaded CSV file
display(df_data.columns)

# Inspect the first and last rows
display(df_data.head())
display(df_data.tail())

# Check summary statistics
display(df_data.describe(include='all'))

Index(['class', 'cap-shape', 'cap-surface', 'cap-color', 'bruises', 'odor',
       'gill-attachment', 'gill-spacing', 'gill-size', 'gill-color',
       'stalk-shape', 'stalk-root', 'stalk-surface-above-ring',
       'stalk-surface-below-ring', 'stalk-color-above-ring',
       'stalk-color-below-ring', 'veil-type', 'veil-color', 'ring-number',
       'ring-type', 'spore-print-color', 'population', 'habitat'],
      dtype='object')

,class,cap-shape,cap-surface,cap-color,bruises,odor,gill-attachment,gill-spacing,gill-size,gill-color,stalk-shape,stalk-root,stalk-surface-above-ring,stalk-surface-below-ring,stalk-color-above-ring,stalk-color-below-ring,veil-type,veil-color,ring-number,ring-type,spore-print-color,population,habitat
0,p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u
1,e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g
2,e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m
3,p,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u
4,e,x,s,g,f,n,f,w,b,k,t,e,s,s,w,w,p,w,o,e,n,a,g


,class,cap-shape,cap-surface,cap-color,bruises,odor,gill-attachment,gill-spacing,gill-size,gill-color,stalk-shape,stalk-root,stalk-surface-above-ring,stalk-surface-below-ring,stalk-color-above-ring,stalk-color-below-ring,veil-type,veil-color,ring-number,ring-type,spore-print-color,population,habitat
8119,e,k,s,n,f,n,a,c,b,y,e,?,s,s,o,o,p,o,o,p,b,c,l
8120,e,x,s,n,f,n,a,c,b,y,e,?,s,s,o,o,p,n,o,p,b,v,l
8121,e,f,s,n,f,n,a,c,b,n,e,?,s,s,o,o,p,o,o,p,b,c,l
8122,p,k,y,n,f,y,f,c,n,b,t,?,s,k,w,w,p,w,o,e,w,v,l
8123,e,x,s,n,f,n,a,c,b,y,e,?,s,s,o,o,p,o,o,p,o,c,l


,class,cap-shape,cap-surface,cap-color,bruises,odor,gill-attachment,gill-spacing,gill-size,gill-color,stalk-shape,stalk-root,stalk-surface-above-ring,stalk-surface-below-ring,stalk-color-above-ring,stalk-color-below-ring,veil-type,veil-color,ring-number,ring-type,spore-print-color,population,habitat
count,8124,8124,8124,8124,8124,8124,8124,8124,8124,8124,8124,8124,8124,8124,8124,8124,8124,8124,8124,8124,8124,8124,8124
unique,2,6,4,10,2,9,2,2,2,12,2,5,4,4,9,9,1,4,3,5,9,6,7
top,e,x,y,n,f,n,f,c,b,b,t,b,s,s,w,w,p,w,o,p,w,v,d
freq,4208,3656,3244,2284,4748,3528,7914,6812,5612,1728,4608,3776,5176,4936,4464,4384,8124,7924,7488,3968,2388,4040,3148


In [277]:
# Cross-tabulate class against each column (counts, then proportions per column value)
for c in df_data.columns:
    if c != 'class':
        display(pd.crosstab(df_data['class'], df_data[c], margins=True))
        display(pd.crosstab(df_data['class'], df_data[c], normalize='columns'))

class \ cap-shape,b,c,f,k,s,x,All
e,404,0,1596,228,32,1948,4208
p,48,4,1556,600,0,1708,3916
All,452,4,3152,828,32,3656,8124


class \ cap-shape,b,c,f,k,s,x
e,0.893805,0.0,0.506345,0.275362,1.0,0.532823
p,0.106195,1.0,0.493655,0.724638,0.0,0.467177


class \ cap-surface,f,g,s,y,All
e,1560,0,1144,1504,4208
p,760,4,1412,1740,3916
All,2320,4,2556,3244,8124


class \ cap-surface,f,g,s,y
e,0.672414,0.0,0.447574,0.463625
p,0.327586,1.0,0.552426,0.536375


class \ cap-color,b,c,e,g,n,p,r,u,w,y,All
e,48,32,624,1032,1264,56,16,16,720,400,4208
p,120,12,876,808,1020,88,0,0,320,672,3916
All,168,44,1500,1840,2284,144,16,16,1040,1072,8124


class \ cap-color,b,c,e,g,n,p,r,u,w,y
e,0.285714,0.727273,0.416,0.56087,0.553415,0.388889,1.0,1.0,0.692308,0.373134
p,0.714286,0.272727,0.584,0.43913,0.446585,0.611111,0.0,0.0,0.307692,0.626866


class \ bruises,f,t,All
e,1456,2752,4208
p,3292,624,3916
All,4748,3376,8124


class \ bruises,f,t
e,0.306655,0.815166
p,0.693345,0.184834


class \ odor,a,c,f,l,m,n,p,s,y,All
e,400,0,0,400,0,3408,0,0,0,4208
p,0,192,2160,0,36,120,256,576,576,3916
All,400,192,2160,400,36,3528,256,576,576,8124


class \ odor,a,c,f,l,m,n,p,s,y
e,1.0,0.0,0.0,1.0,0.0,0.965986,0.0,0.0,0.0
p,0.0,1.0,1.0,0.0,1.0,0.034014,1.0,1.0,1.0


class \ gill-attachment,a,f,All
e,192,4016,4208
p,18,3898,3916
All,210,7914,8124


class \ gill-attachment,a,f
e,0.914286,0.507455
p,0.085714,0.492545


class \ gill-spacing,c,w,All
e,3008,1200,4208
p,3804,112,3916
All,6812,1312,8124


class \ gill-spacing,c,w
e,0.441574,0.914634
p,0.558426,0.085366


class \ gill-size,b,n,All
e,3920,288,4208
p,1692,2224,3916
All,5612,2512,8124


class \ gill-size,b,n
e,0.698503,0.11465
p,0.301497,0.88535


class \ gill-color,b,e,g,h,k,n,o,p,r,u,w,y,All
e,0,96,248,204,344,936,64,852,0,444,956,64,4208
p,1728,0,504,528,64,112,0,640,24,48,246,22,3916
All,1728,96,752,732,408,1048,64,1492,24,492,1202,86,8124


class \ gill-color,b,e,g,h,k,n,o,p,r,u,w,y
e,0.0,1.0,0.329787,0.278689,0.843137,0.89313,1.0,0.571046,0.0,0.902439,0.795341,0.744186
p,1.0,0.0,0.670213,0.721311,0.156863,0.10687,0.0,0.428954,1.0,0.097561,0.204659,0.255814


class \ stalk-shape,e,t,All
e,1616,2592,4208
p,1900,2016,3916
All,3516,4608,8124


class \ stalk-shape,e,t
e,0.459613,0.5625
p,0.540387,0.4375


class \ stalk-root,?,b,c,e,r,All
e,720,1920,512,864,192,4208
p,1760,1856,44,256,0,3916
All,2480,3776,556,1120,192,8124


class \ stalk-root,?,b,c,e,r
e,0.290323,0.508475,0.920863,0.771429,1.0
p,0.709677,0.491525,0.079137,0.228571,0.0


class \ stalk-surface-above-ring,f,k,s,y,All
e,408,144,3640,16,4208
p,144,2228,1536,8,3916
All,552,2372,5176,24,8124


class \ stalk-surface-above-ring,f,k,s,y
e,0.73913,0.060708,0.703246,0.666667
p,0.26087,0.939292,0.296754,0.333333


class \ stalk-surface-below-ring,f,k,s,y,All
e,456,144,3400,208,4208
p,144,2160,1536,76,3916
All,600,2304,4936,284,8124


class \ stalk-surface-below-ring,f,k,s,y
e,0.76,0.0625,0.688817,0.732394
p,0.24,0.9375,0.311183,0.267606


class \ stalk-color-above-ring,b,c,e,g,n,o,p,w,y,All
e,0,0,96,576,16,192,576,2752,0,4208
p,432,36,0,0,432,0,1296,1712,8,3916
All,432,36,96,576,448,192,1872,4464,8,8124


class \ stalk-color-above-ring,b,c,e,g,n,o,p,w,y
e,0.0,0.0,1.0,1.0,0.035714,1.0,0.307692,0.616487,0.0
p,1.0,1.0,0.0,0.0,0.964286,0.0,0.692308,0.383513,1.0


class \ stalk-color-below-ring,b,c,e,g,n,o,p,w,y,All
e,0,0,96,576,64,192,576,2704,0,4208
p,432,36,0,0,448,0,1296,1680,24,3916
All,432,36,96,576,512,192,1872,4384,24,8124


class \ stalk-color-below-ring,b,c,e,g,n,o,p,w,y
e,0.0,0.0,1.0,1.0,0.125,1.0,0.307692,0.616788,0.0
p,1.0,1.0,0.0,0.0,0.875,0.0,0.692308,0.383212,1.0


class \ veil-type,p,All
e,4208,4208
p,3916,3916
All,8124,8124


class \ veil-type,p
e,0.517971
p,0.482029


class \ veil-color,n,o,w,y,All
e,96,96,4016,0,4208
p,0,0,3908,8,3916
All,96,96,7924,8,8124


class \ veil-color,n,o,w,y
e,1.0,1.0,0.506815,0.0
p,0.0,0.0,0.493185,1.0


class \ ring-number,n,o,t,All
e,0,3680,528,4208
p,36,3808,72,3916
All,36,7488,600,8124


class \ ring-number,n,o,t
e,0.0,0.491453,0.88
p,1.0,0.508547,0.12


class \ ring-type,e,f,l,n,p,All
e,1008,48,0,0,3152,4208
p,1768,0,1296,36,816,3916
All,2776,48,1296,36,3968,8124


class \ ring-type,e,f,l,n,p
e,0.363112,1.0,0.0,0.0,0.794355
p,0.636888,0.0,1.0,1.0,0.205645


class \ spore-print-color,b,h,k,n,o,r,u,w,y,All
e,48,48,1648,1744,48,0,48,576,48,4208
p,0,1584,224,224,0,72,0,1812,0,3916
All,48,1632,1872,1968,48,72,48,2388,48,8124


class \ spore-print-color,b,h,k,n,o,r,u,w,y
e,1.0,0.029412,0.880342,0.886179,1.0,0.0,1.0,0.241206,1.0
p,0.0,0.970588,0.119658,0.113821,0.0,1.0,0.0,0.758794,0.0


class \ population,a,c,n,s,v,y,All
e,384,288,400,880,1192,1064,4208
p,0,52,0,368,2848,648,3916
All,384,340,400,1248,4040,1712,8124


class \ population,a,c,n,s,v,y
e,1.0,0.847059,1.0,0.705128,0.29505,0.621495
p,0.0,0.152941,0.0,0.294872,0.70495,0.378505


class \ habitat,d,g,l,m,p,u,w,All
e,1880,1408,240,256,136,96,192,4208
p,1268,740,592,36,1008,272,0,3916
All,3148,2148,832,292,1144,368,192,8124


class \ habitat,d,g,l,m,p,u,w
e,0.597205,0.655493,0.288462,0.876712,0.118881,0.26087,1.0
p,0.402795,0.344507,0.711538,0.123288,0.881119,0.73913,0.0
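Counts and proportions can also be shown in a single table by concatenating the two crosstabs with `pd.concat(..., keys=...)`. A sketch on toy data; `crosstab_with_ratio` is a hypothetical helper and rounding to 3 decimals is an arbitrary choice:

```python
import pandas as pd

def crosstab_with_ratio(df, target, col):
    """Counts and per-column proportions of `target` vs `col` side by side in one table."""
    counts = pd.crosstab(df[target], df[col])
    ratios = pd.crosstab(df[target], df[col], normalize="columns").round(3)
    # keys= adds an outer column level ("count" / "ratio")
    combined = pd.concat([counts, ratios], axis=1, keys=["count", "ratio"])
    # Swap levels and sort so each category value shows its count next to its ratio
    return combined.swaplevel(axis=1).sort_index(axis=1)

toy = pd.DataFrame({"class": ["e", "p", "e", "p", "e"],
                    "bruises": ["t", "f", "t", "t", "f"]})
result = crosstab_with_ratio(toy, "class", "bruises")
print(result)
```

In the notebook this would be called as `crosstab_with_ratio(df_data, 'class', c)` inside the existing loop.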


In [278]:
# veil-type has only one value ("p"), so it is useless as an explanatory variable
# Drop veil-type
df_data.drop(['veil-type'], axis=1, inplace=True)

# Use the remaining columns as they are for now.
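Instead of spotting single-valued columns by eye in `describe()` (`unique` = 1 for `veil-type`), they can be found programmatically. A small sketch; `constant_columns` is a hypothetical helper:

```python
import pandas as pd

def constant_columns(df: pd.DataFrame) -> list:
    """Columns with at most one distinct value (NaN counted as a value); they cannot help a classifier."""
    return [c for c in df.columns if df[c].nunique(dropna=False) <= 1]

toy = pd.DataFrame({"class": ["e", "p", "e"],
                    "veil-type": ["p", "p", "p"],
                    "odor": ["a", "f", "n"]})
print(constant_columns(toy))  # ['veil-type']
cleaned = toy.drop(columns=constant_columns(toy))
```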

In [279]:
# Check for missing values
df_data.isnull().sum()

class                       0
cap-shape                   0
cap-surface                 0
cap-color                   0
bruises                     0
odor                        0
gill-attachment             0
gill-spacing                0
gill-size                   0
gill-color                  0
stalk-shape                 0
stalk-root                  0
stalk-surface-above-ring    0
stalk-surface-below-ring    0
stalk-color-above-ring      0
stalk-color-below-ring      0
veil-color                  0
ring-number                 0
ring-type                   0
spore-print-color           0
population                  0
habitat                     0
dtype: int64
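`isnull()` reports no missing values, but the `tail()` output shows `stalk-root` = `?`, which in this dataset appears to mark an unknown value (the crosstab counts 2480 such rows). A sketch of counting it, assuming `?` is the only missing-value marker:

```python
import pandas as pd

toy = pd.DataFrame({"class": ["e", "p", "p"],
                    "stalk-root": ["b", "?", "?"]})

# Count "?" per column directly...
print((toy == "?").sum())

# ...or turn it into NaN so that isnull() sees it.
# When loading the file, pd.read_csv(path, na_values="?") has the same effect.
toy_nan = toy.mask(toy == "?")
print(toy_nan.isnull().sum())
```

Note that in the dummy encoding of In [281] the `?` level happens to be the dropped baseline (`?` sorts before the letters), so only `stalk-root_b`, `_c`, `_e` and `_r` appear as columns.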

In [280]:
# Convert the target variable class to numbers
class_mapping = {label: idx for idx, label in enumerate(np.unique(df_data['class']))}
print("Label mapping:", class_mapping)
df_data['class'] = df_data['class'].map(class_mapping)
display(df_data)

Label mapping: {'e': 0, 'p': 1}


,class,cap-shape,cap-surface,cap-color,bruises,odor,gill-attachment,gill-spacing,gill-size,gill-color,stalk-shape,stalk-root,stalk-surface-above-ring,stalk-surface-below-ring,stalk-color-above-ring,stalk-color-below-ring,veil-color,ring-number,ring-type,spore-print-color,population,habitat
0,1,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,w,o,p,k,s,u
1,0,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,w,o,p,n,n,g
2,0,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,w,o,p,n,n,m
3,1,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,w,o,p,k,s,u
4,0,x,s,g,f,n,f,w,b,k,t,e,s,s,w,w,w,o,e,n,a,g
5,0,x,y,y,t,a,f,c,b,n,e,c,s,s,w,w,w,o,p,k,n,g
6,0,b,s,w,t,a,f,c,b,g,e,c,s,s,w,w,w,o,p,k,n,m
7,0,b,y,w,t,l,f,c,b,n,e,c,s,s,w,w,w,o,p,n,s,m
8,1,x,y,w,t,p,f,c,n,p,e,e,s,s,w,w,w,o,p,k,v,g
9,0,b,s,y,t,a,f,c,b,g,e,c,s,s,w,w,w,o,p,k,s,m


In [281]:
# Replace all categorical variables (every column except class) with dummy variables
remove_columns = ['class']
df_data_tmp = replace_category_param_exclude_col(df_data, remove_columns)
display(df_data_tmp)

,class,cap-shape_c,cap-shape_f,cap-shape_k,cap-shape_s,cap-shape_x,cap-surface_g,cap-surface_s,cap-surface_y,cap-color_c,cap-color_e,cap-color_g,cap-color_n,cap-color_p,cap-color_r,cap-color_u,cap-color_w,cap-color_y,bruises_t,odor_c,odor_f,odor_l,odor_m,odor_n,odor_p,odor_s,odor_y,gill-attachment_f,gill-spacing_w,gill-size_n,gill-color_e,gill-color_g,gill-color_h,gill-color_k,gill-color_n,gill-color_o,gill-color_p,gill-color_r,gill-color_u,gill-color_w,gill-color_y,stalk-shape_t,stalk-root_b,stalk-root_c,stalk-root_e,stalk-root_r,stalk-surface-above-ring_k,stalk-surface-above-ring_s,stalk-surface-above-ring_y,stalk-surface-below-ring_k,stalk-surface-below-ring_s,stalk-surface-below-ring_y,stalk-color-above-ring_c,stalk-color-above-ring_e,stalk-color-above-ring_g,stalk-color-above-ring_n,stalk-color-above-ring_o,stalk-color-above-ring_p,stalk-color-above-ring_w,stalk-color-above-ring_y,stalk-color-below-ring_c,stalk-color-below-ring_e,stalk-color-below-ring_g,stalk-color-below-ring_n,stalk-color-below-ring_o,stalk-color-below-ring_p,stalk-color-below-ring_w,stalk-color-below-ring_y,veil-color_o,veil-color_w,veil-color_y,ring-number_o,ring-number_t,ring-type_f,ring-type_l,ring-type_n,ring-type_p,spore-print-color_h,spore-print-color_k,spore-print-color_n,spore-print-color_o,spore-print-color_r,spore-print-color_u,spore-print-color_w,spore-print-color_y,population_c,population_n,population_s,population_v,population_y,habitat_g,habitat_l,habitat_m,habitat_p,habitat_u,habitat_w
0,1,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0
1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0
2,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0
3,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0
4,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
5,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0
6,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0
7,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0
8,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0
9,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0


In [282]:
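# Split into training and test data (roughly 3:1 here), taking rows in file order
# Q5: What is the best ratio between training and test data?
df_train = df_data_tmp[:6000]
# Note: the slice [6000:-1] stops before the last row (index 8123), so the test set has 2123 rows;
# df_data_tmp[6000:] would include every remaining row.
df_test = df_data_tmp[6000:-1]

# Explanatory variables (features)
# Training
X_train = df_train.copy().drop(['class'], axis=1)
display(X_train)
# Test
X_test = df_test.copy().drop(['class'], axis=1)
display(X_test)

# Target variable
# Training
y_train = df_train['class']
display(y_train)
# Test
y_test = df_test['class']
display(y_test)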

,cap-shape_c,cap-shape_f,cap-shape_k,cap-shape_s,cap-shape_x,cap-surface_g,cap-surface_s,cap-surface_y,cap-color_c,cap-color_e,cap-color_g,cap-color_n,cap-color_p,cap-color_r,cap-color_u,cap-color_w,cap-color_y,bruises_t,odor_c,odor_f,odor_l,odor_m,odor_n,odor_p,odor_s,odor_y,gill-attachment_f,gill-spacing_w,gill-size_n,gill-color_e,gill-color_g,gill-color_h,gill-color_k,gill-color_n,gill-color_o,gill-color_p,gill-color_r,gill-color_u,gill-color_w,gill-color_y,stalk-shape_t,stalk-root_b,stalk-root_c,stalk-root_e,stalk-root_r,stalk-surface-above-ring_k,stalk-surface-above-ring_s,stalk-surface-above-ring_y,stalk-surface-below-ring_k,stalk-surface-below-ring_s,stalk-surface-below-ring_y,stalk-color-above-ring_c,stalk-color-above-ring_e,stalk-color-above-ring_g,stalk-color-above-ring_n,stalk-color-above-ring_o,stalk-color-above-ring_p,stalk-color-above-ring_w,stalk-color-above-ring_y,stalk-color-below-ring_c,stalk-color-below-ring_e,stalk-color-below-ring_g,stalk-color-below-ring_n,stalk-color-below-ring_o,stalk-color-below-ring_p,stalk-color-below-ring_w,stalk-color-below-ring_y,veil-color_o,veil-color_w,veil-color_y,ring-number_o,ring-number_t,ring-type_f,ring-type_l,ring-type_n,ring-type_p,spore-print-color_h,spore-print-color_k,spore-print-color_n,spore-print-color_o,spore-print-color_r,spore-print-color_u,spore-print-color_w,spore-print-color_y,population_c,population_n,population_s,population_v,population_y,habitat_g,habitat_l,habitat_m,habitat_p,habitat_u,habitat_w
0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0
1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0
2,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0
3,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0
4,0,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
5,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0
6,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0
7,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0
8,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0
9,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0


,cap-shape_c,cap-shape_f,cap-shape_k,cap-shape_s,cap-shape_x,cap-surface_g,cap-surface_s,cap-surface_y,cap-color_c,cap-color_e,cap-color_g,cap-color_n,cap-color_p,cap-color_r,cap-color_u,cap-color_w,cap-color_y,bruises_t,odor_c,odor_f,odor_l,odor_m,odor_n,odor_p,odor_s,odor_y,gill-attachment_f,gill-spacing_w,gill-size_n,gill-color_e,gill-color_g,gill-color_h,gill-color_k,gill-color_n,gill-color_o,gill-color_p,gill-color_r,gill-color_u,gill-color_w,gill-color_y,stalk-shape_t,stalk-root_b,stalk-root_c,stalk-root_e,stalk-root_r,stalk-surface-above-ring_k,stalk-surface-above-ring_s,stalk-surface-above-ring_y,stalk-surface-below-ring_k,stalk-surface-below-ring_s,stalk-surface-below-ring_y,stalk-color-above-ring_c,stalk-color-above-ring_e,stalk-color-above-ring_g,stalk-color-above-ring_n,stalk-color-above-ring_o,stalk-color-above-ring_p,stalk-color-above-ring_w,stalk-color-above-ring_y,stalk-color-below-ring_c,stalk-color-below-ring_e,stalk-color-below-ring_g,stalk-color-below-ring_n,stalk-color-below-ring_o,stalk-color-below-ring_p,stalk-color-below-ring_w,stalk-color-below-ring_y,veil-color_o,veil-color_w,veil-color_y,ring-number_o,ring-number_t,ring-type_f,ring-type_l,ring-type_n,ring-type_p,spore-print-color_h,spore-print-color_k,spore-print-color_n,spore-print-color_o,spore-print-color_r,spore-print-color_u,spore-print-color_w,spore-print-color_y,population_c,population_n,population_s,population_v,population_y,habitat_g,habitat_l,habitat_m,habitat_p,habitat_u,habitat_w
6000,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0
6001,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0
6002,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0
6003,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0
6004,0,1,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0
6005,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0
6006,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0
6007,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0
6008,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0
6009,0,1,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0


0       1
1       0
2       0
3       1
4       0
5       0
6       0
7       0
8       1
9       0
10      0
11      0
12      0
13      1
14      0
15      0
16      0
17      1
18      1
19      1
20      0
21      1
22      0
23      0
24      0
25      1
26      0
27      0
28      0
29      0
       ..
5970    0
5971    1
5972    1
5973    1
5974    1
5975    1
5976    0
5977    1
5978    0
5979    1
5980    0
5981    1
5982    1
5983    1
5984    1
5985    1
5986    1
5987    1
5988    1
5989    1
5990    1
5991    1
5992    1
5993    1
5994    1
5995    1
5996    0
5997    0
5998    1
5999    1
Name: class, Length: 6000, dtype: int64

6000    1
6001    1
6002    1
6003    1
6004    1
6005    1
6006    1
6007    1
6008    1
6009    1
6010    1
6011    1
6012    1
6013    1
6014    1
6015    1
6016    1
6017    1
6018    1
6019    1
6020    1
6021    1
6022    1
6023    1
6024    1
6025    1
6026    1
6027    1
6028    1
6029    1
       ..
8093    1
8094    0
8095    1
8096    0
8097    1
8098    1
8099    0
8100    0
8101    1
8102    0
8103    0
8104    0
8105    0
8106    0
8107    0
8108    1
8109    0
8110    0
8111    0
8112    0
8113    1
8114    1
8115    0
8116    1
8117    1
8118    1
8119    0
8120    0
8121    0
8122    1
Name: class, Length: 2123, dtype: int64
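The split above takes rows in file order, so the class balance differs between the sets: the test set has 527 `e` vs 1596 `p`, against 4208 vs 3916 overall. A shuffled, stratified split keeps the class ratio the same in both sets. A sketch with `train_test_split` on toy data; the 25% test size and `random_state=0` are arbitrary choices:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for df_data_tmp: 8 class-0 rows followed by 4 class-1 rows, i.e. sorted like a file
df = pd.DataFrame({"class": [0] * 8 + [1] * 4, "feature": range(12)})

X = df.drop(columns=["class"])
y = df["class"]

# shuffle=True is the default; stratify=y keeps the 2:1 class ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

print(len(X_train), len(X_test))  # 9 3
print(y_test.value_counts())
```

On the notebook's data the call would be `train_test_split(df_data_tmp.drop(columns=['class']), df_data_tmp['class'], test_size=0.25, stratify=df_data_tmp['class'], random_state=0)`. There is no universally optimal ratio; 3:1 or 4:1 are common starting points, and cross-validation makes the estimate less dependent on one particular split.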

In [283]:
# Use logistic regression
lr = LogisticRegression()

# Train the model
lr.fit(X_train, y_train)

# Predict with the trained model
y_pred = lr.predict(X_test)
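To see which dummy columns the fitted model leans on, the coefficients can be paired with the column names; with the real model this would be `pd.Series(lr.coef_[0], index=X_train.columns)`. A sketch on toy data:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy one-hot features: odor_f perfectly marks class 1, cap_x is unrelated to the class
X = pd.DataFrame({"odor_f": [1, 1, 1, 0, 0, 0, 1, 0],
                  "cap_x":  [1, 0, 1, 0, 1, 0, 0, 1]})
y = pd.Series([1, 1, 1, 0, 0, 0, 1, 0])

lr = LogisticRegression().fit(X, y)

# coef_ has shape (1, n_features) for a binary problem; a positive weight pushes toward class 1 ("p")
coefs = pd.Series(lr.coef_[0], index=X.columns).sort_values(key=abs, ascending=False)
print(coefs)
```

Coefficient size is only a rough guide here: regularization and strongly related dummies (for example `stalk-color-above-ring` and `stalk-color-below-ring`) can spread weight across several columns.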

In [284]:
# Check the results
display_score(y_test, y_pred)

##### 混同行列から算出した評価 #####
accuracy: 0.8973151201130476
precision(e): 0.9753846153846154
precision(p): 0.8832035595105673
recall(e): 0.6015180265654649
recall(p): 0.9949874686716792

##### sklearn ライブラリを用いて算出した評価 #####
accuracy_score: 0.8973151201130476
             precision    recall  f1-score   support

          0       0.98      0.60      0.74       527
          1       0.88      0.99      0.94      1596

avg / total       0.91      0.90      0.89      2123
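A single fixed split gives one number; shuffled k-fold cross-validation gives a spread of scores and is less sensitive to how rows happen to be ordered in the file. A sketch with `cross_val_score` and `StratifiedKFold`, where 5 folds and `random_state=0` are arbitrary and random toy data stands in for the features built from `df_data_tmp`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
# Toy binary features: column 0 determines the label, the other columns are noise
X = rng.integers(0, 2, size=(200, 5))
y = X[:, 0]

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(), X, y, cv=cv, scoring="accuracy")
print(scores.mean(), scores.std())
```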



In [285]:
# Trial 2
# Try classifying with odor only
include_columns = ['odor']
df_data_tmp = replace_category_param_include_col(df_data, include_columns)

# Split into training and test data (roughly 3:1 here)
# Q5: What is the best ratio between training and test data?
df_train = df_data_tmp[:6000]
df_test = df_data_tmp[6000:-1]

# Explanatory variables (features)
# Training
X_train = df_train.copy().drop(['class'], axis=1)
# Test
X_test = df_test.copy().drop(['class'], axis=1)

# Target variable
# Training
y_train = df_train['class']
# Test
y_test = df_test['class']

# Use logistic regression
lr = LogisticRegression()

# Train the model
lr.fit(X_train, y_train)

# Predict with the trained model
y_pred = lr.predict(X_test)

# Check the results
display_score(y_test, y_pred)

##### 混同行列から算出した評価 #####
accuracy: 0.9792746113989638
precision(e): 0.9229422066549913
precision(p): 1.0
recall(e): 1.0
recall(p): 0.9724310776942355

##### sklearn ライブラリを用いて算出した評価 #####
accuracy_score: 0.9792746113989638
             precision    recall  f1-score   support

          0       0.92      1.00      0.96       527
          1       1.00      0.97      0.99      1596

avg / total       0.98      0.98      0.98      2123



In [286]:
# Trial 3
# Try classifying with odor + spore-print-color
include_columns = ['odor', 'spore-print-color']
df_data_tmp = replace_category_param_include_col(df_data, include_columns)

# Split into training and test data (roughly 3:1 here)
df_train = df_data_tmp[:6000]
df_test = df_data_tmp[6000:-1]

# Explanatory variables (features)
# Training
X_train = df_train.copy().drop(['class'], axis=1)
# Test
X_test = df_test.copy().drop(['class'], axis=1)

# Target variable
# Training
y_train = df_train['class']
# Test
y_test = df_test['class']

# Use logistic regression
lr = LogisticRegression()

# Train the model
lr.fit(X_train, y_train)

# Predict with the trained model
y_pred = lr.predict(X_test)

# Check the results
display_score(y_test, y_pred)

##### 混同行列から算出した評価 #####
accuracy: 0.9962317475270843
precision(e): 0.9850467289719627
precision(p): 1.0
recall(e): 1.0
recall(p): 0.9949874686716792

##### sklearn ライブラリを用いて算出した評価 #####
accuracy_score: 0.9962317475270843
             precision    recall  f1-score   support

          0       0.99      1.00      0.99       527
          1       1.00      0.99      1.00      1596

avg / total       1.00      1.00      1.00      2123
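Trials 2 and 3 repeat the same encode/split/fit/predict steps with a different column list. A sketch of folding them into one helper; `run_trial` is a hypothetical name, and it keeps the notebook's split by row position (passed as `n_train`) so results stay comparable:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def run_trial(df, include_columns, n_train):
    """One-hot encode the chosen columns, split by row position, fit, and return test accuracy."""
    X = pd.get_dummies(df[include_columns], drop_first=True, dtype=int)
    y = df["class"]
    X_train, X_test = X.iloc[:n_train], X.iloc[n_train:]
    y_train, y_test = y.iloc[:n_train], y.iloc[n_train:]
    model = LogisticRegression().fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))

# Toy data where odor alone separates the classes
toy = pd.DataFrame({
    "class":     [0, 1, 0, 1, 0, 1, 0, 1],
    "odor":      ["n", "f", "n", "f", "n", "f", "n", "f"],
    "cap-shape": ["x", "x", "f", "f", "x", "f", "f", "x"],
})
for cols in (["odor"], ["odor", "cap-shape"]):
    print(cols, run_trial(toy, cols, n_train=6))
```

In the notebook, `run_trial(df_data, ['odor', 'spore-print-color'], n_train=6000)` would replace most of a trial cell, though it reports only accuracy rather than the full `display_score` output.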



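### Questions
- Is there a way to show both the counts and the proportions from pd.crosstab in a single chart or table?
- What criteria should be used to decide which columns are worth using?
- How should variables that look ordinal be handled? (Does it depend on the algorithm?)
- odor alone already classifies quite well, but could values with very few rows (e.g. odor = m, 36 rows) be a problem?
- What is the best ratio between training and test data?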
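On the question of which columns are worth using, one model-independent way to rank them is the mutual information between each column and the target, computed directly on the category labels. A sketch with `sklearn.metrics.mutual_info_score`; `rank_columns_by_mi` is a hypothetical helper, and a high score only indicates a strong association in this data, not a rule that is guaranteed to generalize:

```python
import pandas as pd
from sklearn.metrics import mutual_info_score

def rank_columns_by_mi(df, target="class"):
    """Mutual information (in nats) between each categorical column and the target, highest first."""
    scores = {c: mutual_info_score(df[target], df[c]) for c in df.columns if c != target}
    return pd.Series(scores).sort_values(ascending=False)

toy = pd.DataFrame({
    "class":     ["e", "p", "e", "p", "e", "p"],
    "odor":      ["n", "f", "a", "f", "n", "p"],  # fully determines class here
    "veil-type": ["p", "p", "p", "p", "p", "p"],  # constant: MI = 0
    "cap-shape": ["x", "x", "f", "f", "x", "f"],  # weakly related
})
r = rank_columns_by_mi(toy)
print(r)
```

On the notebook's data this would be `rank_columns_by_mi(df_data)`; mapping `class` to 0/1 does not change the result, since only the label identities matter.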