# Checking How the Neural Network Behaves (part 2)

This notebook uses <a href="http://scikit-learn.org/dev/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn-neural-network-mlpclassifier"><b>sklearn.neural_network.MLPClassifier</b></a>.

The same procedure is run on several different datasets to see how the classifier behaves on each.

## (1) Test data / environment setup

In [1]:
'''
    Use the project's helper module to set up the test environment.
'''
import sys
import os
learning_dir = os.path.abspath("../../") #<--- donusagi-bot/learning
os.chdir(learning_dir)

if learning_dir not in sys.path:
    sys.path.append(learning_dir)

from prototype.modules import TestTool

### (1-1) Copy the test data

In [2]:
'''
    The data files are existing training data, copied to a separate location before use.
    The test data consists of the files listed in csv_file_names.
'''
csv_file_names = [
    'test_daikin_conversation.csv',
    'test_benefitone_conversation.csv',
    'test_septeni_conversation.csv',
    'test_ptna_conversation.csv',
]
temp_path = TestTool.copy_testdata_csv(learning_dir, csv_file_names)

CSV file for test=[/Users/makmorit/GitHub/donusagi-bot/learning/prototype/resources/test_daikin_conversation.csv]
CSV file for test=[/Users/makmorit/GitHub/donusagi-bot/learning/prototype/resources/test_benefitone_conversation.csv]
CSV file for test=[/Users/makmorit/GitHub/donusagi-bot/learning/prototype/resources/test_septeni_conversation.csv]
CSV file for test=[/Users/makmorit/GitHub/donusagi-bot/learning/prototype/resources/test_ptna_conversation.csv]
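
`TestTool.copy_testdata_csv` is a project-internal helper, and its implementation is not shown in this notebook. As a rough illustration of the idea (copy the named CSV files elsewhere and return the new paths), a minimal sketch could look like the following. The function name `copy_csv_files`, the directory layout, and the sample file are all hypothetical.

```python
import os
import shutil
import tempfile


def copy_csv_files(src_dir, file_names, dst_dir):
    """Hypothetical helper: copy each named file from src_dir to dst_dir
    and return the list of destination paths."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for name in file_names:
        dst = os.path.join(dst_dir, name)
        shutil.copyfile(os.path.join(src_dir, name), dst)
        copied.append(dst)
    return copied


# Demo with throwaway directories and a made-up one-line CSV file.
src = tempfile.mkdtemp()
dst = tempfile.mkdtemp()
with open(os.path.join(src, "sample.csv"), "w") as f:
    f.write("question,answer_id\n")

demo_paths = copy_csv_files(src, ["sample.csv"], dst)
print(demo_paths)
```

Working on copies keeps the original training data untouched while the experiments run.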


## (2) Checking behavior across the test datasets

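For each of the four files above, training (MLPClassifier.fit) is run first, followed by evaluation (Evaluator.evaluate). The execution time of each step and the resulting accuracy are measured.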

In [3]:
from time import time
from sklearn.neural_network import MLPClassifier

from learning.core.evaluator import Evaluator
    
def fit_and_cross_validation(path):
    '''
        Build TF-IDF vectors from the training data
    '''
    basename = os.path.basename(path)
    print("prepare_tf_idf_vectors: dataset=%s..." % basename)
    t0 = time()

    X, y, vectorizer = TestTool.prepare_tf_idf_vectors(path)
    print("prepare_tf_idf_vectors: done in %0.3fs." % (time() - t0))

    '''
        Train on the entire training data set
    '''
    print("MLPClassifier: fitting...")
    t0 = time()

    cls = MLPClassifier(activation='logistic', shuffle=False, random_state=0)
    cls.fit(X, y)
    print("MLPClassifier: done in %0.3fs." % (time() - t0))

    '''
        Run cross-validation (the model evaluation phase).
        As in production, the Evaluator class is used for the evaluation.

        The cross_val_score function used by the Evaluator class
        internally calls fit, predict, and predict_proba
        on the estimator passed as its argument.
    '''
    print("Evaluator: evaluating...")
    t0 = time()

    evaluator = Evaluator()
    evaluator.evaluate(cls, X, y, threshold=0.5)
    print("Evaluator: done in %0.3fs." % (time() - t0))
    
    return (basename, X, y, vectorizer, cls, evaluator)
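
`TestTool` and `Evaluator` are project-internal classes, so the cell above cannot be run outside the repository. The same fit-then-cross-validate flow can be sketched with scikit-learn alone. This is only an approximation: it assumes `Evaluator` wraps `cross_val_score` as the comment above says, it ignores the `threshold` argument, and the toy corpus, labels, `cv=3`, and `max_iter=500` are made up for illustration.

```python
from time import time

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Hypothetical toy corpus with two intents (not taken from the CSV files).
texts = [
    "reset my password", "forgot password please reset", "password reset link",
    "change my password", "cannot log in password", "new password needed",
    "office opening hours", "what time do you open", "opening hours today",
    "are you open on sunday", "closing time of the office", "hours of operation",
]
labels = [0] * 6 + [1] * 6

# Turn the texts into a sparse TF-IDF matrix.
X = TfidfVectorizer().fit_transform(texts)

# Same classifier settings as the notebook, plus a larger max_iter for the tiny data set.
clf = MLPClassifier(activation='logistic', shuffle=False, random_state=0, max_iter=500)

# Time a fit on the full data set.
t0 = time()
clf.fit(X, labels)
fit_seconds = time() - t0

# Time the cross-validation; cross_val_score fits a fresh clone of clf on each fold.
t0 = time()
scores = cross_val_score(clf, X, labels, cv=3)
cv_seconds = time() - t0

print("fit: %0.3fs, cross-validation: %0.3fs" % (fit_seconds, cv_seconds))
print("mean accuracy: %0.3f" % scores.mean())
```

Because `cross_val_score` clones and refits the estimator for every fold, the cross-validation time is on the same order as a few full fits, which is consistent with the evaluation times logged below being close to the fit times.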



In [4]:
list_of_classifiers = []
for path in temp_path:
    classifier = fit_and_cross_validation(path)
    list_of_classifiers.append(classifier)

2017/04/03 PM 10:00:02 TrainingMessageFromCsv#__build_learning_training_messages count of learning data: 17443
2017/04/03 PM 10:00:02 TextArray#__init__ start


prepare_tf_idf_vectors: dataset=test_daikin_conversation.csv...


2017/04/03 PM 10:00:11 TextArray#to_vec start
2017/04/03 PM 10:00:12 TextArray#to_vec end


prepare_tf_idf_vectors: done in 10.330s.
MLPClassifier: fitting...


2017/04/03 PM 10:02:57 self.threshold: 0.5


MLPClassifier: done in 164.826s.
Evaluator: evaluating...


2017/04/03 PM 10:05:06 Evaluator#evaluate#elapsed time: 129307.130098 ms
2017/04/03 PM 10:05:06 accuracy: 0.983493810179
2017/04/03 PM 10:05:06 TrainingMessageFromCsv#__build_learning_training_messages count of learning data: 4114
2017/04/03 PM 10:05:06 TextArray#__init__ start


0.983493810179
Evaluator: done in 129.310s.
prepare_tf_idf_vectors: dataset=test_benefitone_conversation.csv...


2017/04/03 PM 10:05:08 TextArray#to_vec start
2017/04/03 PM 10:05:08 TextArray#to_vec end


prepare_tf_idf_vectors: done in 1.639s.
MLPClassifier: fitting...


2017/04/03 PM 10:05:19 self.threshold: 0.5


MLPClassifier: done in 11.342s.
Evaluator: evaluating...


2017/04/03 PM 10:05:28 Evaluator#evaluate#elapsed time: 8741.152048 ms
2017/04/03 PM 10:05:28 accuracy: 0.979611650485
2017/04/03 PM 10:05:28 TrainingMessageFromCsv#__build_learning_training_messages count of learning data: 2156
2017/04/03 PM 10:05:28 TextArray#__init__ start


0.979611650485
Evaluator: done in 8.744s.
prepare_tf_idf_vectors: dataset=test_septeni_conversation.csv...


2017/04/03 PM 10:05:28 TextArray#to_vec start
2017/04/03 PM 10:05:28 TextArray#to_vec end


prepare_tf_idf_vectors: done in 0.829s.
MLPClassifier: fitting...


2017/04/03 PM 10:05:36 self.threshold: 0.5


MLPClassifier: done in 7.214s.
Evaluator: evaluating...


2017/04/03 PM 10:05:41 Evaluator#evaluate#elapsed time: 5483.077049 ms
2017/04/03 PM 10:05:41 accuracy: 0.838888888889
2017/04/03 PM 10:05:41 TrainingMessageFromCsv#__build_learning_training_messages count of learning data: 4559
2017/04/03 PM 10:05:41 TextArray#__init__ start


0.838888888889
Evaluator: done in 5.486s.
prepare_tf_idf_vectors: dataset=test_ptna_conversation.csv...


2017/04/03 PM 10:05:43 TextArray#to_vec start
2017/04/03 PM 10:05:43 TextArray#to_vec end


prepare_tf_idf_vectors: done in 1.793s.
MLPClassifier: fitting...


2017/04/03 PM 10:05:55 self.threshold: 0.5


MLPClassifier: done in 12.053s.
Evaluator: evaluating...


2017/04/03 PM 10:06:04 Evaluator#evaluate#elapsed time: 9199.355125 ms
2017/04/03 PM 10:06:04 accuracy: 0.975460122699


0.975460122699
Evaluator: done in 9.202s.
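
The measurements are scattered through the log output above, so a short script can gather them into one table for comparison. The values below are copied from that output. The record counts assume that each "count of learning data" log line belongs to the dataset processed immediately after it.

```python
# Figures copied from the log output above:
# (dataset, training records, fit seconds, evaluate seconds, accuracy)
results = [
    ("test_daikin_conversation.csv", 17443, 164.826, 129.310, 0.983493810179),
    ("test_benefitone_conversation.csv", 4114, 11.342, 8.744, 0.979611650485),
    ("test_septeni_conversation.csv", 2156, 7.214, 5.486, 0.838888888889),
    ("test_ptna_conversation.csv", 4559, 12.053, 9.202, 0.975460122699),
]

print("%-36s %8s %10s %10s %9s" % ("dataset", "records", "fit (s)", "eval (s)", "accuracy"))
for name, n_records, fit_s, eval_s, acc in results:
    print("%-36s %8d %10.3f %10.3f %9.4f" % (name, n_records, fit_s, eval_s, acc))

# Pick out the dataset with the lowest accuracy and the one with the longest fit time.
lowest = min(results, key=lambda r: r[4])
slowest = max(results, key=lambda r: r[2])
print("lowest accuracy:", lowest[0])
print("slowest fit:", slowest[0])
```

Read this way, three of the four datasets score above 0.97, while test_septeni_conversation.csv, the smallest one, scores about 0.84. The largest dataset, test_daikin_conversation.csv, dominates the run time.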
