# Checking neural network behavior (part 3)

This notebook uses <a href="http://scikit-learn.org/dev/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn-neural-network-mlpclassifier"><b>sklearn.neural_network.MLPClassifier</b></a>.

We vary the conditions and check how the classifier behaves.
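As a self-contained illustration of "varying the conditions", the sketch below trains `MLPClassifier` with a few settings on a synthetic dataset and compares held-out accuracy. The dataset (`make_classification`) and the grid of `hidden_layer_sizes` / `activation` values are assumptions chosen for speed; they are not the experiment run in this notebook.

```python
# A minimal sketch: vary MLPClassifier settings on synthetic data and compare
# held-out accuracy. Dataset and parameter grid are illustrative assumptions.
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

results = {}
for hidden in [(50,), (200,)]:
    for activation in ['logistic', 'relu']:
        clf = MLPClassifier(hidden_layer_sizes=hidden, activation=activation,
                            max_iter=1000, shuffle=False, random_state=0)
        with warnings.catch_warnings():
            # Small toy data can trigger convergence warnings; hide them here.
            warnings.simplefilter('ignore', ConvergenceWarning)
            clf.fit(X_train, y_train)
        acc = clf.score(X_test, y_test)  # mean accuracy on the held-out split
        results[(hidden, activation)] = acc
        print(hidden, activation, 'iterations=%d' % clf.n_iter_,
              'accuracy=%.3f' % acc)
```

Fixing `random_state` and disabling `shuffle` (as the notebook does) keeps each configuration reproducible, so differences come from the settings rather than from random initialization.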

## (1) Test data / environment setup

In [1]:
# Use the helper module that prepares the test environment.
import sys
import os
learning_dir = os.path.abspath("../../") #<--- donusagi-bot/learning
os.chdir(learning_dir)

if learning_dir not in sys.path:
    sys.path.append(learning_dir)

from prototype.modules import TestTool

### (1-1) Copy the test data

In [2]:
# The data files are copies of the existing training data, placed in a
# separate location before use.
# The test data consists of the files listed in csv_file_names.
csv_file_names = [
    'test_daikin_conversation.csv',
    'test_benefitone_conversation.csv',
    'test_septeni_conversation.csv',
    'test_ptna_conversation.csv',
]
temp_path = TestTool.copy_testdata_csv(learning_dir, csv_file_names)

CSV file for test=[/Users/makmorit/GitHub/donusagi-bot/learning/prototype/resources/test_daikin_conversation.csv]
CSV file for test=[/Users/makmorit/GitHub/donusagi-bot/learning/prototype/resources/test_benefitone_conversation.csv]
CSV file for test=[/Users/makmorit/GitHub/donusagi-bot/learning/prototype/resources/test_septeni_conversation.csv]
CSV file for test=[/Users/makmorit/GitHub/donusagi-bot/learning/prototype/resources/test_ptna_conversation.csv]
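The implementation of `TestTool.copy_testdata_csv` is not shown here. The sketch below is a hypothetical stand-in that illustrates the idea of working on copies rather than the original training data: it copies each named CSV into a fresh temporary directory and returns the new paths. The function body, the temporary-directory destination, and the demo file contents are all assumptions; the real helper may copy to a different location (the log above prints paths under `prototype/resources`).

```python
# Hypothetical stand-in for TestTool.copy_testdata_csv (not the project's code):
# copy the named CSV files into a new temporary directory and return the copies.
import os
import shutil
import tempfile


def copy_testdata_csv(src_dir, csv_file_names):
    dst_dir = tempfile.mkdtemp(prefix='testdata_')
    copied = []
    for name in csv_file_names:
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        shutil.copyfile(src, dst)  # raises FileNotFoundError if src is missing
        copied.append(dst)
    return copied


# Usage: build a throwaway source directory with two dummy CSV files.
src = tempfile.mkdtemp()
for name in ['a.csv', 'b.csv']:
    with open(os.path.join(src, name), 'w') as f:
        f.write('question,answer\n')

paths = copy_testdata_csv(src, ['a.csv', 'b.csv'])
print(paths)
```

Copying first means later experiments can modify or delete the working files without touching the data used in production.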


## (2) Running with each test dataset

For each of the four files above, we run training (MLPClassifier.fit) ---> evaluation (Evaluator.evaluate), measuring the execution time of each step and the accuracy.
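The same vectorize, fit, and evaluate flow can be sketched with plain scikit-learn on a toy corpus. Here `TfidfVectorizer` stands in for the project-specific `TestTool.prepare_tf_idf_vectors`, and `cross_val_score` with `cv=2` stands in for `Evaluator.evaluate`; both substitutions, the English toy sentences, and the default whitespace-style tokenization are assumptions (the real pipeline handles Japanese text, which needs a dedicated tokenizer).

```python
# A minimal sketch of "vectorize -> fit -> evaluate" with timing.
# TfidfVectorizer and cross_val_score are stand-ins for the project-specific
# TestTool.prepare_tf_idf_vectors and Evaluator.evaluate (assumptions).
import warnings
from time import time

from sklearn.exceptions import ConvergenceWarning
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Tiny toy data; convergence warnings are expected and silenced here.
warnings.filterwarnings('ignore', category=ConvergenceWarning)

texts = [
    'how do I reset my password', 'I forgot my password',
    'password reset link please', 'change my password',
    'what are your opening hours', 'when are you open',
    'opening hours on weekends', 'are you open on sunday',
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

t0 = time()
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)  # sparse matrix, one row per text
print('vectorize: done in %0.3fs.' % (time() - t0))

t0 = time()
cls = MLPClassifier(hidden_layer_sizes=(200,), max_iter=1000,
                    activation='logistic', shuffle=False, random_state=0)
cls.fit(X, labels)  # MLPClassifier accepts sparse input directly
print('fit: done in %0.3fs.' % (time() - t0))

t0 = time()
scores = cross_val_score(cls, X, labels, cv=2)
print('cross-validation: done in %0.3fs, mean accuracy=%.3f'
      % (time() - t0, scores.mean()))
```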

In [3]:
from time import time
from sklearn.neural_network import MLPClassifier

from learning.core.evaluator import Evaluator
    
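def fit_and_cross_validation(path):
    """Vectorize one CSV dataset, fit an MLPClassifier on it, evaluate it
    with Evaluator, and print the elapsed time of each step."""
    # Build TF-IDF vectors from the training data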
    basename = os.path.basename(path)
    print("prepare_tf_idf_vectors: dataset=%s..." % basename)
    t0 = time()

    X, y, vectorizer = TestTool.prepare_tf_idf_vectors(path)
    print("prepare_tf_idf_vectors: done in %0.3fs." % (time() - t0))

    # Fit on the entire training set.
    # One hidden layer (the same depth as the default hidden_layer_sizes=(100,)),
    # here with 200 units.
    print("MLPClassifier: fitting...")
    t0 = time()

    cls = MLPClassifier(hidden_layer_sizes=(200,), max_iter=1000,
                        activation='logistic', shuffle=False, random_state=0)
    cls.fit(X, y)
    print("MLPClassifier: done in %0.3fs." % (time() - t0))

    # Cross-validation (model evaluation phase).
    # As in production, the Evaluator class is used for evaluation.
    #
    # The cross_val_score function used inside Evaluator calls the fit,
    # predict, and predict_proba methods of the estimator passed to it.
    print("Evaluator: evaluating...")
    t0 = time()

    evaluator = Evaluator()
    evaluator.evaluate(cls, X, y, threshold=0.5)
    print("Evaluator: done in %0.3fs." % (time() - t0))
    
    return (basename, X, y, vectorizer, cls, evaluator)
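The comment about `cross_val_score` can be made concrete. The sketch below reproduces what `cross_val_score` does for a classifier with an integer `cv` and default scoring: each fold gets a fresh, unfitted `clone` of the estimator, which is fit on the training split and scored (accuracy) on the test split. With default scoring only `fit` and `score` (which calls `predict`) are used; the source's note that `predict_proba` is also called presumably reflects Evaluator's own scoring, which is not shown. The synthetic data and `cv=3` are assumptions. One consequence, assuming Evaluator passes `cls` straight to `cross_val_score`: the model fitted earlier in `fit_and_cross_validation` is not reused, since every fold refits a clone, which would explain why evaluation time is of the same order as fitting time in the log below.

```python
# Sketch: a manual equivalent of cross_val_score for a classifier with
# integer cv and default scoring (synthetic data and cv=3 are assumptions).
import warnings

import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier

warnings.filterwarnings('ignore', category=ConvergenceWarning)

X, y = make_classification(n_samples=150, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
est = MLPClassifier(hidden_layer_sizes=(200,), max_iter=1000,
                    activation='logistic', shuffle=False, random_state=0)

manual = []
for train_idx, test_idx in StratifiedKFold(n_splits=3).split(X, y):
    fold_model = clone(est)  # unfitted copy with the same parameters
    fold_model.fit(X[train_idx], y[train_idx])
    manual.append(fold_model.score(X[test_idx], y[test_idx]))

auto = cross_val_score(est, X, y, cv=3)  # integer cv -> StratifiedKFold
print(np.round(manual, 3), np.round(auto, 3))
```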



In [4]:
list_of_classifiers = []
for path in temp_path:
    classifier = fit_and_cross_validation(path)
    list_of_classifiers.append(classifier)

2017/04/04 PM 03:57:20 TrainingMessageFromCsv#__build_learning_training_messages count of learning data: 17443
2017/04/04 PM 03:57:20 TextArray#__init__ start


prepare_tf_idf_vectors: dataset=test_daikin_conversation.csv...


2017/04/04 PM 03:57:29 TextArray#to_vec start
2017/04/04 PM 03:57:29 TextArray#to_vec end


prepare_tf_idf_vectors: done in 9.703s.
MLPClassifier: fitting...


2017/04/04 PM 04:01:24 self.threshold: 0.5


MLPClassifier: done in 234.502s.
Evaluator: evaluating...


2017/04/04 PM 04:04:39 Evaluator#evaluate#elapsed time: 194701.505899 ms
2017/04/04 PM 04:04:39 accuracy: 0.986703347088
2017/04/04 PM 04:04:39 TrainingMessageFromCsv#__build_learning_training_messages count of learning data: 4114
2017/04/04 PM 04:04:39 TextArray#__init__ start


0.986703347088
Evaluator: done in 194.704s.
prepare_tf_idf_vectors: dataset=test_benefitone_conversation.csv...


2017/04/04 PM 04:04:40 TextArray#to_vec start
2017/04/04 PM 04:04:40 TextArray#to_vec end


prepare_tf_idf_vectors: done in 1.731s.
MLPClassifier: fitting...


2017/04/04 PM 04:05:00 self.threshold: 0.5


MLPClassifier: done in 19.183s.
Evaluator: evaluating...


2017/04/04 PM 04:05:17 Evaluator#evaluate#elapsed time: 17512.425900 ms
2017/04/04 PM 04:05:17 accuracy: 0.985436893204
2017/04/04 PM 04:05:17 TrainingMessageFromCsv#__build_learning_training_messages count of learning data: 2156
2017/04/04 PM 04:05:17 TextArray#__init__ start


0.985436893204
Evaluator: done in 17.515s.
prepare_tf_idf_vectors: dataset=test_septeni_conversation.csv...


2017/04/04 PM 04:05:18 TextArray#to_vec start
2017/04/04 PM 04:05:18 TextArray#to_vec end


prepare_tf_idf_vectors: done in 0.798s.
MLPClassifier: fitting...


2017/04/04 PM 04:05:35 self.threshold: 0.5


MLPClassifier: done in 16.922s.
Evaluator: evaluating...


2017/04/04 PM 04:05:46 Evaluator#evaluate#elapsed time: 11655.032158 ms
2017/04/04 PM 04:05:46 accuracy: 0.953703703704
2017/04/04 PM 04:05:47 TrainingMessageFromCsv#__build_learning_training_messages count of learning data: 4559
2017/04/04 PM 04:05:47 TextArray#__init__ start


0.953703703704
Evaluator: done in 11.658s.
prepare_tf_idf_vectors: dataset=test_ptna_conversation.csv...


2017/04/04 PM 04:05:48 TextArray#to_vec start
2017/04/04 PM 04:05:48 TextArray#to_vec end


prepare_tf_idf_vectors: done in 1.961s.
MLPClassifier: fitting...


2017/04/04 PM 04:06:09 self.threshold: 0.5


MLPClassifier: done in 20.208s.
Evaluator: evaluating...


2017/04/04 PM 04:06:27 Evaluator#evaluate#elapsed time: 18745.523930 ms
2017/04/04 PM 04:06:27 accuracy: 0.982471516214


0.982471516214
Evaluator: done in 18.749s.
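To compare the four runs side by side, the sketch below tabulates the figures printed in the log above, copied by hand: the learning-data count reported by `TrainingMessageFromCsv`, the fit and evaluation times, and the accuracy. The "fit seconds per 1,000 messages" column is a derived figure added for comparison. The largest set (daikin) has about 4.2 times as many messages as benefitone but took about 12 times as long to fit; other factors such as vocabulary size (and hence input width) may also differ between datasets, but the log does not record them.

```python
# Figures copied from the log above; fit s/1k is derived for comparison.
runs = [
    # (dataset, learning-data count, fit seconds, evaluation seconds, accuracy)
    ('test_daikin_conversation.csv', 17443, 234.502, 194.704, 0.986703347088),
    ('test_benefitone_conversation.csv', 4114, 19.183, 17.515, 0.985436893204),
    ('test_septeni_conversation.csv', 2156, 16.922, 11.658, 0.953703703704),
    ('test_ptna_conversation.csv', 4559, 20.208, 18.749, 0.982471516214),
]

# Fit time normalized to 1,000 learning-data messages.
per_1k = {name: fit_s / count * 1000 for name, count, fit_s, _, _ in runs}

print('%-34s %8s %8s %8s %9s %8s'
      % ('dataset', 'messages', 'fit[s]', 'eval[s]', 'fit s/1k', 'accuracy'))
for name, count, fit_s, eval_s, acc in runs:
    print('%-34s %8d %8.1f %8.1f %9.2f %8.3f'
          % (name, count, fit_s, eval_s, per_1k[name], acc))
```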
