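### 3. Model Training and Testing

#### **Model Training**
The training code is in ```bert_lesson_model.ipynb```.

**1. Set up the model inputs**  
As the code shows, each training example consists of two sentences (`a` and `b`). Each sentence is converted into three integer arrays of length `cfg.max_seq_length`: token IDs (`input_ids`), a padding mask (`input_mask`), and segment IDs (`seg_ids`).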

```python
the_feature = {
        "input_ids_a":tf.placeholder(tf.int32, [None,cfg.max_seq_length], name='input_ids_a'),
        "input_ids_b":tf.placeholder(tf.int32, [None,cfg.max_seq_length], name='input_ids_b'),
        "input_mask_a":tf.placeholder(tf.int32, [None,cfg.max_seq_length], name='input_mask_a'),
        "input_mask_b":tf.placeholder(tf.int32, [None,cfg.max_seq_length], name='input_mask_b'),
        "seg_ids_a":tf.placeholder(tf.int32, [None,cfg.max_seq_length], name='seg_ids_a'),
        "seg_ids_b":tf.placeholder(tf.int32, [None,cfg.max_seq_length], name='seg_ids_b')
}
```
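To make the three inputs per sentence concrete, here is a minimal sketch of how one sentence could be turned into `input_ids`, `input_mask`, and `seg_ids`, following the usual BERT convention of wrapping the tokens in `[CLS]` and `[SEP]` and padding with zeros. The toy vocabulary and the helper `encode_sentence` are hypothetical. In this project the conversion happens inside `get_data_batch_auto()` with the BERT tokenizer, and its exact details are not shown here.

```python
# Toy vocabulary for illustration only; the real one comes from BERT's vocab.txt.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "how": 4, "are": 5, "you": 6}


def encode_sentence(tokens, max_seq_length, vocab=TOY_VOCAB):
    """Hypothetical helper: map tokens to fixed-length BERT-style inputs."""
    # Reserve two positions for [CLS] and [SEP], truncating if necessary.
    tokens = ["[CLS]"] + tokens[: max_seq_length - 2] + ["[SEP]"]
    input_ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]
    input_mask = [1] * len(input_ids)   # 1 marks a real token
    seg_ids = [0] * len(input_ids)      # a single sentence uses segment 0 throughout

    # Pad all three arrays with zeros up to max_seq_length.
    padding = [0] * (max_seq_length - len(input_ids))
    return {
        "input_ids": input_ids + padding,
        "input_mask": input_mask + padding,
        "seg_ids": seg_ids + padding,
    }


features = encode_sentence(["how", "are", "you"], max_seq_length=8)
print(features["input_ids"])   # [2, 4, 5, 6, 3, 0, 0, 0]
print(features["input_mask"])  # [1, 1, 1, 1, 1, 0, 0, 0]
```

A batch of such arrays, one row per sentence, has the shape `[None, cfg.max_seq_length]` that the placeholders above expect.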

**2. Set up the loss function**  
`net()` builds the network and returns the loss described in the previous section.

```python
loss_op = net(the_feature,256,True)
```

**3. Set up the optimizer**

Adam (Adaptive Moment Estimation) is used as the optimizer here.

```python 
train_op = tf.train.AdamOptimizer(cfg.learning_rate).minimize(loss_op)
```
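For readers unfamiliar with Adam, the sketch below applies the textbook update rule to a one-dimensional toy problem in plain Python. It is only meant to illustrate what the optimizer does at each step. The function name `adam_minimize`, the toy objective, and the hyperparameter values are assumptions for this example, and TensorFlow's `AdamOptimizer` arranges the bias correction and epsilon slightly differently.

```python
import math


def adam_minimize(grad_fn, x0, learning_rate=0.1, beta1=0.9, beta2=0.999,
                  epsilon=1e-8, steps=500):
    """Minimize a scalar function with the textbook Adam update rule."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g          # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)  # should end up close to the minimum at x = 3
```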

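**4. Initialize the network parameters from the pre-trained BERT model**

`cfg.init_checkpoint` is the path to the pre-trained BERT checkpoint, and `tvars` is the list of trainable variables (`tf.trainable_variables()`).  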
```python 
(assignment_map, initialized_variable_names) = modeling.get_assignment_map_from_checkpoint(tvars, cfg.init_checkpoint)
tf.train.init_from_checkpoint(cfg.init_checkpoint, assignment_map)
```
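The idea behind the assignment map can be sketched with plain strings: variables in the current graph are matched by name against the variables stored in the checkpoint, and only the matches are initialized from it. The helper `build_assignment_map` and the example variable names are illustrative assumptions, not the code of `modeling.get_assignment_map_from_checkpoint`.

```python
import re


def build_assignment_map(graph_var_names, checkpoint_var_names):
    """Hypothetical, string-only sketch of matching graph variables to a checkpoint."""
    # Graph variable names carry an output suffix such as ":0"; strip it.
    stripped = set()
    for name in graph_var_names:
        match = re.match(r"^(.*):\d+$", name)
        stripped.add(match.group(1) if match else name)

    # Keep only checkpoint variables that also exist in the current graph.
    assignment_map = {}
    for name in checkpoint_var_names:
        if name in stripped:
            assignment_map[name] = name
    return assignment_map


# Illustrative names: one variable is shared, the other two are not.
graph_vars = ["bert/embeddings/word_embeddings:0", "output_layer/kernel:0"]
ckpt_vars = ["bert/embeddings/word_embeddings", "cls/predictions/output_bias"]
assignment_map = build_assignment_map(graph_vars, ckpt_vars)
print(assignment_map)
```

Variables that exist only in the new graph (here the made-up `output_layer/kernel`) are not in the map, so they keep their random initialization and are learned during training.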

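**5. Start training**  
(1) ```get_data_batch_auto()``` fetches one batch of data, which is fed into the model.  
(2) Run one training step with the loss function and optimizer set up above: ```_, loss_value = sess.run([train_op,loss_op],feed)```  
(3) Save the model every so many steps: ```saver.save(sess, os.path.join(cfg.output_dir, model_name + '.ckpt'))```  
Note on the files that make up a ckpt-format model:  
&emsp;&emsp;checkpoint file: records the path of the latest checkpoint and of all retained checkpoints  
&emsp;&emsp;.data file: holds the values of the trained variables  
&emsp;&emsp;.index file: maps each variable name (key) to the location of its value  
&emsp;&emsp;.meta file: stores the complete graph structure  
Saving a model this way produces these four kinds of files. To reload the model, one typically restores the graph structure from the .meta file and then fills in the variable values from the .data file.  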
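As a quick way to check that a save produced all four kinds of files, the sketch below groups the contents of an output directory by role. The helper `classify_checkpoint_files` is hypothetical, and the file names are stand-ins that assume a model named `chatbot` saved as a single data shard (`.data-00000-of-00001`).

```python
import os
import tempfile


def classify_checkpoint_files(directory):
    """Group the files in a checkpoint directory by their role."""
    roles = {"checkpoint": [], "data": [], "index": [], "meta": []}
    for filename in sorted(os.listdir(directory)):
        if filename == "checkpoint":
            roles["checkpoint"].append(filename)
        elif ".ckpt.data-" in filename:
            roles["data"].append(filename)
        elif filename.endswith(".ckpt.index"):
            roles["index"].append(filename)
        elif filename.endswith(".ckpt.meta"):
            roles["meta"].append(filename)
    return roles


# Create empty stand-in files named the way a saved model would be.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ["checkpoint", "chatbot.ckpt.data-00000-of-00001",
                 "chatbot.ckpt.index", "chatbot.ckpt.meta"]:
        open(os.path.join(tmp_dir, name), "w").close()
    roles = classify_checkpoint_files(tmp_dir)

print(roles)
```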


```python 
# Adjust the number of iterations based on the loss and on test results
for idx in range(3000):
    # Fetch one batch of data
    temp_features = get_data_batch_auto(left_array_pos, right_array_pos, left_array_neg, right_array_neg, batch_num, tokenizer)
    feed = {
        the_feature["input_ids_a"]: temp_features["input_ids_a"],
        the_feature["input_ids_b"]: temp_features["input_ids_b"],
        the_feature["input_mask_a"]: temp_features["input_mask_a"],
        the_feature["input_mask_b"]: temp_features["input_mask_b"],
        the_feature["seg_ids_a"]: temp_features["seg_ids_a"],
        the_feature["seg_ids_b"]: temp_features["seg_ids_b"]
    }

    # Run one training step
    _, loss_value = sess.run([train_op, loss_op], feed)

    # Print the loss
    print(idx, loss_value)
    # Save a checkpoint every 100 steps
    if idx % 100 == 0 and idx != 0:
        saver.save(sess, os.path.join(cfg.output_dir, model_name + '.ckpt'))
    # Save and stop early once the loss is small enough
    if loss_value < 0.0001:
        saver.save(sess, os.path.join(cfg.output_dir, model_name + '.ckpt'))
        break

    # Refresh this cell's output
    ipd.clear_output(wait=True)
```
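The control flow of the loop above (periodic saves plus an early stop) can be replayed without TensorFlow on a list of loss values. This is a sketch for reasoning about when checkpoints get written. The function `plan_training` and the loss values are made up for illustration.

```python
def plan_training(losses, save_every=100, stop_below=1e-4):
    """Replay the loop's control flow on a recorded list of loss values."""
    save_steps = []
    for idx, loss_value in enumerate(losses):
        if idx % save_every == 0 and idx != 0:
            save_steps.append(idx)       # periodic checkpoint
        if loss_value < stop_below:
            save_steps.append(idx)       # final checkpoint before stopping
            return save_steps, idx
    return save_steps, None              # never reached the threshold


# Made-up losses: 0.5 for 250 steps, then a value below the threshold.
save_steps, stopped_at = plan_training([0.5] * 250 + [5e-5])
print(save_steps, stopped_at)  # [100, 200, 250] 250
```

Because every save uses the same file name, each checkpoint overwrites the previous one. If the threshold is reached on a step that is also a multiple of 100, the checkpoint is simply written twice.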

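#### **Saving the model in pb format**  
The code that saves the pb model is in save_and_eval.ipynb.  
Note: a .pb file stores both the graph structure and the variable values, so it is the only file needed to load the model.  

Function: ```freeze_graph()```  
  
Parameters: ```ckpt: path of the ckpt-format model```  
&emsp;&emsp;&emsp;```output_graph: path and file name of the output pb model```  
  
Returns: ```None```  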


### **Running the Experiment**

#### Import the required modules

```python
import tensorflow as tf
import os
import csv
import numpy as np
from random import shuffle, sample
from data_input import load_raw_data, get_data_batch_auto  # data preprocessing
import tokenization
from config import Config as cfg
from model_for_chatbot import net   # model
import modeling
import IPython.display as ipd
```

#### Set the temporary parameters used in this demo

```python
batch_num = cfg.train_batch_size
model_name = cfg.model_name
output_dir = './exercise/output'
release_dir = './exercise/release'

# Let GPU memory usage grow on demand instead of reserving it all up front
tf.reset_default_graph()
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
```

#### Load the data

```python
left_array_pos, right_array_pos = load_raw_data(model_name+'_pos.tsv')
# The neg file is optional; the negative lists default to empty
left_array_neg = []
right_array_neg = []
left_array_neg, right_array_neg = load_raw_data(model_name+'_neg.tsv')
```

Output:

```
No files:./raw_data/chatbot_neg.tsv
```
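The implementation of `load_raw_data` is not shown in this section. The sketch below is one plausible shape for such a loader, assuming a two-column, tab-separated file with one sentence pair per line, and assuming that a missing file yields two empty lists after printing a message like the one above. The name `load_pairs_tsv` and the sample sentences are hypothetical.

```python
import csv
import os
import tempfile


def load_pairs_tsv(path):
    """Hypothetical loader: read (left, right) sentence pairs from a TSV file."""
    left, right = [], []
    if not os.path.exists(path):
        # Assumed behaviour: report the missing file and return empty lists.
        print("No files:" + path)
        return left, right
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) >= 2:            # skip blank or malformed lines
                left.append(row[0])
                right.append(row[1])
    return left, right


with tempfile.TemporaryDirectory() as tmp_dir:
    pos_path = os.path.join(tmp_dir, "chatbot_pos.tsv")
    with open(pos_path, "w", newline="", encoding="utf-8") as f:
        f.write("hello\thi there\nhow are you\tfine, thanks\n")
    left_pos, right_pos = load_pairs_tsv(pos_path)
    left_neg, right_neg = load_pairs_tsv(os.path.join(tmp_dir, "chatbot_neg.tsv"))

print(left_pos, right_pos, left_neg, right_neg)
```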


#### Set up the model inputs

```python
the_feature = {
        "input_ids_a":tf.placeholder(tf.int32, [None,cfg.max_seq_length], name='input_ids_a'),
        "input_ids_b":tf.placeholder(tf.int32, [None,cfg.max_seq_length], name='input_ids_b'),
        "input_mask_a":tf.placeholder(tf.int32, [None,cfg.max_seq_length], name='input_mask_a'),
        "input_mask_b":tf.placeholder(tf.int32, [None,cfg.max_seq_length], name='input_mask_b'),
        "seg_ids_a":tf.placeholder(tf.int32, [None,cfg.max_seq_length], name='seg_ids_a'),
        "seg_ids_b":tf.placeholder(tf.int32, [None,cfg.max_seq_length], name='seg_ids_b')
}
```

#### Set up the model structure, optimizer, and loss function

```python
loss_op = net(the_feature, 256, False)    # build the network and get its loss

train_op = tf.train.AdamOptimizer(cfg.learning_rate).minimize(loss_op)   # set the optimizer

tvars = tf.trainable_variables()

(assignment_map, initialized_variable_names) = modeling.get_assignment_map_from_checkpoint(tvars, cfg.init_checkpoint)
tf.train.init_from_checkpoint(cfg.init_checkpoint, assignment_map)  # initialize the matching variables from the pre-trained model
sess.run(tf.global_variables_initializer())
saver = tf.train.Saver()
```






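Note: the recorded run of this cell ended with `NameError: name 'sess' is not defined`, because the parameter cell above, which creates `sess`, had not been executed. Run the cells in order to avoid this error.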

#### Run the training steps (demo)

```python
tokenizer = tokenization.FullTokenizer(vocab_file='./model/vocab.txt', do_lower_case=True)

for idx in range(10):
    # Fetch one batch of data
    temp_features = get_data_batch_auto(left_array_pos, right_array_pos, left_array_neg, right_array_neg, batch_num, tokenizer)

    # Feed the data into the placeholders defined as the model inputs
    feed = {
        the_feature["input_ids_a"]: temp_features["input_ids_a"],
        the_feature["input_ids_b"]: temp_features["input_ids_b"],
        the_feature["input_mask_a"]: temp_features["input_mask_a"],
        the_feature["input_mask_b"]: temp_features["input_mask_b"],
        the_feature["seg_ids_a"]: temp_features["seg_ids_a"],
        the_feature["seg_ids_b"]: temp_features["seg_ids_b"]
    }

    # Run one training step
    _, loss_value = sess.run([train_op, loss_op], feed)
    print(idx, loss_value)
    # Periodic save (never triggers in this 10-step demo)
    if idx % 1000 == 0 and idx != 0:
        saver.save(sess, os.path.join(output_dir, model_name + '.ckpt'))
    # Save and stop early once the loss is small enough
    if loss_value < 0.00001:
        saver.save(sess, os.path.join(output_dir, model_name + '.ckpt'))
        break
    ipd.clear_output(wait=True)

# Training finished; save the ckpt model files
saver.save(sess, os.path.join(output_dir, model_name + '.ckpt'))
```
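Since the placeholder dictionary and the batch dictionary use the same six keys, the feed dictionary in the training loop can also be built with a comprehension. The sketch below shows the pattern with stand-in values so that it runs without TensorFlow. The helper `build_feed` and the `demo_` variables are hypothetical, with strings playing the role of `tf.placeholder` objects.

```python
FEATURE_KEYS = ["input_ids_a", "input_ids_b", "input_mask_a",
                "input_mask_b", "seg_ids_a", "seg_ids_b"]

# Stand-ins: strings play the role of tf.placeholder objects,
# and short nested lists play the role of one batch of arrays.
demo_placeholders = {key: "placeholder:" + key for key in FEATURE_KEYS}
demo_batch = {key: [[0, 0, 0]] for key in FEATURE_KEYS}


def build_feed(placeholders, batch):
    """Pair each placeholder with the batch array stored under the same key."""
    missing = set(placeholders) - set(batch)
    if missing:
        raise KeyError("batch is missing features: {}".format(sorted(missing)))
    return {placeholders[key]: batch[key] for key in placeholders}


feed = build_feed(demo_placeholders, demo_batch)
print(len(feed))  # 6
```

With the real objects, the equivalent call would be `build_feed(the_feature, temp_features)`. The explicit check makes a missing feature fail early with a clear message.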

#### Save the model in pb format for later use

```python
import tensorflow as tf
from tensorflow.python.framework import graph_util

tf.reset_default_graph()

def freeze_graph(ckpt, output_graph):
    # Name of the output node to keep; only the ops it depends on are retained
    output_node_names = 'MatMul'
    # Rebuild the graph structure from the .meta file
    saver = tf.compat.v1.train.import_meta_graph(ckpt+'.meta', clear_devices=True)
    graph = tf.get_default_graph()
    input_graph_def = graph.as_graph_def()

    with tf.Session() as sess:
        # Restore the variable values from the checkpoint
        saver.restore(sess, ckpt)
        # Replace the variables with constants holding their current values
        output_graph_def = graph_util.convert_variables_to_constants(
            sess=sess,
            input_graph_def=input_graph_def,
            output_node_names=output_node_names.split(',')
        )
        # Serialize the frozen graph to a single .pb file
        with tf.gfile.GFile(output_graph, 'wb') as fw:
            fw.write(output_graph_def.SerializeToString())
        print('{} ops in the final graph.'.format(len(output_graph_def.node)))

freeze_graph(output_dir + '/' + model_name + '.ckpt', release_dir + '/' + model_name + '.pb')
```
