# BERT Fine-Tuned Notebook
## W266 Final Project
### Game of Thrones Text Classification
### T. P. Goter
### Fall 2019

This notebook performs the baseline supervised text classification with a fine-tuned BERT model. The original UDA process used a Python script wrapped in a bash shell script; this notebook was created to show and annotate the process more clearly.

## Acknowledgement
Much of this code is adapted from the open-source [UDA](https://github.com/google-research/uda) repository and modified for the Game of Thrones dataset.

## Import Data Libraries

In [1]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import json
import os
import tensorflow as tf

import uda
from bert import modeling
from utils import proc_data_utils
from utils import raw_data_utils

import yaml
import pprint

from absl import app
from absl import logging


print(tf.__version__)

1.14.0



## Define Some Options
This section replaces passing the input parameters as command-line arguments. The options dictionary below controls the entire run, so review it carefully.

### Task Options:
- **do_train:** Boolean of whether we are training
- **do_eval:** Boolean of whether we are just evaluating

### Training Options:
- **sup_train_data_dir:** Input directory for supervised data. This should be set to "./Data/proc_data/GoT/train_##", where ## is one of the subsets of training data generated by the prepro_ALL.csh script.
- **eval_data_dir:** Input directory for the evaluation data. This should be the path to the development data used for hyperparameter tuning. We can change this to the test data directory once we are ready for final evaluation. The dev data path is: "./Data/proc_data/GoT/dev"
- **unsup_data_dir:** Input directory for the unsupervised, augmented data. This should be equal to "./Data/proc_data/unsup"
- **bert_config_file:** Path to the JSON file corresponding to the pre-trained BERT model. For us this is: "./bert_pretrained/bert_base/bert_config.json"
- **vocab_file:** The vocabulary file that the BERT model was trained on. This should be equal to "./bert_pretrained/bert_base/vocab.txt"
- **init_checkpoint:** Initial checkpoint from the pre-trained BERT model. This should be equal to: "./bert_pretrained/bert_base/bert_model.ckpt"
- **task_name:** The name of the task to train. This should be equal to "GoT"
- **model_dir:** The output directory where the model checkpoints will be written. This will be set to "model/" followed by a case-specific identifier (for example, "model/base_20").

### Model configuration
- **use_one_hot_embeddings:** Boolean, default = True. If True, tf.one_hot will be used for embedding lookups; otherwise tf.nn.embedding_lookup will be used. On TPUs, this should be True since it is much faster.
- **max_seq_length:** Integer, default = 128. The maximum total sequence length after WordPiece tokenization. Sequences longer than this will be truncated, and sequences shorter than this will be padded. Note that the GoT data was processed so that examples are, on average, close to this length to minimize lost data.
- **model_dropout:** Float, default = -1. Dropout rate for both the attention and the hidden states. A value of -1 does not disable dropout: the rates from the BERT config file stay in place (0.1 in the run logged below).
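
The two lookup strategies named under use_one_hot_embeddings return the same vectors; they differ only in how the rows are selected. The following is a minimal pure-Python sketch of that equivalence. The 4x3 table and the helper names `lookup_by_index` and `lookup_by_one_hot` are made up for illustration and are not part of BERT or UDA.

```python
# Illustrative only: a tiny embedding table with 4 tokens and 3 dimensions.
EMBEDDING_TABLE = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]


def lookup_by_index(table, token_ids):
    """Gather-style lookup: pick rows of the table directly by index."""
    return [list(table[token_id]) for token_id in token_ids]


def lookup_by_one_hot(table, token_ids):
    """One-hot lookup: multiply a one-hot row vector by the table."""
    vocab_size = len(table)
    width = len(table[0])
    rows = []
    for token_id in token_ids:
        one_hot = [1.0 if j == token_id else 0.0 for j in range(vocab_size)]
        rows.append([
            sum(one_hot[j] * table[j][d] for j in range(vocab_size))
            for d in range(width)
        ])
    return rows


token_ids = [2, 0, 3]
gathered = lookup_by_index(EMBEDDING_TABLE, token_ids)
multiplied = lookup_by_one_hot(EMBEDDING_TABLE, token_ids)
print(gathered == multiplied)
```

The one-hot form turns the lookup into a dense matrix multiplication, which is the kind of operation the option description says is faster on TPUs.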

### Training hyper-parameters
- **train_batch_size:** Integer, default = 8 in the dictionary below. Per the discussion at https://github.com/google-research/bert#out-of-memory-issues, 32 is probably the largest batch size we can run with 11 GB of GPU memory while using BERT base with a maximum sequence length of 128.
- **eval_batch_size:** Integer, default = 8. Base batch size for evaluation.
- **save_checkpoints_num:** Integer, default = 20. Number of checkpoints to save during training.
- **iterations_per_loop:** Integer, default = 200. Number of steps to make in each estimator call.
- **num_train_steps:** Integer, no default. Number of training steps.

### Optimizer hyperparameters
- **learning_rate:** Float, default = 2e-5. The initial learning rate for the Adam optimizer.
- **num_warmup_steps:** Integer, no default. Number of warmup steps.
- **clip_norm:** Float, default = 1.0. Gradient clip hyperparameter.
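
To make the roles of learning_rate, num_warmup_steps, and num_train_steps concrete, here is a minimal sketch of a warmup schedule. It assumes the schedule matches the BERT reference optimizer (linear warmup, then linear decay to zero); we have not verified that the uda module uses exactly this form, and the function name `learning_rate_at` is hypothetical.

```python
def learning_rate_at(step, init_lr, num_train_steps, num_warmup_steps):
    """Assumed schedule: linear warmup to init_lr, then linear decay to zero."""
    if step < num_warmup_steps:
        # During warmup the rate grows in proportion to the step count.
        return init_lr * step / num_warmup_steps
    # After warmup the rate decays linearly, reaching zero at num_train_steps.
    step = min(step, num_train_steps)
    return init_lr * (1.0 - step / num_train_steps)


# Illustrative values: a 3e-5 learning rate, 3000 training steps, 300 warmup steps.
for step in (0, 150, 300, 1500, 3000):
    print(step, learning_rate_at(step, 3e-5, 3000, 300))
```

Under this assumed form, the rate at the end of warmup follows the decay curve rather than the full initial value, so the peak learning rate is slightly below learning_rate.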

### UDA Options:
- **unsup_ratio:** Integer. Ratio between the unsupervised batch size and the supervised batch size. If zero, UDA is not used.
- **aug_ops:** String. The augmentation procedure to run.
- **aug_copy:** Integer. Number of augmentations to generate per example.
- **uda_coeff:** Float, default = 1. Coefficient on the UDA loss, which controls how heavily training relies on the UDA loss relative to the supervised loss. The UDA paper generally kept this at 1.
- **tsa:** String. Annealing schedule to use. Options are "" (none), linear_schedule, log_schedule, and exp_schedule.
- **uda_softmax_temp:** Float, default = -1. A smaller temperature will accentuate differences in probabilities. Low temperatures were used in the UDA paper for cases with few labeled examples, after masking out uncertain predictions.
- **uda_confidence_thresh:** Float, default = -1. Threshold value above which the consistency loss term from UDA is used. This ensures we are not using the loss from predictions that are close to random guesses.
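
The effect of uda_softmax_temp and uda_confidence_thresh can be illustrated with a small pure-Python sketch. The logits, the temperature of 0.4, and the threshold of 0.5 are made-up values, and the helper names are hypothetical; this is not the uda module's code.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def keep_for_consistency_loss(probs, confidence_thresh):
    """True if the largest predicted probability exceeds the threshold."""
    return max(probs) > confidence_thresh


logits = [1.0, 2.0, 3.0, 0.5, 0.0]            # made-up logits for five labels
plain = softmax(logits)                        # temperature 1.0
sharpened = softmax(logits, temperature=0.4)   # lower temperature
uniform_guess = [0.2] * 5                      # a prediction no better than chance

print(max(plain), max(sharpened))
print(keep_for_consistency_loss(uniform_guess, 0.5))
print(keep_for_consistency_loss(sharpened, 0.5))
```

The lower temperature concentrates probability on the top label, while the threshold drops the near-uniform prediction from the consistency loss.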

### TPU and GPU Options:
- **use_tpu:** Boolean. Whether to run on a TPU, which affects how the model is run; this could matter if we run in Colab. False means use CPU or GPU. We default to False.
- **tpu_name:** String - address of the tpu
- **gcp_project:** String - project name when using TPU
- **tpu_zone:** String - can be set or detected
- **master:** Address of the TPU master, if applicable



### Defaults

The defaults below should not be changed. Note that a config file will be read in after this in order to update these if desired.

In [2]:
options = {
### Training Options:
'bert_config_file' : "./bert_pretrained/bert_base/bert_config.json",
'vocab_file' : "./bert_pretrained/bert_base/vocab.txt",
'init_checkpoint' : "./bert_pretrained/bert_base/bert_model.ckpt",
'task_name' : "GoT",

### Directory locations:
'sup_train_data_dir': None,
'eval_data_dir': None,
'unsup_data_dir': None,
    
### Model configuration
'use_one_hot_embeddings' : True,
'max_seq_length' : 128,
'model_dropout' : -1 ,

### Training hyper-parameters
'train_batch_size' : 8,
'eval_batch_size' : 8,
'save_checkpoints_num' : 20,
'iterations_per_loop' : 200,

### Optimizer hyperparameters
'learning_rate' : 2e-5,
'clip_norm' : 1.0,

### UDA Options - only important if using UDA
'aug_ops': "",
'aug_copy': -1,
'unsup_ratio' : 0,
'uda_coeff' : 1 ,
'tsa' : "" ,
'uda_softmax_temp' : -1,
'uda_confidence_thresh' : -1,

### TPU and GPU Options:
'use_tpu': False,
'master' : None
}

## Set the Case to Run
Selecting a case here ensures that different configurations are controlled and saved separately. Load the YAML file that specifies all of the parameters for the case.

In [3]:
# Set the config file to load - controls what is run
config = 'base_20'
with open('./config/' + config + '.yml', 'r') as config_in:
    options_from_file = yaml.safe_load(config_in)
    print()
    print("="*50 + "\nCase Specific Options: \n" + "="*50)
    pprint.pprint(options_from_file)

# Merge dictionaries; values from the config file override the defaults
options.update(options_from_file)

# Print the merged options
print()
print("="*50 + "\nFull Listing of Options: \n" + "="*50)
pprint.pprint(options)


Case Specific Options: 
{'bert_config_file': './bert_pretrained/bert_base/bert_config.json',
 'do_eval': True,
 'do_train': True,
 'eval_data_dir': './Data/proc_data/GoT/dev',
 'init_checkpoint': './bert_pretrained/bert_base/bert_model.ckpt',
 'learning_rate': 3e-05,
 'model_dir': 'model/base_20',
 'num_train_steps': 3000,
 'num_warmup_steps': 300,
 'save_checkpoints_num': 48,
 'sup_train_data_dir': './Data/proc_data/GoT/train_20',
 'task_name': 'GoT',
 'use_tpu': False,
 'vocab_file': './bert_pretrained/bert_base/vocab.txt'}

Full Listing of Options: 
{'aug_copy': -1,
 'aug_ops': '',
 'bert_config_file': './bert_pretrained/bert_base/bert_config.json',
 'clip_norm': 1.0,
 'do_eval': True,
 'do_train': True,
 'eval_batch_size': 8,
 'eval_data_dir': './Data/proc_data/GoT/dev',
 'init_checkpoint': './bert_pretrained/bert_base/bert_model.ckpt',
 'iterations_per_loop': 200,
 'learning_rate': 3e-05,
 'master': None,
 'max_seq_length': 128,
 'model_dir': 'model/base_20',
 'model_dropout': -1
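
The merge in the cell above relies on `dict.update`, so any key in the case file silently overrides the default, and options with no default (such as num_train_steps) exist only if the case file supplies them. A minimal sketch of the same merge with an explicit check follows; `merge_options` and `REQUIRED_KEYS` are hypothetical names, and the list of required keys is our assumption based on the option descriptions.

```python
# Hypothetical helper, not part of the notebook or the uda module.
REQUIRED_KEYS = ("model_dir", "num_train_steps", "num_warmup_steps",
                 "do_train", "do_eval")


def merge_options(defaults, overrides, required=REQUIRED_KEYS):
    """Return defaults updated with overrides; fail early on missing keys."""
    merged = dict(defaults)   # copy so the defaults are not mutated
    merged.update(overrides)  # case-specific values win
    missing = [key for key in required if key not in merged]
    if missing:
        raise KeyError("missing required options: {}".format(", ".join(missing)))
    return merged


defaults = {"learning_rate": 2e-5, "train_batch_size": 8,
            "save_checkpoints_num": 20}
case = {"learning_rate": 3e-5, "save_checkpoints_num": 48,
        "model_dir": "model/base_20", "num_train_steps": 3000,
        "num_warmup_steps": 300, "do_train": True, "do_eval": True}

merged = merge_options(defaults, case)
print(merged["learning_rate"], merged["train_batch_size"])
```

Failing at merge time gives a clearer error than a KeyError raised later, in the middle of model setup.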

## Setup the Job
This section of the code grabs the right data and reads in the BERT config file. We also dump our configuration options to a JSON file in the model directory.

In [4]:
# Record informational logs
logging.set_verbosity(logging.INFO)

# Specify the task as that controls how the data is read and cleaned
processor = raw_data_utils.get_processor(options['task_name'])

# Read in the labels
label_list = processor.get_labels()

# Check the labels  -  they should be 1 through 5
print(label_list)

# Read the BertConfig File
bert_config = modeling.BertConfig.from_json_file(
      options['bert_config_file'],
      options['model_dropout'])

# Create the directory for the current model
tf.io.gfile.makedirs(options['model_dir'])

# Dump the configuration dictionary to an output JSON file in the model-specific directory.
# Write with GFile rather than tf.io.write_file, which in TF 1.x graph mode only creates an op and writes nothing until it is run in a session.
with tf.io.gfile.GFile(os.path.join(options['model_dir'], "OPTIONS.json"), "w") as options_out:
    json.dump(options, options_out)

['1', '2', '3', '4', '5']

INFO:tensorflow:Setting up BERT Config using data from ./bert_pretrained/bert_base/bert_config.json
INFO:tensorflow:Setting up BERT Config using data from {'attention_probs_dropout_prob': 0.1, 'hidden_act': 'gelu', 'hidden_dropout_prob': 0.1, 'hidden_size': 768, 'initializer_range': 0.02, 'intermediate_size': 3072, 'max_position_embeddings': 512, 'num_attention_heads': 12, 'num_hidden_layers': 12, 'type_vocab_size': 2, 'vocab_size': 30522}

## Model Specific Setup

In [5]:
logging.info("warmup steps {}/{}".format(
      options['num_warmup_steps'], options['num_train_steps']))

# Number of steps between checkpoints: integer division of the total number of training steps by the number of checkpoints
save_checkpoints_steps = options['num_train_steps'] // options['save_checkpoints_num']

# Log the checkpoint interval
logging.info("setting save checkpoints steps to {:d}".format(
      save_checkpoints_steps))

# Update iterations per loop
options['iterations_per_loop'] = min(save_checkpoints_steps,
                                  options['iterations_per_loop'])

INFO:absl:warmup steps 300/3000
INFO:absl:setting save checkpoints steps to 62
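
The arithmetic behind the logged value can be reproduced in a few lines. This sketch uses the base_20 values shown above and assumes that each `estimator.train(steps=...)` call in the training loop runs exactly that many additional steps.

```python
num_train_steps = 3000
save_checkpoints_num = 48
iterations_per_loop = 200

# Integer division gives the number of steps between checkpoints.
save_checkpoints_steps = num_train_steps // save_checkpoints_num

# iterations_per_loop is capped at the checkpoint interval.
iterations_per_loop = min(save_checkpoints_steps, iterations_per_loop)

# The training loop iterates over range(0, num_train_steps, save_checkpoints_steps).
train_calls = len(range(0, num_train_steps, save_checkpoints_steps))
total_steps = train_calls * save_checkpoints_steps

print(save_checkpoints_steps, iterations_per_loop, train_calls, total_steps)
```

Because 3000 is not a multiple of 62, the loop makes 49 train/evaluate rounds rather than 48 and, under the stated assumption, runs 3038 steps in total, slightly more than num_train_steps.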


## Setup Hardware and Run Configuration

In [6]:
# To run on TPUs, the config file must provide tpu_name (and optionally tpu_zone and gcp_project); these are used to create a ClusterResolver object
if options['use_tpu'] and options.get('tpu_name'):
    tpu_cluster_resolver = tf.contrib.cluster_resolver.TPUClusterResolver(
        options['tpu_name'], zone=options.get('tpu_zone'), project=options.get('gcp_project'))
else:
    tpu_cluster_resolver = None

is_per_host = tf.contrib.tpu.InputPipelineConfig.PER_HOST_V2
run_config = tf.estimator.tpu.RunConfig(
      cluster=tpu_cluster_resolver,
      master=options['master'],
      model_dir=options['model_dir'],
#      save_checkpoints_steps=save_checkpoints_steps,
#      keep_checkpoint_max=1000,
      tpu_config=tf.contrib.tpu.TPUConfig(
          iterations_per_loop=options['iterations_per_loop'],
          per_host_input_for_training=is_per_host))


The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

## Create our model
Feed our hyperparameters and model configuration information to the model function builder in the uda module.

In [7]:
model_fn = uda.model_fn_builder(
      bert_config=bert_config,
      init_checkpoint=options['init_checkpoint'],
      learning_rate=options['learning_rate'],
      clip_norm=options['clip_norm'],
      num_train_steps=options['num_train_steps'],
      num_warmup_steps=options['num_warmup_steps'],
      use_tpu=options['use_tpu'],
      use_one_hot_embeddings=options['use_one_hot_embeddings'],
      num_labels=len(label_list),
      unsup_ratio=options['unsup_ratio'],
      uda_coeff=options['uda_coeff'],
      tsa=options['tsa'],
      print_feature=False,
      print_structure=False,
  )

# # If TPU is not available, this will fall back to normal Estimator on CPU or GPU.
estimator = tf.estimator.tpu.TPUEstimator(
      use_tpu=options['use_tpu'],
      model_fn=model_fn,
      config=run_config,
     params={"model_dir": options['model_dir']},
      train_batch_size=options['train_batch_size'],
      eval_batch_size=options['eval_batch_size'])

# # Use base Estimator Class instead of tpu derivative
# estimator = tf.estimator.Estimator(
#       model_fn=model_fn,
#       config=run_config,
#       params={"model_dir": options['model_dir']},
# #       train_batch_size=options['train_batch_size'],
# #       eval_batch_size=options['eval_batch_size'])

INFO:tensorflow:Using config: {'_model_dir': 'model/base_20', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f24d931e250>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=62, num_shards=None, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_nam

INFO:tensorflow:_TPUContext: eval_on_tpu True

## Ready to Train

In [None]:
# Check whether we are training (as opposed to only evaluating)
if options['do_train']:
    logging.info("  >>> sup data dir : {}".format(options['sup_train_data_dir']))
    
    # Are we doing UDA or just simple finetuning?
    if options['unsup_ratio'] > 0:
        logging.info("  >>> unsup data dir : {}".format(
          options['unsup_data_dir']))
    
    # Pass on all of the training sup/unsup options
    train_input_fn = proc_data_utils.training_input_fn_builder(
        options['sup_train_data_dir'],
        options['unsup_data_dir'],
        options['aug_ops'],
        options['aug_copy'],
        options['unsup_ratio'],
        max_seq_len=options['max_seq_length'])

# Logical check to see if we are evaluating against the development set (or test set if you change the eval_data_dir)
if options['do_eval']:
    logging.info("  >>> dev data dir : {}".format(options['eval_data_dir']))
    eval_input_fn = proc_data_utils.evaluation_input_fn_builder(
        options['eval_data_dir'],
        "clas")

    eval_size = processor.get_dev_size()
    eval_steps = int(eval_size / options['eval_batch_size'])

# If we are both training and evaluating
if options['do_train'] and options['do_eval']:
    logging.info("***** Running training & evaluation *****")
    logging.info("  Supervised batch size = {:d}".format(
        options['train_batch_size']))
    logging.info("  Unsupervised batch size = {:d}".format(
        options['train_batch_size'] * options['unsup_ratio']))
    logging.info("  Num steps = {}".format(options['num_train_steps']))
    logging.info("  Base evaluation batch size = {:d}".format(
        options['eval_batch_size']))
    logging.info("  Num steps = {:d}".format(eval_steps))
    
    # Initialize
    best_acc = 0
    
    # Looping over training steps by subset (for each checkpoint)
    for _ in range(0, options['num_train_steps'], save_checkpoints_steps):
        logging.info("*** Running training ***")
        
        estimator.train(
              input_fn=train_input_fn,
              steps=save_checkpoints_steps)
        
        logging.info("*** Running evaluation ***")
        dev_result = estimator.evaluate(input_fn=eval_input_fn, steps=eval_steps)
        logging.info(">> Results:")
        
        # Keep track of the evaluation results
        for key in dev_result.keys():
            logging.info("  {} = {}".format(key, str(dev_result[key])))
            dev_result[key] = dev_result[key].item()
        
        # Update the best accuracy object
        best_acc = max(best_acc, dev_result["eval_classify_accuracy"])
    logging.info("***** Final evaluation result *****")
    logging.info("Best acc: {:.3f}\n\n".format(best_acc))
elif options['do_train']:
    logging.info("***** Running training *****")
    logging.info("  Supervised batch size = {}".format(options['train_batch_size']))
    logging.info("  Unsupervised batch size = {}".format(
                    options['train_batch_size'] * options['unsup_ratio']))
    logging.info("  Num steps = {}".format(options['num_train_steps']))
    estimator.train(input_fn=train_input_fn, max_steps=options['num_train_steps'])
elif options['do_eval']:
    logging.info("***** Running evaluation *****")
    logging.info("  Base evaluation batch size = {}".format(options['eval_batch_size']))
    logging.info("  Num steps = {}".format(eval_steps))
    
    # Load in the checkpoint from training to do the evaluation
    checkpoint_state = tf.train.get_checkpoint_state(options['model_dir'])

    best_acc = 0
    for ckpt_path in checkpoint_state.all_model_checkpoint_paths:
        # Skip only the checkpoints whose data file is missing
        if not tf.io.gfile.exists(ckpt_path + ".data-00000-of-00001"):
            logging.info(
                "Warning: checkpoint {:s} does not exist".format(ckpt_path))
            continue
        logging.info("Evaluating {:s}".format(ckpt_path))
        dev_result = estimator.evaluate(
          input_fn=eval_input_fn,
          steps=eval_steps,
          checkpoint_path=ckpt_path,)
        logging.info(">> Results:")
        
        # keep track of evaluation metrics
        for key in dev_result.keys():
            logging.info("  {:s} = {:s}".format(key, str(dev_result[key])))
            dev_result[key] = dev_result[key].item()
        
        # update our best accuracy variable
        best_acc = max(best_acc, dev_result["eval_classify_accuracy"])
    logging.info("***** Final evaluation result *****")
    logging.info("Best acc: {:.3f}\n\n".format(best_acc))

INFO:absl:  >>> sup data dir : ./Data/proc_data/GoT/train_20
INFO:tensorflow:looking in ./Data/proc_data/GoT/train_20 for files
INFO:tensorflow:loading training data from these files: ./Data/proc_data/GoT/train_20/tf_examples.tfrecord.0.0
INFO:absl:  >>> dev data dir : ./Data/proc_data/GoT/dev
INFO:tensorflow:loading eval clas data from these files: ./Data/proc_data/GoT/dev/tf_examples.tfrecord.0.0
INFO:absl:***** Running training & evaluation *****
INFO:absl:  Supervised batch size = 8
INFO:absl:  Unsupervised batch size = 0
INFO:absl:  Num steps = 3000
INFO:absl:  Base evaluation batch size = 8
INFO:absl:  Num steps = 312
INFO:absl:*** Running training ***
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:sup batch size: 8
INFO:tensorflow:***** Max Sequence Length = 128 *****
<DatasetV1Adapter shapes: (), types: tf.string>
Instructions for updating:
Use `tf.data.Dataset.shuffle(buffer_size, seed)` followed by `tf.data.Dataset.repeat(count)`. Static tf.data optimizations will take care of using the fused implementation.
Instructions for updating:
Use `tf.data.Dataset.map(map_func, num_parallel_calls)` followed by `tf.data.Dataset.batch(batch_size, drop_remainder)`. Static tf.data optimizations will take care of using the fused implementation.
INFO:tensorflow:sup batch size: 8
INFO:tensorflow:total sample in a batch: 8
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running train on CPU

INFO:absl:Creating supervised model
INFO:tensorflow:asserting rank for IteratorGetNext:0
INFO:tensorflow:Looking up embeddings using the embedding_lookup funcion
INFO:tensorflow:Creating bert embeddings in the embedding_lookup function of bert/modeling.py
INFO:tensorflow:Post-processing the word embeddings.
INFO:tensorflow:asserting rank for bert/embeddings/Reshape_1:0
INFO:tensorflow:Adding token type embeddings.
INFO:tensorflow:Adding positional embeddings
INFO:tensorflow:Normalizing layer - centering and scaling
INFO:tensorflow:Adding dropout to layer at rate of 0.10
INFO:tensorflow:asserting rank for IteratorGetNext:0
INFO:tensorflow:asserting rank for IteratorGetNext:1
INFO:tensorflow:asserting rank for bert/embeddings/dropout/mul_1:0
INFO:tensorflow:asserting rank for bert_1/encoder/Reshape_1:0
INFO:tensorflow:asserting rank for bert_1/encoder/Reshape_1:0

Instructions for updating:
Use keras.layers.dense instead.
INFO:tensorflow:Adding dropout to layer at rate of 0.10
INFO:tensorflow:Adding dropout to layer at rate of 0.10
INFO:tensorflow:Normalizing layer - centering and scaling
INFO:tensorflow:Adding dropout to layer at rate of 0.10
INFO:tensorflow:Normalizing layer - centering and scaling
INFO:tensorflow:asserting rank for bert_1/encoder/layer_0/output/LayerNorm/batchnorm/add_1:0
INFO:tensorflow:asserting rank for bert_1/encoder/layer_0/output/LayerNorm/batchnorm/add_1:0

INFO:tensorflow:Adding dropout to layer at rate of 0.10
INFO:tensorflow:Adding dropout to layer at rate of 0.10
INFO:tensorflow:Normalizing layer - centering and scaling
INFO:tensorflow:Adding dropout to layer at rate of 0.10
INFO:tensorflow:Normalizing layer - centering and scaling
INFO:tensorflow:asserting rank for bert_1/encoder/layer_1/output/LayerNorm/batchnorm/add_1:0
INFO:tensorflow:asserting rank for bert_1/encoder/layer_1/output/LayerNorm/batchnorm/add_1:0

INFO:tensorflow:Adding dropout to layer at rate of 0.10
INFO:tensorflow:Adding dropout to layer at rate of 0.10
INFO:tensorflow:Normalizing layer - centering and scaling
INFO:tensorflow:Adding dropout to layer at rate of 0.10
INFO:tensorflow:Normalizing layer - centering and scaling
INFO:tensorflow:asserting rank for bert_1/encoder/layer_2/output/LayerNorm/batchnorm/add_1:0
INFO:tensorflow:asserting rank for bert_1/encoder/layer_2/output/LayerNorm/batchnorm/add_1:0

INFO:tensorflow:Adding dropout to layer at rate of 0.10
INFO:tensorflow:Adding dropout to layer at rate of 0.10
INFO:tensorflow:Normalizing layer - centering and scaling
INFO:tensorflow:Adding dropout to layer at rate of 0.10
INFO:tensorflow:Normalizing layer - centering and scaling
INFO:tensorflow:asserting rank for bert_1/encoder/layer_3/output/LayerNorm/batchnorm/add_1:0
INFO:tensorflow:asserting rank for bert_1/encoder/layer_3/output/LayerNorm/batchnorm/add_1:0

INFO:tensorflow:Adding dropout to layer at rate of 0.10
INFO:tensorflow:Adding dropout to layer at rate of 0.10
INFO:tensorflow:Normalizing layer - centering and scaling
INFO:tensorflow:Adding dropout to layer at rate of 0.10
INFO:tensorflow:Normalizing layer - centering and scaling
INFO:tensorflow:asserting rank for bert_1/encoder/layer_4/output/LayerNorm/batchnorm/add_1:0
INFO:tensorflow:asserting rank for bert_1/encoder/layer_4/output/LayerNorm/batchnorm/add_1:0

INFO:tensorflow:Adding dropout to layer at rate of 0.10
INFO:tensorflow:Adding dropout to layer at rate of 0.10
INFO:tensorflow:Normalizing layer - centering and scaling
INFO:tensorflow:Adding dropout to layer at rate of 0.10
INFO:tensorflow:Normalizing layer - centering and scaling
INFO:tensorflow:asserting rank for bert_1/encoder/layer_5/output/LayerNorm/batchnorm/add_1:0
INFO:tensorflow:asserting rank for bert_1/encoder/layer_5/output/LayerNorm/batchnorm/add_1:0

INFO:tensorflow:Adding dropout to layer at rate of 0.10
INFO:tensorflow:Adding dropout to layer at rate of 0.10
INFO:tensorflow:Normalizing layer - centering and scaling
INFO:tensorflow:Adding dropout to layer at rate of 0.10
INFO:tensorflow:Normalizing layer - centering and scaling
INFO:tensorflow:asserting rank for bert_1/encoder/layer_6/output/LayerNorm/batchnorm/add_1:0
INFO:tensorflow:asserting rank for bert_1/encoder/layer_6/output/LayerNorm/batchnorm/add_1:0

INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Normalizing layer - centering and scaling


INFO:tensorflow:Normalizing layer - centering and scaling


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Normalizing layer - centering and scaling


INFO:tensorflow:Normalizing layer - centering and scaling


INFO:tensorflow:asserting rank for bert_1/encoder/layer_7/output/LayerNorm/batchnorm/add_1:0


INFO:tensorflow:asserting rank for bert_1/encoder/layer_7/output/LayerNorm/batchnorm/add_1:0


INFO:tensorflow:asserting rank for bert_1/encoder/layer_7/output/LayerNorm/batchnorm/add_1:0


INFO:tensorflow:asserting rank for bert_1/encoder/layer_7/output/LayerNorm/batchnorm/add_1:0


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Normalizing layer - centering and scaling


INFO:tensorflow:Normalizing layer - centering and scaling


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Normalizing layer - centering and scaling


INFO:tensorflow:Normalizing layer - centering and scaling


INFO:tensorflow:asserting rank for bert_1/encoder/layer_8/output/LayerNorm/batchnorm/add_1:0


INFO:tensorflow:asserting rank for bert_1/encoder/layer_8/output/LayerNorm/batchnorm/add_1:0


INFO:tensorflow:asserting rank for bert_1/encoder/layer_8/output/LayerNorm/batchnorm/add_1:0


INFO:tensorflow:asserting rank for bert_1/encoder/layer_8/output/LayerNorm/batchnorm/add_1:0


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Normalizing layer - centering and scaling


INFO:tensorflow:Normalizing layer - centering and scaling


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Normalizing layer - centering and scaling


INFO:tensorflow:Normalizing layer - centering and scaling


INFO:tensorflow:asserting rank for bert_1/encoder/layer_9/output/LayerNorm/batchnorm/add_1:0


INFO:tensorflow:asserting rank for bert_1/encoder/layer_9/output/LayerNorm/batchnorm/add_1:0


INFO:tensorflow:asserting rank for bert_1/encoder/layer_9/output/LayerNorm/batchnorm/add_1:0


INFO:tensorflow:asserting rank for bert_1/encoder/layer_9/output/LayerNorm/batchnorm/add_1:0


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Normalizing layer - centering and scaling


INFO:tensorflow:Normalizing layer - centering and scaling


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Normalizing layer - centering and scaling


INFO:tensorflow:Normalizing layer - centering and scaling


INFO:tensorflow:asserting rank for bert_1/encoder/layer_10/output/LayerNorm/batchnorm/add_1:0


INFO:tensorflow:asserting rank for bert_1/encoder/layer_10/output/LayerNorm/batchnorm/add_1:0


INFO:tensorflow:asserting rank for bert_1/encoder/layer_10/output/LayerNorm/batchnorm/add_1:0


INFO:tensorflow:asserting rank for bert_1/encoder/layer_10/output/LayerNorm/batchnorm/add_1:0


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Normalizing layer - centering and scaling


INFO:tensorflow:Normalizing layer - centering and scaling


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Adding dropout to layer at rate of 0.10


INFO:tensorflow:Normalizing layer - centering and scaling


INFO:tensorflow:Normalizing layer - centering and scaling


Instructions for updating:
Deprecated in favor of operator or tf.math.divide.


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Create CheckpointSaverHook.


INFO:tensorflow:Graph was finalized.


Instructions for updating:
Use standard file APIs to check for files with this prefix.


INFO:tensorflow:Restoring parameters from model/base_20/model.ckpt-0


Instructions for updating:
Use standard file utilities to get mtimes.


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Done running local_init_op.


INFO:tensorflow:Saving checkpoints for 0 into model/base_20/model.ckpt.

