# BERT

BERT, or Bidirectional Encoder Representations from Transformers, is a new method of pre-training language representations that obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks.

The academic paper that describes BERT in detail and provides full results on a number of tasks can be found here: https://arxiv.org/abs/1810.04805.

The GitHub repository for the paper can be found here: https://github.com/google-research/bert

BERT is a method of pre-training language representations: a general-purpose "language understanding" model is trained on a large text corpus (like Wikipedia) and then used for downstream NLP tasks (like question answering). BERT outperforms previous methods because it is the first *unsupervised, deeply bidirectional* system for pre-training NLP.

![](https://www.lyrn.ai/wp-content/uploads/2018/11/transformer.png)


# Downloading all necessary dependencies
You will need to turn on internet access for this.

This code is a slightly modified version of this Colab notebook: https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb

In [2]:
import pandas as pd
import os
import numpy as np
import zipfile
from matplotlib import pyplot as plt
%matplotlib inline
import sys
import datetime

In [17]:
# Download the pretrained weights and the configuration file for the model
!wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip

--2019-03-16 11:24:14--  https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.193.128, 2a00:1450:400b:c01::80
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.193.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 407727028 (389M) [application/zip]
Saving to: ‘uncased_L-12_H-768_A-12.zip’


2019-03-16 11:24:17 (177 MB/s) - ‘uncased_L-12_H-768_A-12.zip’ saved [407727028/407727028]



In [18]:
repo = 'model_repo'
with zipfile.ZipFile("uncased_L-12_H-768_A-12.zip","r") as zip_ref:
    zip_ref.extractall(repo)

In [19]:
!ls 'model_repo/uncased_L-12_H-768_A-12'

bert_config.json		     bert_model.ckpt.index  vocab.txt
bert_model.ckpt.data-00000-of-00001  bert_model.ckpt.meta
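Before going further, it can help to check that the extraction produced every file the later cells rely on. The snippet below is a minimal sketch of such a check. The file list is taken from the `ls` output above. `missing_bert_files` and `EXPECTED_FILES` are hypothetical names introduced here; they are not part of the BERT code.

```python
from pathlib import Path

# Files shown by `ls` above for the uncased BERT-Base download (assumed complete).
EXPECTED_FILES = [
    "bert_config.json",
    "vocab.txt",
    "bert_model.ckpt.index",
    "bert_model.ckpt.meta",
    "bert_model.ckpt.data-00000-of-00001",
]


def missing_bert_files(model_dir):
    """Return the expected checkpoint files that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (model_dir / name).is_file()]


missing = missing_bert_files("model_repo/uncased_L-12_H-768_A-12")
if missing:
    print("Missing files:", missing)
```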


In [5]:
!wget https://raw.githubusercontent.com/google-research/bert/master/modeling.py 
!wget https://raw.githubusercontent.com/google-research/bert/master/optimization.py 
!wget https://raw.githubusercontent.com/google-research/bert/master/run_classifier.py 
!wget https://raw.githubusercontent.com/google-research/bert/master/tokenization.py 

--2019-03-12 10:25:25--  https://raw.githubusercontent.com/google-research/bert/master/modeling.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.16.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.16.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 37922 (37K) [text/plain]
Saving to: ‘modeling.py’


2019-03-12 10:25:25 (3.48 MB/s) - ‘modeling.py’ saved [37922/37922]

--2019-03-12 10:25:26--  https://raw.githubusercontent.com/google-research/bert/master/optimization.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.16.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.16.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6258 (6.1K) [text/plain]
Saving to: ‘optimization.py’


2019-03-12 10:25:26 (136 MB/s) - ‘optimization.py’ saved [6258/6258]

--2019-03-12 10:25:26--  https://raw.githubusercontent.com/google-researc

The example below uses preprocessing code similar to the code for **CoLA**:

The Corpus of Linguistic Acceptability (CoLA) is a binary single-sentence classification task, where the goal is to predict whether an English sentence is linguistically “acceptable” or not.

You can use a pretrained BERT model for a wide variety of tasks, including classification.
The CoLA task is close to the task of the Quora competition, so I thought it would be interesting to use that example.
Obviously, outside sources aren't allowed in the Quora competition, so you won't be able to use BERT to submit a prediction.

In [4]:
# Available pretrained model checkpoints:
#   uncased_L-12_H-768_A-12: uncased BERT-Base model
#   uncased_L-24_H-1024_A-16: uncased BERT-Large model
#   cased_L-12_H-768_A-12: cased BERT-Base model
# We use the smallest of them, uncased BERT-Base.
BERT_MODEL = 'uncased_L-12_H-768_A-12'
BERT_PRETRAINED_DIR = f'{repo}/{BERT_MODEL}'
OUTPUT_DIR = f'{repo}/outputs'
print(f'***** Model output directory: {OUTPUT_DIR} *****')
print(f'***** BERT pretrained directory: {BERT_PRETRAINED_DIR} *****')


***** Model output directory: model_repo/outputs *****
***** BERT pretrained directory: model_repo/uncased_L-12_H-768_A-12 *****
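The checkpoint names follow the convention from the google-research/bert README. `L` is the number of Transformer layers, `H` the hidden size and `A` the number of attention heads. The `uncased`/`cased` prefix is what `DO_LOWER_CASE` is derived from further down. The helper below is a small illustrative sketch that decodes the three names listed above. `parse_bert_model_name` is a hypothetical function, and it deliberately rejects other name patterns.

```python
import re


def parse_bert_model_name(name):
    """Decode names like 'uncased_L-12_H-768_A-12' (L = layers, H = hidden size, A = heads)."""
    match = re.fullmatch(r"(cased|uncased)_L-(\d+)_H-(\d+)_A-(\d+)", name)
    if match is None:
        raise ValueError(f"Unrecognized BERT model name: {name}")
    casing, layers, hidden, heads = match.groups()
    return {
        "lowercase": casing == "uncased",  # uncased models expect lower-cased input
        "layers": int(layers),
        "hidden_size": int(hidden),
        "attention_heads": int(heads),
    }


print(parse_bert_model_name("uncased_L-12_H-768_A-12"))
```

The 768 hidden size decoded here is the same dimension that shows up in the variable shapes printed by TensorFlow during training, for example `(768, 768)` for the attention kernels.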


In [5]:
from sklearn.model_selection import train_test_split

train_df = pd.read_csv('input/train.csv')

# Hold out 10% of the labeled training data as a local test set.
train, test = train_test_split(train_df, test_size=0.1, random_state=42)

train_lines, train_labels = train.question_text.values, train.target.values
test_lines, test_labels = test.question_text.values, test.target.values

In [16]:
! ls model_repo/outputs/
# Uncomment to delete checkpoints left over from earlier runs:
#! rm -r model_repo/outputs

checkpoint
events.out.tfevents.1552492994.TestPri2
events.out.tfevents.1552506308.TestPri2
events.out.tfevents.1552509591.TestPri2
events.out.tfevents.1552515231.TestPri2
events.out.tfevents.1552576467.TestPri2
events.out.tfevents.1552577887.TestPri2
events.out.tfevents.1552586048.TestPri2
events.out.tfevents.1552599670.TestPri2
events.out.tfevents.1552608533.TestPri2
graph.pbtxt
graph.pbtxt.tmp7905681f66134ab8aef12e8b07bddc9f
model.ckpt-88000.data-00000-of-00001
model.ckpt-88000.index
model.ckpt-88000.meta
model.ckpt-89000.data-00000-of-00001
model.ckpt-89000.index
model.ckpt-89000.meta
model.ckpt-90000.data-00000-of-00001
model.ckpt-90000.index
model.ckpt-90000.meta
model.ckpt-91000.data-00000-of-00001
model.ckpt-91000.index
model.ckpt-91000.meta
model.ckpt-91836.data-00000-of-00001
model.ckpt-91836.index
model.ckpt-91836.meta
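The listing shows several `model.ckpt-<step>` checkpoints, one for every `SAVE_CHECKPOINTS_STEPS` steps, plus the final one. In TensorFlow 1.x, `tf.train.latest_checkpoint(OUTPUT_DIR)` is the usual way to find the newest one, because it reads the `checkpoint` file. As a dependency-free illustration, the sketch below picks the highest step from the file names alone. It compares steps as integers, so `model.ckpt-9000` would not outrank `model.ckpt-91836`. `latest_checkpoint_prefix` is a hypothetical helper.

```python
import re


def latest_checkpoint_prefix(filenames):
    """Return the 'model.ckpt-<step>' prefix with the highest step, or None."""
    steps = set()
    for name in filenames:
        # Each checkpoint has exactly one .index file, so match on that.
        match = re.fullmatch(r"model\.ckpt-(\d+)\.index", name)
        if match:
            steps.add(int(match.group(1)))
    if not steps:
        return None
    return f"model.ckpt-{max(steps)}"


# A few names from the listing above; in the notebook use os.listdir(OUTPUT_DIR).
listing = ["checkpoint", "graph.pbtxt", "model.ckpt-88000.index",
           "model.ckpt-91000.index", "model.ckpt-91836.index", "model.ckpt-91836.meta"]
print(latest_checkpoint_prefix(listing))  # model.ckpt-91836
```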


In [12]:
import modeling
import optimization
import run_classifier
import tokenization
import tensorflow as tf


def create_examples(lines, set_type, labels=None):
    """Wrap raw question texts in run_classifier.InputExample objects.

    For the 'train' set the real labels are used; any other set gets a dummy
    label '0', because convert_examples_to_features expects a label.
    """
    guid = f'{set_type}'
    examples = []
    if guid == 'train':
        for line, label in zip(lines, labels):
            examples.append(
                run_classifier.InputExample(guid=guid, text_a=line, text_b=None, label=str(label)))
    else:
        for line in lines:
            examples.append(
                run_classifier.InputExample(guid=guid, text_a=line, text_b=None, label='0'))
    return examples

TRAIN_BATCH_SIZE = 32
EVAL_BATCH_SIZE = 8
LEARNING_RATE = 2e-5
NUM_TRAIN_EPOCHS = 2.5
WARMUP_PROPORTION = 0.1
MAX_SEQ_LENGTH = 128
# Model configs
SAVE_CHECKPOINTS_STEPS = 1000  # If you fine-tune a model on a larger dataset, use a larger interval
# Each checkpoint weighs about 1.5 GB
ITERATIONS_PER_LOOP = 1000
NUM_TPU_CORES = 8
VOCAB_FILE = os.path.join(BERT_PRETRAINED_DIR, 'vocab.txt')
CONFIG_FILE = os.path.join(BERT_PRETRAINED_DIR, 'bert_config.json')
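# Initializes from a checkpoint of an earlier fine-tuning run stored in OUTPUT_DIR.
# For a fresh run, start from the pretrained weights instead:
# INIT_CHECKPOINT = os.path.join(BERT_PRETRAINED_DIR, 'bert_model.ckpt')
INIT_CHECKPOINT = os.path.join(OUTPUT_DIR, 'model.ckpt-91836')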
DO_LOWER_CASE = BERT_MODEL.startswith('uncased')

label_list = ['0', '1']
tokenizer = tokenization.FullTokenizer(vocab_file=VOCAB_FILE, do_lower_case=DO_LOWER_CASE)
train_examples = create_examples(train_lines, 'train', labels=train_labels)

tpu_cluster_resolver = None #Since training will happen on GPU, we won't need a cluster resolver
#TPUEstimator also supports training on CPU and GPU. You don't need to define a separate tf.estimator.Estimator.
run_config = tf.contrib.tpu.RunConfig(
    cluster=tpu_cluster_resolver,
    model_dir=OUTPUT_DIR,
    save_checkpoints_steps=SAVE_CHECKPOINTS_STEPS,
    tpu_config=tf.contrib.tpu.TPUConfig(
        iterations_per_loop=ITERATIONS_PER_LOOP,
        num_shards=NUM_TPU_CORES,
        per_host_input_for_training=tf.contrib.tpu.InputPipelineConfig.PER_HOST_V2))

num_train_steps = int(
    len(train_examples) / TRAIN_BATCH_SIZE * NUM_TRAIN_EPOCHS)
num_warmup_steps = int(num_train_steps * WARMUP_PROPORTION)

model_fn = run_classifier.model_fn_builder(
    bert_config=modeling.BertConfig.from_json_file(CONFIG_FILE),
    num_labels=len(label_list),
    init_checkpoint=INIT_CHECKPOINT,
    learning_rate=LEARNING_RATE,
    num_train_steps=num_train_steps,
    num_warmup_steps=num_warmup_steps,
    use_tpu=False,  # If False, training runs on CPU or GPU, depending on what is available
    # run_classifier.py passes use_tpu here; the one-hot embedding lookup is mainly meant for TPUs.
    use_one_hot_embeddings=True)

estimator = tf.contrib.tpu.TPUEstimator(
    use_tpu=False,  # If False, training runs on CPU or GPU, depending on what is available
    model_fn=model_fn,
    config=run_config,
    train_batch_size=TRAIN_BATCH_SIZE,
    eval_batch_size=EVAL_BATCH_SIZE)

INFO:tensorflow:Using config: {'_model_dir': 'model_repo/outputs', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7faeaa1c5780>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, 
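The number of training steps is the number of examples divided by the batch size, times the number of epochs, truncated to an integer. Warmup is a fixed fraction of that total. The sketch below mirrors the two lines in the cell above, using the 1,175,509 examples reported by the training log. `training_steps` is a hypothetical helper name. The result matches the final `model.ckpt-91836` in the outputs listing, which suggests that checkpoint marks the end of a complete 2.5-epoch run.

```python
def training_steps(num_examples, batch_size, num_epochs, warmup_proportion):
    """Mirror the num_train_steps / num_warmup_steps computation above."""
    num_train_steps = int(num_examples / batch_size * num_epochs)
    num_warmup_steps = int(num_train_steps * warmup_proportion)
    return num_train_steps, num_warmup_steps


# 1,175,509 training examples, TRAIN_BATCH_SIZE = 32,
# NUM_TRAIN_EPOCHS = 2.5, WARMUP_PROPORTION = 0.1
print(training_steps(1175509, 32, 2.5, 0.1))  # (91836, 9183)
```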

In [8]:
"""
Note: You might see a message 'Running train on CPU'. 
This really just means that it's running on something other than a Cloud TPU, which includes a GPU.
"""

# Train the model.
print('Please wait...')
train_features = run_classifier.convert_examples_to_features(
    train_examples, label_list, MAX_SEQ_LENGTH, tokenizer)
print('***** Started training at {} *****'.format(datetime.datetime.now()))
print('  Num examples = {}'.format(len(train_examples)))
print('  Batch size = {}'.format(TRAIN_BATCH_SIZE))
tf.logging.info("  Num steps = %d", num_train_steps)
train_input_fn = run_classifier.input_fn_builder(
    features=train_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=True,
    drop_remainder=True)
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
print('***** Finished training at {} *****'.format(datetime.datetime.now()))

Please wait...
INFO:tensorflow:Writing example 0 of 1175509
INFO:tensorflow:*** Example ***
INFO:tensorflow:guid: train
INFO:tensorflow:tokens: [CLS] are the candidates given a holiday after the 6 months training at indian naval academy ? [SEP]
INFO:tensorflow:input_ids: 101 2024 1996 5347 2445 1037 6209 2044 1996 1020 2706 2731 2012 2796 3987 2914 1029 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

INFO:tensorflow:Writing example 540000 of 1175509
INFO:tensorflow:Writing example 550000 of 1175509
INFO:tensorflow:Writing example 560000 of 1175509
INFO:tensorflow:Writing example 570000 of 1175509
INFO:tensorflow:Writing example 580000 of 1175509
INFO:tensorflow:Writing example 590000 of 1175509
INFO:tensorflow:Writing example 600000 of 1175509
INFO:tensorflow:Writing example 610000 of 1175509
INFO:tensorflow:Writing example 620000 of 1175509
INFO:tensorflow:Writing example 630000 of 1175509
INFO:tensorflow:Writing example 640000 of 1175509
INFO:tensorflow:Writing example 650000 of 1175509
INFO:tensorflow:Writing example 660000 of 1175509
INFO:tensorflow:Writing example 670000 of 1175509
INFO:tensorflow:Writing example 680000 of 1175509
INFO:tensorflow:Writing example 690000 of 1175509
INFO:tensorflow:Writing example 700000 of 1175509
INFO:tensorflow:Writing example 710000 of 1175509
INFO:tensorflow:Writing example 720000 of 1175509
INFO:tensorflow:Writing example 730000 of 1175509


INFO:tensorflow:  name = bert/encoder/layer_2/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_2/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_2/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_2/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_2/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_2/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_2/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_2/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_2/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*

INFO:tensorflow:  name = bert/encoder/layer_7/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_7/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*

INFO:tensorflow:  name = bert/encoder/layer_11/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/pooler/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/pooler/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = output_weights:0, shape = (2, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = output_bias:0, shape = (2,), *INIT_FROM_CKPT*
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.

INFO:tensorflow:examples/sec: 17.3987
INFO:tensorflow:global_step/sec: 0.583535
INFO:tensorflow:examples/sec: 18.6731
INFO:tensorflow:global_step/sec: 0.58215
INFO:tensorflow:examples/sec: 18.6288
INFO:tensorflow:global_step/sec: 0.581951
INFO:tensorflow:examples/sec: 18.6224
INFO:tensorflow:global_step/sec: 0.582295
INFO:tensorflow:examples/sec: 18.6334
INFO:tensorflow:global_step/sec: 0.581645
INFO:tensorflow:examples/sec: 18.6126
INFO:tensorflow:global_step/sec: 0.582195
INFO:tensorflow:examples/sec: 18.6302
INFO:tensorflow:global_step/sec: 0.581786
INFO:tensorflow:examples/sec: 18.6171
INFO:tensorflow:global_step/sec: 0.581987
INFO:tensorflow:examples/sec: 18.6236
INFO:tensorflow:global_step/sec: 0.581738
INFO:tensorflow:examples/sec: 18.6156
INFO:tensorflow:Saving checkpoints for 38000 into model_repo/outputs/model.ckpt.
INFO:tensorflow:global_step/sec: 0.543909
INFO:tensorflow:examples/sec: 17.4051
INFO:tensorflow:global_step/sec: 0.58341
INFO:tensorflow:examples/sec: 18.6691

INFO:tensorflow:examples/sec: 18.6012
INFO:tensorflow:global_step/sec: 0.581128
INFO:tensorflow:examples/sec: 18.5961
INFO:tensorflow:global_step/sec: 0.580779
INFO:tensorflow:examples/sec: 18.5849
INFO:tensorflow:global_step/sec: 0.582397
INFO:tensorflow:examples/sec: 18.6367
INFO:tensorflow:global_step/sec: 0.580843
INFO:tensorflow:examples/sec: 18.587
INFO:tensorflow:global_step/sec: 0.581428
INFO:tensorflow:examples/sec: 18.6057
INFO:tensorflow:Saving checkpoints for 47000 into model_repo/outputs/model.ckpt.
INFO:tensorflow:global_step/sec: 0.543528
INFO:tensorflow:examples/sec: 17.3929
INFO:tensorflow:global_step/sec: 0.582539
INFO:tensorflow:examples/sec: 18.6412
INFO:tensorflow:global_step/sec: 0.581609
INFO:tensorflow:examples/sec: 18.6115
INFO:tensorflow:global_step/sec: 0.581409
INFO:tensorflow:examples/sec: 18.6051
INFO:tensorflow:global_step/sec: 0.581821
INFO:tensorflow:examples/sec: 18.6183
INFO:tensorflow:global_step/sec: 0.581365
INFO:tensorflow:examples/sec: 18.6037

INFO:tensorflow:examples/sec: 18.5952
INFO:tensorflow:global_step/sec: 0.58153
INFO:tensorflow:examples/sec: 18.609
INFO:tensorflow:Saving checkpoints for 56000 into model_repo/outputs/model.ckpt.
INFO:tensorflow:global_step/sec: 0.543483
INFO:tensorflow:examples/sec: 17.3915
INFO:tensorflow:global_step/sec: 0.58375
INFO:tensorflow:examples/sec: 18.68
INFO:tensorflow:global_step/sec: 0.582632
INFO:tensorflow:examples/sec: 18.6442
INFO:tensorflow:global_step/sec: 0.581487
INFO:tensorflow:examples/sec: 18.6076
INFO:tensorflow:global_step/sec: 0.582093
INFO:tensorflow:examples/sec: 18.627
INFO:tensorflow:global_step/sec: 0.581732
INFO:tensorflow:examples/sec: 18.6154
INFO:tensorflow:global_step/sec: 0.581866
INFO:tensorflow:examples/sec: 18.6197
INFO:tensorflow:global_step/sec: 0.581304
INFO:tensorflow:examples/sec: 18.6017
INFO:tensorflow:global_step/sec: 0.580056
INFO:tensorflow:examples/sec: 18.5618
INFO:tensorflow:global_step/sec: 0.58153
INFO:tensorflow:examples/sec: 18.609

INFO:tensorflow:examples/sec: 18.6644
INFO:tensorflow:global_step/sec: 0.581815
INFO:tensorflow:examples/sec: 18.6181
INFO:tensorflow:global_step/sec: 0.581285
INFO:tensorflow:examples/sec: 18.6011
INFO:tensorflow:global_step/sec: 0.581335
INFO:tensorflow:examples/sec: 18.6027
INFO:tensorflow:global_step/sec: 0.5818
INFO:tensorflow:examples/sec: 18.6176
INFO:tensorflow:global_step/sec: 0.58208
INFO:tensorflow:examples/sec: 18.6266
INFO:tensorflow:global_step/sec: 0.580954
INFO:tensorflow:examples/sec: 18.5905
INFO:tensorflow:global_step/sec: 0.581261
INFO:tensorflow:examples/sec: 18.6004
INFO:tensorflow:global_step/sec: 0.581416
INFO:tensorflow:examples/sec: 18.6053
INFO:tensorflow:Saving checkpoints for 66000 into model_repo/outputs/model.ckpt.
INFO:tensorflow:global_step/sec: 0.542988
INFO:tensorflow:examples/sec: 17.3756
INFO:tensorflow:global_step/sec: 0.583169
INFO:tensorflow:examples/sec: 18.6614
INFO:tensorflow:global_step/sec: 0.582376
INFO:tensorflow:examples/sec: 18.636

INFO:tensorflow:examples/sec: 18.6316
INFO:tensorflow:global_step/sec: 0.581981
INFO:tensorflow:examples/sec: 18.6234
INFO:tensorflow:global_step/sec: 0.581823
INFO:tensorflow:examples/sec: 18.6183
INFO:tensorflow:global_step/sec: 0.582331
INFO:tensorflow:examples/sec: 18.6346
INFO:tensorflow:global_step/sec: 0.58311
INFO:tensorflow:examples/sec: 18.6595
INFO:tensorflow:Saving checkpoints for 75000 into model_repo/outputs/model.ckpt.
INFO:tensorflow:global_step/sec: 0.54426
INFO:tensorflow:examples/sec: 17.4163
INFO:tensorflow:global_step/sec: 0.584578
INFO:tensorflow:examples/sec: 18.7065
INFO:tensorflow:global_step/sec: 0.582432
INFO:tensorflow:examples/sec: 18.6378
INFO:tensorflow:global_step/sec: 0.582089
INFO:tensorflow:examples/sec: 18.6269
INFO:tensorflow:global_step/sec: 0.582169
INFO:tensorflow:examples/sec: 18.6294
INFO:tensorflow:global_step/sec: 0.581464
INFO:tensorflow:examples/sec: 18.6069
INFO:tensorflow:global_step/sec: 0.581411
INFO:tensorflow:examples/sec: 18.6051

INFO:tensorflow:examples/sec: 18.5759
INFO:tensorflow:Saving checkpoints for 84000 into model_repo/outputs/model.ckpt.
INFO:tensorflow:global_step/sec: 0.542719
INFO:tensorflow:examples/sec: 17.367
INFO:tensorflow:global_step/sec: 0.582159
INFO:tensorflow:examples/sec: 18.6291
INFO:tensorflow:global_step/sec: 0.580971
INFO:tensorflow:examples/sec: 18.5911
INFO:tensorflow:global_step/sec: 0.580322
INFO:tensorflow:examples/sec: 18.5703
INFO:tensorflow:global_step/sec: 0.580638
INFO:tensorflow:examples/sec: 18.5804
INFO:tensorflow:global_step/sec: 0.58048
INFO:tensorflow:examples/sec: 18.5754
INFO:tensorflow:global_step/sec: 0.580264
INFO:tensorflow:examples/sec: 18.5685
INFO:tensorflow:global_step/sec: 0.58042
INFO:tensorflow:examples/sec: 18.5734
INFO:tensorflow:global_step/sec: 0.579663
INFO:tensorflow:examples/sec: 18.5492
INFO:tensorflow:global_step/sec: 0.580366
INFO:tensorflow:examples/sec: 18.5717
INFO:tensorflow:Saving checkpoints for 85000 into model_repo/outputs/model.ckpt.
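`LEARNING_RATE` and `WARMUP_PROPORTION` feed the learning-rate schedule built by `optimization.create_optimizer` in the downloaded `optimization.py`. As I read that file, the rate rises linearly from 0 to `LEARNING_RATE` over the warmup steps. After warmup it follows a linear (power 1.0) polynomial decay that reaches 0 at `num_train_steps`. The plain-Python function below is only a sketch of that schedule for inspection. `bert_learning_rate` is a hypothetical helper, not part of the BERT code.

```python
def bert_learning_rate(step, init_lr, num_train_steps, num_warmup_steps):
    """Approximate learning rate at a global step: linear warmup, then linear decay to 0."""
    # Linear decay from init_lr (at step 0) to 0 (at num_train_steps).
    decayed = init_lr * (1.0 - min(step, num_train_steps) / num_train_steps)
    if num_warmup_steps and step < num_warmup_steps:
        # During warmup the rate ramps linearly from 0 up to init_lr.
        return init_lr * step / num_warmup_steps
    return decayed


for step in (0, 4591, 9183, 45918, 91836):
    print(step, bert_learning_rate(step, 2e-5, 91836, 9183))
```

The decay is measured from step 0, not from the end of warmup. As a result, the rate drops slightly, from `init_lr` to about `0.9 * init_lr`, at the moment warmup ends.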

In [9]:
"""
There is a weird bug in the original code.
When predicting, the estimator passes an empty `params` dict ({}) to input_fn, without 'batch_size'
(probably because no predict_batch_size was given to TPUEstimator above).
I redefine input_fn_builder and fall back to a fixed batch_size when 'params' does not provide one.
"""

def input_fn_builder(features, seq_length, is_training, drop_remainder):
    """Creates an `input_fn` closure to be passed to TPUEstimator."""
    all_input_ids = []
    all_input_mask = []
    all_segment_ids = []
    all_label_ids = []

    for feature in features:
        all_input_ids.append(feature.input_ids)
        all_input_mask.append(feature.input_mask)
        all_segment_ids.append(feature.segment_ids)
        all_label_ids.append(feature.label_id)

    def input_fn(params):
        """The actual input function."""
        print(params)
        # params is {} at prediction time, so default to a batch size of 32.
        batch_size = params.get('batch_size', 32)

        num_examples = len(features)

        d = tf.data.Dataset.from_tensor_slices({
            "input_ids":
                tf.constant(
                    all_input_ids, shape=[num_examples, seq_length],
                    dtype=tf.int32),
            "input_mask":
                tf.constant(
                    all_input_mask,
                    shape=[num_examples, seq_length],
                    dtype=tf.int32),
            "segment_ids":
                tf.constant(
                    all_segment_ids,
                    shape=[num_examples, seq_length],
                    dtype=tf.int32),
            "label_ids":
                tf.constant(all_label_ids, shape=[num_examples], dtype=tf.int32),
        })

        if is_training:
            d = d.repeat()
            d = d.shuffle(buffer_size=100)

        d = d.batch(batch_size=batch_size, drop_remainder=drop_remainder)
        return d

    return input_fn

In [13]:
predict_examples = create_examples(test_lines, 'test')

predict_features = run_classifier.convert_examples_to_features(
    predict_examples, label_list, MAX_SEQ_LENGTH, tokenizer)

predict_input_fn = input_fn_builder(
    features=predict_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=False,
    drop_remainder=False)

result = estimator.predict(input_fn=predict_input_fn)

INFO:tensorflow:Writing example 0 of 130613
INFO:tensorflow:*** Example ***
INFO:tensorflow:guid: test
INFO:tensorflow:tokens: [CLS] what is the most effective classroom management skill / technique to create a good learning environment ? [SEP]
INFO:tensorflow:input_ids: 101 2054 2003 1996 2087 4621 9823 2968 8066 1013 6028 2000 3443 1037 2204 4083 4044 1029 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
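The logged features show how each question is laid out. The token ids start with `[CLS]` (101) and end with `[SEP]` (102), and are zero-padded up to `MAX_SEQ_LENGTH`. `input_mask` is 1 for real tokens and 0 for padding. `segment_ids` are all 0 because there is only one sentence (`text_b=None`). The sketch below reproduces that padding step for the logged example, with its 19 tokens. `pad_features` is a hypothetical helper. Unlike `convert_examples_to_features`, which truncates long inputs, this sketch simply raises an error.

```python
def pad_features(token_ids, max_seq_length):
    """Build input_ids / input_mask / segment_ids for one single-sentence example.

    token_ids must already include the [CLS] (101) and [SEP] (102) ids.
    """
    if len(token_ids) > max_seq_length:
        raise ValueError("sequence longer than max_seq_length")
    padding = max_seq_length - len(token_ids)
    input_ids = list(token_ids) + [0] * padding
    input_mask = [1] * len(token_ids) + [0] * padding  # 1 = real token, 0 = padding
    segment_ids = [0] * max_seq_length                 # single sentence: all segment 0
    return input_ids, input_mask, segment_ids


# Token ids of the logged test question (including [CLS] and [SEP]).
ids = [101, 2054, 2003, 1996, 2087, 4621, 9823, 2968, 8066, 1013,
       6028, 2000, 3443, 1037, 2204, 4083, 4044, 1029, 102]
input_ids, input_mask, segment_ids = pad_features(ids, 128)
print(len(input_ids), sum(input_mask))  # 128 19
```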

In [14]:
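from tqdm import tqdm

# estimator.predict returns a generator; each prediction holds the softmax
# probabilities for the labels '0' and '1', flattened here into one list.
preds = []
for prediction in tqdm(result):
    for class_probability in prediction['probabilities']:
        preds.append(float(class_probability))

# preds alternates P('0'), P('1'). Predict 1 whenever P('0') < 0.9, i.e. when
# P('1') exceeds roughly 0.1: a deliberately low threshold for class 1.
results = []
for i in tqdm(range(0, len(preds), 2)):
    if preds[i] < 0.9:
        results.append(1)
    else:
        results.append(0)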


0it [00:00, ?it/s][A

{}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running infer on CPU
INFO:tensorflow:*** Features ***
INFO:tensorflow:  name = input_ids, shape = (?, 128)
INFO:tensorflow:  name = input_mask, shape = (?, 128)
INFO:tensorflow:  name = label_ids, shape = (?,)
INFO:tensorflow:  name = segment_ids, shape = (?, 128)
INFO:tensorflow:**** Trainable Variables ****
INFO:tensorflow:  name = bert/embeddings/word_embeddings:0, shape = (30522, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/embeddings/position_embeddings:0, shape = (512, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_0/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*

INFO:tensorflow:  name = bert/encoder/layer_4/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_4/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_4/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_4/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_4/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_4/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_4/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_4/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_4/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*

INFO:tensorflow:  name = bert/encoder/layer_8/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_8/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_9/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_9/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_9/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_9/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_9/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_9/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_9/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*


1it [01:12, 72.13s/it][A
33it [01:12, 50.50s/it][A
65it [01:13, 35.35s/it][A
97it [01:13, 24.75s/it][A
129it [01:14, 17.33s/it][A
161it [01:15, 12.14s/it][A
193it [01:15,  8.50s/it][A
225it [01:16,  5.96s/it][A
257it [01:16,  4.18s/it][A
289it [01:17,  2.93s/it][A
321it [01:18,  2.06s/it][A
353it [01:18,  1.44s/it][A
385it [01:19,  1.02s/it][A
417it [01:19,  1.39it/s][A
449it [01:20,  1.97it/s][A
481it [01:21,  2.77it/s][A
513it [01:21,  3.87it/s][A
545it [01:22,  5.36it/s][A
577it [01:22,  7.35it/s][A
609it [01:23,  9.92it/s][A
641it [01:24, 13.12it/s][A
673it [01:24, 16.96it/s][A
705it [01:25, 21.31it/s][A
737it [01:25, 25.99it/s][A
769it [01:26, 30.73it/s][A
801it [01:27, 35.26it/s][A
833it [01:27, 39.34it/s][A
865it [01:28, 42.54it/s][A
897it [01:28, 45.34it/s][A
929it [01:29, 47.28it/s][A
961it [01:30, 48.98it/s][A
993it [01:30, 50.21it/s][A
1025it [01:31, 51.07it/s][A
1057it [01:31, 51.75it/s][A
1089it [01:32, 52.37it/s][A
1121it [01:33, 52.43i

9057it [04:02, 52.75it/s][A
9089it [04:03, 52.68it/s][A
9121it [04:03, 52.41it/s][A
9153it [04:04, 52.43it/s][A
9185it [04:05, 52.78it/s][A
9217it [04:05, 52.70it/s][A
9249it [04:06, 52.91it/s][A
9281it [04:06, 52.87it/s][A
9313it [04:07, 52.58it/s][A
9345it [04:08, 52.41it/s][A
9377it [04:08, 52.76it/s][A
9409it [04:09, 52.61it/s][A
9441it [04:10, 52.52it/s][A
9473it [04:10, 52.38it/s][A
9505it [04:11, 52.32it/s][A
9537it [04:11, 52.65it/s][A
9569it [04:12, 52.59it/s][A
9601it [04:13, 52.78it/s][A
9633it [04:13, 52.80it/s][A
9665it [04:14, 52.94it/s][A
9697it [04:14, 52.87it/s][A
9729it [04:15, 52.64it/s][A
9761it [04:16, 52.52it/s][A
9793it [04:16, 52.36it/s][A
9825it [04:17, 52.69it/s][A
9857it [04:17, 52.51it/s][A
9889it [04:18, 52.80it/s][A
9921it [04:19, 52.60it/s][A
9953it [04:19, 52.66it/s][A
9985it [04:20, 52.86it/s][A
10017it [04:20, 53.12it/s][A
10049it [04:21, 52.92it/s][A
10081it [04:22, 53.11it/s][A
10113it [04:22, 52.93it/s][A
10145it [0

17825it [06:49, 52.50it/s][A
17857it [06:49, 52.63it/s][A
17889it [06:50, 52.49it/s][A
17921it [06:51, 52.38it/s][A
17953it [06:51, 52.65it/s][A
17985it [06:52, 52.61it/s][A
18017it [06:52, 52.87it/s][A
18049it [06:53, 52.70it/s][A
18081it [06:54, 52.60it/s][A
18113it [06:54, 52.48it/s][A
18145it [06:55, 52.29it/s][A
18177it [06:56, 52.12it/s][A
18209it [06:56, 52.14it/s][A
18241it [06:57, 52.25it/s][A
18273it [06:57, 52.54it/s][A
18305it [06:58, 52.47it/s][A
18337it [06:59, 52.49it/s][A
18369it [06:59, 52.77it/s][A
18401it [07:00, 52.64it/s][A
18433it [07:00, 52.37it/s][A
18465it [07:01, 52.37it/s][A
18497it [07:02, 52.28it/s][A
18529it [07:02, 52.25it/s][A
18561it [07:03, 52.32it/s][A
18593it [07:03, 52.22it/s][A
18625it [07:04, 52.57it/s][A
18657it [07:05, 52.56it/s][A
18689it [07:05, 52.80it/s][A
18721it [07:06, 52.66it/s][A
18753it [07:06, 52.47it/s][A
18785it [07:07, 52.65it/s][A
18817it [07:08, 52.59it/s][A
18849it [07:08, 52.65it/s][A
18881it [0

26561it [09:36, 52.14it/s][A
26593it [09:37, 52.17it/s][A
26625it [09:37, 52.18it/s][A
26657it [09:38, 52.02it/s][A
26689it [09:38, 52.05it/s][A
26721it [09:39, 52.07it/s][A
26753it [09:40, 52.05it/s][A
26785it [09:40, 52.05it/s][A
26817it [09:41, 52.13it/s][A
26849it [09:42, 52.20it/s][A
26881it [09:42, 52.12it/s][A
26913it [09:43, 52.07it/s][A
26945it [09:43, 52.13it/s][A
26977it [09:44, 52.10it/s][A
27009it [09:45, 52.11it/s][A
27041it [09:45, 52.13it/s][A
27073it [09:46, 52.06it/s][A
27105it [09:46, 52.21it/s][A
27137it [09:47, 52.02it/s][A
27169it [09:48, 52.12it/s][A
27201it [09:48, 52.06it/s][A
27233it [09:49, 52.09it/s][A
27265it [09:50, 52.11it/s][A
27297it [09:50, 52.13it/s][A
27329it [09:51, 52.11it/s][A
27361it [09:51, 52.16it/s][A
27393it [09:52, 52.13it/s][A
27425it [09:53, 52.07it/s][A
27457it [09:53, 52.09it/s][A
27489it [09:54, 52.03it/s][A
27521it [09:54, 52.20it/s][A
27553it [09:55, 52.31it/s][A
27585it [09:56, 52.17it/s][A
27617it [0

35297it [12:24, 52.09it/s][A
35329it [12:24, 52.11it/s][A
35361it [12:25, 52.05it/s][A
35393it [12:25, 52.10it/s][A
35425it [12:26, 52.09it/s][A
35457it [12:27, 52.13it/s][A
35489it [12:27, 52.05it/s][A
35521it [12:28, 52.11it/s][A
35553it [12:29, 52.27it/s][A
35585it [12:29, 51.85it/s][A
35617it [12:30, 52.17it/s][A
35649it [12:30, 52.12it/s][A
35681it [12:31, 52.18it/s][A
35713it [12:32, 52.10it/s][A
35745it [12:32, 52.14it/s][A
35777it [12:33, 52.03it/s][A
35809it [12:33, 52.10it/s][A
35841it [12:34, 52.17it/s][A
35873it [12:35, 52.09it/s][A
35905it [12:35, 52.20it/s][A
35937it [12:36, 52.15it/s][A
35969it [12:37, 52.05it/s][A
36001it [12:37, 52.08it/s][A
36033it [12:38, 52.08it/s][A
36065it [12:38, 52.05it/s][A
36097it [12:39, 52.06it/s][A
36129it [12:40, 52.09it/s][A
36161it [12:40, 52.11it/s][A
36193it [12:41, 52.12it/s][A
36225it [12:41, 52.16it/s][A
36257it [12:42, 52.12it/s][A
36289it [12:43, 52.02it/s][A
36321it [12:43, 52.05it/s][A
36353it [1

44033it [15:11, 52.01it/s][A
44065it [15:12, 52.08it/s][A
44097it [15:13, 51.97it/s][A
44129it [15:13, 52.10it/s][A
44161it [15:14, 52.14it/s][A
44193it [15:14, 52.16it/s][A
44225it [15:15, 52.12it/s][A
44257it [15:16, 52.10it/s][A
44289it [15:16, 52.16it/s][A
44321it [15:17, 52.15it/s][A
44353it [15:17, 52.11it/s][A
44385it [15:18, 52.13it/s][A
44417it [15:19, 51.98it/s][A
44449it [15:19, 52.17it/s][A
44481it [15:20, 52.12it/s][A
44513it [15:21, 52.14it/s][A
44545it [15:21, 52.26it/s][A
44577it [15:22, 52.11it/s][A
44609it [15:22, 52.11it/s][A
44641it [15:23, 52.14it/s][A
44673it [15:24, 52.12it/s][A
44705it [15:24, 52.11it/s][A
44737it [15:25, 52.10it/s][A
44769it [15:25, 52.15it/s][A
44801it [15:26, 52.07it/s][A
44833it [15:27, 52.09it/s][A
44865it [15:27, 52.05it/s][A
44897it [15:28, 52.11it/s][A
44929it [15:28, 52.13it/s][A
44961it [15:29, 52.12it/s][A
44993it [15:30, 52.09it/s][A
45025it [15:30, 52.08it/s][A
45057it [15:31, 52.19it/s][A
45089it [1

52769it [17:59, 52.03it/s][A
52801it [18:00, 52.07it/s][A
52833it [18:00, 52.09it/s][A
52865it [18:01, 52.10it/s][A
52897it [18:01, 52.11it/s][A
52929it [18:02, 52.09it/s][A
52961it [18:03, 52.15it/s][A
52993it [18:03, 52.14it/s][A
53025it [18:04, 52.10it/s][A
53057it [18:04, 52.08it/s][A
53089it [18:05, 52.11it/s][A
53121it [18:06, 52.13it/s][A
53153it [18:06, 52.09it/s][A
53185it [18:07, 52.10it/s][A
53217it [18:08, 52.12it/s][A
53249it [18:08, 52.12it/s][A
53281it [18:09, 52.13it/s][A
53313it [18:09, 52.11it/s][A
53345it [18:10, 52.12it/s][A
53377it [18:11, 52.14it/s][A
53409it [18:11, 52.15it/s][A
53441it [18:12, 52.11it/s][A
53473it [18:12, 52.07it/s][A
53505it [18:13, 52.14it/s][A
53537it [18:14, 52.10it/s][A
53569it [18:14, 52.11it/s][A
53601it [18:15, 52.06it/s][A
53633it [18:16, 52.13it/s][A
53665it [18:16, 52.12it/s][A
53697it [18:17, 52.12it/s][A
53729it [18:17, 52.12it/s][A
53761it [18:18, 52.05it/s][A
53793it [18:19, 52.13it/s][A
53825it [1

61505it [20:47, 52.10it/s][A
61537it [20:47, 52.12it/s][A
61569it [20:48, 52.11it/s][A
61601it [20:48, 52.12it/s][A
61633it [20:49, 52.04it/s][A
61665it [20:50, 52.14it/s][A
61697it [20:50, 52.20it/s][A
61729it [20:51, 52.17it/s][A
61761it [20:52, 52.15it/s][A
61793it [20:52, 52.09it/s][A
61825it [20:53, 52.13it/s][A
61857it [20:53, 52.15it/s][A
61889it [20:54, 52.15it/s][A
61921it [20:55, 52.10it/s][A
61953it [20:55, 52.10it/s][A
61985it [20:56, 52.11it/s][A
62017it [20:56, 52.11it/s][A
62049it [20:57, 52.09it/s][A
62081it [20:58, 52.07it/s][A
62113it [20:58, 52.12it/s][A
62145it [20:59, 52.02it/s][A
62177it [21:00, 51.67it/s][A
62209it [21:00, 51.77it/s][A
62241it [21:01, 51.91it/s][A
62273it [21:01, 52.00it/s][A
62305it [21:02, 52.01it/s][A
62337it [21:03, 52.05it/s][A
62369it [21:03, 51.99it/s][A
62401it [21:04, 52.10it/s][A
62433it [21:04, 52.09it/s][A
62465it [21:05, 52.18it/s][A
62497it [21:06, 51.89it/s][A
62529it [21:06, 51.82it/s][A
62561it [2

70241it [23:34, 51.89it/s][A
70273it [23:35, 51.84it/s][A
70305it [23:36, 51.91it/s][A
70337it [23:36, 51.78it/s][A
70369it [23:37, 51.87it/s][A
70401it [23:37, 51.89it/s][A
70433it [23:38, 52.01it/s][A
70465it [23:39, 52.04it/s][A
70497it [23:39, 51.96it/s][A
70529it [23:40, 52.03it/s][A
70561it [23:40, 52.04it/s][A
70593it [23:41, 52.09it/s][A
70625it [23:42, 52.11it/s][A
70657it [23:42, 52.10it/s][A
70689it [23:43, 52.09it/s][A
70721it [23:44, 52.13it/s][A
70753it [23:44, 52.16it/s][A
70785it [23:45, 52.06it/s][A
70817it [23:45, 52.05it/s][A
70849it [23:46, 52.12it/s][A
70881it [23:47, 52.14it/s][A
70913it [23:47, 52.13it/s][A
70945it [23:48, 52.10it/s][A
70977it [23:48, 52.13it/s][A
71009it [23:49, 52.09it/s][A
71041it [23:50, 52.09it/s][A
71073it [23:50, 52.07it/s][A
71105it [23:51, 52.12it/s][A
71137it [23:52, 52.08it/s][A
71169it [23:52, 52.11it/s][A
71201it [23:53, 52.09it/s][A
71233it [23:53, 52.26it/s][A
71265it [23:54, 52.07it/s][A
71297it [2

78977it [26:22, 52.11it/s][A
79009it [26:23, 52.14it/s][A
79041it [26:23, 52.13it/s][A
79073it [26:24, 52.12it/s][A
79105it [26:25, 52.16it/s][A
79137it [26:25, 52.10it/s][A
79169it [26:26, 52.09it/s][A
79201it [26:26, 52.05it/s][A
79233it [26:27, 52.06it/s][A
79265it [26:28, 52.15it/s][A
79297it [26:28, 52.04it/s][A
79329it [26:29, 52.13it/s][A
79361it [26:29, 52.17it/s][A
79393it [26:30, 52.04it/s][A
79425it [26:31, 52.04it/s][A
79457it [26:31, 52.05it/s][A
79489it [26:32, 52.13it/s][A
79521it [26:32, 52.13it/s][A
79553it [26:33, 52.16it/s][A
79585it [26:34, 52.14it/s][A
79617it [26:34, 52.12it/s][A
79649it [26:35, 52.16it/s][A
79681it [26:36, 52.13it/s][A
79713it [26:36, 52.09it/s][A
79745it [26:37, 52.13it/s][A
79777it [26:37, 52.12it/s][A
79809it [26:38, 52.05it/s][A
79841it [26:39, 52.06it/s][A
79873it [26:39, 52.12it/s][A
79905it [26:40, 52.11it/s][A
79937it [26:40, 52.12it/s][A
79969it [26:41, 52.15it/s][A
80001it [26:42, 52.05it/s][A
80033it [2

87713it [29:10, 52.10it/s][A
87745it [29:10, 52.11it/s][A
87777it [29:11, 52.12it/s][A
87809it [29:12, 52.17it/s][A
87841it [29:12, 52.06it/s][A
87873it [29:13, 52.14it/s][A
87905it [29:13, 52.08it/s][A
87937it [29:14, 51.99it/s][A
87969it [29:15, 52.18it/s][A
88001it [29:15, 52.12it/s][A
88033it [29:16, 52.00it/s][A
88065it [29:17, 52.12it/s][A
88097it [29:17, 52.16it/s][A
88129it [29:18, 52.16it/s][A
88161it [29:18, 52.12it/s][A
88193it [29:19, 52.16it/s][A
88225it [29:20, 52.10it/s][A
88257it [29:20, 52.14it/s][A
88289it [29:21, 51.99it/s][A
88321it [29:21, 52.14it/s][A
88353it [29:22, 52.10it/s][A
88385it [29:23, 52.12it/s][A
88417it [29:23, 52.09it/s][A
88449it [29:24, 52.19it/s][A
88481it [29:25, 52.16it/s][A
88513it [29:25, 52.07it/s][A
88545it [29:26, 52.16it/s][A
88577it [29:26, 52.12it/s][A
88609it [29:27, 52.16it/s][A
88641it [29:28, 52.14it/s][A
88673it [29:28, 52.18it/s][A
88705it [29:29, 52.13it/s][A
88737it [29:29, 52.13it/s][A
88769it [2

96449it [31:58, 52.13it/s][A
96481it [31:58, 52.16it/s][A
96513it [31:59, 52.12it/s][A
96545it [31:59, 52.07it/s][A
96577it [32:00, 52.12it/s][A
96609it [32:01, 52.09it/s][A
96641it [32:01, 52.12it/s][A
96673it [32:02, 52.18it/s][A
96705it [32:02, 52.08it/s][A
96737it [32:03, 52.13it/s][A
96769it [32:04, 52.12it/s][A
96801it [32:04, 52.13it/s][A
96833it [32:05, 52.15it/s][A
96865it [32:05, 52.14it/s][A
96897it [32:06, 52.11it/s][A
96929it [32:07, 52.10it/s][A
96961it [32:07, 52.04it/s][A
96993it [32:08, 52.09it/s][A
97025it [32:09, 52.14it/s][A
97057it [32:09, 52.14it/s][A
97089it [32:10, 52.09it/s][A
97121it [32:10, 52.06it/s][A
97153it [32:11, 52.12it/s][A
97185it [32:12, 52.10it/s][A
97217it [32:12, 52.10it/s][A
97249it [32:13, 52.11it/s][A
97281it [32:13, 52.09it/s][A
97313it [32:14, 52.09it/s][A
97345it [32:15, 52.03it/s][A
97377it [32:15, 52.13it/s][A
97409it [32:16, 52.07it/s][A
97441it [32:17, 52.12it/s][A
97473it [32:17, 52.11it/s][A
97505it [3

104993it [34:42, 52.07it/s][A
105025it [34:42, 52.13it/s][A
105057it [34:43, 51.99it/s][A
105089it [34:43, 52.01it/s][A
105121it [34:44, 52.01it/s][A
105153it [34:45, 52.11it/s][A
105185it [34:45, 52.11it/s][A
105217it [34:46, 52.19it/s][A
105249it [34:46, 52.18it/s][A
105281it [34:47, 52.11it/s][A
105313it [34:48, 52.19it/s][A
105345it [34:48, 52.09it/s][A
105377it [34:49, 52.19it/s][A
105409it [34:50, 52.21it/s][A
105441it [34:50, 52.12it/s][A
105473it [34:51, 52.12it/s][A
105505it [34:51, 52.12it/s][A
105537it [34:52, 51.86it/s][A
105569it [34:53, 52.21it/s][A
105601it [34:53, 52.17it/s][A
105633it [34:54, 52.09it/s][A
105665it [34:54, 52.13it/s][A
105697it [34:55, 52.14it/s][A
105729it [34:56, 52.14it/s][A
105761it [34:56, 52.11it/s][A
105793it [34:57, 52.20it/s][A
105825it [34:57, 52.07it/s][A
105857it [34:58, 52.10it/s][A
105889it [34:59, 52.09it/s][A
105921it [34:59, 52.13it/s][A
105953it [35:00, 52.10it/s][A
105985it [35:01, 52.13it/s][A
106017it

113441it [37:24, 52.11it/s][A
113473it [37:24, 51.99it/s][A
113505it [37:25, 52.19it/s][A
113537it [37:25, 52.14it/s][A
113569it [37:26, 52.10it/s][A
113601it [37:27, 52.13it/s][A
113633it [37:27, 52.19it/s][A
113665it [37:28, 52.14it/s][A
113697it [37:29, 52.16it/s][A
113729it [37:29, 52.15it/s][A
113761it [37:30, 52.12it/s][A
113793it [37:30, 52.13it/s][A
113825it [37:31, 52.11it/s][A
113857it [37:32, 52.08it/s][A
113889it [37:32, 52.08it/s][A
113921it [37:33, 52.11it/s][A
113953it [37:33, 52.17it/s][A
113985it [37:34, 52.05it/s][A
114017it [37:35, 52.19it/s][A
114049it [37:35, 52.09it/s][A
114081it [37:36, 52.12it/s][A
114113it [37:37, 52.09it/s][A
114145it [37:37, 52.06it/s][A
114177it [37:38, 52.09it/s][A
114209it [37:38, 52.15it/s][A
114241it [37:39, 52.06it/s][A
114273it [37:40, 52.09it/s][A
114305it [37:40, 52.09it/s][A
114337it [37:41, 52.10it/s][A
114369it [37:41, 52.10it/s][A
114401it [37:42, 52.10it/s][A
114433it [37:43, 52.13it/s][A
114465it

121889it [40:06, 51.94it/s][A
121921it [40:06, 52.09it/s][A
121953it [40:07, 51.98it/s][A
121985it [40:08, 52.28it/s][A
122017it [40:08, 52.17it/s][A
122049it [40:09, 52.16it/s][A
122081it [40:09, 52.13it/s][A
122113it [40:10, 52.21it/s][A
122145it [40:11, 52.05it/s][A
122177it [40:11, 52.08it/s][A
122209it [40:12, 52.11it/s][A
122241it [40:13, 52.09it/s][A
122273it [40:13, 52.09it/s][A
122305it [40:14, 52.21it/s][A
122337it [40:14, 52.17it/s][A
122369it [40:15, 52.08it/s][A
122401it [40:16, 52.06it/s][A
122433it [40:16, 52.08it/s][A
122465it [40:17, 52.05it/s][A
122497it [40:17, 52.00it/s][A
122529it [40:18, 51.97it/s][A
122561it [40:19, 51.95it/s][A
122593it [40:19, 52.08it/s][A
122625it [40:20, 52.10it/s][A
122657it [40:21, 52.06it/s][A
122689it [40:21, 52.10it/s][A
122721it [40:22, 52.07it/s][A
122753it [40:22, 52.11it/s][A
122785it [40:23, 51.98it/s][A
122817it [40:24, 52.03it/s][A
122849it [40:24, 52.10it/s][A
122881it [40:25, 52.09it/s][A
122913it

130337it [42:48, 52.10it/s][A
130369it [42:49, 52.12it/s][A
130401it [42:49, 52.06it/s][A
130433it [42:50, 52.12it/s][A
130465it [42:50, 52.12it/s][A
130497it [42:51, 52.12it/s][A
130529it [42:52, 52.14it/s][A
130561it [42:52, 52.09it/s][A
130593it [42:53, 57.72it/s][A

INFO:tensorflow:prediction_loop marked as finished

130613it [42:53, 50.76it/s]
100%|██████████| 130613/130613 [00:00<00:00, 1964691.43it/s]

In [15]:
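from sklearn.metrics import accuracy_score, f1_score

# sklearn's documented argument order is (y_true, y_pred). Accuracy and binary F1
# give the same value either way, but other metrics (precision, recall) do not.
print('%.2f' % accuracy_score(test_labels, np.array(results)))
print('%.5f' % f1_score(test_labels, np.array(results)))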

0.96
0.71777
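An accuracy of 0.96 next to an F1 of about 0.72 is consistent with an imbalanced dataset in which positive (insincere) questions are rare. The sketch below computes both metrics from scratch on a hypothetical toy label set (not the Quora data) to show how the gap arises, and checks that binary F1 is unchanged when `y_true` and `y_pred` are swapped. The helper names (`confusion_counts`, `accuracy`, `f1`) are ours, not sklearn's.

```python
# Minimal sketch: accuracy vs. binary F1 on hypothetical imbalanced labels.

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary labelling."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of positions where prediction equals label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred, positive=1):
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical toy data: 100 questions, only 8 of them positive.
y_true = [1] * 8 + [0] * 92
# A classifier that finds 5 of the 8 positives and raises 3 false alarms.
y_pred = [1] * 5 + [0] * 3 + [1] * 3 + [0] * 89

print(accuracy(y_true, y_pred))  # 0.94
print(f1(y_true, y_pred))        # 0.625
```

The 89 easy negatives keep accuracy high, while F1 ignores true negatives entirely, which is why F1 is the more informative number for this task.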


BERT currently has several downsides:

- Training is expensive. All results in the paper were fine-tuned on a single Cloud TPU, which has 64 GB of RAM. Most of the BERT-Large results in the paper currently cannot be reproduced on a GPU with 12 GB to 16 GB of RAM, because the maximum batch size that fits in memory is too small.

- The pre-trained model used here supports only English, although Google has also released Multilingual and Chinese BERT-Base checkpoints.
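To see why batch size is the bottleneck, here is a rough estimate of the memory that the training state alone needs. It assumes fp32 weights trained with Adam (4 bytes each for weights, gradients, and Adam's two moment estimates, so 16 bytes per parameter), the approximate parameter counts from the BERT paper (110M for BERT-Base, 340M for BERT-Large), and a 12 GB card. The 16-byte figure and the card size are our assumptions, and activations are not modeled.

```python
# Back-of-envelope estimate of the *static* training state for BERT in fp32
# with Adam. Assumption (ours): 16 bytes per parameter. Activations, which
# grow with batch size and sequence length, are NOT included.

BYTES_PER_PARAM_FP32 = 4
COPIES = 4  # weights + gradients + Adam first moment + Adam second moment

# Approximate parameter counts reported in the BERT paper.
PARAMS = {
    "BERT-Base": 110_000_000,
    "BERT-Large": 340_000_000,
}


def static_training_bytes(n_params: int) -> int:
    """Bytes for weights, gradients and Adam state, excluding activations."""
    return n_params * BYTES_PER_PARAM_FP32 * COPIES


def headroom_gb(gpu_gb: float, n_params: int) -> float:
    """Decimal GB left for activations after the static state."""
    return gpu_gb - static_training_bytes(n_params) / 1e9


for name, n in PARAMS.items():
    used = static_training_bytes(n) / 1e9
    print(f"{name}: ~{used:.2f} GB static; "
          f"~{headroom_gb(12, n):.2f} GB left on a 12 GB card")
```

Whatever is left over must hold the activations for every example in the batch, so BERT-Large leaves far less room for a useful batch size than BERT-Base does.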



# Competition test


We ran a test on all of the Quora data on an Azure Standard NC6 VM (6 vCPUs, 56 GB of memory) and achieved an F1 score of 0.71777, which would have placed 1st at the end of the competition.

**You can't use BERT in the competition: the notebook will fail when it comes to real testing.**

Training took about 40 hours.
The results are impressive, especially since this is a raw model with no tuning or ensembling, built on the simplest of the 3 released models (BERT-Base, Uncased).

We didn't have to preprocess the text at all; BERT's own tokenizer handles it.
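That tokenization step lives in the BERT repository's `tokenization.py` (`BasicTokenizer` followed by `WordpieceTokenizer`). The sketch below is a simplified re-implementation for illustration, not the repository code: it lowercases, strips accents, splits off punctuation, then applies greedy longest-match-first WordPiece. The toy vocabulary is hypothetical; the real one is `vocab.txt` inside the downloaded model zip.

```python
import unicodedata


def basic_tokenize(text: str) -> list[str]:
    """Lowercase, strip accents, split on whitespace and punctuation.

    Simplified: skips BERT's control-character cleanup and CJK handling,
    and only treats Unicode 'P*' categories as punctuation.
    """
    text = unicodedata.normalize("NFD", text.lower())
    # Drop combining marks (accents) left over after NFD decomposition.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    tokens = []
    for word in text.split():
        current = ""
        for ch in word:
            if unicodedata.category(ch).startswith("P"):
                if current:
                    tokens.append(current)
                    current = ""
                tokens.append(ch)  # punctuation becomes its own token
            else:
                current += ch
        if current:
            tokens.append(current)
    return tokens


def wordpiece(token: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Greedy longest-match-first split; non-initial pieces carry '##'."""
    pieces, start = [], 0
    while start < len(token):
        end, match = len(token), None
        while start < end:
            candidate = token[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1  # shrink the window and try a shorter piece
        if match is None:
            return [unk]  # no piece matches: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary.
VOCAB = {"why", "do", "people", "hate", "un", "##believ", "##able", "?"}

tokens = [p for t in basic_tokenize("Why do people hate UNBELIEVABLE?")
          for p in wordpiece(t, VOCAB)]
print(tokens)
# ['why', 'do', 'people', 'hate', 'un', '##believ', '##able', '?']
```

Because out-of-vocabulary words are broken into known sub-word pieces rather than discarded, raw question text can be fed in without manual cleaning.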
