# BERT fine-tuning for sentence-pair classification

In [7]:
!git clone -b docker https://github.com/yoheikikuta/bert.git

Cloning into 'bert'...
remote: Enumerating objects: 4, done.
remote: Counting objects: 100% (4/4), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 234 (delta 1), reused 3 (delta 1), pack-reused 230
Receiving objects: 100% (234/234), 152.83 KiB | 0 bytes/s, done.
Resolving deltas: 100% (133/133), done.
Checking connectivity... done.


In [8]:
!ls bert/

CONTRIBUTING.md		    modeling.py		  run_pretraining.py
Dockerfile		    modeling_test.py	  run_squad.py
LICENSE			    multilingual.md	  sample_text.txt
README.md		    optimization.py	  tokenization.py
__init__.py		    optimization_test.py  tokenization_test.py
create_pretraining_data.py  requirements.txt	  utils
extract_features.py	    run_classifier.py


In [13]:
!pip3 install -r ./bert/requirements.txt

Collecting tensorflow>=1.11.0 (from -r ./bert/requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/b1/ad/48395de38c1e07bab85fc3bbec045e11ae49c02a4db0100463dd96031947/tensorflow-1.12.0-cp35-cp35m-manylinux1_x86_64.whl (83.1MB)
    100% |################################| 83.1MB 14kB/s  eta 0:00:01
Collecting keras-preprocessing>=1.0.5 (from tensorflow>=1.11.0->-r ./bert/requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/fc/94/74e0fa783d3fc07e41715973435dd051ca89c550881b3454233c39c73e69/Keras_Preprocessing-1.0.5-py2.py3-none-any.whl
Collecting tensorboard<1.13.0,>=1.12.0 (from tensorflow>=1.11.0->-r ./bert/requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/e0/d0/65fe48383146199f16dbd5999ef226b87bce63ad5cd73c840cf722637969/tensorboard-1.12.0-py3-none-any.whl (3.0MB)
    100% |################################| 3.1MB 447kB/s eta 0:00:01
Collecting keras-applications>=1.0.6 (from tensorflow>=1.
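The requirements file only sets a lower bound (`tensorflow>=1.11.0`), so a fresh install today may resolve to TensorFlow 2.x, which removed `tf.contrib` and other TF 1.x APIs that the upstream BERT scripts depend on (the training log below shows a `TPUConfig`, which comes from `tf.contrib.tpu`). A minimal sketch, assuming the fork keeps those TF 1.x dependencies, for checking the installed version; `is_compatible_tf` is a hypothetical helper, not part of the repository:

```python
import re


def is_compatible_tf(version, minimum=(1, 11)):
    """Return True for a TensorFlow 1.x version string at or above `minimum`."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return False  # not a recognizable "major.minor..." string
    major, minor = int(match.group(1)), int(match.group(2))
    # Require major version 1: the scripts rely on TF 1.x-only modules such as tf.contrib.
    return major == 1 and (major, minor) >= minimum


print(is_compatible_tf("1.12.0"), is_compatible_tf("2.0.0"))
```

In the notebook, passing `tf.__version__` after `import tensorflow as tf` should return `True` for the 1.12.0 wheel installed above.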

### Model and data download

We solve the RTE (Recognizing Textual Entailment) task from the GLUE benchmark; see https://www.nyu.edu/projects/bowman/glue.pdf for details.
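RTE is a sentence-pair task: each example is a premise and a hypothesis, labeled `entailment` or `not_entailment`. BERT reads both sentences as a single sequence, which is where the `input_ids`, `input_mask` and `segment_ids` features in the training log below come from. The sketch below shows that packing under simplifying assumptions: whitespace-split tokens stand in for WordPiece, token strings stand in for vocabulary ids, and `pack_pair` / `truncate_pair` are hypothetical helpers. The truncation rule (trim the longer sentence one token at a time) follows the heuristic used in upstream `run_classifier.py`.

```python
from typing import List, Tuple


def truncate_pair(tokens_a: List[str], tokens_b: List[str], max_tokens: int) -> None:
    """Trim the longer list in place, one token at a time, until both fit in max_tokens."""
    while len(tokens_a) + len(tokens_b) > max_tokens:
        if len(tokens_a) > len(tokens_b):
            tokens_a.pop()
        else:
            tokens_b.pop()


def pack_pair(
    tokens_a: List[str], tokens_b: List[str], max_seq_length: int
) -> Tuple[List[str], List[int], List[int]]:
    """Return (tokens, segment_ids, input_mask) for [CLS] A [SEP] B [SEP], padded to max_seq_length."""
    tokens_a, tokens_b = list(tokens_a), list(tokens_b)  # copy so the caller's lists are untouched
    truncate_pair(tokens_a, tokens_b, max_seq_length - 3)  # reserve room for [CLS] and two [SEP]
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0: [CLS], sentence A and its [SEP]; segment 1: sentence B and the final [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    input_mask = [1] * len(tokens)  # 1 marks real tokens the model should attend to
    padding = max_seq_length - len(tokens)
    tokens += ["[PAD]"] * padding
    segment_ids += [0] * padding
    input_mask += [0] * padding
    return tokens, segment_ids, input_mask


tokens, segment_ids, input_mask = pack_pair(
    "the cat sat on the mat".split(), "a cat is sitting".split(), max_seq_length=16
)
print(tokens)
print(segment_ids)
print(input_mask)
```

The real run uses `max_seq_length=128`, which is why every per-token feature in the training log has shape `(32, 128)` for a batch of 32.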

In [15]:
import os

In [16]:
os.makedirs("./bert/model", exist_ok=True)
os.makedirs("./bert/data", exist_ok=True)

In [20]:
!wget -O ./bert/model/uncased_L-12_H-768_A-12.zip https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip

--2018-11-18 03:53:02--  https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 172.217.24.144, 2404:6800:4004:81b::2010
Connecting to storage.googleapis.com (storage.googleapis.com)|172.217.24.144|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 407727028 (389M) [application/zip]
Saving to: './bert/model/uncased_L-12_H-768_A-12.zip'


2018-11-18 03:53:12 (37.8 MB/s) - './bert/model/uncased_L-12_H-768_A-12.zip' saved [407727028/407727028]



In [24]:
!unzip ./bert/model/uncased_L-12_H-768_A-12.zip -d ./bert/model/ && \
  rm ./bert/model/uncased_L-12_H-768_A-12.zip

Archive:  ./bert/model/uncased_L-12_H-768_A-12.zip
   creating: ./bert/model/uncased_L-12_H-768_A-12/
  inflating: ./bert/model/uncased_L-12_H-768_A-12/bert_model.ckpt.meta  
  inflating: ./bert/model/uncased_L-12_H-768_A-12/bert_model.ckpt.data-00000-of-00001  
  inflating: ./bert/model/uncased_L-12_H-768_A-12/vocab.txt  
  inflating: ./bert/model/uncased_L-12_H-768_A-12/bert_model.ckpt.index  
  inflating: ./bert/model/uncased_L-12_H-768_A-12/bert_config.json  
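The unzipped directory name encodes the architecture: `L-12` layers, `H-768` hidden units and `A-12` attention heads, which matches the 768-wide embedding shapes in the training log. A small sketch, assuming only that naming convention and the file list printed above, to decode the name and confirm that nothing is missing before fine-tuning; `parse_model_name` and `missing_files` are hypothetical helpers, and the dictionary keys mirror field names used in `bert_config.json`.

```python
import os
import re
from typing import Dict, List

# Checkpoint files listed in the unzip output above.
EXPECTED_FILES = [
    "bert_config.json",
    "vocab.txt",
    "bert_model.ckpt.index",
    "bert_model.ckpt.meta",
    "bert_model.ckpt.data-00000-of-00001",
]


def parse_model_name(name: str) -> Dict[str, int]:
    """Decode layers (L), hidden size (H) and attention heads (A) from a name like uncased_L-12_H-768_A-12."""
    match = re.search(r"L-(\d+)_H-(\d+)_A-(\d+)", name)
    if match is None:
        raise ValueError("unrecognized model name: {}".format(name))
    layers, hidden, heads = (int(group) for group in match.groups())
    return {"num_hidden_layers": layers, "hidden_size": hidden, "num_attention_heads": heads}


def missing_files(model_dir: str) -> List[str]:
    """List the expected checkpoint files that are not present in model_dir."""
    return [name for name in EXPECTED_FILES if not os.path.isfile(os.path.join(model_dir, name))]


print(parse_model_name("uncased_L-12_H-768_A-12"))
```

In the notebook, `missing_files("./bert/model/uncased_L-12_H-768_A-12")` should return an empty list once the unzip above has succeeded.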


In [27]:
!python3 ./bert/utils/download_glue_data.py --data_dir ./bert/data --tasks RTE

Downloading and extracting MNLI...
	Completed!
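The download script writes tab-separated files under `./bert/data/RTE`. A minimal reader sketch, assuming the GLUE RTE layout of a header row followed by `index`, `sentence1`, `sentence2` and `label` columns (the test split has no `label`); `read_rte` and `RteExample` are hypothetical names, not the fork's processor class. Quoting is disabled because sentences can contain literal quote characters, and the guid format mirrors the `dev-0` id in the evaluation log.

```python
import csv
from collections import namedtuple
from typing import Iterable, List

LABELS = ["entailment", "not_entailment"]
RteExample = namedtuple("RteExample", ["guid", "text_a", "text_b", "label"])


def read_rte(lines: Iterable[str], set_type: str) -> List[RteExample]:
    """Parse GLUE-style RTE TSV lines (header first) into RteExample tuples."""
    # QUOTE_NONE keeps quote characters inside sentences as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader)  # skip the header row
    examples = []
    for row in reader:
        label = row[3] if len(row) > 3 else None  # the test split has no label column
        if label is not None and label not in LABELS:
            raise ValueError("unexpected label: {}".format(label))
        examples.append(RteExample("{}-{}".format(set_type, row[0]), row[1], row[2], label))
    return examples
```

For example, opening `./bert/data/RTE/dev.tsv` (with `newline=""`) and passing the file object as `lines` should yield 277 examples, the count the evaluation log reports, if the assumed layout holds.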


### Model fine-tuning

Fine-tuning takes about 3 hours on an `n1-standard-4` instance on GCP Compute Engine, since training runs on the CPU (note the `Running train on CPU` line in the log).
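The running time follows from the step count. Upstream `run_classifier.py` computes the number of training steps as `int(num_examples / train_batch_size * num_train_epochs)`, which reproduces the `Num steps = 233` line in the training log below; the sketch assumes the fork keeps that formula and the upstream default `warmup_proportion=0.1`. Dividing by the logged throughput of about 0.023 steps/sec then gives the estimate. The helper names are hypothetical.

```python
def num_train_steps(num_examples: int, batch_size: int, num_epochs: float) -> int:
    """Total optimizer steps; int() truncates the fractional last epoch."""
    return int(num_examples / batch_size * num_epochs)


def estimated_hours(steps: int, steps_per_sec: float) -> float:
    """Convert a step count and a measured throughput into hours of training."""
    return steps / steps_per_sec / 3600.0


steps = num_train_steps(2490, 32, 3.0)  # example count from the log, flags from the command
warmup_steps = int(steps * 0.1)  # assumed upstream default warmup_proportion
print(steps, warmup_steps)  # 233 23
print(round(estimated_hours(steps, 0.023), 2))  # 2.81
```

About 2.8 hours of training plus roughly two and a half minutes of evaluation is consistent with the reported wall time of 2h 52min.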

In [2]:
%%time

!python3 ./bert/run_classifier.py \
  --task_name=RTE \
  --do_train=true \
  --do_eval=true \
  --data_dir=./bert/data/RTE \
  --vocab_file=./bert/model/uncased_L-12_H-768_A-12/vocab.txt \
  --bert_config_file=./bert/model/uncased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=./bert/model/uncased_L-12_H-768_A-12/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=./bert/tmp/rte_output/

INFO:tensorflow:Using config: {'_num_ps_replicas': 0, '_train_distribute': None, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_keep_checkpoint_max': 5, '_is_chief': True, '_model_dir': './bert/tmp/rte_output/', '_save_summary_steps': 100, '_global_id_in_cluster': 0, '_task_id': 0, '_log_step_count_steps': None, '_protocol': None, '_cluster': None, '_num_worker_replicas': 1, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_device_fn': None, '_save_checkpoints_steps': 1000, '_task_type': 'worker', '_master': '', '_tf_random_seed': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fb2b93e50f0>, '_save_checkpoints_secs': None, '_eval_distribute': None, '_experimental_distribute': None, '_keep_chec

INFO:tensorflow:***** Running training *****
INFO:tensorflow:  Num examples = 2490
INFO:tensorflow:  Batch size = 32
INFO:tensorflow:  Num steps = 233
Instructions for updating:
Use `tf.data.experimental.map_and_batch(...)`.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running train on CPU
INFO:tensorflow:*** Features ***
INFO:tensorflow:  name = input_ids, shape = (32, 128)
INFO:tensorflow:  name = input_mask, shape = (32, 128)
INFO:tensorflow:  name = label_ids, shape = (32,)
INFO:tensorflow:  name = segment_ids, shape = (32, 128)
INFO:tensorflow:**** Trainable Variables ****
INFO:tensorflow:  name = bert/embeddings/word_embeddings:0, shape = (30522, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/embeddings/position_embeddings:0, shape = (512, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: 

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
2018-11-18 08:07:59.343665: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into ./bert/tmp/rte_output/model.ckpt.
INFO:tensorflow:global_step/sec: 0.0230017
INFO:tensorflow:examples/sec: 0.736054
INFO:tensorflow:global_step/sec: 0.0230105
INFO:tensorflow:examples/sec: 0.736337
INFO:tensorflow:Saving checkpoints for 233 into ./bert/tmp/rte_output/model.ckpt.
INFO:tensorflow:Loss for final step: 0.31156892.
INFO:tensorflow:training_loop marked as finished
INFO:tensorflow:Writing example 0 of 277
INFO:tensorflow:*** Example ***
INFO:tensorflow:guid: dev-0
INFO:tensorflow:tokens: [CLS] dana reeve , the widow of the actor christopher reeve , has died of 

INFO:tensorflow:***** Running evaluation *****
INFO:tensorflow:  Num examples = 277
INFO:tensorflow:  Batch size = 8
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running eval on CPU
INFO:tensorflow:*** Features ***
INFO:tensorflow:  name = input_ids, shape = (?, 128)
INFO:tensorflow:  name = input_mask, shape = (?, 128)
INFO:tensorflow:  name = label_ids, shape = (?,)
INFO:tensorflow:  name = segment_ids, shape = (?, 128)
INFO:tensorflow:**** Trainable Variables ****
INFO:tensorflow:  name = bert/embeddings/word_embeddings:0, shape = (30522, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/embeddings/position_embeddings:0, shape = (512, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-11-18-10:57:23
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from ./bert/tmp/rte_output/model.ckpt-233
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-11-18-10:59:45
INFO:tensorflow:Saving dict for global step 233: eval_accuracy = 0.6931408, eval_loss = 0.71709377, global_step = 233, loss = 0.71939987
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 233: ./bert/tmp/rte_output/model.ckpt-233
INFO:tensorflow:evaluation_loop marked as finished
INFO:tensorflow:***** Eval results *****
INFO:tensorflow:  eval_accuracy = 0.6931408
INFO:tensorflow:  eval_loss = 0.71709377
INFO:tensorflow:  global_step = 233
INFO:tensorflow:  loss = 0.71939987
CPU times: user 4min 30s, sys: 33.3 s, total: 5min 4s
Wall time: 2h 52min 15s
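The final metrics are easier to interpret as counts: an `eval_accuracy` of 0.6931408 on 277 dev examples means 192 correct predictions. Upstream `run_classifier.py` also writes these metrics to `eval_results.txt` in `--output_dir`, one `key = value` pair per line; the sketch assumes the fork does the same, and `parse_eval_results` is a hypothetical helper. It is run here on the values copied from the log.

```python
from typing import Dict, Iterable


def parse_eval_results(lines: Iterable[str]) -> Dict[str, float]:
    """Parse 'key = value' lines into a dict of floats, skipping anything else."""
    results = {}
    for line in lines:
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        results[key.strip()] = float(value)  # float() ignores surrounding whitespace
    return results


# Values copied from the "Eval results" block of the log above.
results = parse_eval_results([
    "eval_accuracy = 0.6931408\n",
    "eval_loss = 0.71709377\n",
    "global_step = 233\n",
    "loss = 0.71939987\n",
])
correct = round(results["eval_accuracy"] * 277)  # 277 dev examples were evaluated
print(correct)  # 192
```

In the notebook, the same helper can read `./bert/tmp/rte_output/eval_results.txt` directly by passing it the open file object.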
