<a href="https://colab.research.google.com/github/ztjfreedom/colab/blob/master/bert(tf_hub)_google_job_skills.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Classification on Google Job Skills dataset with BERT on TF Hub

In [0]:
from sklearn.model_selection import train_test_split
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
from datetime import datetime

In addition to the standard libraries we imported above, we'll need to install BERT's Python package.

In [2]:
!pip install bert-tensorflow

Collecting bert-tensorflow
  Downloading https://files.pythonhosted.org/packages/a6/66/7eb4e8b6ea35b7cc54c322c816f976167a43019750279a8473d355800a93/bert_tensorflow-1.0.1-py2.py3-none-any.whl (67kB)
     |████████████████████████████████| 71kB 3.4MB/s 
Installing collected packages: bert-tensorflow
Successfully installed bert-tensorflow-1.0.1


In [3]:
import bert
from bert import run_classifier
from bert import optimization
from bert import tokenization

W0712 04:32:37.548106 140341499602816 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/bert/optimization.py:87: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.



Below, we'll set an output directory location to store our model output and checkpoints. This can be a local directory, in which case you'd set OUTPUT_DIR to the name of the directory you'd like to create. If you're running this code in Google's hosted Colab, the directory won't persist after the Colab session ends.

Alternatively, if you're a GCP user, you can store output in a GCP bucket. To do that, set a directory name in OUTPUT_DIR and the name of the GCP bucket in the BUCKET field.

Set DO_DELETE to True to delete and recreate OUTPUT_DIR if it already exists. Otherwise, TensorFlow will load existing model checkpoints from that directory (if they exist).

In [4]:
# Set the output directory for saving model file
# Optionally, set a GCP bucket location

OUTPUT_DIR = 'bert_output'#@param {type:"string"}
#@markdown Whether or not to clear/delete the directory and create a new one
DO_DELETE = True #@param {type:"boolean"}
#@markdown Set USE_BUCKET and BUCKET if you want to (optionally) store model output on GCP bucket.
USE_BUCKET = False #@param {type:"boolean"}
BUCKET = '' #@param {type:"string"}

if USE_BUCKET:
    OUTPUT_DIR = 'gs://{}/{}'.format(BUCKET, OUTPUT_DIR)
    from google.colab import auth
    auth.authenticate_user()

if DO_DELETE:
    try:
        tf.gfile.DeleteRecursively(OUTPUT_DIR)
    except:
        # Doesn't matter if the directory didn't exist
        pass
tf.gfile.MakeDirs(OUTPUT_DIR)
print('***** Model output directory: {} *****'.format(OUTPUT_DIR))

***** Model output directory: bert_output *****
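
For a purely local run, the same delete-then-recreate behaviour can be sketched with the standard library alone. This is only an illustration: `reset_output_dir` is a hypothetical helper, not part of the notebook or of TensorFlow, and it does not handle `gs://` bucket paths.

```python
import os
import shutil
import tempfile


def reset_output_dir(path, do_delete=True):
    """Optionally delete `path`, then make sure it exists (local disk only)."""
    if do_delete:
        # ignore_errors=True mirrors the notebook's "doesn't matter if the
        # directory didn't exist" behaviour.
        shutil.rmtree(path, ignore_errors=True)
    os.makedirs(path, exist_ok=True)
    return path


# Hypothetical demo inside a throwaway temp directory.
base = tempfile.mkdtemp()
out_dir = os.path.join(base, "bert_output")
os.makedirs(out_dir)
open(os.path.join(out_dir, "model.ckpt-0"), "w").close()  # fake stale checkpoint

reset_output_dir(out_dir, do_delete=True)
remaining = os.listdir(out_dir)  # the stale checkpoint file is gone
```

With `do_delete=False` the stale file would survive, which is the case where the Estimator would pick up an old checkpoint.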


#Data

In [5]:
# mount google drive
from google.colab import drive
drive.mount('/content/gdrive')
%cd gdrive/My Drive/Colab Notebooks/dataset/

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/gdrive
/content/gdrive/My Drive/Colab Notebooks/dataset


In [6]:
import numpy as np
import pandas as pd
import collections
from IPython.display import display

dataset_file = 'google_job_skills/job_skills.csv'
df = pd.read_csv(dataset_file)
print(len(df))
display(df.head())

1250


Unnamed: 0,Company,Title,Category,Location,Responsibilities,Minimum Qualifications,Preferred Qualifications
0,Google,Google Cloud Program Manager,Program Management,Singapore,"Shape, shepherd, ship, and show technical prog...",BA/BS degree or equivalent practical experienc...,Experience in the business technology market a...
1,Google,"Supplier Development Engineer (SDE), Cable/Con...",Manufacturing & Supply Chain,"Shanghai, China",Drive cross-functional activities in the suppl...,BS degree in an Engineering discipline or equi...,"BSEE, BSME or BSIE degree.\nExperience of usin..."
2,Google,"Data Analyst, Product and Tools Operations, Go...",Technical Solutions,"New York, NY, United States",Collect and analyze data to draw insight and i...,"Bachelor’s degree in Business, Economics, Stat...",Experience partnering or consulting cross-func...
3,Google,"Developer Advocate, Partner Engineering",Developer Relations,"Mountain View, CA, United States","Work one-on-one with the top Android, iOS, and...",BA/BS degree in Computer Science or equivalent...,"Experience as a software developer, architect,..."
4,Google,"Program Manager, Audio Visual (AV) Deployments",Program Management,"Sunnyvale, CA, United States",Plan requirements with internal customers.\nPr...,BA/BS degree or equivalent practical experienc...,CTS Certification.\nExperience in the construc...


In [7]:
df = df[['Title', 'Responsibilities', 'Category']]
mask = (df['Category'].notnull()) & (df['Title'].notnull()) & (df['Responsibilities'].notnull())
df = df.loc[mask]
print(len(df))
display(df.head())

1235


Unnamed: 0,Title,Responsibilities,Category
0,Google Cloud Program Manager,"Shape, shepherd, ship, and show technical prog...",Program Management
1,"Supplier Development Engineer (SDE), Cable/Con...",Drive cross-functional activities in the suppl...,Manufacturing & Supply Chain
2,"Data Analyst, Product and Tools Operations, Go...",Collect and analyze data to draw insight and i...,Technical Solutions
3,"Developer Advocate, Partner Engineering","Work one-on-one with the top Android, iOS, and...",Developer Relations
4,"Program Manager, Audio Visual (AV) Deployments",Plan requirements with internal customers.\nPr...,Program Management


In [8]:
print(df.groupby(['Category']).size().count())
print(df.groupby(['Category']).size())

23
Category
Administrative                       40
Business Strategy                    98
Data Center & Network                 2
Developer Relations                   5
Finance                             115
Hardware Engineering                 22
IT & Data Management                  5
Legal & Government Relations         46
Manufacturing & Supply Chain         16
Marketing & Communications          165
Network Engineering                   6
Partnerships                         59
People Operations                    86
Product & Customer Support           50
Program Management                   72
Real Estate & Workplace Services     25
Sales & Account Management          168
Sales Operations                     31
Software Engineering                 24
Technical Infrastructure             11
Technical Solutions                 100
Technical Writing                     5
User Experience & Design             84
dtype: int64
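
The counts above are quite imbalanced, so it helps to know what a trivial classifier would score. The sketch below copies the printed counts into a plain dict and computes the share of the largest category, i.e. the accuracy of always predicting it on the full dataset. The threshold of 10 used for "rare" categories is an arbitrary choice for illustration.

```python
# Category counts copied from the groupby output above.
category_counts = {
    "Administrative": 40, "Business Strategy": 98, "Data Center & Network": 2,
    "Developer Relations": 5, "Finance": 115, "Hardware Engineering": 22,
    "IT & Data Management": 5, "Legal & Government Relations": 46,
    "Manufacturing & Supply Chain": 16, "Marketing & Communications": 165,
    "Network Engineering": 6, "Partnerships": 59, "People Operations": 86,
    "Product & Customer Support": 50, "Program Management": 72,
    "Real Estate & Workplace Services": 25, "Sales & Account Management": 168,
    "Sales Operations": 31, "Software Engineering": 24,
    "Technical Infrastructure": 11, "Technical Solutions": 100,
    "Technical Writing": 5, "User Experience & Design": 84,
}

total = sum(category_counts.values())
majority_label, majority_count = max(category_counts.items(), key=lambda kv: kv[1])

# Accuracy of a classifier that always predicts the most frequent category.
majority_baseline = majority_count / total

# Categories with very few examples (arbitrary threshold for illustration).
rare = sorted(name for name, n in category_counts.items() if n < 10)

print(total, majority_label, round(majority_baseline, 3), rare)
```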


In [9]:
print(df['Title'].apply(len).mean())
print(df['Responsibilities'].apply(len).mean())

41.83400809716599
639.9352226720648


In [10]:
label_list = list(set(df.get('Category').tolist()))
print(len(label_list))
print(label_list)

23
['Program Management', 'Software Engineering', 'User Experience & Design', 'Real Estate & Workplace Services', 'Sales & Account Management', 'Technical Writing', 'Developer Relations', 'IT & Data Management', 'Technical Solutions', 'People Operations', 'Marketing & Communications', 'Manufacturing & Supply Chain', 'Data Center & Network', 'Technical Infrastructure', 'Business Strategy', 'Hardware Engineering', 'Network Engineering', 'Finance', 'Legal & Government Relations', 'Partnerships', 'Administrative', 'Sales Operations', 'Product & Customer Support']


In [11]:
# Stratified train/test split, so that the test set contains every category
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle

train, test = train_test_split(df, test_size=0.2, random_state=0, stratify=df.get('Category'))
print(len(train), type(train), len(test), type(test))
print(test.groupby(['Category']).size().count())

988 <class 'pandas.core.frame.DataFrame'> 247 <class 'pandas.core.frame.DataFrame'>
23
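
To make the idea of stratification concrete, here is a minimal pure-Python sketch that splits each label separately. It is not scikit-learn's algorithm: the `stratified_split` helper and its rule of keeping at least one test example per label are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=0):
    """Return (train_idx, test_idx), sampling each label separately."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Assumption: at least one test example per label when possible,
        # but never take the last remaining example away from training.
        n_test = min(len(indices) - 1, max(1, round(len(indices) * test_size)))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# Toy labels, including a category with only two examples.
labels = ["Finance"] * 10 + ["Technical Writing"] * 5 + ["Data Center & Network"] * 2
train_idx, test_idx = stratified_split(labels)
test_labels = {labels[i] for i in test_idx}
```

Every label shows up in `test_labels`, which is the property the `stratify=` argument is used for in the cell above.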


#Data Preprocessing
We'll need to transform our data into a format BERT understands. This involves two steps. First, we create `InputExample`s using the constructor provided in the BERT library.

- `text_a` is the text we want to classify. Here we build one set of examples from the `Title` field and another from the `Responsibilities` field of our DataFrame.
- `text_b` is used if we're training a model to understand the relationship between sentences (e.g. is `text_b` a translation of `text_a`? Is `text_b` an answer to the question asked by `text_a`?). This doesn't apply to our task, so we can leave `text_b` blank.
- `label` is the label for our example, which here is the job's `Category` (one of the 23 values in `label_list`).
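
As a rough picture of what such an example holds, the sketch below defines a stand-in with the same four fields. `SimpleInputExample` is hypothetical and exists only for illustration; the notebook itself uses `bert.run_classifier.InputExample`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimpleInputExample:
    """Hypothetical stand-in mirroring the four fields described above."""
    guid: Optional[str]
    text_a: str
    text_b: Optional[str] = None   # unused for single-text classification
    label: Optional[str] = None


# First row of the dataset shown earlier.
row = {"Title": "Google Cloud Program Manager", "Category": "Program Management"}
example = SimpleInputExample(guid=None, text_a=row["Title"], label=row["Category"])
print(example)
```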

In [12]:
# Use the InputExample class from BERT's run_classifier code to create examples from the data
train_InputExamples_t = train.apply(lambda x: bert.run_classifier.InputExample(guid=None, # Globally unique ID for bookkeeping, unused in this example
                                                                     text_a = x['Title'], 
                                                                     text_b = None, 
                                                                     label = x['Category']), axis = 1)

test_InputExamples_t = test.apply(lambda x: bert.run_classifier.InputExample(guid=None, 
                                                                     text_a = x['Title'], 
                                                                     text_b = None, 
                                                                     label = x['Category']), axis = 1)

train_InputExamples_r = train.apply(lambda x: bert.run_classifier.InputExample(guid=None,
                                                                     text_a = x['Responsibilities'], 
                                                                     text_b = None, 
                                                                     label = x['Category']), axis = 1)

test_InputExamples_r = test.apply(lambda x: bert.run_classifier.InputExample(guid=None, 
                                                                     text_a = x['Responsibilities'], 
                                                                     text_b = None, 
                                                                     label = x['Category']), axis = 1)

print(type(train_InputExamples_t))

<class 'pandas.core.series.Series'>


Next, we need to preprocess our data so that it matches the data BERT was trained on. For this, we'll need to do a couple of things (but don't worry--this is also included in the Python library):


1. Lowercase our text (if we're using a BERT lowercase model)
2. Tokenize it (e.g. "sally says hi" -> ["sally", "says", "hi"])
3. Break words into WordPieces (e.g. "calling" -> ["call", "##ing"])
4. Map our words to indexes using a vocab file that BERT provides
5. Add special "CLS" and "SEP" tokens (see the [readme](https://github.com/google-research/bert))
6. Append "index" and "segment" tokens to each input (see the [BERT paper](https://arxiv.org/pdf/1810.04805.pdf))

Happily, we don't have to worry about most of these details.
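
For intuition, steps 1 to 5 can be sketched in a few lines of plain Python. This is a simplified illustration, not the library's implementation: the vocabulary is a made-up toy, whitespace splitting stands in for BERT's basic tokenizer (which also handles punctuation and Unicode), and `wordpiece` and `encode` are hypothetical helper names.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into WordPieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces get a ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matched: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


def encode(text, vocab):
    """Lowercase, split on whitespace, apply WordPiece, add [CLS]/[SEP], map to ids."""
    tokens = ["[CLS]"]
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    tokens.append("[SEP]")
    return tokens, [vocab.get(t, vocab["[UNK]"]) for t in tokens]


# Toy vocabulary (an assumption; the real vocab file has roughly 30k entries).
toy_vocab = {tok: i for i, tok in enumerate(
    ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "sally", "says", "hi", "call", "##ing"])}

tokens, ids = encode("Sally says hi calling", toy_vocab)
print(tokens)
print(ids)
```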




To start, we'll need to load a vocabulary file and lowercasing information directly from the BERT tf hub module:

In [13]:
# This is a path to an uncased (all lowercase) version of BERT
BERT_MODEL_HUB = "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"

def create_tokenizer_from_hub_module():
    """Get the vocab file and casing info from the Hub module."""
    with tf.Graph().as_default():
        bert_module = hub.Module(BERT_MODEL_HUB)
        tokenization_info = bert_module(signature="tokenization_info", as_dict=True)
        with tf.Session() as sess:
            vocab_file, do_lower_case = sess.run([tokenization_info["vocab_file"],
                                                  tokenization_info["do_lower_case"]])
      
    return bert.tokenization.FullTokenizer(vocab_file=vocab_file, do_lower_case=do_lower_case)

tokenizer = create_tokenizer_from_hub_module()

W0712 04:33:32.260858 140341499602816 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/bert/tokenization.py:125: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.



Great--we just learned that the BERT model we're using expects lowercase data (that's what is stored in tokenization_info["do_lower_case"]) and we also loaded BERT's vocab file. We also created a tokenizer, which breaks words into word pieces:

In [14]:
tokenizer.tokenize("This here's an example of using the BERT tokenizer")

['this',
 'here',
 "'",
 's',
 'an',
 'example',
 'of',
 'using',
 'the',
 'bert',
 'token',
 '##izer']

Using our tokenizer, we'll call `run_classifier.convert_examples_to_features` on our InputExamples to convert them into features BERT understands.

In [15]:
import os

# Change log level so that we can check the results after tokenization
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '0' 
tf.logging.set_verbosity(tf.logging.INFO)

# Titles are short, so we cap them at 32 tokens; responsibilities get up to 128 tokens.
MAX_SEQ_LENGTH_t = 32
MAX_SEQ_LENGTH_r = 128

# Convert our train and test features to InputFeatures that BERT understands.
train_features_t = bert.run_classifier.convert_examples_to_features(train_InputExamples_t, label_list, MAX_SEQ_LENGTH_t, tokenizer)
test_features_t = bert.run_classifier.convert_examples_to_features(test_InputExamples_t, label_list, MAX_SEQ_LENGTH_t, tokenizer)

train_features_r = bert.run_classifier.convert_examples_to_features(train_InputExamples_r, label_list, MAX_SEQ_LENGTH_r, tokenizer)
test_features_r = bert.run_classifier.convert_examples_to_features(test_InputExamples_r, label_list, MAX_SEQ_LENGTH_r, tokenizer)

W0712 04:33:35.704999 140341499602816 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/bert/run_classifier.py:774: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.

I0712 04:33:35.710829 140341499602816 run_classifier.py:774] Writing example 0 of 988
I0712 04:33:35.712619 140341499602816 run_classifier.py:461] *** Example ***
I0712 04:33:35.713667 140341499602816 run_classifier.py:462] guid: None
I0712 04:33:35.716379 140341499602816 run_classifier.py:464] tokens: [CLS] user experience engineer intern , summer 2018 [SEP]
I0712 04:33:35.718050 140341499602816 run_classifier.py:465] input_ids: 101 5310 3325 3992 25204 1010 2621 2760 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I0712 04:33:35.719858 140341499602816 run_classifier.py:466] input_mask: 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I0712 04:33:35.722665 140341499602816 run_classifier.py:467] segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
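
The logged example shows the padding scheme: real tokens first, then zeros up to the maximum length, with `input_mask` marking which positions are real. A minimal sketch of that step, using the ids from the log above, might look as follows. `pad_features` is a hypothetical helper, and truncation of over-long inputs is deliberately left out.

```python
def pad_features(token_ids, max_seq_length, pad_id=0):
    """Pad a single-segment id list to a fixed length, with mask and segment ids."""
    if len(token_ids) > max_seq_length:
        raise ValueError("truncate the sequence before padding")
    n_pad = max_seq_length - len(token_ids)
    input_ids = token_ids + [pad_id] * n_pad
    input_mask = [1] * len(token_ids) + [0] * n_pad  # 1 = real token, 0 = padding
    segment_ids = [0] * max_seq_length               # single text: all segment 0
    return input_ids, input_mask, segment_ids


# ids for "[CLS] user experience engineer intern , summer 2018 [SEP]" from the log
logged_ids = [101, 5310, 3325, 3992, 25204, 1010, 2621, 2760, 102]
input_ids, input_mask, segment_ids = pad_features(logged_ids, max_seq_length=32)
print(input_ids)
print(input_mask)
```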

#Creating a model

Now that we've prepared our data, let's focus on building a model. `create_model` does just this below. First, it loads the BERT tf hub module again (this time to extract the computation graph) and runs it on both the title and the responsibilities inputs. Next, it concatenates the two pooled outputs and creates a single new layer that will be trained to adapt BERT to our task, which is classifying each job posting into one of the 23 categories. This strategy of starting from a pretrained model and training it further on task data is called [fine-tuning](http://wiki.fast.ai/index.php/Fine_tuning).

In [0]:
def create_model(is_predicting, input_ids_t, input_mask_t, segment_ids_t,
                 input_ids_r, input_mask_r, segment_ids_r,
                 labels, num_labels):
    """Creates a classification model."""
  
    bert_module = hub.Module(BERT_MODEL_HUB, trainable=True)
      
    bert_inputs_t = dict(input_ids=input_ids_t, input_mask=input_mask_t, segment_ids=segment_ids_t)
    bert_inputs_r = dict(input_ids=input_ids_r, input_mask=input_mask_r, segment_ids=segment_ids_r)
  
    bert_outputs_t = bert_module(inputs=bert_inputs_t, signature="tokens", as_dict=True)
    bert_outputs_r = bert_module(inputs=bert_inputs_r, signature="tokens", as_dict=True)
  
    # Use "pooled_output" for classification tasks on an entire sentence.
    # Use "sequence_output" for token-level output.
    output_layer_t = bert_outputs_t["pooled_output"]
    output_layer_r = bert_outputs_r["pooled_output"]
    print(output_layer_t.shape)
    print(output_layer_r.shape)
    output_layer = tf.concat([output_layer_t, output_layer_r], axis=1)
  
    hidden_size = output_layer.shape[-1].value
  
    # Create our own classification layer to fine-tune on the job category data.
    output_weights = tf.get_variable(
        "output_weights", [num_labels, hidden_size],
        initializer=tf.truncated_normal_initializer(stddev=0.02))
  
    output_bias = tf.get_variable(
        "output_bias", [num_labels], initializer=tf.zeros_initializer())
  
    with tf.variable_scope("loss"):
  
        # Dropout helps prevent overfitting
        output_layer = tf.nn.dropout(output_layer, rate=0.1)
    
        logits = tf.matmul(output_layer, output_weights, transpose_b=True)
        logits = tf.nn.bias_add(logits, output_bias)
        log_probs = tf.nn.log_softmax(logits, axis=-1)
    
        # Convert labels into one-hot encoding
        one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)
    
        predicted_labels = tf.squeeze(tf.argmax(log_probs, axis=-1, output_type=tf.int32))
        # If we're predicting, we want the predicted labels and the log-probabilities.
        if is_predicting:
            return (predicted_labels, log_probs)
    
        # If we're train/eval, compute loss between predicted and actual label
        per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
        loss = tf.reduce_mean(per_example_loss)
        return (loss, predicted_labels, log_probs)
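
The arithmetic of the classification head can be followed on a toy example without TensorFlow. The sketch below uses made-up 2-dimensional vectors in place of the two 768-dimensional pooled outputs (so the concatenated feature has 4 entries rather than 1536), omits dropout, and handles a single example. The weights are arbitrary numbers chosen for readability.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax for one example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def linear_head(features, weights, bias):
    """logits[k] = features . weights[k] + bias[k] (matmul with transpose_b=True)."""
    return [sum(f * w for f, w in zip(features, row)) + b
            for row, b in zip(weights, bias)]


# Toy stand-ins for the title and responsibilities pooled outputs.
pooled_title = [0.5, -1.0]
pooled_resp = [2.0, 0.0]
features = pooled_title + pooled_resp  # concatenation along the feature axis

weights = [[1.0, 0.0, 0.0, 0.0],       # shape [num_labels, hidden_size]
           [0.0, 0.0, 1.0, 0.0],
           [0.0, 1.0, 0.0, 0.0]]
bias = [0.0, 0.0, 0.0]

logits = linear_head(features, weights, bias)
log_probs = log_softmax(logits)
predicted_label = max(range(len(log_probs)), key=log_probs.__getitem__)

true_label = 1
loss = -log_probs[true_label]  # equals -sum(one_hot * log_probs) for this example
```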


Next we'll wrap our model function in a `model_fn_builder` function that adapts our model to work for training, evaluation, and prediction.

In [0]:
# model_fn_builder actually creates our model function
# using the passed parameters for num_labels, learning_rate, etc.
def model_fn_builder(num_labels, learning_rate, num_train_steps, num_warmup_steps):
    """Returns a `model_fn` closure for `tf.estimator.Estimator`."""
    def model_fn(features, labels, mode, params):  # pylint: disable=unused-argument
        """The `model_fn` passed to `tf.estimator.Estimator`."""
    
        input_ids_t = features["input_ids_t"]
        input_mask_t = features["input_mask_t"]
        segment_ids_t = features["segment_ids_t"]
        
        input_ids_r = features["input_ids_r"]
        input_mask_r = features["input_mask_r"]
        segment_ids_r = features["segment_ids_r"]
        
        label_ids = features["label_ids"]
    
        is_predicting = (mode == tf.estimator.ModeKeys.PREDICT)
        
        # TRAIN and EVAL
        if not is_predicting:
            (loss, predicted_labels, log_probs) = create_model(
              is_predicting, input_ids_t, input_mask_t, segment_ids_t,
              input_ids_r, input_mask_r, segment_ids_r, label_ids, num_labels)
      
            train_op = bert.optimization.create_optimizer(
                loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu=False)
      
            # Calculate evaluation metrics. 
            def metric_fn(label_ids, predicted_labels):
                accuracy = tf.metrics.accuracy(label_ids, predicted_labels)
                return {
                    "eval_accuracy": accuracy
                }
      
            eval_metrics = metric_fn(label_ids, predicted_labels)
      
            if mode == tf.estimator.ModeKeys.TRAIN:
                return tf.estimator.EstimatorSpec(mode=mode,
                  loss=loss,
                  train_op=train_op)
            else:
                return tf.estimator.EstimatorSpec(mode=mode,
                  loss=loss,
                  eval_metric_ops=eval_metrics)
        else:
            (predicted_labels, log_probs) = create_model(
              is_predicting, input_ids_t, input_mask_t, segment_ids_t,
              input_ids_r, input_mask_r, segment_ids_r, label_ids, num_labels)
      
            predictions = {
                'probabilities': log_probs,
                'labels': predicted_labels
            }
            return tf.estimator.EstimatorSpec(mode, predictions=predictions)
  
    # Return the actual model function in the closure
    return model_fn
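
The builder pattern used here is an ordinary closure: the outer function captures the hyperparameters, and the inner function branches on the mode. A stripped-down sketch of that shape, with no TensorFlow involved, could look like this. The mode strings, the 0/1 loss, and the `"apply_gradients"` placeholder are all invented for illustration.

```python
def model_fn_builder_sketch(num_labels):
    """Return a function that remembers `num_labels` via closure."""

    def toy_model_fn(scores, label_id, mode):
        if len(scores) != num_labels:
            raise ValueError("expected one score per label")
        predicted = max(range(num_labels), key=scores.__getitem__)
        if mode == "predict":
            # Prediction: no label needed, return only the outputs.
            return {"labels": predicted}
        # Train and eval both need a loss; here a toy 0/1 loss.
        loss = 0.0 if predicted == label_id else 1.0
        if mode == "train":
            return {"loss": loss, "train_op": "apply_gradients"}
        return {"loss": loss, "eval_accuracy": 1.0 - loss}

    return toy_model_fn


toy_model_fn = model_fn_builder_sketch(num_labels=3)
train_spec = toy_model_fn([0.1, 0.7, 0.2], label_id=1, mode="train")
eval_spec = toy_model_fn([0.1, 0.7, 0.2], label_id=2, mode="eval")
predict_spec = toy_model_fn([0.1, 0.7, 0.2], label_id=None, mode="predict")
```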

In [0]:
# Training hyperparameters
# These hyperparameters are copied from this colab notebook (https://colab.sandbox.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb)
BATCH_SIZE = 32
LEARNING_RATE = 2e-5
NUM_TRAIN_EPOCHS = 10.0
# Warmup is a period of time where the learning rate 
# is small and gradually increases--usually helps training.
WARMUP_PROPORTION = 0.1
# Model configs
SAVE_CHECKPOINTS_STEPS = 500
SAVE_SUMMARY_STEPS = 100

In [0]:
# Compute the number of training and warmup steps from the batch size
num_train_steps = int(len(train_features_t) / BATCH_SIZE * NUM_TRAIN_EPOCHS)
num_warmup_steps = int(num_train_steps * WARMUP_PROPORTION)
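
Plugging in the numbers from this notebook makes the step counts concrete. The sketch below also shows one way a warmup schedule can be written: a linear ramp during warmup followed by a linear decay to zero. We believe this matches the general shape used by `bert.optimization.create_optimizer`, but `lr_at_step` is our own simplified function, so treat it as an assumption rather than the library's exact code.

```python
num_train_examples = 988   # size of the training split printed earlier
batch_size = 32
num_epochs = 10.0
warmup_proportion = 0.1
base_lr = 2e-5

num_train_steps = int(num_train_examples / batch_size * num_epochs)
num_warmup_steps = int(num_train_steps * warmup_proportion)


def lr_at_step(step, base_lr, num_train_steps, num_warmup_steps):
    """Linear warmup towards base_lr, then linear decay to zero."""
    if step < num_warmup_steps:
        return base_lr * step / num_warmup_steps
    return base_lr * (1.0 - step / num_train_steps)


schedule = [lr_at_step(s, base_lr, num_train_steps, num_warmup_steps)
            for s in range(num_train_steps + 1)]
print(num_train_steps, num_warmup_steps)
```

The learning rate starts at zero, climbs during the warmup steps, and is back at zero on the final step.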

In [0]:
# Specify output directory and number of checkpoint steps to save
run_config = tf.estimator.RunConfig(
    model_dir=OUTPUT_DIR,
    save_summary_steps=SAVE_SUMMARY_STEPS,
    save_checkpoints_steps=SAVE_CHECKPOINTS_STEPS)

In [21]:
model_fn = model_fn_builder(
  num_labels=len(label_list),
  learning_rate=LEARNING_RATE,
  num_train_steps=num_train_steps,
  num_warmup_steps=num_warmup_steps)

estimator = tf.estimator.Estimator(
  model_fn=model_fn,
  config=run_config,
  params={"batch_size": BATCH_SIZE})


I0712 04:34:27.435110 140341499602816 estimator.py:209] Using config: {'_model_dir': 'bert_output', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 500, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fa37b27da20>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


Next we create an input builder function that takes our feature sets (e.g. `train_features_t` and `train_features_r`) and returns an `input_fn` that produces a `tf.data.Dataset` of batches. This is a pretty standard design pattern for working with TensorFlow [Estimators](https://www.tensorflow.org/guide/estimators).

In [0]:
def input_fn_builder(features_t, features_r, seq_length_t, seq_length_r, is_training, drop_remainder):
    """Creates an `input_fn` closure to be passed to the Estimator."""
  
    all_input_ids_t = []
    all_input_mask_t = []
    all_segment_ids_t = []
    
    all_input_ids_r = []
    all_input_mask_r = []
    all_segment_ids_r = []
    
    all_label_ids = []
  
    for feature_t in features_t:
        all_input_ids_t.append(feature_t.input_ids)
        all_input_mask_t.append(feature_t.input_mask)
        all_segment_ids_t.append(feature_t.segment_ids)
        all_label_ids.append(feature_t.label_id)
        
    for feature_r in features_r:
        all_input_ids_r.append(feature_r.input_ids)
        all_input_mask_r.append(feature_r.input_mask)
        all_segment_ids_r.append(feature_r.segment_ids)
  
    def input_fn(params):
        """The actual input function."""
        batch_size = params["batch_size"]
    
        num_examples = len(features_t)
    
        # This is for demo purposes and does NOT scale to large data sets. We do
        # not use Dataset.from_generator() because that uses tf.py_func which is
        # not TPU compatible. The right way to load data is with TFRecordReader.
        d = tf.data.Dataset.from_tensor_slices({
            "input_ids_t":
                tf.constant(
                    all_input_ids_t, shape=[num_examples, seq_length_t],
                    dtype=tf.int32),
            "input_mask_t":
                tf.constant(
                    all_input_mask_t,
                    shape=[num_examples, seq_length_t],
                    dtype=tf.int32),
            "segment_ids_t":
                tf.constant(
                    all_segment_ids_t,
                    shape=[num_examples, seq_length_t],
                    dtype=tf.int32),
            "input_ids_r":
                tf.constant(
                    all_input_ids_r, shape=[num_examples, seq_length_r],
                    dtype=tf.int32),
            "input_mask_r":
                tf.constant(
                    all_input_mask_r,
                    shape=[num_examples, seq_length_r],
                    dtype=tf.int32),
            "segment_ids_r":
                tf.constant(
                    all_segment_ids_r,
                    shape=[num_examples, seq_length_r],
                    dtype=tf.int32),
            "label_ids":
                tf.constant(all_label_ids, shape=[num_examples], dtype=tf.int32),
        })
    
        if is_training:
            d = d.repeat()
            d = d.shuffle(buffer_size=100)
    
        d = d.batch(batch_size=batch_size, drop_remainder=drop_remainder)
        return d
  
    return input_fn

In [0]:
# Create an input function for training. drop_remainder=True is only needed on TPUs, so we leave it False here.
train_input_fn = input_fn_builder(
    features_t=train_features_t,
    features_r=train_features_r,
    seq_length_t=MAX_SEQ_LENGTH_t,
    seq_length_r=MAX_SEQ_LENGTH_r,
    is_training=True,
    drop_remainder=False)
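
The effect of `drop_remainder` is easiest to see on a dataset that is read once, such as the 247-example test split. The following sketch counts batch sizes in plain Python. `batch_sizes` is a hypothetical helper, not a `tf.data` API.

```python
def batch_sizes(num_examples, batch_size, drop_remainder):
    """Sizes of the batches produced by one pass over the data."""
    full, remainder = divmod(num_examples, batch_size)
    sizes = [batch_size] * full
    if remainder and not drop_remainder:
        sizes.append(remainder)  # keep the final, smaller batch
    return sizes


kept = batch_sizes(247, 32, drop_remainder=False)
dropped = batch_sizes(247, 32, drop_remainder=True)
print(kept)
print(dropped)
```

With `drop_remainder=True` the last partial batch would be discarded, so some test examples would never be evaluated. During training the dataset repeats, so the setting matters much less there.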

Now we train our model! Using a Colab notebook running on Google's GPUs, training took about 7 minutes in the run logged below.

In [24]:
print('Beginning Training!')
current_time = datetime.now()
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
print("Training took time ", datetime.now() - current_time)

W0712 04:34:48.422138 140341499602816 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.


Beginning Training!


I0712 04:34:49.119671 140341499602816 estimator.py:1145] Calling model_fn.
I0712 04:34:51.884239 140341499602816 saver.py:1499] Saver not created because there are no variables in the graph to restore
I0712 04:34:52.656802 140341499602816 saver.py:1499] Saver not created because there are no variables in the graph to restore
W0712 04:34:52.841491 140341499602816 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/bert/optimization.py:27: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.

W0712 04:34:52.843405 140341499602816 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/bert/optimization.py:32: The name tf.train.polynomial_decay is deprecated. Please use tf.compat.v1.train.polynomial_decay instead.

W0712 04:34:52.849877 140341499602816 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/optimizer_v2/learning_rate_schedule.py:409: d

(?, 768)
(?, 768)


W0712 04:34:53.257192 140341499602816 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_grad.py:1205: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0712 04:35:00.017908 140341499602816 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/bert/optimization.py:117: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
I0712 04:35:04.428295 140341499602816 estimator.py:1147] Done calling model_fn.
I0712 04:35:04.431043 140341499602816 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
I0712 04:35:08.647073 140341499602816 monitored_session.py:240] Graph was finalized.
I0712 04:35:14.779228 140341499602816 session_manager.py:500] Running lo

Training took time  0:07:04.551520


Now let's use our test data to see how well our model did:

In [0]:
test_input_fn = input_fn_builder(
    features_t=test_features_t,
    features_r=test_features_r,
    seq_length_t=MAX_SEQ_LENGTH_t,
    seq_length_r=MAX_SEQ_LENGTH_r,
    is_training=False,
    drop_remainder=False)

In [26]:
estimator.evaluate(input_fn=test_input_fn, steps=None)

I0712 04:44:32.976675 140341499602816 estimator.py:1145] Calling model_fn.
I0712 04:44:36.276080 140341499602816 saver.py:1499] Saver not created because there are no variables in the graph to restore
I0712 04:44:37.108463 140341499602816 saver.py:1499] Saver not created because there are no variables in the graph to restore


(?, 768)
(?, 768)


  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
I0712 04:44:49.451953 140341499602816 estimator.py:1147] Done calling model_fn.
I0712 04:44:49.477385 140341499602816 evaluation.py:255] Starting evaluation at 2019-07-12T04:44:49Z
I0712 04:44:50.854363 140341499602816 monitored_session.py:240] Graph was finalized.
W0712 04:44:50.860321 140341499602816 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
I0712 04:44:50.868097 140341499602816 saver.py:1280] Restoring parameters from bert_output/model.ckpt-308
I0712 04:44:54.765058 140341499602816 session_manager.py:500] Running local_init_op.
I0712 04:44:55.041480 140341499602816 session_manager.py:502] Done running local_init_op.
I0712 04:45:00.07152

{'eval_accuracy': 0.87449396, 'global_step': 308, 'loss': 0.5120736}
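
To put these numbers in context, we can turn the accuracy back into a count of correctly classified test examples and compare the loss with what uniform guessing over 23 categories would give. This is just arithmetic on the values printed above.

```python
import math

eval_accuracy = 0.87449396   # from the evaluation output above
eval_loss = 0.5120736
num_test_examples = 247      # size of the test split
num_labels = 23

# Number of test examples classified correctly.
correct = round(eval_accuracy * num_test_examples)

# Cross-entropy of a model that assigns probability 1/23 to every category.
uniform_loss = math.log(num_labels)

print(correct, num_test_examples - correct, round(uniform_loss, 4))
```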