### Name: Youdan Zhang
### The code was adapted from https://github.com/ishaangrover/Bert-For-sentence-pair/blob/master/keras-bert-sentence-pair.ipynb

# 1. Introduction for the project 

**TensorFlow Hub** is a repository of trained machine learning models, such as BERT and Faster R-CNN, that can be readily fine-tuned and deployed anywhere.

**BERT** stands for Bidirectional Encoder Representations from Transformers. BERT and related models have had great success on a variety of NLP (Natural Language Processing) tasks. They compute vector-space representations of natural language that are suitable for use in deep learning models. The BERT family of models uses the Transformer encoder architecture to process each token of the input text in the full context of all tokens before and after it. This tutorial describes how to use BERT and TensorFlow Hub for natural language processing.

First, install bert-tensorflow 1.0.1 and tensorflow 1.15.0 so that the environment matches the code:


```
!pip install bert-tensorflow==1.0.1
!pip install tensorflow==1.15.0
```
This tutorial briefly demonstrates how to wrap a BERT module from TensorFlow Hub in a custom Keras layer that can be used directly in a Keras or TensorFlow model. The training, validation (dev), and test sets are arbitrary sentences, defined only for demonstration purposes.
Some parts of the code are explained line by line; where several lines serve the same purpose, they are explained as a block.

For details on how to use the BERT module, see **Step 5**, which tokenizes the text to create input_ids, input_masks and segment_ids using the tf-hub module (which simplifies preprocessing), and **Section 4**, which builds the BERT layer.






### For more specialized tutorial code templates, see: https://github.com/strongio/keras-bert




In [None]:

!pip install tensorflow==1.15.0

Collecting tensorflow==1.15.0
  Downloading tensorflow-1.15.0-cp37-cp37m-manylinux2010_x86_64.whl (412.3 MB)
Collecting tensorflow-estimator==1.15.1
  Downloading tensorflow_estimator-1.15.1-py2.py3-none-any.whl (503 kB)
Collecting keras-applications>=1.0.8
  Downloading Keras_Applications-1.0.8-py3-none-any.whl (50 kB)
Collecting gast==0.2.2
  Downloading gast-0.2.2.tar.gz (10 kB)
Collecting tensorboard<1.16.0,>=1.15.0
  Downloading tensorboard-1.15.0-py3-none-any.whl (3.8 MB)
Building wheels for collected packages: gast
  Building wheel for gast (setup.py) ... done
  Created wheel for gast: filename=gast-0.2.2-py3-none-any.whl size=7554 sha256=6d1dcd7ce4e91ecf71ca9e98d6d3849ba0b85a9bb34570bd6a65898a0315d458
  Stored in directory

In [None]:
!pip install bert-tensorflow==1.0.1

Collecting bert-tensorflow==1.0.1
  Downloading bert_tensorflow-1.0.1-py2.py3-none-any.whl (67 kB)
Installing collected packages: bert-tensorflow
Successfully installed bert-tensorflow-1.0.1


# 2. Import packages and initialize
**Step 1:** Import the packages used by the code and initialize a TensorFlow session to run TensorFlow operations.

In [None]:
import tensorflow as tf
import pandas as pd
import tensorflow_hub as hub  # provides access to pretrained modules such as BERT
import os
import re
import numpy as np
from bert.tokenization import FullTokenizer  # the BERT tokenizer
from tqdm import tqdm_notebook
from tensorflow.keras import backend as K

# Initialize session
sess = tf.Session()

# Params for bert model and tokenization
bert_path = "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"
max_seq_length = 256

**Step 2:** Initialize the training, validation, and test sentences.

The training sentences are used to train the model, the dev sentences are the validation data used to measure validation accuracy, and the test sentences are used to test the model.

In [None]:
train_sentence_1 = ["hello world", "I am awesome"]  # initialize the training sentences
train_sentence_2 = ["phone kjh", "odjeu"]
train_labels = [[0], [1]]

dev_sentence_1 = ["jdbhd", "kjbdbc"]  # initialize the dev (validation) sentences
dev_sentence_2 = ["jkbjnf", "ouhdhd"]
dev_labels = [[0], [1]]

test_sentence_1 = ["lhdlihd", "uhkbd"]  # initialize the test sentences
test_sentence_2 = ["khfbbf", "ubdbdf"]
test_labels = [[0], [1]]

# 3. Tokenize and set up input data
**Step 3:** Define a placeholder (fake) example class that is used for padding instead of `None`.

In [None]:
class PaddingInputExample(object):
    """Fake example so the number of input examples is a multiple of the batch size.

    When running eval/predict on the TPU, we need to pad the number of examples
    to be a multiple of the batch size, because the TPU requires a fixed batch
    size. The alternative is to drop the last batch, which is bad because it means
    the entire output data won't be generated.
    We use this class instead of `None` because treating `None` as padding
    batches could cause silent errors.
    """

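The padding idea described in the docstring above can be sketched in plain Python. This is only an illustration under stated assumptions: `PaddingExample` is a hypothetical stand-in for `PaddingInputExample`, and `pad_to_batch_multiple` is a helper invented here; neither is part of the bert package.

```python
class PaddingExample(object):
    """Hypothetical stand-in for the PaddingInputExample class (assumption)."""


def pad_to_batch_multiple(examples, batch_size):
    """Return a new list whose length is a multiple of batch_size.

    Fake examples are appended at the end; the input list is not modified.
    """
    remainder = len(examples) % batch_size
    if remainder == 0:
        return list(examples)
    n_pad = batch_size - remainder
    return list(examples) + [PaddingExample() for _ in range(n_pad)]


# Five real examples with a batch size of 4 need three fake examples.
padded = pad_to_batch_multiple(["a", "b", "c", "d", "e"], batch_size=4)
n_real = sum(not isinstance(ex, PaddingExample) for ex in padded)
print(len(padded), n_real)
```

Because the padding items have their own type, later code can detect them with `isinstance` and skip them when reporting results.
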
**Step 4:** Define the `InputExample` class, which holds a single example for simple sequence classification.

In [None]:
class InputExample(object):
    """A single training/test example for simple sequence classification."""

    def __init__(self, guid, text_a, text_b=None, label=None):
        """Constructs a InputExample.
    Args:
      guid: Unique id for the example.
      text_a: string. The untokenized text of the first sequence. For single
        sequence tasks, only this sequence must be specified.
      text_b: (Optional) string. The untokenized text of the second sequence.
        Only must be specified for sequence pair tasks.
      label: (Optional) string. The label of the example. This should be
        specified for train and dev examples, but not for test examples.
    """
        # Initialize the attributes
        self.guid = guid  # unique id for the example
        self.text_a = text_a  # first sequence of the example
        self.text_b = text_b  # second sequence of the example (optional)
        self.label = label  # label of the example


**Step 5:** Tokenize the text to create input_ids, input_masks, and segment_ids, using the tf-hub module, which simplifies preprocessing.

```
bert_module =  hub.Module(bert_path)
```
A module represents a part of a TensorFlow graph that can be exported to disk (in SavedModel format) and later reloaded.

**The signature of `hub.Module`**

```
hub.Module(
    spec, trainable=False, name='module', tags=None
)
```


`FullTokenizer` is a class in the `bert` package that runs end-to-end tokenization.


In [None]:
def create_tokenizer_from_hub_module():
    # Load the BERT module from TensorFlow Hub.
    bert_module = hub.Module(bert_path)
    # Get the vocab file and casing info from the Hub module.
    tokenization_info = bert_module(signature="tokenization_info", as_dict=True)
    # Run the session to evaluate the vocab file path and the casing flag.
    vocab_file, do_lower_case = sess.run(
        [
            tokenization_info["vocab_file"],
            tokenization_info["do_lower_case"],
        ]
    )
    # Build the BERT FullTokenizer from the vocab file and the casing flag.
    return FullTokenizer(vocab_file=vocab_file, do_lower_case=do_lower_case)
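
End-to-end tokenization in BERT finishes with a WordPiece step that splits each word into sub-word units using greedy longest-match-first lookup. The following is a minimal sketch of that idea, not the library's implementation: the function name `wordpiece_tokenize` and the tiny `toy_vocab` are assumptions made for illustration, while the real tokenizer reads its vocabulary from the vocab file returned by the Hub module.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of a single word (illustrative sketch)."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        current = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a continuation piece
            if candidate in vocab:
                current = candidate
                break
            end -= 1
        if current is None:
            # No piece matched, so the whole word is mapped to the unknown token.
            return [unk_token]
        pieces.append(current)
        start = end
    return pieces


# Toy vocabulary, made up for this example.
toy_vocab = {"play", "##ing", "##ed", "un", "##aff", "##able"}
print(wordpiece_tokenize("playing", toy_vocab))
print(wordpiece_tokenize("unaffable", toy_vocab))
print(wordpiece_tokenize("xyz", toy_vocab))
```
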

**Step 6:** Convert a single `InputExample` instance into input features: input_ids, input_mask, segment_ids, and the example's label.
The first three return values are lists in which the unused positions are filled with 0, so that each has length max_seq_length.

In [None]:
def convert_single_example(tokenizer, example, max_seq_length=256):
    """Converts a single `InputExample` into a single `InputFeatures`."""
    # A padding example is converted to all-zero input_ids, input_mask, and segment_ids with label 0.
    if isinstance(example, PaddingInputExample):
        input_ids = [0] * max_seq_length
        input_mask = [0] * max_seq_length
        segment_ids = [0] * max_seq_length
        label = 0
        return input_ids, input_mask, segment_ids, label
    
    tokens_a = tokenizer.tokenize(example.text_a)
    if len(tokens_a) > max_seq_length - 2:
        tokens_a = tokens_a[0 : (max_seq_length - 2)]
    # Token list
    tokens = []
    # Segment id list
    segment_ids = []
    tokens.append("[CLS]")
    segment_ids.append(0)
    # Append each token to the token list and a matching segment id of 0
    for token in tokens_a:
        tokens.append(token)
        segment_ids.append(0)
    tokens.append("[SEP]")
    segment_ids.append(0)
    # Convert the tokens to input_ids
    input_ids = tokenizer.convert_tokens_to_ids(tokens)

    # The mask has 1 for real tokens and 0 for padding tokens. Only real
    # tokens are attended to.
    # Every position filled so far holds a real token, so its mask value is 1.
    input_mask = [1] * len(input_ids)

    # Zero-pad up to the sequence length.
    while len(input_ids) < max_seq_length:
        input_ids.append(0)
        input_mask.append(0)
        segment_ids.append(0)
    # Check that every list has been padded to max_seq_length
    assert len(input_ids) == max_seq_length
    assert len(input_mask) == max_seq_length
    assert len(segment_ids) == max_seq_length

    return input_ids, input_mask, segment_ids, example.label
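
Note that the function above tokenizes only `text_a`. For sentence pairs, the usual BERT input layout is `[CLS] A [SEP] B [SEP]`, with segment id 0 for the first sentence and 1 for the second. The sketch below shows how such a layout could be built; `encode_pair` and the `toy_ids` mapping are hypothetical names introduced for illustration, and the inputs are assumed to be already tokenized.

```python
def encode_pair(tokens_a, tokens_b, token_to_id, max_seq_length):
    """Build input_ids, input_mask and segment_ids for a sentence pair (sketch)."""
    tokens_a, tokens_b = list(tokens_a), list(tokens_b)
    # Reserve three positions for [CLS], [SEP] and [SEP];
    # trim the longer sequence one token at a time until the pair fits.
    while len(tokens_a) + len(tokens_b) > max_seq_length - 3:
        if len(tokens_a) > len(tokens_b):
            tokens_a.pop()
        else:
            tokens_b.pop()

    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS], sentence A and its [SEP]; segment 1 covers the rest.
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    input_ids = [token_to_id[t] for t in tokens]
    input_mask = [1] * len(input_ids)

    # Zero-pad all three lists up to max_seq_length.
    n_pad = max_seq_length - len(input_ids)
    input_ids += [0] * n_pad
    input_mask += [0] * n_pad
    segment_ids += [0] * n_pad
    return input_ids, input_mask, segment_ids


# Toy id mapping, made up for this example.
toy_ids = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2, "hello": 3, "world": 4, "phone": 5}
ids, mask, segs = encode_pair(["hello", "world"], ["phone"], toy_ids, max_seq_length=8)
print(ids)
print(mask)
print(segs)
```
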

**Step 7:** Convert a list of `InputExample`s into input features. Each returned feature array is a NumPy ndarray. **tqdm** is used in this function to display a **smart progress meter** while the lists are generated.

In [None]:
def convert_examples_to_features(tokenizer, examples, max_seq_length=256):
    # Convert a set of `InputExample`s to arrays of input features.
    # As in the previous block, a feature contains input_id, input_mask, segment_id and label.
    input_ids, input_masks, segment_ids, labels = [], [], [], []
    # Iterate over the examples with a tqdm progress bar and convert each one to features.
    for example in tqdm_notebook(examples, desc="Converting examples to features"):
        input_id, input_mask, segment_id, label = convert_single_example(
            tokenizer, example, max_seq_length
        )
        # Append the feature members to the lists
        input_ids.append(input_id)
        input_masks.append(input_mask)
        segment_ids.append(segment_id)
        labels.append(label)
    # Return the feature members as arrays
    return (
        np.array(input_ids),
        np.array(input_masks),
        np.array(segment_ids),
        np.array(labels).reshape(-1, 1),  # reshape the labels into a column of shape (n, 1)
    )
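
To make the returned shapes concrete, here is a small NumPy sketch of the stacking and reshaping performed at the end of the function above. The id values are arbitrary numbers chosen for illustration, not real vocabulary ids.

```python
import numpy as np

# Two already-padded examples of length 4 (arbitrary illustrative ids).
per_example_ids = [[7, 8, 9, 0], [7, 5, 9, 0]]
per_example_labels = [[0], [1]]

# Stacking equal-length lists gives a 2-D array of shape (n_examples, seq_length).
input_ids = np.array(per_example_ids)
# reshape(-1, 1) yields one label per row, whatever the nesting of the input.
labels = np.array(per_example_labels).reshape(-1, 1)
flat_labels = np.array([0, 1]).reshape(-1, 1)

print(input_ids.shape, labels.shape, flat_labels.shape)
```

The `-1` lets NumPy infer the number of rows, so the same call works for both nested labels such as `[[0], [1]]` and flat labels such as `[0, 1]`.
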

**Step 8:** Use the `InputExample` class to convert the sentence lists into a list of instances. It is a simple loop that appends each instance to a list.

In [None]:
def convert_text_to_examples(sentence_1, sentence_2, labels):
    """Create InputExamples"""
    # Define a list to which the InputExamples are appended
    InputExamples = []
    # For each index and element in labels, use the index to select the sentence pair and the element as the label
    for i, ele in enumerate(labels):
        # Append an InputExample to the list
        InputExamples.append(
            InputExample(guid=None, text_a=sentence_1[i], text_b=sentence_2[i], label=ele)
        )
    return InputExamples

In [None]:

# Instantiate tokenizer
tokenizer = create_tokenizer_from_hub_module()

# Convert data to InputExample format
train_examples = convert_text_to_examples(train_sentence_1, train_sentence_2, train_labels)
dev_examples = convert_text_to_examples(dev_sentence_1, dev_sentence_2, dev_labels)
test_examples = convert_text_to_examples(test_sentence_1, test_sentence_2, test_labels)

# Convert to features
(train_input_ids, train_input_masks, train_segment_ids, train_labels 
) = convert_examples_to_features(tokenizer, train_examples, max_seq_length=max_seq_length)

(dev_input_ids, dev_input_masks, dev_segment_ids, dev_labels 
) = convert_examples_to_features(tokenizer, dev_examples, max_seq_length=max_seq_length)

(test_input_ids, test_input_masks, test_segment_ids, test_labels
) = convert_examples_to_features(tokenizer, test_examples, max_seq_length=max_seq_length)

INFO:tensorflow:Saver not created because there are no variables in the graph to restore
Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`


Converting examples to features:   0%|          | 0/2 [00:00<?, ?it/s]

<class 'list'>
<class 'list'>


Converting examples to features:   0%|          | 0/2 [00:00<?, ?it/s]

<class 'list'>
<class 'list'>


Converting examples to features:   0%|          | 0/2 [00:00<?, ?it/s]

<class 'list'>
<class 'list'>


In [None]:
# Check the type of one element of train_examples
type(train_examples[1])

__main__.InputExample

In [None]:
# Check the types of the converted outputs
type(train_examples),type(train_input_ids),type(train_input_masks), type(train_segment_ids),type(train_labels) 

(list, numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray)

In [None]:
# Check the shapes of the 'train' feature arrays
train_input_ids.shape,train_input_masks.shape,train_segment_ids.shape,train_labels.shape

((2, 256), (2, 256), (2, 256), (2, 1))

In [None]:
test_input_ids.shape,test_input_masks.shape,test_segment_ids.shape,test_labels.shape  # check the shapes of the 'test' feature arrays

((2, 256), (2, 256), (2, 256), (2, 1))

In [None]:
dev_input_ids.shape,dev_input_masks.shape,dev_segment_ids.shape,dev_labels.shape  # check the shapes of the 'dev' feature arrays

((2, 256), (2, 256), (2, 256), (2, 1))

# 4. Build the BERT layer
**Step 10:** Build the BERT layer. The comments explain the code line by line.


In [None]:
class BertLayer(tf.keras.layers.Layer):
    def __init__(
        self,
        n_fine_tune_layers=10,
        pooling="first",
        bert_path="https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1",
        **kwargs,
    ):  # Constructor for the layer
        # Define and set the layer parameters
        self.n_fine_tune_layers = n_fine_tune_layers
        self.trainable = True
        self.output_size = 768  # hidden size of the BERT model used here
        self.pooling = pooling
        self.bert_path = bert_path
        if self.pooling not in ["first", "mean"]:  # reject unsupported pooling types
            raise NameError(
                f"Undefined pooling type (must be either first or mean, but is {self.pooling})"
            )

        super(BertLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.bert = hub.Module(
            self.bert_path, trainable=self.trainable, name=f"{self.name}_module"
        )
        # Use hub.Module to load the BERT module; hub.Module is described in Step 5 of this tutorial.

        # Remove unused layers
        trainable_vars = self.bert.variables  # start from all variables of the BERT module
        if self.pooling == "first":
            trainable_vars = [var for var in trainable_vars if not "/cls/" in var.name]
            trainable_layers = ["pooler/dense"]
            # Variables under "/cls/" (the pretraining classification heads) are not used here.
            # The pooler takes the output representation corresponding to the first token and uses it for downstream tasks.
        elif self.pooling == "mean":
            trainable_vars = [
                var
                for var in trainable_vars
                if not "/cls/" in var.name and not "/pooler/" in var.name
            ]
            trainable_layers = []
        else:
            raise NameError(
                f"Undefined pooling type (must be either first or mean, but is {self.pooling}"
            )

        # Select how many layers to fine tune
        for i in range(self.n_fine_tune_layers):
            # This BERT model has 12 encoder layers (layer_0 to layer_11), so 11 - i
            # selects the last n_fine_tune_layers layers, starting from the top.
            trainable_layers.append(f"encoder/layer_{str(11 - i)}")

        # Update trainable vars to contain only the specified layers
        trainable_vars = [
            var
            for var in trainable_vars
            if any([l in var.name for l in trainable_layers])
        ]

        # Add to trainable weights
        for var in trainable_vars:
            self._trainable_weights.append(var)
        # Add non-trainable weights
        for var in self.bert.variables:
            if var not in self._trainable_weights:
                self._non_trainable_weights.append(var)

        super(BertLayer, self).build(input_shape)

    def call(self, inputs):
        inputs = [K.cast(x, dtype="int32") for x in inputs]  # cast each input tensor to int32
        input_ids, input_mask, segment_ids = inputs
        bert_inputs = dict(
            input_ids=input_ids, input_mask=input_mask, segment_ids=segment_ids
        )  # define the BERT module inputs

        # Pool the BERT outputs into one vector per example
        if self.pooling == "first":
            pooled = self.bert(inputs=bert_inputs, signature="tokens", as_dict=True)[
                "pooled_output"
            ]
        elif self.pooling == "mean":
            result = self.bert(inputs=bert_inputs, signature="tokens", as_dict=True)[
                "sequence_output"
            ]

            # Expand the mask with a trailing dimension so it broadcasts over the hidden dimension of x
            mul_mask = lambda x, m: x * tf.expand_dims(m, axis=-1)
            # Sum the masked token vectors along axis 1 (the sequence axis) and divide by
            # the number of real tokens (the sum of the mask along axis 1) to get the mean.
            masked_reduce_mean = lambda x, m: tf.reduce_sum(mul_mask(x, m), axis=1) / (
                    tf.reduce_sum(m, axis=1, keepdims=True) + 1e-10)
            input_mask = tf.cast(input_mask, tf.float32)
            pooled = masked_reduce_mean(result, input_mask)
        else:
            raise NameError(f"Undefined pooling type (must be either first or mean, but is {self.pooling})")

        return pooled
    # Output shape: (batch size, output_size)
    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_size)
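
Two pieces of logic in the layer above are easy to check outside TensorFlow: the masked mean used for `pooling="mean"`, and the way encoder layer names are selected for fine-tuning. The NumPy sketch below mirrors both under simplifying assumptions; `masked_mean` and `fine_tuned_layer_names` are names introduced here for illustration, and the arrays are toy values.

```python
import numpy as np


def masked_mean(sequence_output, input_mask):
    """Average token vectors over real tokens only (mask value 1)."""
    mask = input_mask.astype(np.float32)
    # (batch, seq, hidden) * (batch, seq, 1): padding positions become zero.
    summed = (sequence_output * mask[:, :, None]).sum(axis=1)
    # Number of real tokens per example, shape (batch, 1).
    counts = mask.sum(axis=1, keepdims=True)
    # The small constant avoids division by zero for an all-padding row.
    return summed / (counts + 1e-10)


def fine_tuned_layer_names(n_fine_tune_layers, n_encoder_layers=12):
    """Names of the last n encoder layers, counted down from the top layer."""
    return [
        "encoder/layer_{}".format(n_encoder_layers - 1 - i)
        for i in range(n_fine_tune_layers)
    ]


# One example, three token positions, hidden size 2; the last position is padding.
seq = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
pooled = masked_mean(seq, mask)
names = fine_tuned_layer_names(3)
print(pooled)
print(names)
```

The padded position holds large values on purpose: they do not affect the result, which shows that the mask is doing its job.
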

**Step 11:** Build the model. The model has three inputs (ids, masks, and segment ids), passed to the BERT layer as a list of three input layers. The output of the BERT layer feeds a dense layer with ReLU activation, followed by a dense output layer with sigmoid activation.

In [None]:
# Build model
def build_model(max_seq_length): 
    # Use max_seq_length as the input shape so that every padded input fits the model
    in_id = tf.keras.layers.Input(shape=(max_seq_length,), name="input_ids")
    in_mask = tf.keras.layers.Input(shape=(max_seq_length,), name="input_masks")
    in_segment = tf.keras.layers.Input(shape=(max_seq_length,), name="segment_ids")
    # Define the inputs of the BERT layer
    bert_inputs = [in_id, in_mask, in_segment]
    # Apply the BERT layer to get the pooled output
    bert_output = BertLayer(n_fine_tune_layers=3, pooling="first")(bert_inputs)
    # Define the hidden dense layer
    dense = tf.keras.layers.Dense(256, activation='relu')(bert_output)
    # Define the dense output layer used for prediction
    pred = tf.keras.layers.Dense(1, activation='sigmoid')(dense)
    # Build and compile the model
    model = tf.keras.models.Model(inputs=bert_inputs, outputs=pred)
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.summary()  # print a summary of the model
    
    return model

def initialize_vars(sess):
    # After tf.global_variables_initializer() has run in a session,
    # the variables hold the initial values given when they were declared.
    sess.run(tf.local_variables_initializer())
    sess.run(tf.global_variables_initializer())
    sess.run(tf.tables_initializer())  # initialize all tables of the default graph
    K.set_session(sess)  # set the global TensorFlow session used by Keras
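
The classification head built above (a 256-unit ReLU layer followed by a single sigmoid unit, trained with binary cross-entropy) can be written out in NumPy to show what it computes. This is a sketch with randomly initialized weights, assumed only for illustration; it does not reproduce Keras' initializers or training.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def head_forward(pooled, w1, b1, w2, b2):
    """Dense(256, relu) followed by Dense(1, sigmoid)."""
    hidden = relu(pooled @ w1 + b1)
    return sigmoid(hidden @ w2 + b2)


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped for numerical safety."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(losses))


rng = np.random.RandomState(0)
pooled = rng.randn(2, 768)            # stand-in for the BERT layer output
w1, b1 = rng.randn(768, 256) * 0.01, np.zeros(256)
w2, b2 = rng.randn(256, 1) * 0.01, np.zeros(1)

probs = head_forward(pooled, w1, b1, w2, b2)
loss = binary_crossentropy(np.array([[0.0], [1.0]]), probs)
print(probs.shape, loss)
```
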


# 5. Run the model and make predictions for the test pairs

In [None]:
model = build_model(max_seq_length)

# Instantiate variables
initialize_vars(sess)
# Fit the model
model.fit(
    [train_input_ids, train_input_masks, train_segment_ids], 
    train_labels,
    validation_data=([dev_input_ids, dev_input_masks, dev_segment_ids], dev_labels),
    epochs=1,
    batch_size=1
)  # the training set has only 2 sentence pairs, so the number of epochs and the batch size are both set to 1

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Model: "model_7"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_ids (InputLayer)          [(None, 256)]        0                                            
__________________________________________________________________________________________________
input_masks (InputLayer)        [(None, 256)]        0                                            
__________________________________________________________________________________________________
segment_ids (InputLayer)        [(None, 256)]        0                                            
__________________________________________________________________________________________________
bert_layer_7 (BertLayer)        (None, 768)          110104890   input_ids[0][0]                  
                                                                 input_masks[0][0]          

<tensorflow.python.keras.callbacks.History at 0x7f38aaa13690>

In [None]:
pre_save_preds = model.predict([test_input_ids[0:100], 
                                test_input_masks[0:100], 
                                test_segment_ids[0:100]]
                              ) # predictions 

In [None]:
pre_save_preds  # display the predictions


array([[0.8626769],
       [0.859957 ]], dtype=float32)
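
The sigmoid outputs above are probabilities, so they still need to be turned into class labels. A minimal sketch, assuming a decision threshold of 0.5 (the threshold is a choice made here, not something fixed by the model), using the prediction values shown above and the test labels defined in Step 2:

```python
import numpy as np

# Predictions copied from the output above and labels from Step 2.
probs = np.array([[0.8626769], [0.859957]], dtype=np.float32)
labels = np.array([[0], [1]])

# Assumed threshold: probabilities of at least 0.5 are mapped to class 1.
predicted = (probs >= 0.5).astype(int)
accuracy = float((predicted == labels).mean())

print(predicted.ravel().tolist(), accuracy)
```

With only two arbitrary test pairs and one training epoch, this number says nothing about model quality; the snippet only shows the mechanics of scoring.
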