<h4>Note: BERT Base has 12 encoder layers and BERT Large has 24.</h4> <h5>We will use BERT Base, with 12 encoder layers.</h5>

In [2]:
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text as text

In [3]:
encoder_url = "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4"
preprocess_url = "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"

<h4>First, we create a preprocessing layer for our text</h4>

In [4]:
# Preprocessing layer: takes a batch of raw strings and converts them into BERT input tensors.
bert_preprocess_model = hub.KerasLayer(preprocess_url)

In [5]:
text_test = ['nice movie indeed','I love python programming']
text_preprocessed = bert_preprocess_model(text_test)

<p>This returns a dictionary with the keys <b>input_mask</b>, <b>input_type_ids</b>, and <b>input_word_ids</b>.</p>

In [6]:
text_preprocessed.keys()

dict_keys(['input_type_ids', 'input_mask', 'input_word_ids'])

<h4>input_mask</h4>
<p>It has shape (2,128): 2 is the number of sentences and 128 is the padded sequence length. A 1 marks a real token and a 0 marks padding.</p>
Note: The first sentence has five 1's because it is tokenized as <b>[CLS] nice movie indeed [SEP]</b>.

In [7]:
text_preprocessed['input_mask']

<tf.Tensor: shape=(2, 128), dtype=int32, numpy=
array([[1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])>
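
<p>As a minimal sketch of how the mask can be read, the snippet below rebuilds a stand-in array with the same layout as the output above (5 and 6 leading ones, padded to 128) and counts the real tokens per sentence. The NumPy array is an assumption for illustration; it is not the tensor returned by the preprocessing layer.</p>

```python
import numpy as np

# Stand-in for text_preprocessed['input_mask']: 5 and 6 real tokens, padded to 128.
seq_len = 128
real_token_counts = [5, 6]
input_mask = np.zeros((len(real_token_counts), seq_len), dtype=np.int32)
for row, count in enumerate(real_token_counts):
    input_mask[row, :count] = 1

# Summing each row counts the non-padding positions.
tokens_per_sentence = input_mask.sum(axis=1)
print(tokens_per_sentence)  # [5 6]
```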

<h4>input_type_ids</h4>
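<p>It also has shape (2,128): 2 is the number of sentences and 128 is the padded sequence length.</p>
These ids distinguish the segments when a single input contains more than one sentence (for example, a sentence pair). In our case each input is a single sentence, so every value is 0.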

In [8]:
text_preprocessed['input_type_ids']

<tf.Tensor: shape=(2, 128), dtype=int32, numpy=
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])>
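
<p>To illustrate what non-zero values would look like, here is a small sketch of the segment ids for a sentence pair laid out as <b>[CLS] A [SEP] B [SEP]</b>. The helper <b>make_type_ids</b> is hypothetical (it is not part of the preprocessing model) and it does not handle truncation of long inputs.</p>

```python
def make_type_ids(num_tokens_a, num_tokens_b, seq_len=128):
    """Hypothetical helper: segment ids for '[CLS] A [SEP] B [SEP]' padded to seq_len."""
    first = [0] * (num_tokens_a + 2)   # [CLS] + segment A + first [SEP]
    second = [1] * (num_tokens_b + 1)  # segment B + final [SEP]
    padding = [0] * (seq_len - len(first) - len(second))
    return first + second + padding

# A short sequence length keeps the printed result readable.
type_ids = make_type_ids(num_tokens_a=3, num_tokens_b=4, seq_len=16)
print(type_ids)
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
```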

<h4>input_word_ids</h4>
<p>It also has shape (2,128): 2 is the number of sentences and 128 is the padded sequence length.</p>
It holds the vocabulary id of each token in the sentence. For the first sentence, <b>[CLS] = 101, nice = 3835, movie = 3185, indeed = 5262, [SEP] = 102</b>; the remaining positions are padding (0).

In [9]:
text_preprocessed['input_word_ids']

<tf.Tensor: shape=(2, 128), dtype=int32, numpy=
array([[  101,  3835,  3185,  5262,   102,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0, 
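
<p>The mapping from ids back to tokens can be sketched with a tiny lookup table. The dictionary below contains only the ids quoted above plus the padding id 0; it is a stand-in for the full BERT vocabulary, not the real vocabulary file.</p>

```python
# Vocabulary entries taken from the ids quoted above; 0 is the padding id.
id_to_token = {
    101: "[CLS]", 3835: "nice", 3185: "movie",
    5262: "indeed", 102: "[SEP]", 0: "[PAD]",
}

# First row of input_word_ids: five real ids followed by padding up to 128.
first_row = [101, 3835, 3185, 5262, 102] + [0] * 123

# Drop padding and map the remaining ids back to tokens.
tokens = [id_to_token[i] for i in first_row if i != 0]
print(tokens)  # ['[CLS]', 'nice', 'movie', 'indeed', '[SEP]']
```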

<h4>Preprocessing is now done, so we can create the encoder layer.</h4>

In [10]:
bert_model = hub.KerasLayer(encoder_url)

<h5>We pass the preprocessed text to the encoder layer, which generates sentence and token embeddings.</h5>

In [11]:
bert_results = bert_model(text_preprocessed)

This returns a dictionary. The three keys discussed here are <b>encoder_outputs, pooled_output, sequence_output</b>; the output also contains a <b>default</b> key.

In [12]:
bert_results.keys()

dict_keys(['default', 'pooled_output', 'encoder_outputs', 'sequence_output'])

<h4>pooled_output</h4>
<b>pooled_output</b> is an embedding for the entire sentence. The resulting tensor has shape <b>(2,768)</b>, where 2 is the number of sentences and <b>768</b> is the embedding dimension. Each 768-dimensional vector represents one input sentence.

In [13]:
bert_results['pooled_output']

<tf.Tensor: shape=(2, 768), dtype=float32, numpy=
array([[-0.79177445, -0.21411942,  0.49769488, ...,  0.24465126,
        -0.47334498,  0.8175873 ],
       [-0.9171231 , -0.4793517 , -0.78656983, ..., -0.6175176 ,
        -0.7102685 ,  0.92184293]], dtype=float32)>
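
<p>One common way to use such sentence vectors is to compare them with cosine similarity. The sketch below assumes random stand-in vectors in place of the real <b>pooled_output</b> rows, and <b>cosine_similarity</b> is a helper written here for illustration.</p>

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Random stand-ins for the two rows of bert_results['pooled_output'].
rng = np.random.default_rng(0)
pooled = rng.uniform(-1.0, 1.0, size=(2, 768)).astype(np.float32)

same = cosine_similarity(pooled[0], pooled[0])       # a vector compared with itself
different = cosine_similarity(pooled[0], pooled[1])  # two different sentences
print(same, different)
```

<p>With real embeddings, values closer to 1 suggest the two sentences are encoded more similarly.</p>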

<h4>sequence_output</h4>
<b>sequence_output</b> holds the individual token embedding vectors. <b>Note: each token position in a sentence gets a 768-dimensional embedding vector</b>.
Hence, the shape of the tensor is <b>(2,128,768)</b>. Here, <b>2 = number of sentences</b>, <b>128 = token positions per sentence (including padding)</b>, and <b>768 = embedding size for each token</b>.

In [14]:
bert_results['sequence_output'] 

<tf.Tensor: shape=(2, 128, 768), dtype=float32, numpy=
array([[[ 0.07292067,  0.08567819,  0.14476839, ..., -0.09677105,
          0.08722159,  0.07711076],
        [ 0.17839423, -0.19006088,  0.5034951 , ..., -0.05869836,
          0.32717168, -0.15578607],
        [ 0.18701434, -0.43388814, -0.48875174, ..., -0.15502723,
          0.00145242, -0.24470958],
        ...,
        [ 0.12083033,  0.12884216,  0.4645349 , ...,  0.07375568,
          0.17441967,  0.16522148],
        [ 0.07967912, -0.01190673,  0.50225425, ...,  0.13777754,
          0.21002257,  0.00624568],
        [-0.07212678, -0.28303456,  0.5903342 , ...,  0.4755191 ,
          0.16668472, -0.08920309]],

       [[-0.07900576,  0.36335146, -0.21101616, ..., -0.17183737,
          0.16299757,  0.6724266 ],
        [ 0.2788348 ,  0.4371632 , -0.35764787, ..., -0.04463551,
          0.3831522 ,  0.5887987 ],
        [ 1.2037671 ,  1.0727023 ,  0.4840871 , ...,  0.24921003,
          0.40730935,  0.40481764],
        ...,
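
<p>Because <b>sequence_output</b> also contains vectors for padded positions, it is often combined with <b>input_mask</b>. The sketch below shows masked mean pooling on small stand-in arrays (4 positions and 3 dimensions instead of 128 and 768). This is an illustrative alternative to <b>pooled_output</b>, not something the notebook itself does.</p>

```python
import numpy as np

# Small stand-ins: 2 sentences, 4 positions, 3-dimensional embeddings
# (the real tensors have shapes (2, 128, 768) and (2, 128)).
sequence_output = np.arange(24, dtype=np.float32).reshape(2, 4, 3)
input_mask = np.array([[1, 1, 0, 0],
                       [1, 1, 1, 0]], dtype=np.float32)

# Zero out padded positions, then divide by the number of real tokens.
masked = sequence_output * input_mask[:, :, None]
mean_pooled = masked.sum(axis=1) / input_mask.sum(axis=1, keepdims=True)
print(mean_pooled.shape)  # (2, 3)
```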

<h4>encoder_outputs</h4>
The length of encoder_outputs is <b>12</b> because we are using BERT Base with 12 encoder layers. Each layer produces a 768-dimensional embedding vector for every token position.

In [15]:
len(bert_results['encoder_outputs'])

12

<p>Each of these <b>encoder_outputs</b> is a tensor of shape <b>(2,128,768)</b>. Here, <b>2 = number of sentences</b>, <b>128 = token positions per sentence (including padding)</b>, and <b>768 = embedding size for each token</b>.</p>

In [16]:
bert_results['encoder_outputs'][0]

<tf.Tensor: shape=(2, 128, 768), dtype=float32, numpy=
array([[[ 0.12901412,  0.00644755, -0.03614965, ...,  0.04999633,
          0.06149195, -0.02657555],
        [ 1.1753379 ,  1.2140787 ,  1.1569977 , ...,  0.11634361,
         -0.35855392, -0.40490174],
        [ 0.03859011,  0.53869987, -0.21089745, ...,  0.21858183,
          0.72601724, -1.1158607 ],
        ...,
        [-0.07587045, -0.2542191 ,  0.7075512 , ...,  0.50542   ,
         -0.18878683,  0.15028355],
        [-0.160666  , -0.28089684,  0.57597065, ...,  0.52758557,
         -0.1114136 ,  0.02887519],
        [-0.04428155, -0.20279573,  0.59093577, ...,  0.8133834 ,
         -0.390758  , -0.02601733]],

       [[ 0.18903567,  0.02752543, -0.06513736, ..., -0.0062021 ,
          0.15053876,  0.03165446],
        [ 0.5916145 ,  0.75891393, -0.07240694, ...,  0.61903995,
          0.829289  ,  0.1616199 ],
        [ 1.4460828 ,  0.4460268 ,  0.40990224, ...,  0.4825589 ,
          0.6269117 ,  0.13463363],
        ...,

The output of the last encoder layer is the same as <b>sequence_output</b>.

In [17]:
bert_results['encoder_outputs'][-1]

<tf.Tensor: shape=(2, 128, 768), dtype=float32, numpy=
array([[[ 0.07292067,  0.08567819,  0.14476839, ..., -0.09677105,
          0.08722159,  0.07711076],
        [ 0.17839423, -0.19006088,  0.5034951 , ..., -0.05869836,
          0.32717168, -0.15578607],
        [ 0.18701434, -0.43388814, -0.48875174, ..., -0.15502723,
          0.00145242, -0.24470958],
        ...,
        [ 0.12083033,  0.12884216,  0.4645349 , ...,  0.07375568,
          0.17441967,  0.16522148],
        [ 0.07967912, -0.01190673,  0.50225425, ...,  0.13777754,
          0.21002257,  0.00624568],
        [-0.07212678, -0.28303456,  0.5903342 , ...,  0.4755191 ,
          0.16668472, -0.08920309]],

       [[-0.07900576,  0.36335146, -0.21101616, ..., -0.17183737,
          0.16299757,  0.6724266 ],
        [ 0.2788348 ,  0.4371632 , -0.35764787, ..., -0.04463551,
          0.3831522 ,  0.5887987 ],
        [ 1.2037671 ,  1.0727023 ,  0.4840871 , ...,  0.24921003,
          0.40730935,  0.40481764],
        ...,

In [18]:
bert_results['encoder_outputs'][-1] == bert_results['sequence_output']

<tf.Tensor: shape=(2, 128, 768), dtype=bool, numpy=
array([[[ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        ...,
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True]],

       [[ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        ...,
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True]]])>
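
<p>The elementwise comparison above prints a large boolean tensor. As a sketch, it can be collapsed into a single True/False value; the example uses small NumPy stand-ins for the two tensors. On TensorFlow tensors, <b>tf.reduce_all</b> plays the same role.</p>

```python
import numpy as np

# Stand-ins for bert_results['encoder_outputs'][-1] and bert_results['sequence_output'].
last_layer = np.array([[0.07, 0.08], [0.17, -0.19]], dtype=np.float32)
sequence_output = last_layer.copy()

elementwise = last_layer == sequence_output  # boolean array, same shape as the inputs
all_equal = bool(elementwise.all())          # single True/False summary
print(all_equal)  # True
```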

<h2> Build a model using BERT </h2>

In [19]:
encoder_url = "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4"
preprocess_url = "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"

In [20]:
bert_preprocess = hub.KerasLayer(preprocess_url)
bert_encoder = hub.KerasLayer(encoder_url)

<h3>Updating only the Dense layer (BERT layer frozen)</h3>

In [21]:
# Bert layers
text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
preprocessed_text = bert_preprocess(text_input)
outputs = bert_encoder(preprocessed_text)

# Neural network layers
l = tf.keras.layers.Dropout(0.1, name="dropout")(outputs['pooled_output'])
l = tf.keras.layers.Dense(1, activation='sigmoid', name="output")(l)

# Use inputs and outputs to construct a final model
model = tf.keras.Model(inputs=[text_input], outputs = [l])
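
<p>To make the classification head above concrete, here is a rough NumPy sketch of what <b>Dense(1, activation='sigmoid')</b> computes on a (2,768) <b>pooled_output</b>. The input and the weight values are random stand-ins chosen for illustration; they are not the weights Keras would initialize or learn.</p>

```python
import numpy as np

def sigmoid(x):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
pooled_output = rng.uniform(-1.0, 1.0, size=(2, 768))  # stand-in for outputs['pooled_output']

# Dense(1): one weight per embedding dimension plus one bias.
weights = rng.normal(scale=0.02, size=(768, 1))  # hypothetical values
bias = np.zeros(1)

# Dropout is inactive at inference time, so the head reduces to sigmoid(x @ W + b).
probabilities = sigmoid(pooled_output @ weights + bias)

print(probabilities.shape)       # (2, 1)
print(weights.size + bias.size)  # 769
```

<p>The head therefore has 768 weights and 1 bias; with the BERT layer frozen, these are the only parameters that training updates.</p>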

In [22]:
model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 text (InputLayer)              [(None,)]            0           []                               
                                                                                                  
 keras_layer_2 (KerasLayer)     {'input_type_ids':   0           ['text[0][0]']                   
                                (None, 128),                                                      
                                 'input_word_ids':                                                
                                (None, 128),                                                      
                                 'input_mask': (Non                                               
                                e, 128)}                                                      

<h3>Updating both the BERT layer and the Dense layer</h3>

In [23]:
# Bert layers
text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
preprocessed_text = bert_preprocess(text_input)
bert_encoder.trainable = True  # Make the BERT layer trainable; it is False (frozen) by default
outputs = bert_encoder(preprocessed_text)

# Neural network layers
l = tf.keras.layers.Dropout(0.1, name="dropout")(outputs['pooled_output'])
l = tf.keras.layers.Dense(1, activation='sigmoid', name="output")(l)

# Use inputs and outputs to construct a final model
model = tf.keras.Model(inputs=[text_input], outputs = [l])

In [24]:
model.summary()

Model: "model_1"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 text (InputLayer)              [(None,)]            0           []                               
                                                                                                  
 keras_layer_2 (KerasLayer)     {'input_type_ids':   0           ['text[0][0]']                   
                                (None, 128),                                                      
                                 'input_word_ids':                                                
                                (None, 128),                                                      
                                 'input_mask': (Non                                               
                                e, 128)}                                                    

In [1]:
import tensorflow
import torch


In [2]:
tensorflow.__version__

'2.10.1'

In [3]:
torch.__version__

'1.12.1'

In [4]:
torch.cuda.is_available()

True

In [5]:
import tensorflow
# Check if a GPU is available (tf.test.is_gpu_available is deprecated; see the warning in the output)
if tensorflow.test.is_gpu_available():
    print('GPU is available')
else:
    print('GPU is NOT available')

Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
GPU is available


In [6]:
import tensorflow as tf

# Get a list of available GPUs
gpus = tf.config.list_physical_devices('GPU')

# Check if any GPUs are available
if gpus:
    for gpu in gpus:
        print(f'GPU: {gpu.name}')
else:
    print('No GPUs detected')

GPU: /physical_device:GPU:0


In [7]:
import tensorflow as tf
tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None)

True

In [8]:
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
print(gpus)
for gpu in gpus:
    print("Name:", gpu.name, "  Type:", gpu.device_type)

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Name: /physical_device:GPU:0   Type: GPU
