<a href="https://colab.research.google.com/github/vinay10949/AnalyticsAndML/blob/master/Kaggle/NLP/BERT/BERT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
# install huggingface Transformers [https://huggingface.co/transformers/installation.html]

# Many transformer based models in a single library: https://github.com/huggingface/transformers#model-architectures
! pip install transformers

# This week: we will use HuggingFace BERT implementations.
# Next sessions: Build an encoder-decoder seq2seq Transformer from scratch using TF/Keras.



In [0]:
# Reference: https://medium.com/tensorflow/using-tensorflow-2-for-state-of-the-art-natural-language-processing-102445cda54a
# Ref: https://huggingface.co/transformers/notebooks.html

In [0]:
%tensorflow_version 2.x
import tensorflow as tf
print(tf.__version__)

2.2.0


## Tokenization

In [0]:
# Tokenization: map words to ids
# Refer: https://colab.research.google.com/github/huggingface/transformers/blob/master/notebooks/01-training-tokenizers.ipynb#scrollTo=LgktNYt7ADPS

# simple example
s = "very long corpus..."
words = s.split(" ")  # Split over space
vocabulary = dict(enumerate(set(words)))  # Map each id to its word (set order is arbitrary)

print(vocabulary)

# Problem: related words such as cat (1123) and cats (1346) get unrelated ids

{0: 'corpus...', 1: 'very', 2: 'long'}
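
The cat vs cats problem can be made concrete with a toy word-level tokenizer. This is a minimal sketch, not the BERT tokenizer: the corpus, the `encode_words` helper, and the `UNK_ID` fallback are all hypothetical names chosen for illustration.

```python
# Toy word-level tokenizer (illustrative only; not the BERT tokenizer).
corpus = "the cat sat on the mat"

# sorted() makes the id assignment deterministic across runs
word_to_id = {word: idx for idx, word in enumerate(sorted(set(corpus.split())))}
UNK_ID = len(word_to_id)  # hypothetical id reserved for unseen words


def encode_words(text):
    """Map each whitespace-separated word to its id, or UNK_ID if unseen."""
    return [word_to_id.get(word, UNK_ID) for word in text.split()]


print(word_to_id)
print(encode_words("the cat sat"))
print(encode_words("the cats sat"))  # "cats" was never seen, so it maps to UNK_ID
```

Even though "cats" differs from "cat" by a single letter, a word-level vocabulary treats it as a completely unknown word.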


### Sub-tokenization

- Why? Related forms such as fast vs faster or cat vs cats should share sub-tokens.
- Example: cats --> [cat, ##s]
- Image: https://nlp.fast.ai/images/multifit_vocabularies.png

<img src="https://nlp.fast.ai/images/multifit_vocabularies.png" alt="MultiFiT vocabularies" height="75%" width="75%">
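
The cats --> [cat, ##s] split can be sketched with a greedy longest-match-first loop, which is the general idea behind WordPiece-style sub-tokenization. The function name `wordpiece_tokenize` and the four-entry vocabulary are assumptions made for illustration; a real BERT vocabulary has tens of thousands of entries and the library's own tokenizer should be used in practice.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into sub-tokens."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry the ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]  # no piece matched: the whole word is unknown
        tokens.append(piece)
        start = end
    return tokens


toy_vocab = {"cat", "fast", "##s", "##er"}  # hypothetical vocabulary
print(wordpiece_tokenize("cats", toy_vocab))    # ['cat', '##s']
print(wordpiece_tokenize("faster", toy_vocab))  # ['fast', '##er']
print(wordpiece_tokenize("dog", toy_vocab))     # ['[UNK]']
```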


### Tokenization in huggingface

In [0]:
from transformers import BertTokenizer
bert_tokenizer = BertTokenizer.from_pretrained("bert-base-cased") 

In [0]:
# Refer BERT architecture from the previous videos in the course.

#https://huggingface.co/transformers/main_classes/tokenizer.html
print(bert_tokenizer.cls_token)

[CLS]


In [0]:
enc = bert_tokenizer.encode("Hi, I am James bond !")
print(enc)

print(bert_tokenizer.decode(enc))

[101, 8790, 117, 146, 1821, 1600, 7069, 106, 102]
[CLS] Hi, I am James bond! [SEP]


In [0]:
print(bert_tokenizer.decode([117]))
print(bert_tokenizer.decode([106]))

,
!


In [0]:
enc = bert_tokenizer.encode("I see many cats and dogs")
print(enc)

print(bert_tokenizer.decode(enc))

[101, 146, 1267, 1242, 11771, 1105, 6363, 102]
[CLS] I see many cats and dogs [SEP]
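
Both encodings above start with 101 and end with 102, which decode to [CLS] and [SEP]. The sketch below mimics that wrapping step and shows one way sentences of different lengths could be padded into a batch. The helper names are hypothetical and `PAD_ID = 0` is an assumption; this is not the library's API, and in practice the tokenizer takes care of both steps.

```python
CLS_ID, SEP_ID = 101, 102  # ids of [CLS] and [SEP] as seen in the outputs above
PAD_ID = 0                 # assumption: id used for padding


def add_special_tokens(token_ids):
    """Wrap a list of token ids as [CLS] ... [SEP]."""
    return [CLS_ID] + token_ids + [SEP_ID]


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad sequences to a common length and build a 1/0 attention mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return input_ids, attention_mask


# Token ids copied from the two encodings printed above, without [CLS]/[SEP]
cats_and_dogs = add_special_tokens([146, 1267, 1242, 11771, 1105, 6363])
james_bond = add_special_tokens([8790, 117, 146, 1821, 1600, 7069, 106])

input_ids, attention_mask = pad_batch([cats_and_dogs, james_bond])
print(input_ids)
print(attention_mask)
```

The attention mask marks real tokens with 1 and padding with 0, so that padded positions can be ignored by the model.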


## BERT Models
- DistilBERT
- RoBERTa
- https://miro.medium.com/max/2000/1*IFVX74cEe8U5D1GveL1uZA.png
<img src="https://miro.medium.com/max/2000/1*IFVX74cEe8U5D1GveL1uZA.png" alt="BERT models figure" height="75%" width="75%">

- https://miro.medium.com/max/1400/1*bSUO_Qib4te1xQmBlQjWaw.png
<img src="https://miro.medium.com/max/1400/1*bSUO_Qib4te1xQmBlQjWaw.png" alt="BERT models figure" height="75%" width="75%">

- General Language Understanding Evaluation (GLUE)  : https://gluebenchmark.com/


In [0]:
import tensorflow as tf

# Refer: https://huggingface.co/transformers/model_doc/distilbert.html#

from transformers import DistilBertTokenizer, TFDistilBertModel

distil_bert = 'distilbert-base-uncased'  # Name of the pretrained model

#DistilBERT 
tokenizer = DistilBertTokenizer.from_pretrained(distil_bert)
model = TFDistilBertModel.from_pretrained(distil_bert)





### Extract features using BERT

In [0]:
# Obtain the 768-dim vector corresponding to [CLS], which serves as a sentence vector

e = tokenizer.encode("Hello, my dog is cute")
print(e)

input = tf.constant(e)[None, :]  # Add a batch dimension (batch size 1)
print(input)  # shape: [1, 8]
print(type(input))

output = model(input)

print(type(output))
print(len(output))
print(output)  # output[0] has shape [1, 8, 768]


[101, 7592, 1010, 2026, 3899, 2003, 10140, 102]
tf.Tensor([[  101  7592  1010  2026  3899  2003 10140   102]], shape=(1, 8), dtype=int32)
<class 'tensorflow.python.framework.ops.EagerTensor'>
<class 'tuple'>
1
(<tf.Tensor: shape=(1, 8, 768), dtype=float32, numpy=
array([[[-1.82963982e-01, -7.40541294e-02,  5.02676778e-02, ...,
         -1.12606876e-01,  4.44931060e-01,  4.09413189e-01],
        [ 7.05925748e-04,  1.48253545e-01,  3.43282759e-01, ...,
         -8.60396028e-02,  6.94747746e-01,  4.33529206e-02],
        [-5.07205963e-01,  5.30855298e-01,  3.71626914e-01, ...,
         -5.62874496e-01,  1.37557149e-01,  2.84752548e-01],
        ...,
        [-4.22513545e-01,  5.73149137e-02,  2.43383065e-01, ...,
         -1.52226850e-01,  2.44624346e-01,  6.41548634e-01],
        [-4.93844777e-01, -1.88954517e-01,  1.26408130e-01, ...,
          6.32405952e-02,  3.69128466e-01, -5.82522973e-02],
        [ 8.32686663e-01,  2.49482125e-01, -4.54395115e-01, ...,
          1.19975321e-01, -3

In [0]:
# Vector corresponding to [CLS]
print(output[0][0, 0, :])  # shape: 768-dim vector

tf.Tensor(
[-1.82963982e-01 -7.40541294e-02  5.02676778e-02 -3.49530637e-01
 -7.28532523e-02 -2.63872653e-01  2.39293337e-01  4.79842186e-01
 -2.14802593e-01 -1.89516500e-01  8.99828300e-02 -1.29189074e-01
 -1.11275867e-01  3.16634685e-01 -8.25903937e-02  9.26223695e-02
 -2.09081769e-02  4.74876165e-01  1.28833622e-01  3.18728387e-03
 -1.53505668e-01 -3.57001841e-01  9.89340246e-04 -3.92747205e-03
  1.38443485e-02 -5.49410284e-02  8.45261216e-02  1.36564597e-01
  2.18252391e-01 -1.96798936e-01  2.47997623e-02  1.75569624e-01
 -3.97218093e-02 -1.10777304e-01  5.48526347e-02  6.07530251e-02
  1.71999484e-02 -1.07415274e-01 -8.76947269e-02  2.12042019e-01
 -4.05892804e-02 -3.17958817e-02  1.37657121e-01 -1.39004737e-01
 -4.68869507e-03 -3.97633314e-01 -2.60034633e+00 -1.08741753e-01
  4.86706719e-02 -3.61387670e-01  3.71814489e-01 -7.61096478e-02
  3.23910415e-02  2.31666267e-01  2.63016045e-01  3.18299890e-01
 -3.87970746e-01  2.98111081e-01 -4.93029132e-02 -3.59300710e-02
  1.58540592e-
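
One common use of the [CLS] vector is comparing sentences. The sketch below uses NumPy with random numbers standing in for `output[0]`, so the similarity value itself is meaningless; it only shows the slicing and the cosine computation. The batch of 2 and the `cosine_similarity` helper are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for output[0] with a batch of 2 sentences: (batch, tokens, hidden).
# Random values, not real DistilBERT activations.
last_hidden_state = rng.normal(size=(2, 8, 768)).astype(np.float32)

# Position 0 of every sequence holds the [CLS] token.
cls_vectors = last_hidden_state[:, 0, :]  # shape: (2, 768)


def cosine_similarity(a, b):
    """Cosine of the angle between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


print(cls_vectors.shape)
print(cosine_similarity(cls_vectors[0], cls_vectors[1]))
```

With real model outputs, `last_hidden_state` would be replaced by `output[0].numpy()`.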

In [0]:
# How about the hidden layer outputs?

# https://huggingface.co/transformers/model_doc/distilbert.html#distilbertconfig
from transformers import DistilBertConfig

config = DistilBertConfig.from_pretrained(distil_bert, output_hidden_states=True)


e = tokenizer.encode("Hello, my dog is cute")
input = tf.constant(e)[None, :]  # Batch size 1 
model = TFDistilBertModel.from_pretrained(distil_bert, config=config)
print(model.config) # Every model has a config file 



DistilBertConfig {
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "output_hidden_states": true,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "vocab_size": 30522
}



In [0]:
output = model(input)
print(len(output))

2


In [0]:
print(output[0])

tf.Tensor(
[[[-1.82963982e-01 -7.40541294e-02  5.02676778e-02 ... -1.12606876e-01
    4.44931060e-01  4.09413189e-01]
  [ 7.05925748e-04  1.48253545e-01  3.43282759e-01 ... -8.60396028e-02
    6.94747746e-01  4.33529206e-02]
  [-5.07205963e-01  5.30855298e-01  3.71626914e-01 ... -5.62874496e-01
    1.37557149e-01  2.84752548e-01]
  ...
  [-4.22513545e-01  5.73149137e-02  2.43383065e-01 ... -1.52226850e-01
    2.44624346e-01  6.41548634e-01]
  [-4.93844777e-01 -1.88954517e-01  1.26408130e-01 ...  6.32405952e-02
    3.69128466e-01 -5.82522973e-02]
  [ 8.32686663e-01  2.49482125e-01 -4.54395115e-01 ...  1.19975321e-01
   -3.92573118e-01 -2.77853817e-01]]], shape=(1, 8, 768), dtype=float32)


In [0]:
print(type(output[1]))
print(len(output[1]))  # 7: the embedding output plus one output per layer (n_layers = 6)
print(output[1][0]) # Shape:(1,8,768)

<class 'tuple'>
7
tf.Tensor(
[[[ 0.3469352  -0.16263762 -0.23334563 ...  0.14869013  0.08653456
    0.14215374]
  [ 0.07189059 -0.07270843 -0.29645342 ... -0.30408904  0.75935036
   -0.5568752 ]
  [-0.2266272  -0.06833443 -0.02030379 ...  0.3494217   0.59173024
    0.19666305]
  ...
  [-0.71145713 -0.5655291  -0.59169155 ...  0.3092698   0.46315265
    0.6692429 ]
  [-1.6721641  -0.04439694  0.72651184 ... -0.3427156   0.8009167
    0.01589048]
  [-0.2762845  -0.34943795 -0.18642375 ... -0.34279847  0.36574972
   -0.1593638 ]]], shape=(1, 8, 768), dtype=float32)
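
The layout of the hidden-states tuple can be sketched with NumPy stand-ins: index 0 is the embedding output and indices 1 to 6 are the outputs of the six transformer layers listed in the config. The arrays below are random placeholders with the shapes printed above, and `describe` is a hypothetical helper.

```python
import numpy as np

n_layers = 6  # "n_layers" from the config printed above
rng = np.random.default_rng(0)

# Stand-in for output[1]: one (1, 8, 768) array per entry.
# Random values, not real DistilBERT activations.
hidden_states = tuple(rng.normal(size=(1, 8, 768)) for _ in range(n_layers + 1))


def describe(index):
    """Name the tuple entry at a given index."""
    return "embedding output" if index == 0 else f"transformer layer {index} output"


for i, h in enumerate(hidden_states):
    print(i, describe(i), h.shape)

# Collect the [CLS] vector from every level into one array
cls_per_layer = np.stack([h[0, 0, :] for h in hidden_states])
print(cls_per_layer.shape)  # (7, 768)
```

With the real model, the last entry of the tuple should match `output[0]`, since both are the output of the final layer.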


**The same steps apply to any Transformer/BERT-like model.**

### Fine-tuning for various tasks

- Refer: https://arxiv.org/pdf/1810.04805.pdf

- Next video