## BERT Embeddings
- BERT is a language model developed by Google.
- We already have Word2Vec, so why do we need BERT embeddings?

**Problem with Word2Vec**
- The issue with Word2Vec is that its embeddings are fixed: each word gets a single vector, regardless of the sentence it appears in.
- Example:
    - Sentence 1 - He didn't receive **fair** treatment
    - Sentence 2 - Fun **fair** in New York this summer
    - In these two sentences "fair" has a different contextual meaning, but the vector generated by Word2Vec is the same.
- Thus we need a model that can generate a contextualized representation of a word.
- That is, the model looks at the whole sentence and generates the numeric representation of each word based on it. BERT allows you to do exactly that.
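To make the fixed-embedding problem concrete, here is a minimal sketch of a static lookup table in the style of Word2Vec. The table `static_embeddings`, the helper `embed_word`, and all vector values are made up for illustration; the only point is that the lookup never consults the surrounding words.

```python
# Toy static embedding table. The values are made up for illustration
# and do not come from a trained Word2Vec model.
static_embeddings = {
    "fair": [0.5, 0.4, 0.1, 0.8, 0.6],
    "treatment": [0.9, 0.2, 0.3, 0.1, 0.4],
    "fun": [0.1, 0.7, 0.6, 0.2, 0.3],
}


def embed_word(word, sentence):
    """Return the vector for `word`; a static model ignores `sentence`."""
    return static_embeddings[word.lower()]


sentence_1 = "He didn't receive fair treatment"
sentence_2 = "Fun fair in New York this summer"

vec_1 = embed_word("fair", sentence_1)
vec_2 = embed_word("fair", sentence_2)

# The same vector comes back for both sentences, even though the
# meaning of "fair" differs between them.
print(vec_1 == vec_2)
```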

**BERT can generate contextualized embeddings**
- Case 1:
    - He didn't receive fair treatment -> fair -> [1, 0.9, 0.2, 1, 0.7]
    - Tom deserves unbiased judgement -> unbiased -> [1, 0.8, 0.2, 1, 0.6]
    - Here the word "fair" is close in contextual meaning to "unbiased", so its vector is similar to the vector for "unbiased".
- Case 2:
    - Fun fair in New York city summer -> fair -> [0, 0.1, 0, 0.5, 0.4]
    - Carnival was packed with fun activities -> carnival -> [0, 0.1, 0.1, 0.5, 0.3]
    - Here the word "fair" is close in contextual meaning to "carnival", so its vector is similar to the vector for "carnival".
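One common way to quantify "similar vectors" is cosine similarity. The sketch below applies it to the illustrative 5-dimensional vectors from the two cases above; the helper `cosine_similarity` and the variable names are my own, and real BERT vectors are much larger than these toy ones.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Illustrative vectors copied from the two cases above.
fair_justice = [1, 0.9, 0.2, 1, 0.7]   # "fair" in "fair treatment"
unbiased = [1, 0.8, 0.2, 1, 0.6]
fair_event = [0, 0.1, 0, 0.5, 0.4]     # "fair" in "fun fair"
carnival = [0, 0.1, 0.1, 0.5, 0.3]

print("fair (justice) vs unbiased:", cosine_similarity(fair_justice, unbiased))
print("fair (justice) vs carnival:", cosine_similarity(fair_justice, carnival))
print("fair (event)   vs carnival:", cosine_similarity(fair_event, carnival))
print("fair (event)   vs unbiased:", cosine_similarity(fair_event, unbiased))
```

Each sense of "fair" scores higher against its own neighbour than against the other sense's neighbour, which is the behaviour a contextual model is meant to produce.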

**BERT can generate embeddings for entire sentence**
- BERT can also generate a single vector for a complete sentence.
- For the BERT-base encoder used below, that vector has 768 dimensions.

**A good article on BERT**
https://jalammar.github.io/illustrated-bert/

**How BERT was trained**
- Trained on 2,500 million words from Wikipedia and 800 million words from books.
- Trained using two artificial tasks; the embeddings are generated as a side effect of these tasks.
    1) Masked Language Model - mask (hide) 15% of the words and train BERT to fill in the masked words.
    2) Next Sentence Prediction - given a pair of sentences, predict whether the second sentence actually follows the first.
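The two training tasks can be sketched in plain Python as data-preparation steps. This is a simplified illustration, not BERT's actual pipeline: the helpers `mask_tokens` and `make_nsp_examples` are hypothetical names, the text is split on whitespace instead of WordPiece tokens, and every selected token is replaced with `[MASK]` (the original recipe replaces only most of the selected tokens with `[MASK]` and leaves or randomizes the rest).

```python
import random


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Hide roughly `mask_rate` of the tokens behind "[MASK]".

    Returns the masked token list and a dict {position: original_token}
    holding the answers the model would be trained to predict.
    """
    rng = random.Random(seed)
    num_to_mask = max(1, round(len(tokens) * mask_rate))
    positions = rng.sample(range(len(tokens)), num_to_mask)
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        masked[pos] = "[MASK]"
    return masked, targets


def make_nsp_examples(sentences, seed=0):
    """Build (sentence_a, sentence_b, is_next) triples.

    About half the pairs use the true next sentence; the rest use some
    other sentence from the list. Needs at least three distinct sentences.
    """
    rng = random.Random(seed)
    examples = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            examples.append((sentences[i], sentences[i + 1], True))
        else:
            candidates = [s for j, s in enumerate(sentences) if j not in (i, i + 1)]
            examples.append((sentences[i], rng.choice(candidates), False))
    return examples


tokens = "the quick brown fox jumps over the lazy dog near the old river bank today".split()
masked, targets = mask_tokens(tokens)
print(masked)
print(targets)

sentences = [
    "I went to the fair.",
    "It was packed with people.",
    "We stayed until late.",
    "Then we drove home.",
]
nsp_examples = make_nsp_examples(sentences)
for sentence_a, sentence_b, is_next in nsp_examples:
    print(is_next, "|", sentence_a, "->", sentence_b)
```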

**BERT - Bidirectional Encoder Representations from Transformers**

In [None]:
# !pip install tensorflow_hub tensorflow_text

In [1]:
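import tensorflow_hub as hub
# tensorflow_text is not called directly, but importing it registers the
# custom ops that the BERT preprocessing model depends on
import tensorflow_text as text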

In [2]:
preprocess_url = "https://kaggle.com/models/tensorflow/bert/TensorFlow2/en-uncased-preprocess/3"
encoder_url = "https://kaggle.com/models/tensorflow/bert/TensorFlow2/en-uncased-l-12-h-768-a-12/3"

In [4]:
bert_preprocess_model = hub.KerasLayer(preprocess_url)

In [5]:
text_test = ["nice movie indeed", "i love python programming"]
text_preprocessed = bert_preprocess_model(text_test)
text_preprocessed.keys()

dict_keys(['input_mask', 'input_word_ids', 'input_type_ids'])
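The three keys returned by the preprocessing model are the token IDs, a mask marking real tokens versus padding, and segment IDs. The sketch below imitates that structure in plain Python so the roles are easier to see. It is an assumption-laden toy: `toy_preprocess` and `toy_vocab` are hypothetical, the text is split on whitespace instead of WordPiece, the word IDs are made up, and the sequence length is 8 here instead of the 128 the real model pads to.

```python
# Hypothetical mini-vocabulary. The word IDs are made up for illustration.
toy_vocab = {
    "[PAD]": 0,
    "[UNK]": 100,
    "[CLS]": 101,
    "[SEP]": 102,
    "nice": 5,
    "movie": 6,
    "indeed": 7,
}


def toy_preprocess(sentence, vocab, seq_length=8):
    """Turn one sentence into BERT-style input features (simplified)."""
    # Leave room for the [CLS] and [SEP] markers.
    words = sentence.lower().split()[: seq_length - 2]
    tokens = ["[CLS]"] + words + ["[SEP]"]
    ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]
    pad_len = seq_length - len(ids)
    return {
        # Token IDs, padded with the [PAD] ID up to seq_length.
        "input_word_ids": ids + [vocab["[PAD]"]] * pad_len,
        # 1 for real tokens, 0 for padding positions.
        "input_mask": [1] * len(ids) + [0] * pad_len,
        # Segment IDs: all 0 because there is only one sentence.
        "input_type_ids": [0] * seq_length,
    }


features = toy_preprocess("nice movie indeed", toy_vocab)
for name, values in features.items():
    print(name, values)
```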

In [6]:
bert_model = hub.KerasLayer(encoder_url)

bert_results = bert_model(text_preprocessed)
bert_results.keys()

dict_keys(['encoder_outputs', 'sequence_output', 'default', 'pooled_output'])

In [7]:
bert_results["pooled_output"] # each sentence is converted to a single 768-dimensional vector

# you can use these vectors as input features in your NLP tasks

<tf.Tensor: shape=(2, 768), dtype=float32, numpy=
array([[-0.7917742 , -0.21411929,  0.497695  , ...,  0.2446516 ,
        -0.47334483,  0.8175871 ],
       [-0.91712314, -0.4793517 , -0.7865697 , ..., -0.61751723,
        -0.7102687 ,  0.92184293]], dtype=float32)>

In [8]:
bert_results["sequence_output"] # each token position (padded to 128 per sentence) is converted to a 768-dimensional vector

<tf.Tensor: shape=(2, 128, 768), dtype=float32, numpy=
array([[[ 0.0729205 ,  0.0856782 ,  0.14476842, ..., -0.09677088,
          0.08722158,  0.07711128],
        [ 0.17839359, -0.190061  ,  0.50349385, ..., -0.0586981 ,
          0.3271711 , -0.15578535],
        [ 0.18701442, -0.43388787, -0.48875144, ..., -0.15502739,
          0.00145156, -0.24470964],
        ...,
        [ 0.12083052,  0.12884241,  0.4645356 , ...,  0.07375539,
          0.17441928,  0.16522132],
        [ 0.0796783 , -0.01190706,  0.5022544 , ...,  0.13777801,
          0.21002187,  0.00624593],
        [-0.07212718, -0.28303438,  0.5903338 , ...,  0.4755192 ,
          0.16668445, -0.08920303]],

       [[-0.07900569,  0.36335132, -0.21101536, ..., -0.17183745,
          0.16299719,  0.67242664],
        [ 0.27883506,  0.4371634 , -0.35764694, ..., -0.04463694,
          0.383151  ,  0.5887983 ],
        [ 1.2037675 ,  1.0727018 ,  0.4840878 , ...,  0.24921024,
          0.407309  ,  0.4048182 ],
        ...,