# A Simple BERT Example

### Import PyTorch and Hugging Face libraries

In [22]:
import torch
from transformers import AutoTokenizer, AutoModel
from transformers import logging

logging.set_verbosity_error()  # only show errors, hiding Transformers warnings

### Load Tokenizer and Model

Here, we're using __DistilBERT__, a small, fast, cheap, and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than bert-base-uncased and runs 60% faster while preserving over 95% of BERT's performance, as measured on the GLUE language understanding benchmark.
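"Distilling" means training a smaller student model to match the output distribution of a larger teacher. The sketch below shows only the common soft-target form of that idea: both models' logits go through a temperature-scaled softmax, and the student is penalized by the cross-entropy against the teacher's softened probabilities. The function names and the default temperature of 2.0 are illustrative assumptions, and DistilBERT's real training objective also includes additional loss terms that are not shown here.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a higher temperature yields a softer distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """Cross-entropy between the teacher's soft targets and the student's soft predictions."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


# Hypothetical logits for a 3-way prediction
print(distillation_loss([2.0, 0.5, -1.0], [3.0, 1.0, -2.0]))
```

The loss reaches its minimum (the entropy of the teacher's softened distribution) when the student's logits produce the same probabilities as the teacher's.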

In [23]:
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

model = AutoModel.from_pretrained("distilbert-base-uncased")

### Tokenize a sample input string

In [24]:
inputs = tokenizer("Hello, I love to cook!", return_tensors="pt")
inputs.keys()

dict_keys(['input_ids', 'attention_mask'])

In [25]:
inputs['input_ids']

tensor([[ 101, 7592, 1010, 1045, 2293, 2000, 5660,  999,  102]])

Note how the special [CLS] token id (101) is __prepended__ to the tokenized input and the [SEP] token id (102) is __appended__ to it.
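For a single sentence, the effect of the special tokens on `input_ids` and `attention_mask` can be mimicked by hand. The helper `add_special_tokens` below is hypothetical (the real tokenizer does this internally); the ids 101 and 102 and the seven wordpiece ids are copied from the output above, and the mask is all ones because nothing is padded.

```python
from typing import Dict, List

CLS_ID = 101  # [CLS] id, as seen in the tokenizer output above
SEP_ID = 102  # [SEP] id, as seen in the tokenizer output above


def add_special_tokens(token_ids: List[int]) -> Dict[str, List[int]]:
    """Hypothetical helper: wrap one sequence as [CLS] ... [SEP] and build its attention mask."""
    input_ids = [CLS_ID] + token_ids + [SEP_ID]
    # 1 marks a real token the model should attend to; padding positions would get 0
    attention_mask = [1] * len(input_ids)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


encoded = add_special_tokens([7592, 1010, 1045, 2293, 2000, 5660, 999])
print(encoded["input_ids"])
# [101, 7592, 1010, 1045, 2293, 2000, 5660, 999, 102]
print(encoded["attention_mask"])
# [1, 1, 1, 1, 1, 1, 1, 1, 1]
```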

### We then send the tokenized inputs through the model

In [None]:
outputs = model(**inputs)

### Now, we can obtain the last hidden states

In [10]:
last_hidden_states = outputs.last_hidden_state
print(last_hidden_states.size())

torch.Size([1, 9, 768])


The dimensions of the last hidden states are:
 - 1 = the number of inputs we sent through the model (i.e., the batch size)
 - 9 = the number of token ids in the tokenized input sequence, including [CLS] and [SEP]
 - 768 = the hidden dimension of the model
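Each position along the sequence dimension is the contextual vector for the token id at the same position in `input_ids`, so row 0 belongs to [CLS] and the last row to [SEP]. The sketch below illustrates this alignment with a zero-filled nested list standing in for `last_hidden_states` (a placeholder, not real model output) and a hypothetical `shape` helper that plays the role of `torch.Tensor.size()`; the nine ids are those printed by the tokenizer above.

```python
from typing import List, Tuple

# Token ids from the tokenizer output above
input_ids: List[int] = [101, 7592, 1010, 1045, 2293, 2000, 5660, 999, 102]
HIDDEN_SIZE = 768

# Placeholder for last_hidden_states with shape
# (batch_size=1, sequence_length=len(input_ids), hidden_size=768), filled with zeros
last_hidden = [[[0.0] * HIDDEN_SIZE for _ in input_ids]]


def shape(nested) -> Tuple[int, ...]:
    """Hypothetical helper: shape of a regular, non-empty nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


print(shape(last_hidden))  # (1, 9, 768)

# Pair each token id with its hidden-state row for the first (only) batch item
aligned = list(zip(input_ids, last_hidden[0]))
cls_id, cls_vector = aligned[0]    # position 0 -> [CLS]
sep_id, sep_vector = aligned[-1]   # last position -> [SEP]
print(cls_id, len(cls_vector))     # 101 768
print(sep_id, len(sep_vector))     # 102 768
```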

In [29]:
print(last_hidden_states)

tensor([[[-1.8296e-01, -7.4054e-02,  5.0268e-02,  ..., -1.1261e-01,
           4.4493e-01,  4.0941e-01],
         [ 7.0624e-04,  1.4825e-01,  3.4328e-01,  ..., -8.6040e-02,
           6.9475e-01,  4.3353e-02],
         [-5.0721e-01,  5.3086e-01,  3.7163e-01,  ..., -5.6287e-01,
           1.3756e-01,  2.8475e-01],
         ...,
         [-4.2251e-01,  5.7315e-02,  2.4338e-01,  ..., -1.5223e-01,
           2.4462e-01,  6.4155e-01],
         [-4.9384e-01, -1.8895e-01,  1.2641e-01,  ...,  6.3240e-02,
           3.6913e-01, -5.8253e-02],
         [ 8.3269e-01,  2.4948e-01, -4.5440e-01,  ...,  1.1998e-01,
          -3.9257e-01, -2.7785e-01]]], grad_fn=<NativeLayerNormBackward0>)
