# Demo of RobBERT for humour detection
We use a [RobBERT (Delobelle et al., 2020)](https://arxiv.org/abs/2001.06286) model that is [fine-tuned for humour detection (Winters et al., 2020)]().

**Dependencies**
- tokenizers
- torch
- transformers

First, we load our RobBERT model that was fine-tuned on this task, along with RobBERT's tokenizer.

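Because we only run inference, dropout and other training-only behaviour must be switched off, so we call `model.eval()`. (`from_pretrained` already returns the model in evaluation mode, so this call mainly makes the intent explicit.)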

In [1]:
import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification
model_location = "../models/jokes-proverbs/artifacts/"
tokenizer = RobertaTokenizer.from_pretrained(model_location)
model = RobertaForSequenceClassification.from_pretrained(model_location, return_dict=True)

# Put RobBERT model on GPU if it is available
if torch.cuda.is_available():
    model.to('cuda:0')

model.eval()
print("RobBERT model loaded")


RobBERT model loaded
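
To illustrate why evaluation mode matters, here is a toy, dependency-free sketch of inverted dropout. `ToyDropout` is a hypothetical class written for this demo and is not part of torch or transformers; it only mimics the train/eval switch that a real dropout layer obeys.

```python
import random


class ToyDropout:
    """Hypothetical stand-in that mimics the train/eval behaviour of a dropout layer."""

    def __init__(self, p=0.1, seed=0):
        self.p = p                       # probability of zeroing an activation
        self.training = True             # starts in training mode, like a freshly built module
        self._rng = random.Random(seed)

    def eval(self):
        # Switch to inference mode: dropout becomes the identity function
        self.training = False
        return self

    def __call__(self, activations):
        if not self.training:
            return list(activations)
        # Inverted dropout: zero with probability p, rescale survivors by 1 / (1 - p)
        keep = 1.0 - self.p
        return [0.0 if self._rng.random() < self.p else x / keep for x in activations]


activations = [0.5, -1.2, 3.0, 0.7]
layer = ToyDropout(p=0.5)

noisy = layer(activations)    # training mode: some values zeroed, the rest rescaled
layer.eval()
stable = layer(activations)   # eval mode: deterministic, input passes through unchanged

print("train:", noisy)
print("eval: ", stable)
```

In training mode the same sentence could receive different scores on every call; in evaluation mode the prediction is reproducible.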


Let's create some inputs for this model.

In [2]:
input_sentences = ["Men moet een gegeven paard niet in de bek kijken.",
     "Het is groen en overweegt iets? Kermit de wikker!",
     "Het is groen en het rolt van de trap? Kermit de knikker!"]

# Tokenize the input sentences using RobBERT v2 vocabulary
inputs = tokenizer(
    input_sentences,
    max_length=512,
    padding='max_length',
    truncation=True,
    return_tensors="pt")

# Put input on GPU if it is available
if torch.cuda.is_available():
    inputs = inputs.to('cuda:0')

# Print the encoded tensors, then the tokens of the first two sentences
for key, value in inputs.items():
    print("{}:\n\t{}".format(key, value))
print("Tokens:\n\t{}".format(tokenizer.convert_ids_to_tokens(inputs['input_ids'][0]) ))
print("\t{}".format(tokenizer.convert_ids_to_tokens(inputs['input_ids'][1]) ))


input_ids:
	tensor([[   0, 9396,   89,  ...,    1,    1,    1],
        [   0,  104,   12,  ...,    1,    1,    1],
        [   0,  104,   12,  ...,    1,    1,    1]], device='cuda:0')
attention_mask:
	tensor([[1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0]], device='cuda:0')
Tokens:
	['<s>', 'Men', 'Ġmoet', 'Ġeen', 'Ġgegeven', 'Ġpaard', 'Ġniet', 'Ġin', 'Ġde', 'Ġbek', 'Ġkijken', '.', '</s>', '<pad>', '<pad>', '<pad>', ...]
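
The `input_ids` and `attention_mask` tensors above follow a simple rule: every sequence is padded with the pad id (1 in the printed output) up to a common length, and the mask is 1 for real tokens and 0 for padding. A plain-Python sketch of that rule follows. `pad_batch` is a hypothetical helper, and the ids in `batch` are made up for illustration; they are not real RobBERT token ids.

```python
PAD_ID = 1  # pad id seen in the printed input_ids above


def pad_batch(sequences, pad_id=PAD_ID, max_length=None):
    """Pad id lists to a common length and build the matching attention masks."""
    # Without max_length, pad only to the longest sequence in the batch
    target = max_length if max_length is not None else max(len(s) for s in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        # Naive truncation; a real tokenizer also keeps the end-of-sequence token
        seq = seq[:target]
        n_pad = target - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Illustrative ids only, not real RobBERT token ids
batch = [[0, 11, 12, 13, 2], [0, 21, 22, 2]]

ids, mask = pad_batch(batch)
print(ids)   # [[0, 11, 12, 13, 2], [0, 21, 22, 2, 1]]
print(mask)  # [[1, 1, 1, 1, 1], [1, 1, 1, 1, 0]]
```

For short sentences like these, padding to the longest sequence in the batch (`padding=True` in the tokenizer call) yields much smaller tensors than `padding='max_length'` with `max_length=512`.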

In our model config, we stored what labels we use (`0 = joke` and `1 = proverb`). We can load these in and automatically convert our predictions to a human-readable format.

In [3]:
print(model.config.id2label)

{0: 'Joke', 1: 'Proverb'}


OK, let's make some predictions! Our input of two jokes and one proverb is small enough to run as a single batch, as long as it fits on your GPU.
For larger inputs, split the data into smaller batches, for example with a PyTorch `DataLoader`.
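
When the sentences do not fit in one batch, the idea is to feed them to the model in fixed-size chunks. The following is a dependency-free sketch of that chunking, roughly what a data loader with a batch size does for a plain list. `iter_batches` is a hypothetical helper, and the tokenizer and model calls are only indicated in a comment.

```python
def iter_batches(items, batch_size):
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


sentences = [
    "Men moet een gegeven paard niet in de bek kijken.",
    "Het is groen en overweegt iets? Kermit de wikker!",
    "Het is groen en het rolt van de trap? Kermit de knikker!",
]

batches = list(iter_batches(sentences, batch_size=2))
for batch in batches:
    # In the notebook, each chunk would be tokenized and passed to the model here,
    # collecting the predicted labels chunk by chunk.
    print(len(batch), "sentence(s) in this batch")
```
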

In [4]:
with torch.no_grad():
    # Make the model predict the id of the label for every sentence
    results = model(**inputs)

    # Turn the prediction into a human readable label
    predicted_labels = [model.config.id2label[item.item()] for item in results.logits.argmax(dim=1)]

    # Print the results along with the corresponding sentence
    for label, sentence in zip(predicted_labels, input_sentences):
        print(f"{label}: {sentence}")

Proverb: Men moet een gegeven paard niet in de bek kijken.
Joke: Het is groen en overweegt iets? Kermit de wikker!
Joke: Het is groen en het rolt van de trap? Kermit de knikker!
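
`argmax` only reports the winning label. To see how confident the classifier is, the logits can be turned into probabilities with a softmax. A plain-Python sketch follows; the logit values are invented for illustration and are not output of the model above.

```python
import math

id2label = {0: "Joke", 1: "Proverb"}  # mapping printed from model.config.id2label


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one sentence: [score for Joke, score for Proverb]
example_logits = [-1.5, 2.0]

probs = softmax(example_logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(f"{id2label[best]} ({probs[best]:.1%})")
```

With the tensors from the notebook, `torch.softmax(results.logits, dim=1)` computes the same quantity for the whole batch at once.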
