# Using BERT for next sentence prediction

The hardest part is making sure you have the Python packages you need installed. You'll need to install ```torch``` and ```transformers```, and as usual with Python, you may run into compatibility issues.

All I can say to help there is "google the error message."
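One thing that does help when googling is knowing exactly which versions you have. Here's a minimal sketch that uses only the standard library's `importlib.metadata`; the helper name `installed_versions` is my own invention, not part of either package.

```python
from importlib import metadata

def installed_versions(packages=("torch", "transformers")):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment
            versions[name] = None
    return versions

for name, version in installed_versions().items():
    print(name, version or "NOT INSTALLED")
```

Pasting those version numbers into your search alongside the error message tends to narrow things down.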

But once you've got the packages installed, it's easy.

First we load everything and get it ready to run.

In [1]:
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
print('built tokenizer')
model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')
model.eval()
print('built model')

built tokenizer
built model


Then here's a function to do next sentence prediction:

In [2]:
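def get_logits(firstsentence, secondsentence):
    # Encode the pair as one input, truncated to at most 255 tokens
    encoding = tokenizer.encode_plus(firstsentence, secondsentence, return_tensors='pt',
                                     max_length=255, truncation=True)
    # Label 1 means "second sentence is random" (0 means "is the next sentence"),
    # so the loss is measured against the "random" answer.
    # Assumes transformers 4.x; older releases call this argument next_sentence_label.
    loss, logits = model(**encoding, labels=torch.LongTensor([1]), return_dict=False)

    return loss, logits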

Now we just need two sentences.

In [15]:
firstsentence = "I was walking to the store one day to buy groceries."
secondsentence = "At the store I bought bananas and milk."

get_logits(firstsentence, secondsentence)

(tensor(12.3876, grad_fn=<NllLossBackward>),
 tensor([[ 6.2713, -6.1164]], grad_fn=<AddmmBackward>))

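Okay. What the hell does that mean? The first tensor is the "loss," the second the "logits." The logits are BERT's raw scores for its two categories: index 0 is "the second sentence follows the first" and index 1 is "the second sentence is random." The loss is how wrong BERT would be if the label we passed in (1, meaning "random") were the truth, which is why it's so large here. The relation between logits and probability makes my head hurt to explain, so I'm just going to [point at Wikipedia](https://en.wikipedia.org/wiki/Logit).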
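For what it's worth, the two printed numbers are tied together: the loss is the cross-entropy of the logits against the label passed to the model, which was 1. Here's a minimal sketch in plain Python using the rounded logits printed above. `nsp_loss` is a name I made up for illustration, not something from `transformers`.

```python
import math

def nsp_loss(logits, label):
    """Cross-entropy of one pair of logits against an integer label (0 or 1)."""
    # log-sum-exp, shifted by the max so exp() cannot overflow
    top = max(logits)
    log_total = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_total - logits[label]

logits = [6.2713, -6.1164]   # rounded values printed by get_logits
print(nsp_loss(logits, 1))   # close to the printed loss of 12.3876
print(nsp_loss(logits, 0))   # tiny: category 0 is what BERT actually believes
```

So a big loss against label 1 is just another way of saying BERT is confident the second sentence really does follow the first.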

But for a quick and dirty approach, I wrote this function, which *loosely* translates BERT's logits into the probability that the second sentence follows the first. (It's a softmax, with 2.72 standing in for *e*.)

In [13]:
import math

def get_probability(firstsent, secondsent):
    '''
    :param firstsent: the first sentence
    :param secondsent: the candidate next sentence
    :return: probability of the first category ("is the next sentence") after softmax
    '''
    loss, logits = get_logits(firstsent, secondsent)
    
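    poslogit = logits[0, 0].item()  # score for "second sentence follows the first"
    neglogit = logits[0, 1].item()  # score for "second sentence is random"

    # Softmax by hand; 2.72 is a rough stand-in for e (math.exp would be exact)
    pospart = math.pow(2.72, poslogit)
    negpart = math.pow(2.72, neglogit)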

    posprob = pospart / (pospart + negpart)

    return posprob
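
If you're curious how much the 2.72 shortcut costs, here's a small sketch that runs the same two-way softmax with either base. It works on plain numbers rather than BERT tensors, and `next_sentence_probability` is a hypothetical helper of mine, not part of the notebook above.

```python
import math

def next_sentence_probability(poslogit, neglogit, base=math.e):
    """Two-way softmax: probability of the first category."""
    # Algebraically the same as pospart / (pospart + negpart),
    # just written with one power instead of two.
    return 1.0 / (1.0 + math.pow(base, neglogit - poslogit))

exact = next_sentence_probability(6.2713, -6.1164)             # true softmax, base e
rough = next_sentence_probability(6.2713, -6.1164, base=2.72)  # the 2.72 shortcut
print(exact, rough, abs(exact - rough))
```

For logits of this size the two answers differ only far past the decimal point, so the shortcut is harmless here, though `math.exp` is the safer habit.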

In [17]:
firstsentence = "I was walking to the store one day to buy groceries."
secondsentence = "At the store I bought bananas and milk."

get_probability(firstsentence, secondsentence)

0.9999958626257772

Ah, now we can see that BERT considers that a pretty probable sequence. Let's try a less probable sequence.

We'll use the same first sentence about walking to the store, and for our second sentence we'll borrow this one from the Wikipedia article on "psychedelic drug":

    Psychedelics are a hallucinogenic class of psychoactive drug whose primary effect is to trigger non-ordinary states of consciousness and psychedelic experiences via serotonin 2A receptor agonism.


In [18]:
firstsentence = "I was walking to the store one day to buy groceries."
secondsentence = "Psychedelics are a hallucinogenic class of psychoactive drug whose primary effect is to trigger non-ordinary states of consciousness and psychedelic experiences via serotonin 2A receptor agonism."
get_probability(firstsentence, secondsentence)

5.017845137195324e-05

That's a much less probable sequence! Let's try a slightly weaker non-sequitur.

In [31]:
firstsentence = "I was walking to the store one day to buy groceries."
secondsentence = "Everything is closed due to the pandemic."
get_probability(firstsentence, secondsentence)

0.055498046096809694

Okay, that probability is much higher: about 5.5%, roughly a thousand times the previous one. Still unlikely. But not *totally* improbable.
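
To put the three results on one scale, it can help to turn each probability back into log-odds, which is the logit function from the Wikipedia link earlier. This is a rough sketch using the three probabilities printed in this post; `log_odds` is my own helper name. Because `get_probability` used 2.72 instead of *e*, these values only approximately recover the gap between BERT's two logits.

```python
import math

def log_odds(p):
    """Logit function: natural log of p / (1 - p)."""
    return math.log(p / (1.0 - p))

# Probabilities copied from the outputs above
results = {
    "bananas and milk": 0.9999958626257772,
    "psychedelics": 5.017845137195324e-05,
    "pandemic": 0.055498046096809694,
}

# Most probable continuation first
for name, p in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:18s} p={p:.6f}  log-odds={log_odds(p):+.2f}")
```

Positive log-odds mean BERT leans toward "this follows," negative mean it leans toward "random," and the size tells you how hard it leans.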