# ShakesBERT

BERT's `BertForNextSentencePrediction` class scores how likely one sentence (or line) is to follow another. We can use this, for example, to assemble a new sonnet from lines of existing Shakespeare sonnets: a candidate line is accepted only if its score against the previous line exceeds a threshold. The resulting sonnet is more likely to make sense than one whose lines were drawn purely at random, so next sentence prediction acts as a kind of sense discriminator.
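
As a rough illustration of what the model is asked to score: BERT's next sentence prediction expects the pair laid out as `[CLS] line A [SEP] line B [SEP]`, with segment id 0 for the first part and 1 for the second. The sketch below builds that layout in plain Python. `build_pair` and the whitespace-based `toy_tokenize` are hypothetical stand-ins for illustration only; the notebook itself uses `BertTokenizer`, which splits text into WordPiece tokens.

```python
def toy_tokenize(text):
    # Hypothetical stand-in for BertTokenizer.tokenize: lowercase, split on whitespace
    return text.lower().split()


def build_pair(line_a, line_b, tokenize=toy_tokenize):
    """Return (tokens, segment_ids) laid out as [CLS] A [SEP] B [SEP]."""
    tokens_a = ['[CLS]'] + tokenize(line_a) + ['[SEP]']
    tokens_b = tokenize(line_b) + ['[SEP]']
    # Segment 0 covers [CLS], line A and the first [SEP]; segment 1 covers the rest
    segment_ids = [0] * len(tokens_a) + [1] * len(tokens_b)
    return tokens_a + tokens_b, segment_ids


tokens, segment_ids = build_pair('From fairest creatures we desire increase,',
                                 'My verse alone had all thy gentle grace;')
print(tokens)
print(segment_ids)
```

The token list and the segment id list always have the same length, which is what the model requires of its two input tensors.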

Sonnet lines are taken from [Poetry DB](http://poetrydb.org).

In [1]:
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForNextSentencePrediction

Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.


In [2]:
tokeniser = BertTokenizer.from_pretrained('bert-base-uncased')

In [None]:
model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')
# Switch to evaluation mode so dropout is disabled and scores are deterministic
model.eval()
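
The model returns two raw scores (logits) per line pair rather than a probability. In `pytorch_pretrained_bert`, index 0 corresponds to "line B follows line A" and index 1 to "line B is a random line". If a probability is easier to reason about, a softmax over the two logits gives one. A minimal sketch in plain Python, where `is_next_probability` is a hypothetical helper name and the logit values are made up for illustration:

```python
import math


def is_next_probability(logit_is_next, logit_not_next):
    """Softmax over the two logits, returning P(line B follows line A)."""
    # Subtract the larger logit before exponentiating for numerical stability
    m = max(logit_is_next, logit_not_next)
    e_next = math.exp(logit_is_next - m)
    e_not = math.exp(logit_not_next - m)
    return e_next / (e_next + e_not)


# Made-up logits, not real model output
p = is_next_probability(3.0, -3.0)
print(round(p, 4))
```

Note that the cell below thresholds the raw "is next" logit on its own, which ignores the second logit; comparing a probability against a cut-off would be one alternative.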

In [10]:
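import json
import urllib.request  # a bare `import urllib` does not reliably expose urllib.request
from random import randint

# Fetch the lines of every 14-line Shakespeare poem from PoetryDB
url = 'http://poetrydb.org/author,linecount/Shakespeare;14/lines'
with urllib.request.urlopen(url) as response:
    data = json.load(response)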

    
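# Start the new sonnet with the first line of a randomly chosen poem
poem_number = randint(0, len(data)-1)
previous_line = data[poem_number]['lines'][0]
print(previous_line.strip())

# Minimum "is next" logit a candidate line must exceed to be accepted
threshold = 3
# Poems already used, so that each contributes at most one line
poems_picked = [poem_number]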

for line_number in range(1, 14):
    # Keep sampling until a candidate for this position has been accepted
    while line_number == len(poems_picked):
        poem_number = randint(0, len(data)-1)
        line_to_check = data[poem_number]['lines'][line_number]

        # BERT expects the pair laid out as: [CLS] line A [SEP] line B [SEP]
        tokens_1 = ['[CLS]'] + tokeniser.tokenize(previous_line) + ['[SEP]']
        tokens_2 = tokeniser.tokenize(line_to_check) + ['[SEP]']

        indexed_tokens = tokeniser.convert_tokens_to_ids(tokens_1 + tokens_2)
        # Segment 0 marks the previous line, segment 1 the candidate line
        segments_ids = [0] * len(tokens_1) + [1] * len(tokens_2)
        tokens_tensor = torch.tensor([indexed_tokens])
        segments_tensors = torch.tensor([segments_ids])

        # Inference only, so no gradients are needed
        with torch.no_grad():
            predictions = model(tokens_tensor, segments_tensors)

        # Column 0 holds the logit for "the candidate follows the previous line"
        next_line_prediction = predictions[0, 0].item()
        # Take at most one line from any poem
        if poem_number not in poems_picked and next_line_prediction > threshold:
            poems_picked.append(poem_number)

    print(line_to_check.strip())
    previous_line = line_to_check

From fairest creatures we desire increase,
My verse alone had all thy gentle grace;
So shall those blots that do with me remain,
Who, in despite of view, is pleased to dote.
Whilst her neglected child holds her in chase,
Whilst I, my sovereign, watch the clock for you,
Nay, if thou lour'st on me, do I not spend
To weigh how once I suffer'd in your crime.
So then I am not lame, poor, nor despis'd,
To this composed wonder of your frame;
At such who, not born fair, no beauty lack,
And having thee, of all men's pride I boast:
So thou, thyself outgoing in thy noon:
And all they foul that thy complexion lack.
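
The sampling loop above keeps drawing candidates until one clears the threshold, so it can run for a long time (or never finish) if the threshold is set too high. One possible variation, sketched here under stated assumptions, caps the number of draws per line and falls back to the best candidate seen. `pick_next_line`, `max_tries` and the `score` callable are hypothetical names; `score` stands in for the BERT call and is replaced by a toy word-overlap function, with made-up candidate lines, so the example runs without the model.

```python
import random


def pick_next_line(previous_line, candidates, score, threshold, max_tries=50, rng=random):
    """Draw random candidates until one scores above threshold.

    candidates: list of (poem_number, line) pairs not yet used.
    Returns the first candidate above threshold, or the best one seen
    after max_tries draws.
    """
    best = None
    best_score = float('-inf')
    for _ in range(max_tries):
        candidate = rng.choice(candidates)
        s = score(previous_line, candidate[1])
        if s > threshold:
            return candidate
        if s > best_score:
            best, best_score = candidate, s
    return best


def toy_score(line_a, line_b):
    # Toy stand-in for the model: counts the words the two lines share
    return len(set(line_a.lower().split()) & set(line_b.lower().split()))


# Made-up candidate lines for illustration
candidates = [(1, 'the sun is bright'), (2, 'my love is like the sea'), (3, 'no match here')]
chosen = pick_next_line('the sea is wide', candidates, toy_score, threshold=2,
                        rng=random.Random(0))
print(chosen)
```

The fallback trades strictness for guaranteed termination: a line that never clears the threshold is still filled with the highest-scoring candidate that was drawn.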
