In [1]:
from wrappedCode.createModel import *
from wrappedCode.encryptionWrapped import *
from wrappedCode.decryptionWrapped import *

## Model selection

The example supports several language models, so one has to be selected first. The following GPT-2 variants are available:


*   "gpt2-small"
*   "gpt2-medium"
*   "gpt2-large"
*   "gpt2-xl"

BERT and RoBERTa can also be selected.




In [2]:
mod, tok = buildModel("bert-base-uncased")  # TODO: make a nice wrapper for this

If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertLMHeadModel: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertLMHeadModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertLMHeadModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


## Encryption of the secret text

Depending on the setting, the encryption produces complete or incomplete sentences. This example encrypts the beginning of Adele's "Hello".
The `sentenceComplete` flag controls whether the last sentence is completed.



In [3]:
startOfText="This year's Shakespeare Festival"
precondSec="Secret: "
secret="""Hello, it's me
I was wondering if after all these years you'd like to meet
To go over everything
They say that time's supposed to heal ya
But I ain't done much healing
Hello, can you hear me?
I'm in California dreaming about who we used to be
When we were younger and free
I've forgotten how it felt before the world fell at our feet
There's such a difference between us
And a million miles
Hello from the other side
I must've called a thousand times
To tell you I'm sorry for everything that I've done
But when I call, you never seem to be home"""
sentenceComplete=True

In [4]:
outText, outInd=encryptMessage(mod, tok, secret, precondSec, startOfText)

In [5]:
print("Cover text : {}".format(outText))

Cover text : this year's shakespeare festivalurities true magazine tr small - all pre constant jen {nnantaalllayrahlahorate backstage for dear far great otherwise o is element all or bound love torch mentor an afar learnt wisdom and truth faith worries ridges te pension'not co an non non non pal buffer safely ni aa when took i i never did aye &. nay " ; nline as / ran > en off demiseours if parentheses corner to to r. l or ex = was middle me jennings = # " waist lean tip to hem fruit tale substantial ~. min of % * * n waist waist the, was remained which un thoughts remained de often it at free many intoit to [CLS]. [SEP]


## Decryption of the cover text

To decrypt, the receiver needs to know the preconditioning of the secret and the start of the text. With these, and knowing whether sentence completion was activated, the receiver can recover the text correctly.
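
Why both shared values are needed can be illustrated with a toy version of rank-based encoding. This is only a sketch under the assumption that the wrapped functions map secret tokens to model ranks and back; `toy_ranking`, `to_ranks`, and `from_ranks` are hypothetical helpers, and a hash-seeded shuffle of a tiny vocabulary stands in for the language model.

```python
import hashlib
import random

# Tiny illustrative vocabulary (assumption: the real models use their own tokenizers)
VOCAB = ["hello", "it's", "me", "the", "festival", "year", "was", "and", "of", "a"]

def toy_ranking(context):
    """Deterministic stand-in for a language model: orders VOCAB by context."""
    seed = int(hashlib.sha256(" ".join(context).encode()).hexdigest(), 16)
    order = VOCAB[:]
    random.Random(seed).shuffle(order)
    return order

def to_ranks(tokens, prefix):
    """Replace each token by its rank under the toy model, given the prefix."""
    context, ranks = list(prefix), []
    for tok in tokens:
        ranks.append(toy_ranking(context).index(tok))
        context.append(tok)
    return ranks

def from_ranks(ranks, prefix):
    """Generate tokens by picking the token at each rank, given the prefix."""
    context, tokens = list(prefix), []
    for r in ranks:
        tok = toy_ranking(context)[r]
        tokens.append(tok)
        context.append(tok)
    return tokens

secret_tokens = ["hello", "it's", "me"]
precond = ["secret", ":"]   # plays the role of precondSec
start = ["this", "year"]    # plays the role of startOfText

ranks = to_ranks(secret_tokens, precond)                  # secret -> ranks
cover = from_ranks(ranks, start)                          # ranks -> cover text
recovered = from_ranks(to_ranks(cover, start), precond)   # cover -> ranks -> secret
print(cover, recovered)
```

In this sketch the receiver needs `start` to turn the cover text back into ranks and `precond` to turn the ranks back into the secret, which mirrors the two arguments passed to `decryptMessage`.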



In [7]:
text = decryptMessage(mod, tok, outText, precondSec, startOfText)

In [8]:
print("Decrypted text : {}".format(text))

Decrypted text : secret : hello, it's me i was wondering if after before arriving valle? do depends de then ago final then take hit watch note if rest sal like is myself once raced tasted back cloth revolution of as? some plus total subjecthwa and note long on remained or or those stir wentston would t store in in too it time another was sure turned for removing art back way mom changed mother'danced they various followed matches play play the play all over by of for killed released devotion remorse,d wonder just midnight nightd luce an serpent disappeared as above still played the the was loop bit is'which with of together atoms. begin hear listen wonder think form your coming


## Evaluation 

### Smoothness of the generated text

In [None]:
import matplotlib.pyplot as plt


def plot_ranks_bert(mod, tok, precondSec, secret):
  # One x position per secret token (special tokens excluded)
  x = range(len(tok.encode(secret, add_special_tokens=False)))
  ranks = get_ranks(mod, tok, precondSec, secret)
  plt.plot(x, ranks, color='orange')
  plt.ylim(-1000, 50000)
  plt.show()


def plot_ranks_gpt2(model_gpt, tok_gpt, precondSec, secret):
  x = range(len(tok_gpt.encode(secret)))
  ranks = getSecretRanks(model_gpt, tok_gpt, secret, precondSec)
  plt.plot(x, ranks, color='orange')
  plt.ylim(-1000, 50000)
  plt.show()

# GPT2_ and BERT_ flag the selected model family; they are not defined
# in the cells above and must be set before running this cell.
if GPT2_:
  plot_ranks_gpt2(model_gpt, tok_gpt, precondSec, secret)
elif BERT_:
  plot_ranks_bert(mod, tok, precondSec, secret)

### Perplexity score

In [None]:
import numpy as np
import torch
from torch import nn


def get_perplex_score(cover_text, model, tokenizer, startingSecret=". "):
    """Perplexity of cover_text under model, conditioned on startingSecret."""
    token_cover = tokenizer.encode(cover_text)
    token_start = tokenizer.encode(startingSecret)
    # Use the GPU if one is available, otherwise fall back to the CPU
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device)
    # Convert the indexed tokens to a PyTorch tensor of shape (1, seq_len)
    tokens_tensor = torch.tensor([token_start], dtype=torch.long, device=device)
    m = nn.Softmax(dim=0)
    pred = []
    with torch.no_grad():
        for token in token_cover:
            # Probability the model assigns to the next cover-text token
            outputs = model(tokens_tensor)
            predictions = outputs[0]
            tab = m(predictions[0, -1, :])
            pred.append(tab[token].item())
            # Only now add the token to the context, so that it is never
            # predicted from a context that already contains it
            next_token = torch.tensor([[token]], dtype=torch.long, device=device)
            tokens_tensor = torch.cat((tokens_tensor, next_token), dim=-1)

    # Perplexity = 2 ** (-(1/N) * sum(log2(p_i)))
    s = 0
    for p in pred:
        s += np.log2(p)
    score = 2**((-1/len(pred))*s)
    return score
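
The last lines of `get_perplex_score` compute 2 raised to the negative mean log2-probability of the tokens. A small worked example, independent of any model, may help in reading the resulting scores. The helper `perplexity` and the probability lists below are illustrative assumptions, not part of the notebook.

```python
import math

def perplexity(probs):
    """2 ** (-(1/N) * sum(log2(p))) for a list of token probabilities."""
    log_sum = sum(math.log2(p) for p in probs)
    return 2 ** (-log_sum / len(probs))

# Every token predicted with probability 1/4: as uncertain as a 4-way guess
uniform = perplexity([0.25, 0.25, 0.25, 0.25])
# Every token predicted with probability 1/2: as uncertain as a coin flip
confident = perplexity([0.5, 0.5, 0.5, 0.5])
print(uniform, confident)
```

A lower score means the model finds the text more predictable, so a cover text with lower perplexity reads as more natural to that model.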