# **Enigma Deciphering with GRU**
This notebook deciphers Enigma-encoded text for one fixed machine configuration using a GRU. The task is framed as a sequence-to-sequence (seq2seq) problem: a ciphertext sequence goes in and a plaintext sequence of the same length comes out.

### **Brief Background of Enigma**
Enigma is an encryption device that played a role during World War II. It encrypts each character into another character based on the machine's mechanical configuration, which changes after every key press.

In this problem, I use one fixed key setting (rotors, reflector, ring settings, plugboard, and starting rotor positions) for every message instead of changing it per message, to keep things simple. Note that the machine still has a time dependency: the rotors step on every key press, so the letter a character is encoded to depends on how many characters came before it in the message.
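
The time dependency described above can be illustrated with a toy stepping cipher. This is a minimal sketch, not the real Enigma wiring: `toy_rotor_encrypt` is a hypothetical helper that applies a Caesar shift whose offset advances by one on every key press, which is only meant to show why the same letter maps to different outputs at different positions.

```python
import string

ALPHABET = string.ascii_uppercase


def toy_rotor_encrypt(text: str, start: int = 0) -> str:
    """Toy stepping cipher (assumption: NOT the real Enigma wiring).

    A Caesar shift whose offset advances by one before each character,
    mimicking a rotor that steps on every key press.
    """
    out = []
    offset = start
    for ch in text:
        offset = (offset + 1) % 26  # the "rotor" steps before encoding
        out.append(ALPHABET[(ALPHABET.index(ch) + offset) % 26])
    return "".join(out)


print(toy_rotor_encrypt("AAAA"))  # BCDE: same input letter, four different outputs
print(toy_rotor_encrypt("ABBA"))  # BDEE: the final A still maps to E at position 4
```

Because the starting offset is reset for every message, the mapping at a given position is always the same, which is what makes the fixed-configuration problem learnable by a per-timestep model.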

### **Data Generation**
I generate the data with two libraries: Faker produces random plain text, and the `EnigmaMachine` class from py-enigma encrypts it.


In [0]:
!pip install py-enigma ## Installing the library that simulates the enigma machine

Collecting py-enigma
  Downloading https://files.pythonhosted.org/packages/91/4e/44327ad4a5960de12d86d39e1797f3ab67396a17d82182e8fc1b5ef347e5/py-enigma-0.1.tar.gz
Building wheels for collected packages: py-enigma
  Building wheel for py-enigma (setup.py) ... done
  Created wheel for py-enigma: filename=py_enigma-0.1-cp36-none-any.whl size=46860 sha256=2a5c3d92f5e6895af3b5e0d09455fc3b8281290ce5e350c8e09f56cb12284cf3
  Stored in directory: /root/.cache/pip/wheels/35/5b/fb/f29b74ef2508b1cd3fa78ba14f57888e0a8488daed8672c4cf
Successfully built py-enigma
Installing collected packages: py-enigma
Successfully installed py-enigma-0.1


In [0]:
!pip install Faker ## Library that generates fake data

Collecting Faker
  Downloading https://files.pythonhosted.org/packages/2a/6f/2a868e12996ea630a4591aa967cb934e411d94e42da12e1dd141b66d0070/Faker-4.0.2-py3-none-any.whl (1.0MB)

In [0]:
from typing import List, Tuple
from enigma.machine import EnigmaMachine
from faker import Faker
import re

In [0]:
from google.colab import files

In [0]:
### This is the virtual enigma machine with a certain configuration we would be using for this code
class ConfiguredMachine:
    def __init__(self):
        self.machine = EnigmaMachine.from_key_sheet(
            rotors='II IV V',
            reflector='B',
            ring_settings=[1, 20, 11],
            plugboard_settings='AV BS CG DL FU HZ IN KM OW RX')

    def reset(self):
        self.machine.set_display('WXC')

    def encode(self, plain_str: str) -> str:
        self.reset()
        return self.machine.process_text(plain_str)

    def batch_encode(self, plain_list: List[str]) -> List[str]:
        encoded = list()
        for s in plain_list:
            encoded.append(self.encode(s))
        return encoded

In [0]:
### preprocessing the text generated by faker to eliminate punctuation and converting to uppercase just to keep things clean and simple
def pre_process(input_str):
    return re.sub('[^a-zA-Z]', '', input_str).upper()


def generate_data(batch_size: int, seq_len: int = 42) -> Tuple[List[str], List[str]]:
    fake = Faker()
    machine = ConfiguredMachine()

    plain_list = fake.texts(nb_texts=batch_size, max_nb_chars=seq_len)
    plain_list = [pre_process(p) for p in plain_list]
    cipher_list = machine.batch_encode(plain_list)
    return plain_list, cipher_list
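
As a quick check of the cleaning step, the sketch below applies the same regular expression to a made-up sentence (the sample string is an assumption chosen for illustration). Since cleaning only removes characters, a text generated with `max_nb_chars=42` can never exceed 42 letters afterwards.

```python
import re


def pre_process(input_str: str) -> str:
    # Same expression as in the cell above: keep letters only, then uppercase.
    return re.sub('[^a-zA-Z]', '', input_str).upper()


sample = "Attack at dawn! Send 3 units."  # illustrative input, not Faker output
cleaned = pre_process(sample)

print(cleaned)                      # ATTACKATDAWNSENDUNITS
print(len(sample), len(cleaned))    # the cleaned text is never longer than the input
```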

In [0]:
import collections
import helper
import numpy as np
from keras.preprocessing.text import Tokenizer
from keras.models import Model
from keras.layers import LSTM, GRU, Input, Dense, TimeDistributed, Activation, RepeatVector, Bidirectional
from keras.layers.embeddings import Embedding
from keras.optimizers import Adam
from keras.losses import sparse_categorical_crossentropy
from keras.preprocessing.sequence import pad_sequences

#### Vocabulary: '' (index 0, doubles as the padding symbol) plus the 26 uppercase letters
input_characters=['','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']
input_characters=target_characters=sorted(list(input_characters))
num_encoder_tokens = len(input_characters)
num_decoder_tokens = len(target_characters)


# character -> index, used to tokenize both cipher and plain text
input_token_index = dict(
  [(char, i) for i, char in enumerate(input_characters)])
# index -> character, used to turn predictions back into text
target_token_index = dict(
  [(i,char) for i, char in enumerate(target_characters)])


### Helper Functions
def tokenize(x):
  tokenized_text=[]
  for c in x:
    tokenized_text.append(input_token_index[c])
  return (tokenized_text)

def preprocess(cipher,plain):
  cipher_sentences, plain_sentences=[],[]
  for c,p in zip(cipher,plain):
    cipher_sentences.append(tokenize(c))
    plain_sentences.append(tokenize(p))
  cipher_sentences=pad_sequences(cipher_sentences, maxlen=42, padding='post')
  plain_sentences=pad_sequences(plain_sentences, maxlen=42, padding='post')
  plain_sentences= plain_sentences.reshape(*plain_sentences.shape, 1)
  return cipher_sentences,plain_sentences

def logits_to_text(logits, target_token_index):
    return ''.join([target_token_index[prediction] for prediction in np.argmax(logits, 1)])
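
To make the tokenize and pad steps concrete, here is a plain-Python stand-in that needs no Keras. `tokenize_and_pad` and `ids_to_text` are hypothetical names introduced for this sketch; it assumes the same vocabulary as above and mimics `pad_sequences(..., padding='post')` with its default of truncating from the front.

```python
from typing import List

# Same vocabulary as above: index 0 is the empty string used for padding.
characters = [''] + [chr(ord('A') + i) for i in range(26)]
char_to_index = {ch: i for i, ch in enumerate(characters)}
index_to_char = {i: ch for i, ch in enumerate(characters)}


def tokenize_and_pad(text: str, maxlen: int = 42) -> List[int]:
    """Stand-in for tokenize() followed by post-padding with zeros."""
    ids = [char_to_index[c] for c in text]
    ids = ids[-maxlen:]                      # over-long input loses its first items
    return ids + [0] * (maxlen - len(ids))   # zeros are appended at the end


def ids_to_text(ids: List[int]) -> str:
    """Inverse mapping; padding index 0 decodes to '' and disappears."""
    return ''.join(index_to_char[i] for i in ids)


ids = tokenize_and_pad("HELLO", maxlen=8)
print(ids)               # [8, 5, 12, 12, 15, 0, 0, 0]
print(ids_to_text(ids))  # HELLO
```

Mapping the padding index to the empty string is what lets `logits_to_text` return a string of the original length without any explicit trimming.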




# The cells below build and train the model

In [0]:
### Building Data to train
plain,cipher=generate_data(400000)
preproc_cipher, preproc_plain=preprocess(cipher,plain)

In [0]:
def build_model(input_shape, output_sequence_length, num_encoder_tokens, num_decoder_tokens):
    learning_rate=1e-3

    input_seq = Input(input_shape[1:])
    emb = Embedding(num_encoder_tokens, 64, input_length=output_sequence_length)(input_seq)
    bdrnn = Bidirectional(GRU(64, return_sequences=True))(emb)
    logits = TimeDistributed(Dense(num_decoder_tokens, activation='softmax'))(bdrnn)

    model = Model(inputs=input_seq, outputs=logits)
    model.compile(loss=sparse_categorical_crossentropy,
                  optimizer=Adam(learning_rate),
                  metrics=['accuracy'])
    return model



In [0]:
emb_bdrnn = build_model(
    preproc_cipher.shape,
    42,
    num_encoder_tokens,
    num_decoder_tokens)








In [0]:
print('Final Model Loaded')
# Train
emb_bdrnn.fit(preproc_cipher, preproc_plain, batch_size=1024, epochs=23, validation_split=0.2)

Final Model Loaded
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where



Train on 320000 samples, validate on 80000 samples
Epoch 1/23





Epoch 2/23
Epoch 3/23
Epoch 4/23
Epoch 5/23
Epoch 6/23
Epoch 7/23
Epoch 8/23
Epoch 9/23
Epoch 10/23
Epoch 11/23
Epoch 12/23
Epoch 13/23
Epoch 14/23
Epoch 15/23
Epoch 16/23
Epoch 17/23
Epoch 18/23
Epoch 19/23
Epoch 20/23
Epoch 21/23
Epoch 22/23
Epoch 23/23


<keras.callbacks.History at 0x7f278e62c160>

In [0]:
def decipher(cipher):
  cipher_sentences=[]
  plain_predictions=[]
  for c in cipher:
    cipher_sentences.append(tokenize(c))
  cipher_sentences=pad_sequences(cipher_sentences, maxlen=42, padding='post')
  # Predict every sentence in one batched call; the result has shape (n_sentences, 42, num_decoder_tokens)
  all_logits=emb_bdrnn.predict(cipher_sentences, batch_size=1024)
  for logits in all_logits:
    # Padding index 0 maps to '', so padded positions vanish from the decoded string
    plain_predictions.append(logits_to_text(logits, target_token_index))
  return plain_predictions


def predict(cipher_list: List[str]) -> List[str]:
    plain_list=decipher(cipher_list) 
    return plain_list


def str_score(str_a: str, str_b: str) -> float:
    if len(str_a) != len(str_b):
        return 0

    n_correct = 0

    for a, b in zip(str_a, str_b):
        n_correct += int(a == b)

    return n_correct / len(str_a)


def score(predicted_plain: List[str], correct_plain: List[str]) -> float:
    correct = 0
    for p, c in zip(predicted_plain, correct_plain):
        # print(p.split()[0],len(p),c,len(c))
        # print(p,c)
        if str_score(p, c) > 0.8:
            correct += 1

    return correct / len(correct_plain)

# Runs the deciphering on freshly generated data and prints the score
if __name__ == "__main__":
    plain, cipher = generate_data(1<<14)
    print(score(predict(cipher), plain))
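
The scoring rule is strict in two ways that are easy to miss, so here is a small worked example. It is a compact restatement of `str_score` with one addition that is not in the original: a guard that returns 0 for empty strings instead of dividing by zero.

```python
def str_score(str_a: str, str_b: str) -> float:
    # Fraction of matching positions; strings of different length score 0.
    # The `not str_a` guard is an added assumption to avoid dividing by zero.
    if len(str_a) != len(str_b) or not str_a:
        return 0.0
    return sum(a == b for a, b in zip(str_a, str_b)) / len(str_a)


perfect = str_score("HELLOWORLD", "HELLOWORLD")   # all 10 letters match
one_off = str_score("HELLOWORLD", "HELLOWORLX")   # 9 of 10 letters match
short = str_score("HELLO", "HELLA")               # 4 of 5 letters match
mismatch = str_score("HELLO", "HELL")             # lengths differ

print(perfect, one_off, short, mismatch)
# A sentence counts as correct only if its score is strictly above 0.8.
print(one_off > 0.8, short > 0.8)
```

A five-letter sentence with a single wrong letter scores exactly 0.8 and is therefore not counted, and any prediction whose length differs from the reference scores 0 regardless of how many letters agree.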


In [0]:
emb_bdrnn.save("enigmaDeciphering.h5")