# Decoding the aristocrat cipher

Decoding an aristocrat cipher is tricky because there are so many possible keys: 26! ways to map one alphabet onto another. The first step I took was to write a function called `make_start_key` that matches the most common letter in the ciphertext with the most common letter in the provided English letter frequency table, then the second most common with the second most common, and so on down the list. This rarely decodes the cipher correctly, but it does provide a good starting point. The text I am encoding and decoding in these examples is the Emancipation Proclamation.
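
The rank-matching idea can be sketched in a few lines. This is only an illustration, not the `make_start_key` from `utils`: the names `sketch_start_key` and `sketch_decode` are hypothetical, the reference table is assumed to be a dict from letter to relative frequency, and the key convention (`key[c]` is the plaintext letter index guessed for cipher letter index `c`) is an assumption that may not match the real key layout.

```python
from collections import Counter
import string

ALPHABET = string.ascii_lowercase


def sketch_start_key(ciphertext, reference_freq):
    """Hypothetical stand-in for make_start_key (rank matching).

    ciphertext: iterable of characters; anything outside a-z is ignored.
    reference_freq: dict mapping lowercase letters to relative frequencies.
    Returns a list of 26 ints where key[c] is the guessed plaintext letter
    index for cipher letter index c (an assumed convention).
    """
    counts = Counter(ch for ch in ciphertext if ch in ALPHABET)
    # Rank both alphabets from most to least frequent; ties break alphabetically.
    cipher_ranked = sorted(ALPHABET, key=lambda ch: (-counts[ch], ch))
    plain_ranked = sorted(ALPHABET, key=lambda ch: (-reference_freq.get(ch, 0.0), ch))
    key = [0] * 26
    # Pair the letters rank by rank: most common with most common, and so on.
    for cipher_ch, plain_ch in zip(cipher_ranked, plain_ranked):
        key[ALPHABET.index(cipher_ch)] = ALPHABET.index(plain_ch)
    return key


def sketch_decode(ciphertext, key):
    """Apply the key to every letter; spaces and other characters pass through."""
    return ''.join(
        ALPHABET[key[ALPHABET.index(ch)]] if ch in ALPHABET else ch
        for ch in ciphertext
    )
```

Because both rankings are permutations of the alphabet, the result is always a valid key, even when some letters never appear in the ciphertext.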

In [1]:
from utils import *

In [2]:
eman_letters = make_letters('../data/Emancipation_Proclamation.txt')
ciphertext = encode_aris(eman_letters, make_rand_aris_key())
bible_letters = make_letters('../data/bible.txt')
bible_letter_count = count_letters(bible_letters)
bible_letter_percent = normalize_counts_no_spaces(bible_letter_count)
bible_pair_counts = count_letter_pairs(bible_letters)
bible_matrix = compute_transition_matrix(bible_pair_counts, 0.5)

In [4]:
start_key = make_start_key(ciphertext, bible_letter_percent)
print(start_key)
print(''.join(decode_aris(ciphertext, start_key))[0:194])

[23, 0, 6, 14, 24, 15, 18, 17, 16, 21, 13, 25, 2, 20, 1, 12, 5, 4, 10, 19, 3, 8, 7, 11, 22, 9]
khamhsw  h tshanusiytioabw tre ysenideat of tre maited nthten of hgesiuhh ysoulhghtioapresehn oa tre tpeatwneuoad dhw of neytegbes ia tre wehs of oms losd oae tromnhad eicrt rmadsed had nijtwtpo h yso


Here is the actual text for comparison. You can see that many of the more common letters have already been decoded correctly: e, t, i, b, o, f, d, l.

In [8]:
print(''.join(eman_letters[0:194]))

january  a transcriptionby the president of the united states of americaa proclamationwhereas on the twentysecond day of september in the year of our lord one thousand eight hundred and sixtytwo


Next I start from the `start_key` and refine it. The function `decode_aris_no_key` searches across keys to find the most likely key for the text. It takes the ciphertext, the starting key, and two references: the frequency of letters and the frequency of letter pairs. On each iteration it picks two random indices and checks whether swapping the key values at those indices increases or decreases the likelihood of the decoded text. If the likelihood increases, the swap is kept. By default the function makes 3000 of these guess-and-check attempts, but this can easily be changed.
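
A minimal sketch of this kind of swap search follows. It is not the `decode_aris_no_key` from `utils`: every name here (`sketch_decode`, `sketch_log_likelihood`, `sketch_swap_search`) is hypothetical, and it assumes the text contains only lowercase letters and spaces, that the key maps cipher letter index to plaintext letter index, and that both references are plain lists of probabilities over 27 symbols (26 letters plus space). The score is the log probability of the decoded text under a first-order letter-pair model, which is one reasonable reading of "likelihood" here.

```python
import math
import random
import string

LETTERS = string.ascii_lowercase
SYMBOLS = LETTERS + ' '  # assumption: the model covers 26 letters plus space


def sketch_decode(ciphertext, key):
    """Map each cipher letter through the key; spaces pass through unchanged."""
    return ''.join(
        LETTERS[key[LETTERS.index(ch)]] if ch in LETTERS else ch
        for ch in ciphertext
    )


def sketch_log_likelihood(text, symbol_prob, pair_prob):
    """Log probability of a non-empty text: first symbol, then each transition."""
    idx = [SYMBOLS.index(ch) for ch in text]
    total = math.log(symbol_prob[idx[0]])
    for a, b in zip(idx, idx[1:]):
        total += math.log(pair_prob[a][b])
    return total


def sketch_swap_search(ciphertext, start_key, symbol_prob, pair_prob,
                       n_steps=3000, seed=None):
    """Keep a random two-entry swap of the key only if the score improves."""
    rng = random.Random(seed)
    key = list(start_key)
    best = sketch_log_likelihood(sketch_decode(ciphertext, key),
                                 symbol_prob, pair_prob)
    for _ in range(n_steps):
        i, j = rng.sample(range(len(key)), 2)  # two distinct key positions
        key[i], key[j] = key[j], key[i]        # try the swap
        score = sketch_log_likelihood(sketch_decode(ciphertext, key),
                                      symbol_prob, pair_prob)
        if score > best:
            best = score                       # improvement: keep the swap
        else:
            key[i], key[j] = key[j], key[i]    # no improvement: undo it
    return sketch_decode(ciphertext, key), key, best
```

Since a swap is only kept when it improves the score, the score never goes down, but a search like this can settle on a key that is good without being the best one. Running more steps or restarting from a different key are the usual remedies.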

In [10]:
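decoded_text, count, log_likelihood = decode_aris_no_key(
    ciphertext, start_key, bible_letter_percent, bible_matrix)
# Score the true plaintext under the same model, then compare with the search result
true_log_likelihood = find_pair_log_likelihood(
    eman_letters, bible_letter_percent, bible_matrix)
if log_likelihood == true_log_likelihood: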
    print('Right:')
else:
    print('Wrong:')
print(decoded_text[0:194])

Right:
january  a transcriptionby the president of the united states of americaa proclamationwhereas on the twentysecond day of september in the year of our lord one thousand eight hundred and sixtytwo
