# Decoding an aristocrat faster

The method we have used so far to decode aristocrats is effective but slow.

In [1]:
from utils import *

First we read in the text we are going to encode (the Emancipation Proclamation) and build reference counts of letter-pair frequencies (from the Bible).

In [2]:
eman_letters = make_letters('../data/Emancipation_Proclamation.txt')
ciphertext = (encode_aris(eman_letters, make_rand_aris_key()))
bible_letters = make_letters('../data/bible.txt')
bible_letter_count = count_letters(bible_letters)
bible_letter_percent = normalize_counts_no_spaces(bible_letter_count)
bible_pair_counts = count_letter_pairs(bible_letters)
bible_matrix = compute_transition_matrix(bible_pair_counts, 0.5)

The faster method pre-computes the counts of letter pairs in the ciphertext, so that the text itself does not have to be rescanned for every guess.


In [3]:
ciphertext_pair_counts = count_letter_pairs(ciphertext)
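
To make the pre-computation concrete, here is a minimal sketch of what a pair count could look like. It is not the `count_letter_pairs` from `utils`; the name `sketch_count_letter_pairs` and the symbol ordering (a to z, then the space at index 26) are assumptions made for illustration.

```python
# Assumed symbol ordering: a-z at indices 0-25, space at index 26.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def sketch_count_letter_pairs(letters):
    """Return counts[i][j]: how often symbol i is immediately followed by symbol j."""
    index = {ch: i for i, ch in enumerate(ALPHABET)}
    size = len(ALPHABET)
    counts = [[0] * size for _ in range(size)]
    # Walk over every adjacent pair of symbols exactly once.
    for first, second in zip(letters, letters[1:]):
        counts[index[first]][index[second]] += 1
    return counts

sample = list("the cat sat")
pair_counts = sketch_count_letter_pairs(sample)
t, h, a = ALPHABET.index("t"), ALPHABET.index("h"), ALPHABET.index("a")
print(pair_counts[t][h])  # "th" occurs once
print(pair_counts[a][t])  # "at" occurs twice
```

The text is scanned once, and the resulting 27 x 27 table is all the decoder needs from then on.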

Now create our start key, and run the decoder:

In [4]:
eman_pair_counts = count_letter_pairs(eman_letters)

start_key = make_start_key(ciphertext, bible_letter_percent)
start_key.append(26)
print(start_key)
print(''.join(decode_aris(ciphertext[0:194], start_key)))
num_key = new_decode_aris_faster(start_key, ciphertext_pair_counts, bible_matrix)

print(start_key)
print(num_key)


[25, 5, 18, 12, 10, 4, 24, 9, 13, 8, 1, 14, 7, 16, 3, 19, 15, 21, 20, 0, 23, 17, 2, 22, 6, 11, 26]
khamhsw  h tshanusiytioabw tre ysenideat of tre maited nthten of hgesiuhh ysoulhghtioapresehn oa tre tpeatwneuoad dhw of neytegbes ia tre wehs of oms losd oae tromnhad eicrt rmadsed had nijtwtpo
[25, 5, 18, 12, 10, 4, 24, 9, 13, 8, 1, 14, 7, 16, 3, 19, 15, 21, 20, 0, 23, 17, 2, 22, 6, 11, 26]
[9, 5, 23, 12, 10, 4, 18, 21, 13, 1, 11, 14, 24, 25, 3, 6, 15, 20, 16, 0, 7, 17, 19, 8, 2, 22, 26]
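
The start key printed above comes from `make_start_key`, whose implementation is not shown here. One common way to build such a first guess is to match letters by frequency rank: the most frequent ciphertext letter is assumed to stand for the most frequent reference letter, and so on. The sketch below illustrates that idea only. The function name `sketch_start_key` and the key orientation (`key[c]` is the plaintext index guessed for cipher letter `c`) are assumptions and may differ from `utils`.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def sketch_start_key(cipher_letters, reference_percent):
    """Guess a key by pairing cipher letters and reference letters of equal frequency rank.

    reference_percent is a list of 26 relative frequencies, one per letter a-z.
    Assumed orientation: key[c] is the plaintext index guessed for cipher index c.
    """
    cipher_counts = Counter(ch for ch in cipher_letters if ch in ALPHABET)
    # Letter indices ordered from most to least frequent (ties keep alphabetical order).
    cipher_ranked = sorted(range(26), key=lambda i: -cipher_counts[ALPHABET[i]])
    reference_ranked = sorted(range(26), key=lambda i: -reference_percent[i])
    key = [0] * 26
    for cipher_index, plain_index in zip(cipher_ranked, reference_ranked):
        key[cipher_index] = plain_index
    return key

# Toy reference in which 'e' is most common and 't' is second.
toy_reference = [0.0] * 26
toy_reference[ALPHABET.index("e")] = 0.5
toy_reference[ALPHABET.index("t")] = 0.3
toy_key = sketch_start_key(list("xxxyy"), toy_reference)
print(ALPHABET[toy_key[ALPHABET.index("x")]])  # 'x' is the most frequent cipher letter
print(ALPHABET[toy_key[ALPHABET.index("y")]])  # 'y' is the second most frequent
```

A key built this way is rarely correct, as the garbled first decode shows, but it gives the search a much better starting point than a random permutation.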


In [5]:
print(''.join(decode_aris(ciphertext[0:194], num_key)))

january  a transcriptionby the president of the united states of americaa proclamationwhereas on the twentysecond day of september in the year of our lord one thousand eight hundred and sixtytwo


The decoder works by computing the change in log-likelihood for each guessed switch of two letters. Instead of going through every letter in the text and recomputing the likelihood for both keys, it looks at the pair-count matrix and calculates the change from only the rows and columns that contain the two letters being switched. The cost of a guess therefore depends on the size of the alphabet rather than on the length of the text, which makes this method much faster than the previous one.
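
The row-and-column idea can be sketched as follows. This is an illustration, not the code in `utils`: the function names, the key orientation (`key[c]` is the plaintext index for cipher symbol `c`), and the use of a matrix of log transition probabilities are all assumptions.

```python
import math
import random

def full_log_likelihood(pair_counts, log_matrix, key):
    """Score a key from pair counts: sum of count * log P(decoded pair)."""
    n = len(key)
    return sum(pair_counts[a][b] * log_matrix[key[a]][key[b]]
               for a in range(n) for b in range(n))

def swap_delta(pair_counts, log_matrix, key, i, j):
    """Change in log-likelihood if key[i] and key[j] are switched.

    Only cells in rows i, j and columns i, j of pair_counts can change,
    so about 4 * n cells are visited instead of n * n.
    """
    n = len(key)
    new_key = list(key)
    new_key[i], new_key[j] = new_key[j], new_key[i]
    # Collect affected cells in a set so the four corner cells are not double-counted.
    affected = set()
    for a in (i, j):
        for b in range(n):
            affected.add((a, b))
            affected.add((b, a))
    delta = 0.0
    for a, b in affected:
        count = pair_counts[a][b]
        if count:
            delta += count * (log_matrix[new_key[a]][new_key[b]]
                              - log_matrix[key[a]][key[b]])
    return delta

# Small random example: the shortcut should agree with a full rescoring.
rng = random.Random(0)
n = 5
pair_counts = [[rng.randint(0, 9) for _ in range(n)] for _ in range(n)]
log_matrix = [[math.log(rng.uniform(0.01, 1.0)) for _ in range(n)] for _ in range(n)]
key = list(range(n))
rng.shuffle(key)
i, j = 1, 3
swapped = list(key)
swapped[i], swapped[j] = swapped[j], swapped[i]
before = full_log_likelihood(pair_counts, log_matrix, key)
after = full_log_likelihood(pair_counts, log_matrix, swapped)
delta = swap_delta(pair_counts, log_matrix, key, i, j)
print(abs(delta - (after - before)) < 1e-9)
```

A decoder built on this would keep a switch when the delta is positive and discard it otherwise, without ever touching the ciphertext again.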

In [6]:
# Gain in log-likelihood from start_key to num_key, computed from the pair counts
old = compute_key_log_likelihood_pairs(ciphertext_pair_counts, bible_matrix, start_key)
new = compute_key_log_likelihood_pairs(ciphertext_pair_counts, bible_matrix, num_key)
print(new - old)
# The same gain computed the slow way, by decoding and rescoring the full text
old = find_pair_log_likelihood(decode_aris(ciphertext, start_key), bible_matrix)
new = find_pair_log_likelihood(decode_aris(ciphertext, num_key), bible_matrix)
print(new - old)
print(start_key)
print(num_key)

3568.393394900366
3568.3933949004713
[25, 5, 18, 12, 10, 4, 24, 9, 13, 8, 1, 14, 7, 16, 3, 19, 15, 21, 20, 0, 23, 17, 2, 22, 6, 11, 26]
[9, 5, 23, 12, 10, 4, 18, 21, 13, 1, 11, 14, 24, 25, 3, 6, 15, 20, 16, 0, 7, 17, 19, 8, 2, 22, 26]


In [7]:
# Check the fast change calculation against a full rescoring on the plaintext
test_text = eman_letters
changed_text = switch_letters(eman_letters, 'b', 'c')
text_letter_count = count_letters(test_text)
text_pair_count = count_letter_pairs(test_text)
# Slow way: score the text before and after switching 'b' and 'c' everywhere
one = find_pair_log_likelihood(test_text, bible_matrix)
two = find_pair_log_likelihood(changed_text, bible_matrix)
diff = two - one
# Fast way: compute the same change from the pair counts, starting from the identity key
start_key = char_to_num(list(alpha_list))
index1 = alpha_list.index("b")
index2 = alpha_list.index("c")
faster_calc = calculate_log_likelihood_change(test_text, text_pair_count, bible_letter_percent, bible_matrix, index1, index2, start_key)


In [8]:
print(diff)
print(faster_calc)
print(index1)
print(index2)

-220.29815349189448
-220.2981534919021
1
2


In [9]:
random.seed(1)
print(random.randint(0, 25))

4
