# Brute Forcing Kryptos — An Attempt

Attempting to brute force the unsolved Kryptos cipher (K4 segment).

Using Python because I'm rusty as fuck with C at the moment. Plus I've recently used a lot of Python with GPU acceleration in my [CS thesis](https://github.com/vinivosh/ufu-tcc2), so it'll be very useful here and won't require me to learn new stuff — always a great bonus for a lazy lad.

I'll try using [Numba](https://numba.pydata.org/)'s [JIT compiler](https://numba.readthedocs.io/en/stable/user/jit.html) to accelerate this process as much as possible, preferably with Numba's [CUDA support](https://numba.readthedocs.io/en/stable/cuda/overview.html) if it works.

# The Segments

Of note: **K2** here is the corrected version. For the version on the actual sculpture, with the omission error, replace the `ESWJL` near the end with `EWJL`.

In [161]:
K1 = '''EMUFPHZLRFAXYUSDJKZLDKRNSHGNFIVJ
YQTQUXQBQVYUVLLTREVJYQTMKYRDMFD'''

K2 = '''VFPJUDEEHZWETZYVGWHKKQETGFQJNCE
GGWHKK?DQMCPFQZDQMMIAGPFXHQRLG
TIMVMZJANQLVKQEDAGDVFRPJUNGEUNA
QZGZLECGYUXUEENJTBJLBQCRTBJDFHRR
YIZETKZEMVDUFKSJHKFWHKUWQLSZFTI
HHDDDUVH?DWKBFUFPWNTDFIYCUQZERE
EVLDKFEZMOQQJLTTUGSYQPFEUNLAVIDX
FLGGTEZ?FKZBSFDQVGOGIPUFXHHDRKF
FHQNTGPUAECNUVPDJMQCLQUMUNEDFQ
ELZZVRRGKFFVOEEXBDMVPNFQXEZLGRE
DNQFMPNZGLFLPMRJQYALMGNUVPDXVKP
DQUMEBEDMHDAFMJGZNUPLGESWJLLAETG'''

K3 = '''ENDYAHROHNLSRHEOCPTEOIBIDYSHNAIA
CHTNREYULDSLLSLLNOHSNOSMRWXMNE
TPRNGATIHNRARPESLNNELEBLPIIACAE
WMTWNDITEENRAHCTENEUDRETNHAEOE
TFOLSEDTIWENHAEIOYTEYQHEENCTAYCR
EIFTBRSPAMHHEWENATAMATEGYEERLB
TEEFOASFIOTUETUAEOTOARMAEERTNRTI
BSEDDNIAAHTTMSTEWPIEROAGRIEWFEB
AECTDDHILCEIHSITEGOEAOSDDRYDLORIT
RKLMLEHAGTDHARDPNEOHMGFMFEUHE
ECDMRIPFEIMEHNLSSTTRTVDOHW?'''

K4 = '''OBKR
UOXOGHULBSOLIFBBWFLRVQQPRNGKSSO
TWTQSJQSSEKZZWATJKLUDIAWINFBNYP
VTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR'''

# print(f'K1:\n{K1}\n\nK2:\n{K2}\n\nK3:\n{K3}\n\nK4:\n{K4}\n\n')
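
The segments above are stored with the sculpture's line breaks, so most later steps will probably want them as one continuous string. Below is a minimal sketch of that, plus the `ESWJL` to `EWJL` substitution described earlier. The names `flatten`, `K2_last_line` and `k4_flat` are made up for this example, and only the last line of K2 is repeated to keep it short.

```python
def flatten(segment: str) -> str:
    """Join the lines of a segment into one continuous string."""
    return ''.join(segment.split())


K4 = '''OBKR
UOXOGHULBSOLIFBBWFLRVQQPRNGKSSO
TWTQSJQSSEKZZWATJKLUDIAWINFBNYP
VTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR'''

# Last line of the corrected K2, copied from the cell above
K2_last_line = 'DQUMEBEDMHDAFMJGZNUPLGESWJLLAETG'

k4_flat = flatten(K4)
print(len(k4_flat))  # 97 characters

# Sculpture variant: the S is missing, so ESWJL becomes EWJL
K2_last_line_sculpture = K2_last_line.replace('ESWJL', 'EWJL', 1)
print(K2_last_line_sculpture)
```

The same `replace` call should work on the full `K2` string, since `ESWJL` appears only once in it.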

# Dictionary and Evaluation

How can we tell whether a decrypted candidate makes sense and is not just random gibberish? By using known statistical patterns of the English language, such as how often each pair of letters appears!

> I got this idea from [this great experiment](https://github.com/desgeeko/kryptos/blob/main/Kryptos.ipynb), so all credit goes to GitHub user **[desgeeko](https://github.com/desgeeko)**.

## Reading Files (Google Web Trillion Word Corpus)

In [162]:
import os
from urllib.request import urlopen

import numpy as np
import pandas as pd
# from numba import jit

def download_if_needed(file_path, url):
    '''Downloads the file at `url`, unless it already exists at `file_path`.'''

    if os.path.exists(file_path):
        return

    # The n-gram files are plain text, so decode and save them as UTF-8
    with urlopen(url) as response:
        text = response.read().decode('utf-8')
    with open(file_path, 'w', encoding='utf-8') as f:
        f.write(text)

file_count_1w = 'count_1w.txt'
file_count_2l = 'count_2l.txt'
url_base = 'https://norvig.com/ngrams/'

download_if_needed(file_count_1w, url_base + file_count_1w)
download_if_needed(file_count_2l, url_base + file_count_2l)

# Reading ngrams files
with open(file_count_1w, 'r') as f:
    count_1w = pd.read_csv(f, names=['word', 'count'], sep='\t')
with open(file_count_2l, 'r') as f:
    count_2l = pd.read_csv(f, names=['bigram', 'count'], sep='\t')

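# Normalizing bigram counts by the smallest one, to help calculations.
# The cast to uint32 drops the fractional part, so the rarest bigrams end up as 1.
min_bigram_count = count_2l['count'].min()
count_2l['count'] = (count_2l['count'] / min_bigram_count).astype(np.uint32)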

print(count_1w)
print(count_2l)

# Converting count_2l to a dictionary
count_2l = {bigram.upper(): count for (bigram, count) in count_2l.to_numpy()}
# print(count_2l)

           word        count
0           the  23135851162
1            of  13151942776
2           and  12997637966
3            to  12136980858
4             a   9081174698
...         ...          ...
333328    gooek        12711
333329   gooddg        12711
333330  gooblle        12711
333331   gollgo        12711
333332    golgw        12711

[333333 rows x 2 columns]
    bigram  count
0       in  47154
1       th  46594
2       er  41698
3       re  38010
4       he  37250
..     ...    ...
671     qy      2
672     zq      2
673     jx      1
674     qz      1
675     jq      1

[676 rows x 2 columns]
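
The cell above ends by turning `count_2l` into a dictionary keyed by two-letter strings. If the scoring loop is later compiled with Numba, a plain numeric table might be easier to pass around than a string-keyed dict. That is an assumption, not something tested in this notebook. Here is a rough sketch of such a layout using only the standard library; `bigram_index`, `build_flat_table` and `sample_scores` are hypothetical names, and the three scores are taken from the normalized output printed above.

```python
def bigram_index(bigram: str) -> int:
    """Map an uppercase two-letter string to a slot in a 26 * 26 flat table."""
    first = ord(bigram[0]) - ord('A')
    second = ord(bigram[1]) - ord('A')
    return first * 26 + second


def build_flat_table(scores: dict) -> list:
    """Copy a {bigram: score} dict into a flat list; missing bigrams stay 0."""
    table = [0] * (26 * 26)
    for bigram, score in scores.items():
        table[bigram_index(bigram)] = int(score)
    return table


# Tiny stand-in for the full count_2l dictionary
sample_scores = {'IN': 47154, 'TH': 46594, 'JQ': 1}
table = build_flat_table(sample_scores)

print(table[bigram_index('TH')])  # 46594
print(table[bigram_index('ZZ')])  # 0, because ZZ is not in the sample
```

The same indexing would carry over to a NumPy array of 676 integers. Note that a missing bigram reads as 0 here instead of being absent, which matters if the scores are later passed through a logarithm.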


## Evaluation Function

In [163]:
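def is_it_english(text: str, use_geo_mean=False):
    '''Scores `text` by the average normalized frequency of its letter bigrams.

    Higher scores mean more English-like text. Bigrams missing from `count_2l`
    (e.g. those containing punctuation) are skipped. With `use_geo_mean=True`,
    a geometric mean is used instead of an arithmetic one.
    '''
    bigrams = [text[i:i+2].upper() for i in range(len(text) - 1)]
    total = 0

    for bigram in bigrams:
        score = count_2l.get(bigram)
        if score is None:
            continue

        if use_geo_mean:
            total += np.log(score)
        else:
            total += score

    # Note: the average is taken over len(text), not over the number of bigrams
    if use_geo_mean:
        return round(np.exp(total / len(text)))
    return round(total / len(text))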

for text in [
    'A CAT NAP A DAY LATE AND A DOLLAR SHORT LOVE BIRDS A LOT ON ONES PLATE A BITE AT THE CHERRY',
    'TEST PHRASE THAT IS COMPLETELY WRITTEN IN ENGLISH LETS SEE IF THIS GOES AS PLANNED',
    'FRASE DE PRUEBA QUE ESTA COMPLETAMENTE ESCRITA EN ESPANOL A VER SI VA SEGUN LO PLANIFICADO',
    'PHRASE DE TEST ENTIEREMENT ECRITE EN FRANCAIS, VOYONS SI CELA SE PASSE COMME PREVU',
    'TESTZIN DIE VOLLEDIG IN HET NEDERLANDS IS GESCHREVEN LATEN WE KIJKEN OF DIT GAAT ZOALS PLANNEN',
    'FRASE EM PORTUGUES PARA TESTAR COMO O SCORE FICA SE TESTARMOS OUTRA LINGUA BANANAS',
    'WPRYCSWDUWQXAYKMOKUZLVOSIPPTGTIIDZXQYCOCSLIAGPEPFJEXFVVMMOPLRJYATIISHMPNFJFIEWLVQ',
    'CJODRMQNNKCEGPOXUUQOXISVUIRIDLXMLXFKWREBKWEZMRCRPXNNFKOIOEKOXBCOUZGGIORDTNXYNRMTR',
    'HQLVEOWOTLEZNDGYFFPLTCJCIOGGFWAMYZOXHIGJMFSLEPHLVGRDIGQURPAIIFLWICWYSHZWCIMRJWVUB',
    'XUCUSMSSROGJKIGUPGSBGSIKINELYBHOALXCQYTGMRKDWJUXDUOANVEWHIWDIMGKMKKUSIYFQAGBLWTDF',
]:
    score = is_it_english(text.replace(' ', ''))
    print(f'"{text}"\n{score}\n')

"A CAT NAP A DAY LATE AND A DOLLAR SHORT LOVE BIRDS A LOT ON ONES PLATE A BITE AT THE CHERRY"
15374

"TEST PHRASE THAT IS COMPLETELY WRITTEN IN ENGLISH LETS SEE IF THIS GOES AS PLANNED"
15267

"FRASE DE PRUEBA QUE ESTA COMPLETAMENTE ESCRITA EN ESPANOL A VER SI VA SEGUN LO PLANIFICADO"
13004

"PHRASE DE TEST ENTIEREMENT ECRITE EN FRANCAIS, VOYONS SI CELA SE PASSE COMME PREVU"
15846

"TESTZIN DIE VOLLEDIG IN HET NEDERLANDS IS GESCHREVEN LATEN WE KIJKEN OF DIT GAAT ZOALS PLANNEN"
13975

"FRASE EM PORTUGUES PARA TESTAR COMO O SCORE FICA SE TESTARMOS OUTRA LINGUA BANANAS"
15536

"WPRYCSWDUWQXAYKMOKUZLVOSIPPTGTIIDZXQYCOCSLIAGPEPFJEXFVVMMOPLRJYATIISHMPNFJFIEWLVQ"
4134

"CJODRMQNNKCEGPOXUUQOXISVUIRIDLXMLXFKWREBKWEZMRCRPXNNFKOIOEKOXBCOUZGGIORDTNXYNRMTR"
3778

"HQLVEOWOTLEZNDGYFFPLTCJCIOGGFWAMYZOXHIGJMFSLEPHLVGRDIGQURPAIIFLWICWYSHZWCIMRJWVUB"
3879

"XUCUSMSSROGJKIGUPGSBGSIKINELYBHOALXCQYTGMRKDWJUXDUOANVEWHIWDIMGKMKKUSIYFQAGBLWTDF"
4549
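
In the results above, the random strings score far below the real sentences, but the French and Portuguese phrases score slightly above both English ones with the default arithmetic mean. The `use_geo_mean` option is one thing to try, because a geometric mean punishes a few very rare bigrams much harder. The toy sketch below illustrates that difference with made-up scores, not real bigram counts. The helper names are hypothetical, and unlike `is_it_english` it divides by the number of scores.

```python
import math


def arithmetic_mean(scores):
    """Plain average of the scores."""
    return sum(scores) / len(scores)


def geometric_mean(scores):
    """exp of the average log score; every score must be greater than 0."""
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


# Made-up bigram scores with the same arithmetic mean
steady = [100, 100, 100, 100]  # every bigram is moderately common
spiky = [397, 1, 1, 1]         # one very common bigram, three very rare ones

print(arithmetic_mean(steady), arithmetic_mean(spiky))  # 100.0 for both
print(geometric_mean(steady))  # still about 100
print(geometric_mean(spiky))   # far lower, dragged down by the rare bigrams
```

Whether this actually separates English from other Latin-alphabet languages on real text would have to be checked by rerunning the phrases above with `use_geo_mean=True`.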



# Brute Forcing K1