# Set 1

## Task 1: *convert hex to base64*

In [393]:
input_hex = "49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d"

In [394]:
import base64
print(input_bytes := base64.b16decode(input_hex, casefold=True))

b"I'm killing your brain like a poisonous mushroom"


In [395]:
output_b64 = base64.b64encode(input_bytes).decode()
correct_output = "SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t"
if correct_output == output_b64:
    print("Task 1 passed successfully!")

Task 1 passed successfully!


## Task 2: *Fixed XOR*

In [396]:
a = "1c0111001f010100061a024b53535009181c"
b = "686974207468652062756c6c277320657965"

Time to get familiar with a great quirk of Python: bit-wise operations take place on *integers*, not on *bytes* objects! So we have to take the input bytes, read them as integers and then get back to hex to check the result.

In [397]:
int_a = int(a, 16)
int_b = int(b, 16)
result = hex(int_a ^ int_b)[2:]

Here, the slicing on the string is to remove the characters that Python naturally adds to the front to let us know this is a hex-encoded string.

In [398]:
correct_output = "746865206b696420646f6e277420706c6179"
if correct_output == result:
    print("Task 2 passed successfully!")


Task 2 passed successfully!


## Task 3: *Single-byte XOR cipher*

The first step is to work out what a good English-language plaintext should look like. We can do this by sampling from something like "The Lord of the Rings" to get a Dictionary of letter frequencies:

In [399]:
english_chars = "abcdefghijklmnopqrstuvwxyz "
frequency_dict = {c:0 for c in english_chars}
with open("lotr.txt", "r") as lotr_file:
    total = 0
    for line in lotr_file:
        for c in line:
            if c.lower() in english_chars:
                frequency_dict[c.lower()] += 1
                total += 1

frequency_dict = {c:frequency_dict[c]/total for c in english_chars}

Now, we can loop over the characters in *english_chars* and do the following:
1. generate the XOR of that character with the given ciphertext;
2. compare, using *SciPy*'s built-in *chisquare* test, the letter frequencies recorded for the 'decoded' text with the letter frequencies from "The Lord of the Rings";

and consequently pick the one with the lowest Chi-Square score to be the correct plaintext!

First, we need to get *SciPy*'s *chisquare* function, and can set out some variables for later use:

In [400]:
from scipy.stats import chisquare

inf_ch_sq = 1000
best_decipher = None
key = None

Now we want to decipher the ciphertext, given below as a hex-encoded string:

In [401]:
ciphertext = "1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"
int_ciphertext = int(ciphertext, 16)
buffer_length = len(ciphertext) // 2

In [402]:
keys = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
for c in keys:
    total = 0
    local_dict = {a:0 for a in english_chars}

    buffer = base64.b16encode(c.encode()) * buffer_length
    int_buffer = int(buffer.decode(), 16)

    hex_XOR = hex(int_ciphertext ^ int_buffer)[2:]
    XOR_text = base64.b16decode(hex_XOR, casefold=True).decode()
    
    for d in XOR_text:
        if d.lower() in english_chars:
            local_dict[d.lower()] += 1
            total += 1
    local_dict = {c:local_dict[c]/total for c in english_chars}
    ch_sq = chisquare(list(local_dict.values()), list(frequency_dict.values()))[0]
    if ch_sq < inf_ch_sq:
        inf_ch_sq = ch_sq
        best_decipher = XOR_text
        key = c

print(f"The key was '{key}', with a Chi-Square value of {ch_sq:.2f}")
print("The corresponding plaintext was:")
print(best_decipher)    

The key was 'X', with a Chi-Square value of 6.94
The corresponding plaintext was:
Cooking MC's like a pound of bacon


That concludes task 3! (N.B. 'ETAOIN SHRDLU' is a phrase that occasionally popped up in old newspaper misprints, and is a reference to things that look like total gibberish.)

## Task 4: *Detect single-character XOR*

We have a file of hex-encoded strings, one of which is encrypted with a single-byte XOR mask. We can use the same code as for Task 3 to decrypt the strings, and keep any that have a Chi Square coefficient of less than, say, 10, and at least 25 viable letters.

We also need to give some thought to possible keys: whereas above, letters of the alphabet sufficed, here we need to be broader, and consider all possible single-byte keys, i.e. the integers 0 to 255.

In [403]:
keys = ["0" * (2-len(hex(a)[2:])) + hex(a)[2:] for a in range(256)]

In [404]:
viable_strings = {}
with open("s1t4_data.txt", "r") as list_of_strings:
    for s in list_of_strings:
        int_s = int(s[:-3], 16)
        buffer_length = len(s[:-3]) // 2
        for c in keys:
            total = 0
            local_dict = {a:0 for a in english_chars}

            buffer = c * buffer_length
            int_buffer = int(buffer, 16)

            hex_XOR = hex(int_s ^ int_buffer)[2:]

            if len(hex_XOR) % 2 == 1:
                hex_XOR = "0" + hex_XOR
            try:
                XOR_text = base64.b16decode(hex_XOR, casefold=True).decode()
            except:
                continue

            for d in XOR_text:
                if d.lower() in english_chars:
                    local_dict[d.lower()] += 1
                    total += 1
                
            try:
                local_dict = {c:local_dict[c]/total for c in english_chars}
            except:
                continue
            ch_sq = chisquare(list(local_dict.values()), list(frequency_dict.values()))[0]
            if ch_sq < 10 and total > 25:
                viable_strings[s[:-3]] = (c, XOR_text, ch_sq)

viable_strings

{'7b5a4215415d544115415d5015455447414c155c46155f4058455c5b52': ('35',
  'Now that the party is jumping',
  2.4370679438820475)}

That concludes task 4!

## Task 5: *Implement repeating-key XOR*

In repeating key XOR, successive bytes of the plaintext are encrypted with successive bytes of the key. For example, with the example plaintext and key:

In [405]:
plaintext = """Burning 'em, if you ain't quick and nimble
I go crazy when I hear a cymbal"""
key = "ICE"

As we are using a multi-line plaintext, in some cases we might want to preserve line endings. A "\n" return corresponds to "\x0A", so we can recognise that in the hex-encoded plaintext and just add a return if we find that byte. See the comment in the loop for how this could be used; here it doesn't apply, Cryptopals' linebreaks in the hex-encoded string are purely aesthetic, but the ones in the plaintext need encoding.

In [406]:
base64.b16encode("\n".encode()).decode()

'0A'

We will encrypt the "B" with the "I", the "u" with the "C" and so on. So, we need both objects as hex-encoded bytes, ready to XOR against each other as appropriate.

In [407]:
p_as_hex = base64.b16encode(plaintext.encode()).decode()
key_as_hex = base64.b16encode(key.encode()).decode()

We can start with an empty variable for the ciphertext, but more generally this could be set to an empty list the same length as the plaintext. Then, we need to iterate throught the plaintext, and encode successive bytes with the corresponding bytes of the key.

In [408]:
ciphertext = ""
j = 0
#  print("P  --- K  ---  C")
for i in range(len(p_as_hex) // 2):
    ptext_byte = p_as_hex[2*i:2*i+2]
    # if ptext_byte == "0A":
    #     ciphertext += "\n"
    #     continue
    key_byte = key_as_hex[j % (len(key_as_hex)):j % len(key_as_hex) + 2]
    j += 2

    ptext_int = int(ptext_byte, 16)
    key_int = int(key_byte, 16)
    cipher_byte = hex(ptext_int ^key_int)[2:]
    if len(cipher_byte) % 2 == 1:
        cipher_byte = "0" + cipher_byte
    
    ciphertext += cipher_byte
    # print(f"{ptext_byte} --- {key_byte} --- {cipher_byte}")


In [409]:
correct_ciphertext = """0b3637272a2b2e63622c2e69692a23693a2a3c6324202d623d63343c2a26226324272765272a282b2f20430a652e2c652a3124333a653e2b2027630c692b20283165286326302e27282f"""
if ciphertext == correct_ciphertext:
    print("Task 5 passed correctly!")

Task 5 passed correctly!


## Task 6: *Break repeating-key XOR*

Apparently this task is going to be a bit harder. Luckily, it's laid out in steps for us, and the first thing to do is be able to correctly compute the Hamming distance between hex-encoded strings. The *hexhamming* library on PyPi can do that for us.

In [410]:
from hexhamming import hamming_distance_string

input1 = "this is a test"
input2 = "wokka wokka!!!"

hex1 = base64.b16encode(input1.encode()).decode()
hex2 = base64.b16encode(input2.encode()).decode()

hamming_distance = hamming_distance_string(hex1, hex2)

In [411]:
correct_hamming_distance = 37
if hamming_distance == correct_hamming_distance:
    print("That worked as expected!")

That worked as expected!


Now we want to try and guess the length of the key. Over blocks of the ciphertext the length of some key, the correct key length should minimise the Hamming distance between the blocks, normalised by the length of the key.

In [412]:
ciphertext = ""
with open("s1t6_data.txt", 'r') as file:
    for s in file:
        for c in s:
            if c != "\n":
                ciphertext += c

ciphertext = base64.b16encode(base64.b64decode(ciphertext.encode())).decode()

In [413]:
from numpy import mean
KEYSIZE = [i for i in range(2, 41)]
key_hamming_dict = {}
for l in KEYSIZE:
    distances = []
    for i in range(40):
        distances.append(hamming_distance_string(ciphertext[i*l:(i+1)*l], ciphertext[(i+1)*l:(i+2)*l]))
    key_hamming_dict[l] = mean(distances) / l

key_hamming_dict = {k: v for k, v in sorted(key_hamming_dict.items(), key=lambda item: item[1])}
print(f"The smallest values for the Hamming distance corespond to key lengths of {list(key_hamming_dict.keys())[:5]} ")

The smallest values for the Hamming distance corespond to key lengths of [30, 18, 10, 40, 4] 


We now have the likely key lengths, and can start to attack the ciphertext using the same statistical method as in Task 3, because every, say, 30th byte should be XORed against the same key.

In [434]:
KEYLENGTH = [30]
for key_length in KEYLENGTH:
    for ki in range(key_length):
        relevant_ciphertext = "".join([ciphertext[key_length*i + ki: key_length*i + ki + 2] for i in range(len(ciphertext) // key_length)])
        cipher_int = int(relevant_ciphertext, 16)
        for k in keys:
            buffer = k * (len(relevant_ciphertext) // 2)
            buffer_int = int(buffer, 18)

            hex_XOR = hex(cipher_int ^ buffer_int)[2:]
            if len(hex_XOR) % 2 == 1:
                hex_XOR = "0" + hex_XOR
                
            try:
                XOR_string = base64.b16decode(hex_XOR.encode(), casefold=True).decode()
            except:
                continue
            
            local_dict = {c:0 for c in english_chars}
            total = 0
            for c in XOR_string:
                if c.lower() in english_chars:
                    local_dict[c.lower()] += 1
                    total += 1
            
            if total > 140:
                local_dict = {c:local_dict[c]/total for c in english_chars}
                ch_sq = chisquare(list(local_dict.values()), list(frequency_dict.values()))[0]
                if ch_sq < 10:
                    print(f"The {ki}th character of the key is {k}!")
                    print(XOR_string, ch_sq)