# Set 1

## Task 1: *convert hex to base64*

In [160]:
input_hex = "49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d"

In [161]:
import base64
print(input_bytes := base64.b16decode(input_hex, casefold=True))

b"I'm killing your brain like a poisonous mushroom"


In [162]:
output_b64 = base64.b64encode(input_bytes).decode()
correct_output = "SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t"
if correct_output == output_b64:
    print("Task 1 passed successfully!")

Task 1 passed successfully!


## Task 2: *Fixed XOR*

In [163]:
a = "1c0111001f010100061a024b53535009181c"
b = "686974207468652062756c6c277320657965"

Time to get familiar with a great quirk of Python: bit-wise operations take place on *integers*, not on *bytes* objects! So we have to take the input bytes, read them as integers and then get back to hex to check the result.

In [164]:
int_a = int(a, 16)
int_b = int(b, 16)
result = hex(int_a ^ int_b)[2:]

Here, the slicing on the string is to remove the characters that Python naturally adds to the front to let us know this is a hex-encoded string.

In [165]:
correct_output = "746865206b696420646f6e277420706c6179"
if correct_output == result:
    print("Task 2 passed successfully!")


Task 2 passed successfully!


## Task 3: *Single-byte XOR cipher*

The first step is to work out what a good English-language plaintext should look like. We can do this by sampling from something like "The Lord of the Rings" to get a Dictionary of letter frequencies:

In [166]:
english_chars = "abcdefghijklmnopqrstuvwxyz "
frequency_dict = {c:0 for c in english_chars}
with open("lotr.txt", "r") as lotr_file:
    total = 0
    for line in lotr_file:
        for c in line:
            if c.lower() in english_chars:
                frequency_dict[c.lower()] += 1
                total += 1

frequency_dict = {c:frequency_dict[c]/total for c in english_chars}

Now, we can loop over the characters in *english_chars* and do the following:
1. generate the XOR of that character with the given ciphertext;
2. compare, using *SciPy*'s built-in *chisquare* test, the letter frequencies recorded for the 'decoded' text with the letter frequencies from "The Lord of the Rings";

and consequently pick the one with the lowest Chi-Square score to be the correct plaintext!

First, we need to get *SciPy*'s *chisquare* function, and can set out some variables for later use:

In [167]:
from scipy.stats import chisquare

inf_ch_sq = 1000
best_decipher = None
key = None

Now we want to decipher the ciphertext, given below as a hex-encoded string:

In [168]:
ciphertext = "1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"
int_ciphertext = int(ciphertext, 16)
buffer_length = len(ciphertext) // 2

In [169]:
keys = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
for c in keys:
    total = 0
    local_dict = {a:0 for a in english_chars}

    buffer = base64.b16encode(c.encode()) * buffer_length
    int_buffer = int(buffer.decode(), 16)

    hex_XOR = hex(int_ciphertext ^ int_buffer)[2:]
    XOR_text = base64.b16decode(hex_XOR, casefold=True).decode()
    
    for d in XOR_text:
        if d.lower() in english_chars:
            local_dict[d.lower()] += 1
            total += 1
    local_dict = {c:local_dict[c]/total for c in english_chars}
    ch_sq = chisquare(list(local_dict.values()), list(frequency_dict.values()))[0]
    if ch_sq < inf_ch_sq:
        inf_ch_sq = ch_sq
        best_decipher = XOR_text
        key = c

print(f"The key was '{key}', with a Chi-Square value of {ch_sq:.2f}")
print("The corresponding plaintext was:")
print(best_decipher)    

The key was 'X', with a Chi-Square value of 6.94
The corresponding plaintext was:
Cooking MC's like a pound of bacon


That concludes task 3! (N.B. 'ETAOIN SHRDLU' is a phrase that occasionally popped up in old newspaper misprints, and is a reference to things that look like total gibberish.)

## Task 4: *Detect single-character XOR*

We have a file of hex-encoded strings, one of which is encrypted with a single-byte XOR mask. We can use the same code as for Task 3 to decrypt the strings, and keep any that have a Chi Square coefficient of less than, say, 10, and at least 25 viable letters.

We also need to give some thought to possible keys: whereas above, letters of the alphabet sufficed, here we need to be broader, and consider all possible single-byte keys, i.e. the integers 0 to 255.

In [170]:
keys = [hex(a)[2:] for a in range(256)]

In [171]:
viable_strings = {}
with open("s1t4_data.txt", "r") as list_of_strings:
    for s in list_of_strings:
        int_s = int(s[:-3], 16)
        buffer_length = len(s[:-3]) // 2
        for c in keys:
            total = 0
            local_dict = {a:0 for a in english_chars}

            buffer = c * buffer_length
            int_buffer = int(buffer, 16)

            hex_XOR = hex(int_s ^ int_buffer)[2:]

            if len(hex_XOR) % 2 == 1:
                hex_XOR = "0" + hex_XOR
            try:
                XOR_text = base64.b16decode(hex_XOR, casefold=True).decode()
            except:
                continue

            for d in XOR_text:
                if d.lower() in english_chars:
                    local_dict[d.lower()] += 1
                    total += 1
                
            try:
                local_dict = {c:local_dict[c]/total for c in english_chars}
            except:
                continue
            ch_sq = chisquare(list(local_dict.values()), list(frequency_dict.values()))[0]
            if ch_sq < 10 and total > 25:
                viable_strings[s[:-3]] = (c, XOR_text, ch_sq)

viable_strings

{'7b5a4215415d544115415d5015455447414c155c46155f4058455c5b52': ('35',
  'Now that the party is jumping',
  2.4370679438820475)}

That concludes task 4!

## Task 5: *Implement repeating-key XOR*