# Table of Contents<a name="toc"></a>
* [Convert hex to base64](#prob1)
* [Fixed XOR](#prob2)
* [Single-byte XOR cipher](#prob3)
* [Detect single-character XOR](#prob4)
* [Implementing repeating-key XOR](#prob5)
* [Breaking repeating-key XOR](#prob6)

# Convert hex to base64<a name="prob1"></a>

In [1]:
import binascii, base64

In [13]:
string_to_convert = "49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d"

In [14]:
output = base64.b64encode(binascii.unhexlify(string_to_convert)).decode()
print(f"base64 encoded: {output}")

base64 encoded: SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t


# Fixed XOR<a name="prob2"></a>

In [2]:
%%capture
!pip install numpy
import numpy as np

In [16]:
cipher = "1c0111001f010100061a024b53535009181c"

In [17]:
xor_block = "686974207468652062756c6c277320657965"

In [8]:
def xor_encrypt(cipher: bytes, block: bytes) -> bytes:
    cipher_npa = np.frombuffer(cipher, dtype=np.uint8)
    block_npa = np.frombuffer(block, dtype=np.uint8)
    return np.bitwise_xor(cipher_npa, block_npa).tobytes()

In [19]:
ciphertext = xor_encrypt(binascii.unhexlify(cipher), binascii.unhexlify(xor_block))
print(f"ciphertext, hex: {binascii.hexlify(ciphertext).decode()}")
print(f"ciphertext, bytes: {ciphertext}")

ciphertext, hex: 746865206b696420646f6e277420706c6179
ciphertext, bytes: b"the kid don't play"


# Single-byte XOR cipher<a name="prob3"></a>

In [14]:
%%capture
!pip install nltk
import nltk
nltk.download("words")
from nltk.corpus import words

In [21]:
ciphertext = "1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"

Try every single-byte XOR key from 0x00 to 0xFF and keep only the candidates whose bytes all fall within the printable ASCII range, 0x20 to 0x7E. Of those, keep the candidates that contain at least one word from the NLTK `words` corpus.

**Assumption:** the plaintext is printable ASCII.

In [22]:
ciphertext_b = binascii.unhexlify(ciphertext)
english_words = set(words.words())  # build the lookup set once; list membership is slow
possible_results = {}
for i in range(0x100):
    block = np.full(len(ciphertext_b), i, dtype=np.uint8)
    candidate = xor_encrypt(ciphertext_b, block.tobytes())
    # keep candidates made up entirely of printable ASCII (0x20 to 0x7e)
    if all(0x20 <= x <= 0x7e for x in candidate):
        # require at least one dictionary word
        if any(word in english_words for word in candidate.decode().split()):
            possible_results[i] = candidate.decode()
for k, v in possible_results.items():
    print(f"XOR: 0x{k:02x}; cipher: {v}")

XOR: 0x58; cipher: Cooking MC's like a pound of bacon


# Detect single-character XOR<a name="prob4"></a>

In [23]:
class CipherCandidate:
    row: int
    xor: int
    cipher: str
    def __init__(self, row, xor, cipher):
        self.row = row
        self.xor = xor
        self.cipher = cipher
    def __str__(self):
        return f"Row: {self.row}; XOR: 0x{self.xor:02x}; Cipher: {self.cipher}"

In [24]:
results = []
english_words = set(words.words())  # build the lookup set once
with open("4.txt") as fd:
    for idx, line in enumerate(fd):
        # strip the trailing newline before hex-decoding
        line_b = binascii.unhexlify(line.rstrip("\n"))
        for xor in range(0x100):
            xor_block = np.full(len(line_b), xor, dtype=np.uint8)
            cipher = xor_encrypt(line_b, xor_block.tobytes())
            # keep candidates made up entirely of printable ASCII (0x20 to 0x7e)
            if all(0x20 <= x <= 0x7e for x in cipher):
                if any(word in english_words for word in cipher.decode().split()):
                    results.append(CipherCandidate(idx, xor, cipher.decode()))
for r in results:
    print(r)

Row: 225; XOR: 0x71; Cipher: F J\{?O#`F[KpB=, r}77OF'X}|cS
Row: 295; XOR: 0x70; Cipher: F9QoQt&unYk<(=w9R|X{Z #oVYq N


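I couldn't recover the message: neither candidate reads as English. A likely cause is that the printable-ASCII check rejects any candidate containing a newline (0x0a), so a plaintext that ends in `\n` is discarded before the dictionary lookup.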
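
Because a hard printable-ASCII filter discards candidates that contain bytes such as `\n`, a scoring approach can be more forgiving. The following is a minimal sketch, not the method used in this notebook: it scores each candidate by how many of its bytes are ASCII letters or spaces and keeps the highest-scoring key. The names `english_score` and `best_single_byte_xor`, the scoring rule, and the demo message are all assumptions made for illustration.

```python
import string

# Bytes treated as "English-like" in this sketch: ASCII letters and the space.
LIKELY = set((string.ascii_letters + " ").encode("ascii"))

def english_score(candidate: bytes) -> int:
    """Count the bytes that are ASCII letters or spaces."""
    return sum(b in LIKELY for b in candidate)

def best_single_byte_xor(ciphertext: bytes) -> tuple[int, bytes]:
    """Return the (key, plaintext) pair with the highest english_score."""
    best_key, best_plain, best_score = 0, ciphertext, -1
    for key in range(0x100):
        plain = bytes(b ^ key for b in ciphertext)
        score = english_score(plain)
        if score > best_score:
            best_key, best_plain, best_score = key, plain, score
    return best_key, best_plain

# Demo on a made-up message that ends in a newline.
message = b"the quick brown fox jumps over the lazy dog\n"
encrypted = bytes(b ^ 0x35 for b in message)
key, plain = best_single_byte_xor(encrypted)
print(f"XOR: 0x{key:02x}; plaintext: {plain!r}")
```

Applied to `4.txt`, the same idea would mean scoring every (row, key) pair and inspecting the top few results instead of filtering them out up front.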

# Implementing repeating-key XOR<a name="prob5"></a>

In [25]:
phrase = "Burning 'em, if you ain't quick and nimble\nI go crazy when I hear a cymbal"

In [26]:
xor_block = b"ICE"

In [73]:
from typing import Iterator

def xor_block_encrypt(cipher: bytes, xor_block: bytes) -> Iterator[bytes]:
    # Generator: yields the XOR of each key-sized segment in turn.
    while len(cipher) > 0:
        segment = cipher[:len(xor_block)]
        cipher = cipher[len(xor_block):]
        # Truncate the key for a short final segment so it is emitted exactly once.
        yield xor_encrypt(segment, xor_block[:len(segment)])

In [28]:
phrase_b = phrase.encode("utf8")
result = b""
for ciphertext in xor_block_encrypt(phrase_b, xor_block):
    result += ciphertext
print(binascii.hexlify(result).decode())

0b3637272a2b2e63622c2e69692a23693a2a3c6324202d623d63343c2a26226324272765272a282b2f20430a652e2c652a3124333a653e2b2027630c692b20283165286326302e27282f
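
As a cross-check on the generator above, repeating-key XOR can also be sketched without NumPy by pairing each plaintext byte with a key byte from `itertools.cycle`. The function name `repeating_key_xor` is hypothetical; because `zip` stops at the end of the data, a final partial block needs no special handling.

```python
import binascii
from itertools import cycle

def repeating_key_xor(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the key byte at the same position modulo len(key)."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

phrase = b"Burning 'em, if you ain't quick and nimble\nI go crazy when I hear a cymbal"
encrypted = repeating_key_xor(phrase, b"ICE")
print(binascii.hexlify(encrypted).decode())

# XOR is its own inverse, so applying the same key again restores the input.
assert repeating_key_xor(encrypted, b"ICE") == phrase
```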


# Breaking repeating-key XOR<a name="prob6"></a>

## Implement Hamming Distance function
XORing two equal-length byte strings produces a 1 bit at every position where they differ, so counting the 1 bits in the XOR output gives the Hamming distance between the two strings.

In [3]:
def hamming(str1: bytes, str2: bytes) -> int:
    # XOR leaves a 1 bit wherever the inputs differ; unpack to bits and count them.
    diff = np.frombuffer(xor_encrypt(str1, str2), dtype=np.uint8)
    return int(np.unpackbits(diff).sum())

Make sure the hamming distance between strings "`this is a test`" and "`wokka wokka!!!`" is `37`.

In [30]:
str1 = b"this is a test"
str2 = b"wokka wokka!!!"
hamming(str1, str2)

37
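
To make the XOR-then-count idea concrete, here is a small worked sketch in plain Python. The helper name `hamming_bits` is hypothetical and is separate from the NumPy-based `hamming` above.

```python
def hamming_bits(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# 't' is 0x74 = 0b01110100 and 'w' is 0x77 = 0b01110111.
# Their XOR is 0b00000011, which has two set bits.
print(hamming_bits(b"t", b"w"))

# The full test strings from the challenge.
print(hamming_bits(b"this is a test", b"wokka wokka!!!"))
```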

## Determine XOR keysize

In [4]:
%%capture
!pip install matplotlib
import matplotlib.pyplot as plt

In [5]:
from collections import namedtuple
XORKeysize = namedtuple("XORKeysize", ["keysize", "hamming_dist", "hamming_dist_normal"])

In [6]:
import base64
with open("6.txt") as fd:
    data = fd.read()
data = data.replace("\n", "")
data = base64.b64decode(data)

In [9]:
import pprint
keysize_tests = []
for i in range(2, 41):
    h = hamming(data[:i], data[i:2*i])
    keysize_tests.append(XORKeysize(i, h, h/i))
pprint.pprint(keysize_tests)

[XORKeysize(keysize=2, hamming_dist=5, hamming_dist_normal=2.5),
 XORKeysize(keysize=3, hamming_dist=6, hamming_dist_normal=2.0),
 XORKeysize(keysize=4, hamming_dist=14, hamming_dist_normal=3.5),
 XORKeysize(keysize=5, hamming_dist=6, hamming_dist_normal=1.2),
 XORKeysize(keysize=6, hamming_dist=24, hamming_dist_normal=4.0),
 XORKeysize(keysize=7, hamming_dist=21, hamming_dist_normal=3.0),
 XORKeysize(keysize=8, hamming_dist=24, hamming_dist_normal=3.0),
 XORKeysize(keysize=9, hamming_dist=32, hamming_dist_normal=3.5555555555555554),
 XORKeysize(keysize=10, hamming_dist=33, hamming_dist_normal=3.3),
 XORKeysize(keysize=11, hamming_dist=29, hamming_dist_normal=2.6363636363636362),
 XORKeysize(keysize=12, hamming_dist=39, hamming_dist_normal=3.25),
 XORKeysize(keysize=13, hamming_dist=33, hamming_dist_normal=2.5384615384615383),
 XORKeysize(keysize=14, hamming_dist=45, hamming_dist_normal=3.2142857142857144),
 XORKeysize(keysize=15, hamming_dist=44, hamming_dist_normal=2.933333333333333)

The smallest normalized Hamming distance in the output above is 1.2, at a keysize of 5 bytes. I haven't tested whether this is statistically significant, but the next lowest value is 2.0 and most values fall between 2.5 and 3.5, so 1.2 stands out as an outlier. Each value comes from a single pair of blocks, though, so the estimate is noisy.
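
Averaging over several adjacent block pairs, rather than a single pair, should give a steadier estimate. The following is a minimal sketch under that assumption; the names `score_keysize` and `rank_keysizes`, the `max_pairs` default, and the synthetic test data are all hypothetical and not part of the notebook.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def score_keysize(data: bytes, keysize: int, max_pairs: int = 8) -> float:
    """Mean per-byte Hamming distance between adjacent keysize-byte blocks."""
    # Only full blocks are used, so every compared pair has equal length.
    blocks = [data[i:i + keysize] for i in range(0, len(data) - keysize + 1, keysize)]
    pairs = list(zip(blocks, blocks[1:]))[:max_pairs]
    if not pairs:
        raise ValueError("data too short for this keysize")
    return sum(hamming_distance(a, b) for a, b in pairs) / (len(pairs) * keysize)

def rank_keysizes(data: bytes, low: int = 2, high: int = 40) -> list[tuple[float, int]]:
    """Return (score, keysize) tuples sorted from most to least likely."""
    return sorted((score_keysize(data, k), k) for k in range(low, high + 1))

# Synthetic check: a constant plaintext under a 5-byte key repeats every 5 bytes,
# so keysize 5 (and its multiples) should score exactly 0.
key = b"KEY12"
ciphertext = bytes(ord("a") ^ key[i % len(key)] for i in range(400))
print(rank_keysizes(ciphertext)[:3])
```

Multiples of the true keysize tend to score well too, so it is worth trying the few best-ranked sizes rather than committing to one.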

In [10]:
xor_keysize = 5

## Determine XOR key

We're going to trim the data so that its length is evenly divisible by the keysize, and then convert it into a matrix with one row per key-sized block.

In [11]:
trimsize = len(data) % xor_keysize
if trimsize > 0:
    data = data[:(-1 * trimsize)]
data_block = []
for i in range(0, len(data), xor_keysize):
    data_block.append(list(data[i:i+xor_keysize]))
data_block = np.array(data_block, dtype=np.uint8)
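
The same trim-and-transpose step can be sketched in plain Python with strided slicing, which may be easier to check by eye than the matrix form. The helper name `key_columns` is hypothetical: entry `j` of its result holds every byte that was XORed with key byte `j`.

```python
def key_columns(data: bytes, keysize: int) -> list[bytes]:
    """Trim data to a multiple of keysize and split it into one bytes object per key position."""
    usable = len(data) - (len(data) % keysize)
    trimmed = data[:usable]
    # trimmed[j::keysize] picks bytes j, j + keysize, j + 2 * keysize, ...
    return [trimmed[j::keysize] for j in range(keysize)]

# 13 bytes with keysize 5: the last 3 bytes are dropped, leaving two full blocks.
columns = key_columns(bytes(range(13)), 5)
for position, column in enumerate(columns):
    print(position, list(column))
```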

Iterate over every column and record each XOR byte value that leaves at least 90% of the column's bytes within the printable ASCII range.

In [64]:
XORCandidate = namedtuple("XORCandidate", ["byte_position", "byte_value", "ascii_score"])

ascii_threshold = 0.9

def isprintableascii(x):
    if x >= 0x20 and x <= 0x7e:
        return 1
    else:
        return 0

candidates = {}
for pos in range(xor_keysize):
    xor_candidates = []
    ciphertext = data_block[:, pos]
    for xor in range(0x100):
        block = np.ones(len(ciphertext), dtype=np.uint8) * xor
        cipher = np.bitwise_xor(ciphertext, block).tobytes()
        isascii = np.array([x for x in map(isprintableascii, cipher)])
        ascii_score = np.sum(isascii) / len(ciphertext)
        if ascii_score >= ascii_threshold:
            xor_candidates.append(XORCandidate(pos, xor, ascii_score))
    candidates[pos] = xor_candidates
            
# pprint.pprint(candidates)

## Sort by top ASCII scores
The search returns several possible XOR values for some key positions, most likely because more than one key byte maps the column to mostly printable ASCII. Narrow `candidates` down by keeping only the entries with the highest ASCII score at each position.

In [63]:
for byte_pos, xor_candidates in candidates.items():
    print(f"Byte Position: {byte_pos}")
    # highest score at this position (0 if nothing passed the threshold)
    max_ascii_score = max((c.ascii_score for c in xor_candidates), default=0)
    for candidate in xor_candidates:
        if candidate.ascii_score >= max_ascii_score:
            print(candidate)

Byte Position: 0
XORCandidate(byte_position=0, byte_value=97, ascii_score=0.96)
XORCandidate(byte_position=0, byte_value=107, ascii_score=0.96)
XORCandidate(byte_position=0, byte_value=124, ascii_score=0.96)
Byte Position: 1
XORCandidate(byte_position=1, byte_value=109, ascii_score=0.9617391304347827)
Byte Position: 2
XORCandidate(byte_position=2, byte_value=97, ascii_score=0.9356521739130435)
Byte Position: 3
XORCandidate(byte_position=3, byte_value=109, ascii_score=0.9495652173913044)
Byte Position: 4
XORCandidate(byte_position=4, byte_value=102, ascii_score=0.9478260869565217)
XORCandidate(byte_position=4, byte_value=107, ascii_score=0.9478260869565217)
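
The output above leaves three tied values at position 0 and two at position 4, so there are six candidate keys in total. A minimal sketch, assuming the tied values are copied by hand into a hypothetical `top_candidates` dict, enumerates them with `itertools.product` so that each key can be tried in turn:

```python
from itertools import product

# Top-scoring byte values per key position, copied from the output above.
top_candidates = {
    0: [97, 107, 124],
    1: [109],
    2: [97],
    3: [109],
    4: [102, 107],
}

# One key per combination, taking one byte value from each position in order.
candidate_keys = [
    bytes(combo)
    for combo in product(*(top_candidates[pos] for pos in sorted(top_candidates)))
]

for candidate_key in candidate_keys:
    print(candidate_key)
```

Decrypting with each key and scoring the result would remove the need to pick one tied value by hand.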


In [81]:
xor_block = bytes([107, 109, 97, 109, 107])
result = b""
for ciphertext in xor_block_encrypt(data, xor_block):
    result += ciphertext
result.decode()

'v/~ `do~"x%Q{\x04\x0et$}c%xljj%plw(]ks`$\x0fK9|p(X>/\x089mm&9wf(,jog>sQgue(wbz"u!Jq*Jmnv*8|obm\x05Kl8mV{mgwv,vj?5]4kMrgn,"s%c.#\x04S}bU$kncw\'j$t4\x19\x13\x01\t[mpd*`)k>qvap)\x1fi\x7fl"qdpo9\x11\x12(m#Qm{~"q"$%dv`2iQj?xl`*~gm\'Z22\x0f~mjhj?`z y}"\x14RX`pldb-l"|#\x13%%F?ola$5#c,a"K?e\x15fvr$cme}1a?[\x02\x08~$ii9al`l#c}8jKqx&ijcr$p#\x19]\x02]?klb?kfd8%oq>cP}gh"dbq"PkW>*@w$\x05^$?oq>q$hw|\x1fgk,bien"?\'V#aB`"`i#|fx5p$eq$3C` xlylg4>\x13%"\x03kj`*2pgkmvgnta[/5&C`,Ieq(Y=*\x0fTk$,G\x1fVc$lzl8)Zejug#tqemjJw?Az(tm29@(<lnd>lZ(\x19Ala,|d92Z.mMvj(~kxlr(%e$viR`3,pmow.\x15\x1c[.aVvw#u9~qa/$$cj$Tk9\n[l*xggm\\7+\x038ad\x7f2|#GmlmllzZd9rlj,lp~&PqA{{awijf%`$#jmk}Pj8&colvs|)\x19]\x02\x0er(jbktp($rl(nf^{v \x08Qdp"~"G7$Lj$|kkf%p%`}$~gIk?aa%kwj??[65\x0fpq#i&?\tI/g$K>gX`9dimiz"q(G%(Q?vmk/9b`4/ik|([/9vhnu?\x0e\x15\x12A0,J36%!`5\\k*#zl}.Vjz&{b\'9sx#W6kEv{wi%9}gk\x0fK|9}\x1fguf"hu5j|*Q{>F9hj~kkmamgaef(Ob~u$qbkaj,[wK|v"J&.~m(\'vji>mM.lp(bd{"~,X4mJk"ve4wg.*`mf8\x02\x04%++7/Up$2l\x15\x1a%@po%c#5va&f.swa]$\x15@mq 

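I'm not sure what the actual output should be, but this is clearly not it. The likely culprits are the keysize estimate, which rests on a single pair of blocks, and the key itself, which takes one of several tied candidates at positions 0 and 4 without any further scoring.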