UW user id: `g66xu`

# Problem 1
I implemented a number of helper functions to facilitate the solution (and it was just fun anyway). They are included in the appendix at the end of the write-up for Problem 1.

## a)
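With the partial subkey guess $[K_{5,5} \dots K_{5,8}] = 0111$ and $[K_{5,13} \dots K_{5,16}] = 0110$ (`round_key_guess` in the cell that follows), the bias of the section 3.4 linear relationship over my inputs is approximately 0.00205.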

In [1]:
import os
import sys

sys.path.insert(0, "/Users/ganyuxu/opensource/linear-cryptanalysis")

from cryptanalysis.heys import HeysCipher
from cryptanalysis.attack import read_inputs, get_bias, check_sec34_linear_approx

plaintexts = read_inputs("inputs/a2q1plaintexts.txt")
ciphertexts = read_inputs("inputs/a2q1ciphertexts.txt")
n_pairs = len(plaintexts)

round_key_guess = 0b0000011100000110
guess = HeysCipher([0, 0, 0, 0, round_key_guess])

print(
    get_bias(
        plaintexts, ciphertexts, guess, check_sec34_linear_approx
    )
)

0.0020499999999999963
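
For reference, the bias reported here is the magnitude $|\text{hits}/N - 1/2|$, where hits counts the pairs for which the relationship holds; this is what `get_bias` in the appendix computes. A minimal standalone sketch follows. The function name `empirical_bias` and the sample counts are hypothetical and chosen only for illustration.

```python
def empirical_bias(hits: int, n_pairs: int) -> float:
    """Magnitude of the deviation of the hit rate from 1/2."""
    if n_pairs <= 0:
        raise ValueError("need at least one plaintext/ciphertext pair")
    return abs(hits / n_pairs - 0.5)


# Illustrative counts only, not taken from the assignment data
print(empirical_bias(5_300, 10_000))
print(empirical_bias(4_700, 10_000))
```

Both calls report the same magnitude, because the sign of the deviation is discarded.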


## b)

We iterate through all $2^8 = 256$ possible values of these 8 bits and compute the "section 3.4" bias for each. The value that yields the largest bias is the most likely one.

The most likely values are as follows (note that the top two candidates tie, so this data alone does not single out one value):

|key bits (5th to 8th and 13th to 16th bits of $K_5$)|bias|
|:---|:---|
|0,0,0,0,0,1,0,0|0.026800|
|0,0,0,0,1,0,0,1|0.026800|
|0,0,1,1,0,1,0,0|0.026350|
|1,1,0,1,0,1,0,0|0.026000|
|0,0,0,1,0,1,0,0|0.024650|

In [2]:
rankings = []

for bits_5_to_8 in range(0b0000, 0b1111 + 1):
    for bits_13_to_16 in range(0b0000, 0b1111 + 1):
        round_key = (bits_5_to_8 << 8) + bits_13_to_16
        guess = HeysCipher([0, 0, 0, 0, round_key])
        bias = get_bias(
            plaintexts, ciphertexts, guess, check_sec34_linear_approx
        )
        rankings.append((bias, round_key))

rankings.sort(key=lambda x: x[0], reverse=True)
for bias, round_key in rankings[:5]:
    print(f"round key: {round_key:016b}, bias: {bias:0.6f}")

round key: 0000000000000100, bias: 0.026800
round key: 0000000000001001, bias: 0.026800
round key: 0000001100000100, bias: 0.026350
round key: 0000110100000100, bias: 0.026000
round key: 0000000100000100, bias: 0.024650
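
To relate the printed round keys to the table rows, the 8 guessed bits can be read back out of the 16-bit value. A small sketch, assuming the same 1-based, most-significant-bit-first numbering as `get_bit_1base`; the helper name `guessed_bits` is mine and not part of the helper library.

```python
def guessed_bits(round_key: int) -> tuple[int, ...]:
    """Return bits 5..8 and 13..16 of a 16-bit round key (1-based, MSB first)."""
    positions = [5, 6, 7, 8, 13, 14, 15, 16]
    return tuple((round_key >> (16 - pos)) & 1 for pos in positions)


# Third-ranked candidate from the output above
print(guessed_bits(0b0000001100000100))  # (0, 0, 1, 1, 0, 1, 0, 0)
```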


## c)
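Finding the bias of each individual linear relationship involves computing the input sum and the output sum, that is, the bit masks over the S-box input and output bits read as 4-bit integers with $X_1$ and $Y_1$ as the most significant bits. We then use Table 4 to look up the expected bias. The results are as follows: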

|linear relation|input sum|output sum|expected bias|
|:---|:---|:---|:---|
|$S_{11}: X_1 \oplus X_4 \approx Y_1$|9|8|-4/16|
|$S_{13}: X_1 \oplus X_4 \approx Y_1$|9|8|-4/16|
|$S_{21}: X_1 \oplus X_3 \approx Y_2$|10|4|-4/16|
|$S_{32}: X_1 \approx Y_1 \oplus Y_2 \oplus Y_3 \oplus Y_4$|8|15|-6/16|
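
The table lookups can be cross-checked by counting directly over the 16 S-box inputs. The sketch below assumes the S-box from the appendix and the mask convention described above (most significant bit is $X_1$ or $Y_1$); the function names `parity` and `sbox_bias` are illustrative, not part of the helper library.

```python
from fractions import Fraction

# S-box from the appendix (index = input nibble, value = output nibble)
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]


def parity(x: int) -> int:
    """XOR of all bits of x."""
    return bin(x).count("1") % 2


def sbox_bias(input_sum: int, output_sum: int) -> Fraction:
    """Bias of the approximation selected by the two 4-bit masks."""
    matches = sum(
        parity(x & input_sum) == parity(SBOX[x] & output_sum)
        for x in range(16)
    )
    # 8 matches out of 16 would mean no bias at all
    return Fraction(matches - 8, 16)


for input_sum, output_sum in [(9, 8), (10, 4), (8, 15)]:
    print(input_sum, output_sum, sbox_bias(input_sum, output_sum))
```

`Fraction` reduces the results, so $-4/16$ prints as `-1/4` and $-6/16$ as `-3/8`.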

We can trace through the Heys cipher and write down the linear relationships between the intermediate states before arriving at a final linear approximation:

$$
\begin{aligned}
U_{1,1} &= P_1 \oplus K_{1,1} \\
U_{1,4} &= P_4 \oplus K_{1,4} \\
U_{1,9} &= P_9 \oplus K_{1,9} \\
U_{1,12} &= P_{12} \oplus K_{1,12} \\
V_{1,1} &\approx U_{1,1} \oplus U_{1,4}, \epsilon = -\frac{1}{4} \\
V_{1,9} &\approx U_{1,9} \oplus U_{1,12}, \epsilon = -\frac{1}{4} \\
U_{2,1} &= V_{1,1} \oplus K_{2,1} \\
U_{2,3} &= V_{1,9} \oplus K_{2,3} \\
V_{2,2} &\approx U_{2,1} \oplus U_{2,3}, \epsilon = -\frac{1}{4} \\
U_{3,5} &= V_{2,2} \oplus K_{3,5} \\
V_{3,5} \oplus V_{3,6} \oplus V_{3,7} \oplus V_{3,8} &\approx U_{3,5}, \epsilon = -\frac{3}{8}
\end{aligned}
$$

Therefore:

$$
\begin{aligned}
V_{3,5} \oplus V_{3,6} \oplus V_{3,7} \oplus V_{3,8} &\approx U_{3,5} \\
&\approx V_{2,2} \oplus K_{3,5} \\
&\approx U_{2,1} \oplus U_{2,3} \oplus K_{3,5} \\
&\approx (V_{1,1} \oplus K_{2,1}) \oplus (V_{1,9} \oplus K_{2,3}) \oplus K_{3,5} \\
&\approx U_{1,1} \oplus U_{1,4} \oplus K_{2,1} \oplus U_{1,9} \oplus U_{1,12} \oplus K_{2,3} \oplus K_{3,5} \\
&\approx P_1 \oplus K_{1,1} \oplus P_4 \oplus K_{1,4} \oplus K_{2,1} \oplus P_9 \oplus K_{1,9} \oplus P_{12} \oplus K_{1,12} \oplus K_{2,3} \oplus K_{3,5} \\
\end{aligned}
$$

Next, substitute the following fourth-round key-mixing relationships into the approximation above:

$$
\begin{aligned}
U_{4,2} &= V_{3,5} \oplus K_{4,2} \\
U_{4,6} &= V_{3,6} \oplus K_{4,6} \\
U_{4,10} &= V_{3,7} \oplus K_{4,10} \\
U_{4,14} &= V_{3,8} \oplus K_{4,14} \\
\end{aligned}
$$
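
The index changes between rounds in these relationships come from the permutation, which sends output bit $j$ of S-box $i$ to input bit $i$ of S-box $j$ in the next round. A sketch of that rule as a position map, checked against the `PERMUTATION` table in the appendix. The closed-form expression and the function name `permuted_position` are my own restatement, not code from the helper library.

```python
def permuted_position(pos: int) -> int:
    """Map a 1-based bit position through the Heys permutation."""
    if not 1 <= pos <= 16:
        raise IndexError("bit position must be between 1 and 16")
    box, bit = divmod(pos - 1, 4)  # 0-based S-box index and bit within it
    return 4 * bit + box + 1


# V[1,1] -> U[2,1], V[1,9] -> U[2,3], V[2,2] -> U[3,5]
print([permuted_position(p) for p in (1, 9, 2)])      # [1, 3, 5]
# V[3,5..8] -> U[4,2], U[4,6], U[4,10], U[4,14]
print([permuted_position(p) for p in (5, 6, 7, 8)])   # [2, 6, 10, 14]
```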

Finally, moving everything to one side, we have the following approximation:

$$
P_1 \oplus P_4 \oplus P_9 \oplus P_{12} 
\oplus U_{4,2} \oplus U_{4,6} \oplus U_{4,10} \oplus U_{4,14}
\oplus \Sigma_K
\approx 0
$$

where $\Sigma_K$ is the XOR of the key bits involved: $K_{1,1}, K_{1,4}, K_{1,9}, K_{1,12}, K_{2,1}, K_{2,3}, K_{3,5}, K_{4,2}, K_{4,6}, K_{4,10}, K_{4,14}$.

The final bias can be approximated using the piling-up lemma:

$$
\epsilon = 2^3 \cdot \prod_{i=1}^{4}\epsilon_i = 8 \cdot \left(-\frac{1}{4}\right) \cdot \left(-\frac{1}{4}\right) \cdot \left(-\frac{1}{4}\right) \cdot \left(-\frac{3}{8}\right) = \frac{3}{64}
$$
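
The arithmetic can be checked with exact fractions. A minimal sketch, assuming the four S-box approximations are treated as independent as the piling-up lemma requires; the function name `piling_up` is illustrative.

```python
from fractions import Fraction
from math import prod


def piling_up(biases: list[Fraction]) -> Fraction:
    """Combined bias of the XOR of independent approximations."""
    return 2 ** (len(biases) - 1) * prod(biases)


biases = [Fraction(-1, 4), Fraction(-1, 4), Fraction(-1, 4), Fraction(-3, 8)]
print(piling_up(biases))  # 3/64
```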

Also, a helpful diagram:

<img src="./static/a2q1c.png" width=500></img>

## d.ii)
Fixing the sum of key bits to be $0$, we arrive at the following linear approximation:

$$
P_1 \oplus P_4 \oplus P_9 \oplus P_{12} 
\oplus U_{4,2} \oplus U_{4,6} \oplus U_{4,10} \oplus U_{4,14}
\approx 0
$$

Because $U_{4,2}, U_{4,6}, U_{4,10}, U_{4,14}$ each come from a different fourth-round S-box, all four S-boxes are involved and undoing the last round requires all 16 bits of $K_5$. Without further insight into the construction of the S-boxes, the best approach to finding the fifth round key is to brute-force all $2^{16}$ possible values, which is impractical on a personal computer running Python. So I chose (ii) and manually analyzed the first 10 pairs using the round key `0b1100011111100110`.

|plaintext|ciphertext|$U_4$|$P_1 \oplus P_4 \oplus P_9 \oplus P_{12} \oplus U_{4,2} \oplus U_{4,6} \oplus U_{4,10} \oplus U_{4,14}$|
|:---|:---|:---|:---|
|`0000000000000001`|`1111010100100110`|`1000010010111110`|0|
|`0000000000000010`|`0100110011011100`|`0111011010001001`|0|
|`0000000000000011`|`1110101100010001`|`0100101101011111`|1|
|`0000000000000100`|`1011001010101110`|`1111110000010111`|1|
|`0000000000000110`|`1111101111000001`|`1000101101001111`|0|
|`0000000000001000`|`1110010101000000`|`0100010010011010`|0|
|`0000000000001010`|`1011000010100110`|`1111111100011110`|1|
|`0000000000010001`|`1111100100010111`|`1000000001010011`|0|
|`0000000000010101`|`1000010011010011`|`0001100010001100`|0|
|`0000000000011010`|`1101110000001101`|`0011011000000110`|1|

The relationship holds for 6 out of 10 pairs, so the empirical bias is $\frac{6}{10} - \frac{1}{2} = \frac{1}{10}$.
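
Each row of the table can be reproduced by undoing the last key mixing and S-box layer and then XOR-ing the eight selected bits. A self-contained sketch, assuming the S-box from the appendix; the function names `bit`, `partial_decrypt`, and `approx_sum` are illustrative and independent of the helper library.

```python
# S-box from the appendix and its inverse (index = output nibble, value = input nibble)
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]
INV_SBOX = [SBOX.index(y) for y in range(16)]


def bit(value: int, pos: int) -> int:
    """Bit at 1-based position pos, counting from the most significant bit."""
    return (value >> (16 - pos)) & 1


def partial_decrypt(ciphertext: int, round_key: int) -> int:
    """Undo the last key mixing and S-box layer to recover U4."""
    v4 = ciphertext ^ round_key
    u4 = 0
    for shift in (12, 8, 4, 0):
        u4 |= INV_SBOX[(v4 >> shift) & 0xF] << shift
    return u4


def approx_sum(plaintext: int, u4: int) -> int:
    """XOR of P1, P4, P9, P12, U4_2, U4_6, U4_10, U4_14."""
    bits = [bit(plaintext, p) for p in (1, 4, 9, 12)]
    bits += [bit(u4, p) for p in (2, 6, 10, 14)]
    return sum(bits) % 2


key = 0b1100011111100110
pt, ct = 0b0000000000000001, 0b1111010100100110  # first row of the table
u4 = partial_decrypt(ct, key)
print(f"{u4:016b}", approx_sum(pt, u4))  # 1000010010111110 0
```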


## d.i)
The expected bias is $3/64 = 0.046875$. Running the Rust program below, which tries all $2^{16}$ values of the fifth round key, I got the following output:

```
round key: 1101000000011001, bias: 0.052649999999999975
round key: 1101011000011001, bias: 0.029799999999999993
round key: 1100000000011001, bias: 0.02915000000000001
round key: 1101010000011001, bias: 0.02915000000000001
round key: 1101000000011010, bias: 0.02905000000000002
```

We notice that round key `1101000000011001` has a particularly high bias, close to the expected $3/64$ and well above the runner-up, so it is the most likely round key.
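
To put a number on "particularly high", the top candidate can be compared with the runner-up and with the expected bias. A small sketch over the five output lines above; the values are copied from that output and the variable names are illustrative.

```python
# (round key, bias) pairs copied from the program output above
candidates = [
    (0b1101000000011001, 0.052649999999999975),
    (0b1101011000011001, 0.029799999999999993),
    (0b1100000000011001, 0.02915000000000001),
    (0b1101010000011001, 0.02915000000000001),
    (0b1101000000011010, 0.02905000000000002),
]
expected = 3 / 64

(best_key, best_bias), (_, second_bias) = candidates[0], candidates[1]
print(f"best key:         {best_key:016b}")
print(f"best / runner-up: {best_bias / second_bias:.2f}")
print(f"best - expected:  {best_bias - expected:+.4f}")
```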

```rust
use std::error::Error;
use std::fs;

type Result<T> = core::result::Result<T, Box<dyn Error>>;

#[allow(dead_code)]
const SBOX: [(u16, u16); 16] = [
    (0x0, 0xE),
    (0x1, 0x4),
    (0x2, 0xD),
    (0x3, 0x1),
    (0x4, 0x2),
    (0x5, 0xF),
    (0x6, 0xB),
    (0x7, 0x8),
    (0x8, 0x3),
    (0x9, 0xA),
    (0xA, 0x6),
    (0xB, 0xC),
    (0xC, 0x5),
    (0xD, 0x9),
    (0xE, 0x0),
    (0xF, 0x7),
];

const SBOX_INVERT: [(u16, u16); 16] = [
    (0xE, 0x0),
    (0x4, 0x1),
    (0xD, 0x2),
    (0x1, 0x3),
    (0x2, 0x4),
    (0xF, 0x5),
    (0xB, 0x6),
    (0x8, 0x7),
    (0x3, 0x8),
    (0xA, 0x9),
    (0x6, 0xA),
    (0xC, 0xB),
    (0x5, 0xC),
    (0x9, 0xD),
    (0x0, 0xE),
    (0x7, 0xF),
];

fn sbox_lookup(sbox: &[(u16, u16)], val: u16) -> Result<u16> {
    for (from, to) in sbox {
        if *from == val {
            return Ok(*to);
        }
    }
    return Err("Lookup failed".into());
}

/// A 16-bit block
struct Block {
    val: u16,
}

impl Block {
    fn new(val: u16) -> Self {
        return Self { val };
    }

    fn from_binstr(binstr: &str) -> Result<Self> {
        let val = u16::from_str_radix(binstr, 2)?;
        return Ok(Self::new(val));
    }

    fn mix_key(&self, key: u16) -> Self {
        return Self::new(self.val ^ key);
    }

    fn get_bit_1base(&self, loc: usize) -> Result<u16> {
        if loc < 1 || loc > 16 {
            return Err("loc must be between 1 and 16".into());
        }
        let mask = 1u16 << (16 - loc);
        if self.val & mask == 0 {
            return Ok(0);
        }
        return Ok(1);
    }

    fn substitute(&self, sbox: &[(u16, u16)]) -> Result<Self> {
        let b0 = self.val % 16;
        let b1 = (self.val >> 4) % 16;
        let b2 = (self.val >> 8) % 16;
        let b3 = (self.val >> 12) % 16;

        let b0 = sbox_lookup(&sbox, b0)?;
        let b1 = sbox_lookup(&sbox, b1)?;
        let b2 = sbox_lookup(&sbox, b2)?;
        let b3 = sbox_lookup(&sbox, b3)?;

        let b3 = b3 << 12;
        let b2 = b2 << 8;
        let b1 = b1 << 4;

        return Ok(Self::new(b3 + b2 + b1 + b0));
    }
}

fn check_linear_approx(pt: &Block, ct: &Block, key: u16) -> u16 {
    let u4 = ct.mix_key(key);
    let u4 = u4.substitute(&SBOX_INVERT).unwrap();

    let binsum = u4.get_bit_1base(2).unwrap()
        + u4.get_bit_1base(6).unwrap()
        + u4.get_bit_1base(10).unwrap()
        + u4.get_bit_1base(14).unwrap()
        + pt.get_bit_1base(1).unwrap()
        + pt.get_bit_1base(4).unwrap()
        + pt.get_bit_1base(9).unwrap()
        + pt.get_bit_1base(12).unwrap();
    return 1 - (binsum % 2);
}

fn compute_bias(plaintexts: &[Block], ciphertexts: &[Block], key: u16) -> f64 {
    let count = plaintexts
        .iter()
        .zip(ciphertexts.iter())
        .map(|(pt, ct)| check_linear_approx(pt, ct, key))
        .sum::<u16>();
    let prob = (count as f64) / (plaintexts.len() as f64);
    if prob > 0.5 {
        return prob - 0.5;
    }
    return 0.5 - prob;
}

fn main() {
    let plaintexts = fs::read_to_string(
        "/Users/ganyuxu/opensource/waterloo-cryptography/co687/inputs/a2q1plaintexts.txt",
    )
    .unwrap()
    .lines()
    .map(|line| Block::from_binstr(line).unwrap())
    .collect::<Vec<Block>>();
    let ciphertexts = fs::read_to_string(
        "/Users/ganyuxu/opensource/waterloo-cryptography/co687/inputs/a2q1ciphertexts.txt",
    )
    .unwrap()
    .lines()
    .map(|line| Block::from_binstr(line).unwrap())
    .collect::<Vec<Block>>();

    let mut guesses: Vec<(f64, u16)> = vec![];
    for round_key in 0u16..=0xffffu16 {
        let bias = compute_bias(&plaintexts, &ciphertexts, round_key);
        // println!("{round_key}: {bias}");
        guesses.push((bias, round_key));
    }

    guesses.sort_by(|a, b| {
        let (bias1, _) = a;
        let (bias2, _) = b;
        return bias2.partial_cmp(bias1).unwrap();
    });

    guesses.iter().take(5).for_each(|(bias, round_key)| {
        println!("round key: {round_key:016b}, bias: {bias:08}");
    });
}
```

## Appendix: Python source code
```python
# cryptanalysis/heys.py
"""A toy implementation of the substitution-permutation network
"""
from __future__ import annotations


def basedecomp(n: int, base: int, showzero: bool = False, width: int = 0):
    """Decompose n into a sequence of mantissa and exponents, where exponents are powers of base
    and mantissa is in the range(0, base)
    """
    power = 1
    width_used = 0

    while n != 0:
        remainder = n % base
        if remainder > 0 or showzero:
            yield (remainder, power)
            width_used += 1
        n = n // base
        power = power * base

    if width > 0 and showzero:
        while width_used < width:
            yield (0, power)
            power *= base
            width_used += 1


def find_key(lookup: dict, val, default):
    """Given a dictionary, find the first key whose value matches the input
    value, unless no such key exists, in which case the input default is
    returned
    """
    for k, v in lookup.items():
        if v == val:
            return k
    return default


class Block:
    """A 16-bit block of data, could be plaintext, ciphertext, or intermediary state"""

    def __init__(self, val: int):
        if val < 0 or val >= 2**16:
            raise OverflowError("Exceeded 16-bit limit")
        self.val = val

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Block):
            return False
        return self.val == other.val

    def __repr__(self):
        return f"<16-bit Block 0x{self.val:04x}>"

    def substitute(self, lookup: dict[int, int]) -> Block:
        """Perform an S-box substitution using basedecomp and base-16 lookup table"""
        newval = 0
        for r, e in basedecomp(
            self.val, 16, True, 4
        ):  # TODO: remove magic number
            newval += lookup.get(r, r) * e
        return Block(newval)

    def invert_substitute(self, lookup: dict[int, int]) -> Block:
        """Perform the inverse of the S-box substitution"""
        newval = 0
        for r, e in basedecomp(self.val, 16, True, 4):
            newval += find_key(lookup, r, r) * e
        return Block(newval)

    def permute(self, lookup: dict[int, int]) -> Block:
        """Perform a permutation using basedecomp and base-2 lookup table"""
        newval = 0
        for _, e in basedecomp(self.val, 2):  # TODO: remove magic number
            newval += lookup.get(e, e)
        return Block(newval)

    def invert_permute(self, lookup: dict[int, int]) -> Block:
        """Perform the inverse of the input permutation"""
        newval = 0
        for _, e in basedecomp(self.val, 2):
            newval += find_key(lookup, e, e)
        return Block(newval)

    def __xor__(self, other: object) -> Block:
        """XOR with a 16-bit (round) key or another Block"""
        if isinstance(other, Block):
            return Block(self.val ^ other.val)
        elif isinstance(other, int):
            if other < 0 or other >= 2**16:
                raise OverflowError("Can only XOR with u16 integer")
            return Block(self.val ^ other)
        raise TypeError(f"XOR between Block and {type(other).__name__} is not defined")

    def get_bit_1base(self, loc: int) -> int:
        """Get the bit at the specified location as either 1 or 0.

        loc follows big-endianness and starts at 1
        """
        if not (1 <= loc <= 16):
            raise IndexError("1-based bit loc must be between 1 and 16")
        # Left shift 1 by the appropriate number of bits and do a bit-wise AND;
        # if the result is 0, return 0, else return 1
        mask = 1 << (16 - loc)
        bit = self.val & mask

        return 0 if bit == 0 else 1


class HeysCipher:
    PERMUTATION = {
        0x8000: 0x8000,  # 1,
        0x4000: 0x0800,  # 2,
        0x2000: 0x0080,  # 3,
        0x1000: 0x0008,  # 4,
        0x0800: 0x4000,  # 5,
        0x0400: 0x0400,  # 6,
        0x0200: 0x0040,  # 7,
        0x0100: 0x0004,  # 8,
        0x0080: 0x2000,  # 9,
        0x0040: 0x0200,  # 10,
        0x0020: 0x0020,  # 11,
        0x0010: 0x0002,  # 12,
        0x0008: 0x1000,  # 13,
        0x0004: 0x0100,  # 14,
        0x0002: 0x0010,  # 15,
        0x0001: 0x0001,  # 16,
    }
    SBOX = {
        0x0: 0xE,
        0x1: 0x4,
        0x2: 0xD,
        0x3: 0x1,
        0x4: 0x2,
        0x5: 0xF,
        0x6: 0xB,
        0x7: 0x8,
        0x8: 0x3,
        0x9: 0xA,
        0xA: 0x6,
        0xB: 0xC,
        0xC: 0x5,
        0xD: 0x9,
        0xE: 0x0,
        0xF: 0x7,
    }

    def __init__(self, round_keys: list[int]):
        """Initialize a cipher by providing it with the round keys.
        There should be exactly five round keys and each should be a 16-bit unsigned int
        """
        if len(round_keys) != 5:
            raise ValueError("Incorrect number of round keys")
        for rkey in round_keys:
            if rkey < 0 or rkey >= 2**16:
                raise OverflowError("Round key must be 16-bit unsigned int")
        self.round_keys = round_keys

    def encrypt(self, plaintext: Block) -> Block:
        u1 = plaintext ^ self.round_keys[0]
        v1 = u1.substitute(self.SBOX)
        u2 = v1.permute(self.PERMUTATION) ^ self.round_keys[1]
        v2 = u2.substitute(self.SBOX)
        u3 = v2.permute(self.PERMUTATION) ^ self.round_keys[2]
        v3 = u3.substitute(self.SBOX)
        u4 = v3.permute(self.PERMUTATION) ^ self.round_keys[3]
        v4 = u4.substitute(self.SBOX)

        return v4 ^ self.round_keys[4]

    def decrypt(self, ciphertext: Block) -> Block:
        pt = ciphertext ^ self.round_keys[4]
        pt = pt.invert_substitute(self.SBOX)
        pt = pt ^ self.round_keys[3]
        pt = pt.invert_permute(self.PERMUTATION)
        pt = pt.invert_substitute(self.SBOX)
        pt = pt ^ self.round_keys[2]
        pt = pt.invert_permute(self.PERMUTATION)
        pt = pt.invert_substitute(self.SBOX)
        pt = pt ^ self.round_keys[1]
        pt = pt.invert_permute(self.PERMUTATION)
        pt = pt.invert_substitute(self.SBOX)
        pt = pt ^ self.round_keys[0]

        return pt
```

```python
# cryptanalysis/attack.py
from typing import Callable
from .heys import Block, HeysCipher


def read_inputs(input: str) -> list[Block]:
    """Read lines of input text file and output list of blocks

    Each line of input text file is a big-endian binary encoding:
    1010101010101010 in text file encodes a block Block(0b1010101010101010)
    """
    blocks = []
    with open(input) as f:
        for line in f.read().splitlines():
            # Parse the binary string directly; avoids eval() on file contents
            block = Block(int(line, 2))
            blocks.append(block)
    return blocks


def check_sec34_linear_approx(pt: Block, ct: Block, cipher: HeysCipher) -> bool:
    """Given a pair of known PT-CT and a cipher (for the round keys), return
    True iff the following linear relationship holds:

    U[4,6] + U[4,8] + U[4,14] + U[4,16] + P[5] + P[7] + P[8] == 0 (mod 2)

    This relationship is stated in Heys' paper (section 3.4)
    https://www.engr.mun.ca/~howard/PAPERS/ldc_tutorial.pdf, and the expected
    bias is 1/32
    """
    U4 = ct ^ cipher.round_keys[4]
    U4 = U4.invert_substitute(cipher.SBOX)

    binsum = (
        U4.get_bit_1base(6)
        + U4.get_bit_1base(8)
        + U4.get_bit_1base(14)
        + U4.get_bit_1base(16)
        + pt.get_bit_1base(5)
        + pt.get_bit_1base(7)
        + pt.get_bit_1base(8)
    )
    return binsum % 2 == 0


def get_bias(
    plaintexts: list[Block],
    ciphertexts: list[Block],
    guess: HeysCipher,
    check: Callable[..., bool],
) -> float:
    """Given list of pairs, a guess at the key, and some linear relationship, 
    compute the bias.

    get_bias(plaintexts, ciphertexts, guess, check_sec34_linear_approx)
    """
    n_pairs = len(plaintexts)
    bias = abs(
        sum(
            [
                check(plaintexts[i], ciphertexts[i], guess)
                for i in range(n_pairs)
            ]
        )
        / n_pairs
        - 0.5
    )
    return bias
```