# Problem 1

## a)
Note that the definition in the assignment is missing a constant factor of 26 (the size of the English alphabet). Using the assignment's definition of the index of coincidence, $\text{IC}_\text{Moby Dick} \approx 0.0659344$; using the definition with the constant factor, $\text{IC}_\text{Moby Dick} \approx 1.71429$, which closely matches the "reference IC" of approximately 1.73 listed on [Wikipedia](https://en.wikipedia.org/wiki/Index_of_coincidence).
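
For reference, the two definitions can be written out as follows. This is a sketch that assumes the standard form of the index of coincidence, which is also what the code below computes; the symbols $n_i$ (count of the $i$-th letter) and $N$ (total number of letters) are introduced here for illustration and are not taken from the assignment.

$$
\text{IC} = \frac{\sum_{i=1}^{26} \binom{n_i}{2}}{\binom{N}{2}} = \frac{\sum_{i=1}^{26} n_i (n_i - 1)}{N (N - 1)}, \qquad \text{IC}_\text{normalized} = 26 \cdot \text{IC}
$$

With the values above, $26 \times 0.0659344 \approx 1.71429$.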

```rust
use std::collections::HashMap;
use std::fs;

/// Basically factorial (overflows `usize` for n > 20 on 64-bit targets)
pub fn permutation(n: usize) -> usize {
    let mut mul = 1;
    for i in 1..=n {
        mul = mul * i;
    }
    return mul;
}

/// "n choose c"
pub fn combination(n: usize, c: usize) -> usize {
    if n < c {
        return 0;
    }
    let mut mul = 1;
    for i in (n - c + 1)..=n {
        mul *= i;
    }
    for i in 1..=c {
        mul /= i;
    }

    return mul;
}

pub fn index_of_coincidence(text: &str) -> f64 {
    let textlen = text.chars().filter(|c| c.is_alphabetic()).count();

    let mut charcounts: HashMap<char, usize> = HashMap::new();
    text.chars().filter(|c| c.is_alphabetic()).for_each(|c| {
        // Increment the count for this character, starting from 0 if unseen
        *charcounts.entry(c).or_insert(0) += 1;
    });

    let numerator = charcounts
        .iter()
        .map(|(_c, count)| combination(*count, 2))
        .sum::<usize>();
    let denominator = combination(textlen, 2);

    return (numerator as f64) / (denominator as f64);
}

fn main() {
    let text = fs::read_to_string("inputs/q1-mobydick.txt").unwrap();
    let reference_ioc = vigenere::index_of_coincidence(&text);
    println!("{}", reference_ioc * 26.);
}
```

## b)
**UW user id `g66xu`**

Using the code below, we can rank all candidate block sizes by a metric that measures the average distance between the sub-ciphertext IoCs and the reference IoC. In this implementation, I use the mean squared error (the mean of the squared differences between each sub-ciphertext IoC and the reference IoC). For my input, the top candidates are:

|block size|MSE IoC|
|:---|:---|
|21|0.000007311664312279843|
|42|0.000012634178853011177|
|63|0.000022382459134919598|
|84|0.00002575462524269855|
|28|0.0003801167882366923|

Notice that the next three candidates are multiples of the top candidate, which makes sense since repeating a key yields a longer key that encrypts identically. I think this is strong evidence that **21** is the correct key size.
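
The claim that a repeated key encrypts identically can be checked with a small sketch. The helper `vigenere_encrypt` is hypothetical and not part of the assignment code; it assumes the plaintext and the key are non-empty strings of uppercase ASCII letters only.

```rust
/// Encrypt with a Vigenere key by shifting each letter by the matching key letter.
/// Hypothetical helper for illustration: assumes `plaintext` and `key` contain
/// only uppercase ASCII letters and that `key` is non-empty.
fn vigenere_encrypt(plaintext: &str, key: &str) -> String {
    let key = key.as_bytes();
    plaintext
        .bytes()
        .enumerate()
        .map(|(i, p)| {
            // The key letter used at position i repeats with period key.len()
            let shift = key[i % key.len()] - b'A';
            (((p - b'A' + shift) % 26) + b'A') as char
        })
        .collect()
}

fn main() {
    let plaintext = "CALLMEISHMAELSOMEYEARSAGO";
    let key = "WHALE";
    let doubled = key.repeat(2); // "WHALEWHALE"

    // A key and its repetition shift every position by the same amount
    assert_eq!(
        vigenere_encrypt(plaintext, key),
        vigenere_encrypt(plaintext, &doubled)
    );
    println!("a key and its repetition give identical ciphertexts");
}
```

This is why block sizes 42, 63, and 84 score almost as well as 21: slicing the ciphertext at any multiple of the true key length still groups together characters that were shifted by the same key letter.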

```rust
/// Return a copy of the input bytes, taking every `blocksize`-th byte
/// starting at `offset` (`blocksize` must be non-zero)
fn copy_alignment(bytes: &[u8], blocksize: usize, offset: usize) -> Vec<u8> {
    bytes
        .iter()
        .skip(offset)
        .step_by(blocksize)
        .copied()
        .collect()
}

/// For each sub-ciphertext, compute the IoC. The score is the MSE between
/// the sub-ciphertexts' IoCs and the reference IoC
fn score_blocksize(
    ciphertext: &[u8],
    reference_ioc: f64,
    blocksize: usize,
) -> f64 {
    let mut mse = 0.;
    for offset in 0..blocksize {
        let subciphertext = copy_alignment(ciphertext, blocksize, offset);
        let subciphertext_str = String::from_utf8(subciphertext).unwrap();
        let sub_ioc = index_of_coincidence(&subciphertext_str);
        mse += (reference_ioc - sub_ioc) * (reference_ioc - sub_ioc);
    }

    return mse / (blocksize as f64);
}

/// Search all block sizes from 1 to `max_blocksize`. For each block size,
/// compute the sub-ciphertext IoCs and their mean squared distance to the
/// reference IoC. Rank the block sizes by that distance, smallest first
pub fn search_blocksize(
    ciphertext: &[u8],
    reference_ioc: f64,
    max_blocksize: usize,
) -> Vec<(usize, f64)> {
    let mut blocksizes = (1..=max_blocksize)
        .map(|blocksize| {
            let dist = score_blocksize(ciphertext, reference_ioc, blocksize);
            return (blocksize, dist);
        })
        .collect::<Vec<(usize, f64)>>();

    blocksizes.sort_by(|elem1, elem2| {
        let (_, mse1) = elem1;
        let (_, mse2) = elem2;
        return mse1.partial_cmp(mse2).unwrap();
    });

    return blocksizes;
}

fn main() {
    let text = fs::read_to_string("inputs/q1-mobydick.txt").unwrap();
    let reference_ioc = vigenere::index_of_coincidence(&text);
    let ciphertext = fs::read_to_string("inputs/q1-ciphertext.txt").unwrap();
    let blocksizes =
        vigenere::search_blocksize(ciphertext.as_bytes(), reference_ioc, 100);
    blocksizes.iter().for_each(|(blocksize, score)| {
        println!("{blocksize}: {score}");
    })
}
```

## c)
The approach in part (b) relies on the assumption that the same character at the same offset in each plaintext block is always encrypted to the same ciphertext character (e.g. if an $A$ at offset $2$ of one plaintext block is encrypted to $E$, then every $A$ at offset $2$ of any plaintext block is encrypted to $E$). However, this assumption does not hold for the transposition cipher, where the ciphertext character at a given offset of a block depends on the content of the rest of the block. Therefore, **computing the index of coincidence over aligned ciphertext blocks is not an effective approach to identifying the block size of a transposition cipher**.
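
A related way to see this: a transposition cipher only reorders characters, so the letter counts of the ciphertext are the same as those of the plaintext regardless of the block size. The sketch below illustrates this with a toy block transposition. The helpers `transpose_blocks` and `letter_counts` and the permutation used are hypothetical, and the input is assumed to be uppercase ASCII.

```rust
/// Apply a block transposition: within each full block, the output byte at
/// position j is the input byte at position perm[j].
/// Hypothetical helper: `perm` must be a non-empty permutation of
/// 0..perm.len(). A trailing partial block is copied unchanged for simplicity.
fn transpose_blocks(text: &[u8], perm: &[usize]) -> Vec<u8> {
    let mut out = Vec::with_capacity(text.len());
    for block in text.chunks(perm.len()) {
        if block.len() == perm.len() {
            out.extend(perm.iter().map(|&src| block[src]));
        } else {
            out.extend_from_slice(block);
        }
    }
    out
}

/// Count occurrences of each uppercase ASCII letter; other bytes are ignored.
fn letter_counts(text: &[u8]) -> [usize; 26] {
    let mut counts = [0usize; 26];
    for &b in text.iter().filter(|b| b.is_ascii_uppercase()) {
        counts[(b - b'A') as usize] += 1;
    }
    counts
}

fn main() {
    let plaintext = b"CALLMEISHMAELSOMEYEARSAGO";
    let ciphertext = transpose_blocks(plaintext, &[2, 0, 4, 1, 3]);

    // The text is scrambled, but every letter occurs exactly as often as before
    assert_ne!(&ciphertext[..], &plaintext[..]);
    assert_eq!(letter_counts(&ciphertext), letter_counts(plaintext));
    println!("letter counts are unchanged by the transposition");
}
```

Since the IoC is computed from letter counts alone, the IoC of the whole ciphertext equals that of the plaintext, so it carries no signal about the block size.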

# Problem 2
