<p style="page-break-after:always;"></p>

UW user id: `g66xu`

# Problem 1

## a)
For this question, I plan to implement authenticated encryption using the "encrypt-then-MAC" approach. This means the encryption will largely follow this procedure:

```
func encryption(plaintext):
    password = prompt_password()
    encryption_key, signing_key = hkdf(password)
    ciphertext = AES.encrypt(encryption_key, plaintext)
    tag = HMAC.sign(signing_key, ciphertext)
    return ciphertext, tag
```

The decryption reverses the encryption algorithm, except that the tag is verified before any decryption takes place:

```
func decryption(ciphertext, tag):
    password = prompt_password()
    encryption_key, signing_key = hkdf(password)
    HMAC.verify(signing_key, tag, ciphertext)  # abort if the tag is invalid
    plaintext = AES.decrypt(encryption_key, ciphertext)
    return plaintext
```

Note that in the context of this question we know the tag is obtained from an HMAC with SHA3-256, so we can simply treat the first (or last) 256 bits of the input "authenticated ciphertext" as the tag. In a production environment (such as with TLS), the choice of MAC needs to be encoded elsewhere so that the number of bits used for the tag can depend on the choice of the MAC algorithm.

## b)
```python
import base64
import getpass
import os
import sys
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.hmac import HMAC
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.primitives import padding
# from cryptography.hazmat.primitives import constant_time

HKMAC_NITERS = 200000
# Params of AES128
AES_KEYSIZE = BLOCKSIZE = HMAC_KEYSIZE = 16 # bytes
BLOCKSIZE_BITS = 128
# Params of SHA3-256
TAGSIZE = 32  # bytes
# Number of bytes of salt
SALTSIZE = 10  # 80 bits

def bytes2string(b):
    return base64.urlsafe_b64encode(b).decode('utf-8')

def string2bytes(s):
    return base64.urlsafe_b64decode(s.encode('utf-8'))

def derive_cipher_suite(
    password: bytes, 
    nonce: bytes | None = None, 
    salt: bytes | None = None
):
    """Return the padder, cipher, nonce, MAC, and salt derived from a password

    If a nonce or salt is given, it is reused (as needed for decryption);
    otherwise a fresh random value is generated. The nonce serves as the
    initial counter block for AES in CTR mode.
    """
    salt = os.urandom(SALTSIZE) if salt is None else salt
    nonce = os.urandom(BLOCKSIZE) if nonce is None else nonce
    # PBKDF2 stretches the password into enough bytes for two independent keys
    kdf = PBKDF2HMAC(
        algorithm=hashes.SHA3_256(),
        length=AES_KEYSIZE + HMAC_KEYSIZE,  # we need two keys
        iterations=HKMAC_NITERS,
        salt=salt,
    )
    key = kdf.derive(password)
    key_sign = key[:HMAC_KEYSIZE]
    key_enc = key[HMAC_KEYSIZE:]
    pad = padding.PKCS7(BLOCKSIZE_BITS)
    cipher = Cipher(algorithms.AES128(key_enc), modes.CTR(nonce))
    mac = HMAC(key=key_sign, algorithm=hashes.SHA3_256())

    return pad, cipher, nonce, mac, salt

def encrypt(message):
    
    # encode the string as a byte string
    plaintext = message.encode('utf-8')

    # Use getpass to prompt the user for a password
    password = getpass.getpass("Enter password:")
    password2 = getpass.getpass("Enter password again:")

    # Do a quick check to make sure that the password is the same!
    if password != password2:
        sys.stderr.write("Passwords did not match\n")
        sys.exit(1)  # non-zero exit status signals the failure

    ### START: This is what you have to change
    
    pad, cipher, nonce, mac, salt = derive_cipher_suite(password.encode())
    encryptor = cipher.encryptor()
    padder = pad.padder()

    plaintext = padder.update(plaintext) + padder.finalize()
    ciphertext = encryptor.update(plaintext) + encryptor.finalize()
    # Authenticate the salt and nonce too, so tampering with either is detected
    header = salt + nonce
    mac.update(header + ciphertext)
    tag = mac.finalize()

    # Output layout: salt || nonce || ciphertext || tag
    return bytes2string(header + ciphertext + tag)

    ### END: This is what you have to change

def decrypt(ciphertext):
    # prompt the user for the password
    password = getpass.getpass("Enter the password:")

    ### START: This is what you have to change
    # Parse the layout salt || nonce || ciphertext || tag
    data = string2bytes(ciphertext)
    salt = data[:SALTSIZE]
    nonce = data[SALTSIZE:SALTSIZE + BLOCKSIZE]
    ciphertext = data[SALTSIZE + BLOCKSIZE:-TAGSIZE]
    tag = data[-TAGSIZE:]
    pad, cipher, nonce, mac, _ = derive_cipher_suite(
        password.encode(), nonce, salt
    )
    decryptor = cipher.decryptor()
    unpadder = pad.unpadder()

    # Verify before decrypting; verify() raises InvalidSignature on a bad tag
    mac.update(salt + nonce + ciphertext)
    mac.verify(tag)

    plaintext = decryptor.update(ciphertext) + decryptor.finalize()
    plaintext = unpadder.update(plaintext) + unpadder.finalize()
    
    ### END: This is what you have to change

    # decode the byte string back to a string
    return plaintext.decode('utf-8')
```

## c)
There are a few sources of overhead, measured on the raw bytes before base64 encoding:

1. PKCS7 padding adds between one byte and one full block to the plaintext, which is up to 128 bits
2. The salt used for key derivation is stored with the ciphertext, which adds 80 bits
3. Block cipher in CTR mode adds an additional block from the nonce, which is 128 bits
4. The message authentication code uses SHA3-256, which adds an additional 256 bits

Overall, **this authenticated encryption scheme adds up to 592 bits (74 bytes) of overhead** from padding, salt, nonce, and the tag.

## d)
One common pitfall in implementing cryptographic protocols is using variable-time operations that leave the implementation vulnerable to side-channel attacks such as timing attacks.

In the context of this question, one possible mistake is to verify the MAC with a variable-time comparison that checks one byte at a time and stops at the first mismatch. If the verification time grows linearly with the number of leading bytes that match, then an adversary can forge a tag one byte at a time, thus breaking authentication efficiently.
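To make the difference concrete, here is a minimal sketch that contrasts a variable-time comparison with a constant-time one. It uses only the standard library (`hmac.compare_digest`) rather than the `cryptography` package from part (b); the function names `naive_equal` and `verify_tag` and the demo key are hypothetical, chosen only for illustration.

```python
import hashlib
import hmac

def naive_equal(a: bytes, b: bytes) -> bool:
    """Variable-time comparison: do NOT use this on tags."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # early exit leaks how long the matching prefix is
    return True

def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the HMAC-SHA3-256 tag and compare it in constant time."""
    expected = hmac.new(key, message, hashlib.sha3_256).digest()
    return hmac.compare_digest(expected, tag)

demo_key = b"k" * 16  # hypothetical key, for illustration only
demo_tag = hmac.new(demo_key, b"ciphertext", hashlib.sha3_256).digest()
print(verify_tag(demo_key, b"ciphertext", demo_tag))   # valid tag
print(verify_tag(demo_key, b"ciphertext", bytes(32)))  # all-zero tag
```

As far as I know, the `verify` method of the `cryptography` HMAC object used in part (b) also compares in constant time, so calling it directly avoids this pitfall.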

<p style="page-break-after:always;"></p>

# Problem 2

## a)
Alice's password is `excitement` (source: https://crackstation.net/).

## b)
```python
import hashlib

SALT = "27615912"
HASH = bytes.fromhex(
    "aacc7cb90ee724457d09b06024761ac51f791d3394438442f950b17a31e16baf"
)

if __name__ == "__main__":
    # Try every 6-digit PIN, including those with leading zeros
    for x in range(10**6):
        password = f"{x:06d}"
        digest = hashlib.sha256((SALT + password).encode()).digest()
        if digest == HASH:
            print(password)
            break
```

The password is `918465`

## c)
Here is Alice's password strategy:

1. Password is always 12 characters long
2. The first part is an English word with the first letter capitalized
3. The last part is a single special character (one of `!, ?, *, $, #`)
4. Between the word and the special character, insert as many random digits as needed to reach the desired length

Here is the code for cracking the password:

```python
import hashlib

SPECIAL_CHARS = "!?*$#"
PWD_LEN = 12
SALT = "89535971"
HASH = "9edabf325bc63599725e1aa6a01a9b1a0bc09ff264bdbe7debb35f73a352a2cc"

def password_gen(
    special_chars: str,
    password_len: int,
    words: list[str],
):
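    for word in words:
        word = word.capitalize()
        # Digits fill the gap between the word and the trailing special character
        ndigits = password_len - 1 - len(word)
        if ndigits < 0:
            continue  # word is too long to fit the pattern
        for num in range(10 ** ndigits):
            digits = str(num).zfill(ndigits)
            for special_char in special_chars:
                yield word + digits + special_char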

if __name__ == "__main__":
    with open("inputs/a3q2word_list.txt") as f:
        words = f.read().splitlines()

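    nhashes = 0
    for password in password_gen(SPECIAL_CHARS, PWD_LEN, words):
        salted = SALT + password
        digest = hashlib.sha256(salted.encode()).hexdigest()
        nhashes += 1
        if digest == HASH:
            # Report the unsalted password and how many hashes it took to find it
            print(password, nhashes)

    # Total number of candidate passwords hashed
    print(nhashes)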
```

The recovered password is `"Steering799!"` and we used 68,064,796 hashes before finding the password. With the given dictionary, there are a total of 79,683,500 distinct possible passwords.

## d)
The total number of distinct passwords is as follows:

$$
\text{num of distinct passwords} = 20000 \cdot 20000 \cdot 10^3 \cdot 6 = 2{,}400{,}000{,}000{,}000
$$

So we require this many hashes to obtain the hash of every possible password. This is roughly 30,000 times the 79,683,500 candidates in part (c), where my program needed only 68,064,796 hashes to find the password, so the new password strategy does provide additional strength against brute-force search.

## e)
If we take "tera" to be `2^40`, then it takes 0.1984 seconds to try all hashes.

If we take "tera" to be `10^12`, then it takes 0.2182 seconds to try all hashes.
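The arithmetic can be checked with a short sketch. The hash rate is not restated in this answer, so the value of 11 tera-hashes per second below is an assumption that I inferred from the 0.2182 second figure; the password count is the one from part (d), and the names are hypothetical.

```python
# Number of distinct passwords from part (d)
NUM_PASSWORDS = 20000 * 20000 * 10**3 * 6

# ASSUMPTION: a rate of 11 tera-hashes per second, inferred from the
# 0.2182 s figure above rather than stated in this answer
ASSUMED_RATE_TERA = 11

def seconds_to_exhaust(tera: int) -> float:
    """Time to hash every password, given the numeric value of "tera"."""
    return NUM_PASSWORDS / (ASSUMED_RATE_TERA * tera)

print(f"tera = 2^40:  {seconds_to_exhaust(2**40):.4f} s")
print(f"tera = 10^12: {seconds_to_exhaust(10**12):.4f} s")
```

The two results differ by a factor of $2^{40} / 10^{12} \approx 1.0995$, which is a quick sanity check on the pair of numbers.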

## f)
They could require the password to be longer or allow a bigger range of special characters, both of which will quickly increase the number of distinct possible passwords.

<p style="page-break-after:always;"></p>

# Problem 3

## a)
For each of the schemes, the verification algorithm is as follows:

1. Break the message into the appropriate blocks
2. Compute the tag using the secret key
3. Compare the computed tag with the provided tag. If they are identical, then the provided tag is valid; otherwise, the provided tag is not valid

## b)
**`BIGMAC` is existentially forgeable** under chosen message attack because valid tags can be combined with XOR: the XOR of the tags of $m_0 \Vert m_1$ and $m_0 \Vert m_2$ is a valid tag of $m_1 \Vert m_2$. Here is an existential forgery adversary:

1. The adversary generates three distinct messages $m_0, m_1, m_2 \in \{0, 1\}^l$ of equal length.
2. The adversary queries the tag of $m_0 \Vert m_1$, which is $t_1 = \text{MAC}(k, m_0) \oplus \text{MAC}(k, m_1)$
3. The adversary queries the tag of $m_0 \Vert m_2$, which we know to be $t_2 = \text{MAC}(k, m_0) \oplus \text{MAC}(k, m_2)$
4. $t_1 \oplus t_2$ is a valid tag of $m_1 \Vert m_2$ because:

$$
\begin{aligned}
t_1 \oplus t_2 &= (\text{MAC}(k, m_0) \oplus \text{MAC}(k, m_1))
\oplus (\text{MAC}(k, m_0) \oplus \text{MAC}(k, m_2)) \\
&= \text{MAC}(k, m_0) \oplus \text{MAC}(k, m_0)
\oplus \text{MAC}(k, m_1) \oplus \text{MAC}(k, m_2) \\
&= 0 \oplus \text{MAC}(k, m_1) \oplus \text{MAC}(k, m_2) \\
&= \text{MAC}(k, m_1) \oplus \text{MAC}(k, m_2) \\
&= \text{BIGMAC}(k, m_1 \Vert m_2)
\end{aligned}
$$

Thus we have forged a tag for a distinct message from the queried messages $\blacksquare$.
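The attack above can be sketched in code. This is only an illustration under assumptions: HMAC-SHA256 stands in for the underlying $\text{MAC}$, the block length is fixed at 16 bytes, and `bigmac` implements the two-block construction used in the derivation (XOR of the per-block MACs). All names are hypothetical.

```python
import hashlib
import hmac
import os

BLOCK = 16  # assumed block length l, in bytes

def mac(key: bytes, block: bytes) -> bytes:
    """Stand-in for the underlying MAC (assumption: HMAC-SHA256)."""
    return hmac.new(key, block, hashlib.sha256).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def bigmac(key: bytes, message: bytes) -> bytes:
    """BIGMAC on a two-block message: XOR of the two per-block MACs."""
    return xor(mac(key, message[:BLOCK]), mac(key, message[BLOCK:]))

key = os.urandom(16)  # known only to the tagging oracle, not the adversary

# Adversary: pick three distinct blocks and make two tagging queries
m0, m1, m2 = b"0" * BLOCK, b"1" * BLOCK, b"2" * BLOCK
t1 = bigmac(key, m0 + m1)
t2 = bigmac(key, m0 + m2)

# Forgery: the MAC of m0 cancels out, leaving the tag of m1 || m2
forged = xor(t1, t2)
print(forged == bigmac(key, m1 + m2))
```

The adversary never touches `key` directly; it only uses the two queried tags.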

## c)
`WHOPPER` is not secure under chosen message attack because each half of the tag depends only on the corresponding block of the message, so halves of valid tags can be recombined into a valid tag for a new message.

1. The adversary generates two distinct blocks $m_1^\prime, m_2^\prime \in \{0, 1\}^l$, then generates the chosen messages $m_1 = m_1^\prime \Vert m_1^\prime, m_2 = m_2^\prime \Vert m_2^\prime$
2. The adversary queries the tag of each of the chosen messages. We know that $t_1 = t_1^\prime \Vert t_1^\prime, t_2 = t_2^\prime \Vert t_2^\prime$ where $t_1^\prime = \text{MAC}(k, m_1^\prime), t_2^\prime=\text{MAC}(k, m_2^\prime)$
3. $t = t_1^\prime \Vert t_2^\prime = \text{MAC}(k, m_1^\prime) \Vert \text{MAC}(k, m_2^\prime)$ is a valid tag for $m = m_1^\prime \Vert m_2^\prime$

Thus we have forged a tag for a distinct message from the queried messages $\blacksquare$.
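The same kind of sketch works for this attack. As before, HMAC-SHA256 is an assumed stand-in for the underlying $\text{MAC}$, the block length of 16 bytes is an assumption, and `whopper` implements the two-block construction used above (concatenation of the per-block MACs). All names are hypothetical.

```python
import hashlib
import hmac
import os

BLOCK = 16   # assumed block length l, in bytes
TAGLEN = 32  # output length of the stand-in MAC, in bytes

def mac(key: bytes, block: bytes) -> bytes:
    """Stand-in for the underlying MAC (assumption: HMAC-SHA256)."""
    return hmac.new(key, block, hashlib.sha256).digest()

def whopper(key: bytes, message: bytes) -> bytes:
    """WHOPPER on a two-block message: concatenation of per-block MACs."""
    return mac(key, message[:BLOCK]) + mac(key, message[BLOCK:])

key = os.urandom(16)  # known only to the tagging oracle, not the adversary

# Adversary: query the tags of m1' || m1' and m2' || m2'
b1, b2 = b"1" * BLOCK, b"2" * BLOCK
t1 = whopper(key, b1 + b1)
t2 = whopper(key, b2 + b2)

# Forgery: first half of t1 followed by second half of t2
forged = t1[:TAGLEN] + t2[TAGLEN:]
print(forged == whopper(key, b1 + b2))
```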

<p style="page-break-after:always;"></p>

# Problem 4
The augmented AE scheme is not IND-CPA secure because **the chosen MAC is deterministic and is applied to the plaintext. Therefore, if the two halves of the plaintext are identical, then the two halves of the tag are guaranteed to be identical**.

An adversary can construct $m_1 = m_1[0] \Vert m_1[1]$ and $m_2 = m_2[0] \Vert m_2[1]$ such that the two halves of the first message are identical $m_1[0] = m_1[1]$ while the two halves of the second message are not identical $m_2[0] \neq m_2[1]$. When the adversary receives the challenge ciphertext $c = (c_1, c_2, t_1, t_2)$, it guesses that $c$ is the authenticated encryption of $m_1$ if $t_1 = t_2$, and of $m_2$ otherwise. The adversary is always right when the challenge encrypts $m_1$. It is wrong only when the challenge encrypts $m_2$ and the tags of the two distinct halves happen to collide, which occurs with probability about $2^{-l}$. Therefore, this adversary has an overwhelming advantage in the IND-CPA game.

<p style="page-break-after:always;"></p>

# Problem 5

## a)
$A(k, x) = F(k, x) \Vert F(k, x)$ is not indistinguishable from a truly random function because for $A$ the **first half and the second half of the output are always identical**, while for a truly random function the probability that the two halves of an output are identical is negligible.

We can construct an adversary who outputs "not random" if the first half of $A(k, x)$ is identical to the second half of $A(k, x)$. By the argument above, this adversary has overwhelming advantage.
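A minimal sketch of this distinguisher follows. It assumes HMAC-SHA256 as a stand-in for the PRF $F$, and it models the truly random function with fresh random bytes, which is adequate here because the adversary makes a single query. All names are hypothetical.

```python
import hashlib
import hmac
import os

def F(key: bytes, x: bytes) -> bytes:
    """Stand-in PRF (assumption: HMAC-SHA256)."""
    return hmac.new(key, x, hashlib.sha256).digest()

def A(key: bytes, x: bytes) -> bytes:
    return F(key, x) + F(key, x)

def distinguisher(oracle) -> str:
    """Make one query and check whether the two output halves match."""
    y = oracle(b"any input")
    half = len(y) // 2
    return "not random" if y[:half] == y[half:] else "random"

key = os.urandom(16)

def real_oracle(x: bytes) -> bytes:
    return A(key, x)

def ideal_oracle(x: bytes) -> bytes:
    # Fresh uniform output; enough to model a random function for one query
    return os.urandom(64)

print(distinguisher(real_oracle))
print(distinguisher(ideal_oracle))
```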

## b)
$B(k, x_1 \Vert x_2) = F(k, x_1) \oplus F(k, x_2)$ is not indistinguishable from a truly random function under CPA because **if the two halves of the input are identical, then the output of the function is guaranteed to be all 0's**.

We can construct an adversary who queries the output of some $x_0 = x_0[0] \Vert x_0[1]$ such that $x_0[0] = x_0[1]$. If the returned result is all 0's then the adversary claims the challenge output to be non-random. The probability that this adversary is wrong is the probability that a truly-random string has all 0's, which is negligible, so this adversary will have overwhelming advantage.
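This distinguisher can be sketched the same way. Again HMAC-SHA256 is an assumed stand-in for $F$, the truly random function is modelled by fresh random bytes for the single query, and all names are hypothetical.

```python
import hashlib
import hmac
import os

def F(key: bytes, x: bytes) -> bytes:
    """Stand-in PRF (assumption: HMAC-SHA256)."""
    return hmac.new(key, x, hashlib.sha256).digest()

def B(key: bytes, x: bytes) -> bytes:
    """XOR of F applied to the two halves of the input."""
    half = len(x) // 2
    left, right = F(key, x[:half]), F(key, x[half:])
    return bytes(a ^ b for a, b in zip(left, right))

def distinguisher(oracle) -> str:
    """Query an input with identical halves and look for an all-zero output."""
    y = oracle(b"same" + b"same")
    return "not random" if y == bytes(len(y)) else "random"

key = os.urandom(16)

def real_oracle(x: bytes) -> bytes:
    return B(key, x)

def ideal_oracle(x: bytes) -> bytes:
    # Fresh uniform output; enough to model a random function for one query
    return os.urandom(32)

print(distinguisher(real_oracle))
print(distinguisher(ideal_oracle))
```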

## c)
$C(k, x) = F(k, 0 \Vert x) \Vert F(k, 1 \Vert x)$ is a secure PRF.

First, because $F$ is indistinguishable from a truly random function, we can replace $F(k, \cdot)$ with a truly random function $G$ in the construction of $C$. In other words $C^\prime(x) = G(0 \Vert x) \Vert G(1 \Vert x)$ is computationally indistinguishable from $C(k, x) = F(k, 0 \Vert x) \Vert F(k, 1 \Vert x)$, since any adversary that tells them apart can be turned into an adversary that distinguishes $F$ from $G$. It remains to show that $C^\prime$ is indistinguishable from a truly random function.

Notice that $0 \Vert x$ and $1 \Vert x$ are never equal, and that distinct queries $x \neq x^\prime$ never lead to a repeated input to $G$. The outputs $G(0 \Vert x)$ and $G(1 \Vert x)$ are therefore independently sampled random outputs, by the definition of a truly random function. The concatenation of two independent uniform samples is itself uniformly distributed over the concatenated space. Therefore, $C^\prime$ itself is indistinguishable from a truly random function.

<p style="page-break-after:always;"></p>

# Problem 6

## a)
**Using non-distinct primes to generate the modulus is a bad idea** because integer square roots can be computed efficiently, in a number of steps logarithmic in $N$. If $p = q$ then $N = p^2$, so an adversary who computes the integer square root of the modulus has factored it and thus broken the encryption.

Note: here is an implementation of integer square root that uses $O(\log(N))$ loop iterations:

```python
def fast_intsqrt(n: int) -> int | None:
    """Return the integer square root of n if n is a perfect square; else None"""
    if n < 0:
        return None
    # Jump search for the smallest x such that x^2 >= n, halving the jump size
    x = jump = n // 2 + 1
    while jump > 0:
        while x - jump >= 0 and (x - jump) * (x - jump) >= n:
            x -= jump
        jump = jump // 2
    
    if x * x == n:
        return x
    return None
```
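As a usage sketch, here is how a modulus of the form $N = p^2$ would be factored. To keep the example self-contained it uses the standard library's `math.isqrt` (Python 3.8+) in place of `fast_intsqrt`; the helper name and the toy prime are assumptions for illustration, and real RSA primes are far larger.

```python
import math

def factor_square_modulus(n: int):
    """Return p if n == p * p, otherwise None."""
    root = math.isqrt(n)
    return root if root * root == n else None

# Toy example: 2^61 - 1 is a Mersenne prime, used here as p = q
p = 2**61 - 1
N = p * p
print(factor_square_modulus(N) == p)
```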

## b)
Denote the two distinct public exponents by $e_1, e_2 \in \mathbb{Z}$ and the common message by $m \in \mathbb{Z}_n^*$. Assuming that $e_1, e_2$ are relatively prime, we can use the extended Euclidean algorithm to find integers $r_1, r_2$ such that:

$$
r_1e_1 + r_2e_2 = 1
$$

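Recall that the ciphertexts of the common message $m$ under the two exponents are $c_1 \equiv m^{e_1} \mod n$ and $c_2 \equiv m^{e_2} \mod n$. The adversary can raise the two ciphertexts to the exponents $r_1$ and $r_2$ respectively, then multiply them together. One of $r_1, r_2$ is negative, so the corresponding power is computed using the inverse of the ciphertext modulo $n$, which exists because $m \in \mathbb{Z}_n^*$: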

$$
\begin{aligned}
c_1^{r_1} \cdot c_2^{r_2} &\equiv (m^{e_1})^{r_1} \cdot (m^{e_2})^{r_2} \mod n \\
&\equiv m^{r_1e_1 + r_2e_2} \mod n \\
&\equiv m \mod n
\end{aligned}
$$

Thus the adversary has recovered the message $m$ without knowing either private exponent.
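The attack can be sketched with toy parameters. The primes, exponents, and message below are hypothetical values chosen only so the example runs quickly, and the function names are my own. Negative exponents in the three-argument `pow` compute modular inverses (Python 3.8+).

```python
def egcd(a: int, b: int):
    """Return (g, r, s) such that r*a + s*b == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, r, s = egcd(b, a % b)
    return g, s, r - (a // b) * s

def common_modulus_attack(n: int, e1: int, e2: int, c1: int, c2: int) -> int:
    """Recover m from c1 = m^e1 mod n and c2 = m^e2 mod n."""
    g, r1, r2 = egcd(e1, e2)
    if g != 1:
        raise ValueError("exponents must be relatively prime")
    # A negative r_i makes pow() invert the ciphertext modulo n first
    return (pow(c1, r1, n) * pow(c2, r2, n)) % n

# Toy parameters (hypothetical; real moduli are thousands of bits long)
n = 1009 * 1013
e1, e2 = 17, 65537
m = 424242
c1, c2 = pow(m, e1, n), pow(m, e2, n)

print(common_modulus_attack(n, e1, e2, c1, c2) == m)
```

Note that the adversary uses only the public values $n, e_1, e_2$ and the two ciphertexts.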

## c)
If Alice and Alicia use related moduli $N_1 = pq$ and $N_2 = pr$ with $q \neq r$, then the adversary can use the Euclidean algorithm to compute $\gcd(N_1, N_2)$, which is exactly the common prime factor $p$. The adversary can then divide each modulus by $p$ to find the other two factors $q = N_1 / p$ and $r = N_2 / p$, thus breaking RSA for both users.
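Here is a minimal sketch of this attack using `math.gcd` from the standard library. The three small primes are hypothetical toy values, and the function name is my own.

```python
from math import gcd

def factor_with_shared_prime(n1: int, n2: int):
    """Return (p, q, r) where n1 = p*q and n2 = p*r share the prime p."""
    p = gcd(n1, n2)
    if p in (1, n1, n2):
        raise ValueError("moduli do not share exactly one prime factor")
    return p, n1 // p, n2 // p

# Toy parameters (hypothetical; real primes are far larger)
p, q, r = 1009, 1013, 1019
N1, N2 = p * q, p * r

print(factor_with_shared_prime(N1, N2))
```

Since the Euclidean algorithm runs in time polynomial in the bit length of the moduli, this stays efficient at realistic key sizes.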