UW user ID `g66xu`

# Problem 1
## a)
I found the following cryptographic weaknesses in the encryption procedure from the attacker log

1. **`seed = datetime.now().strftime("%Y-%m-%d %H:%M")` is a weak source of entropy**. The value of the seed only changes once per minute, and the change is predictable. The encryption scheme should use a stronger and more unpredictable source of entropy, such as system noise (e.g. `os.urandom` in Python) or radioactive decay.
1. **The RSA modulus is generated incorrectly: the two prime factors are too close**. When the two prime factors are very close, the RSA modulus $n$ is very close to a perfect square, which means that trial divisions that start at the square root of $n$ can efficiently find a factorization. Instead, the two prime factors should be generated independently (but with similar lengths).
1. **The IV is hard-coded**, which makes the CBC mode of operation deterministic. Instead, a different (possibly random) IV should be generated for each run of the encryption.
1. **The public exponent is not necessarily invertible modulo $\phi(n)$**. In the attacker log, the public exponent is simply randomly generated without checking whether it is invertible modulo $\phi(n)$. While correct decryption might not be important for a ransomware attack, in normal usage of RSA the public exponent must be invertible modulo $\phi(n)$ for the secret key to be defined.
1. **The public modulus $n = pq$ is too short**. In `gen_rsa_pk`, both primes are 256 bits in length, so the overall public modulus is 512 bits, which is too short. Instead, a key size of at least 1024 bits is needed, and 2048 bits is recommended.
1. **There is no integrity of ciphertext**. Instead, a MAC should be used to ensure the integrity of the ciphertext.
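
As a rough illustration of two of the fixes above, here is a minimal sketch that draws the AES key and IV from the operating system's randomness source and checks that a candidate public exponent is invertible modulo $\phi(n)$. The helper names `fresh_key_and_iv` and `is_valid_public_exponent` are my own and do not appear in the attacker log.

```python
import math
import os


def fresh_key_and_iv(key_bytes: int = 16, block_bytes: int = 16):
    """Draw a fresh AES key and a fresh CBC IV from the OS randomness source."""
    return os.urandom(key_bytes), os.urandom(block_bytes)


def is_valid_public_exponent(e: int, p: int, q: int) -> bool:
    """Return True iff e has an inverse modulo phi(n) = (p - 1)(q - 1)."""
    phi = (p - 1) * (q - 1)
    return 1 < e < phi and math.gcd(e, phi) == 1
```

A key generation routine could resample `e` until `is_valid_public_exponent` returns `True`, and call `fresh_key_and_iv` once per encryption run.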

## b)
I took advantage of the fact that the RSA modulus is close to being a perfect square to efficiently factor the modulus. From there I used the prime factorization to obtain the secret key by computing the multiplicative inverse of the public exponent under modulus $\phi$.

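```python
import base64
import json

from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def sqrt_ceil(modulus: int):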
    """Find the smallest integer x such that x * x >= modulus using a binary
    search
    
    Note that math.ceil(math.sqrt(modulus)) doesn't work because you lose too
    many digits of precision
    """
    x = modulus
    jump = modulus // 2

    while jump > 0:
        while x - jump > 0 and (x - jump) * (x - jump) >= modulus:
            x -= jump
        jump = jump // 2
    
    return x

def trial_division(modulus: int):
    """Trial division, except we start at the ceiling of the square root and
    count down
    """
    halfway = sqrt_ceil(modulus)
    assert halfway * halfway >= modulus
    for p in range(halfway, 0, -1):
        if modulus % p == 0:
            return p

with open("./static/a4/encrypted_assignment.json.txt") as f:
    ciphertext = json.loads(f.read())
n = ciphertext["n"]
p = trial_division(n)
q = n // p
phi = (p-1) * (q-1)
d = pow(ciphertext["e"], -1, phi)  # this is the RSA decryption key
aes_key = pow(ciphertext["c_1"], d, n).to_bytes(length=16, byteorder="big")
iv = b"1337c0487c068711"
cipher = Cipher(algorithms.AES(aes_key), modes.CBC(iv)).decryptor()
pdf_ciphertext = base64.urlsafe_b64decode(ciphertext["c_2"].encode())
pdf_plaintext = cipher.update(pdf_ciphertext) + cipher.finalize()
unpadder = padding.PKCS7(128).unpadder()
pdf_plaintext = unpadder.update(pdf_plaintext) + unpadder.finalize()
with open("./static/a4/a4.pdf", "wb") as f:
    f.write(pdf_plaintext)
```

Note that `sympy.ntheory.factorint` can also efficiently factor the modulus in my inputs, although when I generate two independent primes of 256 bits each (instead of being consecutive primes), `factorint` also failed to factor the integers in time. I suspect that it also uses trial division.
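
For comparison, here is a minimal sketch of Fermat's factorization method, which is a different technique from the countdown trial division I used above but exploits the same weakness (the two primes being close). The function name `fermat_factor` and the `max_steps` cutoff are my own choices, and `math.isqrt` requires Python 3.8 or later.

```python
import math


def fermat_factor(n: int, max_steps: int = 1_000_000):
    """Try to write an odd n > 1 as a^2 - b^2 = (a - b)(a + b).

    Returns a pair of factors, or None if no factorization is found within
    max_steps iterations. It is fast only when the two factors are close.
    """
    if n % 2 == 0:
        return 2, n // 2
    # Start at the ceiling of the square root of n
    a = math.isqrt(n)
    if a * a < n:
        a += 1
    for _ in range(max_steps):
        b_squared = a * a - n
        b = math.isqrt(b_squared)
        if b * b == b_squared:
            return a - b, a + b
        a += 1
    return None
```

When the primes are consecutive, as in my inputs, the loop should terminate after very few iterations because $a$ starts just above $\sqrt{n}$.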

## c)
My random number is `90182`

<p style="page-break-after:always;"></p>

# Problem 2

## a)
**Using non-distinct primes to generate the modulus is a bad idea** because then $n = p^2$, and the integer square root (not a modular square root!) can be computed efficiently, which immediately reveals $p$.

Here is an integer square root implemented using binary search:

```python
def fast_intsqrt(n: int) -> int | None:
    """Return the integer square root if it exists; else return None
    """
    if n < 0:
        return None
    # Binary search the smallest x such that x^2 >= n
    x = jump = n // 2 + 1
    while jump > 0:
        while x - jump >= 0 and (x - jump) * (x - jump) >= n:
            x -= jump
        jump = jump // 2
    
    if x * x == n:
        return x
    return None
```
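
As a cross-check, Python 3.8 and later ship `math.isqrt`, which computes the floor of the exact integer square root. A minimal sketch of recovering $p$ from $n = p^2$ with it (the helper name `factor_square_modulus` is hypothetical):

```python
import math


def factor_square_modulus(n: int):
    """Return p if n == p * p for some non-negative integer p, else None."""
    root = math.isqrt(n)  # exact floor of the square root, no floating point
    return root if root * root == n else None
```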

## b)
Denote the two distinct public exponents by $e_1, e_2 \in \mathbb{Z}_\phi^*$ and the common message by $m \in \mathbb{Z}_n^*$. Assuming that $e_1, e_2$ are relatively prime, we can use the extended Euclidean algorithm to find integers $r_1, r_2$ such that:

$$
r_1e_1 + r_2e_2 = 1
$$

Recall that the ciphertexts of the common message $m$ under the two exponents are $c_1 \equiv m^{e_1} \mod n$ and $c_2 \equiv m^{e_2} \mod n$. The adversary can raise the two ciphertexts to the exponents $r_1$ and $r_2$ respectively (one of the two is negative, which means inverting the corresponding ciphertext modulo $n$ first), then multiply them together:

$$
\begin{aligned}
c_1^{r_1} \cdot c_2^{r_2} &\equiv (m^{e_1})^{r_1} \cdot (m^{e_2})^{r_2} \mod n \\
&\equiv m^{r_1e_1 + r_2e_2} \mod n \\
&\equiv m \mod n
\end{aligned}
$$

Thus the adversary has obtained the message.
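
The attack above can be sketched in a few lines of Python. The function names are my own, the numbers in the usage note are toy values, and the three-argument `pow` with a negative exponent requires Python 3.8 or later.

```python
def extended_gcd(a: int, b: int):
    """Return (g, x, y) such that a * x + b * y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_x, x = x, old_x - quotient * x
        old_y, y = y, old_y - quotient * y
    return old_r, old_x, old_y


def common_modulus_attack(c1: int, c2: int, e1: int, e2: int, n: int) -> int:
    """Recover m from c1 = m^e1 mod n and c2 = m^e2 mod n when gcd(e1, e2) = 1."""
    g, r1, r2 = extended_gcd(e1, e2)
    if g != 1:
        raise ValueError("the public exponents must be relatively prime")
    # A negative exponent makes pow() invert the base modulo n first
    return (pow(c1, r1, n) * pow(c2, r2, n)) % n
```

For example, with the toy modulus $n = 61 \cdot 53 = 3233$, exponents $17$ and $7$, and message $65$, calling `common_modulus_attack(pow(65, 17, 3233), pow(65, 7, 3233), 17, 7, 3233)` should return the message.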

P.S: $e_1, e_2$ being relatively prime is probably a necessary condition for this problem to work.

I am not sure if an attacker can recover $m$ if the two public exponents are not relatively prime. Here is a tentative proof that if there exists such an adversary who can recover $m$ from $m^{e_1}$ and $m^{e_2}$ for some arbitrary pair of distinct $e_1, e_2$, then we can build a second adversary who can break arbitrary RSA encryption.

Suppose that the RSA adversary is given $m^e \mod n$ and $e$. It can generate two distinct random numbers $u_1 \neq u_2$ and compute $(m^e)^{u_1} = m^{eu_1}$ and $(m^e)^{u_2} = m^{eu_2}$. The RSA adversary can then give $m^{eu_1}, m^{eu_2}$ to the first adversary, and the first adversary will have a non-negligible advantage at recovering $m$. From here the RSA adversary gains a non-negligible advantage at recovering $m$.

This means that if $e_1, e_2$ are indeed arbitrary distinct numbers and are not necessarily relatively prime, then recovering $m$ from $m^{e_1}$ and $m^{e_2}$ is probably at least as hard as solving the RSA problem.

## c)
If the adversary knows that Alice and Alicia are using related moduli $N_1 = pq$ and $N_2 = pr$, then the adversary can use the Euclidean algorithm to find the GCD of $N_1$ and $N_2$, which will be the common prime factor $p$. The adversary can then use $p$ to find the other two factors $q, r$, thus breaking RSA.
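
A minimal sketch of this attack using `math.gcd` from the standard library (the function name `shared_prime_attack` and the toy primes in the usage note are my own):

```python
import math


def shared_prime_attack(n1: int, n2: int):
    """Return (p, q, r) with n1 = p * q and n2 = p * r, or None.

    None is returned when the gcd is trivial, i.e. no usable shared factor.
    """
    p = math.gcd(n1, n2)
    if p in (1, n1, n2):
        return None
    return p, n1 // p, n2 // p
```

For example, `shared_prime_attack(61 * 53, 61 * 59)` should return `(61, 53, 59)`.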

<p style="page-break-after:always;"></p>

# Problem 3

## a)
Let $\mathcal{A}_\text{DDH}$ denote the DDH adversary and $\mathcal{A}_\text{CDH}$ denote the CDH adversary. To prove that the DDH assumption is at least as strong as the CDH assumption (that is, breaking CDH implies breaking DDH), we assume that the CDH adversary has a non-negligible advantage $\epsilon$ and show that there exists a DDH adversary who also has a non-negligible advantage. The DDH adversary works as follows:

1. $\mathcal{A}_\text{DDH}$ receives $g^a, g^b, z$, where $z$ has a 50% chance of being $g^{ab}$ and a 50% chance of being a random element $g^c$ from $G$.
2. $\mathcal{A}_\text{DDH}$ gives $g^a, g^b$ to $\mathcal{A}_\text{CDH}$. Because $g, G, g^a, g^b$ are all generated using identical distribution between the DDH game and the CDH game, $\mathcal{A}_\text{CDH}$ retains its advantage at computing $g^{ab}$. Denote the output of $\mathcal{A}_\text{CDH}$ by $h$, then there is a non-negligible chance that $h = g^{ab}$
3. $\mathcal{A}_\text{DDH}$ checks whether $h$ is equal to $z$. If they are equal, $\mathcal{A}_\text{DDH}$ claims $z$ to be $g^{ab}$; otherwise $\mathcal{A}_\text{DDH}$ claims $z$ to be a truly random element from $G$.

Where $z = g^{ab}$, $\mathcal{A}_\text{DDH}$ wins if and only if $\mathcal{A}_\text{CDH}$'s output is exactly $g^{ab}$, so in this case $\mathcal{A}_\text{DDH}$ wins with exactly $\mathcal{A}_\text{CDH}$'s success probability.

Where $z = g^c$, $\mathcal{A}_\text{DDH}$ wins if and only if $\mathcal{A}_\text{CDH}$'s output is not equal to $g^c$. Since $g^c$ is a truly random element, the chance that it equals the output of $\mathcal{A}_\text{CDH}$ is negligible, so the probability that $\mathcal{A}_\text{DDH}$ wins is actually overwhelming.

Averaging over the two equally likely situations, we can conclude that if $\mathcal{A}_\text{CDH}$ has a non-negligible advantage, then $\mathcal{A}_\text{DDH}$ also has a non-negligible advantage.
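
To make the averaging step concrete, here is the calculation written out, assuming that $\epsilon$ is the probability that $\mathcal{A}_\text{CDH}$ outputs $g^{ab}$, that $g$ generates $G$, and that $c$ is uniform and independent of $h$ (so that $\Pr[h = g^c] = 1/|G|$):

$$
\begin{aligned}
\Pr[\mathcal{A}_\text{DDH} \text{ wins}] &= \frac{1}{2}\Pr[h = g^{ab}] + \frac{1}{2}\Pr[h \neq g^c] \\
&= \frac{1}{2}\epsilon + \frac{1}{2}\left(1 - \frac{1}{|G|}\right) \\
&= \frac{1}{2} + \frac{1}{2}\left(\epsilon - \frac{1}{|G|}\right)
\end{aligned}
$$

Under these assumptions the advantage over random guessing is $\frac{1}{2}(\epsilon - 1/|G|)$, which is non-negligible whenever $\epsilon$ is non-negligible and $|G|$ is exponentially large.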

## b)
Let $\mathcal{A}_\text{CDH}$ denote the CDH adversary and $\mathcal{A}_\text{DLog}$ denote the discrete logarithm adversary. To prove that the CDH assumption is at least as strong as the DLog assumption (that is, breaking DLog implies breaking CDH), we show that we can build a CDH adversary using a DLog adversary while retaining a non-negligible amount of the DLog adversary's advantage.

1. $\mathcal{A}_\text{CDH}$ receives $g^a, g^b$
2. $\mathcal{A}_\text{CDH}$ gives $g^a$ to $\mathcal{A}_\text{DLog}$. Denote the output of $\mathcal{A}_\text{DLog}$ by $a^*$
3. $\mathcal{A}_\text{CDH}$ outputs $(g^b)^{a^*}$

Whenever $a^* = a$ we have $(g^b)^{a^*} = g^{ab}$, so the probability that $(g^b)^{a^*} = g^{ab}$ is at least the probability that $a^* = a$. This means that $\mathcal{A}_\text{CDH}$ has at least the same advantage as $\mathcal{A}_\text{DLog}$, so if there exists a $\mathcal{A}_\text{DLog}$ adversary with non-negligible advantage, then there exists a $\mathcal{A}_\text{CDH}$ with non-negligible advantage.
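
The reduction can be illustrated on a toy group where discrete logarithms are easy. In this sketch the group $\mathbb{Z}_{23}^*$, the generator $5$, and the brute-force `brute_force_dlog` oracle are all assumptions chosen for illustration; a real DLog adversary would be a black box.

```python
P = 23  # toy prime modulus (assumption)
G = 5   # a primitive root modulo 23 (assumption)


def brute_force_dlog(h: int) -> int:
    """Stand-in for the DLog adversary: find a with G^a == h (mod P)."""
    value = 1
    for exponent in range(P - 1):
        if value == h:
            return exponent
        value = (value * G) % P
    raise ValueError("h is not in the group generated by G")


def cdh_from_dlog(g_a: int, g_b: int, dlog_oracle=brute_force_dlog) -> int:
    """Solve CDH with one DLog query: output (g^b)^(a*) where a* = dlog(g^a)."""
    a_star = dlog_oracle(g_a)
    return pow(g_b, a_star, P)
```

Because the toy oracle always returns the correct exponent, `cdh_from_dlog(pow(G, a, P), pow(G, b, P))` should equal `pow(G, a * b, P)` for every pair of exponents.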

## c)
We can show that for $G = \mathbb{Z}_p^*$ where $p$ is an odd prime and $g \in G$ is a primitive root modulo $p$ (by Gauss's work we know that a primitive root always exists when $p$ is prime), DDH can be broken with non-negligible advantage.

First, observe that an arbitrary element $x \in G$ is a square if and only if $x = g^e$ for some even exponent $e$. In the forward direction, if $x$ is a square, then there exists some $y = g^d$ such that $x = y^2 = (g^d)^2 = g^{2d}$, which means that $e = 2d$ is an even number (exponents are only defined modulo $p - 1$, but $p - 1$ is even, so parity is well defined). In the backward direction, if $e = 2d$ is an even number, then $y = g^d$ is a square root of $x$.

Given that $a, b, c$ are all uniformly distributed, each has a 50% chance of being even. Because $ab$ is odd if and only if both $a, b$ are odd, $ab$ has a 75% chance of being even. This means that $g^{ab}$ has a 75% chance of being a square, while $g^c$ has a 50% chance of being a square.

We can thus build a $\text{DDH}(g^a, g^b, z)$ adversary who checks whether $z$ is a square, then claims that $z = g^{ab}$ if $z$ is a square and $z = g^c$ otherwise. 

Where $z = g^{ab}$, the adversary wins iff $g^{ab}$ is a square, so the probability that the adversary wins is 75%. Where $z = g^c$, the adversary wins iff $g^c$ is not a square, so the probability that the adversary wins is 50%. Averaging over the two equally likely situations, we know the adversary wins with a probability of 62.5%, so the advantage is 12.5%, which is definitely not negligible.
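
One way to check whether $z$ is a square is Euler's criterion: for an odd prime $p$ and $z \in \mathbb{Z}_p^*$, $z$ is a square iff $z^{(p-1)/2} \equiv 1 \mod p$. The sketch below uses it to count the two probabilities exhaustively in a toy group; the parameters $p = 23$, $g = 5$ and the helper name `is_square` are assumptions for illustration.

```python
def is_square(z: int, p: int) -> bool:
    """Euler's criterion: z in Z_p^* is a square iff z^((p-1)/2) == 1 (mod p)."""
    return pow(z, (p - 1) // 2, p) == 1


p, g = 23, 5   # toy parameters; 5 is a primitive root modulo 23
order = p - 1  # exponents range over Z_{p-1}

# Fraction of exponent pairs (a, b) for which g^(ab) is a square
dh_squares = sum(
    is_square(pow(g, a * b, p), p) for a in range(order) for b in range(order)
)
frac_dh = dh_squares / order**2

# Fraction of exponents c for which g^c is a square
random_squares = sum(is_square(pow(g, c, p), p) for c in range(order))
frac_random = random_squares / order
```

Here `frac_dh` and `frac_random` should match the 75% and 50% figures derived above.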

Note that this attack relies on the fact that $g$ is not a square, which is true when $g$ is a primitive root in $\mathbb{Z}_p^*$ for some odd prime $p$. It is possible to choose the group such that $g$ itself is a square, in which case this attack is thwarted.

<p style="page-break-after:always;"></p>

# Problem 4

## a)
When discussing IND-CPA and IND-CCA games in a public-key cryptosystem, an encryption oracle is not necessary because the adversary is assumed to have access to the public key and can thus perform the encryption procedure by itself.

## b)
The **IND-CPA game** of ElGamal is as follows:

1. **PGen**: $(p, G, g) \leftarrow \text{PGen}(1^\lambda)$
1. **KeyGen**: $(x, g^x) \leftarrow \text{KeyGen}(1^\lambda, (p, G, g))$
1. **Adversary generates chosen message**: $(m_0, m_1) \leftarrow \mathcal{A}_\text{IND-CPA}(1^\lambda ,(p, G, g), g^x)$
1. **Challenge ciphertext**:
    1. Sample a bit $b \leftarrow \{0, 1\}$
    2. Sample a random exponent $r \leftarrow \mathbb{Z}_p$
    3. Encrypt the chosen plaintext: $c^* \leftarrow (g^r, m_b \cdot (g^x)^r)$
1. $b^* \leftarrow \mathcal{A}_\text{IND-CPA}(1^\lambda, (p, G, g), g^x, (m_0, m_1), c^*)$

The IND-CPA adversary wins the game if and only if $b^* = b$


Notice that there is no encryption oracle in the IND-CPA game because the adversary is always given the public key, which it can use to encrypt arbitrary messages that it chooses.

The **IND-CCA2 game** of ElGamal is as follows:

1. **PGen**: $(p, G, g) \leftarrow \text{PGen}(1^\lambda)$
1. **KeyGen**: $(x, g^x) \leftarrow \text{KeyGen}(1^\lambda, (p, G, g))$
1. **The adversary makes decryption queries**: for $1 \leq i \leq n$, $c_i \leftarrow \mathcal{A}(1^\lambda, (p, G, g), g^x, \{(c_j, m_j)\}_{j<i})$, where the adversary generates a query ciphertext $c_i$ and obtains from the decryption oracle the correct decryption $m_i$
1. **Adversary generates chosen messages**: $(m_0^*, m_1^*) \leftarrow \mathcal{A}(1^\lambda, (p, G, g), g^x, \{(c_i, m_i)\}_{1\leq i \leq n})$
1. **Challenger computes the challenge ciphertext**:
    1. Sample a bit $b \leftarrow \{0, 1\}$
    2. Sample a random exponent $r \leftarrow \mathbb{Z}_p$
    3. Encrypt the chosen plaintext: $c^* \leftarrow (g^r, m^*_b \cdot (g^x)^r)$
1. The adversary can continue to make decryption queries, but the decryption oracle will deny decryption if the query is the challenge ciphertext
1. The adversary outputs the guess $b^*$ and wins the game if and only if $b^* = b$.

The only oracle in the IND-CCA2 game is the decryption oracle. Given a ciphertext, the decryption oracle will output the corresponding decrypted plaintext. However, after the challenge ciphertext has been published, the decryption oracle will refuse to decrypt any query that is the challenge ciphertext so the adversary cannot just trivially win.

## c)
Let $\mathcal{A}_\text{DDH}$ denote the DDH adversary and $\mathcal{A}_\text{IND-CPA}$ denote the IND-CPA adversary for ElGamal.

1. When $\mathcal{A}_\text{DDH}$ receives $p, G, g, g^a, g^b, h \leftarrow \{g^{ab}, g^c\}$ from the DDH challenger, it gives $p, G, g$ and $g^a$ to $\mathcal{A}_\text{IND-CPA}$ and receives the chosen messages $m_0, m_1 \leftarrow \mathcal{A}_\text{IND-CPA}(1^\lambda, (p, G, g), g^a)$.
2. $\mathcal{A}_\text{DDH}$ samples a random bit $i \leftarrow \{0,1\}$ and gives $c^* = (g^b, m_i \cdot h)$ to $\mathcal{A}_\text{IND-CPA}$
3. The IND-CPA adversary outputs a guess $i^* \leftarrow \mathcal{A}_\text{IND-CPA}(1^\lambda, (p, G, g), g^a, (m_0, m_1, c^*))$
4. $\mathcal{A}_\text{DDH}$ claims $h = g^{ab}$ if $i^* = i$, else it claims $h = g^c$

Where $h = g^{ab}$, $c^* = (g^b, m_i \cdot g^{ab})$ is a valid encryption of $m_i$ under public key $g^a$, so the IND-CPA adversary retains its advantage. Where $h = g^c$, the second half of $c^*$ is $m_i \cdot g^c$, which is a uniformly random element in $G$, so the probability distribution of $c^*$ is independent of the choice of $m_i$, which means that the IND-CPA adversary has no advantage.

Since there is a 50% chance that $h = g^{ab}$, there is a 50% chance the IND-CPA adversary will retain its advantage. Therefore, the advantage of the DDH adversary is half of the advantage of the IND-CPA adversary. Therefore, if the IND-CPA adversary has a non-negligible advantage, then the DDH adversary has a non-negligible advantage.
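
Written out, assuming the IND-CPA adversary guesses correctly with probability $\frac{1}{2} + \epsilon$ on a valid ciphertext and with probability exactly $\frac{1}{2}$ when $c^*$ is independent of $m_i$:

$$
\begin{aligned}
\Pr[\mathcal{A}_\text{DDH} \text{ wins}] &= \frac{1}{2}\Pr[i^* = i \mid h = g^{ab}] + \frac{1}{2}\Pr[i^* \neq i \mid h = g^c] \\
&= \frac{1}{2}\left(\frac{1}{2} + \epsilon\right) + \frac{1}{2} \cdot \frac{1}{2} \\
&= \frac{1}{2} + \frac{\epsilon}{2}
\end{aligned}
$$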


## d)
The ElGamal encryption scheme is not CCA-secure because it exhibits some homomorphic properties. Suppose we have $c = (c_1, c_2) = (g^r, m \cdot (g^x)^r)$ as an encryption of some message $m \in \mathbb{Z}_p$, then $c^\prime = (c_1, 2 \cdot c_2) = (g^r, (2 \cdot m) \cdot (g^x)^r)$ is a valid encryption of the message $2 \cdot m$.

This means that we can build an IND-CCA2 adversary: after receiving the challenge ciphertext $c^* = (c^*_1, c^*_2)$, the adversary can query the "double ciphertext" $(c^*_1, 2 \cdot c^*_2)$ and obtain $2 \cdot m_b$ as the output because the "double ciphertext" is distinct from the challenge ciphertext. From here it is trivial to recover the challenge message $m_b$, and the IND-CCA2 adversary will be able to distinguish the ciphertext.
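
Here is a minimal sketch of this adversary against a toy ElGamal instance over $\mathbb{Z}_{467}^*$. The parameters `P = 467`, `G = 2`, and all function names are assumptions made for illustration; the decryption oracle refuses only the exact challenge ciphertext, as in the game from part b).

```python
import secrets

P = 467  # toy prime modulus (assumption; far too small for real use)
G = 2    # toy base (assumption)


def keygen():
    x = secrets.randbelow(P - 2) + 1  # secret exponent in [1, P - 2]
    return x, pow(G, x, P)


def encrypt(public_key: int, message: int):
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), (message * pow(public_key, r, P)) % P


def decrypt(secret_key: int, ciphertext):
    c1, c2 = ciphertext
    shared = pow(c1, secret_key, P)
    return (c2 * pow(shared, -1, P)) % P


def make_oracle(secret_key: int, challenge):
    """Decryption oracle that refuses only the challenge ciphertext."""
    def oracle(ciphertext):
        if ciphertext == challenge:
            raise ValueError("refusing to decrypt the challenge ciphertext")
        return decrypt(secret_key, ciphertext)
    return oracle


def cca2_adversary(challenge, oracle, m0: int, m1: int) -> int:
    c1, c2 = challenge
    mauled = (c1, (2 * c2) % P)  # encrypts 2 * m_b and differs from the challenge
    doubled = oracle(mauled)
    recovered = (doubled * pow(2, -1, P)) % P  # halve to get m_b back
    return 0 if recovered == m0 else 1
```

With nonzero messages the second ciphertext component is never zero, so the mauled ciphertext always differs from the challenge and the oracle answers the query.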

<p style="page-break-after:always;"></p>

# Problem 5
Here is my implementation of an elliptic curve. The group operation of points on an elliptic curve is described on Wikipedia: https://en.wikipedia.org/wiki/Elliptic_curve_point_multiplication.

```python
from __future__ import annotations

class EllipticCurve:
    """(finite field) elliptic curve y^2 = x^3 + ax + b
    """
    def __init__(self, p, a, b):
        self.p = p
        self.a = a
        self.b = b
    
    def validate(self, x: int, y: int):
        """Return True iff the point is on the curve"""
        lhs = (y * y) % self.p
        rhs = ((x ** 3) + self.a * x + self.b) % self.p
        return lhs == rhs

    def solve_y(self, x: int):
        """Given a value of x, solve for y with brute force. Returns a pair
        of solutions (y, -y mod p) if one exists, otherwise returns None
        """
        x = x % self.p
        for y in range(self.p):
            remainder = (y ** 2 - x ** 3 - self.a * x - self.b) % self.p
            if remainder == 0:
                # Reduce -y modulo p so that y = 0 yields (0, 0), not (0, p)
                return (y, (-y) % self.p)
        return None
    
class Point:
    """A single point on the elliptic curve identified by its coordinates. The
    identity is identified by having no coordinates
    """
    def _linear_slope(self, other: Point):
        """Assume that self and other are two points such that none of them is
        the identity, they are not identical, and they are not negations of
        each other; then the slope of the line can be linearly computed
        """
        slope = pow(other.x - self.x, -1, self.curve.p)
        slope = (other.y - self.y) * slope
        slope = slope % self.curve.p
        return slope

    def _tagent_slope(self):
        """The slope of the tangent line through this point"""
        slope = pow(2 * self.y, -1, self.curve.p)
        slope = (3 * self.x * self.x + self.curve.a) * slope
        return slope % self.curve.p

    @staticmethod
    def identity(curve: EllipticCurve):
        return Point(None, None, curve)
    
    def is_identity(self):
        return self.x is None
    
    def negation(self):
        if self.is_identity():
            return Point.identity(self.curve)
        # Reduce modulo p so that a point with y = 0 is its own negation
        return Point(self.x, (-self.y) % self.curve.p, self.curve)

    def __init__(self, x, y, curve: EllipticCurve):
        if x is not None and not curve.validate(x, y):
            raise ValueError("Point is not on the curve")
        self.x = x
        self.y = y
        self.curve = curve
    
    def __eq__(self, other):
        """Two points are equal if they are both identity or if they are
        coordinate-wise equal
        """
        if not isinstance(other, Point):
            raise TypeError("Equality with non-point not defined")
        return (
            (self.is_identity() and other.is_identity()) 
            or (self.x == other.x and self.y == other.y)
        )
    
    def __add__(self, other):
        if not isinstance(other, Point):
            raise TypeError("Addition with non-point not defined")
        
        # If one of the operand is identity then return the other
        if self.is_identity():
            return other
        if other.is_identity():
            return self
        
        # P + (-P) = 0
        if self == other.negation():
            return Point.identity(self.curve)
        
        # With P + Q where P, Q are not identity and not x-axis mirrors, there
        # is a common formula, although there is a slope term that is evaluated
        # differently depending on P == Q or P != Q
        slope = (
            self._tagent_slope() 
            if self == other else self._linear_slope(other)
        )
        x = ((slope * slope) - other.x - self.x) % self.curve.p
        y = (-slope * x - self.y + slope * self.x) % self.curve.p
        return Point(x, y, self.curve)
    
    def __mul__(self, other):
        if not isinstance(other, int):
            raise TypeError("Non-scalar multiplication not defined")
        if other < 0:
            # (-k) * P = k * (-P)
            return self.negation() * (-other)
        # Start from the identity so that 0 * P is the identity
        prod = Point.identity(self.curve)
        for _ in range(other):
            prod += self
        return prod
        

    def __repr__(self):
        if self.is_identity():
            return "<Point 0>"
        return f"<Point ({self.x}, {self.y})>"
```
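
The `__mul__` above uses repeated addition, which takes time linear in the scalar. For larger scalars, a double-and-add loop needs only logarithmically many group operations. Below is a self-contained sketch that represents points as `(x, y)` tuples and the identity as `None`; the function names `ec_add` and `ec_mul` are my own and are not part of the class above.

```python
def ec_add(P, Q, a: int, p: int):
    """Add two points on y^2 = x^3 + a*x + b over F_p (None is the identity).

    Coordinates are assumed to be already reduced modulo p.
    """
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # P + (-P), including doubling a point with y = 0
    if P == Q:
        slope = (3 * x1 * x1 + a) * pow((2 * y1) % p, -1, p) % p
    else:
        slope = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (slope * slope - x1 - x2) % p
    y3 = (slope * (x1 - x3) - y1) % p
    return (x3, y3)


def ec_mul(k: int, P, a: int, p: int):
    """Double-and-add scalar multiplication k * P for k >= 0."""
    result = None  # identity
    addend = P
    while k > 0:
        if k & 1:
            result = ec_add(result, addend, a, p)
        addend = ec_add(addend, addend, a, p)
        k >>= 1
    return result
```

For example, on the curve $y^2 = x^3 + 2x + 3$ over $\mathbb{F}_{19}$ used in part a), `ec_mul(2, (1, 14), 2, 19)` gives `(3, 6)`.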

## a)
Here is the PRG:

```python
# a)
class ECPRG:
    """The toy pseudorandom generator"""
    def __init__(self, curve: EllipticCurve, p1: Point, p2: Point, start: int):
        self.curve = curve
        self.p1 = p1
        self.p2 = p2
        self.start = start
    
    def generate(self, limit: int | None = None):
        s_prev = self.start
        count = 1
        while limit is None or (count <= limit):
            r = (self.p1 * s_prev).x
            s_prev = (self.p1 * r).x
            t = (self.p2 * r).x
            yield t
            count += 1

curve = EllipticCurve(19, 2, 3)
p1 = Point(1, 14, curve)
p2 = Point(3, 13, curve)

prg = ECPRG(curve, p1, p2, 2)
for t in prg.generate(3):
    print(t)
```

The outputs are 15, 18, 3.

## b)
First we give the algorithm, then we will explain why it works.

```
1. Solve the elliptic curve equation with x = t_1. Among the two solutions, pick one of them.
2. Compute c * X, where X is the chosen solution from step 1
3. s_1 is the x-coordinate of c * X
```

Given $t_1 = x(r_1Q)$, we know the x-coordinate of the point $r_1Q$. From here we can try to solve for $y$ on the elliptic curve. Because a quadratic equation over the field $\mathbb{F}_p$ has at most two roots, there can be up to two distinct solutions, and since $y$ appears only as $y^2$, if there are two solutions then they are mirrors of each other across the x-axis. Denote the two solutions by $X_1, X_2$, then we know one of them is exactly $r_1Q$ and $X_1 = -X_2$.

We can substitute $P = cQ$ into the equation for $s_1$:

$$
\begin{aligned}
s_1 &= x(r_1P) \\
&= x(r_1 \cdot (cQ))
\end{aligned}
$$

Because scalar multiplication is defined as repeated group operation within $E(\mathbb{F}_p)$, we know $r_1 \cdot (cQ) = c \cdot (r_1Q)$, so we have

$$
s_1 = x(c \cdot (r_1Q))
$$

We don't know the exact value of $r_1Q$, but we know two candidate values that are inverses of each other. Furthermore, we know that inverses are preserved across scalar multiplication: $-(c \cdot X) = c \cdot (-X)$, and a point and its inverse share the same x-coordinate. Therefore, regardless of which of $X_1, X_2$ we plug into the equation above, the output value will be the same and be the correct value: $s_1 = x(c \cdot X_1) = x(c \cdot X_2)$. Thus we have predicted the value of $s_1$.

## c)
Here is the attack:

```python
curve = EllipticCurve(103, 3, 4)
q = Point(2, 11, curve)
p = Point(84, 68, curve)
c = 3
assert p == q * c

t1 = 42
sols = curve.solve_y(t1)
# Either root works because x(c * X) == x(c * (-X))
r1q = Point(t1, sols[1], curve)
r1p = r1q * c
s1 = r1p.x  # Recovered s1
print(s1)
r2 = (p * s1).x
t2 = (q * r2).x  # Predicted next output
print(t2)
```

I recovered $s_1 = 102$, and I predict $t_2 = 37$.

## d)
I propose to pick the points $P, Q$ such that a linear relationship $P = cQ$ is impossible.

It is possible to choose a curve whose group of points does not have prime order, which means that the group has a proper non-trivial subgroup (for example, the group of points on [curve25519](https://en.wikipedia.org/wiki/Curve25519) has order $8\ell$ for a large prime $\ell$, so it has a proper subgroup of prime order $\ell$). Let $G \subsetneq E(\mathbb{F}_p)$ denote such a non-trivial proper subgroup. Then we can pick a point $Q$ from within $G$ and a point $P$ from outside $G$, and $P, Q$ are guaranteed to not have a linear relationship $P = cQ$ because $cQ$ is guaranteed to be in the subgroup $G$ while $P$ is not.