### Sieve of Eratosthenes

- We previously saw how to check whether a single number is prime
- Now imagine we need to classify every number in a large range (say, the first 100 million numbers). Calling `is_prime()` 100 million times would repeat a lot of work. This is where the sieve comes in handy

- The idea behind the sieve is simple. Imagine the set of numbers $\{1, 2, \ldots, 50\}$
    - Starting from 2 (which is prime), we mark every multiple $2x$ with $x \geq 2$ as non-prime, stopping once $2x > 50$.
    - Next, we find that 3 is prime (since it has not yet been marked as non-prime), and do the same thing.
        - Notice that we only need to start marking numbers as non-prime from $3^2$. This is because every multiple of 3 below $3^2$ (here only $3 \cdot 2$) has a smaller prime factor and is therefore already marked as non-prime!
    - 4 is non-prime (already marked as $2 \cdot 2$), so we skip it.
    - Once we reach a value $x$ where $x^2$ exceeds 50, we can terminate the loop: every non-prime up to 50 has a prime factor of at most $\sqrt{50}$, so it has already been marked. The numbers still unmarked are the primes.
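
The walk-through above can be traced with a small sketch. The helper name `sieve_trace` is made up for this illustration; it records, for each prime $p$ with $p^2 \leq 50$, which numbers are crossed out for the first time by $p$.

```python
def sieve_trace(limit: int) -> dict[int, list[int]]:
    """For each prime p with p*p <= limit, list the numbers first crossed out by p."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    crossed_out = {}
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            newly_marked = []
            # Start at p*p: smaller multiples of p have a smaller prime factor
            for multiple in range(p * p, limit + 1, p):
                if is_prime[multiple]:
                    newly_marked.append(multiple)
                is_prime[multiple] = False
            crossed_out[p] = newly_marked
        p += 1
    return crossed_out

for p, marked in sieve_trace(50).items():
    print(p, marked)
```

Only the primes 2, 3, 5 and 7 do any marking, since $11^2 > 50$, and each later prime has less new work to do than the one before it.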

- Time complexity
    - The time complexity of the sieve of Eratosthenes is somewhat tricky to calculate
    - We first note that, for the prime 2, we mark about $\frac{N}{2}$ numbers
    - Then for the prime 3, we mark about $\frac{N}{3}$ numbers
    - In general, for a prime $i$, we mark about $\frac{N}{i}$ numbers
    - Summing over all primes up to $N$, the total number of marking steps is about $\frac{N}{2} + \frac{N}{3} + \frac{N}{5} + \ldots = N \cdot \left[\frac{1}{2} + \frac{1}{3} + \frac{1}{5} + \ldots\right]$

    - It has been proven (https://en.wikipedia.org/wiki/Divergence_of_the_sum_of_the_reciprocals_of_the_primes) that the sum of the reciprocals of the primes up to $N$ grows like $\log \log N$

    - Therefore, the total running time is $O(N \cdot \log \log N)$
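
As a rough empirical check of this bound, the sketch below counts how many marking steps a sieve performs and compares the count to $N \ln \ln N$. The name `count_marking_steps` is hypothetical, and the sieve here starts each prime at $p^2$, so the count comes out somewhat below the $\frac{N}{2} + \frac{N}{3} + \ldots$ estimate above.

```python
import math

def count_marking_steps(n: int) -> int:
    """Count inner-loop writes of a sieve up to n (n >= 1) that starts each prime at p*p."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    steps = 0
    p = 2
    while p * p <= n:
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = 0
                steps += 1
        p += 1
    return steps

for n in (10**3, 10**4, 10**5):
    steps = count_marking_steps(n)
    # The ratio should change only slowly as n grows if the bound holds
    print(n, steps, round(steps / (n * math.log(math.log(n))), 3))
```

The printed ratio is only an illustration of the growth rate, not a proof; constant factors and the $p^2$ starting point both affect its value.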

In [1]:
import numpy as np

def sieve_of_eratosthenes(largest_number: int) -> np.ndarray:
    '''
    Time complexity: O(N log log(N)), by sum of prime reciprocals
    '''
    is_prime = [True] * (largest_number+1)
    is_prime[0] = False
    is_prime[1] = False

    for num in range(2, largest_number+1):
        # Every non-prime <= largest_number has a prime factor <= its square root,
        # so nothing is left to mark once num**2 exceeds largest_number.
        if num**2 > largest_number:
            break
        if is_prime[num]:
            # Note that we start from num**2 instead of num. For any given prime,
            # the multiples involving smaller factors have already been accounted
            # for earlier in the loop. For example, when sieving out multiples of 5,
            # there is no reason to look at 5*2 and 5*3, because they were resolved
            # in the loops for the primes 2 and 3. 5*4 is also done, because
            # 5*4 = 5*2*2 = 2*10, which was resolved in the 2 loop.
            for num_multiple in range(num**2, largest_number+1, num):
                is_prime[num_multiple] = False
                
    return np.where(is_prime)[0]
    
sieve_of_eratosthenes(50)

array([ 2,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47])

### Find the N-th prime

- Assume N is at most 1 million
- The first 1 million primes all lie within the first 20 million numbers (the 1,000,000th prime is 15,485,863), so sieving up to 20 million is enough

In [22]:
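def find_nth_prime(n: int) -> int:
    # Sieve limit: the first 1 million primes all lie below 20 million
    lim = 20_000_000
    # A boolean array uses far less memory than the default float64
    is_prime = np.ones(lim+1, dtype=bool)
    is_prime[0], is_prime[1] = False, False
    
    # Only primes up to sqrt(lim) are needed to mark every non-prime <= lim
    for num in range(2, int(lim**0.5) + 1):
        if is_prime[num]:
            is_prime[num**2 : lim+1 : num] = False
    
    # Indices still flagged True are the primes, in increasing order
    primes = np.flatnonzero(is_prime)
    return int(primes[n-1])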

find_nth_prime(300)

1987
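
The fixed 20 million limit sieves far more numbers than needed when `n` is small. One possible alternative, sketched below, sizes the sieve from `n` using the standard upper bound $p_n < n(\ln n + \ln \ln n)$, valid for $n \geq 6$. This is a swapped-in technique rather than the approach used above; the names `nth_prime_upper_bound` and `find_nth_prime_bounded` are hypothetical, and the sketch uses a plain `bytearray` instead of NumPy to stay self-contained.

```python
import math

def nth_prime_upper_bound(n: int) -> int:
    """Upper bound on the n-th prime: n (ln n + ln ln n) for n >= 6."""
    if n < 6:
        return 13  # comfortably covers the first five primes: 2, 3, 5, 7, 11
    return math.ceil(n * (math.log(n) + math.log(math.log(n))))

def find_nth_prime_bounded(n: int) -> int:
    if n < 1:
        raise ValueError("n must be at least 1")
    lim = nth_prime_upper_bound(n)
    is_prime = bytearray([1]) * (lim + 1)
    is_prime[0] = is_prime[1] = 0
    for num in range(2, math.isqrt(lim) + 1):
        if is_prime[num]:
            start = num * num
            # Zero out every multiple of num from num**2 onward in one slice write
            is_prime[start::num] = bytes(len(range(start, lim + 1, num)))
    # Walk the flags in increasing order until the n-th prime is reached
    count = 0
    for value, flag in enumerate(is_prime):
        if flag:
            count += 1
            if count == n:
                return value
    raise AssertionError("upper bound was too small")

print(find_nth_prime_bounded(300))
```

With this bound, finding the 300th prime sieves only a few thousand numbers instead of 20 million.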