Primecount

This program implements the algorithms of "Computing π(x): the combinatorial method" by Tomás Oliveira e Silva (2006). It is also a learning project for numerical computing in C++ and a sandbox to test out optimizations.

Tuning and Benchmarks

Alpha is a tuning parameter: a larger alpha means fewer segmented sieve blocks to iterate over, at the cost of more computation per block. From the paper, it must be <= x^(1/6) and should grow like O(log^3 x). I found that log10(x)^3 / 150 generates reasonable alpha values, and there is a decent amount of leeway.
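A minimal sketch of that heuristic, not copied from primecount.cpp. The function name `default_alpha` is hypothetical, and the lower clamp at 1 is an assumption on my part (the text above only states the upper bound).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: alpha = log10(x)^3 / 150, kept inside [1, x^(1/6)].
// The upper bound comes from the paper; the lower bound of 1 is an assumption.
double default_alpha(double x)
{
    assert(x >= 1.0);
    const double l = std::log10(x);
    const double alpha = l * l * l / 150.0;
    const double upper = std::pow(x, 1.0 / 6.0);  // alpha must not exceed x^(1/6)
    return std::max(1.0, std::min(alpha, upper));
}
```

For x = 10^18 this gives 18^3 / 150 = 38.88, well below the cap of x^(1/6) = 1000, which fits the remark about leeway.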

The following benchmarks used a block size of 2^20 and the default alpha value. They were compiled with g++ -O3 and run on my laptop (Ubuntu 22.04, GCC 11.4.0, i7-7700HQ). The CPU has 6 MiB of L3 cache, and experimentally a block size of 2^20 works well: it takes up 2^20 x 1/2 (only odd values are stored) x 4-byte ints = 2 MiB.

x	Time	π(x)
10^12	0.3s	37607912018
10^13	1.0s	346065536839
10^14	4.0s	3204941750802
10^15	18s	29844570422669
10^16	1m16s	279238341033925
10^17	5m50s	2623557157654233
10^18	30m38s	24739954287740860
10^19	186m	234057667276344607

Still a better use of electricity than crypto

Correctness

The values of π(x) at the powers of 10 in the table match Table IV of the paper. Previously I had only tested smaller values, and starting from x = 10^15 there was an overflow bug caused by cubing some values. Running with GCC's -ftrapv can detect signed overflow. There was also a crash because Algorithm 2 sometimes needs $p_{a + 1}$; this was mentioned in the paper but I missed it. I tried to use integer math mostly, so the most likely remaining errors are from imprecise floating point calculations like cbrt(x), where the input double can't even represent every integer past 2^53. I had some exact integer checks for these with __int128 but didn't look through them carefully.
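One way such an exact check could look is sketched below. This is an illustration, not the code in primecount.cpp: the name `icbrt` is hypothetical, and `__int128` is a GCC/Clang extension. The floating point result is treated only as a first guess and then corrected with exact integer comparisons.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: floor of the cube root of x, exact for all 0 <= x < 2^63.
std::int64_t icbrt(std::int64_t x)
{
    assert(x >= 0);
    // First guess from floating point; may be off by one for large x.
    std::int64_t r = static_cast<std::int64_t>(std::cbrt(static_cast<double>(x)));
    // Cube in 128-bit arithmetic so the comparison itself cannot overflow.
    auto cube = [](std::int64_t v) { return static_cast<__int128>(v) * v * v; };
    while (cube(r) > x) --r;       // guess was too high
    while (cube(r + 1) <= x) ++r;  // guess was too low
    return r;
}
```

For example, 10^18 - 1 is not representable as a double and rounds to 10^18, so the first guess is 10^6; the first loop then steps it down to the correct value 999999.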

Basics of the Meissel-Lehmer method

Legendre (1808) was the first to notice that computing $π (x)$ does not require explicitly determining all primes up to $x$, by making use of inclusion-exclusion. The computation here uses the Meissel (1870) method, as improved by Lehmer (1959), Lagarias-Miller-Odlyzko (1985), and Deléglise-Rivat (1996). It is combinatorial in nature and requires only elementary number theory, with some basics about prime factorization and the Möbius function's role in inclusion-exclusion.

Lagarias and Odlyzko (1987) described an analytic algorithm with better time complexity O(x^(1/2+ε)), but it goes way over my head and in practice has not been faster than the combinatorial method yet. The implementation also uses interval arithmetic. Don't expect that project from me any time soon.

Let $ϕ (x, a)$ count the positive integers up to $x$ (inclusive) that are not divisible by any of the first $a$ primes, i.e. no small prime factors. Also let $ϕ_{k} (x, a)$ count those integers with exactly $k$ prime factors (including repetitions).

By the fundamental theorem of arithmetic,

$ϕ (x, a) = ϕ_{0} (x, a) + ϕ_{1} (x, a) + ϕ_{2} (x, a) + \dots$

Since the prime factors have to be greater than $p_{a}$, we have $ϕ_{k} (x, a) = 0$ when $a \geq π (\sqrt[k]{x})$. We also have $ϕ_{0} (x, a) = 1$ (the only number with $0$ prime factors is $1$), and for $a \leq π (x)$ we have $ϕ_{1} (x, a) = π (x) - a$ (the primes $> p_{a}$).
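These definitions can be checked by brute force on tiny inputs. The sketch below is only a reference for testing, far too slow for real use; the name `phi_k_bruteforce` and the convention that `primes[i]` holds $p_{i + 1}$ are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Counts n <= x that are not divisible by any of the first a primes and have
// exactly k prime factors counted with multiplicity. primes[i] holds p_{i+1}.
std::int64_t phi_k_bruteforce(std::int64_t x, int a, int k,
                              const std::vector<std::int64_t>& primes)
{
    assert(a >= 0 && a <= static_cast<int>(primes.size()));
    std::int64_t count = 0;
    for (std::int64_t n = 1; n <= x; ++n) {
        bool has_small_factor = false;
        for (int i = 0; i < a; ++i)
            if (n % primes[i] == 0) { has_small_factor = true; break; }
        if (has_small_factor) continue;
        // Count prime factors with multiplicity by trial division.
        int factors = 0;
        std::int64_t m = n;
        for (std::int64_t d = 2; d * d <= m; ++d)
            while (m % d == 0) { m /= d; ++factors; }
        if (m > 1) ++factors;
        if (factors == k) ++count;
    }
    return count;
}
```

With $x = 100$ and $a = 3$ this gives $ϕ_{0} = 1$, $ϕ_{1} = 22 = π (100) - 3$, $ϕ_{2} = 3$ (namely $49, 77, 91$), and $ϕ_{3} = 0$.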

The Lagarias et al. methods use $a = π (α \sqrt[3]{x})$ with $1 \leq α \leq x^{1 / 6}$, so $π (\sqrt[3]{x}) \leq a \leq π (\sqrt{x})$. The lower bound makes $ϕ_{k} (x, a) = 0$ for all $k \geq 3$. Therefore

$ϕ (x, a) = ϕ_{0} (x, a) + ϕ_{1} (x, a) + ϕ_{2} (x, a)$

Rearranging, with $ϕ_{0} (x, a) = 1$ and $ϕ_{1} (x, a) = π (x) - a$, gives

$π (x) = ϕ (x, a) + a - 1 - ϕ_{2} (x, a)$
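As a small sanity check of this identity (the numbers are my own choice, not taken from the paper), take $x = 100$ and $a = 3$, so the first $a$ primes are $2, 3, 5$ and $π (\sqrt[3]{100}) = 2 \leq a \leq π (\sqrt{100}) = 4$. Inclusion-exclusion over $2, 3, 5$ gives $ϕ (100, 3)$, and the only products of two primes greater than $5$ that stay below $100$ are $7 \cdot 7$, $7 \cdot 11$ and $7 \cdot 13$:

$\begin{aligned} ϕ (100, 3) & = 100 - 50 - 33 - 20 + 16 + 10 + 6 - 3 = 26 \\ ϕ_{2} (100, 3) & = 3 \\ π (100) & = 26 + 3 - 1 - 3 = 25 \end{aligned}$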

The computation of $ϕ_{2} (x, a)$ is relatively easy to do combinatorially.
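One standard way to write it is $ϕ_{2} (x, a) = \sum_{a < b \leq π (\sqrt{x})} (π (x / p_{b}) - (b - 1))$: for each prime $p_{b}$, count the primes $q$ with $p_{b} \leq q \leq x / p_{b}$. The sketch below evaluates that sum with a plain sieve up to $x / 2$, so it only suits small $x$. The name `phi2_small` is hypothetical, and this is not how primecount.cpp necessarily organizes the computation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// phi_2(x, a): numbers <= x that are a product of exactly two primes, both > p_a.
std::int64_t phi2_small(std::int64_t x, int a)
{
    assert(x >= 4 && a >= 0);
    const std::int64_t limit = x / 2;  // x / p_b never exceeds x / 2
    std::vector<bool> composite(limit + 1, false);
    std::vector<std::int64_t> pi(limit + 1, 0);  // pi[n] = number of primes <= n
    std::vector<std::int64_t> primes;            // primes[b - 1] = p_b
    for (std::int64_t n = 2; n <= limit; ++n) {
        if (!composite[n]) {
            primes.push_back(n);
            for (std::int64_t m = n * n; m <= limit; m += n) composite[m] = true;
        }
        pi[n] = static_cast<std::int64_t>(primes.size());
    }
    std::int64_t total = 0;
    for (std::size_t b = static_cast<std::size_t>(a) + 1; b <= primes.size(); ++b) {
        const std::int64_t p = primes[b - 1];
        if (p * p > x) break;  // only primes up to sqrt(x) contribute
        total += pi[x / p] - static_cast<std::int64_t>(b - 1);
    }
    return total;
}
```

For example, `phi2_small(100, 3)` counts $49, 77, 91$, and `phi2_small(100, 0)` counts all semiprimes up to $100$.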

The main computation $ϕ (x, a)$ is based on the recurrence which forms a binary tree:

$\begin{aligned} ϕ (x, a) & = \sum_{n = 1}^{⌊ x ⌋} \left( [p_{\min} (n) \geq p_{a}] - [p_{\min} (n) = p_{a}] \right) \\ & = ϕ (x, a - 1) - \sum_{n = 1}^{⌊ x / p_{a} ⌋} [p_{\min} (n) \geq p_{a}] \\ & = ϕ (x, a - 1) - ϕ (x / p_{a}, a - 1) \end{aligned}$
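Taken literally, the recurrence with the base case $ϕ (x, 0) = ⌊ x ⌋$ can be sketched as a direct recursion. This visits the whole binary tree, so it is exponential in $a$ and only useful as a reference on small inputs. The name `phi_recursive` and the convention that `primes[i]` holds $p_{i + 1}$ are assumptions for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// phi(x, a) = phi(x, a - 1) - phi(x / p_a, a - 1), with phi(x, 0) = x.
// Integer division is fine here because floor(floor(x) / p) == floor(x / p).
std::int64_t phi_recursive(std::int64_t x, int a,
                           const std::vector<std::int64_t>& primes)
{
    assert(a >= 0 && a <= static_cast<int>(primes.size()));
    if (x <= 0) return 0;  // nothing left to count
    if (a == 0) return x;  // no primes removed: every n in [1, x] counts
    return phi_recursive(x, a - 1, primes)
         - phi_recursive(x / primes[a - 1], a - 1, primes);
}
```

With `primes = {2, 3, 5, 7}`, `phi_recursive(100, 3, primes)` counts the integers up to $100$ coprime to $30$.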

Lehmer's original algorithm stops recursing at $ϕ (x, c)$, which is looked up in a pre-computed table for a fixed $c$. Lagarias et al. showed that this examines too many leaves, so the rest of the improvements are about deciding when to split a node (recurse) and when not to (leaf), along with lots of fancy counting of the contributions of the leaves using a sieve.

The algorithm as a whole takes $O (x^{2 / 3})$ time, and the sieve is segmented to only use $O (α \sqrt[3]{x})$ memory. Without segmenting, it would be more straightforward to sieve the entire interval $[1, z)$ where $z = x^{2 / 3} / α$. Over each interval $B_{k} = [z_{k - 1}, z_{k})$, the contribution of $ϕ (y, b)$ for $y \in B_{k}$ and $c \leq b < a$ is computed. A Fenwick tree is used to compute this efficiently as a prefix sum (plus the saved $ϕ (z_{k - 1} - 1, b)$ from previous blocks) while also allowing fast updates to the indicator array as multiples of $p_{b}$ are sieved out.
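The Fenwick tree idea can be sketched on a single block as follows. This is a simplified illustration rather than the layout used in primecount.cpp: it stores every integer instead of only odd values, and the names `Fenwick` and `sieve_out` are hypothetical. Both the update and the prefix sum take $O (\log n)$ steps.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal Fenwick (binary indexed) tree over a 0/1 indicator array.
class Fenwick {
public:
    explicit Fenwick(std::size_t n) : tree_(n + 1, 0) {}

    // Add delta to the entry at 0-based index i.
    void add(std::size_t i, std::int32_t delta)
    {
        assert(i + 1 < tree_.size());
        for (std::size_t j = i + 1; j < tree_.size(); j += j & (~j + 1))
            tree_[j] += delta;
    }

    // Sum of the entries at 0-based indices 0..i inclusive.
    std::int64_t prefix_sum(std::size_t i) const
    {
        assert(i + 1 < tree_.size());
        std::int64_t sum = 0;
        for (std::size_t j = i + 1; j > 0; j -= j & (~j + 1))
            sum += tree_[j];
        return sum;
    }

private:
    std::vector<std::int32_t> tree_;  // 1-based internally
};

// Remove the multiples of p from the block [lo, lo + marked.size()),
// touching the tree only for entries that are still marked.
void sieve_out(std::int64_t lo, std::int64_t p,
               std::vector<bool>& marked, Fenwick& tree)
{
    const std::int64_t hi = lo + static_cast<std::int64_t>(marked.size());
    for (std::int64_t m = (lo + p - 1) / p * p; m < hi; m += p) {
        const std::size_t i = static_cast<std::size_t>(m - lo);
        if (marked[i]) {
            marked[i] = false;
            tree.add(i, -1);
        }
    }
}
```

For a block starting at `lo = 1` with every entry initially marked, sieving out $2, 3, 5$ and then calling `prefix_sum(y - 1)` returns $ϕ (y, 3)$. For later blocks the saved count from the previous blocks would be added on top.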
