
Parallel payload caching #4860

Merged
adamnovak merged 8 commits into hublabel-debug from parallel-payload-caching on Mar 26, 2026

@adamnovak
Member

This adds @electricEpilith's parallel payload caching to #4857 (which shouldn't merge until after this).

electricEpilith and others added 8 commits March 21, 2026 15:22
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
cache_payloads was single-threaded despite the -t flag; with 164M nodes
on an HPRC graph it hung for hours. Two fixes:

1. Pass `true` to for_each_handle to enable OpenMP parallelism; guard
   the non-thread-safe writes (oversized_zipcodes vector and
   node_id_to_payload map) with named omp critical sections.

2. Call distance_index->preload(true) immediately before cache_payloads
   in build_minimizer_index. find_frequent_kmers runs for ~3300 s before
   this point and evicts the mmap'd index pages, causing a page fault on
   every snarl-tree lookup in fill_in_zipcode_from_pos. Reloading here
   ensures the index is warm when the parallel loop starts.
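
The locking pattern in fix 1 can be sketched roughly as follows. This is a minimal illustration and not the vg code: `Payload`, `PayloadCache`, `compute_payload`, and `cache_payloads_sketch` are hypothetical names, and a plain OpenMP `parallel for` stands in for `for_each_handle` with parallelism enabled. The expensive per-node work runs unlocked, and only the two shared containers are guarded, each by its own named critical section so a write to one does not block a write to the other.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a per-node payload (not vg's actual type).
struct Payload {
    std::uint64_t first = 0;
    std::uint64_t second = 0;
};

// Shared results; neither container is safe for concurrent writes.
struct PayloadCache {
    std::vector<std::uint64_t> oversized;                // nodes whose payload did not fit
    std::unordered_map<std::uint64_t, Payload> by_node;  // node id -> payload
};

// Hypothetical per-node computation. For illustration only, every node id
// that is a multiple of 1000 is treated as "oversized".
inline bool compute_payload(std::uint64_t node_id, Payload& out) {
    if (node_id % 1000 == 0) {
        return false;
    }
    out = Payload{node_id * 2, node_id * 3};
    return true;
}

inline PayloadCache cache_payloads_sketch(std::uint64_t node_count) {
    PayloadCache cache;
    const std::int64_t last = static_cast<std::int64_t>(node_count);
    // If OpenMP is not enabled the pragmas are ignored and the loop runs serially.
    #pragma omp parallel for schedule(dynamic, 1024)
    for (std::int64_t i = 1; i <= last; ++i) {
        const std::uint64_t node_id = static_cast<std::uint64_t>(i);
        Payload payload;
        const bool fits = compute_payload(node_id, payload);  // unlocked work
        if (!fits) {
            #pragma omp critical (oversized_write)
            {
                cache.oversized.push_back(node_id);
            }
        } else {
            #pragma omp critical (payload_map_write)
            {
                cache.by_node.emplace(node_id, payload);
            }
        }
    }
    return cache;
}
```

Under OpenMP the order of entries in `oversized` depends on thread scheduling, so anything downstream that needs a stable order would have to sort it.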

Also add a depth guard (abort at >10000) in fill_in_zipcode_from_pos to
catch any future infinite loops in the snarl tree traversal.
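
A depth guard of this kind might look like the sketch below. It is an assumption-laden illustration rather than the code in `fill_in_zipcode_from_pos`: the tree is represented as a hypothetical parent-index vector, `depth_to_root` is a made-up helper, and the guard throws an exception instead of aborting so that it can be exercised in a test. Only the limit of 10000 comes from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Limit taken from the commit message; everything else here is illustrative.
constexpr std::size_t MAX_TRAVERSAL_DEPTH = 10000;

// Hypothetical tree encoding: parent[i] is the parent of node i, and the
// root is its own parent. Returns the number of steps from node to the root.
inline std::size_t depth_to_root(const std::vector<std::size_t>& parent,
                                 std::size_t node) {
    std::size_t depth = 0;
    while (parent.at(node) != node) {
        // A malformed tree (for example a cycle) would loop forever here,
        // so fail loudly once the walk is implausibly deep.
        if (++depth > MAX_TRAVERSAL_DEPTH) {
            throw std::runtime_error("tree traversal exceeded depth limit");
        }
        node = parent[node];
    }
    return depth;
}
```

A well-formed chain such as `{0, 0, 1, 2}` returns normally, while a two-node cycle such as `{1, 0}` trips the guard instead of hanging.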

Also use distance_index.get_snarl_child_count() (O(1) record read)
instead of for_each_child iteration in get_regular/irregular_snarl_code.
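
The difference between the two ways of counting can be illustrated with a toy record type. Everything below is hypothetical (`SnarlRecordSketch` and both functions are invented for the example and say nothing about how the distance index stores snarls); it only shows why reading a stored count is cheaper than visiting every child through a callback.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical snarl record that stores its child count alongside the children.
struct SnarlRecordSketch {
    std::size_t child_count = 0;
    std::vector<int> children;

    void add_child(int child) {
        children.push_back(child);
        ++child_count;  // kept in sync on every insertion
    }

    // Callback-style iteration, loosely modeled on a for_each_child interface.
    void for_each_child(const std::function<void(int)>& visit) const {
        for (int child : children) {
            visit(child);
        }
    }
};

// O(n): one callback invocation per child just to obtain a number.
inline std::size_t count_by_iteration(const SnarlRecordSketch& snarl) {
    std::size_t count = 0;
    snarl.for_each_child([&count](int) { ++count; });
    return count;
}

// O(1): a single field read.
inline std::size_t count_from_record(const SnarlRecordSketch& snarl) {
    return snarl.child_count;
}
```

Both functions return the same value for a consistent record; the second simply avoids the traversal, which matters when it is called once per node across a very large graph.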

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@adamnovak
Member Author

This isn't really going to pass tests unless we can use c++2a for the standard on old compilers (and have it work). One of the toil-vg tests failed with a timeout and I restarted it in hopes that it's somehow not a real problem.

@adamnovak
Member Author

OK, the only problem now is the lack of C++20 support in the oldest compiler; when we really merge this development thread we'll have to bump it or sort that out.

@adamnovak adamnovak merged commit 9b48cfe into hublabel-debug Mar 26, 2026
1 check failed
@electricEpilith electricEpilith deleted the parallel-payload-caching branch March 31, 2026 04:02