Fix SWI-Prolog 10 stack exhaustion and improve memory management by hoijnet · Pull Request #2377 · terminusdb/terminusdb

hoijnet · 2026-02-15T08:49:58Z

This branch addresses a class of stack overflow crashes that occur under SWI-Prolog 10 when large transaction_object dicts are carried through recursive Prolog call chains. It also enables correct LRU cache eviction in terminus-store and replaces the system allocator with jemalloc to reduce memory fragmentation in long-running server processes.

It also makes the integration tests run the optimizer explicitly between each test, so optimization is well controlled, instead of relying on the auto-optimizer, which runs between transactions with 10% probability.

With these changes, a WOQL-heavy set of integration tests runs approximately 15% faster, and performance is sustained over long runs. The gains come from the auto-optimizer, stack and memory usage improvements, tabling optimization, jemalloc, and many other changes made across the codebase to support swipl 10 and improve stability.

Background

SWI-Prolog 10 introduced stricter stack segment management. The trie_gen_compiled/2 primitive, used internally by tabled predicates, asserts gTop+1 <= gMax && tTop+2 <= tMax at entry. When recursive predicates carried the full transaction_object (a deeply nested dict containing schema, instance, and inference graphs plus all metadata), each stack frame consumed significantly more space than necessary. This left insufficient headroom for trie operations, triggering hard crashes on databases with moderately complex schemas.

The core fix is straightforward: extract the lightweight Schema (a simple list of read-write objects) once at each entry point, then thread it through the recursive chain instead of the full transaction object. Wrapper predicates that extract schema internally — like is_subdocument/2, class_predicate_type/4, oneof_descriptor/3 — are replaced with their direct schema_* equivalents where the schema is already available.
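The pattern can be sketched roughly as follows. This is a toy illustration, not code from this branch: count_subdocuments/3, count_subdocuments_/3, add_child_count/4, schema_of/2, and the shape of the dict are hypothetical names and assumptions chosen for the example.

```prolog
% Hypothetical stand-in for a transaction object: a dict with a small
% schema field plus large fields that the recursion never needs.
schema_of(Transaction, Schema) :-
    get_dict(schema, Transaction, Schema).

% Entry point: extract the lightweight Schema once, then delegate.
count_subdocuments(Transaction, Doc, Count) :-
    schema_of(Transaction, Schema),
    count_subdocuments_(Doc, Schema, Count).

% Recursive worker: only Schema is threaded through the frames.
count_subdocuments_(doc(Class, Children), Schema, Count) :-
    (   memberchk(subdocument(Class), Schema)
    ->  Own = 1
    ;   Own = 0
    ),
    foldl(add_child_count(Schema), Children, Own, Count).

% Adds the subdocument count of one child to the running total.
add_child_count(Schema, Child, Acc0, Acc) :-
    count_subdocuments_(Child, Schema, N),
    Acc is Acc0 + N.
```

The entry point keeps the signature callers already use, while the underscore-suffixed worker never sees the full object.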

Changes

Prolog stack pressure reduction

inference.pl — The inference chain (infer_type, infer_range, infer_object_type, and related predicates) previously threaded Database through every recursive frame. Refactored to extract Schema at entry points and pass it through the entire chain. Prefix merging moved to entry points to avoid repeated computation.

json.pl — Two recursive chains fixed:

json_assign_ids extracts Schema once, delegates to json_assign_ids_ which carries the lightweight schema through ID generation for nested subdocuments. New json_idgen_schema and get_field_values_ variants take Schema directly.
get_document extracts Schema and Instance once, delegates to get_document_ to avoid carrying the full transaction object through document retrieval.

migration.pl — strip_nonconforming_ids extracts Schema once at the entry point. The recursive strip_nonconforming_value_ uses schema_is_subdocument and schema_key_descriptor directly, eliminating the large transaction object from convlist lambda closures that previously captured it on every frame.
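As a rough sketch of the same idea with convlist/3, the closure below is a partial application that holds only the schema. The names strip_ids/3, strip_value/3, and schema_is_subdocument_sketch/2, the term shapes, and the stripping rule itself are assumptions made for illustration and do not reproduce the real strip_nonconforming_ids logic.

```prolog
% Hypothetical schema lookup over a plain list of facts.
schema_is_subdocument_sketch(Schema, Class) :-
    memberchk(subdocument(Class), Schema).

% Entry point: take Schema out of the (toy) transaction dict once.
strip_ids(Transaction, Values, Stripped) :-
    get_dict(schema, Transaction, Schema),
    convlist(strip_value(Schema), Values, Stripped).

% Plain values pass through unchanged. Terms matching neither clause
% make strip_value/3 fail, so convlist/3 omits them from the result.
strip_value(_, value(V), value(V)).
strip_value(Schema, doc(Class, Id, Fields0), Out) :-
    convlist(strip_value(Schema), Fields0, Fields),
    (   schema_is_subdocument_sketch(Schema, Class)
    ->  Out = doc(Class, Fields)        % subdocument: id is dropped
    ;   Out = doc(Class, Id, Fields)    % top-level document: id is kept
    ).
```

The closure strip_value(Schema) passed to convlist/3 refers to the schema list only, so nothing in the recursion references the transaction dict.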

instance.pl — The refute_instance validation chain threads Schema from the entry points (refute_instance/2, refute_instance_schema/2) through refute_subject, refute_subject_1, refute_typed_subject, refute_cardinality, refute_cardinality_new, refute_object_type, and refute_object_type_. Each predicate now uses direct schema_* calls (schema_class_predicate_type, schema_oneof_descriptor, schema_is_abstract, is_schema_foreign, etc.) instead of wrappers that re-extract the schema. Old-arity predicates are kept as thin wrappers where external callers depend on them.

schema.pl — Exports is_schema_foreign/2, schema_class_predicate_type/4, schema_class_subsumed/3, and is_schema_simple_class/2 to support direct schema-based lookups from other modules.

Memory allocator

terminusdb-dylib — Replaces the system malloc with jemalloc via tikv-jemallocator. The default glibc allocator on Linux tends to fragment memory in long-running processes with many small allocations (common in layer cache operations). Jemalloc uses thread-local caches and size-class-based arenas that significantly reduce fragmentation. Configured with background_threads for asynchronous purging and disable_initial_exec_tls for compatibility as a dynamically loaded library. Only enabled on non-MSVC targets.

Test infrastructure

test_utils.pl — Improved spawn_server_1 to collect stderr lines during server startup and retry on a different port when the spawned server fails to start. Previously, server_has_no_output threw past the between/3 retry loop, causing flaky push/pull tests when ports were temporarily unavailable.
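A minimal sketch of a retry loop in which a startup error cannot escape between/3 is shown below. The predicates start_server_sketch/1 and spawn_with_retry/3, the port numbers, and the shape of the error term are hypothetical; the real spawn_server_1 also collects stderr lines, which is omitted here.

```prolog
% Hypothetical starter: pretend that ports below 8002 are unavailable.
start_server_sketch(Port) :-
    (   Port < 8002
    ->  throw(error(server_has_no_output(Port), _))
    ;   true
    ).

% Try Base, Base+1, ... up to Base+Retries. The catch/3 sits inside the
% loop and turns a startup error into failure, so backtracking re-enters
% between/3 and the next port is tried.
spawn_with_retry(Base, Retries, Port) :-
    Max is Base + Retries,
    between(Base, Max, Port),
    catch(start_server_sketch(Port),
          error(server_has_no_output(_), _),
          fail),
    !.
```

If the exception were not caught inside the loop, it would propagate past between/3 and abort the whole search on the first failed port.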

hoijnet added 14 commits February 14, 2026 03:00

Clarify docs

72c864d

Migrate to the new schema_class_frame approach

36c2a42

Recursive inference chain carried the full 3-level nested object through frames. Now lightweight.

d4b4c7e

Also fix the potential issues related to get_document stack segment exhaustion

b96e1a6

Fix flaky push/pull tests

dd1cf0b

Improve stack handling

d03de43

Enable correct LRU eviction

202bef8

Reduce use of huge stack objects

c8a28d3

Replace malloc with jemalloc to reduce memory fragmentation

6a33fde

Reduce table key size and remove problematic tabling

d8c5389

Enable clean restart of test server

5a2eb0e

Background alloc threads only on Linux

e17830a

Setup jemallocator as default on linux, no bg_threads on macOS

ae25ae9

Fixes #2369 add test optimizations

9108c55

hoijnet linked an issue Feb 15, 2026 that may be closed by this pull request

Redesign the integration tests to allow the auto-optimize.pl plugin to run #2369

Closed

hoijnet added 4 commits February 15, 2026 20:44

Merge branch 'main' into fix-swipl10-crash

53e817b

Update to tus 0.0.17 that is aligned with swipl-10

9754141

Merge branch 'fix-swipl10-crash' of ssh://github.com/terminusdb/terminusdb into fix-swipl10-crash

cf1f013

Update the snapcraft version to tus 0.0.17

0d60384

hoijnet linked an issue Feb 15, 2026 that may be closed by this pull request

Low prio: parse_upload_metadata test leaves choicepoint on SWI-Prolog 10 #2370

Closed

Fix lint

1e0c3cf

hoijnet requested a review from dfrnt-HansKochstein February 15, 2026 21:15

dfrnt-HansKochstein approved these changes Feb 16, 2026

hoijnet merged commit 9d98be3 into main Feb 16, 2026
14 checks passed

hoijnet deleted the fix-swipl10-crash branch February 16, 2026 12:54

hoijnet merged 19 commits into main from fix-swipl10-crash
