Model.tracker: snapshot-managed run state (stacked on #195) #196
Merged
Model.tracker is the authoritative *record* of where a run is (time, step, dt, plus any quantity the user parks on it), and it is automatically captured by Model.snapshot() and reverted by Model.restore(). A loose Python variable (model_time = 0.0 in a script) is not reverted; the same value on the tracker is. That contrast is the whole point.

Design (per Louis's intent, 2026-05-19):
- Authoritative as a *record*, NOT a dependency. Solvers and DDt are untouched; using the tracker is optional. It sits alongside DDt's own _dt_history (captured independently); it does not subsume it.
- User-extensible by plain attribute assignment: model.tracker.foo = ... registers foo as managed state, with no dataclass authoring and no special status in solvers.
- time/step/dt are ordinary pre-seeded managed entries (0.0/0/None), not privileged fields, consistent with "user-added quantities are first-class".
- git-stash semantics: restore replaces the managed map wholesale, so a quantity created after the snapshot is dropped on restore.

Implementation:
- src/underworld3/checkpoint/tracker.py: ModelTracker (uw_object subclass for instance_number) + TrackerState(SnapshottableState) carrying an open `managed` dict. Attribute routing: underscore names are real attributes, public names are managed entries. __setattr__ respects class-level data descriptors so the `state` property setter is honoured (without this guard restore would silently no-op; the test suite caught this, and `state` is therefore a reserved name). The .state getter deep-copies for isolation.
- Model: PrivateAttr _tracker, instantiated and auto-registered as a state-bearer in __init__; exposed via the `tracker` property. Zero new snapshot plumbing: the existing _state_bearers path picks it up.

Tests: tests/test_0009_model_tracker.py (9, tier_a level_1): defaults, builtins revert, user-quantity reverts, numpy-by-value deep-copy, post-snapshot quantity dropped on restore, the loose-var-vs-tracker contrast, bit-identical state roundtrip, and a realistic stepping-loop continuation.

Drive-by: test_symbolic_ddt_snapshot_is_deep_copy assumed state_bearers[0] was the DDt; with a tracker now always registered (WeakSet, unordered) it now finds the DDt state by type. Pre-existing fragility, exposed not introduced.

60 tests pass (24 snapshot + 3 real-solver + 9 tracker + 24 regression); parallel ptest still PASS at np 4 with the tracker auto-registered and snapshot/restored alongside everything else.

Stacked on feature/in-memory-checkpoint (depends on its Snapshottable/_state_bearers); PRs to development after #195 lands.

Underworld development team with AI support from Claude Code (https://claude.com/claude-code)
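For readers who want to see the attribute routing and the descriptor guard in isolation, here is a minimal pure-Python sketch. `TinyTracker` is a hypothetical stand-in written for this note, not the `ModelTracker` in tracker.py, and it assumes nothing about underworld3 beyond what the commit message describes: underscore names are real attributes, public names go into a managed dict, and `state` is a property whose setter must not be shadowed.

```python
import copy


class TinyTracker:
    """Hypothetical stand-in for ModelTracker; not the underworld3 class."""

    def __init__(self):
        # Underscore names are real attributes, so bypass the routing here.
        object.__setattr__(self, "_managed", {"time": 0.0, "step": 0, "dt": None})

    @property
    def state(self):
        # Deep copy so a held snapshot cannot be mutated through the tracker.
        return copy.deepcopy(self._managed)

    @state.setter
    def state(self, saved):
        # Wholesale replacement: entries created after the snapshot vanish.
        object.__setattr__(self, "_managed", copy.deepcopy(saved))

    def __setattr__(self, name, value):
        descriptor = getattr(type(self), name, None)
        if hasattr(descriptor, "__set__"):
            # Class-level data descriptor (here the `state` property):
            # let it handle the assignment instead of shadowing it.
            object.__setattr__(self, name, value)
        elif name.startswith("_"):
            object.__setattr__(self, name, value)
        else:
            self._managed[name] = value

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for managed entries.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._managed[name]
        except KeyError:
            raise AttributeError(name) from None


tracker = TinyTracker()
tracker.time = 1.5
saved = tracker.state                      # snapshot
tracker.time = 2.0
tracker.extra = "created after the snapshot"
tracker.state = saved                      # restore via the property setter
print(tracker.time, hasattr(tracker, "extra"))
```

Without the `hasattr(descriptor, "__set__")` branch, `tracker.state = saved` would be stored as a managed entry called `state` and the restore would have no effect, which is the failure mode the commit message describes.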
User-facing advanced guide covering Model.snapshot() / Model.restore() and Model.tracker, in the Sphinx/MyST form that builds into the readthedocs site. Distinct from the developer state-as-dataclass guide (that one is for people *extending* the mechanism; this is for people *using* it).

Contents: the "stash for timesteps" mental model and when to use it (backtrack, adaptive Δt, predictor-corrector, RK staging); the API; what is captured automatically; the loose-variable-vs-tracker trap and how Model.tracker solves it (with the reserved-name and git-stash-semantics caveats); a worked adaptive-Δt CFL backtracking loop; and an explicit guarantees/limitations section (bit-exact discard incl. parallel and through real solvers; in-memory only; fixed rank count; mesh-adapt refused; within-tolerance vs a never-snapshotted solver run).

Wired into docs/advanced/index.md prose listing and the hidden toctree. `pixi run -e amr-dev docs-build` succeeds; the page renders to docs/_build/html/advanced/snapshot-restore.html with no page-specific warnings and the toctree link resolves.

On feature/model-tracker because the guide documents both the snapshot toolkit (#195) and the tracker, so it can be complete and build against working code; lands with the tracker PR after #195.

Underworld development team with AI support from Claude Code (https://claude.com/claude-code)
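The adaptive-Δt backtracking pattern the guide covers can be sketched without underworld3 at all. The toy below is an assumption-laden illustration, not the worked example from the guide: `ToyModel`, `advance`, `speed`, and the convention that `snapshot()` returns a handle which `restore()` accepts are all hypothetical, since the commit text does not spell out the real signatures.

```python
import copy
from types import SimpleNamespace


class ToyModel:
    """Hypothetical stand-in; it only mimics snapshot()/restore() and a tracker."""

    def __init__(self):
        self.tracker = SimpleNamespace(time=0.0, step=0, dt=None)
        self.speed = 1.0  # toy field that grows, tightening the CFL limit

    def snapshot(self):
        # Assumed convention: return an opaque handle for restore().
        return copy.deepcopy((vars(self.tracker), self.speed))

    def restore(self, snap):
        managed, self.speed = copy.deepcopy(snap)
        self.tracker = SimpleNamespace(**managed)

    def advance(self, dt):
        self.tracker.time += dt
        self.tracker.step += 1
        self.tracker.dt = dt
        self.speed *= 1.0 + dt  # toy dynamics, not a real solver


def run(model, t_end, dt, dx=0.1, cfl_max=0.5):
    accepted = rejected = 0
    while model.tracker.time < t_end:
        snap = model.snapshot()            # stash before the trial step
        model.advance(dt)
        if model.speed * dt / dx > cfl_max:
            model.restore(snap)            # time, step and dt all snap back
            dt *= 0.5                      # retry with a smaller step
            rejected += 1
        else:
            accepted += 1
    return accepted, rejected


model = ToyModel()
accepted, rejected = run(model, t_end=0.2, dt=0.1)
print(accepted, rejected, model.tracker.step, model.tracker.time)
```

Because time and step live on the tracker, rejected trial steps leave no trace in either counter; had they been loose variables updated inside `advance`, each rejection would have needed a manual rollback.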
Two standalone runnable demos (tests/run_*.py convention), companions to tests/test_0007's back-stepping test and the new advanced user guide:
- run_snapshot_backstepping_demo.py: CFL-ratio time series. Two overlapping segments in the snap-back zone (the abandoned big step, dashed with a CFL spike, and the kept substep trajectory) make "time is multi-valued where you stashed" visible at a glance.
- run_snapshot_backstepping_spatial.py: 2x2 spatial panels (initial / after bad step / after restore / after substep recovery). Top-left and bottom-left are visually identical: the snap-back proof.

Each writes a PNG to the cwd; the PNGs themselves are regenerable output and are intentionally not committed.

Underworld development team with AI support from Claude Code (https://claude.com/claude-code)
3 tasks
lmoresi added a commit that referenced this pull request May 20, 2026
First slice of the on-disk snapshot format (v1.1). Establishes the file structure and the inspectability bar; no PETSc bulk yet (that is phase 2). Stacked on the in-memory snapshot toolkit (#195) and the model tracker (#196) so it can serialise both later.

What lands:
- src/underworld3/checkpoint/disk_snapshot.py
  - DISK_SNAPSHOT_SCHEMA_VERSION = 1
  - write_snapshot_skeleton(model, path): writes /metadata attrs + empty stub groups /mesh /variables /swarms /python_state (the structure phases 2+ will fill in).
  - read_snapshot_metadata(path): reads /metadata back as a plain dict, decodes JSON-encoded list fields for convenience, validates schema version.
  - inspect_snapshot(path): human-readable summary suitable for print(...) at a notebook prompt.
- src/underworld3/checkpoint/__init__.py: exports.
- tests/test_0010_snapshot_disk_format.py (7, tier_a level_1):
  - top-level group structure matches the spec
  - h5py-readable /metadata attrs cover identity, schema, tracker conventions, geometry, MPI rank count, and inventories of meshes / swarms / state-bearer classes / variables; this is the proxy for "an external user running h5ls/h5dump sees useful info"
  - read/write roundtrip
  - rejection of non-snapshot files and wrong-schema files with clear errors (not obscure h5py noise)
  - inspect_snapshot includes the key facts
  - skeleton groups carry `filled_by` attrs so phases 2/3 readers and external inspectors can tell whether content is populated yet.

Design notes encoded:
- UW3-controlled rich-metadata wrapper around PETSc bulk; pure PETSc HDF5 dumps fail the inspectability bar so are rejected as the format.
- List-typed metadata stored as JSON strings in scalar attrs so h5py / h5ls handle them cleanly; read API exposes them as plain Python lists alongside the *_json originals.
- Swarm storage left as a phase-3 decision: the metadata wrapper is designed to support `@external_file` on /swarms/swarm_X/ when individual swarms grow too bulky for a single file. No commitment to inline vs split until phase 3 has real swarm sizes in hand.

Stacked on feature/model-tracker; PRs to development after #195 and #196 land.

Underworld development team with AI support from Claude Code (https://claude.com/claude-code)
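The "lists as JSON strings in scalar attrs" design note can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the code in disk_snapshot.py: a plain dict stands in for the HDF5 attribute mapping, and the attribute names (`schema_version`, `meshes`, `mpi_size`) and both function names are hypothetical.

```python
import json

SCHEMA_VERSION = 1  # mirrors DISK_SNAPSHOT_SCHEMA_VERSION = 1 from the commit


def encode_metadata(metadata):
    """Flatten metadata to scalar attrs; lists become '<key>_json' strings."""
    attrs = {"schema_version": SCHEMA_VERSION}
    for key, value in metadata.items():
        if isinstance(value, (list, tuple)):
            attrs[f"{key}_json"] = json.dumps(list(value))
        else:
            attrs[key] = value
    return attrs


def decode_metadata(attrs):
    """Validate the schema, then expose lists alongside the *_json originals."""
    version = attrs.get("schema_version")
    if version is None:
        raise ValueError("not a snapshot file: no schema_version attribute")
    if version != SCHEMA_VERSION:
        raise ValueError(
            f"unsupported snapshot schema {version!r}; expected {SCHEMA_VERSION}"
        )
    decoded = dict(attrs)
    for key, value in attrs.items():
        if key.endswith("_json"):
            decoded[key[: -len("_json")]] = json.loads(value)
    return decoded


# Hypothetical metadata; the real attribute names are not given in the commit.
attrs = encode_metadata({"mpi_size": 4, "meshes": ["mesh_0", "mesh_1"]})
print(attrs)
print(decode_metadata(attrs)["meshes"])
```

Keeping every attribute scalar means a generic HDF5 browser shows a readable string for each list, while the read side still hands Python callers a real list.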
lmoresi added a commit that referenced this pull request May 20, 2026
…t roundtrip

Builds on phase 1's metadata wrapper to actually carry mesh + mesh-variable state to disk and read it back. Delegates the heavy lifting to #146's `Mesh.write_checkpoint` / `MeshVariable.read_checkpoint` PETSc-DMPlex primitives; phase 2's job is layout, dispatch, and tying the wrapper to the bulk data via a simple convention.

Layout (final v1.1 shape):

    /path/to/run.snap.h5       wrapper (h5py-inspectable)
    /path/to/run.snap.bulk/    companion directory (one per snap)
        {mesh_safe}.mesh.00000.h5
        {mesh_safe}.{var_clean}.00000.h5

Wrapper carries /meshes/{mesh_safe}/ with @name, @mesh_file, and /meshes/{mesh_safe}/variables/{var_safe}/ with @name, @components, @degree, @continuous, @external_file. The bulk-dir path is derived from the wrapper path by convention (`.h5` → `.bulk`), so no external_file attr is needed for the standard placement. Move them together; a clear FileNotFoundError fires if bulk is missing on read.

Phase 1 layout refactor folded in:
- /mesh (singular) → /meshes (plural): supports multi-mesh natively.
- /variables removed from the top level; it now nests under each mesh as /meshes/{name}/variables/{var}, matching the in-memory snapshot's mesh→vars structure.

New API:
- `write_snapshot(model, path)`: writes wrapper + bulk; covers every registered mesh and every allocated meshvar on each mesh. Lazy-allocated vars (_gvec is None) are skipped, the same rule as the in-memory path.
- `read_snapshot(model, path)`: loads var DOFs back into already-registered meshes by name. Mesh / variable mismatch raises a clear ValueError (mesh-rebuild on read is v1.2 scope).
- `write_snapshot_skeleton` / `read_snapshot_metadata` / `inspect_snapshot` stay as phase-1 metadata-only entry points.

Branch hygiene: merged origin/development (which now has #146) into this branch so the new code can actually call read_checkpoint. The merge was clean: #146 and the snapshot toolkit both touch `discretisation_mesh.py`, but only in different methods, as the earlier analysis predicted. PR target will be development once #195/#196 land; the diff stays clean because the merged dev commits are already there.

Tests (12 total, 5 new in phase 2, tier_a level_1):
- write produces wrapper + bulk-dir with the expected file pattern
- wrapper populated with the per-mesh + per-var metadata that makes inspectability self-sufficient
- bit-exact write→scribble→read roundtrip on a 2D mesh with one scalar + one vector variable (np.array_equal, zero tolerance)
- missing bulk-dir → clear FileNotFoundError
- mismatched mesh on read → clear ValueError (not an obscure h5py trace)

Regression: 64 tests pass (24 snapshot + 9 tracker + 12 disk-format + 19 core/regression).

Phase 3 next: swarms (with the @external_file freedom kept open for bulky swarms) + /python_state for DDt + ModelTracker via dataclass-to-HDF5-attrs serialisation.

Underworld development team with AI support from Claude Code (https://claude.com/claude-code)
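The wrapper-to-bulk path convention above is simple enough to sketch with `pathlib`. The helper names below (`bulk_dir_for`, `expected_bulk_files`, `require_bulk_dir`) are hypothetical and the error wording is invented for illustration; only the `.h5` → `.bulk` rule, the file-name pattern, and the FileNotFoundError behaviour come from the commit message.

```python
from pathlib import Path


def bulk_dir_for(wrapper_path):
    """Derive the companion directory: run.snap.h5 -> run.snap.bulk."""
    wrapper = Path(wrapper_path)
    if wrapper.suffix != ".h5":
        raise ValueError(f"expected a .h5 wrapper file, got {wrapper.name!r}")
    return wrapper.with_suffix(".bulk")


def expected_bulk_files(wrapper_path, mesh_safe, var_names):
    """List the bulk files implied by the layout for one mesh and its variables."""
    bulk = bulk_dir_for(wrapper_path)
    files = [bulk / f"{mesh_safe}.mesh.00000.h5"]
    files += [bulk / f"{mesh_safe}.{var}.00000.h5" for var in var_names]
    return files


def require_bulk_dir(wrapper_path):
    """Fail early and clearly when the wrapper was moved without its bulk dir."""
    bulk = bulk_dir_for(wrapper_path)
    if not bulk.is_dir():
        raise FileNotFoundError(
            f"bulk directory {bulk} is missing; it must be moved together "
            f"with {wrapper_path}"
        )
    return bulk


for path in expected_bulk_files("/path/to/run.snap.h5", "mesh_0", ["T", "v"]):
    print(path)
```

Deriving the directory from the wrapper name keeps the wrapper free of absolute paths, so a snapshot stays valid after the pair is copied elsewhere, as long as both pieces travel together.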
1 task
Summary
Adds `Model.tracker`, a model-dwelling, snapshot-managed record of where a run is (time, step, dt) plus any quantity the user parks on it. Everything on the tracker is automatically captured by `Model.snapshot()` and reverted by `Model.restore()`; a loose Python variable is not. Solvers do not depend on it; using it is optional.

Also adds the user-facing documentation and the back-stepping demo scripts for the whole snapshot/restore feature.

This PR is based on `feature/in-memory-checkpoint` (#195), not `development`, so the diff shows only the tracker work (9 files, +976). It cannot merge until #195 merges. After #195 merges, retarget this PR's base to `development`; the diff will stay clean.

What's here
- `src/underworld3/checkpoint/tracker.py`: `ModelTracker` (attribute-style: `model.tracker.foo = ...` becomes managed state automatically; `time`/`step`/`dt` pre-seeded as ordinary conventions) + `TrackerState(SnapshottableState)`. Zero new snapshot plumbing: auto-registers via the existing `_state_bearers` path from "In-memory snapshot toolkit (git stash for timesteps)" #195.
- `src/underworld3/model.py`: `_tracker` PrivateAttr, instantiated + registered in `__init__`, exposed via the `tracker` property.
- `tests/test_0009_model_tracker.py`: 9 tier-A tests: defaults, builtins revert, user-quantity reverts, numpy-by-value deep-copy, git-stash semantics (post-snapshot quantity dropped on restore), the loose-var-vs-tracker contrast, bit-identical state roundtrip, realistic stepping-loop continuation.
- `docs/advanced/snapshot-restore.md` (+ index/toctree): user guide for the whole feature; `docs-build` verified.
- `tests/run_snapshot_backstepping_demo.py` / `_spatial.py`: runnable visualisations companion to the tests.
- `tests/test_0007_…`: one-line robustness fix: find DDt state by type, not `state_bearers[0]` (a tracker is now always registered; pre-existing fragility exposed, not introduced).

Design notes for the reviewer
`__setattr__` was found (by the test suite) to silently shadow the `state` property setter, which would make `restore()` a no-op. Fixed by having `__setattr__` respect class-level data descriptors; `state` is therefore a documented reserved name on the tracker.

Test plan
- `pixi run -e amr-dev pytest tests/test_0009_model_tracker.py` (9)
- `pytest tests/test_0007_snapshot_inmemory.py tests/test_0008_snapshot_realsolver.py` (27)
- `cd tests/parallel && mpirun -np 4 python ./ptest_0007_snapshot_inmemory.py`
- `pixi run -e amr-dev docs-build`: `advanced/snapshot-restore.html` renders, toctree resolves

Underworld development team with AI support from Claude Code