Conversation
Parametrized tests over all s22_* fixtures verify: - znh5md write -> asebytes read (catches H5MD path mapping bug with per-atom calc results like energies, stresses, magmoms) - asebytes write -> znh5md read - Full bidirectional round-trip - *.h5 extension with H5MD content (catches registry routing bug) 9 tests fail RED, exposing two bugs: 1. H5MDStore path mapping loses actual H5MD paths on round-trip 2. Registry sends *.h5 H5MD files to RaggedColumnarBackend Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add _is_h5md_file() helper that checks for "h5md" group in HDF5 files - Insert sniffing logic in resolve_backend to redirect *.h5 files containing H5MD metadata to the H5MD backend Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…roperties Properties like energies, stresses, magmoms placed by znh5md in /particles/ were lost on read because _column_to_h5() statically mapped them to /observables/. Now _walk_elements() caches the actual H5 path during discovery, and _get_ds() checks the cache first. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
In _discover(), columns from list_arrays() are known to exist, so the has_array() check was redundant and broke per-atom detection for columns whose _column_to_h5() mapping disagreed with the actual H5 path (e.g. per-atom energies placed in /particles/ by znh5md). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughDetect H5MD metadata inside .h5 files to select H5MD-specific backends, add a path cache mapping column names to HDF5 paths, change per-atom detection in the H5MD backend discovery to use HDF5 path/ndim, and add bidirectional znh5md ↔ asebytes round-trip tests. Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant Registry as resolve_backend()
participant Detector as _is_h5md_file()
participant h5py
participant Backend as BackendCandidates
User->>Registry: resolve(path, layer, patterns)
activate Registry
note over Registry: If no URI scheme and layer == "object"\nand pattern candidates include *.h5
Registry->>Detector: _is_h5md_file(path)
activate Detector
Detector->>h5py: open(path)
alt HDF5 with h5md key
h5py-->>Detector: returns file with 'h5md'
Detector-->>Registry: True
else Not HDF5 or no h5md / error
h5py-->>Detector: exception or no key
Detector-->>Registry: False
end
deactivate Detector
alt H5MD detected
Registry->>Backend: select *.h5md candidates
Backend-->>Registry: H5MD-specific candidates
else keep original
Backend-->>Registry: original candidates
end
Registry-->>User: resolved backend
deactivate Registry
sequenceDiagram
participant Discovery as _walk_elements()
participant PathCache as path_cache
participant Store as _get_ds()
participant Resolver as _column_to_h5()
Discovery->>Discovery: traverse HDF5 groups
loop for each discovered dataset group
Discovery->>Store: compute h5_path
Store->>PathCache: cache mapping (column ← h5_path)
end
Store->>PathCache: lookup(column)
alt cache hit
PathCache-->>Store: return cached h5_path
else cache miss
PathCache-->>Store: miss
Store->>Resolver: _column_to_h5(column)
Resolver-->>Store: h5_path
end
Store-->>Caller: dataset at h5_path
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/asebytes/h5md/_store.py`:
- Line 111: The path cache (_path_cache) is only used by _get_ds(), causing
has_array() and other probes using _column_to_h5() to miss previously discovered
foreign paths; centralize path resolution into a single helper (e.g.,
_resolve_column_path or update _column_to_h5) that looks up and populates
self._path_cache for a given column name and always returns the actual HDF5
path, then change _get_ds(), has_array(), and any other callers (references
around _get_ds, has_array, and the blocks at lines ~218-225 and ~452-456) to
call this resolved-path helper instead of directly using the old _column_to_h5()
logic so existence checks and reads use the same cached path. Ensure the helper
is thread-safe relative to _path_cache mutation if applicable.
In `@tests/contract/test_znh5md_roundtrip.py`:
- Around line 75-83: The test currently slices result to n before comparing,
which masks leaks of padded atoms; change the checks to first assert len(result)
== n (e.g., using assert len(result) == n or
np.testing.assert_equal(len(result), n)) and then compare the full arrays (call
np.testing.assert_allclose(result.positions, expected.positions, ...) and
np.testing.assert_array_equal(result.numbers, expected.numbers, ...)) instead of
result[:n]; make the same update for the analogous block later in the file where
positions/numbers are compared.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 28b39aca-79f9-476b-8536-f9948807373d
📒 Files selected for processing (4)
src/asebytes/_registry.pysrc/asebytes/h5md/_backend.pysrc/asebytes/h5md/_store.pytests/contract/test_znh5md_roundtrip.py
- Use H5 path (/particles/ vs /observables/) for per-atom detection instead of shape heuristic that misclassified frame-level vectors - Centralize path resolution with _resolve_h5_path helper used by both _get_ds and has_array to fix foreign column lookups - Remove test slicing ([:n]) that masked potential padding leaks and add explicit atom count assertions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace weak positions/numbers-only assertions in TestAsebytesToZnH5MD and TestBidirectionalRoundtrip with assert_atoms_equal to catch regressions in cell, pbc, info, arrays, and calc properties. Merge TestH5ExtensionWithH5MDContent into TestZnH5MDToAsebytes as a parametrized extension (.h5md/.h5) to remove duplication. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary by CodeRabbit
New Features
Performance
Tests