Decouple GPU alias test from real CUDA runtime (#2515)#2518
Merged
Conversation
test_gpu_alias_accepts_old_values_without_validation_error_1560 was failing on hosts where CuPy imports but the CUDA runtime is unusable. The test sends gpu='strict' through read_geotiff_gpu to confirm the deprecated alias passes value validation. On a host with no working CUDA driver, the call hits _preflight_cuda_runtime and raises the documented RuntimeError, which is not in the test's accepted-exception tuple, so the test fails for an environmental reason unrelated to the alias logic. Reuse _install_cupy_stub_1903 to stub cupy with a preflight that reports one device. The alias call then proceeds past preflight on every host and fails with FileNotFoundError from the missing path, which is what the alias check is really there to verify. The test no longer depends on CUDA state, so alias-validation coverage is uniform across CPU-only, broken-CUDA, and GPU hosts.
brendancol
commented
May 27, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Decouple GPU alias test from real CUDA runtime (#2515)
Blockers
None.
Suggestions
None.
Nits
-
xrspatial/geotiff/tests/gpu/test_kernels_and_kwargs.py:76-81— the_install_cupy_stub_1903docstring says "Used on machines without cupy installed; lets us exercise the preflight failure path on CPU-only CI." This PR also uses it on broken-CUDA hosts for the preflight success path, which is a wider role than the docstring describes. A one-line tweak to the helper's docstring would keep its contract aligned with its callers.
What looks good
- The fix targets the test, not the preflight contract. The PR body says so directly.
- Reuses
_install_cupy_stub_1903instead of inventing a new pattern. monkeypatch.setitem(sys.modules, ...)is function-scoped and auto-restored, so sibling tests aren't affected.- Drops
ImportErrorfrom the accepted-exception tuple now that the stub guarantees cupy resolves. An unexpectedImportErrorwill now fail loudly rather than be silently tolerated. - Test-only. No public API, no GPU code paths, no backend dispatch touched.
- Verified locally: 5913 passed, 0 failed on the CuPy-installed-but-CUDA-unusable host that reproduced the original bug.
Checklist
- Algorithm matches reference/paper — N/A (test-only)
- All implemented backends produce consistent results — N/A (alias coverage now uniform across hosts)
- NaN handling is correct — N/A
- Edge cases covered — alias validation passes on CPU-only, broken-CUDA, and GPU hosts
- Dask chunk boundaries handled correctly — N/A
- No premature materialization or unnecessary copies — N/A
- Benchmark exists or is not needed — not needed (test-only)
- README feature matrix updated — N/A
- Docstrings present and accurate — yes
Review nit follow-up. The helper was originally described as a failure-path stub for CPU-only CI. The alias-validation test added in this PR also uses it for the success path on broken-CUDA hosts, so the docstring now covers both modes.
brendancol
commented
May 27, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Decouple GPU alias test from real CUDA runtime (#2515) — follow-up pass
Blockers
None.
Suggestions
None.
Nits
None. The earlier nit on helper docstring scope was handled in 3164b23; the docstring now covers both the failure-path and success-path callers.
What looks good
- The docstring update names the alias-validation use case directly (mentions #2515), so anyone reading later can see why the success-path mode exists.
- Diff is still scoped to one test file. No drift into unrelated code.
- All 163 tests in the file pass after the docstring tweak.
Checklist
- Algorithm matches reference/paper — N/A (test-only)
- All implemented backends produce consistent results — N/A
- NaN handling is correct — N/A
- Edge cases covered — alias validation runs uniformly across hosts
- Dask chunk boundaries — N/A
- No premature materialization — N/A
- Benchmark — not needed
- README feature matrix — N/A
- Docstrings present and accurate — yes (helper docstring is broader after the follow-up commit)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #2515.
Summary
test_gpu_alias_accepts_old_values_without_validation_error_1560was failing on hosts where CuPy imports but the CUDA runtime can't run. The test sendsgpu='strict'throughread_geotiff_gputo confirm the deprecated alias passes value validation. On a broken-CUDA host the call hits_preflight_cuda_runtime(the documented behaviour from issue geotiff: preflight CUDA in read_geotiff_gpu so broken-driver errors are clean #1903) and the resultingRuntimeErroris not in the test's accepted-exception tuple._install_cupy_stub_1903helper to stub cupy with a preflight that reports one device. The alias call then reaches the file-read stage on every host and fails withFileNotFoundError, which is what the alias check needs to assert.RuntimeErroritself is unchanged; it's the correct contract for geotiff: preflight CUDA in read_geotiff_gpu so broken-driver errors are clean #1903.Backend coverage
Test-only change. No public API surface, no backend dispatch, no GPU code paths touched. Alias-validation coverage now runs uniformly on CPU-only, broken-CUDA, and GPU hosts.
Test plan
pytest xrspatial/geotiff/tests/gpu/test_kernels_and_kwargs.py::test_gpu_alias_accepts_old_values_without_validation_error_1560passes on a CuPy-installed-but-CUDA-unusable host.pytest -q xrspatial/geotiff/tests -m 'not slow'is fully green on that same host (5913 passed, 0 failed).