Skip to content

Decouple GPU alias test from real CUDA runtime (#2515)#2518

Merged
brendancol merged 3 commits into
mainfrom
issue-2515
May 27, 2026
Merged

Decouple GPU alias test from real CUDA runtime (#2515)#2518
brendancol merged 3 commits into
mainfrom
issue-2515

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2515.

Summary

  • test_gpu_alias_accepts_old_values_without_validation_error_1560 was failing on hosts where CuPy imports but the CUDA runtime can't run. The test sends gpu='strict' through read_geotiff_gpu to confirm the deprecated alias passes value validation. On a broken-CUDA host the call hits _preflight_cuda_runtime (the documented behaviour from issue geotiff: preflight CUDA in read_geotiff_gpu so broken-driver errors are clean #1903) and the resulting RuntimeError is not in the test's accepted-exception tuple.
  • Fix: reuse the existing _install_cupy_stub_1903 helper to stub cupy with a preflight that reports one device. The alias call then reaches the file-read stage on every host and fails with FileNotFoundError, which is what the alias check needs to assert.
  • The preflight RuntimeError itself is unchanged; it's the correct contract for geotiff: preflight CUDA in read_geotiff_gpu so broken-driver errors are clean #1903.

Backend coverage

Test-only change. No public API surface, no backend dispatch, no GPU code paths touched. Alias-validation coverage now runs uniformly on CPU-only, broken-CUDA, and GPU hosts.

Test plan

  • pytest xrspatial/geotiff/tests/gpu/test_kernels_and_kwargs.py::test_gpu_alias_accepts_old_values_without_validation_error_1560 passes on a CuPy-installed-but-CUDA-unusable host.
  • pytest -q xrspatial/geotiff/tests -m 'not slow' is fully green on that same host (5913 passed, 0 failed).
  • CI validates the same fix on CPU-only and (where available) GPU runners.

test_gpu_alias_accepts_old_values_without_validation_error_1560 was
failing on hosts where CuPy imports but the CUDA runtime is unusable.
The test sends gpu='strict' through read_geotiff_gpu to confirm the
deprecated alias passes value validation. On a host with no working
CUDA driver, the call hits _preflight_cuda_runtime and raises the
documented RuntimeError, which is not in the test's accepted-exception
tuple, so the test fails for an environmental reason unrelated to the
alias logic.

Reuse _install_cupy_stub_1903 to stub cupy with a preflight that
reports one device. The alias call then proceeds past preflight on
every host and fails with FileNotFoundError from the missing path,
which is what the alias check is really there to verify.

The test no longer depends on CUDA state, so alias-validation
coverage is uniform across CPU-only, broken-CUDA, and GPU hosts.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 27, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Decouple GPU alias test from real CUDA runtime (#2515)

Blockers

None.

Suggestions

None.

Nits

  • xrspatial/geotiff/tests/gpu/test_kernels_and_kwargs.py:76-81 — the _install_cupy_stub_1903 docstring says "Used on machines without cupy installed; lets us exercise the preflight failure path on CPU-only CI." This PR also uses it on broken-CUDA hosts for the preflight success path, which is a wider role than the docstring describes. A one-line tweak to the helper's docstring would keep its contract aligned with its callers.

What looks good

  • The fix targets the test, not the preflight contract. The PR body says so directly.
  • Reuses _install_cupy_stub_1903 instead of inventing a new pattern.
  • monkeypatch.setitem(sys.modules, ...) is function-scoped and auto-restored, so sibling tests aren't affected.
  • Drops ImportError from the accepted-exception tuple now that the stub guarantees cupy resolves. An unexpected ImportError will now fail loudly rather than be silently tolerated.
  • Test-only. No public API, no GPU code paths, no backend dispatch touched.
  • Verified locally: 5913 passed, 0 failed on the CuPy-installed-but-CUDA-unusable host that reproduced the original bug.

Checklist

  • Algorithm matches reference/paper — N/A (test-only)
  • All implemented backends produce consistent results — N/A (alias coverage now uniform across hosts)
  • NaN handling is correct — N/A
  • Edge cases covered — alias validation passes on CPU-only, broken-CUDA, and GPU hosts
  • Dask chunk boundaries handled correctly — N/A
  • No premature materialization or unnecessary copies — N/A
  • Benchmark exists or is not needed — not needed (test-only)
  • README feature matrix updated — N/A
  • Docstrings present and accurate — yes

Review nit follow-up. The helper was originally described as a
failure-path stub for CPU-only CI. The alias-validation test added
in this PR also uses it for the success path on broken-CUDA hosts,
so the docstring now covers both modes.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Decouple GPU alias test from real CUDA runtime (#2515) — follow-up pass

Blockers

None.

Suggestions

None.

Nits

None. The earlier nit on helper docstring scope was handled in 3164b23; the docstring now covers both the failure-path and success-path callers.

What looks good

  • The docstring update names the alias-validation use case directly (mentions #2515), so anyone reading later can see why the success-path mode exists.
  • Diff is still scoped to one test file. No drift into unrelated code.
  • All 163 tests in the file pass after the docstring tweak.

Checklist

  • Algorithm matches reference/paper — N/A (test-only)
  • All implemented backends produce consistent results — N/A
  • NaN handling is correct — N/A
  • Edge cases covered — alias validation runs uniformly across hosts
  • Dask chunk boundaries — N/A
  • No premature materialization — N/A
  • Benchmark — not needed
  • README feature matrix — N/A
  • Docstrings present and accurate — yes (helper docstring is broader after the follow-up commit)

@brendancol brendancol merged commit 3db625a into main May 27, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GeoTIFF GPU alias test fails on CuPy-installed but unusable-CUDA environments

1 participant