feat: native OS window screenshot for Tauri and Electron #262
goosewobbler wants to merge 32 commits into
Add POST /wdio/native-screenshot to the embedded WebDriver server. Uses xcap 0.9 to capture the full OS window (title bar + decorations) on macOS and Windows; returns unsupported_operation on Linux. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add nativeScreenshot() to DirectEvalClient, wire it through the provider-gated command, inject onto the browser object, and declare it on TauriServiceAPI. Only works with the embedded driver provider. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add nativeScreenshot() using screencapture (macOS) and PowerShell PrintWindow (Windows). Wired into getElectronAPI() requiring CDP. Type declared on ElectronServiceAPI in native-types. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add screenshotChecks.ts (PNG structural + tesseract.js OCR) and visionAssert.ts (Ollama-compatible vision LLM, merge-to-main only). Specs for both Tauri and Electron assert chrome was captured and that screenshot content matches known fixture text. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Release Preview — no release
Updated automatically by ReleaseKit
… in nativeScreenshot The Electron main process supports ESM, so use await import() rather than require(). Relies on awaitPromise: true in the CDP callFunctionOn call so the async callback result is awaited before returning. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Greptile Summary
Adds
Confidence Score: 5/5
Safe to merge; all previously identified blocking issues in the Electron Windows capture path have been fixed and the Tauri xcap path is well-scoped to macOS/Windows.
The prior round resolved every critical bug: HWND byte order, GetHdc pairing, path escaping, spawnSync error propagation, and the incomplete visionEnabled guard. What remains is a minor cleanup edge case in the finally block.
packages/electron-service/src/commands/nativeScreenshot.ts: the finally/unlinkSync edge case noted above; otherwise all changed files look correct.
| Filename | Overview |
|---|---|
| packages/electron-service/src/commands/nativeScreenshot.ts | New command implementing native Electron window screenshot; previous issues with HWND byte order, GetHdc release, path escaping, and spawnSync error handling are addressed; minor cleanup concern in finally block |
| packages/tauri-plugin-webdriver/src/platform/macos.rs | Implements xcap-based native screenshot for macOS, correctly filtering by PID before title, with a fallback to first process-owned window |
| packages/tauri-plugin-webdriver/src/platform/windows.rs | Implements xcap-based native screenshot for Windows with PID-first matching, title-substring fallback for CI, and detailed diagnostic error output when no window is found |
| packages/tauri-service/src/commands/nativeScreenshot.ts | Routes nativeScreenshot to the embedded Tauri WebDriver server; enforces the embedded-provider requirement and caches the DirectEvalClient per browser instance |
| packages/tauri-plugin-webdriver/src/server/handlers/native_screenshot.rs | New Axum handler for POST /wdio/native-screenshot; window-not-found returns 404 with available labels, unsupported platform returns error via WebDriverErrorResponse::into_response |
| e2e/lib/visionAssert.ts | visionEnabled() now requires both the API key and OLLAMA_BASE_URL; regex tightened to /^(YES |
| e2e/lib/screenshotChecks.ts | Full 8-byte PNG magic validation, OCR helpers via tesseract.js, and assertCapturesChrome; OCR worker lazily initialised and correctly terminated |
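
The full 8-byte signature validation listed for e2e/lib/screenshotChecks.ts can be sketched roughly as below. This is a minimal illustration, not the file's actual code: the helper names `hasPngSignature` and `readPngDimensions` are hypothetical, and the dimension reader assumes the IHDR chunk immediately follows the signature, as the PNG format requires.

```typescript
import { Buffer } from "node:buffer";

// The fixed 8-byte signature that every PNG file starts with.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Hypothetical helper: true only when all 8 signature bytes match,
// rather than just the first two.
function hasPngSignature(data: Buffer): boolean {
  return data.length >= 8 && data.subarray(0, 8).equals(PNG_SIGNATURE);
}

// Hypothetical helper: read width and height from the IHDR chunk.
// Layout: 8 signature bytes, 4 length bytes, 4 type bytes ("IHDR"),
// then width and height as big-endian 32-bit integers.
function readPngDimensions(data: Buffer): { width: number; height: number } {
  if (!hasPngSignature(data) || data.length < 24) {
    throw new Error("not a PNG");
  }
  if (data.toString("latin1", 12, 16) !== "IHDR") {
    throw new Error("first chunk is not IHDR");
  }
  return { width: data.readUInt32BE(16), height: data.readUInt32BE(20) };
}
```

Dimensions read this way are what a height comparison between the native and webview captures would consume.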
Sequence Diagram
sequenceDiagram
participant Test as E2E Test
participant Service as electron/tauri Service
participant App as Electron/Tauri App
participant OS as OS (screencapture / xcap)
Test->>Service: browser.electron.nativeScreenshot()
Service->>App: CDP execute: getBounds() + getNativeWindowHandle()
App-->>Service: "{ bounds, nativeHandle, gpuCompositing }"
alt macOS
Service->>OS: spawnSync screencapture -R x,y,w,h out.png
OS-->>Service: PNG file written
else Windows (GPU)
Service->>OS: spawnSync powershell PrintWindow(PW_RENDERFULLCONTENT)
OS-->>Service: PNG file written
else Windows (no GPU)
Service->>OS: spawnSync powershell BitBlt(GetWindowDC)
OS-->>Service: PNG file written (may be blank under WARP)
end
Service->>Service: readFileSync(out) + unlinkSync(out)
Service-->>Test: Buffer (PNG bytes)
Test->>Service: browser.tauri.nativeScreenshot()
Service->>App: "POST /wdio/native-screenshot {window_label}"
App->>OS: xcap::Window::capture_image()
OS-->>App: RgbaImage
App->>App: encode to PNG bytes
App-->>Service: image/png response
Service-->>Test: Buffer (PNG bytes)
Reviews (15): Last reviewed commit: "test(e2e): skip tauri native-screenshot ..."
- Check full 8-byte PNG signature instead of first 2 bytes
- Use strict YES/NO regex and exact equality in visionAssert
- Validate spawnSync exit code and error before reading output
- Match windows by PID first, then title, to avoid cross-process capture
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
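
Validating a spawnSync result before reading its output might look like the sketch below. The helper name `runChecked` and the error wording are assumptions made for illustration; only `spawnSync` and its `timeout`, `error`, `status`, `signal`, and `stderr` fields come from Node's child_process API.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical helper: run a command synchronously and throw unless it
// was spawned successfully and exited with status 0.
function runChecked(command: string, args: string[], timeoutMs: number): void {
  const result = spawnSync(command, args, { timeout: timeoutMs, encoding: "utf8" });

  // `error` is set when the process could not be spawned or was killed
  // because the timeout elapsed.
  if (result.error) {
    throw new Error(`${command} failed to run: ${result.error.message}`);
  }

  // `status` is null when the process was terminated by a signal.
  if (result.status !== 0) {
    throw new Error(`${command} exited with ${result.status ?? result.signal}: ${result.stderr}`);
  }
}
```

Only after a call like this returns would the caller read the output file, so a failed capture surfaces as a descriptive error instead of a missing-file read.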
- Revert dynamic import() to require() in Electron nativeScreenshot: CDP callFunctionOn evaluates code in Electron's CJS main-process context where ESM dynamic import() is unavailable. Added biome-ignore comments.
- Add embedded-only guard to Tauri native screenshot spec: nativeScreenshot() only works with the embedded provider; skip cleanly when running with official or CrabNebula providers.
- Remove assertCapturesChrome from Tauri spec: Tauri on macOS uses fullSizeContentView so the native screenshot and webview screenshot share the same pixel dimensions. OCR is the reliable verification layer on Tauri.
- Fix Windows PowerShell path escaping: out.replace(/\\/g, '\\\\') inside a PS single-quoted string produced double backslashes. Use forward slashes instead (valid on Windows).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
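
The path-escaping fix can be illustrated with a small sketch. The helper name `toPowerShellLiteral` is hypothetical, and doubling embedded single quotes is an addition not mentioned in the commit: it is included on the assumption that the path is interpolated into a PowerShell single-quoted string, where `'` is the only character that needs escaping.

```typescript
// Hypothetical helper: turn a Windows file path into a PowerShell
// single-quoted string literal.
function toPowerShellLiteral(filePath: string): string {
  // Forward slashes are accepted by Windows file APIs, so no backslash
  // escaping is needed at all.
  const forward = filePath.replace(/\\/g, "/");
  // Inside single quotes PowerShell treats everything literally except
  // the quote itself, which is escaped by doubling it.
  return `'${forward.replace(/'/g, "''")}'`;
}

console.log(toPowerShellLiteral("C:\\Temp\\shot.png"));
```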
- Assert nativePng !== webviewPng in Tauri spec: xcap and WebKit's DevTools screenshot protocol produce different PNG bytes even when capturing the same content, so byte equality would only hold if nativeScreenshot incorrectly re-emits the webview screenshot.
- Add 30s timeout to visionAssert API call: without it a slow or unreachable Ollama endpoint would hang the merge-to-main job indefinitely.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
$g.GetHdc() suspends GDI+'s internal state for the Graphics object. Calling $b.Save() while the HDC is still checked out causes GDI+ to throw "A generic error occurred in GDI+". Release the HDC and dispose the Graphics object before saving the PNG. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- electron-service: nativeScreenshot CDP callback now returns only WindowInfo (bounds + hex HWND); all screencapture/PowerShell I/O runs in the WDIO process where ESM imports are available. Fixes "ReferenceError: require is not defined" in all Electron CI jobs.
- tauri-plugin-webdriver: when xcap::Window::all() PID filter returns empty (common in Windows CI virtual display sessions), fall back to matching by window title. Fixes "no window found for this process".
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…path
toString('hex') emits raw LE bytes and BigInt('0x...') then
re-interprets them as BE, producing a wrong handle value.
Use readBigUInt64LE(0).toString() to decode the LE Buffer to
the actual HWND integer, pass the decimal string directly to
PowerShell's [IntPtr] cast — no further conversion needed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
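
The byte-order problem described in this commit can be reproduced in isolation. The sketch below assumes an 8-byte little-endian handle buffer, as a 64-bit process would return; the function names and the sample handle value are made up for illustration.

```typescript
import { Buffer } from "node:buffer";

// Buggy approach: the hex string lists bytes in memory (little-endian)
// order, but BigInt("0x...") reads the digits as a big-endian number.
function hwndFromHexBuggy(handle: Buffer): string {
  return BigInt("0x" + handle.toString("hex")).toString();
}

// Fixed approach: decode the buffer as a little-endian unsigned 64-bit
// integer and emit the decimal string for PowerShell's [IntPtr] cast.
function hwndToDecimal(handle: Buffer): string {
  return handle.readBigUInt64LE(0).toString();
}

// Simulated native handle for the sample HWND value 0xA0B1C.
const handle = Buffer.alloc(8);
handle.writeBigUInt64LE(BigInt(0xa0b1c), 0);

console.log(hwndToDecimal(handle));    // decimal form of 0xA0B1C
console.log(hwndFromHexBuggy(handle)); // a different, much larger number
```

The two functions agree only for byte patterns that read the same in both directions, which is why the bug shows up as a wrong handle rather than a crash.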
- e2e/lib/visionAssert.ts: visionEnabled() now requires OLLAMA_BASE_URL in addition to an API key. Previously a job with only OLLAMA_API_KEY set would pass the guard, hit https://ollama.com/v1 (not an API server), and throw, failing the spec instead of silently skipping Layer 3.
- electron-service/nativeScreenshot.ts: add spawnSync timeouts. screencapture gets 10 s; PowerShell gets 30 s to account for Add-Type JIT compilation on first call.
- e2e/test/tauri/native-screenshot.spec.ts: Layer 1 now calls assertCapturesChrome(nativeDims, webviewDims) so a blank or webview-only capture that happens to be a valid PNG is caught by the height comparison, not just the byte-equality check.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
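
A guard with the behaviour described for visionEnabled() could be as small as the sketch below. The environment variable names come from the commit message; the function signature, the injectable `env` parameter, and the whitespace trimming are assumptions for illustration rather than the contents of e2e/lib/visionAssert.ts.

```typescript
// Sketch of a guard that enables the vision layer only when both the
// API key and the base URL are configured.
function visionEnabled(
  env: Record<string, string | undefined> = process.env,
): boolean {
  const apiKey = env["OLLAMA_API_KEY"]?.trim();
  const baseUrl = env["OLLAMA_BASE_URL"]?.trim();
  // Requiring both avoids the case where a key alone passes the guard
  // and the request then goes to a default URL that is not an API server.
  return Boolean(apiKey) && Boolean(baseUrl);
}
```

A spec would then check this guard first and skip Layer 3 when it returns false, so PR runs without the secrets still pass.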
- electron/native-screenshot.spec.ts: OCR was asserting 'e2e test app' but the fixture heading is '🚀 Electron Builder E2E'; Tesseract reliably finds 'electron' and 'builder' from that text. Change assertion to those two tokens.
- tauri/native-screenshot.spec.ts: assertCapturesChrome (native height > webview height) fails on macOS because Tauri v2 uses fullSizeContentView by default, making the title bar an overlay rather than adding height. Gate the check to Windows only, where the title bar genuinely adds height above the content area.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
readFileSync after a successful spawnSync could throw (e.g. disk full, race condition), leaving the temp PNG on disk. Wrap in try/finally so unlinkSync runs regardless. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
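
The try/finally cleanup can be sketched as follows. The helper name `readAndRemove` is hypothetical, and the sketch swaps unlinkSync for `rmSync(..., { force: true })`: that substitution is an assumption made here so that a missing file during cleanup cannot mask the original read error.

```typescript
import { existsSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: read a temporary capture file and always remove it,
// whether the read succeeds or throws.
function readAndRemove(filePath: string): Buffer {
  try {
    return readFileSync(filePath);
  } finally {
    // force: true makes removal a no-op when the file does not exist.
    rmSync(filePath, { force: true });
  }
}

// Usage: write a placeholder file, read it back, and confirm it is gone.
const dir = mkdtempSync(join(tmpdir(), "native-shot-"));
const out = join(dir, "out.png");
writeFileSync(out, "placeholder");
const bytes = readAndRemove(out);
console.log(bytes.length, existsSync(out));
```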
- Updated the openai package from version 4.104.0 to 6.34.0 in e2e/package.json.
- Refactored OCR worker management in screenshotChecks.ts to use a promise for worker initialization, ensuring proper handling of the worker's lifecycle.
- Updated the OCR assertion in native-screenshot.spec.ts to dynamically match the app name from the APP environment variable, ensuring accurate recognition for different Electron applications.
- Cleaned up PowerShell DllImport syntax in nativeScreenshot.ts for consistency and clarity.
…omScreen
- Replaced PrintWindow with CopyFromScreen for capturing screenshots on Windows to avoid deadlocks on CI runners.
- Improved PowerShell script clarity by updating DllImport syntax and ensuring proper bitmap dimensions are calculated before saving the image.
… compositing disabled
- Introduced a conditional argument `--disable-gpu-compositing` for Electron appArgs when running in CI environments to ensure reliable screenshot capturing.
- Updated the native screenshot method to return GPU compositing status, allowing for better handling of screenshot capture based on the environment.
- Enhanced documentation to address common issues related to black window captures on Windows CI/virtual machines.
…pture PrintWindow(WM_PRINT=0) produces a blank capture for Chromium windows even with --disable-gpu-compositing because Chromium's HWND procedure paints via BeginPaint/EndPaint and ignores the WM_PRINT HDC. Switch to CopyFromScreen (GDI framebuffer read) when the flag is detected; the software compositor BitBlt's rendered frames directly to the GDI screen buffer, making it readable via CopyFromScreen. BringWindowToTop + SetForegroundWindow + 200ms sleep ensure the last frame has flushed before the read. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ow before CopyFromScreen BringWindowToTop + SetForegroundWindow triggered a DWM recomposition cycle on the Hyper-V virtual adapter. CopyFromScreen called during that cycle blocks waiting for DWM (running on WARP/software D3D) to flush the frame — which can take 30+ seconds on CI. Removing those calls leaves the display in a stable state so CopyFromScreen completes immediately. Also switch from GetWindowRect P/Invoke to Electron's win.getBounds() (already in windowInfo.bounds) to eliminate the Add-Type -TypeDefinition compilation step, which avoids antivirus scanning overhead on CI. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s CI CopyFromScreen returned a valid PNG but blank content on GitHub Actions Windows runners. The desktop redirection surface read by GetDC(NULL) is not fully backed on Hyper-V virtual display adapters without a hardware GPU, so the desktop framebuffer doesn't reflect window content. Switch to GetWindowDC(hwnd) + BitBlt: with --disable-gpu-compositing, Chromium's SoftwareOutputDeviceWin presents each frame via BitBlt to the window's own HDC, which lives in the per-window DWM redirection bitmap. Reading from GetWindowDC bypasses the desktop framebuffer and captures the actual rendered content. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PowerShell parser rejects 0x00CC0020u with a ParserError at char 592 — the trailing u is C# syntax for unsigned int and is not valid PowerShell. PowerShell coerces the plain literal 0x00CC0020 to uint when passing to the [DllImport]-declared BitBlt(uint dwRop) parameter. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Modern Chromium presents frames via DirectComposition even with --disable-gpu-compositing, so the content never reaches any GDI surface; every GDI-based capture (CopyFromScreen / GetDC(NULL), PrintWindow(WM_PRINT), BitBlt from GetWindowDC) returns a blank PNG on Hyper-V/WARP environments like GitHub Actions. Switch the no-GPU branch to ffmpeg's ddagrab capture source, which uses the DXGI Desktop Duplication API to read what DWM is actually presenting. This works on WARP (software D3D), captures DComp output, and is preinstalled on the GitHub Actions windows-2022 runner image.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ffmpeg is not preinstalled on the GitHub Actions windows-2022 image, so nativeScreenshot's ddagrab capture path (used when Chromium runs with --disable-gpu-compositing under WARP) failed with ENOENT. Install via chocolatey before running the e2e suite. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ffmpeg rejected -frames:v as an input option. -framerate is an input option (configures the ddagrab device); -frames:v and -vf are output options and must come after -i. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The plain chocolatey 'ffmpeg' package ships BtbN's essentials build, which omits the ddagrab DXGI duplication input device. ffmpeg-full wraps the GPL build which includes ddagrab. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both chocolatey 'ffmpeg' (essentials) and 'ffmpeg-full' (years-stale, pre-FFmpeg-6.1) ship without the ddagrab DXGI duplication input device. Download BtbN's master GPL build directly which is verified to include ddagrab. Also dump '-version' and '-f ddagrab -h' so the job logs prove the indev is present. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BtbN's GPL build cross-compiled with mingw doesn't include --enable-d3d11va, which ddagrab depends on, so the indev was missing. GyanD's 'full' build has d3d11va enabled and ships ddagrab. Distributed as 7z; 7-Zip is preinstalled on the windows-2022 runner image. Also fail fast at install time if ddagrab is missing from the device list, so we don't have to drill through e2e logs to diagnose this again. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
No widely-available Windows ffmpeg build (BtbN GPL, Gyan full, chocolatey ffmpeg/ffmpeg-full) ships with the ddagrab indev; none of them configure with --enable-d3d11va. Rather than building ffmpeg ourselves on every CI run, force Chromium to use a GDI presentation path via --disable-direct-composition, which puts the rendered frames in the per-window DWM redirection bitmap where BitBlt from GetWindowDC can read them.
- e2e: add --disable-direct-composition alongside --disable-gpu-compositing in ciAppArgs so all Windows CI runs share the same flag set.
- electron-service: revert the no-GPU branch from ffmpeg ddagrab back to BitBlt(GetWindowDC). Document why both flags are required.
- ci: remove the ffmpeg install step (no longer needed).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After exhausting GDI capture paths and ffmpeg-with-ddagrab alternatives without finding one that works on WARP, accept the limitation: there is no reliable nativeScreenshot capture method for Hyper-V VMs without a real GPU. The feature still works on macOS and on Windows machines with hardware GPU (developer machines, self-hosted runners with graphics).
- e2e: skip the spec on Windows CI alongside the existing Linux skip.
- e2e config: revert --disable-direct-composition (caused PowerShell ETIMEDOUT: Chromium gets stuck in a paint-pending state and BitBlt blocks waiting for the paint).
- electron-service: keep the GetWindowDC+BitBlt fallback so the API doesn't crash if anyone hits this path, but document that it returns blank under WARP.
- docs: explain every method we tried and why each fails, so future contributors don't redo this investigation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…kup failure
The Windows xcap-based handler was returning a bare 'no window found for this process' with no context, which makes CI failures impossible to diagnose without local Windows access. Two changes:
1. Title-fallback now matches case-insensitively after trimming, and uses substring containment instead of exact equality. CI environments sometimes append marker text to the visible title (e.g. an automation suffix) that Tauri's own .title() doesn't include.
2. The 'no window found' error now includes our PID, the title we were looking for, and the (pid, title) of every window xcap returned, so the next CI failure tells us exactly why the match failed instead of forcing another round of guesswork.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
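
The matching and diagnostic behaviour described above can be sketched in TypeScript, purely to illustrate the logic. The real handler is Rust code built on xcap; the `WindowEntry` shape, the `findWindow` name, and the exact error format here are hypothetical.

```typescript
// Hypothetical shape for one enumerated OS window.
interface WindowEntry {
  pid: number;
  title: string;
}

// Prefer windows owned by this process; fall back to a case-insensitive,
// trimmed substring match on the title; otherwise fail with diagnostics.
function findWindow(windows: WindowEntry[], ownPid: number, wantedTitle: string): WindowEntry {
  const needle = wantedTitle.trim().toLowerCase();
  const titleMatches = (w: WindowEntry): boolean =>
    needle.length > 0 && w.title.trim().toLowerCase().includes(needle);

  const owned = windows.filter((w) => w.pid === ownPid);
  const hit = owned.find(titleMatches) ?? owned[0] ?? windows.find(titleMatches);
  if (hit) {
    return hit;
  }

  // Include everything needed to diagnose the miss from CI logs alone.
  const seen = windows.map((w) => `(${w.pid}, "${w.title}")`).join(", ");
  throw new Error(`no window found: pid=${ownPid}, title="${wantedTitle}", seen=[${seen}]`);
}
```

Substring containment is what lets a visible title with an appended automation suffix still match the title the app itself reports.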
Diagnostic output from the previous run confirms xcap (EnumWindows) only sees 2 windows on the GitHub Actions Windows runner — the agent itself and one nameless window owned by a different PID. The Tauri WebView2 window is not enumerable in this VM environment, so the embedded provider's nativeScreenshot endpoint can't find anything to capture. Same root cause as the Electron skip: no real graphical session, just Hyper-V/WARP. The feature still works on macOS and on Windows machines with a real desktop. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Closing in favour of using The spike validated that The remaining "native chrome only" use cases this PR was originally chasing (multi-window-in-one-capture, OS-level menu/tray pixel diffs) turned out to be either niche or already covered. Combined with the inability to make the capture work on Windows CI (Hyper-V/WARP can't satisfy any GDI / desktop-duplication path), the feature isn't worth the carrying cost.
Summary
- browser.tauri.nativeScreenshot() and browser.electron.nativeScreenshot(): capture the full OS window including title bar, close/minimize buttons, and window decorations (not just the webview content)
- POST /wdio/native-screenshot endpoint in the embedded WebDriver server using the xcap crate (macOS + Windows; unsupported_operation on Linux)
- screencapture -R (macOS) and PowerShell PrintWindow (Windows): no native Node module, no desktopCapturer (avoids macOS Screen Recording permission in CI)
- Buffer of PNG bytes; Tauri requires the embedded driver provider

E2E verification: three-layer approach
- Layer 1: PNG structural checks
- Layer 2: OCR via tesseract.js
- Layer 3: vision LLM assertion (requires OLLAMA_API_KEY), main only

Layer 3 requires OLLAMA_API_KEY set as a CI secret (gate via github.event_name == 'push' && github.ref == 'refs/heads/main'). The visionEnabled() guard ensures specs pass on PR runs without the key.

Test plan
- cargo check passes on macOS (aarch64-apple-darwin), verified locally
- pnpm --filter @wdio/tauri-service typecheck: clean
- pnpm --filter @wdio/electron-service typecheck: clean
- pnpm --filter e2e typecheck: clean
- Run e2e/test/tauri/native-screenshot.spec.ts locally on macOS: layers 1+2 pass without API key
- Run e2e/test/electron/native-screenshot.spec.ts locally on macOS: layers 1+2 pass without API key
- Add OLLAMA_API_KEY + OLLAMA_BASE_URL to merge-to-main CI job for layer 3

🤖 Generated with Claude Code