Add CUDA support with new `-cuda` tag #749
ToshY wants to merge 8 commits into
Fixes #480
This PR was largely produced by Claude (Opus 4.6/4.7). I tested images for both the standard and CUDA builds on WSL2, and was also able to embed the binaries in an existing Python Alpine-based image (though that needs additional libraries / ENV vars to work, as noted in the README).
I am, however, unable to fully verify the changes in the `Dockerfile` (except the part adding `nv-codec-headers`), as I simply lack the knowledge for it.

A file, `docs/ffmpeg-with-cuda.md`, is included in the PR and serves as a summary of the changes that were made. It is kept temporarily in case the reviewer wants to check the decisions made (by Claude); if the PR is merged, it can be removed beforehand to avoid clutter.

Generated summary by Claude 🤖
Summary
Adds a second image variant, `mwader/static-ffmpeg:<tag>-cuda` (amd64 only), that supports NVIDIA GPU acceleration (`h264_nvenc`, `hevc_nvenc`, `av1_nvenc`, NVDEC, CUVID, `scale_cuda`, …) via the host driver and the NVIDIA Container Toolkit.

The default `:<tag>` image is unchanged: still fully `static-pie` musl, zero `NEEDED` entries, drops into `FROM scratch`. CUDA users explicitly opt in via the `-cuda` tag.

Why a separate variant?
The default image stays `FROM scratch`. CUDA requires `dlopen()` of host driver libraries, which is fundamentally incompatible with `static-pie` on musl (no dynamic loader). Making the default dynamic would silently break existing users.

| | `:<tag>` | `:<tag>-cuda` |
| --- | --- | --- |
| `readelf -d` `NEEDED` entries | none | `libc.musl-x86_64.so.1` |

Architecture (six-layer stack)
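The linking difference can be checked from outside the image. A small sketch (the helper name is mine; the `libc.musl-x86_64.so.1` expectation follows the PR):

```shell
# List the NEEDED entries from `readelf -d <binary>` output fed on stdin.
# For the default image this should print nothing (fully static);
# for the -cuda image it should print only [libc.musl-x86_64.so.1].
list_needed() {
  awk '/\(NEEDED\)/ { print $NF }'
}

# Example use: extract the binary from the image, then inspect it on the host.
#   id=$(docker create mwader/static-ffmpeg:latest-cuda)
#   docker cp "$id":/ffmpeg ./ffmpeg-cuda && docker rm "$id"
#   readelf -d ./ffmpeg-cuda | list_needed
```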
The CUDA variant works on Alpine + musl by combining six independently essential layers. Each was added to fix one specific failure mode discovered during development; full problem → cause → fix write-ups are in `docs/ffmpeg-with-cuda.md`.

1. Dynamic-PIE link (`-fPIE -pie`, not `-static-pie`): the static `libc.a` `dlopen` stub silently returns NULL, so a `-static-pie` binary makes `dlopen` impossible.
2. The musl dynamic loader `/lib/ld-musl-x86_64.so.1` shipped in the image.
3. `/etc/ld-musl-x86_64.path` listing the toolkit injection dirs (`/usr/lib64`, `/usr/lib/wsl/lib`, …).
4. The `gcompat` package plus a `libdl.so.2` → `libgcompat.so.0` symlink, so driver libraries resolve under their glibc names (`libc.so.6`/`libdl.so.2`).
5. `libnvshim.so` via `LD_PRELOAD` (ABI-shim symbols only: `gnu_get_libc_version`, `__register_atfork`, `dlmopen`, `dlvsym`, …).
6. An entrypoint wrapper handling the `__cxa_finalize` teardown crash on musl.

Files changed
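Two of these layers (the ld-musl search path and the glibc-name symlink) are plain filesystem state. A minimal sketch of what the image sets up, written against a configurable root so it can be exercised outside the image (paths follow the PR description; exact Dockerfile lines may differ):

```shell
# Assumption: $1 is a root dir for testing; empty means the real root, as
# inside the image. gcompat itself is installed with `apk add gcompat`.
setup_cuda_layers() {
  root=${1:-}
  mkdir -p "$root/etc" "$root/lib"
  # ld-musl path layer: tell the musl loader where the NVIDIA Container
  # Toolkit injects the host driver libraries.
  printf '/usr/lib64\n/usr/lib/wsl/lib\n' > "$root/etc/ld-musl-x86_64.path"
  # gcompat layer: let dlopen'd driver libs resolve the glibc name libdl.so.2.
  ln -sf libgcompat.so.0 "$root/lib/libdl.so.2"
}
```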
- `Dockerfile`: `ARG ENABLE_CUDA=`; gated `nv-codec-headers` install; the ffmpeg configure gains `--enable-ffnvcodec --enable-cuvid --enable-nvenc --enable-nvdec` and the dynamic-PIE/absolute-path-libc link flags; new `final-cuda` stage with `gcompat`, `libnvshim.so`, the ld-musl path file, env, and the entrypoint wrapper.
- `checkelf`: new `--cuda` mode that allows the musl libc/loader as the only `NEEDED` entry; all other hardening checks (RELRO, BIND_NOW, PIE, NX stack) are preserved.
- `README.md`: new "CUDA / NVENC / NVDEC" section; tag listing updated.
- `docs/ffmpeg-with-cuda.md`: full problem → root cause → fix write-up of every issue encountered, plus a diagnostic playbook and regression-guard recipes.
- `.github/workflows/multiarch.yml`: split the matrix into three jobs: `build-default-arm64` and `build-default-amd64` (parallel), then `build-cuda-amd64` (`needs: build-default-amd64`, reusing the same buildx cache scope so only the final stage materializes).

Tag layout
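The `checkelf --cuda` policy can be sketched as a pass/fail gate (the flag is from the PR; this implementation is mine, not the PR's exact code): allow the musl libc/loader as the only `NEEDED` entry and fail on anything else.

```shell
# $1: file containing `readelf -d` output for the binary under test.
# Returns 0 when the only NEEDED entry (if any) is the musl libc/loader.
check_needed() {
  bad=$(awk '/\(NEEDED\)/ { print $NF }' "$1" \
        | grep -v '^\[libc\.musl-x86_64\.so\.1\]$' || true)
  [ -z "$bad" ]
}
```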
| Tag | Built by | Notes |
| --- | --- | --- |
| `<tag>` | | multi-arch manifest: `<tag>-amd64` + `<tag>-arm64` |
| `<tag>-amd64` | `build-default-amd64` | |
| `<tag>-arm64` | `build-default-arm64` | |
| `<tag>-cuda` | `build-cuda-amd64` | single-arch |
| `<tag>-cuda-amd64` | `build-cuda-amd64` | explicit arch alias |

`latest` and `latest-cuda` follow the latest stable release.

Explicitly NOT supported
- `--enable-cuda-nvcc`
- `--enable-libnpp` / `scale_npp`: use `scale_cuda` instead
- `FROM scratch` / distroless target images

CI impact
Wall-time estimate (the longest job dictates total run time; jobs run in parallel where possible): `build-default-arm64` and `build-default-amd64` run first, then `build-cuda-amd64`.

Total wall time: ~70–75 min (vs ~60 min before). The arm64 job still runs fully in parallel with the amd64 chain; the CUDA job blocks on the amd64 default build to reuse its buildx cache scope (avoiding ~60 min of duplicate codec compilation).
Verification
The end-to-end recipe is in `docs/ffmpeg-with-cuda.md` §4. All five verification steps pass on the test build (RTX 3060 Ti, driver 596.21, CUDA 13.2, WSL2).
Runtime requirements
- Start the container with `--gpus all` (or `--runtime=nvidia` + `NVIDIA_VISIBLE_DEVICES`).
- `NVIDIA_DRIVER_CAPABILITIES=compute,utility,video` is baked into the image: `compute` mounts `libcuda.so.1`, `video` mounts `libnvcuvid.so`/`libnvidia-encode.so`. The default toolkit caps (`utility` only) would break NVENC.
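For illustration, a hedged sketch of an NVENC invocation (the image tag and flags follow the PR; the helper function and its dry-run mode are my additions so the command can be previewed without a GPU):

```shell
# Build (and optionally run) a GPU-enabled ffmpeg command line.
cuda_ffmpeg() {
  set -- docker run --rm --gpus all mwader/static-ffmpeg:latest-cuda "$@"
  if [ -n "$DRY_RUN" ]; then
    echo "$@"   # preview the full command only
  else
    "$@"        # requires the NVIDIA Container Toolkit on the host
  fi
}

# e.g. transcode with the NVENC H.264 encoder:
#   cuda_ffmpeg -i in.mp4 -c:v h264_nvenc out.mp4
```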
- `libnvshim.so` MUST NOT export `exit`/`_exit`/`_Exit`. The earlier in-process attempt to suppress the teardown SIGSEGV via an `_exit` interposer silently swallowed every ffmpeg error exit code (always returned 0). The shim is now strictly the minimum glibc→musl ABI symbol set; lifecycle policy lives in the bash entrypoint wrapper, where it can read the real exit status via `${PIPESTATUS[0]}` and pattern-match on actual stderr keywords. See `docs/ffmpeg-with-cuda.md` §2 P6.
- The `__cxa_finalize` crash cannot be handled inside `main()`: there is no in-process hook (`atexit`, signal handler, etc.) that can suppress it without risk of papering over real bugs. The out-of-process wrapper downgrades exit `139 → 0` only when stderr contains no recognised error keyword.
- `ENV LD_PRELOAD=libnvshim.so` is only safe in ffmpeg-only images. The published `:*-cuda` image runs only `/ffmpeg` (`ENTRYPOINT ["/ffmpeg"]`), which was built and tested with the shim preloaded. Downstream users who `COPY --from` the binaries into a multi-process image (Python/Node app + ffmpeg, etc.) and blindly replicate `ENV LD_PRELOAD` will see other musl interpreters (`pip`, `python`, …) crash with SIGSEGV (exit 139) at startup: `libnvshim` exports glibc-only symbols and transitively pulls in `gcompat` (via `DT_NEEDED libdl.so.2`), which is not safe to inject into arbitrary musl processes. The README "Use in another image with `COPY --from`" → "Multi-process images" subsection documents the scoped-wrapper alternative (a `/usr/local/bin/ffmpeg` shell stub that sets `LD_PRELOAD` only for the ffmpeg invocation).
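The wrapper's exit-status policy can be sketched like this (a minimal bash sketch: `run_wrapped`, the merged-stream capture, and the keyword regex are illustrative, and the real wrapper presumably matches more keywords):

```shell
#!/bin/bash
# Run a command, keeping its real exit status via PIPESTATUS while capturing
# its output; downgrade the known musl __cxa_finalize teardown SIGSEGV
# (exit 139) only when no recognised error keyword was emitted.
run_wrapped() {
  local log
  log=$(mktemp)
  "$@" 2>&1 | tee "$log" >/dev/null
  local status=${PIPESTATUS[0]}   # real exit status of "$@", not tee's
  if [ "$status" -eq 139 ] && ! grep -qiE 'error|invalid|failed' "$log"; then
    status=0                      # benign teardown crash: report success
  fi
  rm -f "$log"
  return "$status"
}
```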