post2-bench

Raw data, harness scripts, and methodology for the benchmark in Vulkan/RADV vs ROCm 6.4 on Strix Halo: What 128 Benchmark Runs Actually Showed.

If you want the narrative and the verdict, read the post. This repo is for the people who want to verify our numbers or re-run the bench on their own hardware.

Key findings

On Qwen 3.6-35B-A3B MoE: Vulkan/RADV is 25–32 % faster on generation than ROCm 6.4.4 across all prompt sizes. Vulkan also wins prefill by 6–8 % once context grows past trivial.
On Gemma 4 31B Dense: Vulkan runs cleanly at ~6 t/s. ROCm fails 48 of 48 runs in three distinct failure modes (hipGraphInstantiate OOM at production context size, degenerate output at smaller context, no tested workaround restores correct output).
ROCm failures reproduce identically on llama.cpp master (dbe7901ca, May 14 2026) — not a stale-build issue.

What's here

File / Directory	What it is
`methodology.md`	Full setup, build flags, run procedure, decision log
`results.csv`	128 raw runs, one row per inference call
`summary.csv`	Aggregated stats per (backend, model, cache_k, prompt) cell
`garbage-samples.txt`	Forensic captures of ROCm × Gemma degenerate output with token-ID decomposition
`anomalies.log`	Verbatim crash logs and failure traces
`prompts/`	The four fixed prompts used across the bench
`scripts/`	Harness (server lifecycle, VRAM polling, streaming TTFT capture)

Hardware

Bosgame M5
AMD Ryzen AI MAX+ 395 ("Strix Halo")
gfx1151 iGPU, 96 GB BIOS-allocated VRAM
128 GB unified LPDDR5X-8000 @ ~256 GB/s
Fedora Server 43, headless

Kernel command line:

amd_iommu=off amdgpu.gttsize=126976 ttm.pages_limit=32505856

Software

Component	Version
`llama.cpp` (primary)	b9016 (846262d78), May 4 2026
`llama.cpp` (verification)	master (dbe7901ca), May 14 2026
Vulkan/RADV	Mesa 25.3.6 (default Fedora 43 stack)
ROCm/HIP	6.4.4 (host-native, `dnf install rocm-hip-runtime rocm-llvm hip-runtime-amd`)
rocWMMA	6.4.0-3.fc43 (does not enumerate gfx1151 — bench built with `GGML_HIP_ROCWMMA_FATTN=OFF`)

Build flags, both backends:

-DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=ON

Backend-specific:

build-vulkan: -DGGML_VULKAN=ON -DGGML_HIP=OFF
build-rocm:   -DGGML_HIP=ON -DCMAKE_HIP_ARCHITECTURES=gfx1151 -DGGML_HIP_ROCWMMA_FATTN=OFF

Models

Model	File	Size
Qwen 3.6-35B-A3B (MoE)	`Qwen3.6-35B-A3B-UD-Q5_K_XL.gguf` (Unsloth Dynamic Q5)	~25 GB
Gemma 4 31B (Dense)	`gemma-4-31B-it-Q8_0.gguf`	~33 GB

Reproducing

See methodology.md for the full procedure. Short version:

Two parallel llama.cpp builds at b9016, installed to /opt/llamacpp/vulkan/bin/ and /opt/llamacpp/rocm/bin/.
Stop any other inference processes — the bench needs idle GPU.
Server runs on port 9090 (separate from any production endpoints).
Run the harness in scripts/run_bench.sh.
Each cell: 5 runs, drop run 1 as warmup, statistics on runs 2–5.
Thermal cooldown to GPU edge temperature < 60 °C between runs.

Expect roughly 6–8 hours wall-clock for the full 128-run sweep with thermal cooldowns. The Gemma × ROCm cells fail fast (~10–30 s each) so they don't dominate runtime.

Verifying, extending, correcting

If you can reproduce these results on similar hardware — or, more interestingly, can't — open an issue. Include hardware details, ROCm and Mesa versions, llama.cpp commit, and what you saw.

If you find a methodology mistake, also open an issue. Data and the post both get corrected with attribution.

License

What	License
Data (CSVs, logs, samples)	CC0 1.0 — public domain, attribution appreciated but not required
Scripts	MIT

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
prompts		prompts
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
LICENSE-DATA		LICENSE-DATA
README.md		README.md
anomalies.log		anomalies.log
garbage-samples.txt		garbage-samples.txt
methodology.md		methodology.md
results.csv		results.csv
summary.csv		summary.csv
tldr.md		tldr.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

post2-bench

Key findings

What's here

Hardware

Software

Models

Reproducing

Verifying, extending, correcting

License

Related

About

Licenses found

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

post2-bench

Key findings

What's here

Hardware

Software

Models

Reproducing

Verifying, extending, correcting

License

Related

About

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages