Reproducible LLM inference benchmarks (prefill vs decode throughput) to inform requirements for an intermediate memory tier (HBF-class) between HBM and SSD.
Topics: benchmark, performance, storage, inference, ssd, ocp, qos, hbm, wsl2, memory-tiering, hbf, llm, llama-cpp, gguf, open-compute-project, high-bandwidth-flash, llama-bench
Updated Feb 26, 2026 - Python