fp8

Here are 2 public repositories matching this topic...

theogravity / dual-rtx-6000-blackwell-qwen3.6-27b-fp8

Optimized vLLM setup for Qwen3.6-27B-FP8 on dual RTX PRO 6000 Blackwell (192 GB GDDR7, no NVLink): config, benchmark sweep results, and a custom chat template with thinking mode off by default.

benchmark blackwell fp8 vllm local-llm llm-inference speculative-decoding qwen3 multi-token-prediction rtx-pro-6000

Updated May 10, 2026
Shell

custom-build-robots / tensorrt-llm-edge-prep

Build, run, and setup scripts for the complete TensorRT-LLM pipeline on RTX A6000 Ada (SM89). Reproducible path from HuggingFace checkpoint to deployable .engine file, with FP16 baseline and FP8 quantization. Companion material to the 4-part blog series on ai-box.eu, in preparation for the NVIDIA TensorRT Edge-LLM ecosystem.

inference nvidia quantization rtx fp16 ai-agents edge-ai llm ada-architecture fp8 tensorrt-llm

Updated May 16, 2026
Shell
