W8A8/W4A8 inference on Apple Silicon — unlocking unused INT8 TensorOps in M5 for 1.2–1.9× faster LLM prefill, built as MLX custom primitives. (Python; updated May 3, 2026)
KV260 integration lane for PCCX™ v002 LLM IP-core bring-up, validation, and board/runtime evidence.
PCCX™ specification, documentation, and ecosystem coordination hub for open AI accelerator IP.
PCCX™ vision-v001 compatibility track for CNN inference planning and v002/Vision absorption review.
PCCX™ v002 IP-core package — board- and model-agnostic reusable RTL for LLM, Vision, Voice, and common subsystems.
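The W8A8 entry above refers to running both weights and activations in INT8 so matrix multiplies can use integer tensor units. As a minimal illustration of that idea (not code from the repository — the function names and the per-tensor symmetric scheme here are assumptions for the sketch; real implementations typically use per-channel or per-group scales), the core pattern is: quantize both operands to INT8, accumulate the product in INT32, and dequantize once at the end:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: int8 values plus one float scale."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def w8a8_matmul(a, w):
    """W8A8 GEMM sketch: INT8 x INT8 multiply, INT32 accumulation, one dequant."""
    qa, sa = quantize_int8(a)
    qw, sw = quantize_int8(w)
    acc = qa.astype(np.int32) @ qw.astype(np.int32)  # integer accumulation
    return acc.astype(np.float32) * (sa * sw)       # fold both scales back in

# Quick accuracy check against the float reference
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 64)).astype(np.float32)
w = rng.standard_normal((64, 8)).astype(np.float32)
err = np.max(np.abs(w8a8_matmul(a, w) - a @ w))
print(err)
```

W4A8 follows the same pattern with weights packed to 4 bits (two values per byte) while activations stay INT8; the dequantization step is unchanged, only the weight unpacking differs.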