W8A8/W4A8 inference on Apple Silicon — unlocking unused INT8 TensorOps in M5 for 1.2–1.9× faster LLM prefill, built as MLX custom primitives. (Python; updated May 3, 2026)
KV260 integration lane for PCCX™ v002 LLM IP-core bring-up, validation, and board/runtime evidence.
PCCX™ specification, documentation, and ecosystem coordination hub for open AI accelerator IP.
PCCX™ vision-v001 compatibility track for CNN inference planning and v002/Vision absorption review.
PCCX™ v002 IP-core package — board- and model-agnostic reusable RTL for LLM, Vision, Voice, and common subsystems.
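The W8A8 entry above refers to running both weights and activations in INT8 so matrix multiplies can use integer tensor units. As a minimal illustration of that idea (not code from the repository — the function names and the per-tensor symmetric scheme here are assumptions for the sketch; real implementations typically use per-channel or per-group scales), the core pattern is: quantize both operands to INT8, accumulate the product in INT32, and dequantize once at the end:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: int8 values plus one float scale."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def w8a8_matmul(a, w):
    """W8A8 GEMM sketch: INT8 x INT8 multiply, INT32 accumulation, one dequant."""
    qa, sa = quantize_int8(a)
    qw, sw = quantize_int8(w)
    acc = qa.astype(np.int32) @ qw.astype(np.int32)  # integer accumulation
    return acc.astype(np.float32) * (sa * sw)       # fold both scales back in

# Quick accuracy check against the float reference
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 64)).astype(np.float32)
w = rng.standard_normal((64, 8)).astype(np.float32)
err = np.max(np.abs(w8a8_matmul(a, w) - a @ w))
print(err)
```

W4A8 follows the same pattern with weights packed to 4 bits (two values per byte) while activations stay INT8; the dequantization step is unchanged, only the weight unpacking differs.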