Fast MLX port of ZeroEntropy zerank-2 cross-encoder reranker. 10x faster than PyTorch MPS on Apple Silicon. bf16, validated.
Updated Apr 9, 2026 - Python
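A cross-encoder reranker like zerank-2 scores each (query, document) pair jointly and sorts documents by that score. A minimal sketch of the pattern, with a stand-in token-overlap scorer in place of the actual model (the scorer and function names here are illustrative, not the repo's API):

```python
def overlap_score(query: str, doc: str) -> float:
    """Stand-in for a cross-encoder score: fraction of query tokens in the doc."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / max(len(q_tokens), 1)

def rerank(query: str, docs: list[str]) -> list[str]:
    """Score every (query, doc) pair and return docs best-first."""
    return sorted(docs, key=lambda d: overlap_score(query, d), reverse=True)

ranked = rerank(
    "apple silicon gpu",
    ["banana bread recipe", "apple silicon gpu benchmarks"],
)
```

The real model replaces `overlap_score` with a joint forward pass over the concatenated query and document, which is what makes cross-encoders more accurate (and slower) than bi-encoder retrieval.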
Cursor-Auto / Claude-tier-style serving for local GGUF models on Mac (M4 Max, 64 GB). FastAPI router fronts llama-swap + llama.cpp, classifying each request into a coder, planner, or uncensored-planner tier. OpenAI-compatible API, opencode integration, per-project subshell, one `llmstack` console-script.
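The tier-routing step described above can be sketched as a small classifier that maps each incoming request to a serving tier before proxying it to the matching llama.cpp backend. The tier names come from the description; the keyword heuristics and model names below are invented for illustration, not the repo's actual logic:

```python
# Map each tier to a local GGUF model behind llama-swap (model names are hypothetical).
TIER_MODELS = {
    "coder": "coder-32b-q4.gguf",
    "planner": "planner-70b-q4.gguf",
    "uncensored-planner": "uncensored-70b-q4.gguf",
}

# Crude signals that a request is about code (illustrative heuristic only).
CODE_HINTS = ("def ", "class ", "import ", "```", "stack trace", "compile")

def classify(prompt: str) -> str:
    """Pick a serving tier for one request."""
    text = prompt.lower()
    if any(hint in text for hint in CODE_HINTS):
        return "coder"
    if "uncensored" in text:
        return "uncensored-planner"
    return "planner"
```

In the actual router this decision would sit in a FastAPI handler that forwards the OpenAI-format request body to the backend chosen via `TIER_MODELS[classify(prompt)]`.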