Fast MLX port of ZeroEntropy zerank-2 cross-encoder reranker. 10x faster than PyTorch MPS on Apple Silicon. bf16, validated.
Updated Apr 9, 2026 - Python
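A cross-encoder reranker like zerank-2 scores each (query, document) pair jointly and sorts documents by that score. A minimal sketch of the pattern, with a stand-in token-overlap scorer in place of the actual model (the scorer and function names here are illustrative, not the repo's API):

```python
def overlap_score(query: str, doc: str) -> float:
    """Stand-in for a cross-encoder score: fraction of query tokens in the doc."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / max(len(q_tokens), 1)

def rerank(query: str, docs: list[str]) -> list[str]:
    """Score every (query, doc) pair and return docs best-first."""
    return sorted(docs, key=lambda d: overlap_score(query, d), reverse=True)

ranked = rerank(
    "apple silicon gpu",
    ["banana bread recipe", "apple silicon gpu benchmarks"],
)
```

The real model replaces `overlap_score` with a joint forward pass over the concatenated query and document, which is what makes cross-encoders more accurate (and slower) than bi-encoder retrieval.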
Cursor-Auto / Claude-tier-style serving for local GGUF models on Mac (M4 Max, 64 GB). FastAPI router fronts llama-swap + llama.cpp, classifying each request into a coder, planner, or uncensored-planner tier. OpenAI-compatible API, opencode integration, per-project subshell, one `llmstack` console-script.
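The tier-routing step described above can be sketched as a small classifier that maps each incoming request to a serving tier before proxying it to the matching llama.cpp backend. The tier names come from the description; the keyword heuristics and model names below are invented for illustration, not the repo's actual logic:

```python
# Map each tier to a local GGUF model behind llama-swap (model names are hypothetical).
TIER_MODELS = {
    "coder": "coder-32b-q4.gguf",
    "planner": "planner-70b-q4.gguf",
    "uncensored-planner": "uncensored-70b-q4.gguf",
}

# Crude signals that a request is about code (illustrative heuristic only).
CODE_HINTS = ("def ", "class ", "import ", "```", "stack trace", "compile")

def classify(prompt: str) -> str:
    """Pick a serving tier for one request."""
    text = prompt.lower()
    if any(hint in text for hint in CODE_HINTS):
        return "coder"
    if "uncensored" in text:
        return "uncensored-planner"
    return "planner"
```

In the actual router this decision would sit in a FastAPI handler that forwards the OpenAI-format request body to the backend chosen via `TIER_MODELS[classify(prompt)]`.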