@neuralmagic

Neural Magic

Neural Magic empowers developers to optimize and deploy LLMs at scale. Our model compression and acceleration enable top performance with vLLM.

Pinned

  1. nm-vllm-certs Public

    General Information, model certifications, and benchmarks for nm-vllm enterprise distributions

    11 2

  2. deepsparse Public

    Sparsity-aware deep learning inference runtime for CPUs

    Python 3.1k 183

  3. sparseml Public

    Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models

    Python 2.1k 152

  4. docs Public

    Top-level directory for documentation and general content

    MDX 120 7

  5. sparsezoo Public

    Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes

    Python 382 26

  6. guidellm Public

    Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs

    Python 273 32

Repositories

Showing 10 of 68 repositories
  • gateway-api-inference-extension Public Forked from kubernetes-sigs/gateway-api-inference-extension

    Gateway API Inference Extension

    Jupyter Notebook 1 Apache-2.0 67 5 5 Updated Apr 24, 2025
  • 0 0 0 8 Updated Apr 24, 2025
  • vllm Public Forked from vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python 12 Apache-2.0 7,126 0 13 Updated Apr 23, 2025
  • research Public

    Repository to enable research flows

    Python 0 0 0 1 Updated Apr 23, 2025
  • benchmark-compare Public

    Fun with benchmarks

    Python 5 2 0 1 Updated Apr 23, 2025
  • speculators Public
    Python 0 Apache-2.0 0 0 1 Updated Apr 23, 2025
  • compressed-tensors Public

    A safetensors extension to efficiently store sparse quantized tensors on disk

    Python 101 Apache-2.0 10 5 17 Updated Apr 23, 2025
  • nm-actions Public

    Neural Magic GHA

    Python 0 Apache-2.0 0 0 4 Updated Apr 23, 2025
  • Python 0 9 0 1 Updated Apr 23, 2025
  • guidellm Public

    Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs

    Python 273 Apache-2.0 32 28 4 Updated Apr 22, 2025