@neuralmagic

Neural Magic

Neural Magic empowers developers to optimize and deploy LLMs at scale. Our model compression and acceleration enable top performance with vLLM.

Pinned

  1. nm-vllm-certs Public

    General information, model certifications, and benchmarks for nm-vllm enterprise distributions

    11 stars · 1 fork

  2. deepsparse Public

    Sparsity-aware deep learning inference runtime for CPUs

    Python · 3.1k stars · 181 forks

  3. sparseml Public

    Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models

    Python · 2.1k stars · 150 forks

  4. docs Public

    Top-level directory for documentation and general content

    MDX · 121 stars · 7 forks

  5. sparsezoo Public

    Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes

    Python · 382 stars · 26 forks

  6. guidellm Public

    Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs

    Python · 208 stars · 21 forks

Repositories

Showing 10 of 62 repositories
  • vllm Public Forked from vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 10 stars · Apache-2.0 · 6,233 forks · 0 issues · 17 pull requests · Updated Mar 9, 2025
  • guidellm Public

    Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs

    Python · 208 stars · Apache-2.0 · 21 forks · 23 issues · 14 pull requests · Updated Mar 8, 2025
  • research Public

    Repository to enable research flows

    Python · 0 stars · 0 forks · 0 issues · 1 pull request · Updated Mar 8, 2025
  • compressed-tensors Public

    A safetensors extension to efficiently store sparse quantized tensors on disk

    Python · 80 stars · Apache-2.0 · 9 forks · 4 issues · 11 pull requests · Updated Mar 7, 2025
  • nm-actions Public

    Neural Magic GitHub Actions (GHA)

    Python · 0 stars · Apache-2.0 · 0 forks · 0 issues · 3 pull requests · Updated Mar 7, 2025
  • axolotl Public Forked from axolotl-ai-cloud/axolotl

    Go ahead and axolotl questions

    Python · 0 stars · Apache-2.0 · 984 forks · 0 issues · 2 pull requests · Updated Mar 4, 2025
  • yolov5 Public Forked from ultralytics/yolov5

    YOLOv5 in PyTorch > ONNX > CoreML > TFLite

    Python · 20 stars · GPL-3.0 · 16,944 forks · 0 issues · 3 pull requests · Updated Mar 3, 2025
  • upstream-transformers Public Forked from huggingface/transformers

    🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

    Python · 1 star · Apache-2.0 · 28,676 forks · 0 issues · 0 pull requests · Updated Feb 24, 2025
  • flash-attention Public Forked from vllm-project/flash-attention

    Fast and memory-efficient exact attention

    C++ · 0 stars · BSD-3-Clause · 1,540 forks · 0 issues · 0 pull requests · Updated Feb 20, 2025
  • pytest-nm-releng Public

    Pytest plugin used by the Release Engineering team

    Python · 0 stars · Apache-2.0 · 0 forks · 0 issues · 0 pull requests · Updated Feb 17, 2025
