Neural Magic
Neural Magic (acquired by Red Hat) empowers developers to optimize and deploy LLMs at scale. Our model compression and acceleration tools enable top performance with vLLM.
Repositories
- speculators (public)
- vllm (public, fork of vllm-project/vllm): A high-throughput and memory-efficient inference and serving engine for LLMs
- arena-hard-auto (public, fork of lmarena/arena-hard-auto): An automatic LLM benchmark
- compressed-tensors (public): A safetensors extension to efficiently store sparse quantized tensors on disk
- DeepGEMM (public, fork of deepseek-ai/DeepGEMM): Clean and efficient FP8 GEMM kernels with fine-grained scaling
- model-validation-configs (public)