Neural Magic
Neural Magic empowers developers to optimize and deploy LLMs at scale. Our model compression and acceleration enable top performance with vLLM.
Repositories
- depyf Public Forked from thuml/depyf
depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile.
- model-validation-configs Public
- compressed-tensors Public
A safetensors extension to efficiently store sparse quantized tensors on disk
- lm-evaluation-harness Public Forked from EleutherAI/lm-evaluation-harness
A framework for few-shot evaluation of language models.
- lmms-eval Public Forked from EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with lmms-eval, a one-click evaluation module.