Neural Magic
Neural Magic empowers developers to optimize and deploy LLMs at scale. Our model compression and acceleration techniques enable top performance with vLLM.
Repositories
- gateway-api-inference-extension (public, forked from kubernetes-sigs/gateway-api-inference-extension)
  Gateway API Inference Extension
- compressed-tensors (public)
  A safetensors extension to efficiently store sparse quantized tensors on disk
- model-validation-configs (public)
- upstream-transformers (public, forked from huggingface/transformers)
  🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
- depyf (public, forked from thuml/depyf)
  depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile.