A high-throughput and memory-efficient inference and serving engine for LLMs (a minimal usage sketch follows this list)
Open deep learning compiler stack for CPU, GPU, and specialized accelerators
Burn is a next-generation deep learning framework that doesn't compromise on flexibility, efficiency, or portability.
Simple, scalable AI model deployment on GPU clusters
Stable Diffusion web UI
A deep learning package for many-body potential energy representation and molecular dynamics
Large-scale LLM inference engine
stdgpu: Efficient STL-like Data Structures on the GPU
Main repository for QMCPACK, an open-source, production-level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids, with fully performance-portable GPU support
Agenium Scale vectorization library for CPUs and GPUs
Kubernetes (k8s) device plugin that enables registration of AMD GPUs with a container cluster
AMD GPU (ROCm) programming in Julia
Self-host the powerful Chatterbox TTS model. This server offers a user-friendly web UI, flexible API endpoints (including OpenAI-compatible ones; see the client sketch after this list), predefined voices, voice cloning, and audiobook-scale text processing. Runs accelerated on NVIDIA (CUDA) and AMD (ROCm) GPUs, or on CPU.
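The first entry above is vLLM's tagline. For orientation, here is a minimal offline-inference sketch using vLLM's Python API; it assumes a working CUDA or ROCm build of vLLM is installed, and the model name is just a small example checkpoint, not a recommendation:

```python
from vllm import LLM, SamplingParams

# Load a model; "facebook/opt-125m" is only a small placeholder checkpoint.
llm = LLM(model="facebook/opt-125m")

# Sampling settings for generation.
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Batched generation; vLLM schedules and batches the requests internally.
outputs = llm.generate(["ROCm is", "GPU inference engines are"], params)

for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```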
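Several of the servers listed above advertise OpenAI-compatible HTTP endpoints (vLLM's serving mode and the Chatterbox TTS server, per their descriptions), so the stock openai Python client can talk to them. A hedged sketch against a hypothetical local deployment; the base URL, port, API key, and model name are all assumptions to replace with your server's actual values:

```python
from openai import OpenAI

# Point the standard OpenAI client at a locally served endpoint.
# The URL, port, and api_key below are placeholders; check your
# server's own docs for its real address and auth requirements.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.completions.create(
    model="facebook/opt-125m",  # placeholder model name
    prompt="ROCm is",
    max_tokens=32,
)
print(response.choices[0].text)
```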