A high-throughput and memory-efficient inference and serving engine for LLMs
Updated Jul 1, 2024 - Python
Open deep learning compiler stack for CPU, GPU, and specialized accelerators
PygmalionAI's large-scale inference engine
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
ROCm Install Utilities: rocminstall.py script to install a specific ROCm release version/revision.
TOML-annotated C header file format for packaging binary files, from Microsoft Research
Scripts to set up, build, and test the installation of AMD ROCm MIVisionX
A fully local, GPU-vendor-agnostic voice-to-voice personal assistant
Instructions for using PyTorch on AMD GPUs with Linux
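A quick way to check whether the PyTorch build in your environment is a ROCm build: ROCm builds of PyTorch expose `torch.version.hip` (it is `None` on CUDA/CPU builds), and ROCm devices are then addressed through the usual `"cuda"` device string. The helper name `rocm_device_hint` below is hypothetical, a minimal sketch of that check:

```python
import importlib.util

def rocm_device_hint():
    # Hypothetical helper: report which device string to use,
    # without assuming PyTorch is installed at all.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    # ROCm builds of PyTorch set torch.version.hip to the HIP version string;
    # on CUDA or CPU-only builds the attribute is None.
    if getattr(torch.version, "hip", None):
        return "cuda"  # ROCm GPUs are addressed via the 'cuda' device string
    return "cpu"
```

On a working ROCm install, tensors can then be created with `torch.ones(3, device="cuda")` exactly as on NVIDIA hardware.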
MIVisionX Python Inference Analyzer uses pre-trained ONNX/NNEF/Caffe models to analyze inference results and summarize individual image results
MIVisionX Infrastructure for Neural Net Training and Inference with Optimized Data Augmentation through RALI