Does TRTIS support model parallelism? #131
Hi all, does TRTIS support model parallelism? I mean, can a single model be copied onto several GPUs to maximize inference throughput? Thanks.

Comments

Yes, you can serve multiple different models, multiple instances of the same model, or multiple instances of multiple models, on one or more CPUs and GPUs, simultaneously. The docs discuss it here: https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/model_configuration.html?highlight=batching#instance-groups As does this blog post (in the Performance section): https://devblogs.nvidia.com/nvidia-serves-deep-learning-inference/
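To make the instance-group idea above concrete, here is a minimal config.pbtxt sketch; the model name "my_model", the tensorrt_plan platform, the batch size, and the GPU indices are placeholder assumptions, not values taken from this thread.

```
# Hypothetical config.pbtxt illustrating instance groups.
# Placing one execution instance on each of two GPUs lets the server
# run inference for the same model on both GPUs concurrently.
name: "my_model"
platform: "tensorrt_plan"
max_batch_size: 8
instance_group [
  {
    # one copy of the model on GPU 0
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  },
  {
    # one copy of the model on GPU 1
    count: 1
    kind: KIND_GPU
    gpus: [ 1 ]
  }
]
```

Note that this replicates the whole model on each listed GPU; as the follow-up comments below point out, it is not the same as splitting one model that is too large for a single GPU across several GPUs.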
@deadeyegoodwin, thanks for your prompt reply
@deadeyegoodwin for a large model that doesn't fit on a single GPU, how does Triton split the model across multiple GPUs then?
Triton may not support this case based on my experience. |