
Does TRTIS support model parallelism? #131

Closed

vilmara opened this issue Mar 5, 2019 · 4 comments


vilmara commented Mar 5, 2019

Hi all, does TRTIS support model parallelism? I mean, can a single model be copied onto several GPUs to maximize inference throughput? Thanks.

@deadeyegoodwin (Contributor) replied:

Yes, you can serve multiple different models, multiple instances of the same model, or multiple instances of multiple models, on one or more CPUs and GPUs, simultaneously.

The docs discuss it here: https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/model_configuration.html?highlight=batching#instance-groups

As does this blog post (in the Performance section): https://devblogs.nvidia.com/nvidia-serves-deep-learning-inference/


vilmara commented Mar 5, 2019

@deadeyegoodwin, thanks for your prompt reply

vilmara closed this as completed Mar 5, 2019

BDHU commented Sep 23, 2022

@deadeyegoodwin for a large model that doesn't fit on a single GPU, how does Triton split the model onto multiple GPUs then?

@shanshanpt replied, quoting the question above:

> @deadeyegoodwin for a large model that doesn't fit on a single GPU, how does Triton split the model onto multiple GPUs then?

Triton may not support this case based on my experience.
