Does TRTIS support model parallelism? #131
Hi all, does TRTIS support model parallelism? I mean, can a single model be copied onto several GPUs to maximize inference throughput? Thanks.

Comments

Yes, you can serve multiple different models, multiple instances of the same model, or multiple instances of multiple models, on one or more CPUs and GPUs, simultaneously. The docs discuss it here: https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/model_configuration.html?highlight=batching#instance-groups As does this blog post (in the Performance section): https://devblogs.nvidia.com/nvidia-serves-deep-learning-inference/
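To make the instance-group idea above concrete, here is a minimal config.pbtxt sketch; the model name "my_model", the tensorrt_plan platform, the batch size, and the GPU indices are placeholder assumptions, not values taken from this thread.

```
# Hypothetical config.pbtxt illustrating instance groups.
# Placing one execution instance on each of two GPUs lets the server
# run inference for the same model on both GPUs concurrently.
name: "my_model"
platform: "tensorrt_plan"
max_batch_size: 8
instance_group [
  {
    # one copy of the model on GPU 0
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  },
  {
    # one copy of the model on GPU 1
    count: 1
    kind: KIND_GPU
    gpus: [ 1 ]
  }
]
```

Note that this replicates the whole model on each listed GPU; as the follow-up comments below point out, it is not the same as splitting one model that is too large for a single GPU across several GPUs.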
@deadeyegoodwin, thanks for your prompt reply
@deadeyegoodwin for a large model that doesn't fit on a single GPU, how does Triton split the model across multiple GPUs then?
Triton may not support this case based on my experience. |