Python backend does not support TRITONSERVER_MEMORY_GPU #2369
Comments
Thanks for the report. This issue is fixed in 20.11. Feel free to reopen this ticket if you still see this error in 20.11.
@Tabrizian This error still happens in 20.11: {"error":"in ensemble 'ensemble_test', failed to get input buffer in CPU memory"}. I found that this code is also still in the master branch:
@Tabrizian Still getting the issue in 20.11. Besides, I found there is a similar line in https://github.com/triton-inference-server/backend/blob/99b9ff27b9e1dd8d84bb2f5cdd7cd7ffe13d65ef/src/backend_common.cc#L175
Sorry for the confusion. It is not possible to get the data from GPU memory directly in the Python backend. You need to manually copy the data from CPU to GPU in your Python code.
@Tabrizian could you provide an example of how that operation would look in the Python code? I'm unclear on where and how we would detach to CPU, given a tensor in the incoming request from an upstream GPU model. Thanks! [edited to emphasize that this is the same issue as described by @Slyne below]
@Tabrizian I understand that we can move the data from CPU to GPU in a single backend when the request is sent by the client directly. But currently I have an ensemble model which contains one model A that runs on GPU and one backend B that processes the result of model A. So backend B can only receive the GPU data produced by model A.
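For illustration, the manual CPU-to-GPU copy mentioned above could be sketched as follows. This is a minimal sketch, not the backend's actual API surface: the use of PyTorch for the device copy is an assumption (any CUDA-capable framework would do), and on the affected Triton versions the input tensor must already have been delivered in CPU memory before this step.

```python
import numpy as np

def copy_to_gpu(cpu_array):
    """Copy a CPU numpy array to the GPU with PyTorch, if available.

    Assumption: PyTorch with CUDA support is installed in the Python
    backend environment. Falls back to returning the CPU array so the
    sketch stays runnable on machines without a GPU.
    """
    try:
        import torch
        if torch.cuda.is_available():
            return torch.from_numpy(cpu_array).cuda()
    except ImportError:
        pass
    return cpu_array

# In a model.py execute() method you would first pull the CPU array out
# of the request, e.g. (tensor name "INPUT0" is illustrative):
#   tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
#   gpu_tensor = copy_to_gpu(tensor.as_numpy())

result = copy_to_gpu(np.arange(4, dtype=np.float32))
print(result.shape)  # shape is preserved by the copy
```

Note that this only works once the data has reached CPU memory; it does not address the ensemble case below, where the Python backend receives a tensor that is still resident on the GPU.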
Hi, I made a pull request to solve it: 29
@Tabrizian Hi,
This issue is fixed in triton-inference-server/python_backend#30. It will be available in the 21.02 release.
@Tabrizian We faced the same issue when using this Docker image: nvcr.io/nvidia/deepstream:5.1-21.02-triton. We found that the tritonserver version inside this image is 2.5.0, while the tritonserver version inside nvcr.io/nvidia/tritonserver:21.02-py3 is 2.7.0. Does this mean we cannot use python_backend with an ensemble model in nvcr.io/nvidia/deepstream:5.1-21.02-triton?
You can use an ensemble if the input and output tensors given to the Python model are in CPU memory. If any of the input/output tensors are in GPU memory, you need to update Triton to a 21.02+ version.
@Tabrizian I noticed that the latest Jetson version uses nvcr.io/nvidia/deepstream-l4t:5.1-21.02-base, from this link: https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_docker_containers.html. Does this mean we cannot use an ensemble model on Jetson with a mixture of CPU and GPU tensors?
Description
When using an ensemble model, I hit the check at this line: https://github.com/triton-inference-server/python_backend/blob/9e89c1018ef0a9cbd29c3c45ec0baffa7ccf0bc8/src/python.cc#L481
Triton Information
What version of Triton are you using?
20.10
Are you using the Triton container or did you build it yourself?
container
To Reproduce
Ensemble model: one model runs on GPU (onnx_runtime) and one is built on the Python backend.
The linked line appears to constrain the input to have memory type TRITONSERVER_MEMORY_CPU.
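For reference, a minimal config.pbtxt for an ensemble of this shape might look like the sketch below. The ensemble name "ensemble_test" comes from the error message earlier in the thread; all other model names, tensor names, dtypes, and dims are illustrative assumptions, not taken from the report.

```
name: "ensemble_test"
platform: "ensemble"
input [
  { name: "RAW_INPUT", data_type: TYPE_FP32, dims: [ -1 ] }
]
output [
  { name: "FINAL_OUTPUT", data_type: TYPE_FP32, dims: [ -1 ] }
]
ensemble_scheduling {
  step [
    {
      # Step 1: onnxruntime model running on GPU; its output
      # tensor may therefore live in GPU memory.
      model_name: "onnx_model"
      model_version: -1
      input_map { key: "INPUT0", value: "RAW_INPUT" }
      output_map { key: "OUTPUT0", value: "GPU_TENSOR" }
    },
    {
      # Step 2: Python backend post-processing; on 20.10/20.11 this
      # step fails if "GPU_TENSOR" is not in CPU memory.
      model_name: "python_postprocess"
      model_version: -1
      input_map { key: "INPUT0", value: "GPU_TENSOR" }
      output_map { key: "OUTPUT0", value: "FINAL_OUTPUT" }
    }
  ]
}
```

The intermediate tensor between the two steps is what triggers the TRITONSERVER_MEMORY_CPU check described above.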
Expected behavior
It should work with an input tensor on GPU. Is this a bug, or am I misunderstanding something?