
python backend not support TRITONSERVER_MEMORY_GPU #2369

Closed
Slyne opened this issue Dec 29, 2020 · 12 comments
Labels: enhancement (New feature or request)

Slyne commented Dec 29, 2020

Description
When using an ensemble model, I run into this issue: https://github.com/triton-inference-server/python_backend/blob/9e89c1018ef0a9cbd29c3c45ec0baffa7ccf0bc8/src/python.cc#L481

Triton Information
What version of Triton are you using?
20.10

Are you using the Triton container or did you build it yourself?
container

To Reproduce
Ensemble model: one model runs on GPU (onnx_runtime) and one is built on the Python backend.
The linked line appears to constrain the input to have memory type TRITONSERVER_MEMORY_CPU.

Expected behavior
It should work with the input tensor on GPU. Is this a bug, or am I misunderstanding something?
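
For concreteness, below is a minimal sketch of the kind of Python model used as the second step of the ensemble. The tensor names and the trivial post-processing are placeholders, not taken from the real model configuration.

```python
# model.py -- illustrative Python backend model; "INPUT0"/"OUTPUT0" are
# placeholder tensor names, not from the actual configuration.
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # as_numpy() requires the tensor to be handed over in CPU memory.
            # When the upstream onnx_runtime model leaves its output on the
            # GPU, the request fails before execute() ever runs, because the
            # backend only accepts TRITONSERVER_MEMORY_CPU input buffers.
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()

            out0 = pb_utils.Tensor("OUTPUT0", in0.astype(np.float32))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out0]))
        return responses
```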


Tabrizian commented Dec 29, 2020

Thanks for the report. This issue is fixed in 20.11. Feel free to reopen this ticket if you still see this error in 20.11.


ldengjie commented Dec 29, 2020

@Tabrizian This error still happens in 20.11

{"error":"in ensemble 'ensemble_test', failed to get input buffer in CPU memory"}

I found this code is also still in the master branch:
https://github.com/triton-inference-server/python_backend/blob/9e89c1018ef0a9cbd29c3c45ec0baffa7ccf0bc8/src/python.cc#L478


Slyne commented Dec 30, 2020

@Tabrizian I still get the issue in 20.11. Besides, I found a similar line in https://github.com/triton-inference-server/backend/blob/99b9ff27b9e1dd8d84bb2f5cdd7cd7ffe13d65ef/src/backend_common.cc#L175
I'm not sure how to get the data from the GPU.

@Tabrizian

Sorry for the confusion. It is not possible to get the data from GPU memory directly in Python backend. You need to manually copy the data from CPU to GPU in the Python code.
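
In code, that suggestion would look roughly like the sketch below, assuming PyTorch with CUDA is installed in the Python backend's environment and using placeholder tensor names. Note that this only covers the case where the input already reaches the Python model in CPU memory, which is exactly what the ensemble scenario discussed below does not do.

```python
# Rough sketch of the manual-copy suggestion above. Assumes PyTorch with
# CUDA is available to the model; "INPUT0"/"OUTPUT0" are placeholders.
import torch
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # The input arrives in CPU memory as a numpy array.
            cpu_array = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()

            # Explicit CPU -> GPU copy inside the model code.
            gpu_tensor = torch.from_numpy(cpu_array).cuda()
            result = gpu_tensor * 2.0  # placeholder for the real GPU work

            # Explicit GPU -> CPU copy before returning the output to Triton.
            out = pb_utils.Tensor("OUTPUT0", result.cpu().numpy())
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```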


francescoclarifai commented Dec 30, 2020

@Tabrizian Could you provide an example of how that operation would look in the Python code? I'm unclear on where and how we would detach to CPU, given a tensor in the incoming request from an upstream GPU model. Thanks!

[edited to emphasize that this is the same issue as described by @Slyne below]


Slyne commented Dec 30, 2020

@Tabrizian I understand that we can move the data from CPU to GPU in a single backend when the request is sent by the client directly. But currently I have an ensemble model that contains one model A running on GPU and one Python backend B processing the result of model A, so backend B can only receive the GPU data produced by model A.
I hope to get some guidance or examples on handling this situation.

@ldengjie

Hi, I made a pull request to solve it: #29

@Tabrizian Tabrizian added the enhancement New feature or request label Jan 2, 2021
@Tabrizian Tabrizian self-assigned this Jan 2, 2021
@Tabrizian Tabrizian reopened this Jan 3, 2021

uefall commented Jan 6, 2021

@Tabrizian Hi,
I see the same issue in 20.11 with an ensemble model. Where and how should we copy the GPU output buffer to CPU manually?

@Tabrizian

This issue is fixed in triton-inference-server/python_backend#30. It will be available in the 21.02 release.
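
With that fix, the ensemble above should work without changes to the Python model. Later Python backend releases also added DLPack utilities (pb_utils.Tensor.to_dlpack() / pb_utils.Tensor.from_dlpack()) for consuming GPU tensors directly; the sketch below assumes one of those releases plus PyTorch with CUDA, with placeholder tensor names.

```python
# Sketch of handling an input that may live in GPU memory via DLPack.
# Assumes a Python backend release that ships to_dlpack()/from_dlpack()
# (added after this fix) and PyTorch with CUDA; tensor names are placeholders.
from torch.utils.dlpack import from_dlpack, to_dlpack
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")

            # Zero-copy view of the input, wherever it lives.
            torch_in = from_dlpack(in0.to_dlpack())
            if not torch_in.is_cuda:
                torch_in = torch_in.cuda()  # copy to GPU only if needed

            result = torch_in * 2.0  # placeholder for the real GPU work

            out = pb_utils.Tensor.from_dlpack("OUTPUT0", to_dlpack(result))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```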


simon5u commented Sep 1, 2021

@Tabrizian We faced the same issue when using this Docker image, nvcr.io/nvidia/deepstream:5.1-21.02-triton. We found that the tritonserver version inside this image is 2.5.0, while the tritonserver version inside nvcr.io/nvidia/tritonserver:21.02-py3 is 2.7.0. Does this mean we cannot use the python_backend with an ensemble model in nvcr.io/nvidia/deepstream:5.1-21.02-triton?


Tabrizian commented Sep 1, 2021

You can use an ensemble if the input and output tensors given to the Python model are in CPU memory. If any of the input/output tensors are in GPU memory, you need to update Triton to version 21.02 or later.


simon5u commented Sep 1, 2021

@Tabrizian I noticed that the latest Jetson release uses nvcr.io/nvidia/deepstream-l4t:5.1-21.02-base, according to https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_docker_containers.html. Does this mean we cannot use an ensemble model on Jetson with a mixture of CPU and GPU tensors?
