Python backend does not support TRITONSERVER_MEMORY_GPU #2369
Comments
Thanks for the report. This issue is fixed in 20.11. Feel free to reopen this ticket if you still see this error in 20.11.
@Tabrizian This error still happens in 20.11: {"error":"in ensemble 'ensemble_test', failed to get input buffer in CPU memory"}. I found that this code is also still in the master branch:
@Tabrizian Still getting the issue in 20.11. Besides, I found there is a similar line in https://github.com/triton-inference-server/backend/blob/99b9ff27b9e1dd8d84bb2f5cdd7cd7ffe13d65ef/src/backend_common.cc#L175
Sorry for the confusion. It is not possible to get the data from GPU memory directly in the Python backend. You need to manually copy the data from CPU to GPU in your Python code.
@Tabrizian could you provide an example of how that operation would look in the Python code? I'm unclear on where and how we would detach to CPU, given a tensor in the incoming request from an upstream GPU model. Thanks! [edited to emphasize that this is the same issue as described by @Slyne below]
@Tabrizian I understand that we can move the data from CPU to GPU in a single backend when the request is sent by the client directly. But currently I have an ensemble model which contains one model A that runs on GPU and one backend B that processes the result of model A. So backend B can only receive the GPU data produced by model A.
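For illustration, the manual CPU-to-GPU copy mentioned above could be sketched as follows. This is a minimal sketch, not the backend's actual API surface: the use of PyTorch for the device copy is an assumption (any CUDA-capable framework would do), and on the affected Triton versions the input tensor must already have been delivered in CPU memory before this step.

```python
import numpy as np

def copy_to_gpu(cpu_array):
    """Copy a CPU numpy array to the GPU with PyTorch, if available.

    Assumption: PyTorch with CUDA support is installed in the Python
    backend environment. Falls back to returning the CPU array so the
    sketch stays runnable on machines without a GPU.
    """
    try:
        import torch
        if torch.cuda.is_available():
            return torch.from_numpy(cpu_array).cuda()
    except ImportError:
        pass
    return cpu_array

# In a model.py execute() method you would first pull the CPU array out
# of the request, e.g. (tensor name "INPUT0" is illustrative):
#   tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
#   gpu_tensor = copy_to_gpu(tensor.as_numpy())

result = copy_to_gpu(np.arange(4, dtype=np.float32))
print(result.shape)  # shape is preserved by the copy
```

Note that this only works once the data has reached CPU memory; it does not address the ensemble case below, where the Python backend receives a tensor that is still resident on the GPU.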
Hi, I made a pull request to solve it: 29
@Tabrizian Hi,
This issue is fixed in triton-inference-server/python_backend#30. It will be available in the 21.02 release.
@Tabrizian We faced the same issue when using this Docker image: nvcr.io/nvidia/deepstream:5.1-21.02-triton. We found that the tritonserver version inside this image is 2.5.0, while the tritonserver version inside nvcr.io/nvidia/tritonserver:21.02-py3 is 2.7.0. Does this mean we cannot use python_backend with an ensemble model in nvcr.io/nvidia/deepstream:5.1-21.02-triton?
You can use an ensemble if the input and output tensors given to the Python model are in CPU memory. If any of the input/output tensors are in GPU memory, you need to update Triton to a 21.02+ version.
@Tabrizian I noticed that the latest Jetson version uses nvcr.io/nvidia/deepstream-l4t:5.1-21.02-base, from this link: https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_docker_containers.html. Does this mean we cannot use an ensemble model on Jetson with a mixture of CPU and GPU tensors?
Description
When using an ensemble model, I hit the check at this line: https://github.com/triton-inference-server/python_backend/blob/9e89c1018ef0a9cbd29c3c45ec0baffa7ccf0bc8/src/python.cc#L481
Triton Information
What version of Triton are you using?
20.10
Are you using the Triton container or did you build it yourself?
container
To Reproduce
Ensemble model: one model runs on GPU (onnx_runtime) and one is built on the Python backend.
The linked line appears to constrain the input to have memory type TRITONSERVER_MEMORY_CPU.
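For reference, a minimal config.pbtxt for an ensemble of this shape might look like the sketch below. The ensemble name "ensemble_test" comes from the error message earlier in the thread; all other model names, tensor names, dtypes, and dims are illustrative assumptions, not taken from the report.

```
name: "ensemble_test"
platform: "ensemble"
input [
  { name: "RAW_INPUT", data_type: TYPE_FP32, dims: [ -1 ] }
]
output [
  { name: "FINAL_OUTPUT", data_type: TYPE_FP32, dims: [ -1 ] }
]
ensemble_scheduling {
  step [
    {
      # Step 1: onnxruntime model running on GPU; its output
      # tensor may therefore live in GPU memory.
      model_name: "onnx_model"
      model_version: -1
      input_map { key: "INPUT0", value: "RAW_INPUT" }
      output_map { key: "OUTPUT0", value: "GPU_TENSOR" }
    },
    {
      # Step 2: Python backend post-processing; on 20.10/20.11 this
      # step fails if "GPU_TENSOR" is not in CPU memory.
      model_name: "python_postprocess"
      model_version: -1
      input_map { key: "INPUT0", value: "GPU_TENSOR" }
      output_map { key: "OUTPUT0", value: "FINAL_OUTPUT" }
    }
  ]
}
```

The intermediate tensor between the two steps is what triggers the TRITONSERVER_MEMORY_CPU check described above.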
Expected behavior
It should work with an input tensor on GPU. Is this a bug, or am I misunderstanding something?