
How to deploy maskRCNN pytorch model? #418

Closed

achbogga opened this issue Jun 27, 2019 · 15 comments

@achbogga

@deadeyegoodwin, how should one think about deploying PyTorch models, some of which might not yet be supported for fully automatic TensorRT conversion? For example, can you point me to any example with an FPN-101-backbone Mask R-CNN PyTorch model?

@bezero

bezero commented Jun 27, 2019

@achbogga you can use an ONNX conversion of your PyTorch model and deploy that. However, it is not straightforward. You can check this model, which I believe was created using the maskrcnn-benchmark repository. From what I saw, TRTIS is working on integrating PyTorch models as well, but you have to write your model using TorchScript in order to be able to freeze it with torch.jit.trace. If there are other possible solutions, I would like to know as well.
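
A minimal sketch of the ONNX route, assuming a model whose ops are all exportable; as noted above, the full Mask R-CNN from maskrcnn-benchmark is not this straightforward. The backbone, names, and shapes here are illustrative:

import torch
import torchvision

# Illustrative: export a plain backbone to ONNX. Exporting the full
# Mask R-CNN needs custom handling because of unsupported ops.
model = torchvision.models.resnet50(pretrained=True).eval()
dummy = torch.rand(1, 3, 224, 224)

torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["input"], output_names=["output"],
    opset_version=10,
)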

@ligonzheng

Using Python Flask is an easy solution.
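
For reference, a minimal sketch of the Flask approach, assuming a traced model.pt; the /predict endpoint and the JSON input format are hypothetical, and preprocessing is model-specific:

import torch
from flask import Flask, request, jsonify

app = Flask(__name__)
model = torch.jit.load("model.pt").eval()  # hypothetical traced TorchScript model

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"input": [...]} holding a CHW float image.
    data = torch.tensor(request.get_json()["input"], dtype=torch.float32)
    with torch.no_grad():
        out = model(data.unsqueeze(0))
    return jsonify(out.squeeze(0).tolist())

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)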

@achbogga
Author

@ligonzheng can you please elaborate?

@deadeyegoodwin
Contributor

@achbogga As @bezero mentioned, your best bet is to use the native PyTorch support if possible (in master, and will be in 19.07). In 19.06 we also added ONNX support, so it may be possible to export PyTorch -> ONNX and use the resulting model.

@achbogga
Author

achbogga commented Aug 7, 2019

Regarding the PyTorch support in r19.07-py3:

There is no official PyTorch example from NVIDIA showing how to write config.pbtxt for pytorch_libtorch models; any official example would help. Also, when I tried with --strict-model-config=false, I got the following error while serving our Mask R-CNN model, forked from the facebook maskrcnn-benchmark GitHub repo. Any help is appreciated. If anyone got the model to work via the ONNX route, I would appreciate an example code snippet for exporting the fb-maskrcnn model to ONNX along with the corresponding config.pbtxt.

nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -it --rm -v/home/caffe/.bashrc:/.bashrc -v/home/caffe/.bash_history:/.bash_history -v/home/caffe/achu/model_repo:/models -p8000:8000 -p8001:8001 -p8002:8002 nvcr.io/nvidia/tensorrtserver:19.07-py3 trtserver --model-store=/models  --strict-model-config=false

===============================
== TensorRT Inference Server ==
===============================

NVIDIA Release 19.07 (build 7353475)

Copyright (c) 2018-2019, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

I0807 22:52:48.954956 1 main.cc:346] Starting endpoints, 'inference:0' listening on
I0807 22:52:48.955207 1 grpc_server.cc:272] Starting a GRPCService at 0.0.0.0:8001
I0807 22:52:48.955246 1 grpc_server.cc:278] Register TensorRT GRPCService
I0807 22:52:48.955284 1 grpc_server.cc:281] Register Infer RPC
I0807 22:52:48.955298 1 grpc_server.cc:285] Register StreamInfer RPC
I0807 22:52:48.955311 1 grpc_server.cc:290] Register Status RPC
I0807 22:52:48.955318 1 grpc_server.cc:294] Register Profile RPC
I0807 22:52:48.955326 1 grpc_server.cc:298] Register Health RPC
I0807 22:52:48.955370 1 grpc_server.cc:310] Register Executor
I0807 22:52:48.962232 1 http_server.cc:632] Starting HTTPService at 0.0.0.0:8000
I0807 22:52:49.004117 1 http_server.cc:646] Starting Metrics Service at 0.0.0.0:8002
I0807 22:52:49.006389 1 metrics.cc:160] found 1 GPUs supporting NVML metrics
I0807 22:52:49.011886 1 metrics.cc:169]   GPU 0: Tesla V100-SXM2-16GB
I0807 22:52:49.017681 1 server.cc:111] Initializing TensorRT Inference Server
I0807 22:52:49.052634 1 server_status.cc:83] New status tracking for model 'fridge_libtorch'
I0807 22:52:49.052687 1 model_repository_manager.cc:633] loading: fridge_libtorch:1
I0807 22:52:49.357929 1 libtorch_backend.cc:220] Creating instance fridge_libtorch_0_gpu0 on GPU 0 (7.0) using model.pt
E0807 22:52:49.649415 1 model_repository_manager.cc:779] failed to load 'fridge_libtorch' version 1: Internal: load failed for libtorch model -> 'fridge_libtorch': [enforce fail at inline_container.cc:137] . PytorchStreamReader failed reading zip archive: failed finding central directory
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void const*) + 0x78 (0x7f313c709ce8 in /opt/tensorrtserver/lib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::valid(char const*) + 0x8d (0x7f3140fe8e2d in /opt/tensorrtserver/lib/libtorch.so)
frame #2: caffe2::serialize::PyTorchStreamReader::init() + 0xa6 (0x7f3140fed136 in /opt/tensorrtserver/lib/libtorch.so)
frame #3: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::unique_ptr<caffe2::serialize::ReadAdapterInterface, std::default_delete<caffe2::serialize::ReadAdapterInterface> >) + 0x53 (0x7f3140ff0fe3 in /opt/tensorrtserver/lib/libtorch.so)
frame #4: <unknown function> + 0x5566a1e (0x7f3141e88a1e in /opt/tensorrtserver/lib/libtorch.so)
frame #5: torch::jit::load(std::unique_ptr<caffe2::serialize::ReadAdapterInterface, std::default_delete<caffe2::serialize::ReadAdapterInterface> >, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0xd7 (0x7f3141e92087 in /opt/tensorrtserver/lib/libtorch.so)
frame #6: torch::jit::load(std::istream&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0x79 (0x7f3141e92229 in /opt/tensorrtserver/lib/libtorch.so)
frame #7: <unknown function> + 0x1f63c3 (0x7f31d00b23c3 in /opt/tensorrtserver/lib/libtrtserver.so)
frame #8: <unknown function> + 0x1f70d3 (0x7f31d00b30d3 in /opt/tensorrtserver/lib/libtrtserver.so)
frame #9: <unknown function> + 0x1f0224 (0x7f31d00ac224 in /opt/tensorrtserver/lib/libtrtserver.so)
frame #10: <unknown function> + 0xbe924 (0x7f31cff7a924 in /opt/tensorrtserver/lib/libtrtserver.so)
frame #11: <unknown function> + 0xbf315 (0x7f31cff7b315 in /opt/tensorrtserver/lib/libtrtserver.so)
frame #12: <unknown function> + 0xbd66f (0x7f31cfbf066f in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #13: <unknown function> + 0x76db (0x7f313a0cc6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #14: clone + 0x3f (0x7f31cf64b88f in /lib/x86_64-linux-gnu/libc.so.6)

@CoderHam
Contributor

CoderHam commented Aug 7, 2019

The PyTorch backend runs the C++ backend (LibTorch), which requires a TorchScript model. You can produce one by tracing your existing PyTorch model as shown here: https://pytorch.org/tutorials/advanced/cpp_export.html
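
In short, following that tutorial (a sketch; the model and example input shape are illustrative):

import torch
import torchvision

model = torchvision.models.resnet50(pretrained=True).eval()
example = torch.rand(1, 3, 224, 224)  # float32 example input

# Tracing records the ops executed on the example input and produces a
# TorchScript module that LibTorch can load without a Python runtime.
traced = torch.jit.trace(model, example)
traced.save("model.pt")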

Autofill for PyTorch is very limited due to the lack of information stored in the model file.

The docs describe the naming convention and other related information on how to create the config.pbtxt file for PyTorch models: https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#model-configuration
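
For reference, the model repository layout the server expects for a libtorch model looks like this (using the fridge_libtorch name from the log above). The "PytorchStreamReader failed reading zip archive" error is typically what you see when model.pt is not a TorchScript archive, for example a plain torch.save checkpoint:

model_repo/
└── fridge_libtorch/
    ├── config.pbtxt
    └── 1/
        └── model.pt   <- must be saved via traced.save(), not torch.save()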

@achbogga
Author

achbogga commented Aug 8, 2019

@CoderHam An official example of creating a config.pbtxt for a PyTorch model should be provided by NVIDIA, for the sake of all the noobs out there who cannot make sense of the documentation. Can you guys provide at least one example? Please reopen this issue.

@achbogga
Author

achbogga commented Aug 9, 2019

BTW, I have tried both the torch.jit.trace and torch.jit.script routes. Nothing seems to work for the Mask R-CNN models. I encountered the following errors, respectively:

1. RuntimeError: Expected object of scalar type Float but got scalar type Long for argument #2 'other'
2. UnsupportedNodeError: GeneratorExp aren't supported
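
On the second error: TorchScript's compiler does not support generator expressions, and the usual workaround is to rewrite them as list comprehensions or explicit loops. A minimal illustration (not taken from the Mask R-CNN code itself):

import torch
from typing import List

class ScaleAll(torch.nn.Module):
    def forward(self, xs: List[torch.Tensor]) -> List[torch.Tensor]:
        # list(x * 2 for x in xs) would fail to compile with
        # "UnsupportedNodeError: GeneratorExp aren't supported";
        # the equivalent list comprehension scripts cleanly.
        return [x * 2 for x in xs]

scripted = torch.jit.script(ScaleAll())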

@CoderHam
Contributor

CoderHam commented Aug 9, 2019

You need to trace the PyTorch model with a float tensor. Can you confirm you are doing that?
Here is a sample config.pbtxt:

platform: "pytorch_libtorch"
max_batch_size: 128
input {
    name: "INPUT__0"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
  }
output {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
    label_filename: "resnet50_labels.txt"
  }

You can follow this template to create a similar config file for your model.
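
As a quick sanity check on the float-tensor point (hypothetical values; the normalization is model-specific):

import torch

example = torch.randint(0, 255, (1, 3, 224, 224))  # dtype=torch.int64 (Long); tracing with this can raise the Float/Long error
example = example.float() / 255.0                   # dtype=torch.float32; safe to pass to torch.jit.trace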

@achbogga
Author

achbogga commented Aug 9, 2019

Yes, I have tried tracing using a float tensor. Same result; it did not work.

@c464851257

Hello, I have tried tracing using a float tensor. How do I get the input name and output name?

@okyspace
Contributor

okyspace commented May 5, 2021

Did anyone manage to solve this? I got the server to run with a PyTorch model, but I get HTTP 400 when doing inference.
The config.pbtxt follows the sample above. I have also tested my input with model inference in a Colab notebook and it works, but the same input against the model deployed in Triton gives HTTP 400.

Is there any way to print more logs to identify the error? I tried adding --log-verbose=5, but there were not enough logs to pin down the error.

Appreciate the help

@CoderHam
Contributor

CoderHam commented May 5, 2021

@okyspace Did you try tracing the model as advised here?
Also, try --log-verbose=1. You should see a failure logged in the server when the inference fails to execute.

@c464851257 PyTorch models do not have a concept of input names. The INPUT names should follow the Triton convention as described here.
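
To make that convention concrete, a minimal client sketch against the resnet50-style config above, using the tritonclient HTTP API (the model name "mymodel" is illustrative):

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Names follow Triton's libtorch convention: INPUT__<i> / OUTPUT__<i>,
# where <i> is the positional index of forward()'s argument / return value.
image = np.random.rand(1, 3, 224, 224).astype(np.float32)
inp = httpclient.InferInput("INPUT__0", list(image.shape), "FP32")
inp.set_data_from_numpy(image)
out = httpclient.InferRequestedOutput("OUTPUT__0")

result = client.infer("mymodel", inputs=[inp], outputs=[out])
print(result.as_numpy("OUTPUT__0").shape)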

@okyspace
Contributor

okyspace commented May 5, 2021

Thanks. I reviewed it again and found that the error was due to my output shape being incorrect.
By the way, I was thinking of contributing my PyTorch config as an example so it can benefit more people, since the examples provided are mostly TensorFlow.

Also, if you could point me to examples that deploy a pre-trained model like YOLO, that would be great. Thanks!
