
Problem when running sequence models #1123

Closed
zhouxuan009 opened this issue Feb 19, 2020 · 19 comments

Comments

@zhouxuan009

Description

I tried to run the sequence models but it fails.
I use the sample client provided in the client SDK Docker container:

docker run -it --rm --net=host nvcr.io/nvidia/tensorrtserver:19.10-py3-clientsdk
/workspace/install/bin# ./simple_sequence_client
And it gives an error message like the one below:

sequence 0 correlation ID 1 : sequence 1 correlation ID 2
error: unable to get INPUT: [ 0] INVALID_ARG - unknown input 'INPUT' for 'simple_sequence'

TRTIS and Model Information

I am using the nvcr.io/nvidia/tensorrtserver:20.01-py3 container for the server.
Below is my model repository structure:

sequence_model_repository
---direct_stateful_resnet50_netdef
------1
----------libsequence.so
----------model.netdef
----------init_model.netdef
------config.pbtxt
------resnet50_labels.txt

name: "simple_sequence"
platform: "caffe2_netdef"
max_batch_size: 3


sequence_batching {
  max_sequence_idle_microseconds: 5000000
  direct { }
  control_input [
    {
      name: "START"
      control [
        {
          kind: CONTROL_SEQUENCE_START
          fp32_false_true: [ 0, 1 ]
        }
      ]
    },
    {
      name: "READY"
      control [
        {
          kind: CONTROL_SEQUENCE_READY
          fp32_false_true: [ 0, 1 ]
        }
      ]
    }
  ]
}
input [
  {
    name: "gpu_0/data"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "gpu_0/softmax"
    data_type: TYPE_FP32
    dims: [ 1000 ]
    label_filename: "resnet50_labels.txt"
  }
]

Expected behavior
Could you please tell me what's going wrong? Thanks very much!

@GuanLuo
Contributor

GuanLuo commented Feb 19, 2020

"simple_sequence_client" assumes that the model has an input called "INPUT", and your model config shows that the model has "gpu_0/data" instead. Simply modifying simple_sequence_client.cc to use the correct input name should fix the issue.

@GuanLuo
Contributor

GuanLuo commented Feb 19, 2020

And I don't think your model repository is organized in the right way... Are you able to start TRTIS without any errors?

@zhouxuan009
Author

Yes, I am able to run TRTIS using this repository.

@zhouxuan009
Author

After changing "gpu_0/data" to "INPUT", I cannot run TRTIS and it gives me the error message below:

I0219 23:54:58.385865 1 model_repository_manager.cc:675] loading: simple_sequence:1
I0219 23:54:58.490642 1 netdef_backend.cc:199] Creating instance simple_sequence_0_gpu0 on GPU 0 (7.5) using init_model.netdef and model.netdef
E0219 23:54:58.860617 1 model_repository_manager.cc:832] failed to load 'simple_sequence' version 1: Internal: load failed for 'simple_sequence': [enforce fail at operator.cc:75] blob != nullptr. op Conv: Encountered a non-existing input blob: gpu_0/data
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, void const*) + 0x78 (0x7f9e4f43fbd8 in /opt/tensorrtserver/bin/../lib/pytorch/libc10.so)
frame #1: caffe2::OperatorBase::OperatorBase(caffe2::OperatorDef const&, caffe2::Workspace*) + 0x6e5 (0x7f9e53e43885 in /opt/tensorrtserver/bin/../lib/pytorch/libtorch.so)
frame #2: + 0x3086f1c (0x7f9e526dff1c in /opt/tensorrtserver/bin/../lib/pytorch/libtorch.so)
frame #3: + 0x62f430b (0x7f9e5594d30b in /opt/tensorrtserver/bin/../lib/pytorch/libtorch.so)
frame #4: + 0x62f528e (0x7f9e5594e28e in /opt/tensorrtserver/bin/../lib/pytorch/libtorch.so)
frame #5: std::_Function_handler<std::unique_ptr<caffe2::OperatorBase, std::default_deletecaffe2::OperatorBase >

@CoderHam
Contributor

For starters you might want to use the 20.01 clientsdk container.

@GuanLuo
Contributor

GuanLuo commented Feb 20, 2020

I meant to change the client, not config.pbtxt. config.pbtxt is supposed to reflect your model, so if the model does have an input called "gpu_0/data", then you should keep it this way in config.pbtxt.

@zhouxuan009
Author

Cool! Thanks a lot, I will try it.

@zhouxuan009
Author

I changed simple_sequence_client.cc, recompiled, and ran it, but the error still happens.
I changed
FAIL_IF_ERR(ctx->GetInput("INPUT", &ivalue), "unable to get INPUT");
into
FAIL_IF_ERR(ctx->GetInput("gpu_0/data", &ivalue), "unable to get INPUT");

When I run it, I get the error message below:

sequence 0 correlation ID 1 : sequence 1 correlation ID 2
error: unable to set data for INPUT: [ 0] INVALID_ARG - invalid size 4 bytes for input 'gpu_0/data', expects 602112 bytes

@GuanLuo
Contributor

GuanLuo commented Feb 21, 2020

Please note that "simple_sequence_client" is just an example showing sequence inference usage, and it is designed around the "sequence" model, which is completely different from the stateful model you have. Thus you will have to modify "simple_sequence_client" before you can use it with your model.

@zhouxuan009
Author

Thank you for your clarification !

So now I just want to run the sequence model. I have compiled the "sequence" model and got libsequence.so. I followed the documentation and created the model repository like below. Is that right? Do I need to add a config file?

sequence_model_repository
---simple_sequence
------1
----------libsequence.so

@GuanLuo
Contributor

GuanLuo commented Feb 21, 2020

Yes, you will need the config.pbtxt file for the model, which can be found here

@zhouxuan009
Author

I added the config.pbtxt file you mentioned and ran the server using the Docker container. It still gives an error message:

E0222 22:24:08.873353 1 sequence_batch_scheduler.cc:896] Initialization failed for Direct sequence-batch scheduler thread 0: initialize error for 'simple_sequence': (9) model must have two inputs and one output with shape [1]

@GuanLuo
Contributor

GuanLuo commented Feb 23, 2020

Can you change the dims for both input and output to [1]?
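(Editor's note, not part of the original thread.) A config.pbtxt along the following lines would satisfy the scheduler's "two inputs and one output with shape [1]" check. This is a sketch modeled on the example custom backend's configuration; the exact field values (data types, control settings, max_batch_size) are assumptions, so compare against the config file linked above:

```
name: "simple_sequence"
platform: "custom"
max_batch_size: 1
default_model_filename: "libsequence.so"
sequence_batching {
  max_sequence_idle_microseconds: 5000000
  direct { }
  control_input [
    {
      name: "START"
      control [
        {
          kind: CONTROL_SEQUENCE_START
          int32_false_true: [ 0, 1 ]
        }
      ]
    },
    {
      name: "READY"
      control [
        {
          kind: CONTROL_SEQUENCE_READY
          int32_false_true: [ 0, 1 ]
        }
      ]
    }
  ]
}
input [
  {
    name: "INPUT"
    data_type: TYPE_INT32
    dims: [ 1 ]
  }
]
output [
  {
    name: "OUTPUT"
    data_type: TYPE_INT32
    dims: [ 1 ]
  }
]
instance_group [ { kind: KIND_CPU } ]
```

Note that the instance_group is pinned to CPU, which also anticipates the GPU error discussed below.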

@zhouxuan009
Author

Thank you, it works.
But when I change the instance group from CPU to GPU, it gives me this error message:

I0223 03:21:06.027033 1 custom_backend.cc:194] Creating instance simple_sequence_0_0_gpu0 on GPU 0 (7.5) using libsequence.so
I0223 03:21:06.029121 1 custom_backend.cc:194] Creating instance simple_sequence_0_0_gpu1 on GPU 1 (7.5) using libsequence.so
E0223 03:21:06.141209 1 sequence_batch_scheduler.cc:896] Initialization failed for Direct sequence-batch scheduler thread 0: initialize error for 'simple_sequence': (6) execution on GPU not supported
E0223 03:21:06.141728 1 sequence_batch_scheduler.cc:896] Initialization failed for Direct sequence-batch scheduler thread 1: initialize error for 'simple_sequence': (6) execution on GPU not supported
E0223 03:21:06.142135 1 model_repository_manager.cc:832] failed to load 'simple_sequence' version 1: Internal: Initialization failed for all sequence-batch scheduler threads

So this model cannot run on GPU?

@GuanLuo
Contributor

GuanLuo commented Feb 23, 2020

That's correct.

@zhouxuan009
Author

Got it. Thank you very much.

@zhouxuan009
Author

Are there any other sequence models that support GPU ?

@GuanLuo
Contributor

GuanLuo commented Feb 24, 2020

Some of our QA models are sequence models (generation script). And you can certainly find more stateful models elsewhere; you just need to create an appropriate model config for those models to be recognized by TRTIS. Again, all the models and client examples you find in the TRTIS repository are for demonstrating TRTIS features; if you have an existing use case, you can use them to guide you in integrating it with TRTIS.

Closing the issue as the original problem is resolved.

@GuanLuo GuanLuo closed this as completed Feb 24, 2020
@rkoystart

@GuanLuo @zhouxuan009 While reading this issue I had a doubt: can you please let me know how the libsequence.so file is created?
For example, I have a PyTorch model; can either of you guide me on how to create a libpytorchmodel.so file for that model?
