
Problem when running sequence models #1123

Closed
zhouxuan009 opened this issue Feb 19, 2020 · 19 comments

Comments

@zhouxuan009

Description

I tried to run the sequence models but it fails.
I use the sample client provided in the client SDK Docker container:

docker run -it --rm --net=host nvcr.io/nvidia/tensorrtserver:19.10-py3-clientsdk
/workspace/install/bin# ./simple_sequence_client
And it gives an error message like the one below:

sequence 0 correlation ID 1 : sequence 1 correlation ID 2
error: unable to get INPUT: [ 0] INVALID_ARG - unknown input 'INPUT' for 'simple_sequence'

TRTIS and Model Information

I am using the nvcr.io/nvidia/tensorrtserver:20.01-py3 container for the server.
Below is my model repository structure:

sequence_model_repository
---direct_stateful_resnet50_netdef
------1
----------libsequence.so
----------model.netdef
----------init_model.netdef
------config.pbtxt
------resnet50_labels.txt

name: "simple_sequence"
platform: "caffe2_netdef"
max_batch_size: 3


sequence_batching {
  max_sequence_idle_microseconds: 5000000
  direct { }
  control_input [
    {
      name: "START"
      control [
        {
          kind: CONTROL_SEQUENCE_START
          fp32_false_true: [ 0, 1 ]
        }
      ]
    },
    {
      name: "READY"
      control [
        {
          kind: CONTROL_SEQUENCE_READY
          fp32_false_true: [ 0, 1 ]
        }
      ]
    }
  ]
}
input [
  {
    name: "gpu_0/data"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "gpu_0/softmax"
    data_type: TYPE_FP32
    dims: [ 1000 ]
    label_filename: "resnet50_labels.txt"
  }
]

Expected behavior
Could you please tell me what's going wrong? Thanks very much!

@GuanLuo
Contributor

GuanLuo commented Feb 19, 2020

"simple_sequence_client" assumes that the model has an input called "INPUT", and your model config shows that the model has "gpu_0/data" instead. Simply modifying simple_sequence_client.cc to use the correct input name should fix the issue.

@GuanLuo
Contributor

GuanLuo commented Feb 19, 2020

And I don't think your model repository is organized in the right way... Are you able to start TRTIS without any errors?

@zhouxuan009
Author

Yes, I am able to run TRTIS using this repository.

@zhouxuan009
Author

After changing "gpu_0/data" to "INPUT", I cannot run TRTIS and it gives me the error message below:

I0219 23:54:58.385865 1 model_repository_manager.cc:675] loading: simple_sequence:1
I0219 23:54:58.490642 1 netdef_backend.cc:199] Creating instance simple_sequence_0_gpu0 on GPU 0 (7.5) using init_model.netdef and model.netdef
E0219 23:54:58.860617 1 model_repository_manager.cc:832] failed to load 'simple_sequence' version 1: Internal: load failed for 'simple_sequence': [enforce fail at operator.cc:75] blob != nullptr. op Conv: Encountered a non-existing input blob: gpu_0/data
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, void const*) + 0x78 (0x7f9e4f43fbd8 in /opt/tensorrtserver/bin/../lib/pytorch/libc10.so)
frame #1: caffe2::OperatorBase::OperatorBase(caffe2::OperatorDef const&, caffe2::Workspace*) + 0x6e5 (0x7f9e53e43885 in /opt/tensorrtserver/bin/../lib/pytorch/libtorch.so)
frame #2: + 0x3086f1c (0x7f9e526dff1c in /opt/tensorrtserver/bin/../lib/pytorch/libtorch.so)
frame #3: + 0x62f430b (0x7f9e5594d30b in /opt/tensorrtserver/bin/../lib/pytorch/libtorch.so)
frame #4: + 0x62f528e (0x7f9e5594e28e in /opt/tensorrtserver/bin/../lib/pytorch/libtorch.so)
frame #5: std::_Function_handler<std::unique_ptr<caffe2::OperatorBase, std::default_deletecaffe2::OperatorBase >

@CoderHam
Contributor

For starters you might want to use the 20.01 clientsdk container.

@GuanLuo
Contributor

GuanLuo commented Feb 20, 2020

I meant to change the client, not config.pbtxt. config.pbtxt is supposed to reflect your model, so if the model does have an input called "gpu_0/data", then you should keep it this way in config.pbtxt.

@zhouxuan009
Author

Cool! Thanks a lot, I will try it.

@zhouxuan009
Author

I changed simple_sequence_client.cc, recompiled, and ran it, but the error still happens.
I changed
FAIL_IF_ERR(ctx->GetInput("INPUT", &ivalue), "unable to get INPUT");
into
FAIL_IF_ERR(ctx->GetInput("gpu_0/data", &ivalue), "unable to get INPUT");

When I run it, I get the error message below:

sequence 0 correlation ID 1 : sequence 1 correlation ID 2
error: unable to set data for INPUT: [ 0] INVALID_ARG - invalid size 4 bytes for input 'gpu_0/data', expects 602112 bytes

@GuanLuo
Contributor

GuanLuo commented Feb 21, 2020

Please note that "simple_sequence_client" is just an example showing sequence inference usage, and it is designed around the "sequence" model, which is completely different from the stateful model you have. Thus you will have to modify "simple_sequence_client" before you can use it with your model.

@zhouxuan009
Author

Thank you for your clarification !

So now I just want to run the sequence model. I have compiled the "sequence" model and got libsequence.so. I followed the documentation and created the model repository like below. Is that right? Do I need to add a config file?

sequence_model_repository
---simple_sequence
------1
----------libsequence.so

@GuanLuo
Contributor

GuanLuo commented Feb 21, 2020

Yes, you will need the config.pbtxt file for the model, which can be found here

@zhouxuan009
Author

I added the config.pbtxt file you mentioned and ran the server using the Docker container. It still gives an error message:

E0222 22:24:08.873353 1 sequence_batch_scheduler.cc:896] Initialization failed for Direct sequence-batch scheduler thread 0: initialize error for 'simple_sequence': (9) model must have two inputs and one output with shape [1]

@GuanLuo
Contributor

GuanLuo commented Feb 23, 2020

Can you change the dims for both input and output to [1]?
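(Editor's note, not part of the original thread.) A config.pbtxt along the following lines would satisfy the scheduler's "two inputs and one output with shape [1]" check. This is a sketch modeled on the example custom backend's configuration; the exact field values (data types, control settings, max_batch_size) are assumptions, so compare against the config file linked above:

```
name: "simple_sequence"
platform: "custom"
max_batch_size: 1
default_model_filename: "libsequence.so"
sequence_batching {
  max_sequence_idle_microseconds: 5000000
  direct { }
  control_input [
    {
      name: "START"
      control [
        {
          kind: CONTROL_SEQUENCE_START
          int32_false_true: [ 0, 1 ]
        }
      ]
    },
    {
      name: "READY"
      control [
        {
          kind: CONTROL_SEQUENCE_READY
          int32_false_true: [ 0, 1 ]
        }
      ]
    }
  ]
}
input [
  {
    name: "INPUT"
    data_type: TYPE_INT32
    dims: [ 1 ]
  }
]
output [
  {
    name: "OUTPUT"
    data_type: TYPE_INT32
    dims: [ 1 ]
  }
]
instance_group [ { kind: KIND_CPU } ]
```

Note that the instance_group is pinned to CPU, which also anticipates the GPU error discussed below.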

@zhouxuan009
Author

Thank you, it works.
But when I change the instance group from CPU to GPU, it gives me this error message:

I0223 03:21:06.027033 1 custom_backend.cc:194] Creating instance simple_sequence_0_0_gpu0 on GPU 0 (7.5) using libsequence.so
I0223 03:21:06.029121 1 custom_backend.cc:194] Creating instance simple_sequence_0_0_gpu1 on GPU 1 (7.5) using libsequence.so
E0223 03:21:06.141209 1 sequence_batch_scheduler.cc:896] Initialization failed for Direct sequence-batch scheduler thread 0: initialize error for 'simple_sequence': (6) execution on GPU not supported
E0223 03:21:06.141728 1 sequence_batch_scheduler.cc:896] Initialization failed for Direct sequence-batch scheduler thread 1: initialize error for 'simple_sequence': (6) execution on GPU not supported
E0223 03:21:06.142135 1 model_repository_manager.cc:832] failed to load 'simple_sequence' version 1: Internal: Initialization failed for all sequence-batch scheduler threads

So this model cannot run on GPU?

@GuanLuo
Contributor

GuanLuo commented Feb 23, 2020

That's correct.

@zhouxuan009
Author

Got it. Thank you very much.

@zhouxuan009
Author

Are there any other sequence models that support GPU ?

@GuanLuo
Contributor

GuanLuo commented Feb 24, 2020

Some of our QA models are sequence models (generation script). And you can certainly find more stateful models elsewhere; you just need to create an appropriate model config for those models to be recognized by TRTIS. Again, all the models and client examples you find in the TRTIS repository are for demonstrating TRTIS features; if you have an existing use case, you can use them to guide you in integrating it with TRTIS.

Closing the issue as the original problem is resolved.

@GuanLuo GuanLuo closed this as completed Feb 24, 2020
@rkoystart

@GuanLuo @zhouxuan009 While reading this issue I had a doubt: can you please let me know how the libsequence.so file is created?
For example, I have a PyTorch model; can either of you guide me on how to create a libpytorchmodel.so file for that model?
