
TF-TRT Add n_build_pass attribute #52033

Closed
wants to merge 1 commit

Conversation


@tfeher tfeher commented Sep 16, 2021

To convert a model with TF-TRT dynamic shapes, one can provide profile information by running inference on the segmented model a number of times. Currently, build mode uses the _profile_generation_mode attribute of the graph to declare that we are in build mode, which requires two rewrites of the graph. Here is the current workflow:

  1. Rewrite the graph to set _profile_generation_mode=True
  2. Infer the model n_inputs times
  3. Rewrite the graph to set _profile_generation_mode=False

This PR introduces a new attribute for TRTEngineOp: _n_build_pass, which can be set through the rewriter config (example). If we set this parameter at conversion time, then we can avoid rewriting the graph to enable/disable build mode.
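As an illustration of the mechanism, below is a minimal sketch of how a graph optimization pass could read such a value out of the custom graph optimizer's parameter map; the helper name and default value are hypothetical, and the actual plumbing in this PR may differ.

  #include <cstdint>

  #include "tensorflow/core/protobuf/rewriter_config.pb.h"

  // Hypothetical helper: read an integer option such as "_n_build_pass" from
  // the parameter_map of the custom graph optimizer config, falling back to a
  // default when the option is absent.
  int64_t GetNBuildPass(
      const tensorflow::RewriterConfig_CustomGraphOptimizer& config,
      int64_t default_value = 0) {
    const auto& params = config.parameter_map();
    const auto it = params.find("_n_build_pass");
    return it != params.end() ? it->second.i() : default_value;
  }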

This PR is not essential for the C++ conversion of TF-TRT; it just decreases the number of graph rewrite steps. As alternatives to this PR, we could:

  • Mirror the Python-side workflow on the C++ side: two rewrites of _profile_generation_mode. No changes would be needed in the TRT optimization pass, but it means extra work in the C++ converter.
  • Set _profile_generation_mode to true already during graph conversion. We would need to read this value from the rewriter config (the same way this PR reads the _n_build_pass attribute).

Tagging @bixia1 to discuss these points and for review.

Tracker: #52012

@google-ml-butler google-ml-butler bot added the size:S CL Change Size: Small label Sep 16, 2021
@google-ml-butler google-ml-butler bot added the awaiting review Pull request awaiting review label Sep 16, 2021
@google-cla google-cla bot added the cla: yes label Sep 16, 2021
@tfeher tfeher mentioned this pull request Sep 16, 2021
@bixia1 bixia1 self-requested a review September 16, 2021 16:40
@bixia1 bixia1 added the comp:gpu:tensorrt Issues specific to TensorRT label Sep 16, 2021

bixia1 commented Sep 16, 2021

In the current PR, _n_build_pass is a value passed from a user-facing config to the TRTEngineOp to help the implementation of a TF-TRT API that takes n user inputs, performs shape profiling and builds CUDA engines. Changing the user-facing config for this purpose is not necessary, as the implementation of the TF-TRT API can do the following instead:

  • Derive the value of _n_build_pass from the number of user-provided inputs.
  • Set the attribute _profile_generation_mode=True and run the graph n_build_pass times to collect shape profile information.
  • Set the attribute _profile_generation_mode=False and run the graph to build the engine.


tfeher commented Sep 17, 2021

Thanks @bixia1 for the comment. Indeed, if the approach you describe works, then we do not need this PR and can close it. There is one complication though:

  • We create session1 with graph_def1, which has _profile_generation_mode=True.
  • We run inference on the graph in session1 to collect the optimization profiles.
  • In the last step we need to rewrite the graph_def to set _profile_generation_mode=False. Let us call it graph_def2.
  • We would create a new session2 with graph_def2, similar to this step in the converter example. We want to run inference on the new graph to trigger building the TRT engines.
  • The newly created session seems to have new Device objects attached. We will also have a new ResourceMgr.
  • This way we do not have access to the TRTEngineCacheResource that stored the profile information collected in session1.

What is the right way to overcome this problem?

  1. Could we create session2 in a way that keeps the TRTEngineCacheResource from the old session1?
  2. Is there a way to extend session1 with graph_def2 and run it?
  3. A variant of the above: can we access the Graph object from session1 and add graph_def2 as a function?
  4. Shall we just manually copy over the TRTEngineCacheResource objects from session1 to session2?


bixia1 commented Sep 17, 2021

We can modify the _profile_generation_mode attribute in the TRTEngineOp inside the SavedModelBundle graphdef, without creating a new session. I think we are already doing something similar to this in the Python converterV2.build method. Do you agree?
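For reference, below is a minimal sketch (helper name hypothetical) of flipping that attribute in place on a GraphDef. TRTEngineOps that live inside library functions would need the same edit applied to the function library.

  #include "tensorflow/core/framework/graph.pb.h"
  #include "tensorflow/core/framework/node_def.pb.h"

  // Hypothetical helper: set _profile_generation_mode on every TRTEngineOp node
  // of a GraphDef. Nodes placed inside functions of the graph's library are not
  // touched here and would need the same edit in the FunctionDefLibrary.
  void SetProfileGenerationMode(tensorflow::GraphDef* graph_def, bool value) {
    for (tensorflow::NodeDef& node : *graph_def->mutable_node()) {
      if (node.op() == "TRTEngineOp") {
        (*node.mutable_attr())["_profile_generation_mode"].set_b(value);
      }
    }
  }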

Another problem with having the _n_build_pass attribute in the TRTEngineOp and relying on its value reaching 0 to trigger the build of the CUDA engine is that the number of inputs provided by users may not be the same as the number of times we execute the TRTEngineOp, in the presence of loops and if-statements.

@gbaned gbaned self-assigned this Sep 20, 2021
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Sep 20, 2021
@sanjoy sanjoy removed their request for review September 20, 2021 20:29

tfeher commented Sep 20, 2021

I agree with you that it is desirable to use the existing _profile_generation_mode attribute.

We can modify the _profile_generation_mode attribute in the TRTEngineOp inside the SavedModelBundle graphdef, without creating a new session.

How would we do that? Could you point me to an example?

It is not clear to me which API to use. Below is one example where I rewrite the GraphDef, but that does not change the actual graph used by the session. I need to create a new session for the changes to take effect.

  // Includes and using-declarations assumed at file scope for this snippet:
  //   #include "tensorflow/cc/client/client_session.h"
  //   #include "tensorflow/cc/ops/standard_ops.h"
  //   #include "tensorflow/core/public/session.h"
  //   using namespace tensorflow;
  //   using namespace tensorflow::ops;

  // Create and execute a graph with a single const op
  Scope root = Scope::NewRootScope();
  auto c = Const(root.WithOpName("my_const"), {{42.0f, 137.0f}});
  ClientSession session(root);
  std::vector<Tensor> outputs;

  Status status = session.Run({c}, &outputs);
  if (status.ok()) std::cout << outputs[0].DebugString() << std::endl;

  // Get the graph def and rewrite the constant value
  GraphDef gdef;
  status = root.ToGraphDef(&gdef);
  tensorflow::Tensor new_val(tensorflow::DT_FLOAT,
                             tensorflow::TensorShape({1, 2}));
  float *tensor_flat = new_val.flat<float>().data();
  tensor_flat[0] = 31;
  tensor_flat[1] = 41;

  NodeDef *node = gdef.mutable_node(0);
  TensorProto *tensor_attr = (*node->mutable_attr())["value"].mutable_tensor();
  new_val.AsProtoTensorContent(tensor_attr);
  
  // Changing the graph def has no effect on the results
  status = session.Run({c}, &outputs);
  if (status.ok()) std::cout << outputs[0].DebugString() << std::endl;

  // Alternative: create a new session with the modified graph def:
  std::unique_ptr<tensorflow::Session> session2(
      tensorflow::NewSession(tensorflow::SessionOptions()));
  status = session2->Create(gdef);
  status = session2->Run({}, {"my_const"}, {}, &outputs);
  std::cout << outputs[0].DebugString() << std::endl;

I think we are already doing something similar to this in the Python converterV2.build method. Do you agree?

  • I agree. We use function_from_graph_def which relies on import_graph_def_internal.
  • The latter uses TF C API calls, like TF_GraphImportGraphDefWithResults. This function can apply a prefix to the names in the graph_def, and that way we could import it into the existing session (see the sketch after this list).
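For concreteness, here is a minimal sketch of such a prefixed import through the TF C API. The prefix string is arbitrary, error checking is omitted, and in practice the TF_Graph would be the one already attached to the existing session rather than a fresh graph.

  #include <string>

  #include "tensorflow/c/c_api.h"
  #include "tensorflow/core/framework/graph.pb.h"

  // Import a modified GraphDef into an existing TF_Graph, prefixing all node
  // names so that they do not collide with the nodes already present.
  void ImportWithPrefix(TF_Graph* graph, const tensorflow::GraphDef& gdef) {
    std::string serialized;
    gdef.SerializeToString(&serialized);
    TF_Buffer* buf =
        TF_NewBufferFromString(serialized.data(), serialized.size());
    TF_ImportGraphDefOptions* opts = TF_NewImportGraphDefOptions();
    TF_ImportGraphDefOptionsSetPrefix(opts, "build");  // arbitrary prefix
    TF_Status* status = TF_NewStatus();
    // After this call the imported nodes appear as "build/<original name>".
    TF_GraphImportGraphDef(graph, buf, opts, status);
    TF_DeleteStatus(status);
    TF_DeleteImportGraphDefOptions(opts);
    TF_DeleteBuffer(buf);
  }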

It is not clear to me how to get the Graph object from the current session, and which API to use to manage the session. Could you give suggestions for the above example, on how to import the modified GraphDef back into the original session?


tfeher commented Sep 25, 2021

Closing this PR as it is not necessary. One can use ImportGraphDef to load the modified gdef with a prefix into the existing session (a sketch follows the notes below).

Some notes:

  • We cannot modify the _profile_generation_mode attribute of a node that has already been constructed; the attribute is only read when the constructor runs.
  • When we execute the prefixed graph, it will create a new TRTEngineOp object, but that object will use the same TRTEngineCacheResource object, because the prefix is ignored while looking up the cache resource.
  • An alternative solution could be to store _profile_generation_mode as a state variable in the TRTEngineCacheResource. One could have a special op to set its value. Alternatively, one could also access the cache resource directly through the ResourceMgr (but one would have to make sure that the TRTEngineCacheResource is already created in advance).
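Here is one way this could look in C++, as a sketch that assumes a direct tensorflow::Session (so that Session::Extend is available); the helper name and prefix are placeholders, and the include paths may differ from the actual converter code.

  #include "tensorflow/core/common_runtime/graph_constructor.h"
  #include "tensorflow/core/framework/graph.pb.h"
  #include "tensorflow/core/framework/op.h"
  #include "tensorflow/core/graph/graph.h"
  #include "tensorflow/core/lib/core/errors.h"
  #include "tensorflow/core/public/session.h"

  // Hypothetical helper: import the modified GraphDef with a prefix and extend
  // the existing session with the prefixed copy. Because the session is kept
  // alive, resources created by the earlier profile-collection runs (e.g. the
  // TRTEngineCacheResource) stay in the session's ResourceMgr and remain
  // visible to the new, prefixed nodes.
  tensorflow::Status ExtendSessionWithPrefixedGraph(
      tensorflow::Session* session, const tensorflow::GraphDef& modified_gdef) {
    tensorflow::Graph graph(tensorflow::OpRegistry::Global());
    tensorflow::ImportGraphDefOptions opts;
    opts.prefix = "build";  // placeholder; ignored when looking up the cache resource
    TF_RETURN_IF_ERROR(
        tensorflow::ImportGraphDef(opts, modified_gdef, &graph, nullptr));
    tensorflow::GraphDef prefixed_gdef;
    graph.ToGraphDef(&prefixed_gdef);
    return session->Extend(prefixed_gdef);
  }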

@tfeher tfeher closed this Sep 25, 2021
@google-ml-butler google-ml-butler bot removed the awaiting review Pull request awaiting review label Sep 25, 2021
PR Queue automation moved this from Assigned Reviewer to Closed/Rejected Sep 25, 2021