
TF-TRT Add n_build_pass attribute #52033

Closed
wants to merge 1 commit

Conversation


@tfeher tfeher commented Sep 16, 2021

To convert a model with TF-TRT dynamic shapes, one can provide profile information by running inference on the segmented model a number of times. Currently, build mode uses the _profile_generation_mode attribute of the graph to declare that we are in build mode, which requires two rewrites of the graph. Here is the current workflow:

  1. Rewrite the graph to set _profile_generation_mode=True
  2. Infer the model n_inputs times
  3. Rewrite the graph to set _profile_generation_mode=False

This PR introduces a new attribute for TRTEngineOp: _n_build_pass, which can be set through the rewriter config (example). If we set this parameter at conversion time, then we can avoid rewriting the graph to enable/disable build mode.
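As an illustration of the mechanism, below is a minimal sketch of how a graph optimization pass could read such a value out of the custom graph optimizer's parameter map; the helper name and default value are hypothetical, and the actual plumbing in this PR may differ.

  #include <cstdint>

  #include "tensorflow/core/protobuf/rewriter_config.pb.h"

  // Hypothetical helper: read an integer option such as "_n_build_pass" from
  // the parameter_map of the custom graph optimizer config, falling back to a
  // default when the option is absent.
  int64_t GetNBuildPass(
      const tensorflow::RewriterConfig_CustomGraphOptimizer& config,
      int64_t default_value = 0) {
    const auto& params = config.parameter_map();
    const auto it = params.find("_n_build_pass");
    return it != params.end() ? it->second.i() : default_value;
  }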

This PR is not essential for the C++ conversion of TF-TRT; it just decreases the number of graph rewrite steps. As alternatives to this PR, we could:

  • Mirror the Python-side workflow on the C++ side: two rewrites of _profile_generation_mode. No changes would be needed in the TRT optimization pass, but it means extra work in the C++ converter.
  • Set _profile_generation_mode to true already during graph conversion. We would need to read this value from the rewriter config (the same way this PR reads the _n_build_pass attribute).

Tagging @bixia1 to discuss these points and for review.

Tracker: #52012

@google-ml-butler google-ml-butler bot added the size:S CL Change Size: Small label Sep 16, 2021
@google-ml-butler google-ml-butler bot added the awaiting review Pull request awaiting review label Sep 16, 2021
@google-cla google-cla bot added the cla: yes label Sep 16, 2021
@tfeher tfeher mentioned this pull request Sep 16, 2021
@bixia1 bixia1 self-requested a review September 16, 2021 16:40
@bixia1 bixia1 added the comp:gpu:tensorrt Issues specific to TensorRT label Sep 16, 2021

bixia1 commented Sep 16, 2021

In the current PR, _n_build_pass is a value passed from a user-facing config to the TRTEngineOp to help the implementation of a TF-TRT API that takes n user inputs, performs shape profiling and builds CUDA engines. Changing the user-facing config for this purpose is not necessary, as the implementation of the TF-TRT API can do the following instead:

  • Derive the value of _n_build_pass from the number of user-provided inputs.
  • Set the attribute _profile_generation_mode=True and run the graph n_build_pass times to collect shape profile information.
  • Set the attribute _profile_generation_mode=False and run the graph to build the engine.


tfeher commented Sep 17, 2021

Thanks @bixia1 for the comment. Indeed, if the approach you describe works, then we do not need this PR and can close it. There is one complication though:

  • We create session1 with graph_def1, which has _profile_generation_mode=True.
  • We run inference on the graph in session1 to collect the optimization profiles.
  • In the last step we need to rewrite the graph_def to set _profile_generation_mode=False. Let us call it graph_def2.
  • We would create a new session2 with graph_def2, similar to this step in the converter example. We want to run inference on the new graph to trigger building the TRT engines.
  • The newly created session seems to have new Device objects attached. We will also have a new ResourceMgr.
  • This way we do not have access to the TRTEngineCacheResource that stored the profile information collected in session1.

What is the right way to overcome this problem?

  1. Could we create session2 in a way that keeps the TRTEngineCacheResource from the old session1?
  2. Is there a way to extend session1 with graph_def2 and run it?
  3. A variant of the above: can we access the Graph object from session1 and add graph_def2 as a function?
  4. Shall we just manually copy over the TRTEngineCacheResource objects from session1 to session2?


bixia1 commented Sep 17, 2021

We can modify the _profile_generation_mode attribute in the TRTEngineOp inside the SavedModelBundle graphdef, without creating a new session. I think we are already doing something similar to this in the Python converterV2.build method. Do you agree?
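For reference, below is a minimal sketch (helper name hypothetical) of flipping that attribute in place on a GraphDef. TRTEngineOps that live inside library functions would need the same edit applied to the function library.

  #include "tensorflow/core/framework/graph.pb.h"
  #include "tensorflow/core/framework/node_def.pb.h"

  // Hypothetical helper: set _profile_generation_mode on every TRTEngineOp node
  // of a GraphDef. Nodes placed inside functions of the graph's library are not
  // touched here and would need the same edit in the FunctionDefLibrary.
  void SetProfileGenerationMode(tensorflow::GraphDef* graph_def, bool value) {
    for (tensorflow::NodeDef& node : *graph_def->mutable_node()) {
      if (node.op() == "TRTEngineOp") {
        (*node.mutable_attr())["_profile_generation_mode"].set_b(value);
      }
    }
  }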

Another problem with having the _n_build_pass attribute in the TRTEngineOp and relying on its value reaching 0 to trigger the build of the CUDA engine is that the number of inputs provided by users may not be the same as the number of times we execute the TRTEngineOp, in the presence of loops and if-statements.

@gbaned gbaned self-assigned this Sep 20, 2021
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Sep 20, 2021
@sanjoy sanjoy removed their request for review September 20, 2021 20:29

tfeher commented Sep 20, 2021

I agree with you that it is desirable to use the existing _profile_generation_mode attribute.

We can modify the _profile_generation_mode attribute in the TRTEngineOp inside the SavedModelBundle graphdef, without creating a new session.

How would we do that? Could you point me to an example?

It is not clear to me which API to use. Below is one example where I rewrite the GraphDef, but that does not change the actual graph used by the session. I need to create a new session for the changes to take effect.

  // Includes and using-declarations assumed at file scope for this snippet:
  //   #include "tensorflow/cc/client/client_session.h"
  //   #include "tensorflow/cc/ops/standard_ops.h"
  //   #include "tensorflow/core/public/session.h"
  //   using namespace tensorflow;
  //   using namespace tensorflow::ops;

  // Create and execute a graph with a single const op
  Scope root = Scope::NewRootScope();
  auto c = Const(root.WithOpName("my_const"), {{42.0f, 137.0f}});
  ClientSession session(root);
  std::vector<Tensor> outputs;

  Status status = session.Run({c}, &outputs);
  if (status.ok()) std::cout << outputs[0].DebugString() << std::endl;

  // Get the graph def and rewrite the constant value
  GraphDef gdef;
  status = root.ToGraphDef(&gdef);
  tensorflow::Tensor new_val(tensorflow::DT_FLOAT,
                             tensorflow::TensorShape({1, 2}));
  float *tensor_flat = new_val.flat<float>().data();
  tensor_flat[0] = 31;
  tensor_flat[1] = 41;

  NodeDef *node = gdef.mutable_node(0);
  TensorProto *tensor_attr = (*node->mutable_attr())["value"].mutable_tensor();
  new_val.AsProtoTensorContent(tensor_attr);
  
  // Changing the graph def has no effect on the results
  status = session.Run({c}, &outputs);
  if (status.ok()) std::cout << outputs[0].DebugString() << std::endl;

  // Alternative: create a new session with the modified graph def:
  std::unique_ptr<tensorflow::Session> session2(
      tensorflow::NewSession(tensorflow::SessionOptions()));
  status = session2->Create(gdef);
  status = session2->Run({}, {"my_const"}, {}, &outputs);
  std::cout << outputs[0].DebugString() << std::endl;

I think we are already doing something similar to this in the Python converterV2.build method. Do you agree?

  • I agree. We use function_from_graph_def which relies on import_graph_def_internal.
  • The latter uses TF C API calls, like TF_GraphImportGraphDefWithResults. This function can apply a prefix to the names in the graph_def, and that way we could import it into the existing session (see the sketch after this list).
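For concreteness, here is a minimal sketch of such a prefixed import through the TF C API. The prefix string is arbitrary, error checking is omitted, and in practice the TF_Graph would be the one already attached to the existing session rather than a fresh graph.

  #include <string>

  #include "tensorflow/c/c_api.h"
  #include "tensorflow/core/framework/graph.pb.h"

  // Import a modified GraphDef into an existing TF_Graph, prefixing all node
  // names so that they do not collide with the nodes already present.
  void ImportWithPrefix(TF_Graph* graph, const tensorflow::GraphDef& gdef) {
    std::string serialized;
    gdef.SerializeToString(&serialized);
    TF_Buffer* buf =
        TF_NewBufferFromString(serialized.data(), serialized.size());
    TF_ImportGraphDefOptions* opts = TF_NewImportGraphDefOptions();
    TF_ImportGraphDefOptionsSetPrefix(opts, "build");  // arbitrary prefix
    TF_Status* status = TF_NewStatus();
    // After this call the imported nodes appear as "build/<original name>".
    TF_GraphImportGraphDef(graph, buf, opts, status);
    TF_DeleteStatus(status);
    TF_DeleteImportGraphDefOptions(opts);
    TF_DeleteBuffer(buf);
  }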

It is not clear to me how to get the Graph object from the current session, and which API to use to manage the session. Could you give suggestions for the above example, on how to import the modified GraphDef back into the original session?


tfeher commented Sep 25, 2021

Closing this PR as it is not necessary. One can use ImportGraphDef to load the modified gdef with a prefix into the existing session (a sketch follows the notes below).

Some notes:

  • We cannot modify the _profile_generation_mode attribute of a node that has already been constructed; the attribute is only read when the constructor runs.
  • When we execute the prefixed graph, it will create a new TRTEngineOp object, but that object will use the same TRTEngineCacheResource object, because the prefix is ignored while looking up the cache resource.
  • An alternative solution could be to store _profile_generation_mode as a state variable in the TRTEngineCacheResource. One could have a special op to set its value. Alternatively, one could also access the cache resource directly through the ResourceMgr (but one would have to make sure that the TRTEngineCacheResource is already created in advance).
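Here is one way this could look in C++, as a sketch that assumes a direct tensorflow::Session (so that Session::Extend is available); the helper name and prefix are placeholders, and the include paths may differ from the actual converter code.

  #include "tensorflow/core/common_runtime/graph_constructor.h"
  #include "tensorflow/core/framework/graph.pb.h"
  #include "tensorflow/core/framework/op.h"
  #include "tensorflow/core/graph/graph.h"
  #include "tensorflow/core/lib/core/errors.h"
  #include "tensorflow/core/public/session.h"

  // Hypothetical helper: import the modified GraphDef with a prefix and extend
  // the existing session with the prefixed copy. Because the session is kept
  // alive, resources created by the earlier profile-collection runs (e.g. the
  // TRTEngineCacheResource) stay in the session's ResourceMgr and remain
  // visible to the new, prefixed nodes.
  tensorflow::Status ExtendSessionWithPrefixedGraph(
      tensorflow::Session* session, const tensorflow::GraphDef& modified_gdef) {
    tensorflow::Graph graph(tensorflow::OpRegistry::Global());
    tensorflow::ImportGraphDefOptions opts;
    opts.prefix = "build";  // placeholder; ignored when looking up the cache resource
    TF_RETURN_IF_ERROR(
        tensorflow::ImportGraphDef(opts, modified_gdef, &graph, nullptr));
    tensorflow::GraphDef prefixed_gdef;
    graph.ToGraphDef(&prefixed_gdef);
    return session->Extend(prefixed_gdef);
  }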

@tfeher tfeher closed this Sep 25, 2021
@google-ml-butler google-ml-butler bot removed the awaiting review Pull request awaiting review label Sep 25, 2021
PR Queue automation moved this from Assigned Reviewer to Closed/Rejected Sep 25, 2021