Add SavedModel export to Resnet #3759

Conversation
    'the resulting SavedModel will require the same GPUs be available.'
    'If you wish to serve the SavedModel from a different device, '
    'try exporting the SavedModel with multi-GPU mode turned off.')
FYI @isaprykin - For now, I'm just warning the user. Eventually, it would be nice if Estimator knew not to run the saved model through the replication part of the graph.
Checkpoint loading seems flexible enough that you can load between single- and multi-GPU models. So in the case of multi-GPU models, it might make sense to construct a new estimator, load in the trained weights, and then serialize that. It's perhaps not the most elegant solution, but training with multi_gpu and then serving on a per-GPU basis seems like the most common use case, so supporting it is somewhat important.
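A minimal sketch of that idea, assuming the multi-GPU training checkpoints live in model_dir and that the plain (non-replicated) model_fn is available; the function and argument names here are illustrative, not code from this PR:

    import tensorflow as tf

    def export_single_gpu_copy(model_fn, model_dir, export_dir, input_receiver_fn):
      """Re-wrap multi-GPU checkpoints in a single-GPU Estimator and export it."""
      # The new Estimator shares model_dir, so export_savedmodel() picks up the
      # latest checkpoint written during multi-GPU training (checkpoint loading
      # between the two variants is assumed compatible, per the comment above).
      single_gpu_classifier = tf.estimator.Estimator(
          model_fn=model_fn, model_dir=model_dir)
      return single_gpu_classifier.export_savedmodel(export_dir, input_receiver_fn)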
For now, I'm going to leave as is. DistributionStrategies is a moving target that aims to hide replicate_model_fn, so we can reevaluate in a ~month when that is firmed up.
That sounds perfectly reasonable.
official/resnet/resnet_run_loop.py (Outdated)

  return classifier


def export_savedmodel(classifier, export_dir, shape, batch_size):
savedmodel seems to be two words to me, so probably export_saved_model.
Unfortunately, there is a lot of inconsistency in the spelling of this across TF+friends. In this case, I went with what Estimator has, which is export_savedmodel.
    'You are exporting a SavedModel while in multi-GPU mode. Note that '
    'the resulting SavedModel will require the same GPUs be available.'
    'If you wish to serve the SavedModel from a different device, '
    'try exporting the SavedModel with multi-GPU mode turned off.')
I guess we should stay with double quotes for consistency.
This file uses single quotes. Perhaps at some point we should just pick one quote style for the repo.
Yeah, I would opt for double quotes just because docstrings are """, but we can fix that in a separate PR for all files if desired. For now, I'll stick with what's there to avoid confusion in this PR.
  return classifier


def warn_on_multi_gpu_export(multi_gpu=False):
If a user exports the model in single-GPU mode and tries to run the exported model in a multi-GPU environment, will it work?
Probably not. Serving in multi-GPU mode isn't really something that's typically done, though, I think; parallelization is handled more efficiently on the request-partitioning side rather than inside the model itself, since there is no need to update shared weights or other state.
official/resnet/resnet_run_loop.py (Outdated)

  input_receiver_fn = export.build_tensor_serving_input_receiver_fn(
      shape, batch_size=batch_size)
  classifier.export_savedmodel(export_dir, input_receiver_fn)
  return classifier
Is there any reason to return the classifier here? Since export_savedmodel is not called on the classifier instance, I don't think it helps with chaining calls.
Consistency with the other function is what I was thinking, but I'm not attached to it. Will remove.
official/resnet/resnet_run_loop.py (Outdated)

  return classifier


def export_savedmodel(classifier, export_dir, shape, batch_size):
I would suggest moving this to some util file, since it has nothing to do with the model or any ResNet internals. I would expect a lot of other models to have a function like this too if they want to export their model.
+1
I went with @robieta's suggestion above and included the steps in resnet_main, which removes the need for this function.
official/resnet/cifar10_main.py (Outdated)

  classifier = resnet_run_loop.resnet_main(
      flags, cifar10_model_fn, input_function)

  # Export the model if desired
I think it would be cleaner to handle this inside of resnet_main() and just pass in shape. That would dedupe some code and is more intuitive IMHO.
Sigh. Yeah, I went back and forth on that, but in the end I felt like weighing resnet_main down with another, totally separate task set that required an extra param was asking too much of it. Quick poll: what do the people reviewing this PR think: put more of the logic in resnet_main, or keep it separate?
I'm OK with both ideas. The main function should contain all the functionality IMO, and it would be nice to avoid duplicate code. If we keep them separate, perhaps we should rename resnet_main to train, train_and_evaluate, run_training, or similar.
Two and a half people counts as quorum. Done.
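For reference, a rough sketch of what the export step folded into resnet_main might look like, assuming export_dir and batch_size arrive via flags and the image shape is passed in by each model's main; the helper import path and function name are assumptions, not the exact diff:

    import tensorflow as tf
    from official.utils.export import export  # helper module path assumed

    def _export_if_requested(classifier, flags, shape, multi_gpu=False):
      """Export a SavedModel at the end of resnet_main when --export_dir is set."""
      if not flags.export_dir:
        return
      if multi_gpu:
        tf.logging.warning(
            'Exporting a SavedModel trained in multi-GPU mode; the same GPUs '
            'must be available wherever it is served.')
      input_receiver_fn = export.build_tensor_serving_input_receiver_fn(
          shape, batch_size=flags.batch_size)
      classifier.export_savedmodel(flags.export_dir, input_receiver_fn)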
    if export_dir:
      self.add_argument(
          "--export_dir",
          type=str,
For strings, type can be omitted.
    super(ExportParser, self).__init__(add_help=add_help)
    if export_dir:
      self.add_argument(
          "--export_dir",
Abbreviation and metavar please.
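Something along these lines would address both notes (the redundant type=str dropped, a short flag and metavar added); "-ed" and "<ED>" are placeholder choices, not values from the diff:

    self.add_argument(
        "--export_dir", "-ed",
        help="[default: %(default)s] If set, a SavedModel serialization of "
             "the model will be exported to this directory after training.",
        metavar="<ED>",
    )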
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for exporting utils."""
Would it be much trouble to add tests for the individual models? (Probably in their respective files rather than this one.) Basically, generate a trivial model with synthetic data and then confirm that it can be loaded and served.
Hahahaha. I laugh because, as I learned during this process, actually using a saved model is incredibly difficult without a dedicated TF Serving instance. I am hoping we can change that, but for now it would require quite a few tf.contrib calls. Punting on this until we formalize the story on model exports, hopefully next quarter.
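For what it's worth, a bare-bones in-process load is possible with the TF 1.x loader API; what makes a full round-trip test awkward is digging out the signature tensor names, which below are placeholders rather than names defined in this PR:

    import tensorflow as tf

    def run_exported_model(export_dir, batch):
      """Load a SavedModel exported with the SERVING tag and run one batch."""
      with tf.Session(graph=tf.Graph()) as sess:
        tf.saved_model.loader.load(
            sess, [tf.saved_model.tag_constants.SERVING], export_dir)
        # 'input_tensor:0' and 'softmax_tensor:0' are assumed names; the real
        # ones depend on the serving_input_receiver_fn and the model head.
        return sess.run('softmax_tensor:0',
                        feed_dict={'input_tensor:0': batch})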
All comments addressed; please take a look.
This adds the ability to export a SavedModel to ResNet. Note that this will be followed up with more functionality for TensorRT shortly.
Ideally, we want to update WideDeep and all the other new models to do the same; then we can push the ExportParser into BaseParser.
@isaprykin - a question for you: right now, I pass batch_size through to the input receiver fn, even though in theory one might want the SavedModel to request a batch_size of 1 by default. The reason is that replicate_model_fn's ensure_divisible_by_shards check complains during model saving if the batch size is one, even just for exporting the model (presumably because the graph gets built again). I'm not sure what the implications of my approach are, or whether there is a better way.
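To make the trade-off concrete, the receiver built by build_tensor_serving_input_receiver_fn is roughly along these lines; this is a sketch of the idea rather than the utility's actual implementation:

    import tensorflow as tf

    def build_tensor_serving_input_receiver_fn(shape, dtype=tf.float32,
                                               batch_size=1):
      """Sketch of a receiver fn for a dense input of shape [batch_size] + shape.

      Exporting with batch_size=1 is what trips replicate_model_fn's
      ensure_divisible_by_shards check in multi-GPU mode, which is why the
      training batch_size is passed through instead.
      """
      def serving_input_receiver_fn():
        features = tf.placeholder(
            dtype=dtype, shape=[batch_size] + list(shape), name='input_tensor')
        # TensorServingInputReceiver hands the raw tensor (not a dict) to model_fn.
        return tf.estimator.export.TensorServingInputReceiver(
            features=features, receiver_tensors=features)
      return serving_input_receiver_fn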
@qlzh727 - FYI, I envision that ultimately we will want every benchmark run to also export a SavedModel, which can then be used to benchmark TF inference.