Conversation

@karmel karmel (Contributor) commented Mar 26, 2018

This adds SavedModel export to ResNet. Note that this will be followed up shortly with more functionality for TensorRT.

Ideally, we want to update WideDeep and all the other new models to do the same; then we can push the ExportParser into BaseParser.

@isaprykin - a question for you: right now, I pass batch_size through to the input receiver fn, even though in theory one might want the SavedModel to request a batch_size of 1 by default. The reason is that replicate_model_fn's ensure_divisible_by_shards check complains during model saving if the batch size is 1, even when we are only exporting the model (presumably because the graph gets built again). I'm not sure what the implications of my approach are, or whether there is a better way.
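
For reference, here is a minimal sketch of the serving input receiver being described, using the TF 1.x Estimator API; the helper name and the example shape below are illustrative, not the exact code in this PR:

import tensorflow as tf

def build_image_serving_input_fn(shape, batch_size=1, dtype=tf.float32):
  """Returns a serving_input_receiver_fn for a fixed-shape image tensor."""
  def serving_input_receiver_fn():
    # The exported signature expects a dense tensor of exactly this shape,
    # which is why batch_size is threaded through from the training flags.
    features = tf.placeholder(
        dtype=dtype, shape=[batch_size] + shape, name="input_tensor")
    return tf.estimator.export.TensorServingInputReceiver(
        features=features, receiver_tensors=features)
  return serving_input_receiver_fn

# Example usage:
# classifier.export_savedmodel(
#     export_dir, build_image_serving_input_fn([32, 32, 3], batch_size=128))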

@qlzh727 -- FYI, I envision that ultimately we will want every benchmark run to also export a SavedModel, which can then be used to benchmark TF inference.

@karmel karmel requested review from k-w-w, qlzh727 and robieta March 26, 2018 21:16
@karmel karmel requested a review from nealwu as a code owner March 26, 2018 21:16
'the resulting SavedModel will require the same GPUs be available.'
'If you wish to serve the SavedModel from a different device, '
'try exporting the SavedModel with multi-GPU mode turned off.')

Contributor Author:

FYI @isaprykin - For now, warning the user. Eventually, it would be nice if Estimator would know not to run the saved model through the replication part of the graph.

Contributor:

Checkpoint loading seems flexible enough that you can load between single- and multi-GPU models. So in the case of multi-GPU models, it might make sense to construct a new estimator, load in the trained weights, and then serialize that. It's perhaps not the most elegant solution, but training with multi_gpu and then serving on a per-GPU basis seems like the most common use case, so supporting it is somewhat important.
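
A rough sketch of that approach, assuming the plain (non-replicated) model_fn is available and that the multi-GPU run wrote its checkpoint to model_dir (the function and argument names here are illustrative):

import tensorflow as tf

def export_single_gpu_copy(model_fn, model_dir, export_dir, serving_input_fn):
  # Build a second, single-GPU Estimator over the same model_dir so that it
  # picks up the checkpoint written by the multi-GPU run.
  single_gpu_classifier = tf.estimator.Estimator(
      model_fn=model_fn, model_dir=model_dir)
  # export_savedmodel restores the latest checkpoint from model_dir before
  # writing the SavedModel, so the trained weights carry over.
  single_gpu_classifier.export_savedmodel(export_dir, serving_input_fn)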

Contributor Author:

For now, I'm going to leave this as is. DistributionStrategies is a moving target that aims to hide replicate_model_fn, so we can reevaluate in a ~month when that is firmed up.

Contributor:

That sounds perfectly reasonable.

return classifier


def export_savedmodel(classifier, export_dir, shape, batch_size):

Member:

savedmodel seems to be two words to me, so probably export_saved_model.

Contributor Author:

Unfortunately, there is a lot of inconsistency in the spelling of this across TF+friends. In this case, I went with what Estimator has, which is export_savedmodel.

'You are exporting a SavedModel while in multi-GPU mode. Note that '
'the resulting SavedModel will require the same GPUs be available.'
'If you wish to serve the SavedModel from a different device, '
'try exporting the SavedModel with multi-GPU mode turned off.')

Member:

I guess we should stay with double quotes for consistency.

Contributor:

This file uses single quotes. Perhaps at some point we should just choose which quote style to use for this repo.

Contributor Author:

Yeah, I would opt for double quotes just because docstrings are """, but we can fix that in a separate PR for all files if desired. For now, I will stick with what's there to avoid confusion in this PR.

return classifier


def warn_on_multi_gpu_export(multi_gpu=False):

Member:

If the user exports the model in single-GPU mode and tries to run the exported model in a multi-GPU environment, will it work?

Contributor Author:

Probably not. That said, serving in multi-GPU mode is not really something that's typically done, I think; parallelization is handled more efficiently on the request-partitioning side rather than inside the model itself, since there is no need to update shared weights or other state.

input_receiver_fn = export.build_tensor_serving_input_receiver_fn(
shape, batch_size=batch_size)
classifier.export_savedmodel(export_dir, input_receiver_fn)
return classifier

Member:

Is there any reason to return the classifier here? Since this export_savedmodel function is not called on the classifier instance, I don't think returning it helps with chaining calls.

Contributor Author:

Consistency with the other function is what I was thinking, but I'm not attached to it. Will remove.

return classifier


def export_savedmodel(classifier, export_dir, shape, batch_size):

Member:

I would suggest moving this to some util file, since it has nothing to do with the model or any ResNet-internal variable. I would expect a lot of other models to have a function like this too if they want to export the model.

Contributor:

+1

Contributor Author:

I went with @robieta 's suggestion above and included the steps in resnet_main, which removes the need for this function.

classifier = resnet_run_loop.resnet_main(
flags, cifar10_model_fn, input_function)

# Export the model if desired

Contributor:

I think it would be cleaner to handle this inside of resnet_main() and just pass in shape. That would dedupe some code and is more intuitive IMHO.
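
For illustration, handling it inside resnet_main() might look roughly like the following; the shape argument and flags.export_dir are the additions under discussion, and this is only a sketch rather than the merged code:

def resnet_main(flags, model_function, input_function, shape=None):
  # ... existing train/evaluate loop that builds `classifier` ...
  if flags.export_dir is not None:
    warn_on_multi_gpu_export(flags.multi_gpu)
    input_receiver_fn = export.build_tensor_serving_input_receiver_fn(
        shape, batch_size=flags.batch_size)
    classifier.export_savedmodel(flags.export_dir, input_receiver_fn)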

Contributor Author:

Sigh. Yeah, I went back and forth on that, but in the end I felt like weighing resnet_main down with another, totally separate task set that required an extra param was asking too much of it. Quick poll: what do the people reviewing this PR think: put more of the logic in resnet_main, or keep it separate?

Contributor:

I'm OK with both ideas. The main function should contain all the functions IMO, and it would be nice to avoid duplicate code. If we keep them separate, perhaps we should rename resnet_main to train, train_and_evaluate, run_training, or similar.

Contributor Author:

Two and a half people counts as quorum. Done.

if export_dir:
self.add_argument(
"--export_dir",
type=str,

Contributor:

For strings, type can be omitted.

super(ExportParser, self).__init__(add_help=add_help)
if export_dir:
self.add_argument(
"--export_dir",

Contributor:

Abbreviation and metavar please.
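
Something like the following would address both parser comments (type dropped for the string argument, a short flag and metavar added); the "-ed" abbreviation and "<ED>" metavar are illustrative choices, not necessarily what landed:

self.add_argument(
    "--export_dir", "-ed",
    help="[default: %(default)s] If set, a SavedModel will be exported to "
         "this directory after training and evaluation complete.",
    metavar="<ED>"
)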

# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for exporting utils."""

Contributor:

Would it be much trouble to put tests in place for the individual models? (Probably in their respective files rather than this one.) Basically, generate a trivial model with synthetic data and then confirm that it can be loaded and served.

Contributor Author:

Hahahaha. I laugh because, as I learned during this process, actually using a SavedModel is incredibly difficult without a dedicated TF Serving instance. I am hoping we can change that next quarter, but for now it would require quite a few tf.contrib calls. Punting on that until we formalize the story on model exports, hopefully next quarter.
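
For context, one of the contrib-based routes alluded to above would look something like this in TF 1.x; the feature key depends on the exported signature, so "input" below is just a placeholder:

import numpy as np
from tensorflow.contrib import predictor

# export_dir is the timestamped directory returned by classifier.export_savedmodel.
predict_fn = predictor.from_saved_model(export_dir)
outputs = predict_fn({"input": np.zeros([1, 32, 32, 3], dtype=np.float32)})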

@karmel karmel (Contributor Author) left a comment:

All comments addressed; please take a look.

@karmel karmel merged commit eb73a85 into master Mar 28, 2018
@karmel karmel deleted the feat/export-model branch March 28, 2018 21:28