Contributor

@robieta robieta commented Mar 12, 2018

This PR breaks various groups of common arguments into their own argparse class to enable some degree of standardization among the official models. The resnet argparser is replaced as an example.
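For readers unfamiliar with the pattern, here is a minimal sketch of composing argparse classes through `parents`. The class bodies and the `--resnet_size` flag are illustrative assumptions, not the code in this PR.

```python
import argparse


class LocationParser(argparse.ArgumentParser):
  """Hypothetical parent parser holding path-related flags."""

  def __init__(self, add_help=False):
    # Parents must not define -h/--help, or the child parser will conflict.
    super(LocationParser, self).__init__(add_help=add_help)
    self.add_argument("--data_dir", default="/tmp")
    self.add_argument("--model_dir", default="/tmp")


class ExampleModelParser(argparse.ArgumentParser):
  """Hypothetical model parser that inherits the shared flags."""

  def __init__(self):
    super(ExampleModelParser, self).__init__(parents=[LocationParser()])
    # Model-specific flag, assumed here purely for illustration.
    self.add_argument("--resnet_size", type=int, default=50)


flags = ExampleModelParser().parse_args(["--data_dir", "/data/imagenet"])
print(flags.data_dir, flags.model_dir, flags.resnet_size)
```

The child parser copies every argument defined by each parent, so a model only has to declare the flags that are unique to it.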

@robieta robieta requested review from k-w-w, karmel and nealwu as code owners March 12, 2018 20:51
@karmel karmel requested review from qlzh727 and yhliang2018 and removed request for nealwu March 13, 2018 16:21
Contributor

@karmel karmel left a comment

This is great-- really looking forward to having fewer instances of --multi_gpu defined across the code. Some comments toward simplification, but looking good.

@@ -0,0 +1,14 @@
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
Contributor

I don't think it matters one way or the other, but a quick check of TF proper suggests that they only include the license in init files that contain actual code, and otherwise leave them empty. Fine to leave empty in this case, probably.

Contributor Author

Ok.

# limitations under the License.
# ==============================================================================

from .parsers import LocationParser, DeviceParser, SupervisedParser, \
Contributor

Let's stick with absolute imports, both for PEP 8 and for the sake of the rules that convert this code internally.

Contributor Author

Sure. Fixed.

"""

import argparse

Contributor

I think we can collapse some of these categories-- all will need location, device, supervised-- those can be lumped into ModelParser or something, with all args True by default. Here we have tiny groups, and also the ability to turn things on and off one by one, which seems redundant. If we collapse those three, then remove the need to turn each option on individually, you substantially reduce the overall code required in resnet above, for example.

Contributor Author

Done. I left the default as False for the secondary classes.

Contributor

Let's go ahead and make default=True for those as well. I would prefer to reduce the total amount of arg_parser code that users have to read through when looking at Resnet.
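To illustrate the suggestion, a rough sketch of a combined parser whose toggles all default to True. `BaseParser` and its keyword names are assumptions made for this example and may differ from the merged code.

```python
import argparse


class BaseParser(argparse.ArgumentParser):
  """Hypothetical combined parser; every flag group is on by default."""

  def __init__(self, add_help=False, data_dir=True, model_dir=True,
               multi_gpu=True):
    super(BaseParser, self).__init__(add_help=add_help)
    if data_dir:
      self.add_argument("--data_dir", "-dd", default="/tmp")
    if model_dir:
      self.add_argument("--model_dir", "-md", default="/tmp")
    if multi_gpu:
      self.add_argument("--multi_gpu", action="store_true")


# With defaults of True, the common case needs no keyword arguments...
full = argparse.ArgumentParser(parents=[BaseParser()])
# ...and a model opts out only of the flags it does not support.
no_gpu = argparse.ArgumentParser(parents=[BaseParser(multi_gpu=False)])

print(full.parse_args(["--multi_gpu"]).multi_gpu)
print(hasattr(no_gpu.parse_args([]), "multi_gpu"))
```

The trade-off is that a model which cannot honor a flag has to remember to switch it off, but the typical call site shrinks to a single constructor call.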


    if data_dir:
      self.add_argument(
          "--data_dir", "-dd", default="/tmp",
Contributor

Maybe /tmp/model-data or something like that? Seems like just /tmp will never be a good guess. Not that model-data is though. So maybe just an empty string? Ditto for model dir below.

Contributor Author

I'm expecting defaults to be overridden. I am inclined to use either "/tmp" or an explicitly invalid string like "data_directory_path".

Contributor Author

See below.

    if data_dir:
      self.add_argument(
          "--data_dir", "-dd", default="/tmp",
          help="[default: %(default)s] The location of the input data.",
Contributor

Maybe comment on some of the arg parser magic here? What is the default templating, and metavar?

Contributor Author

Added to the docstring.
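For context on the two argparse features being asked about, a small standalone sketch. The `<DD>` metavar value is an assumption chosen for illustration; the thread does not show the value used in the PR.

```python
import argparse

parser = argparse.ArgumentParser(prog="example")
parser.add_argument(
    "--data_dir", "-dd", default="/tmp",
    # argparse substitutes %(default)s with the value of `default` when it
    # renders the help text.
    help="[default: %(default)s] The location of the input data.",
    # metavar replaces the auto-generated placeholder (DATA_DIR) that would
    # otherwise appear in the usage and help output.
    metavar="<DD>",
)

help_text = parser.format_help()
print(help_text)
```

Keeping the default in the help string via the template means the documented value cannot drift from the actual `default=` argument.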


    if inter_op:
      self.add_argument(
          "--inter_op_parallelism_threads", "-inrt",
Contributor

inrt is really hard for me to parse as an abbreviation. Maybe just inter/intra for these.

Contributor Author

Fixed.



class DummyParser(argparse.ArgumentParser):
"""Default parser for specification of dummy/mocked behavior.
Contributor

This should be combined with performance above.

Contributor Author

Fixed.


    if use_synthetic_data:
      self.add_argument(
          "--use_synthetic_data", "-usd",
Contributor

-synth?

Contributor Author

Sure.

)



Contributor

Nit: too many \ns at the end here.

Contributor Author

Fixed.

'for generally homogeneous data sets, should be approximately the '
'number of available CPU cores.')
super(ResnetArgParser, self).__init__(parents=[
official.utils.arg_parsers.LocationParser(data_dir=True, model_dir=True),
Contributor

The import logic seems off here-- shouldn't this be from official.utils import arg_parsers above, then just arg_parsers.... here?

Contributor

Yes, imports from the official module should follow the pattern @karmel mentioned.

Contributor Author

Hm, it looks like from foo.bar import baz is allowed by the style guide. I will certainly change this.

'for generally homogeneous data sets, should be approximately the '
'number of available CPU cores.')
super(ResnetArgParser, self).__init__(parents=[
official.utils.arg_parsers.LocationParser(data_dir=True, model_dir=True),
Member

Since each line is kind of long, can we just import official.utils.arg_parsers as parsers, and use it as parsers.LocationParser?

Contributor Author

ditto on above.


"""Collection of parsers which are shared among the official models.
The parsers in this module are intended to be used as parents to all arg
Member

The indent at the beginning of this line is not necessary.

import argparse


class LocationParser(argparse.ArgumentParser):
Member

The naming of this class is a bit confusing. These flags specify file paths, so I think we should be more explicit here, e.g. DataPathParser or something else.

Contributor Author

Moot point now, but I'll keep that in mind for the future.

model_dir: Create a flag for specifying the model file directory.
"""

def __init__(self, add_help=False, data_dir=False, model_dir=False):
Member

The params data_dir and model_dir give the impression that they should be a path or string type, instead of boolean. Probably rename them to add_data_dir and add_model_dir.

Contributor

Should we also add an "output_dir" flag here to specify the output directory for hooks logging? Not sure.

Contributor Author

I considered suffixing them with "_flag" (i.e. data_dir_flag=True, ...) but it made things look awkward and long. I am open to suggestions though.

)


class DeviceParser(argparse.ArgumentParser):
Member

This class only contains one flag, so it is not necessary to wrap it. Do you intend to add more flags to this parser?

Contributor Author

Fixed.

)


class SupervisedParser(argparse.ArgumentParser):
Member

A more explicit name would also be better here, e.g. SupervisedLearningParser or CommonSupervisedLearningParser.

Contributor Author

Ditto on above.

)


class DummyParser(argparse.ArgumentParser):
Member

DummyParser gives the impression that it does nothing, whereas it is actually doing something.

Contributor

maybe DefaultParser?

Contributor Author

Ditto on above.

  def __init__(self, add_help=False, data_dir=False, model_dir=False):
    super(LocationParser, self).__init__(add_help=add_help)

    if data_dir:
Member

Since those are not arbitrary strings and should be file paths, should we do some validation to make sure they exist?

Contributor Author

I don't think the arg parser is the place for that, since a model may choose some behavior (e.g. prompt for a specific script to be run or download automatically) on a case-by-case basis.
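One way to picture that split, as a sketch only: the parser accepts any string and each model decides what to do with it afterwards. `resolve_data_dir` is a hypothetical helper, not something defined in this PR.

```python
import argparse
import os
import tempfile


def resolve_data_dir(flags):
  """Hypothetical model-side check; the parser itself accepts any string."""
  if os.path.isdir(flags.data_dir):
    return flags.data_dir
  # A real model might instead download the data or point at a prep script.
  raise ValueError("data_dir does not exist: {}".format(flags.data_dir))


parser = argparse.ArgumentParser()
parser.add_argument("--data_dir", default="/tmp")

existing = tempfile.mkdtemp()  # stand-in for a prepared data directory
flags = parser.parse_args(["--data_dir", existing])
print(resolve_data_dir(flags) == existing)
```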


assert namespace.multi_gpu
assert namespace.use_synthetic_data

Member

Do you also want to test the case where a flag value is specified while that flag is not turned on in the arg parser?

Contributor Author

This turns out to not be practical. When parse_args fails, argparse fails hard by calling sys.exit(2) rather than raising an exception. While I certainly could invoke subprocess and assert an exit code of 2, this seems like too much machinery for a test that is just intended to show off parse_args().
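A possibly lighter-weight alternative to a subprocess, offered as a sketch: `sys.exit` works by raising `SystemExit`, so the exit can be intercepted in-process with `assertRaises`. The parser and test names below are hypothetical stand-ins, not the PR's test code.

```python
import argparse
import unittest


class UnknownFlagTest(unittest.TestCase):

  def test_unknown_flag_exits(self):
    parser = argparse.ArgumentParser()  # deliberately defines no --multi_gpu
    # argparse reports the error on stderr, then calls sys.exit(2), which
    # surfaces as a SystemExit exception that the test can intercept.
    with self.assertRaises(SystemExit) as ctx:
      parser.parse_args(["--multi_gpu"])
    self.assertEqual(ctx.exception.code, 2)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(UnknownFlagTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

The usage message still goes to stderr during the run, which is noisy but harmless for a unit test.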

@robieta robieta force-pushed the unified_arg_parser branch from c7eaae0 to 792b89f Compare March 13, 2018 20:29
Contributor

@karmel karmel left a comment

Thanks.


Taylor Robie added 2 commits March 13, 2018 16:40
the new arg parsers.

add parser unittests

condense classes and make some style cleanups.
@robieta robieta force-pushed the unified_arg_parser branch from 792b89f to 4fe8648 Compare March 13, 2018 23:41
@robieta robieta merged commit 086d914 into master Mar 13, 2018
@robieta robieta deleted the unified_arg_parser branch March 14, 2018 19:49
5 participants