Conversation

qlzh727 (Member) commented Mar 27, 2018

Also update the benchmark logger and bigquery schema for the
errors found during the integration test.

qlzh727 added 3 commits March 27, 2018 12:24
This was causing an error since the Kokoro test has TF_PKG=tf-nightly
injected during the test.
This library requires the Google Cloud BigQuery library as a dependency, which
can be installed with:
> pip install --upgrade google-cloud-bigquery

Contributor

Also add to requirements.txt

Member Author

Done.

  logging_dir: string, logging directory that contains the benchmark log.
  gcp_project: string, the name of the GCP project that the log will be
    uploaded to. The default project name will be detected from local
    environment if no value is provide.

Contributor

µnit: provided.

Member Author

Done

  credentials: google.auth.credentials. The credential to access the
    BigQuery service. The default service account credential will be
    detected from local environment if no value is provided. Please use
    google.oauth2.service_account.Credentials to load credential from local

Contributor

µnit: space before "Credentials"

Member Author

google.oauth2.service_account is the module name and Credentials is the class name. Do you want me to split them with a space?

Contributor

I can't read. Ignore, and I'm sorry.
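
For reference, a minimal sketch of loading an explicit credential from a local key file (the path is hypothetical; by default the uploader detects credentials from the environment):

from google.oauth2 import service_account

# Hypothetical key file path; omit the credential entirely to fall back to
# the default service account detected from the local environment.
credentials = service_account.Credentials.from_service_account_file(
    "/path/to/service_account_key.json")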

metrics = []
for l in lines:
  if not l.strip(): continue
  metric = json.loads(l)

Contributor

I would be inclined to wrap this in a try-except (and log error). That way a single malformed line doesn't ruin all entries.

Member Author

The inline if is trying to catch the case of the last line of the file, which only contains "\n". This is kind of an expected entry; maybe I should just do an inline strip and filter of the "line"?

Contributor

I don't remember the exact failure modes, but because JSON is a fragile container (1 bad byte can corrupt the entire entry), I've always seen code that skips invalid entries.
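
A minimal sketch of that defensive pattern, wrapping the loop quoted above in a helper (the function name and logger call are illustrative):

import json
import logging

def parse_metric_lines(lines):
  """Parse JSON metric lines, skipping blank and malformed entries."""
  metrics = []
  for line in lines:
    if not line.strip():
      # Skip blank lines, e.g. the trailing newline at the end of the file.
      continue
    try:
      metrics.append(json.loads(line))
    except ValueError:
      # One malformed entry should not discard the rest of the log.
      logging.warning("Failed to parse benchmark log line: %s", line)
  return metrics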

FLAGS = None


class BigQueryUploader(object):

Contributor

I have a conceptual proposal:

  1. Pull dataset_name and run_id into __init__(). In effect, make all local information part of the class state.
  2. Support intermediate uploading. So if I call upload_metric() with 500 lines, and then call it again with 1000, the second call will upload lines 501 to 1000.

The reason is that it would allow intermediate uploading during the training loop, rather than at the very end. Depending on the sort of monitoring we want to build on top of BigQuery, that could be desirable. Discuss.

Member Author

Discussed offline. Since BigQuery does not care about the quota, I think I can update the BenchmarkLogger to contain an instance of the uploader in the future, and do a direct upload alongside writing to the local file, treating BigQuery as a file system.

Will address that in a separate change.
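
A rough sketch of the proposed shape, with hypothetical names, just to make the idea concrete:

class BigQueryUploader(object):
  """Sketch only: dataset name and run id become constructor state."""

  def __init__(self, dataset_name, run_id):
    self._dataset_name = dataset_name
    self._run_id = run_id
    self._uploaded_count = 0  # How many metric lines have been uploaded so far.

  def upload_metric(self, metric_lines):
    # Upload only the lines that arrived since the previous call, enabling
    # intermediate uploads during the training loop.
    new_lines = metric_lines[self._uploaded_count:]
    self._uploaded_count = len(metric_lines)
    # ... send new_lines to the BigQuery dataset ...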



if __name__ == "__main__":
  parser = argparse.ArgumentParser()

Contributor

Can we make this a class?

Member Author

k, moving this to arg_parser.

)
parser.add_argument(
    "--bigquery_data_set", "-bds", default="test_benchmark",
    help="The Bigquery dataset name where the benchmark will be uploaded.",

Contributor

How come there are no "[default: %(default)s]" for the rest of these?

Member Author

Done.
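
For context, a minimal sketch of the argparse pattern in question (the flag is reused from the diff; the rest is illustrative):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--bigquery_data_set", "-bds", default="test_benchmark",
    # argparse expands %(default)s, so --help displays the configured default.
    help="The BigQuery dataset name where the benchmark will be uploaded. "
         "[default: %(default)s]")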

" be uploaded.",
metavar="<BMT>"
)
FLAGS, unparsed = parser.parse_known_args()

Contributor

Use parse_args() instead of parse_known_args() so it blows up with unrecognized args.

Member Author

Done. Moved to use arg_parser
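
A quick illustration of the difference (argument values are made up):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--bigquery_data_set", "-bds", default="test_benchmark")

# parse_known_args() silently collects unknown flags instead of failing:
FLAGS, unparsed = parser.parse_known_args(["--bigquery_data_set=x", "--typo=1"])

# parse_args() exits with an error on any unrecognized flag, surfacing typos:
FLAGS = parser.parse_args(["--bigquery_data_set=x"])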

    self.log_metric(key, eval_results[key], global_step=global_step)

-  def log_metric(self, name, value, unit=None, global_step=None, extras=None):
+  def log_metric(self, name, value, unit=None, global_step=None, extras={}):

Contributor

Mutable types should not be default args.

Member Author

Done.
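
The standard fix, sketched against the signature from the diff:

def log_metric(self, name, value, unit=None, global_step=None, extras=None):
  # Default to None and create a fresh dict per call; a mutable default
  # ({}) would be shared by every call that omits the argument.
  extras = extras or {}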


def setUp(self):
  super(BenchmarkLoggerTest, self).setUp()
  self.original_environ = dict(os.environ)

Contributor

Could you put a comment about why this was necessary and when others would need to worry about environment variables?

Member Author

Done.
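
A minimal sketch of the snapshot-and-restore pattern (the tearDown half is an assumption mirroring this setUp; the rationale comment reflects the docstrings earlier in the thread):

def setUp(self):
  super(BenchmarkLoggerTest, self).setUp()
  # Snapshot os.environ: the logger picks up GCP settings from environment
  # variables, and the tests mutate them, so restore the snapshot afterwards
  # to avoid leaking state between tests.
  self.original_environ = dict(os.environ)

def tearDown(self):
  os.environ.clear()
  os.environ.update(self.original_environ)
  super(BenchmarkLoggerTest, self).tearDown()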

qlzh727 merged commit 932364b into tensorflow:master on Mar 28, 2018
qlzh727 deleted the metric-upload branch on March 28, 2018 at 20:07