
Conversation

yhliang2018
Contributor

Hi All,

Could you help review this PR for the Keras application model benchmark? It includes:

  • benchmark_main.py: the main function to run the benchmark pipeline
  • model_callbacks: custom callbacks for the benchmark.
  • README

Note that the current benchmark runs with a synthetic dataset to test the pipeline. It will be tested with the ImageNet dataset in the next PR. Thanks!

@yhliang2018 yhliang2018 requested review from karmel and a team as code owners June 7, 2018 00:38
@yhliang2018 yhliang2018 requested a review from qlzh727 June 7, 2018 00:38
Member

@qlzh727 qlzh727 left a comment

Please add unit tests as well.

- NASNet

## Dataset
ImageNet data is used for the benchmark. To begin, you will need to download the ImageNet dataset and convert it to TFRecord format. Follow along with the [Inception guide](https://github.com/tensorflow/models/tree/master/research/inception#getting-started) in order to prepare the dataset.
Member

data is a plural noun I think.

Contributor

Not typically in the US :)

Contributor

Also, is this the case? Or did we decide data will be processed in a different fashion?

Contributor Author

I think the ImageNet preparation should be the same. Will update it accordingly when we are done with the actual ImageNet dataset.

# pylint: enable=g-bad-import-order

from official.keras_application_models import model_callbacks
from official.resnet import resnet_run_loop
Member

Importing resnet_run_loop for a util function seems weird; let's extract that into a common util function.

Contributor Author

Will remove it once the synthetic dataset helper is in utils.

MODELS = {
"vgg16": tf.keras.applications.VGG16,
"vgg19": tf.keras.applications.VGG19,
"inception": tf.keras.applications.InceptionV3,
Member

Let's put the version number in the model name as well, so it's more explicit.
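For illustration, a minimal sketch of what the dictionary might look like with version numbers in the keys; the exact key spellings are assumptions, not necessarily what the PR settled on:

import tensorflow as tf

# Hypothetical keys carrying explicit version numbers.
MODELS = {
    "vgg16": tf.keras.applications.VGG16,
    "vgg19": tf.keras.applications.VGG19,
    "inceptionv3": tf.keras.applications.InceptionV3,
    "xception": tf.keras.applications.Xception,
    "resnet50": tf.keras.applications.ResNet50,
}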

_NUM_CLASSES = 1000

# Define a dictionary that maps model names to their model classes inside Keras
MODELS = {
Member

Instead of using key-value pairs, can we just use a list of full model names and look them up in "tf.keras.applications"?

Contributor

I think the explicit dict is sensible. Trying to go from "VGG16" to tf.keras.applications.VGG16 is a slippery slope of python magic and I think isn't worth the heartache. (If that is what is being suggested.)
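For context, the attribute-lookup approach being argued against would look roughly like the sketch below (illustrative only, not code from the PR); the explicit dict avoids this kind of reflection:

import tensorflow as tf

def get_model_class(name):
  # Resolve a name such as "VGG16" to tf.keras.applications.VGG16 via
  # attribute lookup instead of an explicit dict. It works, but typos fail
  # opaquely and the supported model set is no longer visible in the code.
  try:
    return getattr(tf.keras.applications, name)
  except AttributeError:
    raise ValueError("Unknown model name: {}".format(name))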

callbacks_list: a list of strings to name desired callbacks. Allowed:
ExamplesPerSecondCallback, LoggingMetricCallback, which are defined
in CALLBACKS.
batch_size: an int of batch size.
Member

What's the usage of this batch_size?

benchmark_logger = logger.config_benchmark_logger(FLAGS)
benchmark_logger.log_run_info(
model_name=FLAGS.model,
dataset_name="ImageNet",
Member

synthetic_data is not real data, so dataset_name should have "synthetic" as a suffix.
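A minimal sketch of the suggested change, extending the quoted fragment above; the use_synthetic_data flag name and suffix spelling are assumptions:

# Hypothetical: suffix the dataset name when synthetic data is in use.
dataset_name = "ImageNet"
if FLAGS.use_synthetic_data:
  dataset_name += "_Synthetic"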

This callback records the average examples per second during training.
"""

def __init__(self, batch_size=None, epoch_size=None, batch_based=None,
Member

I think having both a boolean param and an int param is a bit redundant. If the user specifies batch_size, I think it's clear that they want to measure per batch.

Contributor Author

You mean that when users specify batch_size on the command line, we should use the batch_based benchmark, since we have a default batch_size for each model? When users use callbacks, they just specify names. Maybe we should accept more args, like hooks do?


def __init__(self, batch_size=None, epoch_size=None, batch_based=None,
epoch_based=None, metric_logger=None):
if (batch_based is None) == (epoch_based is None):
Member

Does it have to be one option or the other, but not both? Batch-based provides a micro view of training speed, and epoch-based provides an overall view. Users might want both.

Contributor Author

We can support both, if users need it. (you will be the first user :) )

if self._batch_based:
self._train_time_batch_based += time.time() - self._time_start_batch_based
examples_per_sec_batch_based = self._batch_size * (
self._global_step / self._train_time_batch_based)
Member

I don't think this is correct. Why is global_step here? It should just be batch_size / time_per_batch.

Contributor

(Assuming batch_size is global, and this func is called once per global batch.)

Contributor Author

Thanks for the explanation, Karmel. Right, this is the global average one, not the exp_sec for the current batch. We can add the current_exp_sec if necessary.

self._epochs += 1
self._train_time_epoch_based += time.time() - self._time_start_epoch_based
examples_per_sec_epoch_based = self._epoch_size * (
self._epochs / self._train_time_epoch_based)
Member

This is also not correct.

Contributor

@karmel karmel left a comment

Nine models in one-- fantastic.

@@ -0,0 +1,31 @@
# Keras Application Models Benchmark
## Overview
This provides a single scaffold to benchmark the nine Keras built-in application [models](https://keras.io/applications/). All the models are for image classification application, and include:
Contributor

nit: are for image classification applications

ImageNet data is used for the benchmark. To begin, you will need to download the ImageNet dataset and convert it to TFRecord format. Follow along with the [Inception guide](https://github.com/tensorflow/models/tree/master/research/inception#getting-started) in order to prepare the dataset.

## Callbacks
Two customized callbacks are provided for model benchmark: ExamplesPerSecondCallback and LoggingMetricCallback. For each callback, `epoch_based` and `batch_based` options are available to set the benchmark level. Check [model_callbacks.py](model_callbacks.py) for more details.
Contributor

nit: Two custom callbacks are provided for model benchmarking

- MobileNet
- DenseNet
- NASNet

Contributor

Resnet?

Member

I guess it's ResNet50 only.

self._metrics = metrics or _METRICS_TO_LOG
for metric in self._metrics:
if metric.strip().lower() not in _METRICS_TO_LOG.keys():
raise ValueError("Unrecognized metric requested: {}".format(metric))
Contributor

super's init?
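The nit here is that the custom callback never chains to the parent constructor. A minimal sketch of the fix, assuming the class subclasses tf.keras.callbacks.Callback (the body shown is illustrative, not the PR's actual fields):

import tensorflow as tf

class LoggingMetricCallback(tf.keras.callbacks.Callback):

  def __init__(self, metric_logger=None, metrics=None):
    # Initialize the base Callback before setting up our own state.
    super(LoggingMetricCallback, self).__init__()
    self._logger = metric_logger
    self._metrics = metrics or {}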

self._logger = metric_logger or logger.BaseBenchmarkLogger()
self._metrics = metrics or _METRICS_TO_LOG
for metric in self._metrics:
if metric.strip().lower() not in _METRICS_TO_LOG.keys():
Contributor

Why give a choice in that case? Let's just always log all _METRICS_TO_LOG?

raise ValueError("Unrecognized metric requested: {}".format(metric))

def on_train_begin(self, logs=None):
self._global_step = 0
Contributor

ditto on the above-- consider putting this in the init?

self._logger.log_metric(
_METRICS_TO_LOG[metric],
logs.get(metric),
global_step=self._global_step)
Contributor

Let's split metrics_to_log into per_batch_metrics and per_epoch_metrics, and then log for both epoch and batch, without a choice for the user. That will simplify this code a lot, and I believe will match the use-case, though @yhliang2018 , @qlzh727 , let me know if I am incorrect.
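A hedged sketch of the split being proposed; the constant names match ones that show up later in the PR, but the exact metric keys and values are assumptions:

# Assumed mapping from Keras `logs` keys to benchmark metric names.
_PER_BATCH_METRICS = {"loss": "train_loss", "acc": "train_accuracy"}
_PER_EPOCH_METRICS = {
    "loss": "train_loss",
    "acc": "train_accuracy",
    "val_loss": "loss",
    "val_acc": "accuracy",
}

def on_batch_end(self, batch, logs=None):
  logs = logs or {}
  for metric in _PER_BATCH_METRICS:
    self._logger.log_metric(
        _PER_BATCH_METRICS[metric], logs.get(metric),
        global_step=self._global_step)

def on_epoch_end(self, epoch, logs=None):
  logs = logs or {}
  for metric in _PER_EPOCH_METRICS:
    self._logger.log_metric(
        _PER_EPOCH_METRICS[metric], logs.get(metric),
        global_step=self._global_step)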

"""Log metrics after each epoch."""
if self._epoch_based:
for metric in self._metrics.keys():
metric = metric.strip().lower()
Contributor

This stripping and lowering is unnecessary once we just use the set of constants. Even if it is done, it should happen in init, rather than every time we want to log.

@karmel karmel mentioned this pull request Jun 7, 2018

epochs=FLAGS.train_epochs,
callbacks=callbacks,
validation_data=val_dataset,
steps_per_epoch=int(np.ceil(train_num_images / float(batch_size))),
Contributor

nit: Is float() necessary with from __future__ import division?
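To illustrate the nit: with the __future__ import, `/` is already true division under Python 2, so the float() cast is redundant. A standalone example:

from __future__ import division

import numpy as np

train_num_images, batch_size = 1000, 64
# Both expressions give 16 steps per epoch; the float() cast adds nothing.
assert int(np.ceil(train_num_images / float(batch_size))) == 16
assert int(np.ceil(train_num_images / batch_size)) == 16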

if self._batch_based:
self._time_start_batch_based = time.time()

def on_batch_end(self, batch, logs=None):
Contributor

Once per global batch.

@yhliang2018
Contributor Author

Hi All,

Thanks a lot for the helpful comments and offline discussions. The code is updated and ready for review. Let me know if you have any comments.

The actual ImageNet dataset and unit tests will come in the next PR. Thanks!

eval_results = {
"accuracy": history.history["val_acc"][epoch],
"loss": history.history["val_loss"][epoch],
"epoch": epoch + 1,
Member

Usually the epoch itself does not count as a metric.

Member

I think this should be removed.

epochs=FLAGS.train_epochs,
callbacks=callbacks,
validation_data=val_dataset,
steps_per_epoch=int(np.ceil(train_num_images / batch_size)),
Member

Why is np.ceil used here? I assume train_num_images and batch_size are just real numbers, rather than matrices.

return image_size


def generate_synthetic_input_dataset(model, num_imgs):
Member

Does this function need to care about the data format, e.g. channels first vs. channels last?

Contributor Author

By default, all the built-in models use the channels_last format, so it's not an option here.

self._batch_size = batch_size
self._every_n_steps = every_n_steps
self._logger = metric_logger or logger.BaseBenchmarkLogger()
self._global_step = 0
Member

I think this should be set in on_train_begin, in case the user reuses the callback instance across several training cycles.

Contributor Author

Yeah, this is actually something I'm not quite sure about. In the last commit, I put global_step into on_train_begin, but Karmel suggested moving it to the constructor. I thought that since the callback is invoked during the training phase, maybe it makes no difference here? What do you mean by "reusing the callback instance across several training cycles"? @qlzh727 @karmel

Contributor

on_train_begin gets called each time someone starts a train loop? Which would be for each epoch for example? Shouldn't global step be maintained for the life of the callback/model, rather than reset to 0 with each train loop? Or does on_train_begin only get called once?

Member

I would expect on_train_begin to get called only once when training starts, e.g. on a fit() call. For each epoch, it uses the on_epoch_begin/end methods.

Contributor

From the code, it looks like on_train_begin gets called in the fit_loop, so it would be reset with each call to fit(). Since a model can be incrementally trained with fit(), we want the global step to persist across those training sessions. So, let's keep it in init for now.
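A minimal sketch of the behavior settled on here, assuming the callback subclasses tf.keras.callbacks.Callback: because the counter lives in __init__, it keeps growing across repeated fit() calls on the same model, whereas resetting it in on_train_begin would restart it for every fit().

import tensorflow as tf

class ExamplesPerSecondCallback(tf.keras.callbacks.Callback):

  def __init__(self, batch_size):
    super(ExamplesPerSecondCallback, self).__init__()
    self._batch_size = batch_size
    # Persists for the life of the callback, so incremental training with
    # several fit() calls keeps a monotonically increasing step count.
    self._global_step = 0

  def on_batch_end(self, batch, logs=None):
    self._global_step += 1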


def on_train_begin(self, logs=None):
self._train_time = 0
self._batch_times = []
Member

There is a side effect to saving all the historical timestamps/durations: it consumes quite a bit of memory, and you don't actually need them all when you log the metric.

In my vision, the optimal implementation would be something like:

def on_train_begin(self, logs=None):
  self._train_start_timestamp = time.time()
  self._previous_recorded_timestamp = time.time()

def on_batch_end(self, batch, logs=None):
  self._global_step += 1
  current_time = time.time()

  if self._global_step % self._every_n_steps == 0:
    # Examples/sec averaged over the whole training run so far.
    average_examples_per_sec = self._batch_size * self._global_step / (
        current_time - self._train_start_timestamp)
    # Examples/sec over just the last `every_n_steps` batches.
    current_examples_per_sec = self._batch_size * self._every_n_steps / (
        current_time - self._previous_recorded_timestamp)
    self._previous_recorded_timestamp = current_time

Contributor Author

Smart! Thanks!!


def __init__(self, metric_logger=None):
self._logger = metric_logger or logger.BaseBenchmarkLogger()
self._per_batch_metrics = _PER_BATCH_METRICS
Member

Should we expose them as the constructor params?

Contributor Author

Here we decided to use the pre-defined metrics, as Karmel suggested, to keep the code simple.

self._logger = metric_logger or logger.BaseBenchmarkLogger()
self._per_batch_metrics = _PER_BATCH_METRICS
self._per_epoch_metrics = _PER_EPOCH_METRICS
self._global_step = 0
Member

Put it in on_train_begin().

@yhliang2018
Contributor Author

Hi Karmel and Scott,

I updated this PR with the latest API and also added a TODO for a WIP bug. Could you help review it? I'll merge it if everything looks fine. Thanks!

@yhliang2018 yhliang2018 requested review from qlzh727 and removed request for qlzh727 July 13, 2018 20:37
"densenet121": tf.keras.applications.DenseNet121,
"densenet169": tf.keras.applications.DenseNet169,
"densenet201": tf.keras.applications.DenseNet201,
# TODO (b/80431378)
Member

nit: usually there is no space between TODO and the bracket.

# Ensure a valid model name was supplied via command line argument
if FLAGS.model not in MODELS.keys():
raise AssertionError("The --model command line argument should "
"be a key in the `MODELS` dictionary.")
Member

Correct, the enum flag will ensure the value is valid.
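For reference, a rough sketch of why the manual check is redundant once the flag is declared as an enum; the model list and help text here are illustrative:

from absl import flags

MODEL_NAMES = ["vgg16", "vgg19", "resnet50"]  # illustrative subset

flags.DEFINE_enum(
    name="model", default="resnet50", enum_values=MODEL_NAMES,
    case_sensitive=True, help="Model to be benchmarked.")

# absl rejects any --model value outside enum_values at flag-parsing time,
# so a separate membership check inside main() can never fire.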

val_dataset = dataset.generate_synthetic_input_dataset(
FLAGS.model, val_num_images)
else:
# Use the actual ImageNet dataset (TODO)
Member

Move the TODO to the front of the line and maybe assign it to yourself.

Contributor Author

I removed it, as we will only support the synthetic dataset for now.


# Get dataset
dataset_name = "ImageNet"
num_gpus = flags_core.get_num_gpus(FLAGS)
Member

This line can be moved to the place where it's used.

"train_epochs": FLAGS.train_epochs
}

benchmark_logger = logger.config_benchmark_logger()
Member

train_epochs=2)

flags.DEFINE_enum(
name="model", default="resnet50",
Member

It's a bit weird to benchmark resnet50 by default; is there any reason why it was chosen?

Contributor Author

Not really. Do you think it's better if we don't set a default value here?


flags.DEFINE_enum(
name="model", default="resnet50",
enum_values=MODELS.keys(), case_sensitive=True,
Member

Not sure why case_sensitive=True is needed here. I don't see models with the same name differing only by case.

self._last_recorded_time = time.time()

def on_batch_begin(self, batch, logs=None):
self._time_start = time.time()
Member

This is not used anywhere; did you miss it, or is there a typo somewhere else?

Contributor Author

Good catch! It's unused in the current implementation; I have removed it.

"""Log metrics after each epoch."""
for metric in _PER_EPOCH_METRICS:
self._logger.log_metric(
_PER_EPOCH_METRICS[metric],
Member

For the epoch-based metrics, two of them are the same as the batch-based ones. Should we dedup them, since they will be logged by the batch-based path anyway?

Contributor Author

I think it's okay to keep them. As the number of epochs is not large compared to the number of steps, there won't be too much redundant info.



# A dictionary to map the callback name and its corresponding function
CALLBACKS = {
Member

Constants should be at the top.

Contributor Author

Yes, for common constants. But here its values are two functions defined at the bottom. I think it's clearer to put it here for future extension, just like what we do for hooks.

@yhliang2018
Contributor Author

Hi Scott,
Thank you for the comments! They are all addressed. Would you take a look at this new commit? Thanks.



def main(_):
def run_keras_model_benchmark(_):
Member

You can choose to use the input param here, which is the flags object; that will enable value injection during tests. This can be done in a later PR.
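A hedged sketch of the suggested pattern; the flags_obj parameter name and the trivial body are assumptions, just to show how a test could inject values:

from absl import app
from absl import flags

flags.DEFINE_enum("model", "resnet50", ["vgg16", "resnet50"],
                  "Model to be benchmarked.")
FLAGS = flags.FLAGS

def run_keras_model_benchmark(flags_obj):
  # Read settings from the passed-in object instead of the global FLAGS,
  # so a test can pass a simple namespace with the fields it cares about.
  print("Benchmarking model: {}".format(flags_obj.model))

def main(_):
  run_keras_model_benchmark(FLAGS)

if __name__ == "__main__":
  app.run(main)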

@yhliang2018 yhliang2018 merged commit 937a530 into master Jul 13, 2018
@yhliang2018 yhliang2018 deleted the feat/keras_benchmark branch July 13, 2018 23:08