
Conversation

yhliang2018
Contributor

Hi All,

Could you help review this PR for the Keras application model benchmark? It includes:

  • benchmark_main.py: the main function to run the benchmark pipeline
  • model_callbacks: custom callbacks for the benchmark.
  • README

Note that the current benchmark runs with a synthetic dataset to test the pipeline. It will be tested with the ImageNet dataset in the next PR. Thanks!

@yhliang2018 yhliang2018 requested review from karmel and a team as code owners June 7, 2018 00:38
@yhliang2018 yhliang2018 requested a review from qlzh727 June 7, 2018 00:38
Member

@qlzh727 qlzh727 left a comment

Please add unit tests as well.

- NASNet

## Dataset
ImageNet data is used for the benchmark. To begin, you will need to download the ImageNet dataset and convert it to TFRecord format. Follow along with the [Inception guide](https://github.com/tensorflow/models/tree/master/research/inception#getting-started) in order to prepare the dataset.
Member

data is a plural noun I think.

Contributor

Not typically in the US :)

Contributor

Also, is this the case? Or did we decide data will be processed in a different fashion?

Contributor Author

I think the ImageNet preparation should be the same. Will update it accordingly when we are done with the actual ImageNet dataset.

# pylint: enable=g-bad-import-order

from official.keras_application_models import model_callbacks
from official.resnet import resnet_run_loop
Member

Importing resnet_run_loop for a util function seems weird; let's extract that into a common util function.

Contributor Author

Will remove it once the synthetic dataset helper is in utils.

MODELS = {
"vgg16": tf.keras.applications.VGG16,
"vgg19": tf.keras.applications.VGG19,
"inception": tf.keras.applications.InceptionV3,
Member

Let's put the version number in the model name as well, so it's more explicit.
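For illustration, a minimal sketch of what the dictionary might look like with version numbers in the keys; the exact key spellings are assumptions, not necessarily what the PR settled on:

import tensorflow as tf

# Hypothetical keys carrying explicit version numbers.
MODELS = {
    "vgg16": tf.keras.applications.VGG16,
    "vgg19": tf.keras.applications.VGG19,
    "inceptionv3": tf.keras.applications.InceptionV3,
    "xception": tf.keras.applications.Xception,
    "resnet50": tf.keras.applications.ResNet50,
}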

_NUM_CLASSES = 1000

# Define a dictionary that maps model names to their model classes inside Keras
MODELS = {
Member

Instead of using key-value pairs, can we just use a list of full model names and look them up in "tf.keras.applications"?

Contributor

I think the explicit dict is sensible. Trying to go from "VGG16" to tf.keras.applications.VGG16 is a slippery slope of python magic and I think isn't worth the heartache. (If that is what is being suggested.)
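For context, the attribute-lookup approach being argued against would look roughly like the sketch below (illustrative only, not code from the PR); the explicit dict avoids this kind of reflection:

import tensorflow as tf

def get_model_class(name):
  # Resolve a name such as "VGG16" to tf.keras.applications.VGG16 via
  # attribute lookup instead of an explicit dict. It works, but typos fail
  # opaquely and the supported model set is no longer visible in the code.
  try:
    return getattr(tf.keras.applications, name)
  except AttributeError:
    raise ValueError("Unknown model name: {}".format(name))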

callbacks_list: a list of strings to name desired callbacks. Allowed:
ExamplesPerSecondCallback, LoggingMetricCallback, which are defined
in CALLBACKS.
batch_size: an int of batch size.
Member

What's the usage of this batch_size?

benchmark_logger = logger.config_benchmark_logger(FLAGS)
benchmark_logger.log_run_info(
model_name=FLAGS.model,
dataset_name="ImageNet",
Member

synthetic_data is not real data, so dataset_name should have "synthetic" as a suffix.
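A minimal sketch of the suggested change, extending the quoted fragment above; the use_synthetic_data flag name and suffix spelling are assumptions:

# Hypothetical: suffix the dataset name when synthetic data is in use.
dataset_name = "ImageNet"
if FLAGS.use_synthetic_data:
  dataset_name += "_Synthetic"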

This callback records the average examples per second during training.
"""

def __init__(self, batch_size=None, epoch_size=None, batch_based=None,
Member

I think having both a boolean param and an int param is a bit redundant. If the user specifies batch_size, I think it's clear that they want to measure per batch.

Contributor Author

You mean that when users specify batch_size on the command line, we should use the batch_based benchmark, since we have a default batch_size for each model? When users use callbacks, they just specify names. Maybe we should accept more args, like hooks do?


def __init__(self, batch_size=None, epoch_size=None, batch_based=None,
epoch_based=None, metric_logger=None):
if (batch_based is None) == (epoch_based is None):
Member

Does it have to be one option or the other, but not both? Batch-based provides a micro view of training speed, and epoch-based provides an overall view. Users might want both.

Contributor Author

We can support both, if users need it. (you will be the first user :) )

if self._batch_based:
self._train_time_batch_based += time.time() - self._time_start_batch_based
examples_per_sec_batch_based = self._batch_size * (
self._global_step / self._train_time_batch_based)
Member

I don't think this is correct. Why is global_step here? It should just be batch_size / time_per_batch.

Contributor

(Assuming batch_size is global, and this func is called once per global batch.)

Contributor Author

Thanks for the explanation, Karmel. Right, this is the global average one, not the exp_sec for the current batch. We can add the current_exp_sec if necessary.

self._epochs += 1
self._train_time_epoch_based += time.time() - self._time_start_epoch_based
examples_per_sec_epoch_based = self._epoch_size * (
self._epochs / self._train_time_epoch_based)
Member

This is also not correct.

Contributor

@karmel karmel left a comment

Nine models in one-- fantastic.

@@ -0,0 +1,31 @@
# Keras Application Models Benchmark
## Overview
This provides a single scaffold to benchmark the nine Keras built-in application [models](https://keras.io/applications/). All the models are for image classification application, and include:
Contributor

nit: are for image classification applications

ImageNet data is used for the benchmark. To begin, you will need to download the ImageNet dataset and convert it to TFRecord format. Follow along with the [Inception guide](https://github.com/tensorflow/models/tree/master/research/inception#getting-started) in order to prepare the dataset.

## Callbacks
Two customized callbacks are provided for model benchmark: ExamplesPerSecondCallback and LoggingMetricCallback. For each callback, `epoch_based` and `batch_based` options are available to set the benchmark level. Check [model_callbacks.py](model_callbacks.py) for more details.
Contributor

nit: Two custom callbacks are provided for model benchmarking

- MobileNet
- DenseNet
- NASNet

Contributor

Resnet?

Member

I guess it's ResNet50 only.

self._metrics = metrics or _METRICS_TO_LOG
for metric in self._metrics:
if metric.strip().lower() not in _METRICS_TO_LOG.keys():
raise ValueError("Unrecognized metric requested: {}".format(metric))
Contributor

super's init?
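The nit here is that the custom callback never chains to the parent constructor. A minimal sketch of the fix, assuming the class subclasses tf.keras.callbacks.Callback (the body shown is illustrative, not the PR's actual fields):

import tensorflow as tf

class LoggingMetricCallback(tf.keras.callbacks.Callback):

  def __init__(self, metric_logger=None, metrics=None):
    # Initialize the base Callback before setting up our own state.
    super(LoggingMetricCallback, self).__init__()
    self._logger = metric_logger
    self._metrics = metrics or {}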

self._logger = metric_logger or logger.BaseBenchmarkLogger()
self._metrics = metrics or _METRICS_TO_LOG
for metric in self._metrics:
if metric.strip().lower() not in _METRICS_TO_LOG.keys():
Contributor

Why give a choice in that case? Let's just always log all _METRICS_TO_LOG?

raise ValueError("Unrecognized metric requested: {}".format(metric))

def on_train_begin(self, logs=None):
self._global_step = 0
Contributor

ditto on the above-- consider putting this in the init?

self._logger.log_metric(
_METRICS_TO_LOG[metric],
logs.get(metric),
global_step=self._global_step)
Contributor

Let's split metrics_to_log into per_batch_metrics and per_epoch_metrics, and then log for both epoch and batch, without a choice for the user. That will simplify this code a lot, and I believe will match the use-case, though @yhliang2018 , @qlzh727 , let me know if I am incorrect.
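A hedged sketch of the split being proposed; the constant names match ones that show up later in the PR, but the exact metric keys and values are assumptions:

# Assumed mapping from Keras `logs` keys to benchmark metric names.
_PER_BATCH_METRICS = {"loss": "train_loss", "acc": "train_accuracy"}
_PER_EPOCH_METRICS = {
    "loss": "train_loss",
    "acc": "train_accuracy",
    "val_loss": "loss",
    "val_acc": "accuracy",
}

def on_batch_end(self, batch, logs=None):
  logs = logs or {}
  for metric in _PER_BATCH_METRICS:
    self._logger.log_metric(
        _PER_BATCH_METRICS[metric], logs.get(metric),
        global_step=self._global_step)

def on_epoch_end(self, epoch, logs=None):
  logs = logs or {}
  for metric in _PER_EPOCH_METRICS:
    self._logger.log_metric(
        _PER_EPOCH_METRICS[metric], logs.get(metric),
        global_step=self._global_step)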

"""Log metrics after each epoch."""
if self._epoch_based:
for metric in self._metrics.keys():
metric = metric.strip().lower()
Contributor

This stripping and lowering is unnecessary once we just use the set of constants. Even if it is done, it should happen in init, rather than every time we want to log.

@karmel karmel mentioned this pull request Jun 7, 2018

epochs=FLAGS.train_epochs,
callbacks=callbacks,
validation_data=val_dataset,
steps_per_epoch=int(np.ceil(train_num_images / float(batch_size))),
Contributor

nit: Is float() necessary with from __future__ import division?
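To illustrate the nit: with the __future__ import, `/` is already true division under Python 2, so the float() cast is redundant. A standalone example:

from __future__ import division

import numpy as np

train_num_images, batch_size = 1000, 64
# Both expressions give 16 steps per epoch; the float() cast adds nothing.
assert int(np.ceil(train_num_images / float(batch_size))) == 16
assert int(np.ceil(train_num_images / batch_size)) == 16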

if self._batch_based:
self._time_start_batch_based = time.time()

def on_batch_end(self, batch, logs=None):
Contributor

Once per global batch.

@yhliang2018
Contributor Author

Hi All,

Thanks a lot for the helpful comments and offline discussions. The code is updated and ready for review. Let me know if you have any comments.

The actual ImageNet dataset and unit tests will come in the next PR. Thanks!

eval_results = {
"accuracy": history.history["val_acc"][epoch],
"loss": history.history["val_loss"][epoch],
"epoch": epoch + 1,
Member

Usually the epoch itself does not count as a metric.

Member

I think this should be removed.

epochs=FLAGS.train_epochs,
callbacks=callbacks,
validation_data=val_dataset,
steps_per_epoch=int(np.ceil(train_num_images / batch_size)),
Member

Why is np.ceil used here? I assume train_num_images and batch_size are just real numbers, rather than matrices.

return image_size


def generate_synthetic_input_dataset(model, num_imgs):
Member

Does this function need to care about the data format, e.g. channels first vs. channels last?

Contributor Author

By default, all the built-in models use the channels_last format, so it's not an option here.

self._batch_size = batch_size
self._every_n_steps = every_n_steps
self._logger = metric_logger or logger.BaseBenchmarkLogger()
self._global_step = 0
Member

I think this should be set in on_train_begin, in case the user reuses the callback instance across several training cycles.

Contributor Author

Yeah, this is actually something I'm not quite sure about. In the last commit, I put global_step into on_train_begin, but Karmel suggested moving it to the constructor. I thought that since the callback is invoked during the training phase, maybe it makes no difference here? What do you mean by "reusing the callback instance across several training cycles"? @qlzh727 @karmel

Contributor

on_train_begin gets called each time someone starts a train loop? Which would be for each epoch for example? Shouldn't global step be maintained for the life of the callback/model, rather than reset to 0 with each train loop? Or does on_train_begin only get called once?

Member

I would expect on_train_begin to get called only once when training starts, e.g. on a fit() call. For each epoch, it uses the on_epoch_begin/end methods.

Contributor

From the code, it looks like on_train_begin gets called in the fit_loop, so it would be reset with each call to fit(). Since a model can be incrementally trained with fit(), we want the global step to persist across those training sessions. So, let's keep it in init for now.
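A minimal sketch of the behavior settled on here, assuming the callback subclasses tf.keras.callbacks.Callback: because the counter lives in __init__, it keeps growing across repeated fit() calls on the same model, whereas resetting it in on_train_begin would restart it for every fit().

import tensorflow as tf

class ExamplesPerSecondCallback(tf.keras.callbacks.Callback):

  def __init__(self, batch_size):
    super(ExamplesPerSecondCallback, self).__init__()
    self._batch_size = batch_size
    # Persists for the life of the callback, so incremental training with
    # several fit() calls keeps a monotonically increasing step count.
    self._global_step = 0

  def on_batch_end(self, batch, logs=None):
    self._global_step += 1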


def on_train_begin(self, logs=None):
self._train_time = 0
self._batch_times = []
Member

There is a side effect to saving all the historical timestamps/durations: it consumes quite a bit of memory, and you don't actually need them all when you log the metric.

In my vision, the optimal implementation would be something like:

def on_train_begin(self, logs=None):
  self._train_start_timestamp = time.time()
  self._previous_recorded_timestamp = time.time()

def on_batch_end(self, batch, logs=None):
  self._global_step += 1
  current_time = time.time()

  if self._global_step % self._every_n_steps == 0:
    # Examples/sec averaged over the whole training run so far.
    average_examples_per_sec = self._batch_size * self._global_step / (
        current_time - self._train_start_timestamp)
    # Examples/sec over just the last `every_n_steps` batches.
    current_examples_per_sec = self._batch_size * self._every_n_steps / (
        current_time - self._previous_recorded_timestamp)
    self._previous_recorded_timestamp = current_time

Contributor Author

Smart! Thanks!!


def __init__(self, metric_logger=None):
self._logger = metric_logger or logger.BaseBenchmarkLogger()
self._per_batch_metrics = _PER_BATCH_METRICS
Member

Should we expose them as the constructor params?

Contributor Author

Here we decided to use the pre-defined metrics, as Karmel suggested, to keep the code simple.

self._logger = metric_logger or logger.BaseBenchmarkLogger()
self._per_batch_metrics = _PER_BATCH_METRICS
self._per_epoch_metrics = _PER_EPOCH_METRICS
self._global_step = 0
Member

Put it in on_train_begin().

@yhliang2018
Contributor Author

Hi Karmel and Scott,

I updated this PR with the latest API and also added a TODO for a WIP bug. Could you help review it? I'll merge it if everything looks fine. Thanks!

@yhliang2018 yhliang2018 requested review from qlzh727 and removed request for qlzh727 July 13, 2018 20:37
"densenet121": tf.keras.applications.DenseNet121,
"densenet169": tf.keras.applications.DenseNet169,
"densenet201": tf.keras.applications.DenseNet201,
# TODO (b/80431378)
Member

nit: usually there is no space between TODO and the bracket.

# Ensure a valid model name was supplied via command line argument
if FLAGS.model not in MODELS.keys():
raise AssertionError("The --model command line argument should "
"be a key in the `MODELS` dictionary.")
Member

Correct, the enum flag will ensure the value is valid.
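For reference, a rough sketch of why the manual check is redundant once the flag is declared as an enum; the model list and help text here are illustrative:

from absl import flags

MODEL_NAMES = ["vgg16", "vgg19", "resnet50"]  # illustrative subset

flags.DEFINE_enum(
    name="model", default="resnet50", enum_values=MODEL_NAMES,
    case_sensitive=True, help="Model to be benchmarked.")

# absl rejects any --model value outside enum_values at flag-parsing time,
# so a separate membership check inside main() can never fire.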

val_dataset = dataset.generate_synthetic_input_dataset(
FLAGS.model, val_num_images)
else:
# Use the actual ImageNet dataset (TODO)
Member

Move the TODO to the front of the line and maybe assign it to yourself.

Contributor Author

I removed it, as we will only support the synthetic dataset for now.


# Get dataset
dataset_name = "ImageNet"
num_gpus = flags_core.get_num_gpus(FLAGS)
Member

This line can be moved to the place where it's used.

"train_epochs": FLAGS.train_epochs
}

benchmark_logger = logger.config_benchmark_logger()
Member

train_epochs=2)

flags.DEFINE_enum(
name="model", default="resnet50",
Member

It's a bit weird to benchmark resnet50 by default; is there any reason why it was chosen?

Contributor Author

Not really. Do you think it's better if we don't set a default value here?


flags.DEFINE_enum(
name="model", default="resnet50",
enum_values=MODELS.keys(), case_sensitive=True,
Member

Not sure why case_sensitive=True is needed here. I don't see models with the same name differing only by case.

self._last_recorded_time = time.time()

def on_batch_begin(self, batch, logs=None):
self._time_start = time.time()
Member

This is not used anywhere; did you miss it, or is there a typo somewhere else?

Contributor Author

Good catch! It's unused in the current implementation; I have removed it.

"""Log metrics after each epoch."""
for metric in _PER_EPOCH_METRICS:
self._logger.log_metric(
_PER_EPOCH_METRICS[metric],
Member

For the epoch-based metrics, two of them are the same as the batch-based ones. Should we dedup them, since they will be logged by the batch-based path anyway?

Contributor Author

I think it's okay to keep them. As the number of epochs is not large compared to the number of steps, there won't be too much redundant info.



# A dictionary to map the callback name and its corresponding function
CALLBACKS = {
Member

Constants should be at the top.

Contributor Author

Yes, for common constants. But here its values are two functions defined at the bottom. I think it's clearer to put it here for future extension, just like what we do for hooks.

@yhliang2018
Contributor Author

Hi Scott,
Thank you for the comments! They are all addressed. Would you take a look at this new commit? Thanks.



def main(_):
def run_keras_model_benchmark(_):
Member

You can choose to use the input param here, which is the flags object; that will enable value injection during tests. This can be done in a later PR.
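A hedged sketch of the suggested pattern; the flags_obj parameter name and the trivial body are assumptions, just to show how a test could inject values:

from absl import app
from absl import flags

flags.DEFINE_enum("model", "resnet50", ["vgg16", "resnet50"],
                  "Model to be benchmarked.")
FLAGS = flags.FLAGS

def run_keras_model_benchmark(flags_obj):
  # Read settings from the passed-in object instead of the global FLAGS,
  # so a test can pass a simple namespace with the fields it cares about.
  print("Benchmarking model: {}".format(flags_obj.model))

def main(_):
  run_keras_model_benchmark(FLAGS)

if __name__ == "__main__":
  app.run(main)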

@yhliang2018 yhliang2018 merged commit 937a530 into master Jul 13, 2018
@yhliang2018 yhliang2018 deleted the feat/keras_benchmark branch July 13, 2018 23:08