Keras model benchmark #4476
Conversation
Please add unit tests as well.
- NASNet

## Dataset
ImageNet data is used for the benchmark. To begin, you will need to download the ImageNet dataset and convert it to TFRecord format. Follow along with the [Inception guide](https://github.com/tensorflow/models/tree/master/research/inception#getting-started) in order to prepare the dataset.
data is a plural noun I think.
Not typically in the US :)
Also, is this the case? Or did we decide data will be processed in a different fashion?
I think the ImageNet preparation should be the same. Will update it accordingly when we are done with the actual ImageNet dataset.
```python
# pylint: enable=g-bad-import-order

from official.keras_application_models import model_callbacks
from official.resnet import resnet_run_loop
```
Importing resnet_run_loop for a util function seems weird; let's extract that into a common util function.
Will remove it once the synthetic dataset util is in utils.
```python
MODELS = {
    "vgg16": tf.keras.applications.VGG16,
    "vgg19": tf.keras.applications.VGG19,
    "inception": tf.keras.applications.InceptionV3,
```
Let's put the version number in the model name as well, so it's more explicit.
```python
_NUM_CLASSES = 1000

# Define a dictionary that maps model names to their model classes inside Keras
MODELS = {
```
Instead of using key-value pairs, can we just use a list of full model names and look them up in `tf.keras.applications`?
I think the explicit dict is sensible. Trying to go from "VGG16" to tf.keras.applications.VGG16 is a slippery slope of python magic and I think isn't worth the heartache. (If that is what is being suggested.)
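For illustration, a minimal sketch of the explicit-dict approach with version numbers baked into the keys, as suggested above (the dict in the PR covers all nine models; only a subset is shown here):

```python
import tensorflow as tf

# Explicit name -> class mapping. Versioned keys like "inceptionv3" keep
# the choice unambiguous and avoid getattr()-style lookups on the module.
MODELS = {
    "vgg16": tf.keras.applications.VGG16,
    "vgg19": tf.keras.applications.VGG19,
    "inceptionv3": tf.keras.applications.InceptionV3,
    "resnet50": tf.keras.applications.ResNet50,
}

model_class = MODELS["inceptionv3"]  # e.g. the value of the --model flag
```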
```python
    callbacks_list: a list of strings to name desired callbacks. Allowed:
      ExamplesPerSecondCallback, LoggingMetricCallback, which are defined
      in CALLBACKS.
    batch_size: an int of batch size.
```
What's the usage of this batch_size?
```python
  benchmark_logger = logger.config_benchmark_logger(FLAGS)
  benchmark_logger.log_run_info(
      model_name=FLAGS.model,
      dataset_name="ImageNet",
```
synthetic_data is not real data, so dataset_name should have "synthetic" as a suffix.
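A small sketch of the suggested naming (assuming a `use_synthetic_data` flag like the one defined in the official utils):

```python
# Tag the dataset name so runs logged on synthetic data are not
# mistaken for real ImageNet results.
dataset_name = "ImageNet"
if FLAGS.use_synthetic_data:
  dataset_name += "_Synthetic"
```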
```python
  This callback records the average examples per second during training.
  """

  def __init__(self, batch_size=None, epoch_size=None, batch_based=None,
```
I think having both a boolean param and an int param is a bit redundant. If the user specifies batch_size, I think it's clear that they want to measure per batch.
You mean when users specify batch_size on the command line, we should use the batch-based benchmark? We have a default batch_size for each model. When users use callbacks, they just specify names. Maybe we should accept more args, like hooks?
```python
  def __init__(self, batch_size=None, epoch_size=None, batch_based=None,
               epoch_based=None, metric_logger=None):
    if (batch_based is None) == (epoch_based is None):
```
Does it have to be one option but not both? Batch-based provides a micro view of training speed, and epoch-based provides an overview. Users might want both.
We can support both, if users need it. (you will be the first user :) )
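For illustration, a minimal sketch of what supporting both modes at once could look like (parameter names follow the PR; the boolean defaults and the validation rule are assumptions):

```python
import tensorflow as tf

class ExamplesPerSecondCallback(tf.keras.callbacks.Callback):
  """Sketch: batch-based and epoch-based reporting can both be enabled."""

  def __init__(self, batch_size=None, epoch_size=None,
               batch_based=False, epoch_based=False, metric_logger=None):
    if not (batch_based or epoch_based):
      raise ValueError(
          "At least one of batch_based and epoch_based must be set.")
    self._batch_size = batch_size
    self._epoch_size = epoch_size
    self._batch_based = batch_based
    self._epoch_based = epoch_based
    self._logger = metric_logger
    super(ExamplesPerSecondCallback, self).__init__()
```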
```python
    if self._batch_based:
      self._train_time_batch_based += time.time() - self._time_start_batch_based
      examples_per_sec_batch_based = self._batch_size * (
          self._global_step / self._train_time_batch_based)
```
I don't think this is correct. Why is global_step here? It should just be batch_size / time_per_batch.
(Assuming batch_size is global, and this func is called once per global batch.)
Thanks for the explanation, Karmel. Right, this is the global average one, not the exp_sec for the current batch. We can add the current_exp_sec if necessary.
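To make the distinction concrete: with a global batch_size of 32, global_step = 100, and 10 s of accumulated train time, the formula above gives 32 * (100 / 10) = 320 examples/sec, the average over the whole run so far, whereas the rate of the most recent batch alone could be quite different.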
```python
    self._epochs += 1
    self._train_time_epoch_based += time.time() - self._time_start_epoch_based
    examples_per_sec_epoch_based = self._epoch_size * (
        self._epochs / self._train_time_epoch_based)
```
This is also not correct.
Nine models in one-- fantastic.
@@ -0,0 +1,31 @@
# Keras Application Models Benchmark
## Overview
This provides a single scaffold to benchmark the nine Keras built-in application [models](https://keras.io/applications/). All the models are for image classification application, and include:
nit: are for image classification applications
ImageNet data is used for the benchmark. To begin, you will need to download the ImageNet dataset and convert it to TFRecord format. Follow along with the [Inception guide](https://github.com/tensorflow/models/tree/master/research/inception#getting-started) in order to prepare the dataset.

## Callbacks
Two customized callbacks are provided for model benchmark: ExamplesPerSecondCallback and LoggingMetricCallback. For each callback, `epoch_based` and `batch_based` options are available to set the benchmark level. Check [model_callbacks.py](model_callbacks.py) for more details.
nit: Two custom callbacks are provided for model benchmarking
- MobileNet
- DenseNet
- NASNet
Resnet?
I guess it's ResNet50 only.
```python
    self._metrics = metrics or _METRICS_TO_LOG
    for metric in self._metrics:
      if metric.strip().lower() not in _METRICS_TO_LOG.keys():
        raise ValueError("Unrecognized metric requested: {}".format(metric))
```
super's init?
```python
    self._logger = metric_logger or logger.BaseBenchmarkLogger()
    self._metrics = metrics or _METRICS_TO_LOG
    for metric in self._metrics:
      if metric.strip().lower() not in _METRICS_TO_LOG.keys():
```
Why give a choice in that case? Let's just always log all _METRICS_TO_LOG?
```python
        raise ValueError("Unrecognized metric requested: {}".format(metric))

  def on_train_begin(self, logs=None):
    self._global_step = 0
```
ditto on the above-- consider putting this in the init?
```python
        self._logger.log_metric(
            _METRICS_TO_LOG[metric],
            logs.get(metric),
            global_step=self._global_step)
```
Let's split metrics_to_log into per_batch_metrics and per_epoch_metrics, and then log for both epoch and batch, without a choice for the user. That will simplify this code a lot, and I believe will match the use-case, though @yhliang2018 , @qlzh727 , let me know if I am incorrect.
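A sketch of the proposed split; the exact key-to-label mappings below are assumptions based on the Keras log keys (acc/loss appear per batch, val_acc/val_loss are added per epoch):

```python
# Metrics available in `logs` after every batch.
_PER_BATCH_METRICS = {
    "acc": "train_accuracy",
    "loss": "train_loss",
}
# Metrics available in `logs` only after each epoch (validation results).
_PER_EPOCH_METRICS = {
    "acc": "train_accuracy",
    "loss": "train_loss",
    "val_acc": "accuracy",
    "val_loss": "loss",
}
```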
"""Log metrics after each epoch.""" | ||
if self._epoch_based: | ||
for metric in self._metrics.keys(): | ||
metric = metric.strip().lower() |
This stripping and lowering is unnecessary once we just use the set of constants. Even if done, it should be done on init, rather than every time we want to log.
```python
      epochs=FLAGS.train_epochs,
      callbacks=callbacks,
      validation_data=val_dataset,
      steps_per_epoch=int(np.ceil(train_num_images / float(batch_size))),
```
nit: Is `float()` necessary with `from __future__ import division`?
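Indeed, with the future import, `/` is already true division for ints, so the cast is redundant:

```python
from __future__ import division

print(7 / 2)         # 3.5 -- true division even for two ints
print(7 / float(2))  # 3.5 -- same result; float() adds nothing
```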
```python
    if self._batch_based:
      self._time_start_batch_based = time.time()

  def on_batch_end(self, batch, logs=None):
```
Once per global batch.
Hi All, thanks a lot for the helpful comments and offline discussions. The code is updated and ready for review. Let me know if you have any comments. The actual ImageNet dataset and unit tests will come in the next PR. Thanks!
```python
  eval_results = {
      "accuracy": history.history["val_acc"][epoch],
      "loss": history.history["val_loss"][epoch],
      "epoch": epoch + 1,
```
Usually the epoch itself does not count as a metric.
I think this should be removed.
```python
      epochs=FLAGS.train_epochs,
      callbacks=callbacks,
      validation_data=val_dataset,
      steps_per_epoch=int(np.ceil(train_num_images / batch_size)),
```
Why is np.ceil used here? I assume train_num_images and batch_size are just real numbers, rather than matrices.
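np.ceil does work on plain scalars, but since only scalars are involved, math.ceil or pure integer arithmetic would avoid the NumPy round trip; a small sketch with example values:

```python
import math

train_num_images = 1281167  # example value (ImageNet train split)
batch_size = 32

# Ceiling division on scalars, two equivalent ways:
steps = int(math.ceil(train_num_images / batch_size))
steps_int_only = -(-train_num_images // batch_size)  # integer-only ceiling
assert steps == steps_int_only == 40037
```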
```python
  return image_size


def generate_synthetic_input_dataset(model, num_imgs):
```
Does this function need to care about the data format? E.g. channels-first vs channels-last?
By default, all the built-in models use the channels_last format. So it's not an option here.
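For reference, a hypothetical sketch of how the shape would switch if channels_first ever needed support (`_synthetic_batch` and its parameters are illustrative, not the PR's API):

```python
import numpy as np
import tensorflow as tf

def _synthetic_batch(image_size, num_imgs, num_classes=1000):
  # channels_last means (height, width, channels); channels_first puts
  # the channel axis right after the batch dimension instead.
  if tf.keras.backend.image_data_format() == "channels_first":
    shape = (num_imgs, 3, image_size, image_size)
  else:
    shape = (num_imgs, image_size, image_size, 3)
  images = np.random.random(shape).astype(np.float32)
  labels = np.random.randint(num_classes, size=(num_imgs, 1))
  return images, labels
```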
```python
    self._batch_size = batch_size
    self._every_n_steps = every_n_steps
    self._logger = metric_logger or logger.BaseBenchmarkLogger()
    self._global_step = 0
```
I think this should be set in on_train_begin, in case the user reuses the callback instance for several training cycles.
Yeah, here is something I'm actually not quite sure about. In the last commit, I put global_step into on_train_begin, but Karmel suggested moving it to the constructor. I thought that since the callback is invoked during the training phase, maybe it makes no difference here? What do you mean by "reusing the callback instance in several training cycles"? @qlzh727 @karmel
on_train_begin gets called each time someone starts a train loop? Which would be for each epoch for example? Shouldn't global step be maintained for the life of the callback/model, rather than reset to 0 with each train loop? Or does on_train_begin only get called once?
I would expect on_train_begin to be called only once, when training starts, e.g. on a fit() call. For each epoch, it will use the on_epoch_begin/end methods.
From the code, it looks like on_train_begin gets called in the fit_loop, so it would be reset with each call to fit(). Since a model can be incrementally trained with fit(), we want the global step to persist across those training sessions. So, let's keep it in init for now.
```python
  def on_train_begin(self, logs=None):
    self._train_time = 0
    self._batch_times = []
```
There is a side effect of saving all the historical timestamps/durations: it consumes quite some memory, and you don't actually need them all when you log the metric. In my vision, the optimal implementation would be:

```python
def on_train_begin(self, logs=None):
  self._train_start_timestamp = time.time()
  self._previous_recorded_timestamp = time.time()

def on_batch_end(self, batch, logs=None):
  self._global_step += 1
  current_time = time.time()
  if self._global_step % self._every_n_steps == 0:
    # Average rate over the whole training run so far.
    average_examples_per_sec = self._batch_size * self._global_step / (
        current_time - self._train_start_timestamp)
    # Rate over just the last _every_n_steps batches.
    current_examples_per_sec = self._batch_size * self._every_n_steps / (
        current_time - self._previous_recorded_timestamp)
    self._previous_recorded_timestamp = current_time
```
Smart! Thanks!!
```python
  def __init__(self, metric_logger=None):
    self._logger = metric_logger or logger.BaseBenchmarkLogger()
    self._per_batch_metrics = _PER_BATCH_METRICS
```
Should we expose them as constructor params?
Here we decided to use the pre-defined metrics, as Karmel suggested, to keep the code simple.
```python
    self._logger = metric_logger or logger.BaseBenchmarkLogger()
    self._per_batch_metrics = _PER_BATCH_METRICS
    self._per_epoch_metrics = _PER_EPOCH_METRICS
    self._global_step = 0
```
Put it in on_train_begin().
Hi Karmel and Scott, I updated this PR with the latest API and also added a TODO for a WIP bug. Could you help review it? Will merge it if everything looks fine. Thanks!
"densenet121": tf.keras.applications.DenseNet121, | ||
"densenet169": tf.keras.applications.DenseNet169, | ||
"densenet201": tf.keras.applications.DenseNet201, | ||
# TODO (b/80431378) |
nit: usually there is no space between TODO and bracket.
```python
  # Ensure a valid model name was supplied via command line argument
  if FLAGS.model not in MODELS.keys():
    raise AssertionError("The --model command line argument should "
                         "be a key in the `MODELS` dictionary.")
```
Correct, the enum flag will ensure the value is valid.
```python
    val_dataset = dataset.generate_synthetic_input_dataset(
        FLAGS.model, val_num_images)
  else:
    # Use the actual ImageNet dataset (TODO)
```
Move the TODO to the front of the line and maybe assign to yourself.
I removed it, as we will only support the synthetic dataset for now.
```python
  # Get dataset
  dataset_name = "ImageNet"
  num_gpus = flags_core.get_num_gpus(FLAGS)
```
This line can be moved to the place where it's used.
"train_epochs": FLAGS.train_epochs | ||
} | ||
|
||
benchmark_logger = logger.config_benchmark_logger() |
Please use benchmark_context() for the logger config. See https://github.com/tensorflow/models/blob/master/official/utils/logs/logger.py#L96 and the sample in https://github.com/tensorflow/models/blob/master/official/resnet/cifar10_main.py#L253.
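Following the linked sample, the pattern would look roughly like this (run_keras_model_benchmark is the PR's entry point; the wiring here is a sketch, not the merged code):

```python
from absl import app
from absl import flags

from official.utils.logs import logger

def main(_):
  # benchmark_context configures the benchmark logger from the flags
  # object and cleans it up when the block exits.
  with logger.benchmark_context(flags.FLAGS):
    run_keras_model_benchmark(flags.FLAGS)

if __name__ == "__main__":
  app.run(main)
```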
```python
      train_epochs=2)

  flags.DEFINE_enum(
      name="model", default="resnet50",
```
It's a bit weird to benchmark resnet50 by default; is there any reason it was chosen?
Not really. Do you think it's better if we don't set a default value here?
```python
  flags.DEFINE_enum(
      name="model", default="resnet50",
      enum_values=MODELS.keys(), case_sensitive=True,
```
Not sure why case_sensitive=True is needed here. I don't see models with the same name in different cases.
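Since every key in MODELS is lowercase, a case-insensitive flag is friendlier to users; a sketch of the flag definition with that one change:

```python
from absl import flags

flags.DEFINE_enum(
    name="model", default="resnet50",
    enum_values=list(MODELS.keys()), case_sensitive=False,
    help="Name of the model to benchmark; must be a key of MODELS.")
```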
```python
    self._last_recorded_time = time.time()

  def on_batch_begin(self, batch, logs=None):
    self._time_start = time.time()
```
This is not used anywhere. Did you miss it, or is there a typo somewhere else?
Good catch!! It's useless given the current implementation. Have removed it.
"""Log metrics after each epoch.""" | ||
for metric in _PER_EPOCH_METRICS: | ||
self._logger.log_metric( | ||
_PER_EPOCH_METRICS[metric], |
For the epoch-based metrics, two of them are the same as the batch-based ones. Should we dedup them, since they will be logged batch-based anyway?
I think it's okay to keep them. As the number of epochs is not large compared to the number of steps, there won't be too much redundant info.
```python
# A dictionary to map the callback name and its corresponding function
CALLBACKS = {
```
Constants should be at the top.
Yes for common constants. But here its values are two functions defined at the bottom. I think it's clearer to put it here for future extension, just like what we do for hooks.
Hi Scott,
```diff
-def main(_):
+def run_keras_model_benchmark(_):
```
You can choose to use the input param here, which is the flags object; that will enable value injection during tests. This can be done in a later PR.
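A sketch of that refactor: read configuration from a parameter instead of the global FLAGS, so a test can pass in a lightweight stand-in (FakeFlags here is purely illustrative):

```python
import collections

FakeFlags = collections.namedtuple("FakeFlags", ["model", "train_epochs"])

def run_keras_model_benchmark(flags_obj):
  # All flag reads go through the parameter, so tests can inject values
  # without touching absl's global flag registry.
  print("Benchmarking %s for %d epochs"
        % (flags_obj.model, flags_obj.train_epochs))

# In a unit test:
run_keras_model_benchmark(FakeFlags(model="vgg16", train_epochs=2))
```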
Hi All,
Could you help review this PR for the Keras application model benchmark? It includes:
Note that the current benchmark runs with a synthetic dataset to test the pipeline. Will test it with the ImageNet dataset in the next PR. Thanks!