
Conversation

guptapriya (Contributor)

Added the following changes to improve the input pipeline:

  1. Remove usage of one_hot labels. (~2% improvement)
  2. Add drop_remainder to batching. This lets the input shape be determined ahead of time, which improves performance. (~2% improvement)
  3. Use parallel_interleave for reading the ImageNet dataset. (~3% improvement)

The improvements were observed when testing with 8 V100 GPUs on a DGX-1 using mirrored strategy (fp16, ResNet v2). The differences may not be noticeable in other cases where the input pipeline is not the bottleneck. A sketch of the three changes together is shown below.
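A minimal sketch of what these three changes look like together in a tf.data pipeline, using the tf.contrib.data API current at the time. The file pattern, parse function, batch size, and cycle_length value here are illustrative assumptions, not the PR's actual code:

```python
import tensorflow as tf

def parse_record_fn(serialized):
  # Hypothetical parser: the real model decodes the image as well. The
  # point of change (1) is to return an integer label and let the loss
  # op (e.g. sparse_softmax_cross_entropy) consume it, instead of
  # materializing one-hot vectors in the input pipeline.
  features = tf.parse_single_example(
      serialized, {"label": tf.FixedLenFeature([], tf.int64)})
  return features["label"]

# (3) Interleave reads across the sharded TFRecord files rather than
# reading them one at a time; cycle_length files are read concurrently.
dataset = tf.data.Dataset.list_files("/data/imagenet/train-*")
dataset = dataset.apply(tf.contrib.data.parallel_interleave(
    tf.data.TFRecordDataset, cycle_length=10))

# (2) drop_remainder=True discards the final partial batch, so every
# batch has the same static shape and downstream ops can be built with
# fully known dimensions.
dataset = dataset.apply(tf.contrib.data.map_and_batch(
    parse_record_fn, batch_size=128, num_parallel_batches=1,
    drop_remainder=True))
dataset = dataset.prefetch(1)
```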

@guptapriya guptapriya requested review from a team and karmel as code owners May 11, 2018 23:54
@guptapriya guptapriya requested a review from robieta May 11, 2018 23:56
@robieta (Contributor) left a comment:

I think overall this is a good balance between performance and accessibility. Thanks a lot for doing this.

dataset = dataset.shuffle(buffer_size=_NUM_TRAIN_FILES)

# Convert to individual records
# TODO(guptapriya): Should we make this cycle_length a flag similar to
Contributor:

Did you get a chance to check how sensitive performance is to this on both something big (DGX, V100 GCE) and something small (1x K80/P100)? I prefer not to have a performance flag unless it makes a big difference. And if it is a constant it would be nice to have a brief comment so it isn't just a magic number.

guptapriya (Author):

No, I haven't had the chance to run on anything other than a DGX-1V. I don't think the performance difference will show up on K80s, because the input pipeline will not be the bottleneck there, but I haven't tested it. I am talking to the input team to figure out whether a constant here makes sense or whether it should be tuned (in which case we may need to just remove it).
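For reference, a sketch of the constant-with-a-comment option suggested above; the name and value are illustrative assumptions, not a tuned recommendation:

```python
import tensorflow as tf

# Number of TFRecord files to read and interleave concurrently. Larger
# values increase read parallelism (and memory use); this value worked
# on a DGX-1 with 8 V100s and may deserve retuning on machines where
# the input pipeline is not the bottleneck.
_FILE_CYCLE_LENGTH = 10  # hypothetical name and value

dataset = tf.data.Dataset.list_files("/data/imagenet/train-*")
dataset = dataset.apply(tf.contrib.data.parallel_interleave(
    tf.data.TFRecordDataset, cycle_length=_FILE_CYCLE_LENGTH))
```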

      batch_size=batch_size,
-     num_parallel_batches=1))
+     num_parallel_batches=1,
+     drop_remainder=True))
Contributor:

This could matter for cifar with only 60k images.

Contributor:

It should not matter much as long as the batch size is reasonable, since this only drops the part of the dataset that doesn't fit into a full batch.

guptapriya (Author):

Right, it only drops the last partial batch. So even with 60k images and a batch size of 2048, it should drop only 608 images or so.
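A quick check of that arithmetic (plain Python, not code from the PR):

```python
# drop_remainder discards only the final partial batch, so the number
# of examples lost per epoch is the dataset size modulo the batch size.
num_examples = 60000               # CIFAR, as discussed above
batch_size = 2048
print(num_examples % batch_size)   # 608 (60000 - 29 * 2048)
```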

@googlebot

So there's good news and bad news.

👍 The good news is that everyone who needs to sign a CLA (the pull request submitter and all commit authors) has done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of the commit author(s) and merge this pull request when appropriate.

@pkulzc (Contributor) commented May 15, 2018:

The object detection change in this PR has already been approved and merged in another PR, so object detection reviews are not necessary. I'm removing derekjchow, jch1, and myself from the reviewers.
Please let me know if further reviews from us are required. Thanks!

@pkulzc pkulzc removed request for derekjchow, jch1 and pkulzc May 15, 2018 06:44
@guptapriya (Author):

@pkulzc - yes! I messed up this PR while rebasing, which is why it pulled in some already-merged changes :( Sorry about the spam. I will fix it, and no action is needed from you!

