
Training time #12

Closed

bonlime opened this issue Oct 29, 2021 · 1 comment

bonlime commented Oct 29, 2021

Hi, first of all, thanks for a very interesting paper.

I would like to know how long it took you to train the models. I'm trying to train ConvMixer-768/32 on 2x V100 GPUs, and one epoch takes ~3 hours, so I estimate that a full training run would take roughly 2 * 3 * 300 ≈ 1800 GPU hours, which is insane. Even if you trained with 10 GPUs, it would take ~1 week for one experiment to finish. Are my calculations correct?
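
For concreteness, here is the back-of-the-envelope estimate spelled out (a sketch using only the figures quoted above: 2 GPUs, ~3 hours per epoch, 300 epochs):

```python
# Rough GPU-hour estimate for a full ConvMixer-768/32 run,
# based on the figures quoted above (assumptions, not measurements).
num_gpus = 2          # 2x V100
hours_per_epoch = 3   # observed wall-clock time per epoch
num_epochs = 300      # target training schedule

gpu_hours = num_gpus * hours_per_epoch * num_epochs
print(f"Estimated total: {gpu_hours} GPU hours")  # 1800 GPU hours

# Wall-clock time if the same work were spread over 10 GPUs
# (assuming near-linear scaling, which is optimistic):
wall_clock_days = gpu_hours / 10 / 24
print(f"~{wall_clock_days:.1f} days on 10 GPUs")  # ~7.5 days
```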

tmp-iclr (Collaborator) commented

I think you're correct that it takes approximately a week to train a ConvMixer on ImageNet-1k on 10 GPUs (we used RTX8000s). The ConvMixer-1536/20 took ~9 days and the ConvMixer-768/32 took ~8 days for twice the number of epochs (300 vs. 150). The model is indeed quite slow, but we are optimistic that low-level optimizations of large-kernel depthwise convolution could improve this -- we are currently looking into that.

Another option is to try using a larger patch size (like patch_size=14), which will be significantly faster but less accurate. We also suspect that this could be improved by spending some time on parameter tuning.
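
For reference, here is a minimal ConvMixer sketch in PyTorch along the lines of the paper, with `patch_size` exposed so the speed/accuracy trade-off above can be tried. This is an illustrative reconstruction, not the repository's exact code, and the hyperparameters shown are assumptions:

```python
import torch.nn as nn

class Residual(nn.Module):
    # Simple residual wrapper around the depthwise-conv block.
    def __init__(self, fn):
        super().__init__()
        self.fn = fn

    def forward(self, x):
        return self.fn(x) + x

def conv_mixer(dim, depth, kernel_size=9, patch_size=7, n_classes=1000):
    # Patch embedding: a conv with stride == patch_size.
    # A larger patch_size (e.g. 14 instead of 7) gives ~4x fewer spatial
    # positions, so every subsequent layer is correspondingly cheaper.
    return nn.Sequential(
        nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size),
        nn.GELU(),
        nn.BatchNorm2d(dim),
        *[nn.Sequential(
            Residual(nn.Sequential(
                # Large-kernel depthwise conv: the slow part discussed above.
                nn.Conv2d(dim, dim, kernel_size, groups=dim, padding="same"),
                nn.GELU(),
                nn.BatchNorm2d(dim),
            )),
            nn.Conv2d(dim, dim, kernel_size=1),  # pointwise (1x1) mixing
            nn.GELU(),
            nn.BatchNorm2d(dim),
        ) for _ in range(depth)],
        nn.AdaptiveAvgPool2d((1, 1)),
        nn.Flatten(),
        nn.Linear(dim, n_classes),
    )

# Example: ConvMixer-768/32 with a coarser patch size for faster training.
model = conv_mixer(dim=768, depth=32, patch_size=14)
```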

I'm going to close this issue for now, but feel free to reopen it or open a new issue if you have more questions or comments.
