📈 Free 5% Accuracy Boost with Super-Convergence: Revisited
Insights:
- SGD is still the best (blue plot)
- LAMB sucks (it needs ~100 years of training to reach AdamW's performance) (red plot)
- AdamW is the best among stable optimizers (2nd image, red box)
- Gradient accumulation sucks: X mini-batch SGD x Y gradient-accumulation steps != XY-batch LAMB, and Super-Convergence is not noticeable (more GA charts for various optimizers to back this up are on the 3rd image; a minimal GA loop is sketched after this list)
- The One Cycle Scheduler saturates after some number of epochs (4th image; usage sketched after this list)
- Stochastic Weight Averaging does not improve validation accuracy significantly (+0.0678%, std: 0.143%), though it sometimes stabilizes training (5th image, red box; recipe sketched after this list)
- Schedule-free optimizers are mid and not schedule-free as advertised
- Sharpness-Aware Minimization consistently gives +0.684%, std: 0.0869% (6th image; the two-step update is sketched after this list)
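
For concreteness, here is a minimal PyTorch sketch of the gradient-accumulation setup compared above: `accum_steps` micro-batches are accumulated before each optimizer step, which is what is supposed to emulate one batch of size micro-batch x accum_steps. All names (`model`, `loader`, `accum_steps`) are placeholders, not the exact code behind the plots.

```python
import torch

def train_with_grad_accumulation(model, loader, optimizer, loss_fn,
                                 accum_steps: int, device: str = "cuda"):
    """One epoch where `accum_steps` micro-batches are accumulated
    before each optimizer step (emulating a larger batch)."""
    model.train()
    optimizer.zero_grad(set_to_none=True)
    for step, (x, y) in enumerate(loader):
        x, y = x.to(device), y.to(device)
        loss = loss_fn(model(x), y)
        # Scale so the accumulated gradient matches the mean over the big batch.
        (loss / accum_steps).backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad(set_to_none=True)
```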
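
The one-cycle schedule referred to above is available in PyTorch as `torch.optim.lr_scheduler.OneCycleLR`; a minimal sketch with illustrative hyperparameters (the learning rates here are placeholders, not the values from the experiments):

```python
import torch

def train_one_cycle(model, train_loader, loss_fn, num_epochs: int,
                    base_lr: float = 0.1, max_lr: float = 0.5):
    """Plain SGD + OneCycleLR; LR values are illustrative placeholders."""
    optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=max_lr,
        epochs=num_epochs, steps_per_epoch=len(train_loader))
    for _ in range(num_epochs):
        for x, y in train_loader:
            optimizer.zero_grad(set_to_none=True)
            loss_fn(model(x), y).backward()
            optimizer.step()
            scheduler.step()  # OneCycleLR is stepped once per batch, not per epoch
```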
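
A minimal SWA recipe using `torch.optim.swa_utils`, again with placeholder hyperparameters rather than the exact experimental settings:

```python
import torch
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

def train_with_swa(model, train_loader, loss_fn, num_epochs: int,
                   swa_start: int, lr: float = 0.1, swa_lr: float = 0.05):
    """SGD for the first `swa_start` epochs, then average weights each epoch."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    swa_model = AveragedModel(model)
    swa_scheduler = SWALR(optimizer, swa_lr=swa_lr)
    for epoch in range(num_epochs):
        for x, y in train_loader:
            optimizer.zero_grad(set_to_none=True)
            loss_fn(model(x), y).backward()
            optimizer.step()
        if epoch >= swa_start:
            swa_model.update_parameters(model)  # running average of weights
            swa_scheduler.step()                # switch to the constant SWA LR
    update_bn(train_loader, swa_model)          # recompute BatchNorm statistics
    return swa_model
```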
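
And a simplified, unofficial sketch of the SAM two-step update from Foret et al.: perturb the weights along the normalized gradient with radius `rho`, take the gradient at the perturbed point, restore the weights, and step the base optimizer with that sharpness-aware gradient:

```python
import torch

def sam_step(model, loss_fn, x, y, base_optimizer, rho: float = 0.05):
    """One Sharpness-Aware Minimization update (simplified sketch)."""
    # First pass: gradient at the current weights.
    base_optimizer.zero_grad(set_to_none=True)
    loss_fn(model(x), y).backward()

    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm(p=2) for p in params]), p=2)
    scale = rho / (grad_norm + 1e-12)

    # Climb to the (approximate) worst-case point inside the rho-ball.
    eps = []
    with torch.no_grad():
        for p in params:
            e = p.grad * scale
            p.add_(e)
            eps.append(e)

    # Second pass: gradient at the perturbed weights.
    base_optimizer.zero_grad(set_to_none=True)
    loss_fn(model(x), y).backward()

    # Restore the original weights, then step with the sharpness-aware gradient.
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    base_optimizer.step()
```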
References (7th image):
- 2017 Loshchilov, Decoupled Weight Decay Regularization (AdamW)
- 2019 Smith, Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
- 2020 You, Large Batch Optimization for Deep Learning: Training BERT in 76 Minutes
- 2020 Foret, Sharpness-Aware Minimization for Efficiently Improving Generalization
- 2022 Geiping, Cramming: Training a Language Model on a Single GPU in One Day
- 2023 Chen, Symbolic Discovery of Optimization Algorithms
- 2023 Liu, Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
- 2023 Kaddour, No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
- 2024 Defazio, The Road Less Scheduled
- 2024 Hägele, Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations