On the Variance of the Adaptive Learning Rate and Beyond (the RAdam paper; see the rectified-update sketch after this list)
ADAM - A question answering system inspired by IBM Watson
RAdam implemented in Keras & TensorFlow
Learning Rate Warmup in PyTorch (see the warmup sketch after this list)
Easy-to-use AdaHessian optimizer (PyTorch)
Partially Adaptive Momentum Estimation method from the paper "Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks" (accepted at IJCAI 2020)
[Python] [arXiv/cs] Paper "An Overview of Gradient Descent Optimization Algorithms" by Sebastian Ruder
Adam, NAdam and AAdam optimizers
Quasi-Hyperbolic Rectified DEMON Adam/AMSGrad with AdaMod, gradient centralization, Lookahead, iterate averaging, and decoupled weight decay
TensorFlow/Keras callback implementing arXiv:1712.07628 ("Improving Generalization Performance by Switching from Adam to SGD")
AdaShift optimizer implementation in PyTorch
Homogeneous Vector Capsules Enable Adaptive Gradient Descent in Convolutional Neural Networks. This repository contains the code used for the experiments detailed in a paper submitted to IEEE Transactions on Neural Networks and Learning Systems. The paper is available as a preprint on arXiv: http://arxiv.org/abs/1906.08676
Toy implementations of some popular ML optimizers using Python/JAX
Implementation of a neural network from scratch using only NumPy (convolutional, fully connected, and max-pooling layers, optimizers, and activation functions)
Coding assignment for the PFN (Preferred Networks) 2019 internship
This is an implementation of "Adam: A Method for Stochastic Optimization" (Kingma & Ba, 2015); a sketch of the update rule follows below.
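
For reference, a minimal sketch of the Adam update that many of the repositories above implement. The function and parameter names are illustrative only and are not taken from any particular repository:

import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Kingma & Ba, 2015) for a single parameter array."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v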
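The headline entry, "On the Variance of the Adaptive Learning Rate and Beyond", introduces RAdam, which rectifies the variance of the adaptive learning rate during early training. A rough sketch of the rectified step as I read Liu et al. (2019), reusing the moment estimates from the Adam sketch above; names and defaults are illustrative:

import numpy as np

def radam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One RAdam update; a sketch, not the reference implementation."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)

    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * t * beta2 ** t / (1.0 - beta2 ** t)

    if rho_t > 4.0:
        # Variance of the adaptive lr is tractable: take a rectified adaptive step.
        denom = np.sqrt(v / (1 - beta2 ** t)) + eps
        r_t = np.sqrt(((rho_t - 4) * (rho_t - 2) * rho_inf)
                      / ((rho_inf - 4) * (rho_inf - 2) * rho_t))
        param = param - lr * r_t * m_hat / denom
    else:
        # Early steps: fall back to an un-adapted (momentum-only) update.
        param = param - lr * m_hat
    return param, m, v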
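Finally, for the "Learning Rate Warmup in PyTorch" entry, a minimal sketch of linear warmup built on PyTorch's LambdaLR scheduler; the model, optimizer, and warmup length below are placeholder choices, not taken from that repository:

import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
warmup_steps = 1000  # hypothetical warmup length

# Scale the learning rate linearly from ~0 to its target over the first warmup_steps.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps)
)

# In the training loop, call optimizer.step() and then scheduler.step() each iteration.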