Instead of reducing the learning rate in steps (Step Decay), Exponential Decay smoothly decreases the learning rate at each step using an exponential function:

$$\eta_t = \eta_0 \cdot e^{-kt}$$

Where:
$\eta_t$ is the learning rate at step $t$.

$\eta_0$ is the initial learning rate.

k is a decay rate (a small constant).

e is Euler’s number (~2.718).

t is the current training step/epoch.
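
To make the formula concrete, here is a minimal sketch that evaluates it directly in plain Python. The function name `exponential_decay` and the values `initial_lr = 0.1` and `k = 0.05` are illustrative assumptions, not values taken from the text.

```python
import math


def exponential_decay(initial_lr: float, k: float, t: int) -> float:
    """Learning rate at step t: initial_lr * e^(-k * t)."""
    return initial_lr * math.exp(-k * t)


# Illustrative values (assumptions chosen only for this sketch)
initial_lr = 0.1
k = 0.05

# Print the learning rate every 10 steps to see the smooth decline
for t in range(0, 50, 10):
    print(f"step {t:2d}: lr = {exponential_decay(initial_lr, k, t):.5f}")
```

A larger `k` makes the learning rate fall faster; with `k = 0` the rate never changes.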

Why is Exponential Decay Useful?

✅ Smooth Reduction – Unlike Step Decay, which drops the learning rate abruptly at fixed intervals, Exponential Decay lowers it a little at every step, so the size of the updates never changes suddenly.

✅ Better Convergence – Keeps the learning rate high in the beginning for fast progress and lowers it in later stages so the updates can settle.

✅ Common in Deep Learning – Used when training models such as CNNs, RNNs, and Transformers.
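
The difference between the two schedules is easiest to see side by side. The sketch below is a plain-Python comparison under assumed settings: the helper names `step_decay` and `exponential_decay`, the halving factor `drop = 0.5`, and the interval `every = 5` are all hypothetical choices for illustration. The decay rate `k` is picked so that both schedules agree at every fifth step.

```python
import math


def step_decay(initial_lr: float, drop: float, every: int, t: int) -> float:
    """Piecewise-constant schedule: multiply by `drop` once every `every` steps."""
    return initial_lr * drop ** (t // every)


def exponential_decay(initial_lr: float, k: float, t: int) -> float:
    """Smooth schedule: initial_lr * e^(-k * t)."""
    return initial_lr * math.exp(-k * t)


initial_lr = 0.1
drop, every = 0.5, 5          # assumed Step Decay settings
k = -math.log(drop) / every   # makes both schedules meet at t = 5, 10, ...

for t in range(11):
    s = step_decay(initial_lr, drop, every, t)
    e = exponential_decay(initial_lr, k, t)
    print(f"t={t:2d}  step decay={s:.5f}  exponential decay={e:.5f}")
```

Between the meeting points, Step Decay stays flat and then drops all at once, while Exponential Decay moves a small amount at every step.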

Real-World Analogy

Think of bringing water to a boil and then cooling it down.

Initially, the temperature is high (fast learning).

You gradually reduce the heat instead of turning it off suddenly (smooth decay).

This helps stabilize the process without shocking the system.

In [32]:
import torch

import torch.optim as optim

In [33]:
# Define a dummy model

model = torch.nn.Linear(2,1)

In [34]:
# Set an initial learning rate

initial_lr = 0.1

In [35]:
# Define the optimizer

optimizer = optim.SGD(model.parameters(), lr=initial_lr)

In [36]:
# Define Exponential LR Scheduler

gamma = 0.9 # Decay factor

scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)

In [37]:
# Simulate learning rate decay for 10 epochs

for epoch in range(10):

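    optimizer.step()  # Placeholder for a real update (no gradients are computed here)

    scheduler.step()  # Multiply the learning rate by gamma

    # get_last_lr() now returns the learning rate after this epoch's decay
    print(f'Epoch : {epoch +1} , Learning rate : {scheduler.get_last_lr()[0]:.5f}')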

Epoch : 1 , Learning rate : 0.09000
Epoch : 2 , Learning rate : 0.08100
Epoch : 3 , Learning rate : 0.07290
Epoch : 4 , Learning rate : 0.06561
Epoch : 5 , Learning rate : 0.05905
Epoch : 6 , Learning rate : 0.05314
Epoch : 7 , Learning rate : 0.04783
Epoch : 8 , Learning rate : 0.04305
Epoch : 9 , Learning rate : 0.03874
Epoch : 10 , Learning rate : 0.03487


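We define a dummy linear model.

Use SGD with an initial learning rate of 0.1.

Apply Exponential Decay with gamma=0.9, so each call to scheduler.step() multiplies the learning rate by 0.9 (a 10% reduction per epoch here). In terms of the formula above, gamma = e^(-k), so gamma=0.9 corresponds to k = -ln(0.9) ≈ 0.105.

Print the learning rate after each epoch's decay, which is why the first printed value is 0.09 rather than 0.1.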
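
The scheduler is configured with a per-epoch factor gamma, while the formula is written with a decay rate k. Multiplying by gamma each epoch gives initial_lr * gamma^t, which equals the exponential form when k = -ln(gamma). The sketch below checks this numerically in plain Python, without PyTorch; it only reuses the values `initial_lr = 0.1` and `gamma = 0.9` from the cells above and should reproduce the learning rates printed there.

```python
import math

initial_lr = 0.1
gamma = 0.9
k = -math.log(gamma)  # decay rate equivalent to gamma, about 0.10536

for t in range(1, 11):
    lr_gamma = initial_lr * gamma ** t         # what repeated multiplication gives
    lr_exp = initial_lr * math.exp(-k * t)     # the closed-form exponential
    assert math.isclose(lr_gamma, lr_exp)      # the two forms agree
    print(f"Epoch : {t} , Learning rate : {lr_gamma:.5f}")
```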