Homework #2, Slide 14: Make a Colab code using PyTorch to determine the optimal points for the means shown in the text alongside. You make up the problem and you solve it yourself to fully understand the role of cost functions in optimization problems.

This notebook demonstrates how different cost (loss) functions lead to different optimal points for the same dataset. We numerically verify that:

- L2 loss → arithmetic mean
- L1 loss → median
- Log-space L2 loss → geometric mean
- Reciprocal-space L2 loss → harmonic mean

All optimizations are done using gradient descent in PyTorch.


In [8]:
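import torch
torch.manual_seed(0)  # kept for reproducibility; nothing below is random

# synthetic dataset (all values positive, as the log and reciprocal losses require)
x = torch.tensor([1.0, 2.0, 4.0, 8.0])
print("Data:", x.tolist())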

Data: [1.0, 2.0, 4.0, 8.0]


## L2 Loss → Arithmetic Mean

Cost function:
J(u) = sum (x_i - u)^2
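
Why this cost function leads to the arithmetic mean can be seen by setting its derivative to zero. A short derivation, writing n for the number of data points and u* for the minimizer (notation introduced here, not used elsewhere in the notebook):

```latex
\begin{aligned}
J(u) &= \sum_{i=1}^{n} (x_i - u)^2 \\
\frac{dJ}{du} &= -2 \sum_{i=1}^{n} (x_i - u) = 0
\;\Longrightarrow\;
u^* = \frac{1}{n} \sum_{i=1}^{n} x_i
\end{aligned}
```

For x = [1, 2, 4, 8] this gives u* = 15/4 = 3.75. The second derivative is 2n > 0, so the stationary point is a minimum.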

In [9]:
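# start from u = 0 and minimize J(u) with plain gradient descent
u = torch.tensor(0.0, requires_grad=True)
optimizer = torch.optim.SGD([u], lr=0.1)

for _ in range(200):
    optimizer.zero_grad()          # clear the gradient from the previous step
    loss = torch.sum((x - u)**2)   # J(u) = sum (x_i - u)^2
    loss.backward()                # compute dJ/du
    optimizer.step()               # u <- u - lr * dJ/du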

print("Arithmetic mean (gradient descent):", u.item())
print("True arithmetic mean:", x.mean().item())


Arithmetic mean (gradient descent): 3.75
True arithmetic mean: 3.75
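
The choice lr=0.1 matters because the loss is a sum rather than an average. With the analytic gradient dJ/du = 2nu - 2·sum(x_i), each step is the linear recurrence u ← (1 - 2n·lr)·u + 2·lr·sum(x_i), which for n = 4 and lr = 0.1 reads u ← 0.2u + 3. It converges only when |1 - 2n·lr| < 1, that is for 0 < lr < 0.25 on this dataset. The sketch below reproduces the update in plain Python without autograd; the helper name `gd_l2` and the comparison value lr=0.3 are illustrative choices, not part of the original notebook.

```python
# Sketch: the gradient-descent update for J(u) = sum (x_i - u)^2, written out by hand.
data = [1.0, 2.0, 4.0, 8.0]
n, total = len(data), sum(data)

def gd_l2(lr, steps=200, u=0.0):
    """Plain gradient descent on J(u) using the analytic gradient (hypothetical helper)."""
    for _ in range(steps):
        grad = 2 * n * u - 2 * total   # dJ/du = -2 * sum(x_i - u)
        u = u - lr * grad
    return u

u_stable = gd_l2(lr=0.1)               # |1 - 2*n*lr| = 0.2 < 1: converges
u_unstable = gd_l2(lr=0.3, steps=20)   # |1 - 2*n*lr| = 1.4 > 1: diverges

print("lr=0.1:", u_stable)
print("lr=0.3 after 20 steps:", u_unstable)
```

Averaging the loss (dividing by n) would remove the dependence of the stable learning-rate range on the dataset size.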


## L1 Loss → Median

Cost function:
J(u) = sum |x_i - u|


In [10]:
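u = torch.tensor(0.0, requires_grad=True)
optimizer = torch.optim.SGD([u], lr=0.05)

for _ in range(300):
    optimizer.zero_grad()
    loss = torch.sum(torch.abs(x - u))  # J(u) = sum |x_i - u|
    loss.backward()                     # gradient is a sum of -sign(x_i - u) terms
    optimizer.step()

# With an even number of points, every u in [2, 4] minimizes the L1 loss.
# Gradient descent stops just past 2, where the gradient first becomes zero.
print("Median (gradient descent):", u.item())
# torch's median() returns the lower of the two middle values (2.0 here).
print("True median:", x.median().item())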


Median (gradient descent): 2.0500001907348633
True median: 2.0
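
The gap between 2.05 and 2.0 is not a convergence error. With four data points, the L1 loss is flat on the whole interval [2, 4]: two points lie on each side, so the gradient is zero there and gradient descent stops as soon as it crosses 2. A small check in plain Python (no PyTorch needed; `l1_loss` is a helper name made up for this sketch):

```python
import statistics

data = [1.0, 2.0, 4.0, 8.0]

def l1_loss(u):
    """J(u) = sum |x_i - u| for the notebook's dataset (hypothetical helper)."""
    return sum(abs(xi - u) for xi in data)

# Evaluate the loss below, inside, and above the interval [2, 4].
for u in [1.5, 2.0, 2.05, 3.0, 4.0, 5.0]:
    print(f"u = {u}: J(u) = {l1_loss(u):.2f}")

# Two common conventions for the median of an even-sized sample:
print("lower median:", statistics.median_low(data))  # lower of the two middle values
print("midpoint median:", statistics.median(data))   # average of the two middle values
```

Every u between 2 and 4 gives the same minimal loss of 9, so 2.0, 2.05, and the conventional midpoint 3.0 are all valid minimizers.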


## Log-Space L2 Loss → Geometric Mean

Cost function:
J(u) = sum (log x_i - log u)^2
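
The link to the geometric mean follows from the same stationarity argument, applied in log space. A brief derivation, again writing n for the number of points and u* for the minimizer (notation assumed here), and assuming all x_i and u are positive:

```latex
\begin{aligned}
J(u) &= \sum_{i=1}^{n} (\log x_i - \log u)^2 \\
\frac{dJ}{du} &= -\frac{2}{u} \sum_{i=1}^{n} (\log x_i - \log u) = 0
\;\Longrightarrow\;
\log u^* = \frac{1}{n} \sum_{i=1}^{n} \log x_i
\;\Longrightarrow\;
u^* = \Bigl( \prod_{i=1}^{n} x_i \Bigr)^{1/n}
\end{aligned}
```

For x = [1, 2, 4, 8] the product is 64, so u* = 64^(1/4) = 2√2 ≈ 2.8284.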


In [11]:
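# start at u = 1: log(u) is undefined for u <= 0, so the iterate must stay positive
u = torch.tensor(1.0, requires_grad=True)
optimizer = torch.optim.SGD([u], lr=0.1)

for _ in range(300):
    optimizer.zero_grad()
    loss = torch.sum((torch.log(x) - torch.log(u))**2)  # squared error in log space
    loss.backward()
    optimizer.step()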

print("Geometric mean (gradient descent):", u.item())
print("True geometric mean:", torch.exp(torch.mean(torch.log(x))).item())

Geometric mean (gradient descent): 2.8284261226654053
True geometric mean: 2.8284270763397217


## Reciprocal-Space L2 Loss → Harmonic Mean

Cost function:
J(u) = sum (1/x_i - 1/u)^2
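
The harmonic mean appears for the same reason: the loss is a squared error on the reciprocals. A short derivation, with n the number of points and u* the minimizer (notation assumed here), for nonzero x_i and u:

```latex
\begin{aligned}
J(u) &= \sum_{i=1}^{n} \Bigl( \frac{1}{x_i} - \frac{1}{u} \Bigr)^2 \\
\frac{dJ}{du} &= \frac{2}{u^2} \sum_{i=1}^{n} \Bigl( \frac{1}{x_i} - \frac{1}{u} \Bigr) = 0
\;\Longrightarrow\;
\frac{1}{u^*} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{x_i}
\;\Longrightarrow\;
u^* = \frac{n}{\sum_{i=1}^{n} 1/x_i}
\end{aligned}
```

For x = [1, 2, 4, 8] the reciprocals sum to 1.875, so u* = 4 / 1.875 = 32/15 ≈ 2.1333.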

In [12]:
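# start at u = 1: 1/u is undefined at u = 0, so the iterate must stay away from zero
u = torch.tensor(1.0, requires_grad=True)
optimizer = torch.optim.SGD([u], lr=0.5)

for _ in range(400):
    optimizer.zero_grad()
    loss = torch.sum((1/x - 1/u)**2)  # squared error between reciprocals
    loss.backward()
    optimizer.step()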

print("Harmonic mean (gradient descent):", u.item())
print("True harmonic mean:", len(x) / torch.sum(1/x).item())


Harmonic mean (gradient descent): 2.133333921432495
True harmonic mean: 2.1333333333333333


## Comparison of Optimal Points


In [13]:
means = {
    "Arithmetic Mean (L2)": x.mean().item(),
    "Median (L1)": x.median().item(),
    "Geometric Mean": torch.exp(torch.mean(torch.log(x))).item(),
    "Harmonic Mean": len(x) / torch.sum(1/x).item()
}

for k, v in means.items():
    print(f"{k}: {v}")


Arithmetic Mean (L2): 3.75
Median (L1): 2.0
Geometric Mean: 2.8284270763397217
Harmonic Mean: 2.1333333333333333
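
Three of these four values share one pattern. The L2, log-space, and reciprocal-space losses all have the form J(u) = sum (f(x_i) - f(u))^2 with f the identity, the logarithm, or the reciprocal, and for an invertible f the minimizer is f⁻¹ applied to the average of f(x_i). The median does not fit this form, since it comes from an absolute rather than a squared error. A minimal sketch of that closed form in plain Python (the helper `f_mean` is a hypothetical name, not a PyTorch or standard-library function):

```python
import math
import statistics

def f_mean(values, f, f_inv):
    """Closed-form minimizer of sum (f(x_i) - f(u))^2: f_inv of the average of f(x_i)."""
    return f_inv(sum(f(v) for v in values) / len(values))

data = [1.0, 2.0, 4.0, 8.0]

arithmetic = f_mean(data, lambda v: v, lambda t: t)        # f = identity
geometric = f_mean(data, math.log, math.exp)               # f = log
harmonic = f_mean(data, lambda v: 1 / v, lambda t: 1 / t)  # f = reciprocal

print("Arithmetic:", arithmetic)
print("Geometric:", geometric)
print("Harmonic:", harmonic)
```

These closed forms should agree with the gradient-descent results above up to floating-point and convergence tolerance.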


## Conclusion

This experiment shows that, for a fixed dataset, the optimal point of an optimization problem is determined by the chosen cost function. Different losses emphasize different aspects of the data: squared error is sensitive to large deviations, absolute error is not, and the log-space and reciprocal-space losses compare values on a transformed scale. The result is a different “best” point for each loss, ranging here from 2.0 (median) to 3.75 (arithmetic mean).

Optimization is therefore not only about the data; it is also about what you choose to penalize.
