Skip to content

velvetmonkey/sgd-lean

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

sgd-lean

Lean 4 Mathlib License: MIT Proofs Zenodo

sgd-lean: Formal Proofs of Bounded-Noise SGD Convergence in Lean 4

Lean 4 formal proofs of stochastic gradient descent convergence under a deterministic bounded-noise oracle: a per-step distance bound and an O(1/√K) averaged-regret rate.

Zero sorry statements. Standard axioms only (propext, Classical.choice, Quot.sound).

Why it matters

Stochastic gradient descent is the workhorse of modern machine learning, but its iterates use noisy gradients rather than the true gradient. The central question is whether that noise still allows convergence — and at what rate. For convex objectives the answer is the classic O(1/√K) rate on the averaged optimality gap.

This library machine-checks that rate in Lean 4. To stay within Mathlib's current capabilities, it uses a deterministic bounded-noise model rather than a fully stochastic one: each oracle gradient gₖ is only assumed to satisfy ‖gₖ − ∇f(xₖ)‖ ≤ σ. Mathlib lacks the conditional-expectation / filtration infrastructure needed to state and manipulate the stochastic (in-expectation) version, so the bounded-noise model is the faithful deterministic surrogate. It yields an O(1/√K) + O(σR) bound — the σR term is the irreducible worst-case-noise error that would vanish in expectation in the stochastic setting.

Setting

A real inner product space E, with a convex, L-smooth objective f : E → ℝ, minimiser x*, and an abstract gradient map ∇f stated via first-order conditions (avoiding heavy differentiability infrastructure). The oracle is deterministic bounded-noise:

‖g_k − ∇f(x_k)‖ ≤ σ

The update is x_{k+1} = x_k − α • g_k with constant step size α > 0.

Main result

Under bounded iterates ‖xₖ − x*‖ ≤ R with constant step α > 0:

(1/K) Σ_{k<K} (f(x_k) − f*) ≤ R²/(2αK) + σR + α(L²R² + σ²)

Setting α = c/√K balances the R²/(2αK) and α(L²R² + σ²) terms, giving the O(1/√K) rate (with the additive σR worst-case-noise floor).

Project structure

SGD/
├── Defs.lean         — SGDSetup (convex L-smooth f, bounded-noise σ), first-order
│                       conditions, gap_nonneg, sgd_step
├── NoisyOracle.lean  — per-step distance bound: norm_sub_smul_sq,
│                       inner_noisy_grad_ge, noisy_grad_norm_sq_le, sgd_step_bound
└── Convergence.lean  — telescoping lemmas, SGDRun trajectory, sgd_convergence
SGD.lean              — Root module

Theorem inventory

8 headline results.

# Name Statement
1 norm_sub_smul_sq ‖a − α•b‖² = ‖a‖² − 2α⟪b,a⟫ + α²‖b‖² (squared-norm expansion)
2 inner_noisy_grad_ge ⟪g, x−x*⟫ ≥ (f(x)−f*) − σ‖x−x*‖ (convexity + Cauchy–Schwarz)
3 noisy_grad_norm_sq_le ‖g‖² ≤ 2L²‖x−x*‖² + 2σ² (noisy gradient norm bound)
4 sgd_step_bound ‖x′−x*‖² ≤ ‖x−x*‖² − 2α(f(x)−f*) + 2ασ‖x−x*‖ + α²(2L²‖x−x*‖²+2σ²)
5 telescope_sum_le Σ aₖ ≤ D(0) − D(K) + Σ bₖ when D(k+1) ≤ D(k) − aₖ + bₖ
6 telescope_sum_le' Σ aₖ ≤ D(0) + Σ bₖ (dropping − D(K) ≥ 0)
7 SGDRun.step_bound_uniform per-step bound with bounded iterates ‖xₖ−x*‖ ≤ R
8 sgd_convergence (1/K) Σ (f(xₖ)−f*) ≤ R²/(2αK) + σR + α(L²R²+σ²) — O(1/√K)

Design note

The analysis uses a deterministic bounded-noise oracle (‖gₖ − ∇f(xₖ)‖ ≤ σ) rather than a stochastic one, because Mathlib does not yet provide the conditional-expectation / filtration machinery required for the in-expectation analysis. The objective's convexity and L-smoothness are stated as first-order conditions on an abstract gradient map, keeping the development self-contained and free of Fréchet-derivative bookkeeping while remaining valid over a general real inner product space.

Dependencies

  • Lean 4.28.0
  • Mathlib v4.28.0

Paper

sgd-lean: Formal Proofs of Bounded-Noise SGD Convergence in Lean 4 Ben Cassie (2026). Companion paper: paper.md.

DOI: https://doi.org/10.5281/zenodo.20475583

Related work

Acknowledgements

Proofs in this library were generated using Aristotle, an AI proof assistant for Lean 4 and Mathlib. The proof discipline — zero sorry, standard axioms only — was specified by the author and enforced by the Lean type checker.

Author

Ben Cassie · @thevelvetmonke

About

Lean 4 formal proofs of stochastic gradient descent convergence: O(1/√k) rate for convex objectives. 0 sorry.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages