Simultaneous Perturbation Stochastic Approximation (SPSA) is a global optimizer for continuous loss functions of any dimension. It has a strong theoretical foundation and very few knobs that must be tuned. While it doesn't get the press of Genetic Algorithms or other Evolutionary Computing methods, it performs on the same level and rests on a better theoretical foundation.
Still working on this...
You can install via
cabal install spsa
Getting the Source
git clone git://github.com/yanatan16/haskell-spsa spsa
Set up a sandbox.
The first time through, you need to download and install a ton of dependencies, so hang in there.
cd spsa
cabal-dev install \
  --enable-tests \
  --enable-benchmarks \
  --only-dependencies \
  -j
The cabal-dev command is just a sandboxing wrapper around the cabal command. The -j flag above tells cabal to use all of your CPUs, so even the initial build shouldn't take more than a few minutes.
cabal-dev configure --enable-tests --enable-benchmarks
cabal-dev build
Note: For the development mode use --flags=developer. Development mode allows you to run tests from ghci easily to allow some good ole TDD.
Once you've built the code, you can run the entire test suite in a few seconds.
dist/build/tests/tests +RTS -N
We use the direct executable rather than cabal-dev tests because it doesn't pass through options very well. The +RTS -N above tells GHC's runtime system to use all available cores. If you want to explore, the tests program (dist/build/tests/tests) accepts a --help option. Try it out.
Tests From GHCI
You can run all the tests from ghci.
starts up the REPL.
> import Test.SPSA
> runAllTests
Or you can run a single test:
> runTest "SPSA*" -- follows test patterns in Test.Framework
> runGroup 4 -- if you know the group number
You can run benchmarks similar to tests.
Just like with tests, there's a --help option to explore.
SPSA minimizes a loss function, so maximization problems must negate the fitness function. It can optionally use a constraint mapper that is applied on each round of SPSA. SPSA is an iterative (recursive) optimization algorithm that can be viewed as a stochastic extension of gradient descent. Like Finite Difference Stochastic Approximation (FDSA), SPSA estimates the gradient of the loss function from loss measurements alone, but it does so by perturbing all parameters simultaneously. As a result it needs only two loss measurements per iteration, regardless of the dimension of the parameter vector, whereas FDSA needs 2p, where p is the dimension of the parameter vector. Surprisingly, this shortcut has no ill effect on the convergence rate and only a small effect on the conditions required for convergence.
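To make the measurement counts concrete, here is a minimal sketch of the two gradient estimates. The names spGradient and fdGradient are hypothetical and are not taken from this package's API; the perturbation vector delta is passed in explicitly so the sketch needs nothing beyond base.

```haskell
-- Illustrative only: these helper names are assumptions, not the spsa package's API.

-- | Simultaneous perturbation estimate: every coordinate is perturbed at once,
-- so the loss is evaluated exactly twice, whatever the dimension.
spGradient :: ([Double] -> Double)  -- loss function
           -> Double                -- perturbation size c
           -> [Double]              -- perturbation vector delta (entries +1 or -1)
           -> [Double]              -- current parameters theta
           -> [Double]
spGradient loss c delta theta = [ diff / (2 * c * d) | d <- delta ]
  where
    plus  = zipWith (\t d -> t + c * d) theta delta
    minus = zipWith (\t d -> t - c * d) theta delta
    diff  = loss plus - loss minus   -- the only two loss measurements

-- | Finite difference estimate: each coordinate is perturbed separately,
-- so the loss is evaluated 2p times for p parameters.
fdGradient :: ([Double] -> Double) -> Double -> [Double] -> [Double]
fdGradient loss c theta =
  [ (loss (bump i c) - loss (bump i (negate c))) / (2 * c)
  | i <- [0 .. length theta - 1] ]
  where
    bump i h = [ if j == i then t + h else t | (j, t) <- zip [0 :: Int ..] theta ]
```

A single simultaneous perturbation estimate can be far from the true gradient. It is its average over the random sign patterns that tracks the gradient, which is why the estimate is cheap but noisy.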
Since SPSA uses a small amount of randomness in its gradient estimate, the estimate is also noisy. This noise is the good kind, however, because it helps SPSA act as a global optimizer while keeping the same convergence rate as FDSA!
SPSA has fewer tuning knobs than many other global optimization methods, but it still requires some work. The most important parameter is a in the a(k) = a / (k + 1 + A) ^ alpha gain sequence (see tuning below). It can wildly affect results. SPSA uses two function measurements per iteration, so if you wish to limit the run to N function measurements, pass in N / 2 as the number of rounds to run.
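The iteration described above can be sketched as a small loop. This is a dependency-free illustration under stated assumptions, not this package's implementation: the names spsa, bernoulli, nextSeed and roundsForBudget are hypothetical, the gain sequences are passed in as plain functions, and a tiny linear congruential generator stands in for a real random source.

```haskell
-- Illustrative only: names and the toy random generator are assumptions,
-- not the spsa package's API.

-- | Toy linear congruential generator (assumes a 64-bit Int).
nextSeed :: Int -> Int
nextSeed s = (s * 1103515245 + 12345) `mod` 2147483648

-- | Draw p Bernoulli +/-1 values from the top bit of successive seeds,
-- returning the values and the advanced seed.
bernoulli :: Int -> Int -> ([Double], Int)
bernoulli p seed = (map sign seeds, last (seed : seeds))
  where
    seeds  = take p (tail (iterate nextSeed seed))
    sign s = if s < 1073741824 then 1 else -1

-- | A budget of n loss measurements allows n / 2 rounds.
roundsForBudget :: Int -> Int
roundsForBudget n = n `div` 2

-- | Run a fixed number of SPSA rounds without constraints.
spsa :: ([Double] -> Double)  -- loss function to minimize
     -> (Int -> Double)       -- gain sequence a(k)
     -> (Int -> Double)       -- gain sequence c(k)
     -> Int                   -- number of rounds
     -> Int                   -- seed for the toy generator
     -> [Double]              -- initial parameters
     -> [Double]
spsa loss ak ck rounds seed0 = go 0 seed0
  where
    go k seed theta
      | k >= rounds = theta
      | otherwise =
          let (delta, seed') = bernoulli (length theta) seed
              c      = ck k
              plus   = zipWith (\t d -> t + c * d) theta delta
              minus  = zipWith (\t d -> t - c * d) theta delta
              diff   = loss plus - loss minus            -- two measurements per round
              grad   = [ diff / (2 * c * d) | d <- delta ]
              theta' = zipWith (\t g -> t - ak k * g) theta grad
          in sum theta' `seq` go (k + 1) seed' theta'    -- force each step to avoid thunk build-up
```

For example, with a budget of 4000 loss measurements one would call spsa with roundsForBudget 4000 as the number of rounds. A real application would swap the toy generator for a proper random source.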
For more information on SPSA, please see Spall's papers from his website.
To tune SPSA, I suggest the Semiautomatic Tuning method by Spall introduced in his book Introduction to Stochastic Search and Optimization (ISSO). There are 3 main knobs: the two gain sequences (a and c) and the perturbation distribution, delta. Delta is best chosen as Bernoulli +/- 1, as that is asymptotically optimal (it must not be Normal or Uniform; see ISSO). The gain sequences must satisfy certain conditions. The standard forms work well:
a(k) = a / (k + 1 + A) ^ alpha
c(k) = c / (k + 1) ^ gamma
Here, the values alpha and gamma are tied together. The asymptotically optimal values are alpha = 1 and gamma = 1/6, but alpha = 0.602 and gamma = 0.101 work well in finite-sample cases. A should be about 10% of the total number of iterations, and c should be approximately the standard deviation of the measurement noise in the loss function in question. a is the most volatile parameter and must be tuned carefully. In general, one should pick a such that the first step is not so large that it sends the algorithm in the wrong direction.
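The standard gain sequences and the rules of thumb above can be written down directly. The sketch below uses hypothetical names (gainA, gainC, stabilityA, aForFirstStep) that are not taken from this package. The last helper is one possible reading of the "first step is not too large" advice: it solves a(0) * g0 = step for a, where g0 is an assumed typical magnitude of the initial gradient estimate and step is the largest first move you are willing to accept.

```haskell
-- Illustrative only: these helper names are assumptions, not the spsa package's API.

-- | a(k) = a / (k + 1 + A) ^ alpha
gainA :: Double -> Double -> Double -> Int -> Double
gainA a bigA alpha k = a / (fromIntegral k + 1 + bigA) ** alpha

-- | c(k) = c / (k + 1) ^ gamma
gainC :: Double -> Double -> Int -> Double
gainC c gamma k = c / (fromIntegral k + 1) ** gamma

-- | Rule of thumb from the text: A is about 10% of the planned iterations.
stabilityA :: Int -> Double
stabilityA iterations = 0.1 * fromIntegral iterations

-- | Choose a so that the first update has size `step` when the initial
-- gradient estimate has magnitude `g0` (both supplied by the user).
aForFirstStep :: Double -> Double -> Double -> Double -> Double
aForFirstStep step g0 bigA alpha = step * (1 + bigA) ** alpha / g0
```

With 1000 planned iterations, for instance, stabilityA 1000 would be passed as A, and gainA and gainC would be partially applied with alpha = 0.602 and gamma = 0.101 to obtain the two sequences.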
- SPSA Website
- Introduction to Stochastic Search and Optimization. James Spall. Wiley 2003.
Released under the MIT License; see the LICENSE file.