-
-
Notifications
You must be signed in to change notification settings - Fork 19
The single pole balancing experiment
The pole-balancing or inverted pendulum problem has long been established as a standard benchmark for artificial learning systems. It is one of the best early examples of a reinforcement learning task under conditions of incomplete knowledge.
The pole-balancing problem can be visualized as follows:
As you can see from the scheme above, the physical system has the following constraints:
- The pole must remain upright within ±r the pole failure angle
- The cart must remain within ±h of origin
- The controller must always exert a non-zero force Fx
Where r is a pole failure angle (±12 ̊ from 0 ̊) and h is a track limit (±2.4 meters from the track centre).
The simulation of the cart balancing terminated when either the following conditions were met: the pole inclines over the failure angle, or the cart position is outside of the track.
The experiment's goal is to devise a controller that can keep the pole balanced for a defined number of simulation steps. The controller always applies to the cart a force at full magnitude in either direction (bang-bang control). The successful controller should simulate a single-pole balancing for at least 500'000 time steps (10'000 simulated seconds).
To run this experiment with a population size equal to 150 organisms, execute the following commands:
cd $GOPATH/src/github.com/yaricom/goNEAT
go run executor.go -out ./out/pole1 -context ./data/pole1_150.neat -genome ./data/pole1startgenes -experiment cart_pole
The above command will execute 100 trials of a single pole-balancing experiment within 100 generations with a population of 150 organisms. The executor will save the results of the experiment in the output directory.
The console output will look similar to the following:
Solved 100 trials from 100, success rate: 1.000000
Random seed: 1620479404
Average
Trial duration: 221.247774ms
Epoch duration: 39.216614ms
Generations/trial: 10.9
Champion found in 83 trial run
Winner Nodes: 7
Winner Genes: 10
Winner Evals: 634
Diversity: 47
Complexity: 16
Age: 3
Fitness: 1.000000
Average among winners
Winner Nodes: 7.1
Winner Genes: 11.5
Winner Evals: 1557.2
Generations/trial: 10.9
Diversity: 59.940000
Complexity: 18.600000
Age: 3.570000
Fitness: 1.000000
Averages for all organisms evaluated during experiment
Diversity: 37.594982
Complexity: 18.064586
Age: 2.396373
Fitness: 0.171238
Efficiency score: 11.135725
>>> Start genome file: ./data/pole1startgenes
>>> Configuration file: ./data/pole1_150.neat
As you can see from the statistics above, the NEAT algorithm is extremely effective in finding a successful solver for pole balancing. It was able to find a solver in all trials compared to the XOR problem, where only about 90 percent of the trials were solved.
Next, we will evaluate the algorithm against a more advanced variant of the pole balancing problem: double-pole balancing.