Skip to content

The single pole balancing experiment

Iaroslav Omelianenko edited this page May 8, 2021 · 7 revisions

The pole-balancing or inverted pendulum problem has long been established as a standard benchmark for artificial learning systems. It is one of the best early examples of a reinforcement learning task under conditions of incomplete knowledge.

The pole-balancing problem can be visualized as follows:

The single pole-balancing

As you can see from the scheme above, the physical system has the following constraints:

  1. The pole must remain upright within ±r the pole failure angle
  2. The cart must remain within ±h of origin
  3. The controller must always exert a non-zero force Fx

Where r is a pole failure angle (±12 ̊ from 0 ̊) and h is a track limit (±2.4 meters from the track centre).

The simulation of the cart balancing terminated when either the following conditions were met: the pole inclines over the failure angle, or the cart position is outside of the track.

The experiment's goal is to devise a controller that can keep the pole balanced for a defined number of simulation steps. The controller always applies to the cart a force at full magnitude in either direction (bang-bang control). The successful controller should simulate a single-pole balancing for at least 500'000 time steps (10'000 simulated seconds).

To run this experiment with a population size equal to 150 organisms, execute the following commands:

cd $GOPATH/src/github.com/yaricom/goNEAT
go run executor.go -out ./out/pole1 -context ./data/pole1_150.neat -genome ./data/pole1startgenes -experiment cart_pole

The above command will execute 100 trials of a single pole-balancing experiment within 100 generations with a population of 150 organisms. The executor will save the results of the experiment in the output directory.

The console output will look similar to the following:

Solved 100 trials from 100, success rate: 1.000000
Random seed: 1620479404
Average
        Trial duration:         221.247774ms
        Epoch duration:         39.216614ms
        Generations/trial:      10.9

Champion found in 83 trial run
        Winner Nodes:           7
        Winner Genes:           10
        Winner Evals:           634

        Diversity:              47
        Complexity:             16
        Age:                    3
        Fitness:                1.000000

Average among winners
        Winner Nodes:           7.1
        Winner Genes:           11.5
        Winner Evals:           1557.2
        Generations/trial:      10.9

        Diversity:              59.940000
        Complexity:             18.600000
        Age:                    3.570000
        Fitness:                1.000000

Averages for all organisms evaluated during experiment
        Diversity:              37.594982
        Complexity:             18.064586
        Age:                    2.396373
        Fitness:                0.171238

Efficiency score:               11.135725

>>> Start genome file:  ./data/pole1startgenes
>>> Configuration file: ./data/pole1_150.neat

As you can see from the statistics above, the NEAT algorithm is extremely effective in finding a successful solver for pole balancing. It was able to find a solver in all trials compared to the XOR problem, where only about 90 percent of the trials were solved.

Next, we will evaluate the algorithm against a more advanced variant of the pole balancing problem: double-pole balancing.