Double Pole Balancing Experiments

Iaroslav Omelianenko edited this page May 8, 2021 · 3 revisions

This is an advanced version of the pole-balancing problem, in which the cart carries two poles of different masses and lengths that must be balanced simultaneously.

The schema of the experiment is presented in the following image:

Double-Pole Balancing Schema

We benchmark two variants of this problem:

  • the Markovian variant, in which the entire system state is known (including velocities)
  • the Non-Markovian variant, in which the velocity information is excluded

The former is relatively simple, while the latter is considerably more challenging.

System Constraints

  • Both poles must remain upright within ±r, the pole failure angle.
  • The cart must stay within ±h of its original position.
  • The controller must always exert a non-zero force Fx.

Here, r is the pole failure angle (±36° from 0°) and h is the track width limit (±2.4 meters from the track centre).
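The first two constraints can be expressed as a simple bounds check. The following is a minimal sketch rather than the goNEAT implementation: the function name `withinLimits`, the use of radians, and the inclusive comparison at the boundary are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
)

const (
	failureAngle = 36.0 * math.Pi / 180.0 // r: pole failure angle, in radians
	trackLimit   = 2.4                    // h: track limit, in meters from the centre
)

// withinLimits reports whether the cart position x (meters from the track
// centre) and both pole angles (radians from vertical) satisfy the constraints.
// Treating the boundary values as valid is an assumption of this sketch.
func withinLimits(x, theta1, theta2 float64) bool {
	return math.Abs(x) <= trackLimit &&
		math.Abs(theta1) <= failureAngle &&
		math.Abs(theta2) <= failureAngle
}

func main() {
	fmt.Println(withinLimits(0.0, 0.1, -0.1)) // all values inside the limits
	fmt.Println(withinLimits(2.5, 0.0, 0.0))  // cart has left the track
	fmt.Println(withinLimits(0.0, 0.7, 0.0))  // first pole is beyond 36 degrees
}
```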

The double pole-balancing Markovian experiment (known velocity)

In this experiment, the agent will receive at each time step the entire system state, including the linear velocity of the cart and the angular velocity of both poles.

The winner is defined as a controller able to keep both poles balanced for at least 100’000 time steps, or 1’000 simulated seconds.

To run the experiment, execute the following command:
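The win criterion amounts to counting how many steps a controller survives. The sketch below assumes a hypothetical `step` callback that advances the simulation by one time step and returns false once a constraint is violated; the 0.01 s step length is not stated explicitly but follows from the two figures above.

```go
package main

import "fmt"

const (
	winSteps    = 100000                // time steps required to win
	winSeconds  = 1000.0                // equivalent simulated time, in seconds
	stepSeconds = winSeconds / winSteps // 0.01 s per step, implied by the two values
)

// balancedSteps runs the simulation until it fails or winSteps is reached and
// returns the number of steps survived. The step callback is hypothetical: it
// should apply one control action and return false when a constraint is violated.
func balancedSteps(step func() bool) int {
	for t := 0; t < winSteps; t++ {
		if !step() {
			return t
		}
	}
	return winSteps
}

// isWinner applies the win criterion to a step count.
func isWinner(steps int) bool { return steps >= winSteps }

func main() {
	// A stand-in simulation that fails on its 501st step.
	calls := 0
	steps := balancedSteps(func() bool { calls++; return calls <= 500 })
	fmt.Printf("balanced %d steps (%.2f s), winner: %v\n",
		steps, float64(steps)*stepSeconds, isWinner(steps))
}
```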

cd $GOPATH/src/github.com/yaricom/goNEAT
go run executor.go -out ./out/pole2_markov -context ./data/pole2_markov.neat -genome ./data/pole2_markov_startgenes -experiment cart_2pole_markov

Or

make run-cartpole-two-markov

The command above executes ten trials of the double pole-balancing experiment, each limited to 100 generations, using a population of 1’000 organisms.

An example of the console output is as follows:

Solved 10 trials from 10, success rate: 1.000000
Random seed: 1620493433
Average
        Trial duration:         7.188878437s
        Epoch duration:         80.496471ms
        Generations/trial:      25.9

Champion found in 4 trial run
        Winner Nodes:           8
        Winner Genes:           7
        Winner Evals:           9827

        Diversity:              803
        Complexity:             15
        Age:                    1
        Fitness:                1.000000

Average among winners
        Winner Nodes:           14.0
        Winner Genes:           27.4
        Winner Evals:           25428.1
        Generations/trial:      25.9

        Diversity:              715.300000
        Complexity:             41.300000
        Age:                    3.300000
        Fitness:                1.000000

Averages for all organisms evaluated during experiment
        Diversity:              611.673619
        Complexity:             31.434594
        Age:                    2.809969
        Fitness:                0.100325

Efficiency score:               8.800246

>>> Start genome file:  ./data/pole2_markov_startgenes
>>> Configuration file: ./data/pole2_markov.neat

In the sample run above, a winning solution was found after about 26 generations on average, and the winning genomes averaged 14 nodes, nearly double the size of the seed genome. The seed genome has eight nodes, where:

  • nodes #1-6 are sensors for x, x', θ1, θ1', θ2, and θ2', respectively,
  • node #7 is a bias,
  • node #8 is an output signaling the action to be applied at each time step.

The symbols above have the following meaning:

  • x - the cart's position
  • x' - the cart's linear velocity
  • θ1 - the inclination angle of the first pole from vertical
  • θ1' - the angular velocity of the first pole
  • θ2 - the inclination angle of the second pole from vertical
  • θ2' - the angular velocity of the second pole
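The mapping from system state to the seed genome's input nodes can be sketched as follows. This is an illustration only: the `State` type, the `markovInputs` function, and the constant bias value of 1.0 are assumptions, and an actual implementation may also normalise each value before feeding it to the network.

```go
package main

import "fmt"

// State holds the full system state, in the order used by sensor nodes #1-6.
type State struct {
	X, XDot           float64 // cart position and linear velocity
	Theta1, Theta1Dot float64 // first pole angle and angular velocity
	Theta2, Theta2Dot float64 // second pole angle and angular velocity
}

// bias is the value fed to node #7; 1.0 is an assumption of this sketch.
const bias = 1.0

// markovInputs returns the seven network inputs: six sensors followed by the bias.
// No scaling is applied here, which may differ from a real implementation.
func markovInputs(s State) [7]float64 {
	return [7]float64{s.X, s.XDot, s.Theta1, s.Theta1Dot, s.Theta2, s.Theta2Dot, bias}
}

func main() {
	s := State{X: 0.5, Theta1: 0.02, Theta2: -0.01}
	fmt.Println(markovInputs(s))
}
```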

The double pole-balancing Non-Markovian experiment (excluding velocity information)

In this experiment, the agent receives only a partial system state at each time step, with the velocities of the cart and both poles excluded. Only the horizontal cart position x and the angles of both poles, θ1 and θ2, are provided to the agent.

The best solver (i.e., the one with the highest fitness value) of every generation is tested for its ability to balance the system for a more extended time. If a potential solution passes this test by keeping the system balanced for 100'000 time steps, the so-called generalization score (GS) of this individual is calculated. This score measures the ability of a controller to balance the system starting from different initial conditions. It is estimated with a series of 625 runs of 1000 time steps each, every run starting from a different initial position.

The initial positions are chosen by assigning each value of the set Ω = [0.05, 0.25, 0.5, 0.75, 0.95] to each of the states x, ∆x/∆t, θ1, and ∆θ1/∆t, scaled to the range of the corresponding variable, which gives 5⁴ = 625 combinations. The short pole angle θ2 and its angular velocity ∆θ2/∆t are set to zero. The GS is defined as the number of successful runs out of the 625 initial conditions. A solver counts as successful if its generalization score is 200 or higher.

To run the experiment, execute the following command:
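The construction of the 625 initial conditions and the resulting score can be sketched as below. The `Ranges` values, the symmetric linear mapping in `scale`, and the `balances` callback are hypothetical placeholders chosen for illustration; they are not taken from goNEAT.

```go
package main

import "fmt"

// omega is the set of fractions assigned to each of the four varied states.
var omega = []float64{0.05, 0.25, 0.5, 0.75, 0.95}

// Ranges holds a half-width per varied state variable (hypothetical values).
type Ranges struct{ X, XDot, Theta1, Theta1Dot float64 }

// InitialState is one starting condition; Theta2 and Theta2Dot are always zero.
type InitialState struct {
	X, XDot, Theta1, Theta1Dot, Theta2, Theta2Dot float64
}

// scale maps a fraction in [0, 1] linearly onto [-limit, +limit].
// The symmetric linear mapping is an assumption of this sketch.
func scale(frac, limit float64) float64 { return (2*frac - 1) * limit }

// initialStates enumerates all len(omega)^4 = 625 combinations.
func initialStates(r Ranges) []InitialState {
	n := len(omega)
	states := make([]InitialState, 0, n*n*n*n)
	for _, a := range omega {
		for _, b := range omega {
			for _, c := range omega {
				for _, d := range omega {
					states = append(states, InitialState{
						X: scale(a, r.X), XDot: scale(b, r.XDot),
						Theta1: scale(c, r.Theta1), Theta1Dot: scale(d, r.Theta1Dot),
					})
				}
			}
		}
	}
	return states
}

// generalizationScore counts the initial states from which the controller,
// represented by the hypothetical balances callback, succeeds.
func generalizationScore(states []InitialState, balances func(InitialState) bool) int {
	score := 0
	for _, s := range states {
		if balances(s) {
			score++
		}
	}
	return score
}

func main() {
	// Placeholder ranges, not the values used by goNEAT.
	states := initialStates(Ranges{X: 2.4, XDot: 1.0, Theta1: 0.1, Theta1Dot: 0.1})
	// Stand-in for a real 1000-step balancing test.
	gs := generalizationScore(states, func(s InitialState) bool { return s.X >= 0 })
	fmt.Println("states:", len(states), "GS:", gs, "successful:", gs >= 200)
}
```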

cd $GOPATH/src/github.com/yaricom/goNEAT
go run executor.go -out ./out/pole2_non-markov -context ./data/pole2_non-markov.neat -genome ./data/pole2_non-markov_startgenes -experiment cart_2pole_non-markov

Or

make run-cartpole-two-non-markov

This command executes ten trials of the double pole-balancing Non-Markovian experiment with 100 epochs each, using a population of 1’000 organisms.

An example of the output is as follows:

Solved 6 trials from 10, success rate: 0.600000
Random seed: 1620493584
Average
        Trial duration:         22.566279721s
        Epoch duration:         212.347394ms
        Generations/trial:      64.6

Champion found in 7 trial run
        Winner Nodes:           5
        Winner Genes:           5
        Winner Evals:           21357

        Diversity:              611
        Complexity:             10
        Age:                    5
        Fitness:                244.000000

Average among winners
        Winner Nodes:           5.3
        Winner Genes:           7.2
        Winner Evals:           40355.8
        Generations/trial:      41.0

        Diversity:              589.166667
        Complexity:             12.500000
        Age:                    8.333333
        Fitness:                224.000000

Averages for all organisms evaluated during experiment
        Diversity:              553.255329
        Complexity:             16.167851
        Age:                    3.963881
        Fitness:                22.409918

Efficiency score:               11.151522

The maximal generalization score achieved in this test run is 244, obtained by a very simple genome consisting of five nodes and five connection genes (links). It has the same number of nodes as the seed genome and grew only one extra gene: a recurrent link connecting the output node to itself. It also turned out to be the best-performing configuration among those found.

The genome of the fittest organism from the Non-Markovian test run, with the maximal generalization score of 244, is:

genomestart 357
trait 1 0.1 0 0 0 0 0.01807369170916446 0 0
node 1 1 1 1 SigmoidSteepenedActivation
node 2 1 1 1 SigmoidSteepenedActivation
node 3 1 1 1 SigmoidSteepenedActivation
node 4 1 1 3 SigmoidSteepenedActivation
node 5 1 0 2 SigmoidSteepenedActivation
gene 1 1 5 -2.0356880305109195 false 1 -2.0356880305109195 true
gene 1 2 5 -12.084263391764111 false 2 -12.084263391764111 true
gene 1 3 5 8.598497746619763 false 3 8.598497746619763 true
gene 1 4 5 0.5469833699989226 false 4 0.5469833699989226 true
gene 1 5 5 -1.1668266658177189 true 60 -1.1668266658177189 true
genomeend 357
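To make the effect of the recurrent gene concrete, the sketch below evaluates this five-node network by hand. It rests on several assumptions: that nodes 1-3 carry x, θ1, and θ2 in that order, that node 4 is a bias fixed at 1.0, that the recurrent link feeds back the previous step's output, and that `SigmoidSteepenedActivation` computes 1/(1+e^(-4.924273·x)) as in the classic NEAT steepened sigmoid. The `champion` type is hypothetical, and inputs are passed in already scaled.

```go
package main

import (
	"fmt"
	"math"
)

// Connection weights copied from the genome listing (genes 1 to 5).
const (
	wX      = -2.0356880305109195  // node 1 -> node 5
	wTheta1 = -12.084263391764111  // node 2 -> node 5
	wTheta2 = 8.598497746619763    // node 3 -> node 5
	wBias   = 0.5469833699989226   // node 4 -> node 5
	wRecur  = -1.1668266658177189  // node 5 -> node 5 (recurrent)
)

// steepenedSigmoid is the assumed form of SigmoidSteepenedActivation.
func steepenedSigmoid(x float64) float64 {
	return 1.0 / (1.0 + math.Exp(-4.924273*x))
}

// champion keeps the previous output, which the recurrent link feeds back in.
type champion struct{ prevOut float64 }

// step computes the output for one time step from already-scaled inputs.
func (c *champion) step(x, theta1, theta2 float64) float64 {
	sum := wX*x + wTheta1*theta1 + wTheta2*theta2 + wBias*1.0 + wRecur*c.prevOut
	c.prevOut = steepenedSigmoid(sum)
	return c.prevOut
}

func main() {
	c := &champion{}
	// Identical inputs on consecutive steps yield different outputs because
	// the network remembers its previous output.
	for i := 0; i < 3; i++ {
		fmt.Println(c.step(0.1, 0.02, -0.01))
	}
}
```

Because the output depends on its own previous value, the network has a one-step memory, which is one way a controller can compensate for the missing velocity inputs.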

Additionally, note that the efficiency score is higher for the Non-Markovian experiment than for the Markovian one (11.15 versus 8.80). The higher score is explained by the simpler genome configurations found, i.e., those with lower complexity.

A detailed analysis of the experiment data samples is available in the Jupyter Notebook.