ntreeshape C++ code to calculate the number of trees of a given size From the number of trees, the number of genetic programming programs of that size can be calculated.
Distribution shown here http://www.cs.ucl.ac.uk/staff/W.Langdon/fitnessspaces/node3.html http://www.cs.ucl.ac.uk/staff/W.Langdon/fitnessspaces/node4.html
Code used in "Scaling of Program Tree Fitness Spaces", W.B.Langdon, Evolutionary Computation 7(4) pp399-428 doi:10.1162/evco.1999.7.4.399 and "A Field Guide to Genetic Programming", ISBN 978-1-4092-0073-4 http://www.gp-field-guide.org.uk/
Contents
- ntrees.cc, Apr 9 2001, r1.2 (now r1.5)
Calculate number of trees of a given size - ntrees_example141.out
Expected output of ./ntrees 1,141,2 1 0 1
Note by having exactly one leaf and exactly one binary function the number of programs is the same as the number of (binary) tree shapes. - ntreeshape.cc, Aug 20 2002, r1.22 (now r1.23)
Calculate number of trees by depth for a given size - ntreeshape_example141.out
Expected output of ./ntreeshape 1,141,2 1,100 - README.md
This file
Note the mean and variance for various types of tree have been given asymptotically for large trees by Flajolet and co-authors. E.g. for random binary trees:
mean depth = (2π(n-1))0.5
variance = 4π(π/3-1)(n-1)/2
Notice for random binary trees, although the distribution tends to a Gaussian as the trees get bigger, it continues to have a broad spread, with the standard deviation being 22% of the mean.
Formulae are often given in terms of number of internal nodes (N), rather than tree size (n, including leaf nodes, for binary trees N=(n-1)/2). Also for trees under 10000 nodes, the actual mean and variance can be somewhat different from the asymptotic formula.
Note: in this version of ntreeshape.cc depth starts at 1 (rather than zero).
As an example of software bit-rot, consider the differences between the first versions of these C++ files in GitHub (dating from 2001 and 2002) and the current versions. http://blog.ieeesoftware.org/2020/03/bit-rot-computer-software-degrades-over.html