Python implementation of the PGE algorithm (voted Best Paper at GECCO 2013)
If you publish using this library, please cite the above paper.
PGE stands for Prioritized Grammar Enumeration, a method for solving the Symbolic Regression problem.
This package is under heavy development until this comment is removed
Clone the repository.
Pip is no longer the recommended method.
You will want to run an evaluation server; runtime is much better this way.
docker run -d -p 8080:8080 --name evalr verdverm/pypge-eval
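Before starting experiments, it can help to confirm that the evaluation server is accepting connections. The following is a minimal sketch using only the Python standard library. It assumes the server is reachable on `localhost` at the port 8080 mapped by the command above; the function name `eval_server_reachable` is hypothetical and not part of PyPGE.

```python
import socket


def eval_server_reachable(host="localhost", port=8080, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only checks that something is listening on the port. It does not
    verify that the listener is actually the PyPGE evaluation server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `eval_server_reachable()` after the `docker run` command returns should yield `True` once the container has finished starting.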
Then you can run the experimenter container:
docker run --rm -it --name pypge -v `pwd`:/pycode -p 8888:8888 verdverm/pypge-experiments /bin/bash
You will find a run.sh bash script.
Running inside the Docker container is currently the best way to run PyPGE.
There is a lot of parameterization matching that happens behind the scenes.
Much of the configuration editing and file moving can happen outside of
the container because the repository is mounted into it.
All scripts should be run from inside the container; however,
output will also persist outside of it.
run -x <config_folder> -s <problem_type> -p <problem_set>
<config_folder> should contain one or more yaml configuration files.
<problem_type> must be
Your yaml config filenames must contain one of these strings.
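One way to read the filename rule above is as a substring match on the YAML files in the config folder. The sketch below illustrates that reading only; it is not the logic used by `run.sh`. The function name `select_configs` and the accepted `.yml`/`.yaml` extensions are assumptions.

```python
from pathlib import Path


def select_configs(config_folder, problem_type):
    """List YAML files in config_folder whose filename contains problem_type."""
    folder = Path(config_folder)
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file()
        and p.suffix in (".yml", ".yaml")  # assumed extensions
        and problem_type in p.name         # the substring rule from the text
    )
```

A config file whose name does not contain the problem type string would be skipped under this reading, which is a quick thing to check when an experiment appears to do nothing.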
<problem_set> is a file in the
and is a simple bash list.
You can also use the -P option to see what experiments will be performed without actually running anything.
For those interested, follow the flow:
experiments/megarun.sh experiments/run.sh experiments/scripts/helpers.sh experiments/main.py
Create a New Experiment
Right now, naming is a bit intertwined with the configuration and with where the PyPGE run script looks for different files and directories.
You will need to create a directory in the experiments folder.
Its name should match the
Inside of this folder, you will have your config files for PyPGE.
You will need to place your data in the
You will also need to create a
<problem_set> in the
These are the recommended default settings. You should change the workers and remote cores to match the machine you are using. Be careful when changing other parameters: both logical and runtime performance are very sensitive to them.
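As a starting point for matching `workers` and `remote_cores` to the machine, the core count can be read from the standard library. This is only a sketch: the helper name `suggest_parallelism` and the choice to leave one core free are assumptions, not a PyPGE recommendation.

```python
import os


def suggest_parallelism(reserve=1):
    """Suggest values for workers and remote_cores from the local core count.

    reserve is the number of cores left free for the rest of the system
    (an assumption of this sketch).
    """
    cores = os.cpu_count() or 1  # cpu_count() may return None
    usable = max(1, cores - reserve)
    return {"workers": usable, "remote_cores": usable}
```

If the evaluation server runs on a different machine, `remote_cores` should reflect that machine's core count instead.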
Sample config file:
name: "explicit_final"
workers: 4
queue_size: 4096
remote_eval: true
remote_cores: 4
remote_host: "ws://172.17.0.1:8080/echo"
max_iter: 12
pop_count: 3
peek_count: 12
peek_npts: 0
min_size: 1
max_size: 64
min_depth: 1
max_depth: 6
max_power: 6
zero_epsilon: 0.000001
excluded_cols:
usable_funcs:
  - "sin"
  - "cos"
algebra_methods:
multi_expander_params:
  - name: "level_1"
    pop_count: 3
    usable_funcs:
      - "sin"
      - "cos"
    grow_params:
      func_level: "linear"
      init_level: "med"
      grow_level: "med"
      subs_level: "med"
      shrinker: false
      add_xtop: true
    grow_filter: false
    limiting_depth: 4
err_method: "rmse"
fitness_func_params:
  - "normalize"
  - "-(1)jpsz"
  - "-score"
  - "+bic"
  - "-(1)psz"
print_timing: true
log_details: true
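Because the parameters are sensitive, a quick sanity check before launching a run can catch obvious mistakes. The sketch below works on a plain dictionary holding a few of the values from the sample config. The function `check_config` and the specific rules it applies are assumptions made for illustration; PyPGE may validate its configuration differently, or not at all.

```python
def check_config(cfg):
    """Return a list of human-readable problems found in a config dict."""
    problems = []
    # Assumed rule: each minimum must not exceed its matching maximum.
    for low, high in (("min_size", "max_size"), ("min_depth", "max_depth")):
        if cfg[low] > cfg[high]:
            problems.append(f"{low} ({cfg[low]}) exceeds {high} ({cfg[high]})")
    # Assumed rule: counts must be positive.
    for key in ("workers", "remote_cores", "max_iter", "pop_count"):
        if cfg[key] < 1:
            problems.append(f"{key} must be at least 1, got {cfg[key]}")
    # Assumed rule: remote evaluation needs a websocket URL.
    if cfg["remote_eval"] and not cfg["remote_host"].startswith(("ws://", "wss://")):
        problems.append("remote_host should be a ws:// or wss:// URL")
    return problems


# A subset of the sample config, written as a Python dict.
sample = {
    "workers": 4,
    "remote_eval": True,
    "remote_cores": 4,
    "remote_host": "ws://172.17.0.1:8080/echo",
    "max_iter": 12,
    "pop_count": 3,
    "min_size": 1,
    "max_size": 64,
    "min_depth": 1,
    "max_depth": 6,
}

print(check_config(sample))
```

An empty list means none of the assumed rules were violated; it says nothing about whether the settings will perform well.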
Things to know
- When pretty printing, SymPy performs simplification, which can remove terms if the floating-point print precision is not sufficient (the coefficient looks like zero).
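The effect described above can be reproduced with plain Python string formatting, without SymPy. The sketch below is an illustration of the underlying issue, not SymPy's printing code; the function `format_terms` and the example coefficients are hypothetical.

```python
def format_terms(terms, precision=6):
    """Render (coefficient, symbol) pairs, dropping terms that print as zero."""
    shown = []
    for coeff, symbol in terms:
        text = f"{coeff:.{precision}f}"
        if float(text) == 0.0:
            # The coefficient is nonzero but invisible at this precision.
            continue
        shown.append(f"{text}*{symbol}")
    return " + ".join(shown)


terms = [(2.5, "x"), (3.0e-7, "y")]
print(format_terms(terms, precision=6))  # the y term disappears
print(format_terms(terms, precision=8))  # the y term is kept
```

If a printed model seems to be missing a term, comparing it against the unsimplified expression or printing with more digits is a reasonable first check.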
The biggest todo on my list is moving the main loop back to Golang. Python will then only be used for the SymPy functionality.
The reason for this is that Python is slow as shit and there will be a massive performance boost by switching back to Golang.
Sometimes implementation matters.
Branching practices follow the methodology outlined at: http://nvie.com/posts/a-successful-git-branching-model/