Adding exp() as a make_function raises overflow error #49

Closed
iblasi opened this issue Oct 15, 2017 · 7 comments

iblasi commented Oct 15, 2017

@trevorstephens,
The 0.2.0 release is perfect for making new functions, as solved in issue #18.
However, I find that the exponential function encounters errors that prevent it from reaching a result.
In the following example, where we search for a simple exponential equation, gplearn encounters invalid values while evaluating some functions. This makes the fitness become NaN, and the algorithm does not seem to converge anywhere.

import numpy as np
from gplearn.genetic import SymbolicRegressor
from gplearn.functions import make_function

def exponent(x):
    return np.exp(x)

X = np.random.randint(0,100,size=(100,3))
y = np.exp(X[:, 0])

X_train , y_train = X[:80,:], y[:80]
X_test , y_test = X[80:,:], y[80:]

exponential = make_function(function=exponent, name='exp', arity=1)
function_set = ['add', 'sub', 'mul', 'div', 'sqrt', 'log',
                'abs', 'neg', 'inv', 'max', 'min', 'sin', 'cos', 'tan', exponential]

est_gp = SymbolicRegressor(population_size=5000,
                           generations=20, stopping_criteria=0.01,
                           function_set=function_set,
                           p_crossover=0.7, p_subtree_mutation=0.1,
                           p_hoist_mutation=0.05, p_point_mutation=0.1,
                           max_samples=0.9, verbose=1,
                           parsimony_coefficient=0.01, random_state=0)
est_gp.fit(X_train, y_train)
print('Score: ', est_gp.score(X_test, y_test))
print(est_gp._program)

This code shows the overflow errors creating NaN fitness as mentioned, and the final result is None:

    |    Population Average   |             Best Individual              |
---- ------------------------- ------------------------------------------ ----------
 Gen   Length          Fitness   Length          Fitness      OOB Fitness  Time Left
GPlearn_example_exp.py:19: RuntimeWarning: overflow encountered in exp
  return np.exp(x)
/usr/local/lib/python2.7/site-packages/numpy/lib/function_base.py:1142: RuntimeWarning: invalid value encountered in multiply
  avg = np.multiply(a, wgt, dtype=result_dtype).sum(axis)/scl
/usr/local/lib/python2.7/site-packages/gplearn/functions.py:46: RuntimeWarning: invalid value encountered in tan
  return self.function(*args)
/usr/local/lib/python2.7/site-packages/gplearn/functions.py:46: RuntimeWarning: invalid value encountered in multiply
  return self.function(*args)
/usr/local/lib/python2.7/site-packages/gplearn/functions.py:46: RuntimeWarning: invalid value encountered in cos
  return self.function(*args)
/usr/local/lib/python2.7/site-packages/gplearn/functions.py:46: RuntimeWarning: invalid value encountered in sin
  return self.function(*args)
   0    11.09              nan        7              nan              nan     39.42s
/usr/local/lib/python2.7/site-packages/gplearn/functions.py:46: RuntimeWarning: invalid value encountered in subtract
  return self.function(*args)
/usr/local/lib/python2.7/site-packages/gplearn/functions.py:46: RuntimeWarning: invalid value encountered in add
  return self.function(*args)
/usr/local/lib/python2.7/site-packages/gplearn/functions.py:46: RuntimeWarning: overflow encountered in multiply
  return self.function(*args)
/usr/local/lib/python2.7/site-packages/gplearn/functions.py:111: RuntimeWarning: overflow encountered in divide
  return np.where(np.abs(x2) > 0.001, np.divide(x1, x2), 1.)
   1    11.48              nan       38              nan              nan     44.32s
   2     16.2              nan       10              nan              nan     46.98s
   3    18.69              nan       29              nan              nan     46.91s
   4    21.19              nan       22              nan              nan     46.92s
   5    23.44              nan       25              nan              nan     46.38s
   6    25.52              nan       30              nan              nan     44.79s
   7    27.96              nan       56              nan              nan     43.73s
   8    30.01              nan       44              nan              nan     41.50s
   9    32.29              nan       54              nan              nan     39.10s
  10    34.59              nan       11              nan              nan     36.28s
  11    37.08              nan       18              nan              nan     34.26s
/usr/local/lib/python2.7/site-packages/gplearn/functions.py:128: RuntimeWarning: overflow encountered in divide
  return np.where(np.abs(x1) > 0.001, 1. / x1, 0.)
  12    39.66              nan       34              nan              nan     31.14s
  13    42.05              nan       43              nan              nan     27.39s
  14    43.91              nan       52              nan              nan     23.64s
  15     47.2              nan      118              nan              nan     19.36s
  16    49.95              nan       36              nan              nan     14.88s
  17    52.13              nan        8              nan              nan     10.15s
  18    55.53              nan       21              nan              nan      5.31s
  19    58.17              nan       63              nan              nan      0.00s
Score: 
Traceback (most recent call last):
  File "GPlearn_example_exp.py", line 69, in <module>
    
  File "/usr/local/lib/python2.7/site-packages/sklearn/base.py", line 388, in score
    multioutput='variance_weighted')
  File "/usr/local/lib/python2.7/site-packages/sklearn/metrics/regression.py", line 530, in r2_score
    y_true, y_pred, multioutput)
  File "/usr/local/lib/python2.7/site-packages/sklearn/metrics/regression.py", line 77, in _check_reg_targets
    y_pred = check_array(y_pred, ensure_2d=False)
  File "/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 422, in check_array
    _assert_all_finite(array)
  File "/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 43, in _assert_all_finite
    " or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

How can this issue be solved? Could a solution be to discard any result containing NaNs from the fitness evaluation? Using lower values (uniform [0,1] values) also raises the overflow:

X = np.random.uniform(0,1,size=(100,3))

iblasi commented Oct 15, 2017

@trevorstephens just for your information, I found how to avoid the invalid values encountered by functions added to function_set.
I verified that, due to the maximum values reached, the results sometimes become inf and later nan. This can be avoided by returning only finite values.
Coming back to the previous exponential example, modifying the exponent function as follows gives the correct result.

def exponent(x):
    a = np.exp(x)
    a[~np.isfinite(a)] = 0
    return a

It still shows a warning (RuntimeWarning: overflow encountered in exp), but the fitness value no longer becomes nan and the run converges to a solution.

@iblasi iblasi closed this as completed Oct 15, 2017
@trevorstephens
Owner

This is due to closure. You can either protect against it in the inputs by clipping large values, or in the output as you have done. This would need to be implemented by the user for each custom function. See here and here.
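A minimal sketch of the two styles of protection described above; the function names and the clip bound of 100 are my own assumptions, not part of gplearn's API:

```python
import numpy as np

# Two ways to make exp closed under arbitrary inputs. The bound 100 is an
# arbitrary choice: exp(100) ~ 2.7e43 is still far below the float64
# maximum (~1.8e308).

def exp_clipped_input(x1):
    # Protect the input: clip large magnitudes before exponentiating.
    return np.exp(np.clip(x1, -100.0, 100.0))

def exp_cleaned_output(x1):
    # Protect the output: compute exp, then replace non-finite results with 0.
    with np.errstate(over='ignore'):
        a = np.exp(np.asarray(x1, dtype=float))
    a[~np.isfinite(a)] = 0.0
    return a
```

Either function can then be wrapped with `make_function(function=..., name='exp', arity=1)` as in the snippets above.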

@admercs

admercs commented Oct 27, 2017

I have a protected exp function working as follows, but it still breaks the trigonometric functions when I include it. Any ideas on how to resolve this?

    def _protected_exp(x1):
        """Closure of exp for zero arguments."""
        with np.errstate(divide='ignore', invalid='ignore'):
            return np.where(np.abs(x1) > 0.001, np.exp(np.abs(x1)), 1.)
    
    pexp = make_function(function=_protected_exp, name='exp', arity=1)

@trevorstephens
Owner

Closure of such a function doesn't require protection against negative numbers, it requires protection against very large numbers that could overflow if you had exp(exp(exp(x1))) for instance.
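To illustrate, a hypothetical clipping-based closure (the bound 100 is an assumed value) stays finite even when the evolved tree nests the function:

```python
import numpy as np

def protected_exp(x1):
    # Clip the input so nested calls like exp(exp(exp(x))) cannot overflow:
    # each stage sees an argument no larger than 100, and exp(100) ~ 2.7e43
    # is well inside float64 range.
    return np.exp(np.clip(x1, -100.0, 100.0))

x = np.array([10.0])
nested = protected_exp(protected_exp(protected_exp(x)))  # finite, no warning
```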

@trevorstephens
Owner

As this has come up more than once, I will reopen this and add additional documentation on how to implement closure for custom functions.

@ferb2015

ferb2015 commented Apr 2, 2020

def _protected_exponent(x1):
    with np.errstate(over='ignore'):
        return np.where(np.abs(x1) < 100, np.exp(x1), 0.)

@soerenab

soerenab commented May 19, 2022

I have run into this problem as well, and looking at the solution in the comment above (thanks a lot for sharing!) I am wondering: wouldn't you want to return a really big (but finite) number rather than 0? As in:

def _protected_exponent(x1):
    with np.errstate(over='ignore'):
        return np.where(np.abs(x1) < 100, np.exp(x1), 9999999.)
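One consideration with a fixed sentinel (a hedged comparison of my own, not from the thread): clipping the input instead keeps the function monotone, since exp(101) saturates to exp(100) rather than jumping to a constant. A sketch of both variants:

```python
import numpy as np

def exp_sentinel(x1):
    # From the suggestion above: large-but-finite sentinel outside the window.
    with np.errstate(over='ignore'):
        return np.where(np.abs(x1) < 100, np.exp(x1), 9999999.)

def exp_saturating(x1):
    # Alternative: saturate the input, so exp(101) -> exp(100) rather than
    # a fixed sentinel, keeping the output monotone and finite.
    return np.exp(np.clip(x1, -100.0, 100.0))
```

Note that the `np.abs(x1) < 100` test also routes very negative inputs to the sentinel, even though np.exp(-200) simply underflows toward 0; clipping the input avoids that side effect as well.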
