Adding exp() as a make_function raises overflow error #49

Closed
iblasi opened this issue Oct 15, 2017 · 7 comments

iblasi commented Oct 15, 2017

@trevorstephens,
The 0.2.0 release is perfect for making new functions, as solved in issue #18.
However, I find that the exponential function encounters errors that prevent it from reaching a result.
In the following example, where we search for a simple exponential equation, gplearn encounters invalid values while evaluating some functions. This makes the fitness become NaN, and the algorithm does not seem to converge anywhere.

import numpy as np
from gplearn.genetic import SymbolicRegressor
from gplearn.functions import make_function

def exponent(x):
    return np.exp(x)

X = np.random.randint(0,100,size=(100,3))
y = np.exp(X[:, 0])

X_train , y_train = X[:80,:], y[:80]
X_test , y_test = X[80:,:], y[80:]

exponential = make_function(function=exponent, name='exp', arity=1)
function_set = ['add', 'sub', 'mul', 'div', 'sqrt', 'log',
                'abs', 'neg', 'inv', 'max', 'min', 'sin', 'cos', 'tan', exponential]

est_gp = SymbolicRegressor(population_size=5000,
                           generations=20, stopping_criteria=0.01,
                           function_set=function_set,
                           p_crossover=0.7, p_subtree_mutation=0.1,
                           p_hoist_mutation=0.05, p_point_mutation=0.1,
                           max_samples=0.9, verbose=1,
                           parsimony_coefficient=0.01, random_state=0)
est_gp.fit(X_train, y_train)
print('Score: ', est_gp.score(X_test, y_test))
print(est_gp._program)

This code shows the overflow errors creating NaN fitness as mentioned, and the final result is None:

    |    Population Average   |             Best Individual              |
---- ------------------------- ------------------------------------------ ----------
 Gen   Length          Fitness   Length          Fitness      OOB Fitness  Time Left
GPlearn_example_exp.py:19: RuntimeWarning: overflow encountered in exp
  return np.exp(x)
/usr/local/lib/python2.7/site-packages/numpy/lib/function_base.py:1142: RuntimeWarning: invalid value encountered in multiply
  avg = np.multiply(a, wgt, dtype=result_dtype).sum(axis)/scl
/usr/local/lib/python2.7/site-packages/gplearn/functions.py:46: RuntimeWarning: invalid value encountered in tan
  return self.function(*args)
/usr/local/lib/python2.7/site-packages/gplearn/functions.py:46: RuntimeWarning: invalid value encountered in multiply
  return self.function(*args)
/usr/local/lib/python2.7/site-packages/gplearn/functions.py:46: RuntimeWarning: invalid value encountered in cos
  return self.function(*args)
/usr/local/lib/python2.7/site-packages/gplearn/functions.py:46: RuntimeWarning: invalid value encountered in sin
  return self.function(*args)
   0    11.09              nan        7              nan              nan     39.42s
/usr/local/lib/python2.7/site-packages/gplearn/functions.py:46: RuntimeWarning: invalid value encountered in subtract
  return self.function(*args)
/usr/local/lib/python2.7/site-packages/gplearn/functions.py:46: RuntimeWarning: invalid value encountered in add
  return self.function(*args)
/usr/local/lib/python2.7/site-packages/gplearn/functions.py:46: RuntimeWarning: overflow encountered in multiply
  return self.function(*args)
/usr/local/lib/python2.7/site-packages/gplearn/functions.py:111: RuntimeWarning: overflow encountered in divide
  return np.where(np.abs(x2) > 0.001, np.divide(x1, x2), 1.)
   1    11.48              nan       38              nan              nan     44.32s
   2     16.2              nan       10              nan              nan     46.98s
   3    18.69              nan       29              nan              nan     46.91s
   4    21.19              nan       22              nan              nan     46.92s
   5    23.44              nan       25              nan              nan     46.38s
   6    25.52              nan       30              nan              nan     44.79s
   7    27.96              nan       56              nan              nan     43.73s
   8    30.01              nan       44              nan              nan     41.50s
   9    32.29              nan       54              nan              nan     39.10s
  10    34.59              nan       11              nan              nan     36.28s
  11    37.08              nan       18              nan              nan     34.26s
/usr/local/lib/python2.7/site-packages/gplearn/functions.py:128: RuntimeWarning: overflow encountered in divide
  return np.where(np.abs(x1) > 0.001, 1. / x1, 0.)
  12    39.66              nan       34              nan              nan     31.14s
  13    42.05              nan       43              nan              nan     27.39s
  14    43.91              nan       52              nan              nan     23.64s
  15     47.2              nan      118              nan              nan     19.36s
  16    49.95              nan       36              nan              nan     14.88s
  17    52.13              nan        8              nan              nan     10.15s
  18    55.53              nan       21              nan              nan      5.31s
  19    58.17              nan       63              nan              nan      0.00s
Score: 
Traceback (most recent call last):
  File "GPlearn_example_exp.py", line 69, in <module>
    
  File "/usr/local/lib/python2.7/site-packages/sklearn/base.py", line 388, in score
    multioutput='variance_weighted')
  File "/usr/local/lib/python2.7/site-packages/sklearn/metrics/regression.py", line 530, in r2_score
    y_true, y_pred, multioutput)
  File "/usr/local/lib/python2.7/site-packages/sklearn/metrics/regression.py", line 77, in _check_reg_targets
    y_pred = check_array(y_pred, ensure_2d=False)
  File "/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 422, in check_array
    _assert_all_finite(array)
  File "/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 43, in _assert_all_finite
    " or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

How can this issue be solved? Could a solution be to discard any result containing NaNs from the fitness evaluation? Using lower values (uniform [0,1] values) also raises the overflow:

X = np.random.uniform(0,1,size=(100,3))

iblasi commented Oct 15, 2017

@trevorstephens just for your information, I found how to avoid the invalid values encountered by functions added to function_set.
I verified that, due to the maximum values reached, the results sometimes become inf and later nan. This can be avoided by returning only finite values.
Coming back to the previous exponential example, modifying the exponent function as follows gives the correct result.

def exponent(x):
    a = np.exp(x)
    a[~np.isfinite(a)] = 0
    return a

It still shows a warning (RuntimeWarning: overflow encountered in exp), but the fitness value no longer becomes nan and the run converges to a solution.

@iblasi iblasi closed this as completed Oct 15, 2017
@trevorstephens
Owner

This is due to closure. You can either protect against it in the inputs by clipping large values, or in the output as you have done. This would need to be implemented by the user for each custom function. See here and here.
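A minimal sketch of the two styles of protection described above; the function names and the clip bound of 100 are my own assumptions, not part of gplearn's API:

```python
import numpy as np

# Two ways to make exp closed under arbitrary inputs. The bound 100 is an
# arbitrary choice: exp(100) ~ 2.7e43 is still far below the float64
# maximum (~1.8e308).

def exp_clipped_input(x1):
    # Protect the input: clip large magnitudes before exponentiating.
    return np.exp(np.clip(x1, -100.0, 100.0))

def exp_cleaned_output(x1):
    # Protect the output: compute exp, then replace non-finite results with 0.
    with np.errstate(over='ignore'):
        a = np.exp(np.asarray(x1, dtype=float))
    a[~np.isfinite(a)] = 0.0
    return a
```

Either function can then be wrapped with `make_function(function=..., name='exp', arity=1)` as in the snippets above.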

@admercs

admercs commented Oct 27, 2017

I have a protected exp function working as follows, but it still breaks the trigonometric functions when I include it. Any ideas on how to resolve this?

    def _protected_exp(x1):
        """Closure of exp for zero arguments."""
        with np.errstate(divide='ignore', invalid='ignore'):
            return np.where(np.abs(x1) > 0.001, np.exp(np.abs(x1)), 1.)
    
    pexp = make_function(function=_protected_exp, name='exp', arity=1)

@trevorstephens
Owner

Closure of such a function doesn't require protection against negative numbers, it requires protection against very large numbers that could overflow if you had exp(exp(exp(x1))) for instance.
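To illustrate, a hypothetical clipping-based closure (the bound 100 is an assumed value) stays finite even when the evolved tree nests the function:

```python
import numpy as np

def protected_exp(x1):
    # Clip the input so nested calls like exp(exp(exp(x))) cannot overflow:
    # each stage sees an argument no larger than 100, and exp(100) ~ 2.7e43
    # is well inside float64 range.
    return np.exp(np.clip(x1, -100.0, 100.0))

x = np.array([10.0])
nested = protected_exp(protected_exp(protected_exp(x)))  # finite, no warning
```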

@trevorstephens
Owner

As this has come up more than once, I will reopen this and add additional documentation on how to implement closure for custom functions.

@ferb2015

ferb2015 commented Apr 2, 2020

def _protected_exponent(x1):
    with np.errstate(over='ignore'):
        return np.where(np.abs(x1) < 100, np.exp(x1), 0.)

@soerenab

soerenab commented May 19, 2022

I have run into this problem as well, and looking at the solution in the comment above (thanks a lot for sharing!) I am wondering: wouldn't you want to return a really big (but finite) number rather than 0? As in:

def _protected_exponent(x1):
    with np.errstate(over='ignore'):
        return np.where(np.abs(x1) < 100, np.exp(x1), 9999999.)
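One consideration with a fixed sentinel (a hedged comparison of my own, not from the thread): clipping the input instead keeps the function monotone, since exp(101) saturates to exp(100) rather than jumping to a constant. A sketch of both variants:

```python
import numpy as np

def exp_sentinel(x1):
    # From the suggestion above: large-but-finite sentinel outside the window.
    with np.errstate(over='ignore'):
        return np.where(np.abs(x1) < 100, np.exp(x1), 9999999.)

def exp_saturating(x1):
    # Alternative: saturate the input, so exp(101) -> exp(100) rather than
    # a fixed sentinel, keeping the output monotone and finite.
    return np.exp(np.clip(x1, -100.0, 100.0))
```

Note that the `np.abs(x1) < 100` test also routes very negative inputs to the sentinel, even though np.exp(-200) simply underflows toward 0; clipping the input avoids that side effect as well.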
