
New closure methods #131

Open · wants to merge 9 commits into base: main

Conversation

pricebenjamin (Contributor)

This PR is intended to solve issue #130. Performance tests should be conducted. There's probably a better solution for determining eps, but this method determines the smallest eps available for the inferred dtype. It may fail if x1.dtype != x2.dtype. (How often would that be the case?)
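(For context, a rough sketch of the kind of closure described above; the helper below is hypothetical, not the PR diff itself, and the use of np.result_type to infer the dtype is an assumption.)

import numpy as np

def _protected_division(x1, x2):
    """Closure of division (x1/x2): keep the denominator away from zero."""
    # Smallest eps available for the inferred dtype; np.finfo raises for
    # integer dtypes, which is the concern raised in the next comment.
    eps = np.finfo(np.result_type(x1, x2)).eps
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.sign(x2) * np.divide(x1, np.abs(x2) + eps)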

* Note that the last sentence in the `Initialization` section
  was completely removed. It was unclear what this sentence was
  meant to say, and it did not make sense in the context of the
  surrounding paragraph.
* Some additional explanation is necessary in this section.
  Specifically, does the winner of a tournament undergo exactly
  one of these operations, or does it undergo potentially all of
  these ops, with the probability of each op being performed given
  by its respective parameter?
@pricebenjamin (Contributor, Author) commented Feb 10, 2019

Integers are not inexact, and thus do not have an eps. Can we safely cast inputs to floats? e.g.

abs_x2 = np.abs(x2, dtype=np.float64)

If we always cast to float64, then we could define eps globally instead of looking it up every call.

@pricebenjamin (Contributor, Author)

test_transformer_iterable() (and probably others) from /gplearn/tests/test_genetic.py fails because the protected functions have been modified.

@coveralls (Collaborator) commented Feb 10, 2019

Coverage Status

Coverage remained the same at 98.703% when pulling 70f0c2b on pricebenjamin:new_closure into 07b41a1 on trevorstephens:master.

@pricebenjamin (Contributor, Author)

@trevorstephens, any input on this?

@trevorstephens (Owner) commented Feb 15, 2019 via email

@@ -14,6 +14,7 @@

 __all__ = ['make_function']

+_EPS = np.finfo(np.float64).eps # Used by protected functions
trevorstephens (Owner)

This would mean that results may differ from machine to machine I guess. Don't really like this!

@pricebenjamin (Contributor, Author) commented Feb 18, 2019

Suggested change:

-_EPS = np.finfo(np.float64).eps # Used by protected functions
+_EPS = 1e-6 # or any other value
+# Consider printing a warning if default value of _EPS is smaller than
+# machine epsilon of whatever dtype is being used (e.g. np.float16)

This solution still provides the benefit of returning more reasonable values as we approach singularities. I could also then remove the explicit conversion to np.float64 in the definitions of the protected functions, since inputs would be implicitly converted upon addition of _EPS. Testing of the protected functions would need to be updated with a lower tolerance.
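(Roughly what that could look like with a fixed constant; a sketch assuming _EPS = 1e-6 and a sign-preserving closure, not the actual branch contents.)

import numpy as np

_EPS = 1e-6  # fixed constant, identical on every machine

def _protected_division(x1, x2):
    """Closure of division (x1/x2) for near-zero denominators."""
    with np.errstate(divide='ignore', invalid='ignore'):
        # Adding the float constant _EPS promotes integer inputs to float,
        # so no explicit cast to np.float64 is needed.
        return np.sign(x2) * np.divide(x1, np.abs(x2) + _EPS)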

@trevorstephens (Owner)

Hey @pricebenjamin, appreciate the input. The solution here has the immediate red flag that the _EPS = np.finfo(np.float64).eps line could differ between machines.

But perhaps we should back up a little though and discuss why this solution is better? Is a more mathematically exact formulation around singularities desirable? Feels like close to infinity is still kind of as bad as actual infinity when it comes to fitness in most cases?

@trevorstephens (Owner)

Also, can we remove the documentation changes from this PR and put them in #129, @pricebenjamin? I am very hesitant about this change due to the potential machine dependencies as it stands, but each PR should be a single issue as well.

@pricebenjamin (Contributor, Author)

Also can we remove the documentation changes from this PR and put them in #129

All of the documentation changes are actually in #129. I made the mistake of not creating the documentation branch separately, and so those changes are automatically in this branch. Should I pull the original documentation into this branch? Or would it be possible to complete #129 first?

@trevorstephens (Owner)

Ah yeah no worries then

@pricebenjamin (Contributor, Author) commented Feb 17, 2019

The solution here has the immediate red flag that the _EPS = np.finfo(np.float64).eps line could differ between machines.

Understood. I'll try and look around for other solutions and am open to any other ideas. All of the benefits of this implementation can still be attained with any fixed value of _EPS.

But perhaps we should back up a little though and discuss why this solution is better?

These new protected functions have a much smaller range over which they're degenerate (with 64-bit precision, these regions are nearly non-existent), which consequently decreases the likelihood that programs will contain degenerate nodes. However, it may be the case that [-0.001, 0.001] was already sufficient.

Edit: Perhaps more importantly, see the answer to your last question.

Is a more mathematically exact formulation around singularities desirable?

Having formulas which match expectations is desirable, in my opinion. If it doesn't impact performance (which I could try and test more thoroughly), is there a downside to making them more exact?

Feels like close to infinity is still kind of as bad as actual infinity when it comes to fitness in most cases?

Part of the issue with the old implementation is that the functions were not producing large results when they should have been. For example, log(0.001) should produce a large negative number, but instead produces 0. This completely removes the penalty that should be associated with approaching singularities.
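(To illustrate; the existing closure below is paraphrased from gplearn's functions.py with its 0.001 threshold.)

import numpy as np

def current_protected_log(x1):
    # Existing behaviour: arguments with |x1| <= 0.001 short-circuit to 0.
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.where(np.abs(x1) > 0.001, np.log(np.abs(x1)), 0.)

print(np.log(0.001))                 # about -6.9, the expected large negative value
print(current_protected_log(0.001))  # 0.0, so the penalty near the singularity vanishes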

@pricebenjamin (Contributor, Author)

Also, I greatly appreciate the input @trevorstephens!

@trevorstephens (Owner)

Though I welcome the input and discussion, I am very reluctant to merge this PR.

I'm a bit unconvinced by this "reasonable value" point of view. What is reasonable? Why is the current implementation unreasonable? Does a more pure division mean that the results will be more reasonable? or better? Is there any research on this front? (if so then happy to read it)

While I understand that the change is at first glance minor, it does completely break compatibility for all prior releases, so that needs a bit of pause.

Since the advantages of the change are unknown, disadvantages equally unknown, and this change can be completely implemented by the user easily using the existing custom functions functionality, I think I'm leaning towards a no on this one.

Do appreciate the input, but not convinced that the benefits are proven here, and the negatives are non-trivial.

@pricebenjamin (Contributor, Author)

I totally understand the reluctance 👍 I'll see if I can find some resources that discuss the issue of closure.

it does completely break compatibility for all prior releases

Could you elaborate on this? I understand that code written for prior versions would, most likely, produce different results, but I would expect that this method produces fewer degenerate nodes and ultimately better programs (specifically because it correctly penalizes parameters near singularities).

If you're interested, I could work on thoroughly comparing this implementation to the previous. It may take a few weeks, though.

@trevorstephens (Owner)

@pricebenjamin -- exactly. Code written to run in older versions of the package will produce different results in the new version. This makes people less likely to upgrade versions and thus not adopt new features, or be surprised by different results. I like to avoid these things wherever possible.

You're welcome to make any comparisons that you wish, I can't promise that I will merge this though due to the compatibility issue.

@hwulfmeyer commented Mar 12, 2019

What might at least be worth considering is the way the closure is performed. I personally don't care so much about how large the EPS is, but about how it is applied.
I find it a bit questionable that a division returns "1" whenever the divisor is too small, when in fact the smaller the divisor, the larger the result should be.
In the Field Guide to GP they argue:

The decision to return the value 1 provides the GP system with a simple way to
generate the constant 1, via an expression of the form (% x x). This combined with a
similar mechanism for generating 0 via (- x x) ensures that GP can easily construct
these two important constants.

I fail to see the reason why the GP system needs to generate a constant 1 in the first place. It is also not explained in the field guide.

Example A:
Suppose GP is trying to learn the function (5/x^4)+(5/x^3), and x happens to be between 0.1 and 1 in steps of 0.1.
So our training data consists of 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0.
For the function we are trying to learn, the current protected division produces an RMSE of ~15811, while the method proposed in this PR produces an RMSE of ~158.
Not only does the current method produce a higher error, but because of that it would prefer programs that are less prone to producing denominators smaller than 0.001.
For example, the RMSE for the program (5/x^3)+(5/x^3) is ~14250 under both protected divisions.
So for the current protected division, the error for the program (5/x^3)+(5/x^3) is lower than for (5/x^4)+(5/x^3), the function we are trying to learn. The same happens for (5/x^2)+(5/x^3).

In this particular case the current GP system would never be able to learn the original function.

Example B:
If the function we are trying to learn doesn't produce denominators smaller than 0.001, the current protected division produces an error of exactly 0.
The proposed method would produce an error larger than 0, even when the candidate program is exactly the function we are trying to learn.

As Examples A and B highlight, both implementations have their drawbacks.
I therefore propose using a mixture of both methods.

def _protected_division(x1, x2):
    """Closure of division (x1/x2) for zero denominator."""
    with np.errstate(divide='ignore', invalid='ignore'):
        abs_x2 = np.abs(x2, dtype=np.float64)
        return np.where(np.abs(x2) > _EPS,
                        np.divide(x1, x2),
                        np.sign(x2) * np.divide(x1, abs_x2 + _EPS))

def _protected_log(x1):
    """Closure of log for zero arguments."""
    with np.errstate(divide='ignore', invalid='ignore'):
        abs_x1 = np.abs(x1, dtype=np.float64)
        return np.where(abs_x1 > _EPS, np.log(abs_x1), np.log(abs_x1 + _EPS))
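
(For anyone who wants to sanity-check figures like those in Example A, a rough sketch of the comparison; the existing division is paraphrased, and the exact RMSE values depend on the _EPS chosen.)

import numpy as np

_EPS = np.finfo(np.float64).eps  # placeholder; any small constant can be substituted

def current_div(x1, x2):
    # existing closure: return 1 whenever the denominator is within 0.001 of zero
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.where(np.abs(x2) > 0.001, np.divide(x1, x2), 1.)

def proposed_div(x1, x2):
    # mirrors the mixed _protected_division proposed above
    with np.errstate(divide='ignore', invalid='ignore'):
        abs_x2 = np.abs(x2, dtype=np.float64)
        return np.where(np.abs(x2) > _EPS,
                        np.divide(x1, x2),
                        np.sign(x2) * np.divide(x1, abs_x2 + _EPS))

x = np.linspace(0.1, 1.0, 10)
target = 5 / x**4 + 5 / x**3  # exact values of the function being learned

for div in (current_div, proposed_div):
    pred = div(5, x**4) + div(5, x**3)
    rmse = np.sqrt(np.mean((pred - target) ** 2))
    print(div.__name__, rmse)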

@trevorstephens (Owner)

That's a fairly persuasive case, @wulfihm ... Will think about it.
