
Custom Metric Function executes Twice? #208

Closed
asherzhao8 opened this issue Oct 23, 2020 · 9 comments

asherzhao8 commented Oct 23, 2020

Hello,

I wrote a custom metric function and print the metric result every time a new formula is generated. But I find that the metric function is executed twice each time, as shown in the image below. This doubles the training time, so can you help me figure out the reason?

Thank you very much!

@trevorstephens
Owner

Sorry, but I don't understand what your problem is; most of the output is missing. Have you removed/edited some of the output?

As I cannot run any code, recreate your problem, or understand what commands you have run, there is very little I can do to help without more information.

@asherzhao8
Author

> Sorry, but I don't understand what your problem is; most of the output is missing. Have you removed/edited some of the output?
>
> As I cannot run any code, recreate your problem, or understand what commands you have run, there is very little I can do to help without more information.

@trevorstephens, sorry for my unclear explanation.
I just added a print to the custom metric function because I want to see the metric result each time. I then found that the result is printed twice with the same value every time, as in the figure attached above: the first value is 0.01647 and the second is still 0.01647. As for why some of the output is missing, the first generation of the model was still training at that point; I didn't change any other part of the model.

(attached screenshot: metric)

@trevorstephens
Owner

Without a reproducible example it's near impossible for me to look into this.

It's possible this is the out-of-bag fitness being calculated? I won't comment further without a full reproducible example of the problem you wish to solve; it's impossible to know what you are doing otherwise.

@trevorstephens
Owner

Try crafting a toy dataset for your problem using sklearn's make_blobs or something like that, otherwise I cannot look into what is happening.
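
For instance, a minimal reproduction along these lines would help (a sketch only: it uses make_regression as a stand-in for blobs since the target here is continuous, and keeps the population small with n_jobs=1 so the prints stay readable):

```python
import numpy as np
from sklearn.datasets import make_regression
from gplearn.fitness import make_fitness
from gplearn.genetic import SymbolicTransformer

# small synthetic regression problem in place of a real dataset
X, y = make_regression(n_samples=100, n_features=5, noise=1.0, random_state=0)

def _my_metric(y, y_pred, w):
    # print so every call to the metric is visible
    value = np.sum(np.abs(y) + np.abs(y_pred))
    print(value)
    return value

my_metric = make_fitness(function=_my_metric, greater_is_better=False)

gp = SymbolicTransformer(generations=3, population_size=20,
                         hall_of_fame=10, n_components=5,
                         metric=my_metric, max_samples=0.9,
                         verbose=1, random_state=0, n_jobs=1)
gp.fit(X, y)
```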

@asherzhao8
Author

```python
import pandas as pd
import numpy as np
from sklearn.datasets import load_boston
from sklearn.utils import check_random_state
from gplearn.fitness import make_fitness
from gplearn.genetic import SymbolicTransformer

rng = check_random_state(0)
boston = load_boston()
perm = rng.permutation(boston.target.size)
boston.data = boston.data[perm]
boston.target = boston.target[perm]

def _my_metric(y, y_pred, w):
    value = np.sum(np.abs(y) + np.abs(y_pred))
    print(value)
    return value

my_metric = make_fitness(function=_my_metric, greater_is_better=False)

function_set = ['add', 'sub', 'mul', 'div',
                'sqrt', 'log', 'abs', 'neg', 'inv',
                'max', 'min']
gp = SymbolicTransformer(generations=5, population_size=200,
                         hall_of_fame=100, n_components=10,
                         function_set=function_set, metric=my_metric,
                         parsimony_coefficient=0.0005,
                         max_samples=0.9, verbose=1,
                         random_state=0, n_jobs=3)
gp.fit(boston.data[:300, :], boston.target[:300])
```

@trevorstephens It seems I have encountered a new problem: I added print(value) in the custom metric function, but why is the metric result not printed while the model is training?


hwulfmeyer commented Oct 28, 2020

gp = SymbolicTransformer(generations=3, population_size=10,
                        hall_of_fame=10, n_components=10,
                        function_set=function_set, metric=my_metric,
                        parsimony_coefficient=0.0005,
                        max_samples=0.9, verbose=1,
                        random_state=0, n_jobs=1)

My output looks like this:

    |   Population Average    |             Best Individual              |
---- ------------------------- ------------------------------------------ ----------
 Gen   Length          Fitness   Length          Fitness      OOB Fitness  Time Left
69477.63277052
69477.63277052
248517335.35458386
248517335.35458386
6889.867984966956
6889.867984966956
11234.944530059192
11234.944530059192
6642.3
6642.3
6643.937139416466
6643.937139416466
12082.188280356442
12082.188280356442
10773.009932388375
10773.009932388375
7678.773066514686
7678.773066514686
9428.260977538152
9428.260977538152
   0    12.70      2.48658e+07        4           6642.3           6642.3      0.05s
6642.3
6642.3
6942.3
6942.3
6664.3
6664.3
6664.3
6664.3
6642.3
6642.3
6642.3
6642.3
6642.3
6642.3
6642.3
6642.3
6642.3
6642.3
6642.3
6642.3
   1     4.60           6676.7        6           6642.3           6642.3      0.02s
6664.3
6664.3
6642.3
6642.3
6642.3
6642.3
6642.3
6642.3
6642.3
6642.3
6664.3
6664.3
6642.3
6642.3
6664.3
6664.3
6664.3
6664.3
6642.3
6642.3
   2     2.80           6651.1        3           6642.3           6642.3      0.00s

The reason the output is doubled is that you activated subsampling with max_samples=0.9, which causes an out-of-bag (OOB) fitness to be calculated as well:

        program.raw_fitness_ = program.raw_fitness(X, y, curr_sample_weight)
        if max_samples < n_samples:
            # Calculate OOB fitness
            program.oob_fitness_ = program.raw_fitness(X, y, oob_sample_weight)

If you set max_samples=1.0 there is no double output.


    |   Population Average    |             Best Individual              |
---- ------------------------- ------------------------------------------ ----------
 Gen   Length          Fitness   Length          Fitness      OOB Fitness  Time Left
69477.63277052
248517335.35458386
6889.867984966956
11234.944530059192
6642.3
6643.937139416466
12082.188280356442
10773.009932388375
7678.773066514686
9428.260977538152
   0    12.70      2.48658e+07        4           6642.3              N/A      0.02s
6642.3
6942.3
6664.3
6664.3
6642.3
6642.3
6642.3
6642.3
6642.3
6642.3
   1     4.60           6676.7        6           6642.3              N/A      0.02s
6664.3
6642.3
6642.3
6642.3
6642.3
6664.3
6642.3
6664.3
6664.3
6642.3
   2     2.80           6651.1        3           6642.3              N/A      0.00s

Actually, the OOB fitness and the in-bag fitness technically shouldn't be the same. Not sure if this is a bug. It could be because of the dataset and its size.
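
One way to check directly is to compare the two values stored on each evolved program after fitting (a sketch; _programs is an internal gplearn attribute, and oob_fitness_ is only set when max_samples < 1.0):

```python
# inspect the final generation of the fitted transformer `gp`
for program in gp._programs[-1]:
    print(program, program.raw_fitness_, program.oob_fitness_)
```

If the metric ignores w, both fitness calls effectively see the same data, which would explain identical values.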


hwulfmeyer commented Oct 28, 2020

The problem is that you didn't use the parameter w (the sample weights) in your custom fitness metric. Without it, the subsampling has no effect on the value your metric computes.

Your fitness function needs to look like this:

def _my_metric(y, y_pred, w):
    value = np.sum((np.abs(y) + np.abs(y_pred)) * w)
    print(value)
    return value

Your output will then look like this:

    |   Population Average    |             Best Individual              |
---- ------------------------- ------------------------------------------ ----------
 Gen   Length          Fitness   Length          Fitness      OOB Fitness  Time Left
103135.19182698999
11243.86234392
390808514.1441565
42480653.74971996
10610.232259661774
1213.3752701012086
17631.259109606122
1627.374035802603
10194.2
1207.4
10206.314598719317
1198.112305023284
18461.194855534664
2071.2135418505854
16554.638234604558
1777.4661191383284
11743.404925226641
1387.5377017070841
14329.17339991324
1558.1426365400375
   0    12.70      3.91021e+07        4          10194.2           1207.4      0.00s
10397.300000000001
1004.3
10775.900000000001
1131.6999999999998
10298.1
1138.5
10336.099999999999
1100.5
10314.5
1087.1
10363.7
1037.9
10309.0
1092.6
10302.3
1099.3000000000002
10283.599999999999
1118.0
10383.2
1018.4000000000001
   1     4.60          10376.4        5          10283.6             1118      0.02s
10198.699999999999
1202.9
10171.8
1229.8000000000002
10247.400000000001
1189.2
10245.099999999999
1156.5
10250.2
1151.4
10204.8
1231.8000000000002
10154.7
1246.9
10278.0
1158.6
10279.0
1157.6
10147.0
1254.6000000000001
   2     4.40          10217.7        5            10147           1254.6      0.00s

Also, I have to say your custom fitness function is a poor choice for evaluation, since it doesn't actually measure the error between y and y_pred.
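
For reference, a custom metric usually measures the error between y and y_pred and uses w, e.g. a weighted mean absolute error (a minimal sketch):

```python
import numpy as np
from gplearn.fitness import make_fitness

def _weighted_mae(y, y_pred, w):
    # w carries the sample weights: gplearn zeroes the out-of-bag weights for the
    # in-bag fitness call and the in-bag weights for the OOB call
    return np.average(np.abs(y_pred - y), weights=w)

weighted_mae = make_fitness(function=_weighted_mae, greater_is_better=False)
```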

@trevorstephens
Owner

Thx @hwulfmeyer thought it was probably the OOB fitness, appreciate you helping out! 😄

@asherzhao8
Author

@hwulfmeyer I got it, thank you very much! By the way, the function was created casually, I'll never use it, haha!
