
Custom Metric Function executes Twice? #208

Closed
asherzhao8 opened this issue Oct 23, 2020 · 9 comments

asherzhao8 commented Oct 23, 2020

Hello,

I wrote a custom metric function and print the metric result every time a new formula is generated. But I find that the metric function is executed twice each time, as shown in the image below. This doubles the training time, so can you help me figure out the reason?

Thank you very much!

@trevorstephens
Owner

Sorry, but I don't understand what your problem is; most of the output is missing. Have you removed/edited some of the output?

As I cannot run any code, recreate your problem, or understand what commands you have run, there is very little I can do to help without more information.

@asherzhao8
Author

> Sorry, but I don't understand what your problem is; most of the output is missing. Have you removed/edited some of the output?
>
> As I cannot run any code, recreate your problem, or understand what commands you have run, there is very little I can do to help without more information.

@trevorstephens, sorry for my unclear explanation.
I just added a print to the custom metric function because I want to see the metric result each time. I then found that the result is printed twice with the same value every time, as in the figure attached above: the first value is 0.01647 and the second is still 0.01647. As for why some of the output is missing, the first generation of the model was still training at that point; I didn't change any other part of the model.

(attached screenshot: metric)

@trevorstephens
Owner

Without a reproducible example it's near impossible for me to look into this.

It's possible this is the out-of-bag fitness being calculated? I won't comment further without a full reproducible example of the problem you wish to solve; it's impossible to know what you are doing otherwise.

@trevorstephens
Owner

Try crafting a toy dataset for your problem using sklearn's make_blobs or something like that, otherwise I cannot look into what is happening.
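
For instance, a minimal reproduction along these lines would help (a sketch only: it uses make_regression as a stand-in for blobs since the target here is continuous, and keeps the population small with n_jobs=1 so the prints stay readable):

```python
import numpy as np
from sklearn.datasets import make_regression
from gplearn.fitness import make_fitness
from gplearn.genetic import SymbolicTransformer

# small synthetic regression problem in place of a real dataset
X, y = make_regression(n_samples=100, n_features=5, noise=1.0, random_state=0)

def _my_metric(y, y_pred, w):
    # print so every call to the metric is visible
    value = np.sum(np.abs(y) + np.abs(y_pred))
    print(value)
    return value

my_metric = make_fitness(function=_my_metric, greater_is_better=False)

gp = SymbolicTransformer(generations=3, population_size=20,
                         hall_of_fame=10, n_components=5,
                         metric=my_metric, max_samples=0.9,
                         verbose=1, random_state=0, n_jobs=1)
gp.fit(X, y)
```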

@asherzhao8
Author

```python
import pandas as pd
import numpy as np
from sklearn.datasets import load_boston
from sklearn.utils import check_random_state
from gplearn.fitness import make_fitness
from gplearn.genetic import SymbolicTransformer

rng = check_random_state(0)
boston = load_boston()
perm = rng.permutation(boston.target.size)
boston.data = boston.data[perm]
boston.target = boston.target[perm]

def _my_metric(y, y_pred, w):
    value = np.sum(np.abs(y) + np.abs(y_pred))
    print(value)
    return value

my_metric = make_fitness(function=_my_metric, greater_is_better=False)

function_set = ['add', 'sub', 'mul', 'div',
                'sqrt', 'log', 'abs', 'neg', 'inv',
                'max', 'min']
gp = SymbolicTransformer(generations=5, population_size=200,
                         hall_of_fame=100, n_components=10,
                         function_set=function_set, metric=my_metric,
                         parsimony_coefficient=0.0005,
                         max_samples=0.9, verbose=1,
                         random_state=0, n_jobs=3)
gp.fit(boston.data[:300, :], boston.target[:300])
```

@trevorstephens It seems I have encountered a new problem: I added print(value) in the custom metric function, but why is the metric result not printed while the model is training?


hwulfmeyer commented Oct 28, 2020

gp = SymbolicTransformer(generations=3, population_size=10,
                        hall_of_fame=10, n_components=10,
                        function_set=function_set, metric=my_metric,
                        parsimony_coefficient=0.0005,
                        max_samples=0.9, verbose=1,
                        random_state=0, n_jobs=1)

My output looks like this:

    |   Population Average    |             Best Individual              |
---- ------------------------- ------------------------------------------ ----------
 Gen   Length          Fitness   Length          Fitness      OOB Fitness  Time Left
69477.63277052
69477.63277052
248517335.35458386
248517335.35458386
6889.867984966956
6889.867984966956
11234.944530059192
11234.944530059192
6642.3
6642.3
6643.937139416466
6643.937139416466
12082.188280356442
12082.188280356442
10773.009932388375
10773.009932388375
7678.773066514686
7678.773066514686
9428.260977538152
9428.260977538152
   0    12.70      2.48658e+07        4           6642.3           6642.3      0.05s
6642.3
6642.3
6942.3
6942.3
6664.3
6664.3
6664.3
6664.3
6642.3
6642.3
6642.3
6642.3
6642.3
6642.3
6642.3
6642.3
6642.3
6642.3
6642.3
6642.3
   1     4.60           6676.7        6           6642.3           6642.3      0.02s
6664.3
6664.3
6642.3
6642.3
6642.3
6642.3
6642.3
6642.3
6642.3
6642.3
6664.3
6664.3
6642.3
6642.3
6664.3
6664.3
6664.3
6664.3
6642.3
6642.3
   2     2.80           6651.1        3           6642.3           6642.3      0.00s

The reason the output is doubled is that you activated subsampling with max_samples=0.9, which causes an out-of-bag (OOB) fitness to be calculated as well:

        program.raw_fitness_ = program.raw_fitness(X, y, curr_sample_weight)
        if max_samples < n_samples:
            # Calculate OOB fitness
            program.oob_fitness_ = program.raw_fitness(X, y, oob_sample_weight)

If you set max_samples=1.0 there is no double output.


    |   Population Average    |             Best Individual              |
---- ------------------------- ------------------------------------------ ----------
 Gen   Length          Fitness   Length          Fitness      OOB Fitness  Time Left
69477.63277052
248517335.35458386
6889.867984966956
11234.944530059192
6642.3
6643.937139416466
12082.188280356442
10773.009932388375
7678.773066514686
9428.260977538152
   0    12.70      2.48658e+07        4           6642.3              N/A      0.02s
6642.3
6942.3
6664.3
6664.3
6642.3
6642.3
6642.3
6642.3
6642.3
6642.3
   1     4.60           6676.7        6           6642.3              N/A      0.02s
6664.3
6642.3
6642.3
6642.3
6642.3
6664.3
6642.3
6664.3
6664.3
6642.3
   2     2.80           6651.1        3           6642.3              N/A      0.00s

Actually, the OOB fitness and the in-bag fitness technically shouldn't be the same. Not sure if this is a bug. It could be because of the dataset and its size.
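
One way to check directly is to compare the two values stored on each evolved program after fitting (a sketch; _programs is an internal gplearn attribute, and oob_fitness_ is only set when max_samples < 1.0):

```python
# inspect the final generation of the fitted transformer `gp`
for program in gp._programs[-1]:
    print(program, program.raw_fitness_, program.oob_fitness_)
```

If the metric ignores w, both fitness calls effectively see the same data, which would explain identical values.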


hwulfmeyer commented Oct 28, 2020

The problem is that you didn't use the parameter w (the sample weights) in your custom fitness metric. Without it, the subsampling has no effect on the value your metric computes.

Your fitness function needs to look like this:

def _my_metric(y, y_pred, w):
    value = np.sum((np.abs(y) + np.abs(y_pred)) * w)
    print(value)
    return value

Your output will then look like this:

    |   Population Average    |             Best Individual              |
---- ------------------------- ------------------------------------------ ----------
 Gen   Length          Fitness   Length          Fitness      OOB Fitness  Time Left
103135.19182698999
11243.86234392
390808514.1441565
42480653.74971996
10610.232259661774
1213.3752701012086
17631.259109606122
1627.374035802603
10194.2
1207.4
10206.314598719317
1198.112305023284
18461.194855534664
2071.2135418505854
16554.638234604558
1777.4661191383284
11743.404925226641
1387.5377017070841
14329.17339991324
1558.1426365400375
   0    12.70      3.91021e+07        4          10194.2           1207.4      0.00s
10397.300000000001
1004.3
10775.900000000001
1131.6999999999998
10298.1
1138.5
10336.099999999999
1100.5
10314.5
1087.1
10363.7
1037.9
10309.0
1092.6
10302.3
1099.3000000000002
10283.599999999999
1118.0
10383.2
1018.4000000000001
   1     4.60          10376.4        5          10283.6             1118      0.02s
10198.699999999999
1202.9
10171.8
1229.8000000000002
10247.400000000001
1189.2
10245.099999999999
1156.5
10250.2
1151.4
10204.8
1231.8000000000002
10154.7
1246.9
10278.0
1158.6
10279.0
1157.6
10147.0
1254.6000000000001
   2     4.40          10217.7        5            10147           1254.6      0.00s

Also, I have to say your custom fitness function is a poor choice for evaluation, since it doesn't actually measure the error between y and y_pred.
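
For reference, a custom metric usually measures the error between y and y_pred and uses w, e.g. a weighted mean absolute error (a minimal sketch):

```python
import numpy as np
from gplearn.fitness import make_fitness

def _weighted_mae(y, y_pred, w):
    # w carries the sample weights: gplearn zeroes the out-of-bag weights for the
    # in-bag fitness call and the in-bag weights for the OOB call
    return np.average(np.abs(y_pred - y), weights=w)

weighted_mae = make_fitness(function=_weighted_mae, greater_is_better=False)
```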

@trevorstephens
Owner

Thx @hwulfmeyer thought it was probably the OOB fitness, appreciate you helping out! 😄

@asherzhao8
Author

@hwulfmeyer I got it, thank you very much! By the way, the function was created casually, I'll never use it, haha!
