Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Saving best metrics based on Custom metrics failing (WARNING:tensorflow:Can save best model only with CUSTOM METRICS available, skipping) #53155

Closed
ravinderkhatri opened this issue Nov 22, 2021 · 6 comments
Assignees
Labels
comp:keras Keras related issues stale This label marks the issue/pr stale - to be closed automatically if no activity stat:awaiting response Status - Awaiting response from author TF 2.7 Issues related to TF 2.7.0 type:bug Bug

Comments

@ravinderkhatri
Copy link

ravinderkhatri commented Nov 22, 2021

I have defined a callback that runs on the epoch end and calculated the metrics. It is working fine in terms of calculating the desired metrics. Below is the function for reference

class Metrics(tf.keras.callbacks.Callback):
    def __init__(self, train_tf_data, val_tf_data, model, CLASSES, logs={}, **kwargs):
        super().__init__(**kwargs)
        self.train_tf_data = train_tf_data
        self.val_tf_data = val_tf_data
        self.model = model
        self.CLASSES = CLASSES
        # for train data
        self.train_f1_after_epoch = 0
        self.train_prec_after_epoch = 0
        self.train_recall_after_epoch = 0
        # for val data
        self.val_f1_after_epoch = 0
        self.val_prec_after_epoch = 0
        self.val_recall_after_epoch = 0

    def on_train_begin(self, logs={}):
        self.train_reports = None
        self.val_reports = None
        self.val_f1_after_epoch = 0

    def on_epoch_end(self, epoch, logs={}):
        # for train data
        self.train_reports = test_model(model=self.model, data=self.train_tf_data, 
                                        CLASSES=self.CLASSES)
        self.train_f1_after_epoch = self.train_reports['f1_score']
        self.train_recall_after_epoch = self.train_reports['recall']
        self.train_prec_after_epoch = self.train_reports['precision']

        # for val data
        self.val_reports = test_model(model=self.model, data=self.val_tf_data, 
                                      CLASSES=self.CLASSES)
        self.val_f1_after_epoch = self.val_reports['f1_score']
        self.val_recall_after_epoch = self.val_reports['recall']
        self.val_prec_after_epoch = self.val_reports['precision']

        # saving train results to log dir
        logs["train_f1_after_epoch"]=self.train_f1_after_epoch
        logs['train_precision_after_epoch'] = self.train_prec_after_epoch
        logs['train_recall_after_epoch'] = self.train_recall_after_epoch
        
        # saving val results to log dir
        logs['val_f1_after_epoch'] = self.val_f1_after_epoch
        logs['val_precision_after_epoch'] = self.val_prec_after_epoch
        logs['val_recall_after_epoch'] = self.val_recall_after_epoch


        print('train_reports_after_epoch', self.train_reports)
        print('val_reports_after_epoch', self.val_reports)

** .....Some model code .....**

Using this in call back

m1 = tf.keras.metrics.CategoricalAccuracy()
m2 = tf.keras.metrics.Recall()
m3 = tf.keras.metrics.Precision()
m4 = Metrics(train_tf_data=train_data, 
             val_tf_data=test_data, model=model, 
             CLASSES=CLASS_NAMES)
optimizers = [
        tfa.optimizers.AdamW(learning_rate=lr * .001 , weight_decay=wd),
        tfa.optimizers.AdamW(learning_rate=lr, weight_decay=wd)

           ]
optimizers_and_layers = [(optimizers[0], model.layers[0]), (optimizers[1], model.layers[1:])]
    
optimizer = tfa.optimizers.MultiOptimizer(optimizers_and_layers)


model.compile(
    optimizer= optimizer,
    loss = 'categorical_crossentropy',
    metrics=[m1, m2, m3],
    )

checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path, 
                                                    monitor = 'val_f1_after_epoch',
                                                    save_best_only=True,
                                                    save_weights_only=True,
                                                    mode='max',
                                                    save_freq='epoch',
                                                    verbose=1)
                                                    
checkpoint_cb._supports_tf_logs = False

The issue that I am facing is that it is giving me a warning that says

WARNING:TensorFlow: Can save best model only with val_f1_after_epoch available, skipping

Upon investigating history I found that metrics is available in the history

print(list(history.history.keys()))
['loss',
'categorical_accuracy',
'recall',
'precision',
'val_loss',
'val_categorical_accuracy',
'val_recall',
'val_precision',
'train_f1_after_epoch',
'train_precision_after_epoch',
'train_recall_after_epoch',
'val_f1_after_epoch', #this is the metrics
'val_precision_after_epoch',
'val_recall_after_epoch']

I think there is a bug in ModelCheckpoint where is it not looking at the custom metrics and not saving the model.

I am using Tensorflow 2.7 (Also tried this with Tensorflow 2.5)

@tilakrayal
Copy link
Contributor

@ravinderkhatri ,
Please take a look at this link 1 and 2 with the similar error.It helps.Thanks

@tilakrayal tilakrayal added TF 2.7 Issues related to TF 2.7.0 comp:keras Keras related issues labels Nov 22, 2021
@tilakrayal
Copy link
Contributor

@tilakrayal tilakrayal added the stat:awaiting response Status - Awaiting response from author label Nov 22, 2021
@ravinderkhatri
Copy link
Author

ravinderkhatri commented Nov 22, 2021

@tilakrayal I have tried all the approaches mentioned in the link. Nothing is working. I have created a ticket at the link keras-team/Keras but I was thinking it is related to TensorFlow. Below is the link for reference
keras-team/keras#15684

@tilakrayal
Copy link
Contributor

@ravinderkhatri ,
Please feel free to close this issue as it has been tracked in Keras repo.

@tilakrayal tilakrayal added stat:awaiting response Status - Awaiting response from author and removed stat:awaiting response Status - Awaiting response from author labels Nov 23, 2021
@google-ml-butler
Copy link

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Nov 30, 2021
@google-ml-butler
Copy link

Are you satisfied with the resolution of your issue?
Yes
No

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:keras Keras related issues stale This label marks the issue/pr stale - to be closed automatically if no activity stat:awaiting response Status - Awaiting response from author TF 2.7 Issues related to TF 2.7.0 type:bug Bug
Projects
None yet
Development

No branches or pull requests

2 participants