
KerasClassifier.score is ... broken!? #38004

Closed
FirefoxMetzger opened this issue Mar 28, 2020 · 13 comments
Labels: comp:keras (Keras related issues), TF 2.2 (Issues related to TF 2.2), type:bug (Bug)
@FirefoxMetzger

I am using the scikit_learn wrapper to wrap a Keras model and train/evaluate it in scikit-learn. Calling KerasClassifier.score should return the accuracy of the classifier; however, no matter what I do, it just doesn't.

Looking at the source, the score method does two things:

  1. In case of sparse labels, it converts them to a one-hot matrix (lines 296-300).
  2. It calls Sequential.evaluate and then hopes to find a metric called acc or accuracy, which it treats as the accuracy of the model (lines 302-307).

If it doesn't find a metric with the right name, it raises an exception.
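In pseudocode, the score logic amounts to something like the following. This is a paraphrase of the description above, not the verbatim source; _maybe_one_hot is a hypothetical stand-in for the label conversion:

# Paraphrase of KerasClassifier.score -- a sketch, not the actual TF source.
def score(self, x, y, **kwargs):
    y = _maybe_one_hot(y)  # hypothetical helper: sparse labels -> one-hot
    outputs = self.model.evaluate(x, y, **kwargs)
    for name, output in zip(self.model.metrics_names, outputs):
        if name in ['accuracy', 'acc']:  # <-- the problematic name check
            return output
    raise ValueError('The model is not configured to compute accuracy.')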

I don't understand how this could possibly work (and it doesn't work for me). Given that the target labels are one-hot encoded, the correct metric to use is CategoricalAccuracy; however, it is named

def __init__(self, name='categorical_accuracy', dtype=None):

As a result, KerasClassifier.score raises an exception. Worse, the error message suggests adding the Accuracy metric to the model. This is misleading: following the suggestion makes the error disappear and returns a value, but that value is not ... accurate (pun intended).

I suggest renaming accuracy in

if name in ['accuracy', 'acc']:

to categorical_accuracy. While at it, I also suggest adding _estimator_type = "classifier" as a class variable. scikit-learn checks for it to identify KerasClassifier as a classifier, and without it a lot of functionality doesn't work as intended.
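For reference, scikit-learn's is_classifier check is just an attribute comparison, so the proposed one-line class attribute would be enough. A minimal sketch (TaggedKerasClassifier is a hypothetical name):

from sklearn.base import is_classifier
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

# Hypothetical subclass carrying the proposed class attribute:
class TaggedKerasClassifier(KerasClassifier):
    _estimator_type = "classifier"

# is_classifier(est) boils down to:
#   getattr(est, "_estimator_type", None) == "classifier"
print(is_classifier(KerasClassifier()))        # False (attribute missing)
print(is_classifier(TaggedKerasClassifier()))  # True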

If there is agreement for this change I can submit a PR.

@amahendrakar
Contributor

@FirefoxMetzger,
In order to expedite the troubleshooting process, could you please provide the complete code to reproduce the issue reported here, along with the TensorFlow version you are using? Thanks!

@amahendrakar amahendrakar added the stat:awaiting response Status - Awaiting response from author label Mar 30, 2020
@FirefoxMetzger
Author

Sure: tf.__version__ == '2.0.0'
Minimal (not) working example to demonstrate the problem:

import tensorflow as tf
import numpy as np
import sklearn.preprocessing
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier 

## --- LOAD DATA ---
fashion_mnist = tf.keras.datasets.fashion_mnist
(X_train, Y_train), (X_test, Y_test) = fashion_mnist.load_data()

encoder = sklearn.preprocessing.OneHotEncoder(dtype=np.float32)
encoder.fit(Y_train.reshape((-1, 1)))

Y_train_encoded = encoder.transform(Y_train.reshape((-1, 1))).toarray()
Y_test_encoded = encoder.transform(Y_test.reshape((-1, 1))).toarray()


## --- SETUP MODEL ---
def setupModel():
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Input(shape=(28,28)))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(10))
    model.add(tf.keras.layers.Softmax())

    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-5),
                  loss=tf.keras.losses.CategoricalCrossentropy(),
                  metrics=[tf.keras.metrics.CategoricalAccuracy()]
                 )
    return model

# this model now uses the sklearn API instead of the Keras API
# it still trains using tensorflow under the hood
model = KerasClassifier(build_fn=setupModel,
                        epochs=2, # show that training executes
                        batch_size=256,
                       )

# train the model
model.fit(X_train, Y_train_encoded)

# evaluate the test accuracy
test_accuracy = model.score(X_test, Y_test_encoded, verbose=0)
print(f"The test accuracy is {test_accuracy * 100:.2f}%")

Minimal (not) working example to demonstrate the Accuracy metric problem:

import tensorflow as tf
import numpy as np
import sklearn.preprocessing
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier 

## --- LOAD DATA ---
fashion_mnist = tf.keras.datasets.fashion_mnist
(X_train, Y_train), (X_test, Y_test) = fashion_mnist.load_data()

encoder = sklearn.preprocessing.OneHotEncoder(dtype=np.float32)
encoder.fit(Y_train.reshape((-1, 1)))

Y_train_encoded = encoder.transform(Y_train.reshape((-1, 1))).toarray()
Y_test_encoded = encoder.transform(Y_test.reshape((-1, 1))).toarray()


## --- SETUP MODEL ---
def setupModel():
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Input(shape=(28,28)))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(10))
    model.add(tf.keras.layers.Softmax())

    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-5),
                  loss=tf.keras.losses.CategoricalCrossentropy(),
                  metrics=[tf.keras.metrics.Accuracy()]
                 )
    return model

# this model now uses the sklearn API instead of the Keras API
# it still trains using tensorflow under the hood
model = KerasClassifier(build_fn=setupModel,
                        epochs=2, # show that training executes
                        batch_size=256,
                       )

# train the model
model.fit(X_train, Y_train_encoded)

# evaluate the test accuracy
test_accuracy = model.score(X_test, Y_test_encoded, verbose=0)
print(f"The test accuracy is {test_accuracy * 100:.2f}%")

Y_pred = model.model.predict(X_test)
acc = tf.keras.metrics.CategoricalAccuracy()
acc.update_state(Y_test_encoded, Y_pred)  # CategoricalAccuracy expects one-hot targets
print(f"The (true) categorical accuracy is {acc.result().numpy()*100:.2f}%")

Output:

Train on 60000 samples
Epoch 1/2
60000/60000 [==============================] - 1s 13us/sample - loss: 67.0612 - accuracy: 0.6470
Epoch 2/2
60000/60000 [==============================] - 1s 9us/sample - loss: 29.1524 - accuracy: 0.7125
The test accuracy is 72.88%
The (true) categorical accuracy is 10.40%

The reason for this discrepancy is described in the opening comment.
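As an extra sanity check, the same reference number can be computed with scikit-learn (a hypothetical addition to the script above, reusing Y_pred):

import sklearn.metrics

# accuracy_score expects class indices, so take the argmax of the probabilities
sk_acc = sklearn.metrics.accuracy_score(Y_test, Y_pred.argmax(axis=1))
print(f"scikit-learn accuracy: {sk_acc * 100:.2f}%")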

@amahendrakar
Contributor

Was able to reproduce the issue with TF v2.0, TF v2.1 and TF-nightly. Please find the attached gist. Thanks!

@amahendrakar amahendrakar added comp:keras Keras related issues TF 2.0 Issues relating to TensorFlow 2.0 type:bug Bug and removed stat:awaiting response Status - Awaiting response from author labels Apr 1, 2020
@jvishnuvardhan
Contributor

@FirefoxMetzger As mentioned in the error message, I changed the metric in compile from metrics=[tf.keras.metrics.CategoricalAccuracy()] to metrics=['accuracy']. With that change everything worked as expected. Please check the gist here.

Please close the issue if this was resolved for you. Thanks!

@jvishnuvardhan jvishnuvardhan added the stat:awaiting response Status - Awaiting response from author label Apr 1, 2020
@FirefoxMetzger
Author

You are correct: it does resolve the exception, as I mentioned in my issue. It does not solve the underlying problem, though. The code simply doesn't crash anymore and instead silently computes the wrong accuracy, which is arguably much worse, because it is easier to miss.

Using metrics=['accuracy'] is, as far as I am aware, shorthand for metrics=[tf.keras.metrics.Accuracy()], and, as demonstrated in my reply to @amahendrakar, that metric is unsuited for one-hot encoded data. Running the test case I posted above against @jvishnuvardhan's modified code produces the following output:

Epoch 1/2
235/235 [==============================] - 1s 3ms/step - loss: 51.1440 - accuracy: 0.3992
Epoch 2/2
235/235 [==============================] - 1s 3ms/step - loss: 25.3406 - accuracy: 0.5925
The test accuracy is 61.51%
The (true) categorical accuracy is 10.04%

Both accuracy numbers should be identical. Both represent the fraction of the test dataset that has been assigned the correct label by the model, and both numbers use the same model and same test data.
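The gap comes from what the two metrics actually compare: Accuracy checks element-wise equality of labels and predictions, while CategoricalAccuracy compares the argmax of each row. A toy sketch (values chosen purely for illustration):

import tensorflow as tf

y_true = [[0., 0., 1.], [0., 1., 0.]]        # one-hot targets
y_pred = [[0.1, 0.2, 0.7], [0.2, 0.6, 0.2]]  # softmax-style outputs

acc = tf.keras.metrics.Accuracy()
acc.update_state(y_true, y_pred)
print(acc.result().numpy())  # 0.0 -- element-wise equality, no argmax

cat = tf.keras.metrics.CategoricalAccuracy()
cat.update_state(y_true, y_pred)
print(cat.result().numpy())  # 1.0 -- compares the argmax of each row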

For reference, here is the code:

import tensorflow as tf
import numpy as np
import sklearn.preprocessing
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier 

## --- LOAD DATA ---
fashion_mnist = tf.keras.datasets.fashion_mnist
(X_train, Y_train), (X_test, Y_test) = fashion_mnist.load_data()

encoder = sklearn.preprocessing.OneHotEncoder(dtype=np.float32)
encoder.fit(Y_train.reshape((-1, 1)))

Y_train_encoded = encoder.transform(Y_train.reshape((-1, 1))).toarray()
Y_test_encoded = encoder.transform(Y_test.reshape((-1, 1))).toarray()


## --- SETUP MODEL ---
def setupModel():
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Input(shape=(28,28)))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(10))
    model.add(tf.keras.layers.Softmax())

    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-5),
                  loss=tf.keras.losses.CategoricalCrossentropy(),
                  metrics=['accuracy']  # was: metrics=[tf.keras.metrics.CategoricalAccuracy()]
                 )
    return model

# this model now uses the sklearn API instead of the Keras API
# it still trains using tensorflow under the hood
model = KerasClassifier(build_fn=setupModel,
                        epochs=2, # show that training executes
                        batch_size=256,
                       )

# train the model
model.fit(X_train, Y_train_encoded)

# evaluate the test accuracy
test_accuracy = model.score(X_test, Y_test_encoded, verbose=0)
print(f"The test accuracy is {test_accuracy * 100:.2f}%")

# test case to compare metrics
Y_pred = model.model.predict(X_test)
acc = tf.keras.metrics.CategoricalAccuracy()
acc.update_state(Y_test_encoded, Y_pred)  # CategoricalAccuracy expects one-hot targets
print(f"The (true) categorical accuracy is {acc.result().numpy()*100:.2f}%")

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Apr 5, 2020
@jvishnuvardhan jvishnuvardhan added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Apr 8, 2020
@bentyeh

bentyeh commented Apr 16, 2020

Please take a look at issue #38596, linked above. I believe this is the underlying reason why the Keras model.evaluate() method computes the wrong loss.

@amahendrakar amahendrakar self-assigned this Jun 18, 2020
@amahendrakar
Contributor

> Please take a look at issue #38596, linked above. I believe this is the underlying reason why the Keras model.evaluate() method computes the wrong loss.

@FirefoxMetzger,
Please take a look at @bentyeh's comment and let us know if it works. Thanks!

@amahendrakar amahendrakar added stat:awaiting response Status - Awaiting response from author and removed stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Jun 18, 2020
@google-ml-butler

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Jun 25, 2020
@FirefoxMetzger
Author

Hey @amahendrakar, I checked out the comment and, unfortunately, it doesn't resolve the issue.

It really does seem to be the naming of the expected metric in the line

if name in ['accuracy', 'acc']:

as I quoted in the opening post.

If you check the Keras metrics source, you find the metric with name='accuracy' defined at

def __init__(self, name='accuracy', dtype=None):

whereas the desired metric is defined at

def __init__(self, name='categorical_accuracy', dtype=None):

under the name categorical_accuracy.

That is what I believe to be the issue, unless I misunderstand how the wrapper is supposed to work.
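This is quick to confirm at runtime; the default names come straight from the constructors quoted above:

import tensorflow as tf

print(tf.keras.metrics.Accuracy().name)             # 'accuracy'
print(tf.keras.metrics.CategoricalAccuracy().name)  # 'categorical_accuracy'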

@google-ml-butler google-ml-butler bot removed the stale This label marks the issue/pr stale - to be closed automatically if no activity label Jun 25, 2020
@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Jun 27, 2020
@amahendrakar
Contributor

Was able to reproduce the issue with TF v2.2 and TF-nightly. Please find the attached gist. Thanks!

@amahendrakar amahendrakar added TF 2.2 Issues related to TF 2.2 and removed TF 2.0 Issues relating to TensorFlow 2.0 labels Jul 2, 2020
@amahendrakar amahendrakar removed their assignment Jul 2, 2020
@sachinprasadhs
Contributor

Was able to reproduce your issue with TF-nightly 2.6.0-dev20210524. Please find the gist here. Thanks!

@tensorflowbutler
Member

Hi there,

This is a stale issue. As you are using an older version of TensorFlow, we are checking to see if you still need help with it. Please test the issue with the latest TensorFlow (TF 2.7 and tf-nightly). If the issue still persists with the newer versions of TF, please feel free to open it in the keras-team/keras repository, providing details about the issue and standalone code to reproduce it. Thanks!

Please note that Keras development has moved to the separate keras-team/keras repository to focus entirely on Keras. Thanks!

