difference between evaluate_preds and model.evaluate #16

Closed
dawnranger opened this issue Oct 23, 2017 · 3 comments

dawnranger commented Oct 23, 2017

Thanks for your excellent work. Your code is really helpful.

In your code for evaluating the GCN model, what confuses me is the difference between utils.evaluate_preds (your implementation) and model.evaluate (the Keras API). Here are my changes to evaluate the GCN using the model.evaluate function:

  • Add the accuracy metric to model.compile for accuracy logging:
model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=0.01),
              metrics=['accuracy'])
  • Evaluate the training data using model.evaluate:
train_loss, train_acc = model.evaluate(
    graph, y_train, sample_weight=train_mask, batch_size=X.shape[0], verbose=0)

The rest of the code:

print('evaluate: train_loss={:.4f}, train_acc:{:.4f}'.format(train_loss, train_acc))

preds = model.predict(graph, batch_size=X.shape[0])
train_val_loss, train_val_acc = utils.evaluate_preds(preds, [y_train, y_val],
                                                             [idx_train, idx_val])
print("predict:  train_loss={:.4f}, train_acc={:.4f}".format(train_val_loss[0], train_val_acc[0]))

And here are the outputs I got over 10 iterations of the training loop:

evaluate: train_loss=1.9505, train_acc:0.0240
predict:  train_loss=1.9389, train_acc=0.4286

evaluate: train_loss=1.9400, train_acc:0.0222
predict:  train_loss=1.9310, train_acc=0.4143

evaluate: train_loss=1.9294, train_acc:0.0233
predict:  train_loss=1.9216, train_acc=0.4429

evaluate: train_loss=1.9191, train_acc:0.0233
predict:  train_loss=1.9114, train_acc=0.4500

evaluate: train_loss=1.9091, train_acc:0.0229
predict:  train_loss=1.9007, train_acc=0.4429

evaluate: train_loss=1.8993, train_acc:0.0229
predict:  train_loss=1.8895, train_acc=0.4429

evaluate: train_loss=1.8895, train_acc:0.0240
predict:  train_loss=1.8777, train_acc=0.4643

evaluate: train_loss=1.8797, train_acc:0.0240
predict:  train_loss=1.8655, train_acc=0.4643

evaluate: train_loss=1.8697, train_acc:0.0240
predict:  train_loss=1.8529, train_acc=0.4643

evaluate: train_loss=1.8595, train_acc:0.0236
predict:  train_loss=1.8398, train_acc=0.4571

Test set results: loss= 1.8782 accuracy= 0.3590
[Finished in 19.3s]

According to the Keras docs, regularization mechanisms such as Dropout and L1/L2 weight regularization are turned off at testing time.

So why is the loss returned by model.evaluate not exactly the same as the one computed by utils.evaluate_preds?

What I have tried:

I tried to implement the categorical_crossentropy loss function following the Keras TensorFlow backend. Here is my code:

import numpy as np

def categorical_crossentropy_keras(preds, labels):
    # scale preds so that the class probabilities of each sample sum to 1
    preds /= np.sum(preds, axis=-1).reshape(-1, 1)
    # clip to avoid log(0); 1e-7 (== 10e-8) is Keras' default epsilon
    _epsilon = 1e-7
    output = np.clip(preds, _epsilon, 1. - _epsilon)
    loss = -np.sum(labels * np.log(output), axis=-1)
    return np.mean(loss)

But the results of this function are exactly the same as those of utils.categorical_crossentropy.

tkipf commented Oct 24, 2017

Thanks for your question. Note that the evaluate_preds function takes an additional index/mask array, e.g. idx_train, which is not the case in your Keras-based implementation. This explains the difference in measured accuracy.
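
To illustrate the point, here is a minimal NumPy sketch (the sizes and the random arrays below are made-up stand-ins, not variables from the repo): averaging correctness over every node of the graph against y_train, whose rows are zero outside the training set, gives a very different number from averaging only over the nodes selected by idx_train, which is what evaluate_preds does.

import numpy as np

# Illustration only -- made-up stand-ins, not variables from the repo:
# Cora-like sizes with 2708 nodes, 7 classes, 140 labelled training nodes.
num_nodes, num_classes = 2708, 7
rng = np.random.default_rng(0)
preds = rng.random((num_nodes, num_classes))  # stands in for model.predict output
labels = np.eye(num_classes)[rng.integers(0, num_classes, size=num_nodes)]
idx_train = np.arange(140)

# y_train as used above: zero rows outside the training set
y_train = np.zeros_like(labels)
y_train[idx_train] = labels[idx_train]

# Accuracy averaged over every node in the graph, with no index/mask applied
acc_unmasked = np.mean(np.argmax(preds, axis=1) == np.argmax(y_train, axis=1))

# Accuracy averaged only over the nodes in idx_train, as evaluate_preds does
acc_masked = np.mean(
    np.argmax(preds[idx_train], axis=1) == np.argmax(labels[idx_train], axis=1))

print(acc_unmasked, acc_masked)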

dawnranger commented Nov 30, 2017

Sorry for the late response. I think I have solved this issue. I mistakenly thought that the sample_weight argument alone was enough to apply the mask to the input data; in fact, we also need to pass the metrics via weighted_metrics rather than metrics. With that change, evaluate_preds can be replaced by model.evaluate, so there is no need to implement the categorical_crossentropy and accuracy metrics by hand, and the code can be simplified to a pure Keras implementation without changing the results. Here is my solution:

  1. Add weighted_metrics categorical_crossentropy and accuracy to model.compile:
model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=0.01),
              weighted_metrics=['categorical_crossentropy', 'accuracy'])
  2. Evaluate the training data using model.evaluate:
_, train_loss, train_acc = model.evaluate(
    graph, y_train, sample_weight=train_mask, batch_size=X.shape[0], verbose=0)

To simplify things further, we can also get rid of the manual loop over epochs and use an EarlyStopping callback, as sketched below.
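
For instance, something along these lines (just a sketch, not tested against the repo; model, graph, y_train, y_val, train_mask and X are the variables already defined in the training script, while val_mask is a hypothetical boolean mask for the validation nodes, built the same way as train_mask):

from keras.callbacks import EarlyStopping

# Sketch only: full-batch training on the whole graph with early stopping
# on validation loss, replacing the hand-written epoch loop.
model.fit(graph, y_train,
          sample_weight=train_mask,
          validation_data=(graph, y_val, val_mask),  # val_mask: assumed validation mask
          batch_size=X.shape[0],
          epochs=200,
          shuffle=False,  # keep node order aligned with the adjacency input
          callbacks=[EarlyStopping(monitor='val_loss', patience=10)],
          verbose=2)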

tkipf commented Dec 3, 2017

This sounds good, thanks for looking into this! Feel free to make a pull request if you think this might be helpful for other users as well.

tkipf closed this as completed on Dec 3, 2017.