difference between evaluate_preds and model.evaluate #16

Closed
dawnranger opened this issue Oct 23, 2017 · 3 comments

dawnranger commented Oct 23, 2017

Thanks for your excellent work. Your code is really helpful.

In your code for evaluating the GCN model, what confuses me is the difference between utils.evaluate_preds (your implementation) and model.evaluate (the Keras API). Here are my changes to evaluate the GCN using the model.evaluate function:

  • Add the accuracy metric to model.compile for accuracy logging:
model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=0.01),
              metrics=['accuracy'])
  • Evaluate the training data using model.evaluate:
train_loss, train_acc = model.evaluate(
    graph, y_train, sample_weight=train_mask, batch_size=X.shape[0], verbose=0)

The rest of the code:

print('evaluate: train_loss={:.4f}, train_acc:{:.4f}'.format(train_loss, train_acc))

preds = model.predict(graph, batch_size=X.shape[0])
train_val_loss, train_val_acc = utils.evaluate_preds(preds, [y_train, y_val],
                                                             [idx_train, idx_val])
print("predict:  train_loss={:.4f}, train_acc={:.4f}".format(train_val_loss[0], train_val_acc[0]))

And here are the outputs I got over 10 iterations of the training loop:

evaluate: train_loss=1.9505, train_acc:0.0240
predict:  train_loss=1.9389, train_acc=0.4286

evaluate: train_loss=1.9400, train_acc:0.0222
predict:  train_loss=1.9310, train_acc=0.4143

evaluate: train_loss=1.9294, train_acc:0.0233
predict:  train_loss=1.9216, train_acc=0.4429

evaluate: train_loss=1.9191, train_acc:0.0233
predict:  train_loss=1.9114, train_acc=0.4500

evaluate: train_loss=1.9091, train_acc:0.0229
predict:  train_loss=1.9007, train_acc=0.4429

evaluate: train_loss=1.8993, train_acc:0.0229
predict:  train_loss=1.8895, train_acc=0.4429

evaluate: train_loss=1.8895, train_acc:0.0240
predict:  train_loss=1.8777, train_acc=0.4643

evaluate: train_loss=1.8797, train_acc:0.0240
predict:  train_loss=1.8655, train_acc=0.4643

evaluate: train_loss=1.8697, train_acc:0.0240
predict:  train_loss=1.8529, train_acc=0.4643

evaluate: train_loss=1.8595, train_acc:0.0236
predict:  train_loss=1.8398, train_acc=0.4571

Test set results: loss= 1.8782 accuracy= 0.3590
[Finished in 19.3s]

According to the Keras docs, regularization mechanisms such as Dropout and L1/L2 weight regularization are turned off at testing time.

So why is the loss returned by model.evaluate not exactly the same as the one computed by utils.evaluate_preds?

What I have tried:

I tried to implement the categorical_crossentropy loss function following the Keras TensorFlow backend. Here is my code:

import numpy as np

def categorical_crossentropy_keras(preds, labels):
    # scale preds so that the class probabilities of each sample sum to 1
    preds /= np.sum(preds, axis=-1).reshape(-1, 1)
    # clip to avoid log(0); 1e-7 (== 10e-8) is Keras' default epsilon
    _epsilon = 1e-7
    output = np.clip(preds, _epsilon, 1. - _epsilon)
    loss = -np.sum(labels * np.log(output), axis=-1)
    return np.mean(loss)

But the results of this function are exactly the same as those of utils.categorical_crossentropy.

tkipf commented Oct 24, 2017

Thanks for your question. Note that the evaluate_preds function takes an additional index/mask array, e.g. idx_train, which is not the case in your Keras-based implementation. This explains the difference in measured accuracy.
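
To illustrate the point, here is a minimal NumPy sketch (the sizes and the random arrays below are made-up stand-ins, not variables from the repo): averaging correctness over every node of the graph against y_train, whose rows are zero outside the training set, gives a very different number from averaging only over the nodes selected by idx_train, which is what evaluate_preds does.

import numpy as np

# Illustration only -- made-up stand-ins, not variables from the repo:
# Cora-like sizes with 2708 nodes, 7 classes, 140 labelled training nodes.
num_nodes, num_classes = 2708, 7
rng = np.random.default_rng(0)
preds = rng.random((num_nodes, num_classes))  # stands in for model.predict output
labels = np.eye(num_classes)[rng.integers(0, num_classes, size=num_nodes)]
idx_train = np.arange(140)

# y_train as used above: zero rows outside the training set
y_train = np.zeros_like(labels)
y_train[idx_train] = labels[idx_train]

# Accuracy averaged over every node in the graph, with no index/mask applied
acc_unmasked = np.mean(np.argmax(preds, axis=1) == np.argmax(y_train, axis=1))

# Accuracy averaged only over the nodes in idx_train, as evaluate_preds does
acc_masked = np.mean(
    np.argmax(preds[idx_train], axis=1) == np.argmax(labels[idx_train], axis=1))

print(acc_unmasked, acc_masked)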

dawnranger commented Nov 30, 2017

Sorry for the late response. I think I have solved this issue. I mistakenly thought that the sample_weight argument alone was enough to apply the mask to the input data; in fact, we also need to pass the metrics via weighted_metrics rather than metrics. With that change, evaluate_preds can be replaced by model.evaluate, so there is no need to implement the categorical_crossentropy and accuracy metrics by hand, and the code can be simplified to a pure Keras implementation without changing the results. Here is my solution:

  1. Add weighted_metrics categorical_crossentropy and accuracy to model.compile:
model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=0.01),
              weighted_metrics=['categorical_crossentropy', 'accuracy'])
  2. Evaluate the training data using model.evaluate:
_, train_loss, train_acc = model.evaluate(
    graph, y_train, sample_weight=train_mask, batch_size=X.shape[0], verbose=0)

To simplify things further, we can also get rid of the manual loop over epochs and use an EarlyStopping callback, as sketched below.
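
For instance, something along these lines (just a sketch, not tested against the repo; model, graph, y_train, y_val, train_mask and X are the variables already defined in the training script, while val_mask is a hypothetical boolean mask for the validation nodes, built the same way as train_mask):

from keras.callbacks import EarlyStopping

# Sketch only: full-batch training on the whole graph with early stopping
# on validation loss, replacing the hand-written epoch loop.
model.fit(graph, y_train,
          sample_weight=train_mask,
          validation_data=(graph, y_val, val_mask),  # val_mask: assumed validation mask
          batch_size=X.shape[0],
          epochs=200,
          shuffle=False,  # keep node order aligned with the adjacency input
          callbacks=[EarlyStopping(monitor='val_loss', patience=10)],
          verbose=2)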

tkipf commented Dec 3, 2017

This sounds good, thanks for looking into this! Feel free to make a pull request if you think this might be helpful for other users as well.

tkipf closed this as completed on Dec 3, 2017.