
This code chooses best model on test set instead of validation set. #2

Closed
zzxn opened this issue Jan 9, 2021 · 2 comments

zzxn commented Jan 9, 2021

In utils/train.py, the training loop returns the best MRR and the best HIT on the TEST SET, each tracked separately across epochs:

def train(self, epochs, log_interval=100):
    # ...
    for epoch in range(epochs):
        self.model.train()
        for batch in self.train_loader:
            ...  # forward pass, loss, backward pass, optimizer step
        mrr, hit = evaluate(self.model, self.test_loader, self.device)  # mrr and hit on the test set
        # ...
        max_mrr = max(max_mrr, mrr)  # best mrr on the test set so far
        max_hit = max(max_hit, hit)  # best hit on the test set so far
        self.epoch += 1
    return max_mrr, max_hit  # <- NOT OK: best mrr and hit on the TEST SET, each possibly from a different epoch

I believe this code is not the final version; please update it to the code used in your paper.
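For concreteness, here is a rough sketch of what I would expect instead: select the epoch by validation MRR and report the corresponding test metrics. The valid_loader attribute below is hypothetical; it does not exist in the current code.

def train(self, epochs, log_interval=100):
    # ...
    best_valid_mrr = float('-inf')
    best_test_mrr, best_test_hit = 0.0, 0.0
    for epoch in range(epochs):
        self.model.train()
        for batch in self.train_loader:
            ...  # forward pass, loss, backward pass, optimizer step
        # model selection on the (hypothetical) validation loader
        valid_mrr, valid_hit = evaluate(self.model, self.valid_loader, self.device)
        test_mrr, test_hit = evaluate(self.model, self.test_loader, self.device)
        if valid_mrr > best_valid_mrr:
            # keep the test metrics of the epoch with the best validation MRR
            best_valid_mrr = valid_mrr
            best_test_mrr, best_test_hit = test_mrr, test_hit
        self.epoch += 1
    return best_test_mrr, best_test_hit  # test metrics of the model selected on the validation set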

@zzxn changed the title from "This code choose best model on test set instead of validation set." to "This code chooses best model on test set instead of validation set." on Jan 9, 2021
twchen (Owner) commented Jan 10, 2021

This is actually a common practice in session-based recommendation. The same setting was adopted in previous works such as STAMP, SR-GNN, and FGNN, and in recently published works such as DHCN and TAGNN. The same setting can even be found in other areas of recommender systems, for example in LightGCN and NGCF.

Despite the popularity of our setting, I agree with you that, in principle, we should choose the best model on the validation set and evaluate it on the test set. The common approach is to hold out the most recent fraction of the data as the test set, and then hold out the most recent fraction of the remaining data as the validation set. But this approach has its own problem: the validation set and the test set are collected at different times, so their distributions differ, and performance on the validation set may not reflect the true performance on the test set. I guess this is part of the reason some existing methods did not adopt the correct-in-principle approach. I think a better approach is to hold out the most recent fraction of the data and then randomly split it into the validation and test sets (see the sketch below), but I did not think of this approach when we published this work.
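As a rough illustration of that alternative split (the sessions/timestamps arguments, the function name, and the 50/50 split ratio are placeholders, not our preprocessing code):

import numpy as np

def temporal_then_random_split(sessions, timestamps, holdout_frac=0.2, seed=42):
    # hold out the most recent fraction of sessions,
    # then randomly split it into validation and test sets
    order = np.argsort(timestamps)                  # oldest -> newest
    n_holdout = int(len(sessions) * holdout_frac)
    train_idx = order[:-n_holdout]
    holdout_idx = order[-n_holdout:]                # most recent sessions
    rng = np.random.default_rng(seed)
    rng.shuffle(holdout_idx)                        # random split => similar distributions
    valid_idx = holdout_idx[: n_holdout // 2]
    test_idx = holdout_idx[n_holdout // 2:]
    train = [sessions[i] for i in train_idx]
    valid = [sessions[i] for i in valid_idx]
    test = [sessions[i] for i in test_idx]
    return train, valid, test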

In conclusion, I think our setting is OK because it is very common. Besides, it is used for all models in our experiments, so the comparison of the models' performance is fair.
You could update our code to use the validation set to select the best model; it should be quite easy. However, we will not update the code ourselves, in order to keep it consistent with our paper.

zzxn (Author) commented Jan 10, 2021

Still confused by this common practice, but thanks for your reply.
