The evaluation results are different #25

Closed
Bigtuo opened this issue Jul 10, 2021 · 4 comments

Bigtuo commented Jul 10, 2021

epoch20
WSADE: 15.483413498840125
ADEv, ADEp, ADEb: 15.903188312281609 15.339882580782012 15.480199725137442
WSFDE: 15.815679429015853
FDEv, FDEp, FDEb: 16.56610081965146 15.41657854746371 16.185653216166408
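(For context: ApolloScape's WSADE/WSFDE are class-weighted sums of the per-class ADE/FDE. The sketch below is not part of the repository; the weights 0.20/0.58/0.22 for vehicle/pedestrian/bicycle match ApolloScape's weighting scheme and reproduce the numbers quoted above.)

```python
# ApolloScape's weighted metrics are class-weighted sums of ADE/FDE.
# Assumed weights (vehicle, pedestrian, bicycle); they reproduce the
# WSADE/WSFDE values quoted in this thread.
W_V, W_P, W_B = 0.20, 0.58, 0.22

def weighted(m_v, m_p, m_b):
    return W_V * m_v + W_P * m_p + W_B * m_b

print(weighted(15.903188, 15.339883, 15.480200))  # ~15.4834, the WSADE above
print(weighted(16.566101, 15.416579, 16.185653))  # ~15.8157, the WSFDE above
```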

xincoder (Owner) commented:

Hi @Bigtuo, thank you for trying our code.
If this is what you get after submitting your results to the ApolloScape website, there is a high probability that your code has bugs. Because you did not provide any useful information about how you got this result (code modifications, GPUs, OS, libraries, etc.), I cannot figure out the reason.

Many researchers have tried our code and reproduced our results.
The code we released gets similar results (after submitting the results to ApolloScape):

      WSADE   ADEv    ADEp    ADEb    WSFDE   FDEv    FDEp    FDEb
GRIP  1.2524  2.2081  0.7142  1.8024  2.3440  3.9805  1.3732  3.4155

If you are sure that your code does not have any bugs, there are some factors that may affect the precision (though not on a large scale); see the sketch after this list:

  1. The GPU. Different GPUs have different numerical precision.
  2. Driver/library versions. The NVIDIA driver, CUDA, cuDNN, PyTorch, Python, and even the OS can affect the precision.
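
Not from the repository, but a minimal sketch of how one would reduce these sources of nondeterminism in a PyTorch setup like this one (standard seeding and cuDNN flags; they reduce, but cannot fully remove, cross-machine differences):

```python
import random

import numpy as np
import torch

def seed_everything(seed: int = 0) -> None:
    """Pin every RNG and ask cuDNN for deterministic kernels."""
    random.seed(seed)                  # Python RNG
    np.random.seed(seed)               # NumPy RNG
    torch.manual_seed(seed)            # CPU RNG
    torch.cuda.manual_seed_all(seed)   # all GPU RNGs
    # Pick deterministic cuDNN kernels instead of auto-tuned ones.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

seed_everything(0)
```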

Bigtuo commented Jul 10, 2021

I didn't make any changes, and I used ApolloScape's evaluation.py. Besides the mismatch with your results, my numbers for the official baseline model are also inconsistent with the published ones.
The evaluation results of the baseline model are as follows:
WSADE: 35.52371758873721
ADEv, ADEp, ADEb: 35.73273751079803 36.26945977251843 33.367651902349614
WSFDE: 18.967547482188102
FDEv, FDEp, FDEb: 24.395048414119227 16.602132518526005 20.269549720996242

How should I solve this?

xincoder (Owner) commented:

@Bigtuo, if you did not change the code and directly ran evaluation.py, you are expected to get a very bad result. They clearly state that "./test_eval_data/prediction_gt.txt is just for testing the code which is not the real ground truth.", which means the local script evaluates against a placeholder ground truth. Please submit your results to their website to get the true error. Thanks.
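
(For reference, a sketch of the standard ADE/FDE definitions that evaluation scripts of this kind compute per class; the array shapes here are an assumption for illustration, not ApolloScape's actual file format:)

```python
import numpy as np

def ade_fde(pred, gt):
    """pred, gt: (num_objects, num_steps, 2) arrays of predicted/true x-y positions.

    ADE averages the Euclidean displacement over every step;
    FDE averages it over the final step only.
    """
    dist = np.linalg.norm(pred - gt, axis=-1)  # (num_objects, num_steps)
    return dist.mean(), dist[:, -1].mean()
```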

Bigtuo commented Jul 10, 2021 via email
