The evaluation results are different #25

Closed
Bigtuo opened this issue Jul 10, 2021 · 4 comments

Bigtuo commented Jul 10, 2021

epoch20
WSADE: 15.483413498840125
ADEv, ADEp, ADEb: 15.903188312281609 15.339882580782012 15.480199725137442
WSFDE: 15.815679429015853
FDEv, FDEp, FDEb: 16.56610081965146 15.41657854746371 16.185653216166408
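(For context: ApolloScape's WSADE/WSFDE are class-weighted sums of the per-class ADE/FDE. The sketch below is not part of the repository; the weights 0.20/0.58/0.22 for vehicle/pedestrian/bicycle match ApolloScape's weighting scheme and reproduce the numbers quoted above.)

```python
# ApolloScape's weighted metrics are class-weighted sums of ADE/FDE.
# Assumed weights (vehicle, pedestrian, bicycle); they reproduce the
# WSADE/WSFDE values quoted in this thread.
W_V, W_P, W_B = 0.20, 0.58, 0.22

def weighted(m_v, m_p, m_b):
    return W_V * m_v + W_P * m_p + W_B * m_b

print(weighted(15.903188, 15.339883, 15.480200))  # ~15.4834, the WSADE above
print(weighted(16.566101, 15.416579, 16.185653))  # ~15.8157, the WSFDE above
```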

xincoder (Owner) commented:

Hi @Bigtuo, thank you for trying our code.
If this is what you get after submitting your results to the ApolloScape website, there is a high probability that your code has bugs. Because you did not provide any useful information about how you got this result (code modifications, GPUs, OS, libraries, etc.), I cannot figure out the reason.

Many researchers have tried our code and reproduced our results.
The code we released gets similar results (after submitting the results to ApolloScape):

      WSADE   ADEv    ADEp    ADEb    WSFDE   FDEv    FDEp    FDEb
GRIP  1.2524  2.2081  0.7142  1.8024  2.3440  3.9805  1.3732  3.4155

If you are sure that your code does not have any bugs, there are some factors that may affect the precision (though not on a large scale); see the sketch after this list:

  1. The GPU. Different GPUs have different numerical precision.
  2. Driver/library versions. The NVIDIA driver, CUDA, cuDNN, PyTorch, Python, and even the OS can affect the precision.
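
Not from the repository, but a minimal sketch of how one would reduce these sources of nondeterminism in a PyTorch setup like this one (standard seeding and cuDNN flags; they reduce, but cannot fully remove, cross-machine differences):

```python
import random

import numpy as np
import torch

def seed_everything(seed: int = 0) -> None:
    """Pin every RNG and ask cuDNN for deterministic kernels."""
    random.seed(seed)                  # Python RNG
    np.random.seed(seed)               # NumPy RNG
    torch.manual_seed(seed)            # CPU RNG
    torch.cuda.manual_seed_all(seed)   # all GPU RNGs
    # Pick deterministic cuDNN kernels instead of auto-tuned ones.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

seed_everything(0)
```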

Bigtuo commented Jul 10, 2021

I didn't make any changes, and I used ApolloScape's evaluation.py. Besides the mismatch with your results, my numbers for the official baseline model are also inconsistent with the published ones.
The evaluation results of the baseline model are as follows:
WSADE: 35.52371758873721
ADEv, ADEp, ADEb: 35.73273751079803 36.26945977251843 33.367651902349614
WSFDE: 18.967547482188102
FDEv, FDEp, FDEb: 24.395048414119227 16.602132518526005 20.269549720996242

How should I solve this?

xincoder (Owner) commented:

@Bigtuo, if you did not change the code and directly ran evaluation.py, you are expected to get a very bad result. They clearly state that "./test_eval_data/prediction_gt.txt is just for testing the code which is not the real ground truth.", which means the local script evaluates against a placeholder ground truth. Please submit your results to their website to get the true error. Thanks.
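
(For reference, a sketch of the standard ADE/FDE definitions that evaluation scripts of this kind compute per class; the array shapes here are an assumption for illustration, not ApolloScape's actual file format:)

```python
import numpy as np

def ade_fde(pred, gt):
    """pred, gt: (num_objects, num_steps, 2) arrays of predicted/true x-y positions.

    ADE averages the Euclidean displacement over every step;
    FDE averages it over the final step only.
    """
    dist = np.linalg.norm(pred - gt, axis=-1)  # (num_objects, num_steps)
    return dist.mean(), dist[:, -1].mean()
```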

Bigtuo commented Jul 10, 2021 via email
