
Question regarding evaluation on S3DIS #30

Open
chrischoy opened this issue Feb 24, 2020 · 1 comment

@chrischoy

Dear Bo,

Thanks for sharing the code. It was pretty easy to run, but I have a few questions regarding the evaluation on S3DIS.

I trained your network from scratch by simply running main_train.py on S3DIS, but the final evaluation results on Area 5 differ from the ones reported in the paper, and I would like to know the cause of the discrepancy.

In Table 3 of your paper https://arxiv.org/pdf/1906.01140.pdf, the Area 5 mPrec and mRec are 57.5 and 40.2, but the final results I got were 53.36 and 40.55, respectively.

Are the default configuration variables different from the ones used for the results reported in the paper?

My second question is: why did you report mPrec and mRec separately? It is quite standard to use mAP, which is the unweighted class-wise average of the area under the precision-recall curve over all classes. What are your thoughts on the evaluation metrics?
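
For concreteness, here is a rough sketch of the class-wise AP averaging I have in mind (VOC-style area under the precision-recall curve at a fixed IoU threshold). This is only an illustration of the metric, not your evaluation code:

```python
# Illustration only: unweighted class-wise mAP from per-class detection lists.
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """AP for one class: area under the (interpolated) precision-recall curve."""
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(is_tp, dtype=np.float64)[order]
    fp = 1.0 - tp
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(fp)
    recall = tp_cum / max(num_gt, 1)
    precision = tp_cum / np.maximum(tp_cum + fp_cum, 1e-9)
    # VOC-style: enforce a monotonically decreasing precision envelope, then integrate
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    changed = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[changed + 1] - mrec[changed]) * mpre[changed + 1]))

def mean_average_precision(per_class):
    """per_class: {class_name: (scores, is_tp, num_gt)} -> unweighted mean of APs."""
    return float(np.mean([average_precision(s, t, n) for s, t, n in per_class.values()]))
```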

Finally, would it be possible for you to share the class-wise results of the reported baselines? I would like to compute mAP for all baselines, so it would be great if you could share the results you obtained on S3DIS.

Thanks!
Chris

@Yang7879 (Owner)

Hi @chrischoy, thanks for your interest in our paper.
(1) The results can indeed differ from the paper when the model is retrained; they can be better or worse. For example, the released model for Area 5 (trained after the NeurIPS submission) is better than the numbers in the paper, but retraining may also yield worse results, as you observed. I suspect the primary reason is the Hungarian algorithm used for bounding box association, which can introduce instability during training; a more stable assignment and back-propagation scheme seems worth exploring. (Btw, all network configurations are the same as in the paper.)
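
For context, the assignment step itself looks roughly like the sketch below. It uses a plain L2 cost over box parameters purely to illustrate the matching; the actual association cost in the paper is more involved:

```python
# Sketch of the Hungarian assignment between predicted and ground-truth boxes.
# A plain L2 cost over box parameters is used here only for illustration.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_boxes(pred_boxes, gt_boxes):
    """pred_boxes: (H, 6), gt_boxes: (T, 6) with T <= H; returns the predicted
    box index assigned to each ground-truth box."""
    cost = np.linalg.norm(gt_boxes[:, None, :] - pred_boxes[None, :, :], axis=-1)
    gt_idx, pred_idx = linear_sum_assignment(cost)  # minimises the total cost
    return pred_idx
```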

(2) I agree that mAP is the more general metric for object detection or instance segmentation. However, the mAP scores reported in the first paper, SGPN, are incorrect according to their released code, as is also pointed out in GSPN. For a fair comparison with ASIS, which was the SoTA on S3DIS, we simply follow their mPrec/mRec protocol. For the benefit of the community, I strongly believe a standard mAP protocol and a correct implementation are quite important.
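
Roughly, that protocol works as outlined below (my paraphrase, not the exact evaluation script): a prediction counts as correct if its best IoU with a ground-truth instance of the same class exceeds 0.5, and a ground-truth instance counts as recalled if some prediction overlaps it above the same threshold:

```python
# Rough outline of per-class precision/recall at IoU 0.5 (not the exact script).
import numpy as np

def prec_rec_one_class(pred_masks, gt_masks, iou_thresh=0.5):
    """pred_masks / gt_masks: lists of boolean point masks for one semantic class."""
    def iou(a, b):
        return np.logical_and(a, b).sum() / max(np.logical_or(a, b).sum(), 1)
    # a prediction is a true positive if it overlaps some ground truth above the threshold
    tp_pred = sum(1 for p in pred_masks
                  if gt_masks and max(iou(p, g) for g in gt_masks) > iou_thresh)
    # a ground-truth instance is covered if some prediction overlaps it above the threshold
    tp_gt = sum(1 for g in gt_masks
                if pred_masks and max(iou(p, g) for p in pred_masks) > iou_thresh)
    return tp_pred / max(len(pred_masks), 1), tp_gt / max(len(gt_masks), 1)
```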

(3) Here are the per-category prec/rec scores of 3D-BoNet (6-fold cross-validation); unfortunately, the results for the other baselines are no longer available.

--------- precision / recall ---------
ceiling: 0.8852/0.6180
floor: 0.8989/0.7464
wall: 0.6487/0.4999
beam: 0.4230/0.4217
column: 0.4801/0.2716
window: 0.9301/0.6242
door: 0.6676/0.5845
table: 0.5539/0.4861
chair: 0.7198/0.6158
sofa: 0.4972/0.2876
bookcase: 0.5830/0.2843
board: 0.8074/0.4648
clutter: 0.4762/0.2860
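
For convenience, averaging the 13 per-class scores above gives the overall means:

```python
# Unweighted means of the 13 per-class scores listed above.
prec = [0.8852, 0.8989, 0.6487, 0.4230, 0.4801, 0.9301, 0.6676,
        0.5539, 0.7198, 0.4972, 0.5830, 0.8074, 0.4762]
rec  = [0.6180, 0.7464, 0.4999, 0.4217, 0.2716, 0.6242, 0.5845,
        0.4861, 0.6158, 0.2876, 0.2843, 0.4648, 0.2860]
print(sum(prec) / len(prec), sum(rec) / len(rec))  # roughly 0.659 and 0.476
```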
