
Question regarding evaluation on S3DIS #30

Open
chrischoy opened this issue Feb 24, 2020 · 1 comment

@chrischoy

Dear Bo,

Thanks for sharing the code. It was pretty easy to run, but I have a few questions regarding the evaluation on S3DIS.

I trained your network from scratch by simply running main_train.py on S3DIS, but the final evaluation results on Area 5 differ from the ones reported in the paper, and I would like to know the cause of the discrepancy.

In Table 3 of your paper https://arxiv.org/pdf/1906.01140.pdf, the Area 5 mPrec and mRec are 57.5 and 40.2, but the final results I got were 53.36 and 40.55, respectively.

Are the default configuration variables different from the ones used for the results reported in the paper?

My second question is: why did you report mPrec and mRec separately? It is quite standard to use mAP, which is the unweighted class-wise average of the area under the precision-recall curve over all classes. What are your thoughts on the evaluation metrics?
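
For concreteness, here is a rough sketch of the class-wise AP averaging I have in mind (VOC-style area under the precision-recall curve at a fixed IoU threshold). This is only an illustration of the metric, not your evaluation code:

```python
# Illustration only: unweighted class-wise mAP from per-class detection lists.
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """AP for one class: area under the (interpolated) precision-recall curve."""
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(is_tp, dtype=np.float64)[order]
    fp = 1.0 - tp
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(fp)
    recall = tp_cum / max(num_gt, 1)
    precision = tp_cum / np.maximum(tp_cum + fp_cum, 1e-9)
    # VOC-style: enforce a monotonically decreasing precision envelope, then integrate
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    changed = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[changed + 1] - mrec[changed]) * mpre[changed + 1]))

def mean_average_precision(per_class):
    """per_class: {class_name: (scores, is_tp, num_gt)} -> unweighted mean of APs."""
    return float(np.mean([average_precision(s, t, n) for s, t, n in per_class.values()]))
```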

Finally, would it be possible for you to share the class-wise results of the reported baselines? I would like to compute mAP for all baselines, so it would be great if you could share the results you obtained on S3DIS.

Thanks!
Chris

@Yang7879 (Owner)

Hi @chrischoy, thanks for your interest in our paper.
(1) The results can indeed differ from the paper when the model is retrained; they can be better or worse. For example, the released model for Area 5 (trained after the NeurIPS submission) is better than the numbers in the paper, but retraining may also yield worse results, as you observed. I suspect the primary reason is the Hungarian algorithm used for bounding box association, which can introduce instability during training; a more stable assignment and back-propagation scheme seems worth exploring. (Btw, all network configurations are the same as in the paper.)
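
For context, the assignment step itself looks roughly like the sketch below. It uses a plain L2 cost over box parameters purely to illustrate the matching; the actual association cost in the paper is more involved:

```python
# Sketch of the Hungarian assignment between predicted and ground-truth boxes.
# A plain L2 cost over box parameters is used here only for illustration.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_boxes(pred_boxes, gt_boxes):
    """pred_boxes: (H, 6), gt_boxes: (T, 6) with T <= H; returns the predicted
    box index assigned to each ground-truth box."""
    cost = np.linalg.norm(gt_boxes[:, None, :] - pred_boxes[None, :, :], axis=-1)
    gt_idx, pred_idx = linear_sum_assignment(cost)  # minimises the total cost
    return pred_idx
```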

(2) I agree that mAP is the more general metric for object detection or instance segmentation. However, the mAP scores reported in the first paper, SGPN, are incorrect according to their released code, as is also pointed out in GSPN. For a fair comparison with ASIS, which was the SoTA on S3DIS, we simply follow their mPrec/mRec protocol. For the benefit of the community, I strongly believe a standard mAP protocol and a correct implementation are quite important.
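
Roughly, that protocol works as outlined below (my paraphrase, not the exact evaluation script): a prediction counts as correct if its best IoU with a ground-truth instance of the same class exceeds 0.5, and a ground-truth instance counts as recalled if some prediction overlaps it above the same threshold:

```python
# Rough outline of per-class precision/recall at IoU 0.5 (not the exact script).
import numpy as np

def prec_rec_one_class(pred_masks, gt_masks, iou_thresh=0.5):
    """pred_masks / gt_masks: lists of boolean point masks for one semantic class."""
    def iou(a, b):
        return np.logical_and(a, b).sum() / max(np.logical_or(a, b).sum(), 1)
    # a prediction is a true positive if it overlaps some ground truth above the threshold
    tp_pred = sum(1 for p in pred_masks
                  if gt_masks and max(iou(p, g) for g in gt_masks) > iou_thresh)
    # a ground-truth instance is covered if some prediction overlaps it above the threshold
    tp_gt = sum(1 for g in gt_masks
                if pred_masks and max(iou(p, g) for p in pred_masks) > iou_thresh)
    return tp_pred / max(len(pred_masks), 1), tp_gt / max(len(gt_masks), 1)
```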

(3) Here are the per-category prec/rec scores of 3D-BoNet (6-fold cross-validation); unfortunately, the results for the other baselines are no longer available.

--------- precision / recall ---------
ceiling: 0.8852/0.6180
floor: 0.8989/0.7464
wall: 0.6487/0.4999
beam: 0.4230/0.4217
column: 0.4801/0.2716
window: 0.9301/0.6242
door: 0.6676/0.5845
table: 0.5539/0.4861
chair: 0.7198/0.6158
sofa: 0.4972/0.2876
bookcase: 0.5830/0.2843
board: 0.8074/0.4648
clutter: 0.4762/0.2860
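
For convenience, averaging the 13 per-class scores above gives the overall means:

```python
# Unweighted means of the 13 per-class scores listed above.
prec = [0.8852, 0.8989, 0.6487, 0.4230, 0.4801, 0.9301, 0.6676,
        0.5539, 0.7198, 0.4972, 0.5830, 0.8074, 0.4762]
rec  = [0.6180, 0.7464, 0.4999, 0.4217, 0.2716, 0.6242, 0.5845,
        0.4861, 0.6158, 0.2876, 0.2843, 0.4648, 0.2860]
print(sum(prec) / len(prec), sum(rec) / len(rec))  # roughly 0.659 and 0.476
```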
