Question on dynamic results #2

Closed

JunMa11 opened this issue Oct 19, 2019 · 4 comments


JunMa11 commented Oct 19, 2019

Dear @yulequan,

Thanks for sharing the great code. It's very clear and works out of the box.

Question on "dynamic" results

My friend and I ran the code (without any modification) and got the following results.
The results vary a little between runs. Some metrics can be reproduced, some (red) are even better than the results reported in the paper, but some (blue) are degraded.

Could you share your insights on this variability, and what could be the possible reason for the degraded results?

We also tried re-running the code on a local server; however, the results are similar.

Results

[results screenshot]

A minor bug

Here, the case folder name is missing from the output file name, so all the saved results share the same name and are overwritten during saving (a possible fix is sketched after the snippet below).

UA-MT/code/test_util.py

Lines 28 to 31 in da31df5

if save_result:
nib.save(nib.Nifti1Image(prediction.astype(np.float32), np.eye(4)), test_save_path + id + "_pred.nii.gz")
nib.save(nib.Nifti1Image(image[:].astype(np.float32), np.eye(4)), test_save_path + id + "_img.nii.gz")
nib.save(nib.Nifti1Image(label[:].astype(np.float32), np.eye(4)), test_save_path + id + "_gt.nii.gz")
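
A possible fix, sketched under the assumption that each image_path ends in <case_folder>/mri_norm2.h5, is to derive the id from the case folder rather than from the (shared) file name:

```python
# Hypothetical sketch, not the repository's exact code: use the case folder
# name as the unique id, assuming image_path looks like
# .../<case_folder>/mri_norm2.h5.
id = image_path.split('/')[-2]  # case folder name, unique per case
nib.save(nib.Nifti1Image(prediction.astype(np.float32), np.eye(4)),
         test_save_path + id + "_pred.nii.gz")
```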

Finally, I really appreciate that you have made the code publicly available. It is well written and will be great learning material for me.

Looking forward to your reply.
Best,
Jun


yulequan commented Oct 19, 2019

Hi Jun,

Thanks for your interest in our work. The results of each model do vary, because of the small amount of data in the medical image domain. Most of the results in the paper are averaged over three runs.

Furthermore, we also find that UAMT-UN is more robust (stable) than UAMT. So I would suggest adding the consistency loss on unlabeled data only, and then seeing whether adding the consistency loss on labeled data as well brings further improvement; a rough sketch of the unlabeled-only consistency term follows.
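
As a reference point, here is a minimal PyTorch sketch of a mean-teacher consistency term applied to the unlabeled part of a batch only; the name labeled_bs and the softmax-MSE form are assumptions for illustration, not necessarily the repository's exact code:

```python
import torch
import torch.nn.functional as F

def consistency_loss_unlabeled(student_logits, teacher_logits, labeled_bs):
    """Softmax-MSE consistency between student and teacher predictions.

    Assumes the first `labeled_bs` samples in the batch are labeled; only
    the remaining (unlabeled) samples contribute to the consistency loss.
    """
    student_prob = torch.softmax(student_logits[labeled_bs:], dim=1)
    teacher_prob = torch.softmax(teacher_logits[labeled_bs:], dim=1).detach()
    return F.mse_loss(student_prob, teacher_prob)
```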


JunMa11 commented Oct 20, 2019

Hi @yulequan ,

Thank you very much for your reply.
Interestingly, I trained the UAMT_unlabel model twice on the same GPU server, and the test results are exactly the same.

1st training

[test results screenshot]

2nd training

[test results screenshot]

Do the results of each model only vary across different GPU servers?

Best,
Jun

yulequan commented

I have fixed the random seed in the code, so if you run the experiment on the same machine, the results should be similar. I am not sure whether we can get the same results on different machines and environments, even with the same random seed.
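
For context, seed fixing along the following lines (a generic sketch, not necessarily the exact code in this repository) makes runs repeatable on one machine, while different hardware, drivers, or cuDNN versions can still change the results:

```python
import random
import numpy as np
import torch

def set_seed(seed):
    # Seed every RNG the training pipeline touches.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Request deterministic cuDNN kernels: repeatable on the same
    # machine/driver, but not guaranteed identical across machines.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```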


JunMa11 commented Oct 20, 2019

Got it. Thank you very much for your reply.
