
Reproduce issues #8

Open
yyNoBug opened this issue Jun 21, 2024 · 8 comments
yyNoBug commented Jun 21, 2024

Hi authors! Thank you so much for your code and scripts. I am trying to replicate your DiM-H results. I trained the model on ImageNet with the provided config configs/imagenet256_H_DiM.py for 300k steps, but I only get an FID score of 2.7. While this is not a huge gap from your checkpoint, it is still a bit strange to see such a difference since I used the exact same settings. I would really appreciate any insights you might have into this issue.

Here are the FID results.

2024-06-20 14:43:03,462 - eval_ldm_discrete.py - load nnet from /users/yuany/DiM-DiffusionMamba/workdir/imagenet256_H_DiM/default/ckpts/300000.ckpt/nnet_ema.pth
2024-06-20 14:43:19,043 - eval_ldm_discrete.py - Use classifier free guidance with scale=0.4
2024-06-20 14:43:19,044 - eval_ldm_discrete.py - algorithm: dpm_solver
cfg: true
mini_batch_size: 25
n_samples: 50000
path: ''
sample_steps: 50
scale: 0.4
2024-06-20 14:43:19,045 - eval_ldm_discrete.py - sample: n_samples=50000, mode=cond, mixed_precision=bf16
2024-06-20 14:43:19,046 - eval_ldm_discrete.py - Samples are saved in /tmp/tmpmo7hdec1
2024-06-20 16:50:23,256 - eval_ldm_discrete.py - nnet_path=/users/yuany/DiM-DiffusionMamba/workdir/imagenet256_H_DiM/default/ckpts/300000.ckpt/nnet_ema.pth, fid=2.7085910746786794

Do you know what might be the issue? Thanks a lot for your help!
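For reference, the sampling settings printed in this log can be collected into a U-ViT-style ml_collections block. This is only a sketch reconstructed from the logged values; the actual field names and layout in configs/imagenet256_H_DiM.py may differ.

```python
# Sketch of the sampling configuration implied by the eval log above.
# Field names are assumptions; values are copied from the logged output.
import ml_collections

def sample_config():
    config = ml_collections.ConfigDict()
    config.cfg = True               # classifier-free guidance enabled
    config.scale = 0.4              # CFG scale reported in the log
    config.sample_steps = 50        # dpm_solver sampling steps
    config.n_samples = 50000        # FID-50K
    config.mini_batch_size = 25
    config.path = ''                # samples go to a temporary directory when empty
    return config
```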

yyNoBug changed the title from "Replication issues" to "Reproduce issues" on Jun 21, 2024
yyNoBug (Author) commented Jun 21, 2024

Here are my training logs.

output.log

tyshiwo1 (Owner) commented Jun 21, 2024

As you can see, our checkpoint provided in the shared Google Drive link achieves fid10000=8.916990087748843 at step=300000, and yours reaches fid10000=8.782864443207245 at the same step, so the FID with CFG should also be close.

Fortunately, I found my experimental records. My FID-50K with CFG at 300K steps is fid=2.4557618155982937. A difference of 0.25 FID is somewhat large; a difference of around 0.1 would be acceptable.

It is weird.
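For context on the scale of these numbers, FID is the Fréchet distance between Gaussians fitted to Inception features of the reference set and of the generated samples. Below is a minimal sketch of that computation from precomputed statistics; the 'mu'/'sigma' keys follow the pytorch-fid convention and the file name is only illustrative, so this is not necessarily the exact code path used by eval_ldm_discrete.py.

```python
# Minimal sketch of the standard FID computation from (mu, sigma) statistics.
# Assumes pytorch-fid-style reference stats; keys and file names are illustrative.
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2, eps=1e-6):
    diff = mu1 - mu2
    # Matrix square root of the product of the two covariance matrices.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if not np.isfinite(covmean).all():
        # Add a small diagonal offset for numerical stability.
        offset = np.eye(sigma1.shape[0]) * eps
        covmean, _ = linalg.sqrtm((sigma1 + offset) @ (sigma2 + offset), disp=False)
    covmean = covmean.real
    return diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2 * np.trace(covmean)

# Illustrative usage: compare generated-sample stats against the reference stats.
# ref = np.load('fid_stats_imagenet256.npz')   # hypothetical file name
# fid = frechet_distance(mu_gen, sigma_gen, ref['mu'], ref['sigma'])
```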

yyNoBug (Author) commented Jun 21, 2024

Yes. I tested your checkpoint and it gives me something around 2.42, which is reasonable. However, the model I trained myself cannot reach this number. Do you have any idea what might be the cause?

tyshiwo1 (Owner) commented Jun 21, 2024

Did you use the same config for evaluation as in training?

yyNoBug (Author) commented Jun 21, 2024

Yes, I used the same config you have provided.

tyshiwo1 (Owner) commented Jun 21, 2024

OK.
Are your .npz reference statistics for computing FID and the autoencoder (AE) checkpoint also taken from U-ViT?

yyNoBug (Author) commented Jun 21, 2024

Yes, that's correct.

Does the loss in my training log look reasonable?

tyshiwo1 (Owner) commented Jun 21, 2024

(attached screenshot comparing the training-loss curves)

Your loss seems slightly higher than mine. This may explain the difference in performance. Here is my log file:
output.log
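One way to make this comparison concrete is to extract the logged loss values from both output.log files and plot them together. The sketch below assumes the training log contains entries of the form loss=<float>; the regex pattern and file names are assumptions, not the repo's actual logging format.

```python
# Sketch: extract and compare loss values from two training logs.
# The 'loss=<float>' pattern is an assumption about the log format.
import re
import matplotlib.pyplot as plt

def read_losses(path):
    pattern = re.compile(r'loss[=:]\s*([0-9]*\.[0-9]+)')
    losses = []
    with open(path) as f:
        for line in f:
            m = pattern.search(line)
            if m:
                losses.append(float(m.group(1)))
    return losses

mine = read_losses('my_output.log')           # hypothetical file names
theirs = read_losses('reference_output.log')
plt.plot(mine, label='my run')
plt.plot(theirs, label='reference run')
plt.xlabel('logged step')
plt.ylabel('training loss')
plt.legend()
plt.savefig('loss_comparison.png')
```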
