
Reproduce issues #8

Open
yyNoBug opened this issue Jun 21, 2024 · 8 comments
yyNoBug commented Jun 21, 2024

Hi authors! Thank you so much for your code and scripts. I am trying to replicate your DiM-H results. I trained the model on ImageNet with the provided config configs/imagenet256_H_DiM.py for 300k steps, but I only get an FID score of 2.7. While this is not a huge gap from your checkpoint, it is still a bit strange to see such a difference since I used the exact same settings. I would really appreciate any insights you might have into this issue.

Here are the FID results.

2024-06-20 14:43:03,462 - eval_ldm_discrete.py - load nnet from /users/yuany/DiM-DiffusionMamba/workdir/imagenet256_H_DiM/default/ckpts/300000.ckpt/nnet_ema.pth
2024-06-20 14:43:19,043 - eval_ldm_discrete.py - Use classifier free guidance with scale=0.4
2024-06-20 14:43:19,044 - eval_ldm_discrete.py - algorithm: dpm_solver
cfg: true
mini_batch_size: 25
n_samples: 50000
path: ''
sample_steps: 50
scale: 0.4
2024-06-20 14:43:19,045 - eval_ldm_discrete.py - sample: n_samples=50000, mode=cond, mixed_precision=bf16
2024-06-20 14:43:19,046 - eval_ldm_discrete.py - Samples are saved in /tmp/tmpmo7hdec1
2024-06-20 16:50:23,256 - eval_ldm_discrete.py - nnet_path=/users/yuany/DiM-DiffusionMamba/workdir/imagenet256_H_DiM/default/ckpts/300000.ckpt/nnet_ema.pth, fid=2.7085910746786794

Do you know what might be the issue? Thanks a lot for your help!
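For reference, the sampling settings printed in this log can be collected into a U-ViT-style ml_collections block. This is only a sketch reconstructed from the logged values; the actual field names and layout in configs/imagenet256_H_DiM.py may differ.

```python
# Sketch of the sampling configuration implied by the eval log above.
# Field names are assumptions; values are copied from the logged output.
import ml_collections

def sample_config():
    config = ml_collections.ConfigDict()
    config.cfg = True               # classifier-free guidance enabled
    config.scale = 0.4              # CFG scale reported in the log
    config.sample_steps = 50        # dpm_solver sampling steps
    config.n_samples = 50000        # FID-50K
    config.mini_batch_size = 25
    config.path = ''                # samples go to a temporary directory when empty
    return config
```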

yyNoBug changed the title from "Replication issues" to "Reproduce issues" on Jun 21, 2024
yyNoBug (Author) commented Jun 21, 2024

Here are my training logs.

output.log

tyshiwo1 (Owner) commented Jun 21, 2024

As you can see, our checkpoint provided in the shared Google Drive link achieves fid10000=8.916990087748843 at step=300000, and yours reaches fid10000=8.782864443207245 at the same step, so the FID with CFG should also be close.

Fortunately, I found my experimental records. My FID-50K with CFG at 300K steps is fid=2.4557618155982937. A difference of 0.25 FID is somewhat large; a difference of around 0.1 would be acceptable.

It is weird.
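For context on the scale of these numbers, FID is the Fréchet distance between Gaussians fitted to Inception features of the reference set and of the generated samples. Below is a minimal sketch of that computation from precomputed statistics; the 'mu'/'sigma' keys follow the pytorch-fid convention and the file name is only illustrative, so this is not necessarily the exact code path used by eval_ldm_discrete.py.

```python
# Minimal sketch of the standard FID computation from (mu, sigma) statistics.
# Assumes pytorch-fid-style reference stats; keys and file names are illustrative.
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2, eps=1e-6):
    diff = mu1 - mu2
    # Matrix square root of the product of the two covariance matrices.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if not np.isfinite(covmean).all():
        # Add a small diagonal offset for numerical stability.
        offset = np.eye(sigma1.shape[0]) * eps
        covmean, _ = linalg.sqrtm((sigma1 + offset) @ (sigma2 + offset), disp=False)
    covmean = covmean.real
    return diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2 * np.trace(covmean)

# Illustrative usage: compare generated-sample stats against the reference stats.
# ref = np.load('fid_stats_imagenet256.npz')   # hypothetical file name
# fid = frechet_distance(mu_gen, sigma_gen, ref['mu'], ref['sigma'])
```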

yyNoBug (Author) commented Jun 21, 2024

Yes. I tested your checkpoint and it gives me something around 2.42, which is reasonable. However, the model I trained myself cannot reach this number. Do you have any idea what might be the cause?

tyshiwo1 (Owner) commented Jun 21, 2024

Did you use the same config for evaluation as in training?

yyNoBug (Author) commented Jun 21, 2024

Yes, I used the same config you have provided.

tyshiwo1 (Owner) commented Jun 21, 2024

OK.
Are your .npz reference statistics for computing FID and the autoencoder (AE) checkpoint also taken from U-ViT?

yyNoBug (Author) commented Jun 21, 2024

Yes, that's correct.

Does the loss in my training log look reasonable?

tyshiwo1 (Owner) commented Jun 21, 2024

(attached screenshot comparing the training-loss curves)

Your loss seems slightly higher than mine. This may explain the difference in performance. Here is my log file:
output.log
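One way to make this comparison concrete is to extract the logged loss values from both output.log files and plot them together. The sketch below assumes the training log contains entries of the form loss=<float>; the regex pattern and file names are assumptions, not the repo's actual logging format.

```python
# Sketch: extract and compare loss values from two training logs.
# The 'loss=<float>' pattern is an assumption about the log format.
import re
import matplotlib.pyplot as plt

def read_losses(path):
    pattern = re.compile(r'loss[=:]\s*([0-9]*\.[0-9]+)')
    losses = []
    with open(path) as f:
        for line in f:
            m = pattern.search(line)
            if m:
                losses.append(float(m.group(1)))
    return losses

mine = read_losses('my_output.log')           # hypothetical file names
theirs = read_losses('reference_output.log')
plt.plot(mine, label='my run')
plt.plot(theirs, label='reference run')
plt.xlabel('logged step')
plt.ylabel('training loss')
plt.legend()
plt.savefig('loss_comparison.png')
```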
