
Some weights of AutoencoderKL were not initialized from the model checkpoint at /path/to/Latte/t2v_required_models/ and are newly initialized because the shapes did not match: #66

Closed
likeatingcake opened this issue Mar 27, 2024 · 3 comments
Labels: automatic-stale, duplicate (this issue or pull request already exists)

Comments

@likeatingcake

  • decoder.conv_in.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.conv_in.weight: found shape torch.Size([512, 4, 3, 3]) in the checkpoint and torch.Size([64, 4, 3, 3]) in the model instantiated
  • decoder.conv_norm_out.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.conv_norm_out.weight: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.conv_out.weight: found shape torch.Size([3, 128, 3, 3]) in the checkpoint and torch.Size([3, 64, 3, 3]) in the model instantiated
  • decoder.mid_block.attentions.0.group_norm.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.group_norm.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_k.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_k.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_out.0.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_out.0.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_q.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_q.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_v.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_v.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • decoder.mid_block.resnets.0.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.0.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.mid_block.resnets.0.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.0.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.mid_block.resnets.0.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.0.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.0.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.0.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.mid_block.resnets.1.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.mid_block.resnets.1.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.conv_in.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.conv_in.weight: found shape torch.Size([128, 3, 3, 3]) in the checkpoint and torch.Size([64, 3, 3, 3]) in the model instantiated
  • encoder.conv_norm_out.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.conv_norm_out.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.conv_out.weight: found shape torch.Size([8, 512, 3, 3]) in the checkpoint and torch.Size([8, 64, 3, 3]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.conv1.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.conv1.weight: found shape torch.Size([128, 128, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.conv2.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.conv2.weight: found shape torch.Size([128, 128, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.norm1.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.norm1.weight: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.norm2.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.norm2.weight: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.group_norm.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.group_norm.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_k.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_k.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_out.0.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_out.0.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_q.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_q.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_v.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_v.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.resnets.0.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.0.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.0.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.1.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.1.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated

When I run the command `bash sample/t2v.sh`, I get the shape mismatches above between the pretrained checkpoint and the instantiated model. How can I resolve this? Thank you!
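For context, when diffusers encounters this situation it reports each offending parameter as a (checkpoint shape, instantiated-model shape) pair and then re-initializes that parameter randomly, which is why sampling with such a VAE produces garbage output. A minimal sketch of the comparison being done (the helper name is hypothetical, not actual diffusers code):

```python
# Hypothetical helper: reproduces how a loader reports parameters whose
# checkpoint shape disagrees with the freshly instantiated model.

def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {param_name: (ckpt_shape, model_shape)} for every parameter
    present in both dicts whose shapes differ."""
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and model_shape != ckpt_shape:
            mismatches[name] = (ckpt_shape, model_shape)
    return mismatches

# Example mirroring two entries from the log above:
ckpt = {
    "decoder.conv_in.weight": (512, 4, 3, 3),
    "decoder.conv_in.bias": (512,),
}
model = {
    "decoder.conv_in.weight": (64, 4, 3, 3),
    "decoder.conv_in.bias": (64,),
}
for name, (c, m) in find_shape_mismatches(ckpt, model).items():
    print(f"{name}: found shape {c} in the checkpoint "
          f"and {m} in the model instantiated")
```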

@maxin-cn
Collaborator


It looks like you used an incorrect pre-trained checkpoint when loading the VAE model. Please check it.

@likeatingcake
Author

  • encoder.down_blocks.0.resnets.0.norm1.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.norm1.weight: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.norm2.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.norm2.weight: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.group_norm.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.group_norm.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_k.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_k.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_out.0.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_out.0.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_q.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_q.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_v.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_v.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.resnets.0.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.0.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.0.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.1.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.1.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated

When I run the command bash sample/t2v.sh, I get a shape mismatch between the pre-trained checkpoint and the instantiated model. How can this be resolved? Thank you!

It looks like you loaded an incorrect pre-trained checkpoint for the vae. Please check it.
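One quick way to check which VAE is actually on disk is to read its config and look at block_out_channels, which determine all of the mismatched tensor widths above. This is a minimal sketch assuming the diffusers folder layout (a vae/ subfolder containing config.json); the path below is a placeholder, not the actual path from this issue.

```python
import json
import os

def vae_block_out_channels(vae_dir):
    """Read block_out_channels from a diffusers-style config.json in vae_dir."""
    with open(os.path.join(vae_dir, "config.json")) as f:
        return json.load(f).get("block_out_channels")

# Placeholder path -- substitute your own t2v_required_models location.
vae_dir = "/path/to/t2v_required_models/vae"
if os.path.isdir(vae_dir):
    # The standard Stable Diffusion VAE reports [128, 256, 512, 512]; a config
    # producing the 64-sized tensors above would show much smaller values.
    print(vae_block_out_channels(vae_dir))
```

If the printed list does not match the checkpoint you expect, the folder contains the wrong VAE.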

(latte) yueyc@super-AS-4124GS-TNR:~/Latte$ bash sample/t2v.sh
Using model!
Traceback (most recent call last):
  File "/home/yueyc/Latte/sample/sample_t2v.py", line 167, in
    main(OmegaConf.load(args.config))
  File "/home/yueyc/Latte/sample/sample_t2v.py", line 38, in main
    vae = AutoencoderKL.from_pretrained(args.pretrained_model_path, subfolder="vae", torch_dtype=torch.float16).to(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yueyc/anaconda3/envs/latte/lib/python3.11/site-packages/diffusers/models/modeling_utils.py", line 812, in from_pretrained
    unexpected_keys = load_model_dict_into_meta(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yueyc/anaconda3/envs/latte/lib/python3.11/site-packages/diffusers/models/modeling_utils.py", line 155, in load_model_dict_into_meta
    raise ValueError(
ValueError: Cannot load /home/yueyc/Latte/t2v_required_models/ because decoder.conv_in.bias expected shape tensor(..., device='meta', size=(64,)), but got torch.Size([512]). If you want to instead overwrite randomly initialized weights, please make sure to pass both low_cpu_mem_usage=False and ignore_mismatched_sizes=True. For more information, see also: huggingface/diffusers#1619 (comment) as an example.
Previously, when loading the vae pre-trained model, I added the two arguments low_cpu_mem_usage=False and ignore_mismatched_sizes=True, which produced the warning mentioned above; without these two arguments, the error above appears instead.
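For reference, every mismatched shape in the list above follows directly from the VAE's block_out_channels: decoder.conv_in, for example, is a 3x3 convolution that maps the 4-channel latent to the deepest block width. A minimal sketch (the 64-wide channel list is hypothetical, inferred from the error message, not from any real config):

```python
def decoder_conv_in_shape(block_out_channels, latent_channels=4):
    # decoder.conv_in is a 3x3 conv from the latent to the deepest block width,
    # so its weight is (block_out_channels[-1], latent_channels, 3, 3).
    return (block_out_channels[-1], latent_channels, 3, 3)

sd_vae = [128, 256, 512, 512]  # standard Stable Diffusion VAE config
small  = [64, 64, 64, 64]      # hypothetical config matching the 64-sized tensors

print(decoder_conv_in_shape(sd_vae))  # (512, 4, 3, 3) -- shape found in the checkpoint
print(decoder_conv_in_shape(small))   # (64, 4, 3, 3)  -- shape in the instantiated model
```

Note that passing low_cpu_mem_usage=False and ignore_mismatched_sizes=True only silences the error by randomly re-initializing every mismatched layer (hence the warning), so the VAE would produce garbage; the real fix is to load a checkpoint whose config matches the model.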

@maxin-cn maxin-cn added the duplicate This issue or pull request already exists label Mar 29, 2024

Hi There! 👋

This issue has been marked as stale due to inactivity for 60 days.

We would like to inquire if you still have the same problem or if it has been resolved.

If you need further assistance, please feel free to respond to this comment within the next 7 days. Otherwise, the issue will be automatically closed.

We appreciate your understanding and would like to express our gratitude for your contribution to Latte. Thank you for your support. 🙏
