
Some weights of AutoencoderKL were not initialized from the model checkpoint at /path/to/Latte/t2v_required_models/ and are newly initialized because the shapes did not match: #66

Closed
likeatingcake opened this issue Mar 27, 2024 · 3 comments
Labels: automatic-stale, duplicate (this issue or pull request already exists)

Comments

@likeatingcake

  • decoder.conv_in.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.conv_in.weight: found shape torch.Size([512, 4, 3, 3]) in the checkpoint and torch.Size([64, 4, 3, 3]) in the model instantiated
  • decoder.conv_norm_out.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.conv_norm_out.weight: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.conv_out.weight: found shape torch.Size([3, 128, 3, 3]) in the checkpoint and torch.Size([3, 64, 3, 3]) in the model instantiated
  • decoder.mid_block.attentions.0.group_norm.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.group_norm.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_k.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_k.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_out.0.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_out.0.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_q.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_q.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_v.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.attentions.0.to_v.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • decoder.mid_block.resnets.0.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.0.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.mid_block.resnets.0.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.0.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.mid_block.resnets.0.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.0.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.0.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.0.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.mid_block.resnets.1.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.mid_block.resnets.1.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.mid_block.resnets.1.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.0.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • decoder.up_blocks.0.resnets.1.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.conv_in.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.conv_in.weight: found shape torch.Size([128, 3, 3, 3]) in the checkpoint and torch.Size([64, 3, 3, 3]) in the model instantiated
  • encoder.conv_norm_out.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.conv_norm_out.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.conv_out.weight: found shape torch.Size([8, 512, 3, 3]) in the checkpoint and torch.Size([8, 64, 3, 3]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.conv1.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.conv1.weight: found shape torch.Size([128, 128, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.conv2.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.conv2.weight: found shape torch.Size([128, 128, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.norm1.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.norm1.weight: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.norm2.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.norm2.weight: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.group_norm.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.group_norm.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_k.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_k.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_out.0.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_out.0.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_q.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_q.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_v.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_v.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.resnets.0.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.0.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.0.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.1.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.1.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated

When I run the command `bash sample/t2v.sh`, I get the shape mismatches above between the pretrained checkpoint and the instantiated model. How can I resolve this? Thank you!
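For context, when diffusers encounters this situation it reports each offending parameter as a (checkpoint shape, instantiated-model shape) pair and then re-initializes that parameter randomly, which is why sampling with such a VAE produces garbage output. A minimal sketch of the comparison being done (the helper name is hypothetical, not actual diffusers code):

```python
# Hypothetical helper: reproduces how a loader reports parameters whose
# checkpoint shape disagrees with the freshly instantiated model.

def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {param_name: (ckpt_shape, model_shape)} for every parameter
    present in both dicts whose shapes differ."""
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and model_shape != ckpt_shape:
            mismatches[name] = (ckpt_shape, model_shape)
    return mismatches

# Example mirroring two entries from the log above:
ckpt = {
    "decoder.conv_in.weight": (512, 4, 3, 3),
    "decoder.conv_in.bias": (512,),
}
model = {
    "decoder.conv_in.weight": (64, 4, 3, 3),
    "decoder.conv_in.bias": (64,),
}
for name, (c, m) in find_shape_mismatches(ckpt, model).items():
    print(f"{name}: found shape {c} in the checkpoint "
          f"and {m} in the model instantiated")
```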

@maxin-cn
Collaborator


It looks like you used an incorrect pre-trained checkpoint when loading the VAE model. Please check it.

@likeatingcake
Author

  • encoder.down_blocks.0.resnets.0.norm1.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.norm1.weight: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.norm2.bias: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.down_blocks.0.resnets.0.norm2.weight: found shape torch.Size([128]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.group_norm.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.group_norm.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_k.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_k.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_out.0.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_out.0.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_q.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_q.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_v.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.attentions.0.to_v.weight: found shape torch.Size([512, 512]) in the checkpoint and torch.Size([64, 64]) in the model instantiated
  • encoder.mid_block.resnets.0.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.0.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.0.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.0.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.conv1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.conv1.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.1.conv2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.conv2.weight: found shape torch.Size([512, 512, 3, 3]) in the checkpoint and torch.Size([64, 64, 3, 3]) in the model instantiated
  • encoder.mid_block.resnets.1.norm1.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.norm1.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.norm2.bias: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated
  • encoder.mid_block.resnets.1.norm2.weight: found shape torch.Size([512]) in the checkpoint and torch.Size([64]) in the model instantiated

When I run the command bash sample/t2v.sh, I get a shape mismatch between the pre-trained checkpoint and the instantiated model. How can this be resolved? Thank you!

It looks like you loaded an incorrect pre-trained checkpoint for the vae. Please check it.
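One quick way to check which VAE is actually on disk is to read its config and look at block_out_channels, which determine all of the mismatched tensor widths above. This is a minimal sketch assuming the diffusers folder layout (a vae/ subfolder containing config.json); the path below is a placeholder, not the actual path from this issue.

```python
import json
import os

def vae_block_out_channels(vae_dir):
    """Read block_out_channels from a diffusers-style config.json in vae_dir."""
    with open(os.path.join(vae_dir, "config.json")) as f:
        return json.load(f).get("block_out_channels")

# Placeholder path -- substitute your own t2v_required_models location.
vae_dir = "/path/to/t2v_required_models/vae"
if os.path.isdir(vae_dir):
    # The standard Stable Diffusion VAE reports [128, 256, 512, 512]; a config
    # producing the 64-sized tensors above would show much smaller values.
    print(vae_block_out_channels(vae_dir))
```

If the printed list does not match the checkpoint you expect, the folder contains the wrong VAE.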

(latte) yueyc@super-AS-4124GS-TNR:~/Latte$ bash sample/t2v.sh
Using model!
Traceback (most recent call last):
  File "/home/yueyc/Latte/sample/sample_t2v.py", line 167, in
    main(OmegaConf.load(args.config))
  File "/home/yueyc/Latte/sample/sample_t2v.py", line 38, in main
    vae = AutoencoderKL.from_pretrained(args.pretrained_model_path, subfolder="vae", torch_dtype=torch.float16).to(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yueyc/anaconda3/envs/latte/lib/python3.11/site-packages/diffusers/models/modeling_utils.py", line 812, in from_pretrained
    unexpected_keys = load_model_dict_into_meta(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yueyc/anaconda3/envs/latte/lib/python3.11/site-packages/diffusers/models/modeling_utils.py", line 155, in load_model_dict_into_meta
    raise ValueError(
ValueError: Cannot load /home/yueyc/Latte/t2v_required_models/ because decoder.conv_in.bias expected shape tensor(..., device='meta', size=(64,)), but got torch.Size([512]). If you want to instead overwrite randomly initialized weights, please make sure to pass both low_cpu_mem_usage=False and ignore_mismatched_sizes=True. For more information, see also: huggingface/diffusers#1619 (comment) as an example.
Previously, when loading the vae pre-trained model, I added the two arguments low_cpu_mem_usage=False and ignore_mismatched_sizes=True, which produced the warning mentioned above; without these two arguments, the error above appears instead.
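For reference, every mismatched shape in the list above follows directly from the VAE's block_out_channels: decoder.conv_in, for example, is a 3x3 convolution that maps the 4-channel latent to the deepest block width. A minimal sketch (the 64-wide channel list is hypothetical, inferred from the error message, not from any real config):

```python
def decoder_conv_in_shape(block_out_channels, latent_channels=4):
    # decoder.conv_in is a 3x3 conv from the latent to the deepest block width,
    # so its weight is (block_out_channels[-1], latent_channels, 3, 3).
    return (block_out_channels[-1], latent_channels, 3, 3)

sd_vae = [128, 256, 512, 512]  # standard Stable Diffusion VAE config
small  = [64, 64, 64, 64]      # hypothetical config matching the 64-sized tensors

print(decoder_conv_in_shape(sd_vae))  # (512, 4, 3, 3) -- shape found in the checkpoint
print(decoder_conv_in_shape(small))   # (64, 4, 3, 3)  -- shape in the instantiated model
```

Note that passing low_cpu_mem_usage=False and ignore_mismatched_sizes=True only silences the error by randomly re-initializing every mismatched layer (hence the warning), so the VAE would produce garbage; the real fix is to load a checkpoint whose config matches the model.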

@maxin-cn maxin-cn added the duplicate This issue or pull request already exists label Mar 29, 2024

Hi There! 👋

This issue has been marked as stale due to inactivity for 60 days.

We would like to inquire if you still have the same problem or if it has been resolved.

If you need further assistance, please feel free to respond to this comment within the next 7 days. Otherwise, the issue will be automatically closed.

We appreciate your understanding and would like to express our gratitude for your contribution to Latte. Thank you for your support. 🙏
