Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot load checkpoints #69

Closed
danieltudosiu opened this issue May 16, 2022 · 0 comments
Closed

Cannot load checkpoints #69

danieltudosiu opened this issue May 16, 2022 · 0 comments

Comments

@danieltudosiu
Copy link

danieltudosiu commented May 16, 2022

Hi

Thank you very much for your great work.

I was trying to integrate your models in my codebase but I cannot seem to be able to load the models for depth 10, 18 and 34 due to share mismatch of checkpoint data and model definition.

Could you please help me out? My code is bellow:

import torch
from models.resnet import ResNet, Bottleneck

for depth in [10, 18, 34, 50]:
    if depth == 10:
        layers = [1, 1, 1, 1]
        shortcut_type = "B"
    elif depth == 18:
        layers = [2, 2, 2, 2]
        shortcut_type = "A"
    elif depth == 34:
        layers = [3, 4, 6, 3]
        shortcut_type = "A"
    elif depth == 50:
        layers = [3, 4, 6, 3]
        shortcut_type = "B"
    elif depth == 101:
        layers = [3, 4, 23, 3]
        shortcut_type = "B"
    elif depth == 152:
        layers = [3, 8, 36, 3]
        shortcut_type = "B"
    elif depth == 200:
        layers = [3, 24, 36, 3]
        shortcut_type = "B"

    model = ResNet(
        block=Bottleneck,
        layers=layers,
        sample_input_D=56,
        sample_input_H=448,
        sample_input_W=448,
        num_seg_classes=2,
        shortcut_type=shortcut_type,
        no_cuda=True,
    )

    try:
        checkpoint = torch.load(
            f"/path/to/weights/pretrain/resnet_{depth}_23dataset.pth"
        )
        net_dict = model.state_dict()
        pretrain_dict = {
            k.replace("module.", ""): v for k, v in checkpoint["state_dict"].items()
        }
        pretrain_dict = {k: v for k, v in pretrain_dict.items() if k in net_dict.keys()}
        net_dict.update(pretrain_dict)
        model.load_state_dict(net_dict)
    except RuntimeError as err:
        print(f"Depth {depth} did not work.")
        print(err)

It seems that your make_layer method is not correct from my understanding of the model differences.

The errors I get are:

Depth 10 did not work.
Error(s) in loading state_dict for ResNet:
	size mismatch for layer1.0.conv1.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 1, 1, 1]).
	size mismatch for layer2.0.conv1.weight: copying a param with shape torch.Size([128, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 256, 1, 1, 1]).
	size mismatch for layer2.0.downsample.0.weight: copying a param with shape torch.Size([128, 64, 1, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 1, 1, 1]).
	size mismatch for layer2.0.downsample.1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for layer2.0.downsample.1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for layer2.0.downsample.1.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for layer2.0.downsample.1.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for layer3.0.conv1.weight: copying a param with shape torch.Size([256, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1, 1]).
	size mismatch for layer3.0.downsample.0.weight: copying a param with shape torch.Size([256, 128, 1, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 512, 1, 1, 1]).
	size mismatch for layer3.0.downsample.1.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for layer3.0.downsample.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for layer3.0.downsample.1.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for layer3.0.downsample.1.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for layer4.0.conv1.weight: copying a param with shape torch.Size([512, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1, 1]).
	size mismatch for layer4.0.downsample.0.weight: copying a param with shape torch.Size([512, 256, 1, 1, 1]) from checkpoint, the shape in current model is torch.Size([2048, 1024, 1, 1, 1]).
	size mismatch for layer4.0.downsample.1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
	size mismatch for layer4.0.downsample.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
	size mismatch for layer4.0.downsample.1.running_mean: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
	size mismatch for layer4.0.downsample.1.running_var: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
Depth 18 did not work.
Error(s) in loading state_dict for ResNet:
	size mismatch for layer1.0.conv1.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 1, 1, 1]).
	size mismatch for layer1.1.conv1.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 256, 1, 1, 1]).
	size mismatch for layer2.0.conv1.weight: copying a param with shape torch.Size([128, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 256, 1, 1, 1]).
	size mismatch for layer2.1.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 512, 1, 1, 1]).
	size mismatch for layer3.0.conv1.weight: copying a param with shape torch.Size([256, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1, 1]).
	size mismatch for layer3.1.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1, 1]).
	size mismatch for layer4.0.conv1.weight: copying a param with shape torch.Size([512, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1, 1]).
	size mismatch for layer4.1.conv1.weight: copying a param with shape torch.Size([512, 512, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 2048, 1, 1, 1]).
Depth 34 did not work.
Error(s) in loading state_dict for ResNet:
	size mismatch for layer1.0.conv1.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 1, 1, 1]).
	size mismatch for layer1.1.conv1.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 256, 1, 1, 1]).
	size mismatch for layer1.2.conv1.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 256, 1, 1, 1]).
	size mismatch for layer2.0.conv1.weight: copying a param with shape torch.Size([128, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 256, 1, 1, 1]).
	size mismatch for layer2.1.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 512, 1, 1, 1]).
	size mismatch for layer2.2.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 512, 1, 1, 1]).
	size mismatch for layer2.3.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 512, 1, 1, 1]).
	size mismatch for layer3.0.conv1.weight: copying a param with shape torch.Size([256, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1, 1]).
	size mismatch for layer3.1.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1, 1]).
	size mismatch for layer3.2.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1, 1]).
	size mismatch for layer3.3.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1, 1]).
	size mismatch for layer3.4.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1, 1]).
	size mismatch for layer3.5.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1, 1]).
	size mismatch for layer4.0.conv1.weight: copying a param with shape torch.Size([512, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1, 1]).
	size mismatch for layer4.1.conv1.weight: copying a param with shape torch.Size([512, 512, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 2048, 1, 1, 1]).
	size mismatch for layer4.2.conv1.weight: copying a param with shape torch.Size([512, 512, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 2048, 1, 1, 1]).

Process finished with exit code 0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant