Cannot load checkpoints #69

danieltudosiu · 2022-05-16T16:54:00Z

Hi

Thank you very much for your great work.

I am trying to integrate your models into my codebase, but I cannot load the checkpoints for depths 10, 18 and 34 because of a shape mismatch between the checkpoint data and the model definition.

Could you please help me out? My code is below:

import torch
from models.resnet import ResNet, Bottleneck

for depth in [10, 18, 34, 50]:
    if depth == 10:
        layers = [1, 1, 1, 1]
        shortcut_type = "B"
    elif depth == 18:
        layers = [2, 2, 2, 2]
        shortcut_type = "A"
    elif depth == 34:
        layers = [3, 4, 6, 3]
        shortcut_type = "A"
    elif depth == 50:
        layers = [3, 4, 6, 3]
        shortcut_type = "B"
    elif depth == 101:
        layers = [3, 4, 23, 3]
        shortcut_type = "B"
    elif depth == 152:
        layers = [3, 8, 36, 3]
        shortcut_type = "B"
    elif depth == 200:
        layers = [3, 24, 36, 3]
        shortcut_type = "B"

    model = ResNet(
        block=Bottleneck,
        layers=layers,
        sample_input_D=56,
        sample_input_H=448,
        sample_input_W=448,
        num_seg_classes=2,
        shortcut_type=shortcut_type,
        no_cuda=True,
    )

    try:
        checkpoint = torch.load(
            f"/path/to/weights/pretrain/resnet_{depth}_23dataset.pth"
        )
        net_dict = model.state_dict()
        pretrain_dict = {
            k.replace("module.", ""): v for k, v in checkpoint["state_dict"].items()
        }
        pretrain_dict = {k: v for k, v in pretrain_dict.items() if k in net_dict.keys()}
        net_dict.update(pretrain_dict)
        model.load_state_dict(net_dict)
    except RuntimeError as err:
        print(f"Depth {depth} did not work.")
        print(err)
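
Note that the filter in the snippet above only checks key names, so entries whose names match but whose shapes differ still reach load_state_dict. As a minimal sketch (the helper name split_by_shape is hypothetical and not part of the repository), the comparison can be done up front so the mismatches are listed before loading. The stand-in objects only mimic the .shape attribute of a tensor, which keeps the example runnable without PyTorch:

```python
from types import SimpleNamespace


def split_by_shape(pretrained, target):
    """Partition pretrained entries into shape-compatible and mismatched ones.

    Hypothetical helper: both arguments map parameter names to objects
    exposing a `.shape` attribute (real tensors also satisfy this).
    """
    compatible, mismatched = {}, {}
    for key, value in pretrained.items():
        if key not in target:
            continue  # unknown key, dropped just like the key-only filter does
        src, dst = tuple(value.shape), tuple(target[key].shape)
        if src == dst:
            compatible[key] = value
        else:
            mismatched[key] = (src, dst)
    return compatible, mismatched


# Stand-ins for tensors, using two shapes taken from the depth 10 case.
ckpt = {
    "layer1.0.conv1.weight": SimpleNamespace(shape=(64, 64, 3, 3, 3)),
    "bn1.weight": SimpleNamespace(shape=(64,)),
}
model_state = {
    "layer1.0.conv1.weight": SimpleNamespace(shape=(64, 64, 1, 1, 1)),
    "bn1.weight": SimpleNamespace(shape=(64,)),
}

ok, bad = split_by_shape(ckpt, model_state)
for key, (src, dst) in bad.items():
    print(f"{key}: checkpoint {src} vs model {dst}")
```

Reporting the mismatched entries, rather than silently skipping them, keeps a wrong model definition visible instead of leaving those layers randomly initialised.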

From my understanding of the differences between the models, it seems that your make_layer method is not correct.

The errors I get are:

Depth 10 did not work.
Error(s) in loading state_dict for ResNet:
	size mismatch for layer1.0.conv1.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 1, 1, 1]).
	size mismatch for layer2.0.conv1.weight: copying a param with shape torch.Size([128, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 256, 1, 1, 1]).
	size mismatch for layer2.0.downsample.0.weight: copying a param with shape torch.Size([128, 64, 1, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 1, 1, 1]).
	size mismatch for layer2.0.downsample.1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for layer2.0.downsample.1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for layer2.0.downsample.1.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for layer2.0.downsample.1.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for layer3.0.conv1.weight: copying a param with shape torch.Size([256, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1, 1]).
	size mismatch for layer3.0.downsample.0.weight: copying a param with shape torch.Size([256, 128, 1, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 512, 1, 1, 1]).
	size mismatch for layer3.0.downsample.1.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for layer3.0.downsample.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for layer3.0.downsample.1.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for layer3.0.downsample.1.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for layer4.0.conv1.weight: copying a param with shape torch.Size([512, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1, 1]).
	size mismatch for layer4.0.downsample.0.weight: copying a param with shape torch.Size([512, 256, 1, 1, 1]) from checkpoint, the shape in current model is torch.Size([2048, 1024, 1, 1, 1]).
	size mismatch for layer4.0.downsample.1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
	size mismatch for layer4.0.downsample.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
	size mismatch for layer4.0.downsample.1.running_mean: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
	size mismatch for layer4.0.downsample.1.running_var: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
Depth 18 did not work.
Error(s) in loading state_dict for ResNet:
	size mismatch for layer1.0.conv1.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 1, 1, 1]).
	size mismatch for layer1.1.conv1.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 256, 1, 1, 1]).
	size mismatch for layer2.0.conv1.weight: copying a param with shape torch.Size([128, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 256, 1, 1, 1]).
	size mismatch for layer2.1.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 512, 1, 1, 1]).
	size mismatch for layer3.0.conv1.weight: copying a param with shape torch.Size([256, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1, 1]).
	size mismatch for layer3.1.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1, 1]).
	size mismatch for layer4.0.conv1.weight: copying a param with shape torch.Size([512, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1, 1]).
	size mismatch for layer4.1.conv1.weight: copying a param with shape torch.Size([512, 512, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 2048, 1, 1, 1]).
Depth 34 did not work.
Error(s) in loading state_dict for ResNet:
	size mismatch for layer1.0.conv1.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 1, 1, 1]).
	size mismatch for layer1.1.conv1.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 256, 1, 1, 1]).
	size mismatch for layer1.2.conv1.weight: copying a param with shape torch.Size([64, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 256, 1, 1, 1]).
	size mismatch for layer2.0.conv1.weight: copying a param with shape torch.Size([128, 64, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 256, 1, 1, 1]).
	size mismatch for layer2.1.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 512, 1, 1, 1]).
	size mismatch for layer2.2.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 512, 1, 1, 1]).
	size mismatch for layer2.3.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 512, 1, 1, 1]).
	size mismatch for layer3.0.conv1.weight: copying a param with shape torch.Size([256, 128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1, 1]).
	size mismatch for layer3.1.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1, 1]).
	size mismatch for layer3.2.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1, 1]).
	size mismatch for layer3.3.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1, 1]).
	size mismatch for layer3.4.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1, 1]).
	size mismatch for layer3.5.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1, 1]).
	size mismatch for layer4.0.conv1.weight: copying a param with shape torch.Size([512, 256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1, 1]).
	size mismatch for layer4.1.conv1.weight: copying a param with shape torch.Size([512, 512, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 2048, 1, 1, 1]).
	size mismatch for layer4.2.conv1.weight: copying a param with shape torch.Size([512, 512, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 2048, 1, 1, 1]).

Process finished with exit code 0
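
Reading the shapes in the log: the checkpoint's conv1 weights are 3x3x3 with no channel expansion, while the model built above has 1x1x1 conv1 weights fed by 4x-expanded inputs. That pattern is what one would expect if the depth 10, 18 and 34 checkpoints were saved from a basic (two-convolution) residual block while the model was built with Bottleneck. The sketch below reproduces the logged shapes under that assumption. The function name first_conv1_shape is hypothetical, and the base width of 64 and expansion of 4 are inferred from the log and the standard ResNet design, not read from the repository:

```python
def first_conv1_shape(block: str, stage: int) -> tuple:
    """Expected shape of layer<stage>.0.conv1.weight for a 3D ResNet.

    Hypothetical helper. Assumes 64 base channels that double per stage,
    expansion 1 with a 3x3x3 kernel for "BasicBlock", and expansion 4
    with a 1x1x1 kernel for "Bottleneck".
    """
    if block not in ("BasicBlock", "Bottleneck"):
        raise ValueError(f"unknown block type: {block}")
    expansion = 1 if block == "BasicBlock" else 4
    kernel = 3 if block == "BasicBlock" else 1
    planes = 64 * 2 ** (stage - 1)
    # Stage 1 is fed by the 64-channel stem; later stages are fed by the
    # (possibly expanded) output of the previous stage.
    in_planes = 64 if stage == 1 else (planes // 2) * expansion
    return (planes, in_planes, kernel, kernel, kernel)


for stage in (1, 2, 3, 4):
    print(
        f"layer{stage}.0.conv1.weight:",
        "BasicBlock", first_conv1_shape("BasicBlock", stage),
        "| Bottleneck", first_conv1_shape("Bottleneck", stage),
    )
```

The BasicBlock column matches the "from checkpoint" shapes in the log and the Bottleneck column matches the "current model" shapes, so passing a basic block class for depths 10, 18 and 34 (if the repository provides one) would be the first thing to try.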

danieltudosiu closed this as completed May 16, 2022
