My ResNet's structure differs from the one built into PocketFlow. How do I change the structure? #19

Closed
as754770178 opened this issue Nov 6, 2018 · 11 comments
@as754770178

as754770178 commented Nov 6, 2018

command:
./scripts/run_local.sh nets/resnet_at_cifar10_run.py --learner dis-chn-pruned

error:
`NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key model/resnet_model/batch_normalization/beta not found in checkpoint
[[Node: model/save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_model/save/Const_0_0, model/save/RestoreV2/tensor_names, model/save/RestoreV2/shape_and_slices)]]
[[Node: model/save/RestoreV2/_27 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_32_model/save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]`

The structure of my ResNet is different from the one built into PocketFlow. How do I change the structure?

@jiaxiang-wu
Contributor

Hi, are you using checkpoint files produced by your own training code, instead of pre-trained models provided by us? This will cause the above error message.

If you do need to use your own model definition and pre-trained models, then you need to create your own ModelHelper class and a Python script to use it, similar to:

  1. nets/resnet_at_cifar10.py (which defines a ModelHelper class)
  2. nets/resnet_at_cifar10_run.py (which uses the above class)
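Before writing a custom ModelHelper, it may help to confirm which variable names actually differ between the graph and the checkpoint. The following is a minimal sketch in plain Python that compares two lists of names. It assumes the names have been collected separately, for example with `tf.train.list_variables(checkpoint_path)` for the checkpoint and `v.op.name` over `tf.global_variables()` for the graph in TensorFlow 1.x. The helper `diff_variable_names` and the sample names are hypothetical and are not part of PocketFlow.

```python
def diff_variable_names(graph_names, checkpoint_names):
    """Return (missing_from_checkpoint, unused_in_checkpoint) as sorted lists."""
    graph_set = set(graph_names)
    ckpt_set = set(checkpoint_names)
    # Names the graph expects but the checkpoint does not contain.
    missing = sorted(graph_set - ckpt_set)
    # Names stored in the checkpoint that the graph never asks for.
    unused = sorted(ckpt_set - graph_set)
    return missing, unused


# Hypothetical names, for illustration only.
graph_names = [
    "model/resnet_model/batch_normalization/beta",
    "model/resnet_model/conv2d/kernel",
]
checkpoint_names = [
    "model/resnet_model/conv2d/kernel",
    "my_net/bn0/beta",
]

missing, unused = diff_variable_names(graph_names, checkpoint_names)
print("missing from checkpoint:", missing)
print("unused checkpoint keys:", unused)
```

Any name reported as missing is a candidate for the "Key ... not found in checkpoint" error shown above.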

@as754770178
Author

I downloaded models_resnet_56_at_cifar_10.tar.gz from https://api.ai.tencent.com/pocketflow/list.html and extracted it into models.

error:
InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Assign requires shapes of both tensors to match. lhs shape= [32] rhs shape= [16]
[[Node: model/save/Assign_13 = Assign[T=DT_FLOAT, _class=["loc:@model/resnet_model/batch_normalization_11/gamma"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model/resnet_model/batch_normalization_11/gamma, model/save/RestoreV2/_27)]]
[[Node: model/save/RestoreV2/_98 = _SendT=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_104_model/save/RestoreV2", _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

@jiaxiang-wu jiaxiang-wu self-assigned this Nov 6, 2018
@BowieHsu

BowieHsu commented Nov 6, 2018

You have to modify the number of layers in resnet.py; the default number of layers should be 50 or 101.

@jiaxiang-wu
Contributor

Hi @as754770178
For nets/resnet_at_cifar10_run.py, the default number of layers is 20. Since you have downloaded the ResNet-56 model, you need to specify the number of layers with --resnet_size 56.

@as754770178
Author

Thanks. I misunderstood how PocketFlow works: I thought the net defined in nets/resnet_at_cifar10_run.py was the student net. Actually, does PocketFlow prune/quantize the net defined in nets/resnet_at_cifar10_run.py and use the result as the student net? Is my understanding correct?

@jiaxiang-wu
Contributor

I'm not sure whether I have understood your question.

In PocketFlow, the student network and teacher network (only exists if network distillation is enabled) share the same network architecture. The student network may have further restrictions introduced by pruning or quantization operations, while the teacher network is the full-precision uncompressed network. Does this resolve your question?

@as754770178
Author

I want to confirm that, in network distillation, the student network comes only from pruning or quantizing the full-precision uncompressed network.

@jiaxiang-wu
Contributor

Yes, for all model compression methods in PocketFlow, the compressed network (or student network) only comes from the pruned / quantized version of a full-precision uncompressed network (or teacher network). This is independent of network distillation, which only adds a distillation loss term to the training of the student network.
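The idea of "only adding a distillation loss term" can be sketched as follows. This is a generic, textbook-style illustration in plain Python, not PocketFlow's actual loss: the function names, the `distill_weight` and `temperature` parameters, and the choice of soft-target cross-entropy are all assumptions made for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p)
                for t, p in zip(target_probs, predicted_probs) if t > 0)


def student_loss(student_logits, teacher_logits, label,
                 distill_weight=0.0, temperature=1.0):
    """Task loss, plus an optional distillation term (hypothetical form)."""
    one_hot = [1.0 if i == label else 0.0 for i in range(len(student_logits))]
    task = cross_entropy(one_hot, softmax(student_logits))
    if distill_weight == 0.0:
        # Distillation disabled: the student trains on the task loss alone.
        return task
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    distill = cross_entropy(soft_teacher, soft_student)
    return task + distill_weight * distill


student = [2.0, 0.5, -1.0]   # logits of the pruned / quantized network
teacher = [3.0, 0.2, -2.0]   # logits of the uncompressed network
print(student_loss(student, teacher, label=0))
print(student_loss(student, teacher, label=0, distill_weight=0.5, temperature=2.0))
```

In this sketch the student network is the same in both calls; enabling distillation changes only the loss value, not where the student comes from.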

@as754770178
Author

ok, thanks

@as754770178
Author

I defined my own net, but the variable names are prefixed with 'model', e.g. 'model/resnet_v1_110/block1/unit_1/bottleneck2_v1/conv1/BatchNorm/beta', whereas they should be 'resnet_v1_110/block1/unit_1/bottleneck2_v1/conv1/BatchNorm/beta'.

`NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key model/resnet_v1_110/block1/unit_1/bottleneck2_v1/conv1/BatchNorm/beta not found in checkpoint
[[Node: model/save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_model/save/Const_0_0, model/save/RestoreV2/tensor_names, model/save/RestoreV2/shape_and_slices)]]
[[Node: model/save/RestoreV2/_1059 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_1064_model/save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]`
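One possible workaround for a scope-prefix mismatch like this is to build a mapping from checkpoint keys to graph variable names. The sketch below does that in plain Python; the helper `checkpoint_key_map` is hypothetical, and how the problem was actually handled in PocketFlow is not shown in this thread. In TensorFlow 1.x, `tf.train.Saver` accepts a dict as `var_list` whose keys are checkpoint names and whose values are variables, so a mapping of this shape could be turned into such a dict.

```python
def checkpoint_key_map(graph_names, scope_prefix="model/"):
    """Map each checkpoint key (scope prefix stripped) to its graph variable name."""
    mapping = {}
    for name in graph_names:
        if name.startswith(scope_prefix):
            key = name[len(scope_prefix):]
        else:
            # Names outside the scope are assumed to match the checkpoint as-is.
            key = name
        mapping[key] = name
    return mapping


graph_names = [
    "model/resnet_v1_110/block1/unit_1/bottleneck2_v1/conv1/BatchNorm/beta",
]
for key, name in checkpoint_key_map(graph_names).items():
    print(key, "<-", name)
```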

@jiaxiang-wu
Contributor

It seems this issue has been resolved in #27. Closing. Reopen it if there are any further questions.
