SPPF will generate nodes with duplicate names #234

Closed
SarBH opened this issue Nov 23, 2021 · 6 comments · Fixed by #240
Labels
bug / fix (Something isn't working)

Comments

SarBH commented Nov 23, 2021

🐛 Describe the bug

Somewhere between these two commits there was a model backbone change: 06022fd...e3e18f2. The three MaxPool2d modules at backbone.body.8.m.0, backbone.body.8.m.1, and backbone.body.8.m.2 go from being parallel to being serialized in the later commit into a single backbone.body.8.m with three outputs. (I'm using nni==2.4 to prune the model, and a node with three outputs is a problem for that.)
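
To illustrate the difference, here is a minimal sketch (simplified and hypothetical; the real yolort modules also wrap the pooling in convolution layers) of the parallel vs. serialized structure:

```python
import torch
from torch import nn

class SPP(nn.Module):
    """Older layout: three parallel MaxPool2d children, named m.0, m.1, m.2."""
    def __init__(self, ks=(5, 9, 13)):
        super().__init__()
        self.m = nn.ModuleList(nn.MaxPool2d(k, stride=1, padding=k // 2) for k in ks)

    def forward(self, x):
        return torch.cat([x] + [m(x) for m in self.m], dim=1)

class SPPF(nn.Module):
    """Newer layout: one MaxPool2d reused serially, so there is only a single child m."""
    def __init__(self, k=5):
        super().__init__()
        self.m = nn.MaxPool2d(k, stride=1, padding=k // 2)

    def forward(self, x):
        y1 = self.m(x)
        y2 = self.m(y1)
        return torch.cat([x, y1, y2, self.m(y2)], dim=1)
```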

For example, the SMALL model.
On the left: commit hash 06022fd, where the default upstream_version is r4.0.
On the right: commit hash e3e18f2, where the default upstream_version changed with the addition of r6.0, so I set upstream_version=r4.0 explicitly.
[screenshot: side-by-side comparison of the two model graphs]
I see the same behavior for 'yolov5s', 'yolov5m', 'yolov5l'

All else in the models remained the same, therefore I wondered if this was accidental.

Versions

Collecting environment information...
PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.14.4
Libc version: glibc-2.31

Python version: 3.8.10 (default, Jun  2 2021, 10:49:15)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-4.14.252-195.483.amzn2.x86_64-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.2.142
GPU models and configuration: GPU 0: Tesla K80
Nvidia driver version: 450.142.00
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.1
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] efficientnet-pytorch==0.6.3
[pip3] mypy==0.910
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.19.5
[pip3] numpydoc==1.1.0
[pip3] pytorch-lightning==1.5.2
[pip3] pytorchcv==0.0.58
[pip3] segmentation-models-pytorch==0.2.1
[pip3] torch==1.7.1
[pip3] torchinfo==1.5.3
[pip3] torchmetrics==0.6.0
[pip3] torchvision==0.8.2
[conda] Could not collect

zhiqwang commented Nov 24, 2021

Hi @SarBH, thanks for reporting and providing this information!

All else in the models remained the same, therefore I wondered if this was accidental.

This change originates in upstream YOLOv5 (ultralytics/yolov5#4420); the SPPF we adopted here is a faster version of SPP. I thought it would also benefit the earlier versions, so I replaced this part in 451f3e4, together with a verification of numerical equality.

EDITED: a detailed test of SPPF vs SPP is in ultralytics/yolov5#4420 (comment).

https://github.com/zhiqwang/yolov5-rt-stack/blob/4cba0437389e2cccbcc299ab196c922945c93d45/yolort/v5/models/common.py#L191-L208
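
As a rough sanity check of that equivalence (a sketch only, not yolort's actual test, and ignoring the surrounding convolutions), a single 5x5 max pool applied serially reproduces the parallel 5/9/13 pools numerically:

```python
import torch
from torch import nn

# Serial 5x5 max pooling (stride 1, padding 2) applied two and three times covers the
# same receptive field as 9x9 and 13x13 pools, so the SPP and SPPF outputs should match.
pool5 = nn.MaxPool2d(5, stride=1, padding=2)
pool9 = nn.MaxPool2d(9, stride=1, padding=4)
pool13 = nn.MaxPool2d(13, stride=1, padding=6)

x = torch.randn(1, 16, 64, 64)
y1 = pool5(x)    # 5x5 receptive field
y2 = pool5(y1)   # equivalent to a 9x9 pool
y3 = pool5(y2)   # equivalent to a 13x13 pool

spp_out = torch.cat([x, pool5(x), pool9(x), pool13(x)], dim=1)
sppf_out = torch.cat([x, y1, y2, y3], dim=1)
print(torch.allclose(spp_out, sppf_out))  # expected: True
```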

I'm using nni==2.4 to prune the model, and a node with three outputs is a problem for that

Actually, I don't quite understand the phenomenon occurring here. Could you provide me with more information, or a reproducible example? (I'm also interested in nni and am following its progress.)

If the above change affects downstream applications, we can revert it, or we can work together to find a better way to handle this scenario.


SarBH commented Nov 30, 2021

Thanks for the follow-up @zhiqwang, and for this awesome project!

I investigated where exactly pruning is failing:

The compressor.py file contains the assertion assert len(node.outputs) == 1, 'The number of the output should be one after the Tuple unpacked manually'.
I see that all other (non-failing) nodes in this model indeed have a single output, but this one MaxPool2d node has 3 outputs after the update (below is a printout of the node; see the last line):

name: backbone.body.8.m, type: module, op_type: MaxPool2d, sub_nodes: 
['__module.backbone/__module.backbone.body/__module.backbone.body.8/__module.backbone.body.8.m_prim::ListConstruct', 
'__module.backbone/__module.backbone.body/__module.backbone.body.8/__module.backbone.body.8.m_aten::max_pool2d', 
'__module.backbone/__module.backbone.body/__module.backbone.body.8/__module.backbone.body.8.m_prim::ListConstruct', 
'__module.backbone/__module.backbone.body/__module.backbone.body.8/__module.backbone.body.8.m_prim::ListConstruct', 
'__module.backbone/__module.backbone.body/__module.backbone.body.8/__module.backbone.body.8.m_prim::ListConstruct', 
'__module.backbone/__module.backbone.body/__module.backbone.body.8/__module.backbone.body.8.m_aten::max_pool2d', 
'__module.backbone/__module.backbone.body/__module.backbone.body.8/__module.backbone.body.8.m_prim::ListConstruct', 
'__module.backbone/__module.backbone.body/__module.backbone.body.8/__module.backbone.body.8.m_prim::ListConstruct', 
'__module.backbone/__module.backbone.body/__module.backbone.body.8/__module.backbone.body.8.m_prim::ListConstruct', 
'__module.backbone/__module.backbone.body/__module.backbone.body.8/__module.backbone.body.8.m_prim::ListConstruct', 
'__module.backbone/__module.backbone.body/__module.backbone.body.8/__module.backbone.body.8.m_aten::max_pool2d', 
'__module.backbone/__module.backbone.body/__module.backbone.body.8/__module.backbone.body.8.m_prim::ListConstruct', 
'__module.backbone/__module.backbone.body/__module.backbone.body.8/__module.backbone.body.8.m_prim::ListConstruct', 
'__module.backbone/__module.backbone.body/__module.backbone.body.8/__module.backbone.body.8.m_prim::ListConstruct', 
'__module.backbone/__module.backbone.body/__module.backbone.body.8/__module.backbone.body.8.m_prim::ListConstruct'], 
inputs: ['input.85'], outputs: ['input.86', 'input.87', '4815'], aux: None 

The old commit's model, which doesn't fail pruning, actually splits that backbone.body.8.m layer into backbone.body.8.m.0, backbone.body.8.m.1, and backbone.body.8.m.2, but it is otherwise the exact same model.
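
For context, the duplicate scope names come from tracing: when a single submodule is called several times in forward(), every resulting aten op is recorded under the same module scope. A minimal sketch (hypothetical, not the yolort module) that reproduces this:

```python
import torch
from torch import nn

class TinySPPF(nn.Module):
    def __init__(self, k=5):
        super().__init__()
        # One MaxPool2d instance, invoked three times in forward().
        self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        y1 = self.m(x)
        y2 = self.m(y1)
        return torch.cat([x, y1, y2, self.m(y2)], dim=1)

traced = torch.jit.trace(TinySPPF(), torch.randn(1, 8, 32, 32))
# The inlined graph shows three aten::max_pool2d calls that all belong to the same
# submodule "m"; tools that group ops by module name (as nni's graph utilities
# appear to do) then see a single MaxPool2d node with three outputs.
print(traced.inlined_graph)
```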

Note: we are using an older version of the nni package (nni==2.4), so I'm not sure whether this is already resolved there. Given that compressor.py still contains the assert, I'm inclined to believe newer versions will also fail.
I'm not an expert on pruning, but I hope this helps :)


zhiqwang commented Dec 1, 2021

I see that all other (non failing) nodes in this model indeed have a single output, but this one MaxPool2d node has 3 outputs after the update.

Got it, thanks for the detailed information, it is very useful. I agree with you; we will revert this substitution within the next two days.

zhiqwang added the bug / fix label Dec 1, 2021

zhiqwang commented Dec 1, 2021

Hi @SarBH ,

I reverted SPPF to SPP in #240 for both "r4.0" and "r6.0", and as such I'm closing this issue.

Please reinstall yolort from source (we'll release 0.6.0 at the end of the month):

pip install -U 'git+https://github.com/zhiqwang/yolov5-rt-stack.git'

Thanks again for the detailed information, and feel free to reopen this or create another ticket if you have more questions.


syswyl commented Jan 25, 2022

[quoting @SarBH's earlier comment about the compressor.py assertion and the three-output backbone.body.8.m node]

Hello @SarBH, I encountered the same problem as you when using nni==2.6. Because of the SPPF issue, I decided to use YOLOv5 v5.0, since that older release still uses the basic SPP module, but the pruning process then hit a new error. Have you succeeded in pruning YOLOv5?

[2022-01-25 11:21:04] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for model.12
model.12
node inputs:['4932', 'input.79']
node outputs:['input.107']
file:nni/compression/pytorch/speedup/compressor.py
Traceback (most recent call last):
File "v12_old_yolo.py", line 85, in
ModelSpeedup(model, dummy_input=dummy_input.to(device), masks_file=masks).speedup_model()
File "/nni-master25/nni/compression/pytorch/speedup/compressor.py", line 545, in speedup_model
self.infer_modules_masks()
File "/nni-master25/nni/compression/pytorch/speedup/compressor.py", line 390, in infer_modules_masks
self.update_direct_sparsity(curnode)
File "/nni-master25/nni/compression/pytorch/speedup/compressor.py", line 234, in update_direct_sparsity
state_dict=copy.deepcopy(module.state_dict()), batch_dim=self.batch_dim)
File "/nni-master25/nni/compression/pytorch/speedup/infer_mask.py", line 80, in init
self.output = self.module(*dummy_input)
File "/Users/anaconda3/envs/py37torch17/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
TypeError: forward() takes 2 positional arguments but 3 were given

zhiqwang changed the title from "After code refactor, model backbone for r4.0 has changed" to "SPPF will generate nodes with duplicate names" Mar 2, 2022
@Hap-Zhang

[quoting @syswyl's comment above, including the nni ModelSpeedup traceback]

Hi @zhiqwang @syswyl,
I am hitting the same error now. Have you solved this problem? Could you give me some guidance? Thanks very much!
