
Half model error #28

Closed
NingNanXin opened this issue Mar 1, 2023 · 8 comments
@NingNanXin commented Mar 1, 2023

Thanks for open-sourcing this work.
When using TPAT to generate a plugin for ScatterElements, I get the following error:
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (Concat_52) Op (Concat) [ShapeInferenceError] Can't merge shape info. Both source and target dimension have values but they differ. Source=1 Target=3 Dimension=0
My command is:
python onnx_to_plugin.py -i CodeFormer.onnx -o plan.onnx -n ScatterElements_1022 -dynamic=true -min=1 -max=6 -opt=3

The error occurs in python/cuda_kernel.py, in compute_tensor_shape(), when it reloads half_model.onnx. Traceback:

 File "/data//TPAT/python/onnx_to_plugin.py", line 287, in <module>
    onnx2plugin(
  File "/data//TPAT/python/onnx_to_plugin.py", line 190, in onnx2plugin
    onnx_name_mapping_trt_plugin = generate_plugin_library(
  File "/data//TPAT/python/onnx_to_plugin.py", line 85, in generate_plugin_library
    cuda_kernel.run()
  File "/data//TPAT/python/cuda_kernel.py", line 54, in run
    graph_def = self.extract_target_onnx_node(self._onnx_model)
  File "/data//TPAT/python/cuda_kernel.py", line 211, in extract_target_onnx_node
    computed_tensor_shapes = self.compute_tensor_shape(
  File "/data//TPAT/python/cuda_kernel.py", line 163, in compute_tensor_shape
    session = ort.InferenceSession(half_model_path, providers=EP_list)
  File "/home/ningnx/anaconda3/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 360, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/ningnx/anaconda3/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 408, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (Concat_52) Op (Concat) [ShapeInferenceError] Can't merge shape info. Both source and target dimension have values but they differ. Source=1 Target=2 Dimension=0

Links to the onnx files:
original onnx
half model

Hoping someone can help with this problem, thanks!

@buptqq (Collaborator) commented Mar 1, 2023

While generating a plugin, TPAT uses shape inference multiple times to work out the input/output shapes of the target op (for cases shape inference cannot resolve, it actually runs the model once with onnxruntime). But CodeFormer.onnx does not seem to run under onnxruntime. Could you first make sure this file runs with both shape inference and onnx runtime?
shape inference: http://www.xavierdupre.fr/app/onnxcustom/helpsphinx/api/onnx_python/shape_inference.html
onnxruntime: https://onnxruntime.ai/docs/

@buptqq (Collaborator) commented Mar 1, 2023

Also: for a fairly large onnx, we recommend hand-writing a small onnx that contains just the op you need a plugin for, keeping the shapes identical. After the corresponding plugin is generated, change that node's op type in the large onnx to the plugin's class name; the onnx-parser can then recognize your plugin.
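A minimal sketch of such a standalone model, built with onnx.helper — the shapes, axis, and opset below are placeholders and must be replaced with the actual attributes of ScatterElements_1022 in CodeFormer.onnx:

import onnx
from onnx import TensorProto, helper

# Placeholder shape: copy the real shapes of ScatterElements_1022 here.
SHAPE = [1, 256, 1024]
data = helper.make_tensor_value_info("data", TensorProto.FLOAT, SHAPE)
indices = helper.make_tensor_value_info("indices", TensorProto.INT64, SHAPE)
updates = helper.make_tensor_value_info("updates", TensorProto.FLOAT, SHAPE)
out = helper.make_tensor_value_info("out", TensorProto.FLOAT, SHAPE)

node = helper.make_node(
    "ScatterElements",
    inputs=["data", "indices", "updates"],
    outputs=["out"],
    name="ScatterElements_1022",
    axis=1,  # placeholder: use the node's real axis attribute
)
graph = helper.make_graph([node], "scatter_only", [data, indices, updates], [out])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
onnx.checker.check_model(model)
onnx.save(model, "scatter_only.onnx")

Running onnx_to_plugin.py against scatter_only.onnx then yields a plugin whose input/output shapes match the node in the full model.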

@NingNanXin (Author) commented Mar 1, 2023 via email

@NingNanXin (Author)

> Could you first make sure this file runs with both shape inference and onnx runtime?

Here is my onnx test code; both shape inference and onnxruntime run without problems. This is a super-resolution model.

import cv2
import numpy as np
import onnx
import onnxruntime
from onnx import shape_inference


def onnx_infer():
    # Shape inference and the model checker both pass.
    onnx_model = onnx.load("CodeFormer.onnx")
    inferred_model = shape_inference.infer_shapes(onnx_model)
    onnx.checker.check_model(onnx_model)

    sess = onnxruntime.InferenceSession("CodeFormer.onnx")

    # Preprocess: resize to 512x512, BGR->RGB, normalize to [-1, 1], NCHW.
    image = cv2.imread("00_00.png")
    image = cv2.resize(image, (512, 512), interpolation=cv2.INTER_LINEAR)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = image / 255
    mean = np.array([0.5, 0.5, 0.5])
    std = np.array([0.5, 0.5, 0.5])
    image = (image - mean) / std
    image = np.transpose(image, [2, 0, 1])
    image = np.expand_dims(image, 0).astype(np.float32)

    output = sess.run(None, {"input": image})[0]

    # Postprocess: drop batch dim, map [-1, 1] back to [0, 255], CHW->HWC.
    output = np.squeeze(output, 0)
    output = np.clip(output, -1, 1)
    output = ((output + 1) / 2) * 255
    output = np.transpose(output, [1, 2, 0])
    output = cv2.cvtColor(output, cv2.COLOR_RGB2BGR)
    cv2.imwrite("test.png", output)

Test image: 00_00 (attached)

@buptqq (Collaborator) commented Mar 2, 2023

(attached image: ScatterElement)

I don't have the right environment here; could you run half_model.onnx with shape_inference and onnxruntime? half_model.onnx is the subgraph of CodeFormer.onnx cut from the input down to the ScatterElements op.
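For reference, a minimal check along these lines, assuming a 1x3x512x512 float input as in the test code above (adjust if the subgraph expects something else):

import numpy as np
import onnx
import onnxruntime as ort
from onnx import shape_inference

model = onnx.load("half_model.onnx")
onnx.checker.check_model(model)
inferred = shape_inference.infer_shapes(model)

# Session creation is where the reported Concat_52 error was raised.
sess = ort.InferenceSession("half_model.onnx", providers=["CPUExecutionProvider"])
dummy = np.random.randn(1, 3, 512, 512).astype(np.float32)
print(sess.run(None, {sess.get_inputs()[0].name: dummy})[0].shape)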

@NingNanXin (Author)

Loading half_model gives the same error:
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (Concat_52) Op (Concat) [ShapeInferenceError] Can't merge shape info. Both source and target dimension have values but they differ. Source=1 Target=2 Dimension=0
For now I suspect a dynamic-batch problem, but I haven't pinpointed which part of the model it comes from.
After forcing dynamic_batch to false in the code and exporting the onnx with batch=1, the plugin was generated successfully.
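For anyone hitting the same thing, a hedged sketch of pinning the batch dimension of an already-exported onnx to 1 (equivalent to re-exporting without dynamic axes; it only rewrites the graph inputs and outputs, then re-runs shape inference to propagate the fixed dims):

import onnx
from onnx import shape_inference

model = onnx.load("CodeFormer.onnx")
for value_info in list(model.graph.input) + list(model.graph.output):
    dims = value_info.type.tensor_type.shape.dim
    # Assumes dim 0 is the batch dimension; symbolic dims carry dim_param.
    if dims and dims[0].dim_param:
        dims[0].dim_value = 1  # assigning the oneof clears dim_param

onnx.save(shape_inference.infer_shapes(model), "CodeFormer_bs1.onnx")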

@buptqq (Collaborator) commented Mar 2, 2023

TPAT's dynamic-batch support is actually implemented with padding. The core idea is to fill concrete values into the batch dimension of the dynamic-batch onnx model, generate a plugin for each value, and then stitch them together behind one unified plugin.
The code that assigns concrete batch values in the onnx model is the add_explicit_bs function in python/onnx_to_plugin.py.
So for a fairly large model, where the dimension holding the batch size may move around the graph, running shape inference and onnxruntime on the subgraph cut from the input to the target node can fail.
But the simpler approach is to create a standalone ScatterElements onnx op with the same input/output shapes as in CodeFormer.onnx, generate the plugin, and then use it for CodeFormer.onnx.
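Concretely, the last step of wiring the plugin back in can look like the sketch below; the plugin class name is a placeholder for whatever name TPAT actually generated:

import onnx

model = onnx.load("CodeFormer.onnx")
for node in model.graph.node:
    if node.name == "ScatterElements_1022":
        node.op_type = "tpat_scatterelements"  # placeholder: the generated plugin's class name
onnx.save(model, "CodeFormer_plugin.onnx")

The onnx-parser then resolves that op type against the registered TensorRT plugin when building the engine.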

@NingNanXin (Author)

Understood 🫡
