
Half model error #28

Closed
NingNanXin opened this issue Mar 1, 2023 · 8 comments
@NingNanXin commented Mar 1, 2023

Thanks for open-sourcing this work.
When using TPAT to generate a plugin for ScatterElements, I get the following error:
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (Concat_52) Op (Concat) [ShapeInferenceError] Can't merge shape info. Both source and target dimension have values but they differ. Source=1 Target=3 Dimension=0
My command is:
python onnx_to_plugin.py -i CodeFormer.onnx -o plan.onnx -n ScatterElements_1022 -dynamic=true -min=1 -max=6 -opt=3

The error occurs in python/cuda_kernel.py, in compute_tensor_shape(), when it reloads half_model.onnx. Traceback:

 File "/data//TPAT/python/onnx_to_plugin.py", line 287, in <module>
    onnx2plugin(
  File "/data//TPAT/python/onnx_to_plugin.py", line 190, in onnx2plugin
    onnx_name_mapping_trt_plugin = generate_plugin_library(
  File "/data//TPAT/python/onnx_to_plugin.py", line 85, in generate_plugin_library
    cuda_kernel.run()
  File "/data//TPAT/python/cuda_kernel.py", line 54, in run
    graph_def = self.extract_target_onnx_node(self._onnx_model)
  File "/data//TPAT/python/cuda_kernel.py", line 211, in extract_target_onnx_node
    computed_tensor_shapes = self.compute_tensor_shape(
  File "/data//TPAT/python/cuda_kernel.py", line 163, in compute_tensor_shape
    session = ort.InferenceSession(half_model_path, providers=EP_list)
  File "/home/ningnx/anaconda3/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 360, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/ningnx/anaconda3/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 408, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (Concat_52) Op (Concat) [ShapeInferenceError] Can't merge shape info. Both source and target dimension have values but they differ. Source=1 Target=2 Dimension=0

Links to the onnx files:
original onnx
half model

Hoping someone can help with this problem, thanks!

@buptqq (Collaborator) commented Mar 1, 2023

While generating a plugin, TPAT uses shape inference multiple times to work out the input/output shapes of the target op (for cases shape inference cannot resolve, it actually runs the model once with onnxruntime). But CodeFormer.onnx does not seem to run under onnxruntime. Could you first make sure this file runs with both shape inference and onnx runtime?
shape inference: http://www.xavierdupre.fr/app/onnxcustom/helpsphinx/api/onnx_python/shape_inference.html
onnxruntime: https://onnxruntime.ai/docs/

@buptqq (Collaborator) commented Mar 1, 2023

Also: for a fairly large onnx, we recommend hand-writing a small onnx that contains just the op you need a plugin for, keeping the shapes identical. After the corresponding plugin is generated, change that node's op type in the large onnx to the plugin's class name; the onnx-parser can then recognize your plugin.
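A minimal sketch of such a standalone model, built with onnx.helper — the shapes, axis, and opset below are placeholders and must be replaced with the actual attributes of ScatterElements_1022 in CodeFormer.onnx:

import onnx
from onnx import TensorProto, helper

# Placeholder shape: copy the real shapes of ScatterElements_1022 here.
SHAPE = [1, 256, 1024]
data = helper.make_tensor_value_info("data", TensorProto.FLOAT, SHAPE)
indices = helper.make_tensor_value_info("indices", TensorProto.INT64, SHAPE)
updates = helper.make_tensor_value_info("updates", TensorProto.FLOAT, SHAPE)
out = helper.make_tensor_value_info("out", TensorProto.FLOAT, SHAPE)

node = helper.make_node(
    "ScatterElements",
    inputs=["data", "indices", "updates"],
    outputs=["out"],
    name="ScatterElements_1022",
    axis=1,  # placeholder: use the node's real axis attribute
)
graph = helper.make_graph([node], "scatter_only", [data, indices, updates], [out])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
onnx.checker.check_model(model)
onnx.save(model, "scatter_only.onnx")

Running onnx_to_plugin.py against scatter_only.onnx then yields a plugin whose input/output shapes match the node in the full model.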

@NingNanXin (Author) commented Mar 1, 2023 via email

@NingNanXin (Author)

> Could you first make sure this file runs with both shape inference and onnx runtime?

Here is my onnx test code; both shape inference and onnxruntime run without problems. This is a super-resolution model.

import cv2
import numpy as np
import onnx
import onnxruntime
from onnx import shape_inference


def onnx_infer():
    # Shape inference and the model checker both pass.
    onnx_model = onnx.load("CodeFormer.onnx")
    inferred_model = shape_inference.infer_shapes(onnx_model)
    onnx.checker.check_model(onnx_model)

    sess = onnxruntime.InferenceSession("CodeFormer.onnx")

    # Preprocess: resize to 512x512, BGR->RGB, normalize to [-1, 1], NCHW.
    image = cv2.imread("00_00.png")
    image = cv2.resize(image, (512, 512), interpolation=cv2.INTER_LINEAR)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = image / 255
    mean = np.array([0.5, 0.5, 0.5])
    std = np.array([0.5, 0.5, 0.5])
    image = (image - mean) / std
    image = np.transpose(image, [2, 0, 1])
    image = np.expand_dims(image, 0).astype(np.float32)

    output = sess.run(None, {"input": image})[0]

    # Postprocess: drop batch dim, map [-1, 1] back to [0, 255], CHW->HWC.
    output = np.squeeze(output, 0)
    output = np.clip(output, -1, 1)
    output = ((output + 1) / 2) * 255
    output = np.transpose(output, [1, 2, 0])
    output = cv2.cvtColor(output, cv2.COLOR_RGB2BGR)
    cv2.imwrite("test.png", output)

Test image: 00_00 (attached)

@buptqq (Collaborator) commented Mar 2, 2023

(attached image: ScatterElement)

I don't have the right environment here; could you run half_model.onnx with shape_inference and onnxruntime? half_model.onnx is the subgraph of CodeFormer.onnx cut from the input down to the ScatterElements op.
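For reference, a minimal check along these lines, assuming a 1x3x512x512 float input as in the test code above (adjust if the subgraph expects something else):

import numpy as np
import onnx
import onnxruntime as ort
from onnx import shape_inference

model = onnx.load("half_model.onnx")
onnx.checker.check_model(model)
inferred = shape_inference.infer_shapes(model)

# Session creation is where the reported Concat_52 error was raised.
sess = ort.InferenceSession("half_model.onnx", providers=["CPUExecutionProvider"])
dummy = np.random.randn(1, 3, 512, 512).astype(np.float32)
print(sess.run(None, {sess.get_inputs()[0].name: dummy})[0].shape)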

@NingNanXin (Author)

Loading half_model gives the same error:
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (Concat_52) Op (Concat) [ShapeInferenceError] Can't merge shape info. Both source and target dimension have values but they differ. Source=1 Target=2 Dimension=0
For now I suspect a dynamic-batch problem, but I haven't pinpointed which part of the model it comes from.
After forcing dynamic_batch to false in the code and exporting the onnx with batch=1, the plugin was generated successfully.
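For anyone hitting the same thing, a hedged sketch of pinning the batch dimension of an already-exported onnx to 1 (equivalent to re-exporting without dynamic axes; it only rewrites the graph inputs and outputs, then re-runs shape inference to propagate the fixed dims):

import onnx
from onnx import shape_inference

model = onnx.load("CodeFormer.onnx")
for value_info in list(model.graph.input) + list(model.graph.output):
    dims = value_info.type.tensor_type.shape.dim
    # Assumes dim 0 is the batch dimension; symbolic dims carry dim_param.
    if dims and dims[0].dim_param:
        dims[0].dim_value = 1  # assigning the oneof clears dim_param

onnx.save(shape_inference.infer_shapes(model), "CodeFormer_bs1.onnx")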

@buptqq (Collaborator) commented Mar 2, 2023

TPAT's dynamic-batch support is actually implemented with padding. The core idea is to fill concrete values into the batch dimension of the dynamic-batch onnx model, generate a plugin for each value, and then stitch them together behind one unified plugin.
The code that assigns concrete batch values in the onnx model is the add_explicit_bs function in python/onnx_to_plugin.py.
So for a fairly large model, where the dimension holding the batch size may move around the graph, running shape inference and onnxruntime on the subgraph cut from the input to the target node can fail.
But the simpler approach is to create a standalone ScatterElements onnx op with the same input/output shapes as in CodeFormer.onnx, generate the plugin, and then use it for CodeFormer.onnx.
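Concretely, the last step of wiring the plugin back in can look like the sketch below; the plugin class name is a placeholder for whatever name TPAT actually generated:

import onnx

model = onnx.load("CodeFormer.onnx")
for node in model.graph.node:
    if node.name == "ScatterElements_1022":
        node.op_type = "tpat_scatterelements"  # placeholder: the generated plugin's class name
onnx.save(model, "CodeFormer_plugin.onnx")

The onnx-parser then resolves that op type against the registered TensorRT plugin when building the engine.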

@NingNanXin (Author)

Understood 🫡
