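# Exploring Polygraphy's Features

Polygraphy looks like a powerful tool, so this notebook explores some of its features.

[Polygraphy on GitHub](https://github.com/NVIDIA/TensorRT/tree/main/tools/Polygraphy)

Polygraphy provides both CLI tools and a Python API; this notebook mainly uses the command-line tools.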

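# Installation

Installation is skipped here because the current image already ships with Polygraphy; the focus is on usage. For reference, the install command is: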

```bash
python -m pip install colored polygraphy --extra-index-url https://pypi.ngc.nvidia.com
```

In [1]:
!polygraphy -h

usage: polygraphy [-h] [-v]
                  {run,convert,inspect,surgeon,template,debug,data} ...

Polygraphy: A Deep Learning Debugging Toolkit

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit

Tools:
  {run,convert,inspect,surgeon,template,debug,data}
    run                 Run inference and compare results across backends.
    convert             Convert models to other formats.
    inspect             View information about various types of files.
    surgeon             Modify ONNX models.
    template            [EXPERIMENTAL] Generate template files.
    debug               [EXPERIMENTAL] Debug model accuracy issues.
    data                Manipulate input and output data generated by other
                        Polygraphy subtools.


# polygraphy run

In [2]:
!polygraphy run -h

usage: polygraphy run [-h] [-v] [-q] [--silent]
                      [--log-format {timestamp,line-info,no-colors} [{timestamp,line-info,no-colors} ...]]
                      [--log-file LOG_FILE]
                      [--model-type {frozen,keras,ckpt,onnx,engine,uff,trt-network-script,caffe}]
                      [--input-shapes INPUT_SHAPES [INPUT_SHAPES ...]]
                      [--ckpt CKPT] [--tf-outputs TF_OUTPUTS [TF_OUTPUTS ...]]
                      [--save-pb SAVE_PB]
                      [--save-tensorboard SAVE_TENSORBOARD] [--freeze-graph]
                      [--tftrt] [--minimum-segment-size MINIMUM_SEGMENT_SIZE]
                      [--dynamic-op]
                      [--gpu-memory-fraction GPU_MEMORY_FRACTION]
                      [--allow-growth] [--xla] [--save-timeline SAVE_TIMELINE]
                      [--opset OPSET] [--no-const-folding]
                      [--save-onnx SAVE_ONNX]
                      [--save-external-data [SAVE_EXTERNAL_DATA]]
   

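`run` takes a long list of arguments, so the help text is worth reading carefully.

Its main purpose is summed up in one line of the help output: `Run inference and compare results across backends.` It runs inference and compares the results produced by different backends, which makes it useful for debugging model conversion.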

In [6]:
# Manually patch the last line of /opt/conda/lib/python3.8/site-packages/polygraphy/backend/onnxrt/loader.py so that it passes providers explicitly:
# return onnxruntime.InferenceSession(model_bytes, providers=["CUDAExecutionProvider"])
!polygraphy run ./onnx/model_torch.onnx --onnxrt --onnx-outputs mark all --save-results=onnx_out.json \
    --data-loader-script /workspace/examples/data_loader.py

[I] onnxrt-runner-N0-01/17/23-14:21:50  | Activating and starting inference
[I] Loading model: /workspace/examples/onnx/model_torch.onnx
[I] onnxrt-runner-N0-01/17/23-14:21:50 
    ---- Inference Input(s) ----
    {input_ids [dtype=int64, shape=(1, 128)],
     attention_mask [dtype=int64, shape=(1, 128)],
     token_type_ids [dtype=int64, shape=(1, 128)]}
[I] onnxrt-runner-N0-01/17/23-14:21:50 
    ---- Inference Output(s) ----
    {/bert/Unsqueeze_output_0 [dtype=int64, shape=(1, 1, 128)],
     /bert/Unsqueeze_1_output_0 [dtype=int64, shape=(1, 1, 1, 128)],
     /bert/Cast_output_0 [dtype=float32, shape=(1, 1, 1, 128)],
     /bert/Sub_output_0 [dtype=float32, shape=(1, 1, 1, 128)],
     /bert/Mul_output_0 [dtype=float32, shape=(1, 1, 1, 128)],
     /bert/embeddings/Shape_output_0 [dtype=int64, shape=(2,)],
     /bert/embeddings/Gather_output_0 [dtype=int64, shape=()],
     /bert/embeddings/Unsqueeze_output_0 [dtype=int64, shape=(1,)],
     /bert/embeddings/Slice_output_0
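
The `data_loader.py` passed to `--data-loader-script` is not shown in this notebook. Below is a minimal sketch of what such a script could look like, assuming Polygraphy's default convention of a `load_data()` function that yields one `{input_name: numpy array}` dict per inference iteration. The input names, dtypes, and shapes come from the log above; the random token IDs, the vocabulary size of 30522, and the iteration count are placeholder assumptions, not the contents of the real script.

```python
"""Hypothetical stand-in for /workspace/examples/data_loader.py."""
import numpy as np

SEQ_LEN = 128        # matches shape=(1, 128) in the inference log
VOCAB_SIZE = 30522   # assumption: standard BERT vocabulary size
NUM_ITERATIONS = 1   # assumption: a single batch is enough for a smoke test


def load_data():
    """Yield one feed dict per inference iteration."""
    # A fixed seed means separate runs (ONNX Runtime, TensorRT) see the same inputs.
    rng = np.random.default_rng(seed=0)
    for _ in range(NUM_ITERATIONS):
        yield {
            "input_ids": rng.integers(0, VOCAB_SIZE, size=(1, SEQ_LEN), dtype=np.int64),
            "attention_mask": np.ones((1, SEQ_LEN), dtype=np.int64),
            "token_type_ids": np.zeros((1, SEQ_LEN), dtype=np.int64),
        }
```

Using deterministic inputs matters here because the ONNX Runtime and TensorRT results are saved in separate runs and only compared afterwards.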

In [9]:
!polygraphy run ./onnx/model_torch.onnx --trt --validate --trt-outputs mark all --save-results=trt_out.json \
    --data-loader-script /workspace/examples/data_loader.py \
    --trt-min-shapes input_ids:[1,128] attention_mask:[1,128] token_type_ids:[1,128] \
    --trt-opt-shapes input_ids:[1,128] attention_mask:[1,128] token_type_ids:[1,128] \
    --trt-max-shapes input_ids:[1,128] attention_mask:[1,128] token_type_ids:[1,128]

[I] trt-runner-N0-01/17/23-14:44:05     | Activating and starting inference
[01/17/2023-14:44:06] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:368: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/17/2023-14:44:08] [TRT] [W] Output type must be INT32 for shape outputs
[01/17/2023-14:44:08] [TRT] [W] Output type must be INT32 for shape outputs
[01/17/2023-14:44:08] [TRT] [W] Output type must be INT32 for shape outputs
[01/17/2023-14:44:08] [TRT] [W] Output type must be INT32 for shape outputs
[I]     Configuring with profiles: [Profile().add(input_ids, min=[1, 128], opt=[1, 128], max=[1, 128]).add(attention_mask, min=[1, 128], opt=[1, 128], max=[1, 128]).add(token_type_ids, min=[1, 128], opt=[1, 128], max=[1, 128])]
[I] Building engine with configuration:
    Workspace            | 16777216 bytes (16.00 MiB)
    Precision            | TF32: False, FP16: False, INT8: False, St
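
The command above repeats the same shape spec three times for `--trt-min-shapes`, `--trt-opt-shapes`, and `--trt-max-shapes`. When min, opt, and max are identical, as they are here, the arguments can be generated from a single dict. This is only a convenience sketch; `shape_profile_args` is a hypothetical helper, not part of Polygraphy.

```python
import shlex


def shape_profile_args(shapes):
    """Build --trt-{min,opt,max}-shapes arguments for a static profile (min == opt == max)."""
    specs = [
        f"{name}:[{','.join(str(dim) for dim in dims)}]"
        for name, dims in shapes.items()
    ]
    args = []
    for kind in ("min", "opt", "max"):
        args.append(f"--trt-{kind}-shapes")
        args.extend(specs)
    return args


shapes = {
    "input_ids": (1, 128),
    "attention_mask": (1, 128),
    "token_type_ids": (1, 128),
}
cmd = ["polygraphy", "run", "./onnx/model_torch.onnx", "--trt", *shape_profile_args(shapes)]

# Print a shell-quoted command line; pass `cmd` to subprocess.run to execute it instead.
print(shlex.join(cmd))
```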

In [10]:
!polygraphy debug --help

usage: polygraphy debug [-h] [-v] [-q] [--silent]
                        [--log-format {timestamp,line-info,no-colors} [{timestamp,line-info,no-colors} ...]]
                        [--log-file LOG_FILE]
                        {build,precision,diff-tactics,reduce,repeat} ...

[EXPERIMENTAL] Debug model accuracy issues.

optional arguments:
  -h, --help            show this help message and exit

Logging:
  Options for logging and debug output

  -v, --verbose         Increase logging verbosity. Specify multiple times for
                        higher verbosity
  -q, --quiet           Decrease logging verbosity. Specify multiple times for
                        lower verbosity
  --silent              Disable all output
  --log-format {timestamp,line-info,no-colors} [{timestamp,line-info,no-colors} ...]
                        Format for log messages: {{'timestamp': Include
                        timestamp, 'line-info': Include file and line number,
                        'no-color

In [12]:
!polygraphy debug precision --help

usage: polygraphy debug precision [-h] [-v] [-q] [--silent]
                                  [--log-format {timestamp,line-info,no-colors} [{timestamp,line-info,no-colors} ...]]
                                  [--log-file LOG_FILE]
                                  [--artifacts ARTIFACTS [ARTIFACTS ...]]
                                  [--art-dir DIR] --check ...
                                  [--fail-code FAIL_CODES [FAIL_CODES ...] |
                                  --ignore-fail-code IGNORE_FAIL_CODES
                                  [IGNORE_FAIL_CODES ...]]
                                  [--fail-regex FAIL_REGEX [FAIL_REGEX ...]]
                                  [--show-output | --hide-fail-output]
                                  [--iter-artifact ITER_ARTIFACT]
                                  [--no-remove-intermediate]
                                  [--iter-info ITERATION_INFO]
                                  [--model-type {frozen,keras,ckpt,onnx,engine,uff,t
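
As far as its help text describes it, `debug precision` iteratively marks layers to run in higher precision and reruns the `--check` command after each engine build. The following is only a sketch of the bisection idea behind such a search, not Polygraphy's implementation. `passes_check` is a hypothetical stand-in for "build an engine with the first `k` layers in higher precision and run the check command", and the sketch assumes accuracy never gets worse as more layers are promoted.

```python
def min_layers_in_high_precision(num_layers, passes_check):
    """Return the smallest k for which passes_check(k) is True, or None if none passes.

    Assumes passes_check is monotonic: once it passes for k, it passes for every larger k.
    """
    if not passes_check(num_layers):
        return None  # even running every layer in higher precision fails the check
    lo, hi = 0, num_layers
    while lo < hi:
        mid = (lo + hi) // 2
        if passes_check(mid):
            hi = mid       # mid layers are enough; try fewer
        else:
            lo = mid + 1   # mid layers are not enough; need more
    return lo


# Toy example: pretend the check passes once at least 7 of 20 layers are promoted.
print(min_layers_in_high_precision(20, lambda k: k >= 7))
```

Each call to `passes_check` corresponds to a full engine build in the real tool, which is why a bisecting search is attractive compared with promoting one layer at a time.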

In [21]:
# --trt-outputs mark all cannot be added here, but without it only the final outputs are saved
!polygraphy run ./onnx/model_torch.onnx --trt --validate --save-engine polygraphy_debug.engine --save-results=trt_out.json \
    --data-loader-script /workspace/examples/data_loader.py \
    --trt-min-shapes input_ids:[1,128] attention_mask:[1,128] token_type_ids:[1,128] \
    --trt-opt-shapes input_ids:[1,128] attention_mask:[1,128] token_type_ids:[1,128] \
    --trt-max-shapes input_ids:[1,128] attention_mask:[1,128] token_type_ids:[1,128]

[I] trt-runner-N0-01/17/23-15:07:36     | Activating and starting inference
[01/17/2023-15:07:37] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:368: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/17/2023-15:07:39] [TRT] [W] Output type must be INT32 for shape outputs
[01/17/2023-15:07:39] [TRT] [W] Output type must be INT32 for shape outputs
[01/17/2023-15:07:39] [TRT] [W] Output type must be INT32 for shape outputs
[01/17/2023-15:07:39] [TRT] [W] Output type must be INT32 for shape outputs
[I]     Configuring with profiles: [Profile().add(input_ids, min=[1, 128], opt=[1, 128], max=[1, 128]).add(attention_mask, min=[1, 128], opt=[1, 128], max=[1, 128]).add(token_type_ids, min=[1, 128], opt=[1, 128], max=[1, 128])]
[I] Building engine with configuration:
    Workspace            | 16777216 bytes (16.00 MiB)
    Precision            | TF32: False, FP16: False, INT8: False, St

In [31]:
# Compare the outputs of the two backends: TensorRT and ONNX Runtime
!polygraphy run ./onnx/model_torch.onnx --trt --validate --onnxrt --atol 1e-1 --rtol 1e-1 \
    --fp16 \
    --data-loader-script /workspace/examples/data_loader.py \
    --trt-min-shapes input_ids:[1,128] attention_mask:[1,128] token_type_ids:[1,128] \
    --trt-opt-shapes input_ids:[1,128] attention_mask:[1,128] token_type_ids:[1,128] \
    --trt-max-shapes input_ids:[1,128] attention_mask:[1,128] token_type_ids:[1,128]

[I] trt-runner-N0-01/18/23-16:25:28     | Activating and starting inference
[01/18/2023-16:25:29] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:368: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/18/2023-16:25:31] [TRT] [W] Output type must be INT32 for shape outputs
[01/18/2023-16:25:31] [TRT] [W] Output type must be INT32 for shape outputs
[01/18/2023-16:25:31] [TRT] [W] Output type must be INT32 for shape outputs
[01/18/2023-16:25:31] [TRT] [W] Output type must be INT32 for shape outputs
[I]     Configuring with profiles: [Profile().add(input_ids, min=[1, 128], opt=[1, 128], max=[1, 128]).add(attention_mask, min=[1, 128], opt=[1, 128], max=[1, 128]).add(token_type_ids, min=[1, 128], opt=[1, 128], max=[1, 128])]
[I] Building engine with configuration:
    Workspace            | 16777216 bytes (16.00 MiB)
    Precision            | TF32: False, FP16: True, INT8: False, Str
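
The comparison above relaxes both tolerances to `1e-1` because the engine is built with `--fp16`. To make the roles of `--atol` and `--rtol` concrete, here is a plain-Python sketch of an element-wise tolerance check. It is not Polygraphy's comparison code. The rule used here (an element counts as a mismatch only when it exceeds both the absolute and the relative tolerance) is an assumption for illustration, and `compare_outputs` is a hypothetical name.

```python
import math


def compare_outputs(candidate, reference, atol, rtol):
    """Compare two flat sequences of floats element by element."""
    max_abs_diff = 0.0
    max_rel_diff = 0.0
    mismatches = 0
    for got, want in zip(candidate, reference):
        abs_diff = abs(got - want)
        if abs_diff == 0.0:
            rel_diff = 0.0
        elif want == 0.0:
            rel_diff = math.inf  # relative error is undefined against a zero reference
        else:
            rel_diff = abs_diff / abs(want)
        max_abs_diff = max(max_abs_diff, abs_diff)
        max_rel_diff = max(max_rel_diff, rel_diff)
        # Assumed rule: fail only if the element is outside BOTH tolerances.
        if abs_diff > atol and rel_diff > rtol:
            mismatches += 1
    return {
        "max_abs_diff": max_abs_diff,
        "max_rel_diff": max_rel_diff,
        "mismatches": mismatches,
    }


reference = [1.0, 2.0, 100.0]   # e.g. ONNX Runtime outputs
candidate = [1.05, 2.0, 109.0]  # e.g. TensorRT FP16 outputs
print(compare_outputs(candidate, reference, atol=1e-1, rtol=1e-1))
```

With these toy values the large element differs by 9 in absolute terms but only 9% in relative terms, so tightening `rtol` to `1e-2` would flag it while the loose setting does not.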

In [22]:
!polygraphy debug precision ./onnx/model_torch.onnx --tf32 \
    --check polygraphy run polygraphy_debug.engine --trt --load-outputs onnx_out.json --abs 1e-1

[!] polygraphy_debug.engine already exists, refusing to overwrite.
    Please specify a different path for the intermediate artifact with --intermediate-artifact
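
The error comes from the engine saved earlier with `--save-engine polygraphy_debug.engine`: `debug precision` wants to write its intermediate engine to the same path and refuses to overwrite the existing file. Besides passing a different path via `--intermediate-artifact`, as the message suggests, the old file can be moved aside before rerunning. A small sketch, where the helper name and the `.bak` suffix are arbitrary choices:

```python
from pathlib import Path


def move_aside(path, suffix=".bak"):
    """Rename `path` to `path + suffix` if it exists; return the new path, else None."""
    src = Path(path)
    if not src.exists():
        return None
    dst = src.with_name(src.name + suffix)
    src.replace(dst)  # replaces an older backup of the same name, if any
    return dst


# Free up the default artifact name before rerunning `polygraphy debug precision`.
move_aside("polygraphy_debug.engine")
```

If the moved engine is used again as a `--check` target, the check command needs to reference the new file name.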
