/usr/src/tensorrt/bin/trtexec --onnx=resnet18_quant.quant.onnx --saveEngine=resnet18_quant_int8.engine --int8 --useCudaGraph --dumpLayerInfo --profilingVerbosity=detailed --useSpinWait
&&&& RUNNING TensorRT.trtexec [TensorRT v8602] # /usr/src/tensorrt/bin/trtexec --onnx=resnet18_quant.quant.onnx --saveEngine=resnet18_quant_int8.engine --int8 --useCudaGraph --dumpLayerInfo --profilingVerbosity=detailed --useSpinWait
[08/13/2024-21:57:02] [I] === Model Options ===
[08/13/2024-21:57:02] [I] Format: ONNX
[08/13/2024-21:57:02] [I] Model: resnet18_quant.quant.onnx
[08/13/2024-21:57:02] [I] Output:
[08/13/2024-21:57:02] [I] === Build Options ===
[08/13/2024-21:57:02] [I] Max batch: explicit batch
[08/13/2024-21:57:02] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[08/13/2024-21:57:02] [I] minTiming: 1
[08/13/2024-21:57:02] [I] avgTiming: 8
[08/13/2024-21:57:02] [I] Precision: FP32+INT8
[08/13/2024-21:57:02] [I] LayerPrecisions:
[08/13/2024-21:57:02] [I] Layer Device Types:
[08/13/2024-21:57:02] [I] Calibration: Dynamic
[08/13/2024-21:57:02] [I] Refit: Disabled
[08/13/2024-21:57:02] [I] Version Compatible: Disabled
[08/13/2024-21:57:02] [I] ONNX Native InstanceNorm: Disabled
[08/13/2024-21:57:02] [I] TensorRT runtime: full
[08/13/2024-21:57:02] [I] Lean DLL Path:
[08/13/2024-21:57:02] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[08/13/2024-21:57:02] [I] Exclude Lean Runtime: Disabled
[08/13/2024-21:57:02] [I] Sparsity: Disabled
[08/13/2024-21:57:02] [I] Safe mode: Disabled
[08/13/2024-21:57:02] [I] Build DLA standalone loadable: Disabled
[08/13/2024-21:57:02] [I] Allow GPU fallback for DLA: Disabled
[08/13/2024-21:57:02] [I] DirectIO mode: Disabled
[08/13/2024-21:57:02] [I] Restricted mode: Disabled
[08/13/2024-21:57:02] [I] Skip inference: Disabled
[08/13/2024-21:57:02] [I] Save engine: resnet18_quant_int8.engine
[08/13/2024-21:57:02] [I] Load engine:
[08/13/2024-21:57:02] [I] Profiling verbosity: 2
[08/13/2024-21:57:02] [I] Tactic sources: Using default tactic sources
[08/13/2024-21:57:02] [I] timingCacheMode: local
[08/13/2024-21:57:02] [I] timingCacheFile:
[08/13/2024-21:57:02] [I] Heuristic: Disabled
[08/13/2024-21:57:02] [I] Preview Features: Use default preview flags.
[08/13/2024-21:57:02] [I] MaxAuxStreams: -1
[08/13/2024-21:57:02] [I] BuilderOptimizationLevel: -1
[08/13/2024-21:57:02] [I] Input(s)s format: fp32:CHW
[08/13/2024-21:57:02] [I] Output(s)s format: fp32:CHW
[08/13/2024-21:57:02] [I] Input build shapes: model
[08/13/2024-21:57:02] [I] Input calibration shapes: model
[08/13/2024-21:57:02] [I] === System Options ===
[08/13/2024-21:57:02] [I] Device: 0
[08/13/2024-21:57:02] [I] DLACore:
[08/13/2024-21:57:02] [I] Plugins:
[08/13/2024-21:57:02] [I] setPluginsToSerialize:
[08/13/2024-21:57:02] [I] dynamicPlugins:
[08/13/2024-21:57:02] [I] ignoreParsedPluginLibs: 0
[08/13/2024-21:57:02] [I]
[08/13/2024-21:57:02] [I] === Inference Options ===
[08/13/2024-21:57:02] [I] Batch: Explicit
[08/13/2024-21:57:02] [I] Input inference shapes: model
[08/13/2024-21:57:02] [I] Iterations: 10
[08/13/2024-21:57:02] [I] Duration: 3s (+ 200ms warm up)
[08/13/2024-21:57:02] [I] Sleep time: 0ms
[08/13/2024-21:57:02] [I] Idle time: 0ms
[08/13/2024-21:57:02] [I] Inference Streams: 1
[08/13/2024-21:57:02] [I] ExposeDMA: Disabled
[08/13/2024-21:57:02] [I] Data transfers: Enabled
[08/13/2024-21:57:02] [I] Spin-wait: Enabled
[08/13/2024-21:57:02] [I] Multithreading: Disabled
[08/13/2024-21:57:02] [I] CUDA Graph: Enabled
[08/13/2024-21:57:02] [I] Separate profiling: Disabled
[08/13/2024-21:57:02] [I] Time Deserialize: Disabled
[08/13/2024-21:57:02] [I] Time Refit: Disabled
[08/13/2024-21:57:02] [I] NVTX verbosity: 2
[08/13/2024-21:57:02] [I] Persistent Cache Ratio: 0
[08/13/2024-21:57:02] [I] Inputs:
[08/13/2024-21:57:02] [I] === Reporting Options ===
[08/13/2024-21:57:02] [I] Verbose: Disabled
[08/13/2024-21:57:02] [I] Averages: 10 inferences
[08/13/2024-21:57:02] [I] Percentiles: 90,95,99
[08/13/2024-21:57:02] [I] Dump refittable layers:Disabled
[08/13/2024-21:57:02] [I] Dump output: Disabled
[08/13/2024-21:57:02] [I] Profile: Disabled
[08/13/2024-21:57:02] [I] Export timing to JSON file:
[08/13/2024-21:57:02] [I] Export output to JSON file:
[08/13/2024-21:57:02] [I] Export profile to JSON file:
[08/13/2024-21:57:02] [I]
[08/13/2024-21:57:02] [I] === Device Information ===
[08/13/2024-21:57:02] [I] Selected Device: Orin
[08/13/2024-21:57:02] [I] Compute Capability: 8.7
[08/13/2024-21:57:02] [I] SMs: 8
[08/13/2024-21:57:02] [I] Device Global Memory: 7620 MiB
[08/13/2024-21:57:02] [I] Shared Memory per SM: 164 KiB
[08/13/2024-21:57:02] [I] Memory Bus Width: 128 bits (ECC disabled)
[08/13/2024-21:57:02] [I] Application Compute Clock Rate: 0.624 GHz
[08/13/2024-21:57:02] [I] Application Memory Clock Rate: 0.624 GHz
[08/13/2024-21:57:02] [I]
[08/13/2024-21:57:02] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[08/13/2024-21:57:02] [I]
[08/13/2024-21:57:02] [I] TensorRT version: 8.6.2
[08/13/2024-21:57:02] [I] Loading standard plugins
[08/13/2024-21:57:02] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 33, GPU 1692 (MiB)
[08/13/2024-21:57:08] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1154, GPU +1108, now: CPU 1223, GPU 2838 (MiB)
[08/13/2024-21:57:08] [I] Start parsing network model.
[08/13/2024-21:57:08] [I] [TRT] ----------------------------------------------------------------
[08/13/2024-21:57:08] [I] [TRT] Input filename: resnet18_quant.quant.onnx
[08/13/2024-21:57:08] [I] [TRT] ONNX IR version: 0.0.7
[08/13/2024-21:57:08] [I] [TRT] Opset version: 13
[08/13/2024-21:57:08] [I] [TRT] Producer name:
[08/13/2024-21:57:08] [I] [TRT] Producer version:
[08/13/2024-21:57:08] [I] [TRT] Domain:
[08/13/2024-21:57:08] [I] [TRT] Model version: 0
[08/13/2024-21:57:08] [I] [TRT] Doc string:
[08/13/2024-21:57:08] [I] [TRT] ----------------------------------------------------------------
[08/13/2024-21:57:08] [I] Finished parsing network model. Parse time: 0.0987536
[08/13/2024-21:57:08] [I] FP32 and INT8 precisions have been specified - more performance might be enabled by additionally specifying --fp16 or --best
[08/13/2024-21:57:08] [W] [TRT] Calibrator won't be used in explicit precision mode. Use quantization aware training to generate network with Quantize/Dequantize nodes.
[08/13/2024-21:57:08] [I] [TRT] Graph optimization time: 0.0396298 seconds.
[08/13/2024-21:57:08] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[08/13/2024-21:57:17] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[08/13/2024-21:57:17] [I] [TRT] Total Host Persistent Memory: 106752
[08/13/2024-21:57:17] [I] [TRT] Total Device Persistent Memory: 22528
[08/13/2024-21:57:17] [I] [TRT] Total Scratch Memory: 0
[08/13/2024-21:57:17] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 59 MiB, GPU 26 MiB
[08/13/2024-21:57:17] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 25 steps to complete.
[08/13/2024-21:57:17] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.317824ms to assign 3 blocks to 25 nodes requiring 602112 bytes.
[08/13/2024-21:57:17] [I] [TRT] Total Activation Memory: 602112
[08/13/2024-21:57:17] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +11, GPU +16, now: CPU 11, GPU 16 (MiB)
[08/13/2024-21:57:18] [I] Engine built in 15.8085 sec.
[08/13/2024-21:57:18] [I] [TRT] Loaded engine size: 13 MiB
[08/13/2024-21:57:18] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +11, now: CPU 0, GPU 11 (MiB)
[08/13/2024-21:57:18] [I] Engine deserialized in 0.0242519 sec.
[08/13/2024-21:57:18] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 11 (MiB)
[08/13/2024-21:57:18] [I] Setting persistentCacheLimit to 0 bytes.
[08/13/2024-21:57:18] [I] Using random values for input input
[08/13/2024-21:57:18] [I] Input binding for input with dimensions 1x3x224x224 is created.
[08/13/2024-21:57:18] [I] Output binding for output with dimensions 1x1000 is created.
[08/13/2024-21:57:18] [I] Layer Information:
[08/13/2024-21:57:18] [I] Layers:
Name: input_QuantizeLinear, LayerType: Reformat, Inputs: [ { Name: input, Location: Device, Dimensions: [1,3,224,224], Format/Datatype: Row major linear FP32 }], Outputs: [ { Name: input_QuantizeLinear_Output, Location: Device, Dimensions: [1,3,224,224], Format/Datatype: Row major Int8 format }], ParameterType: Reformat, Origin: QDQ, TacticValue: 0x00000000000003ea, StreamId: 0, Metadata:
Name: onnx::Conv_194 + onnx::Conv_194_QuantizeLinear + /backbone/conv1/Conv + /backbone/relu/Relu + /backbone/maxpool/MaxPool, LayerType: CaskConvActPool, Inputs: [ { Name: input_QuantizeLinear_Output, Location: Device, Dimensions: [1,3,224,224], Format/Datatype: Row major Int8 format }], Outputs: [ { Name: /backbone/maxpool/MaxPool_output_0_QuantizeLinear_Output_1, Location: Device, Dimensions: [1,64,56,56], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], ParameterType: ConvActPool, ConvParameterType:
Convolution, ConvKernel: [7,7], ConvPaddingMode: kEXPLICIT_ROUND_DOWN, ConvPrePadding: [3,3], ConvPostPadding: [3,3], ConvStride: [2,2], ConvDilation: [1,1], ConvOutMaps: 64, ConvGroups: 1, ConvWeights: {"Type": "Int8", "Count": 9408}, ConvBias: {"Type": "Float", "Count": 64}, ConvHasSparseWeights: 0, ConvHasDynamicFilter: 0, ConvHasDynamicBias: 0, ConvHasResidual: 0, ConvConvXAsActInputIdx: -1, ConvBiasAsActInputIdx: -1, ConvResAsActInputIdx: -1, ConvActivation: RELU, PoolingParameterType: Pooling, PoolingPoolingType: MAX, PoolingWindowSize: [3,3], PoolingPaddingMode: kEXPLICIT_ROUND_DOWN, PoolingPrePadding: [1,1], PoolingPostPadding: [1,1], PoolingStride: [2,2], PoolingBlendFactor: 0, PoolingAverageCountExcludesPadding: 1, TacticValue: 0xb71c75095873646c, StreamId: 0, Metadata: [ONNX Layer: /backbone/conv1/Conv][ONNX Layer: /backbone/relu/Relu][ONNX Layer: /backbone/maxpool/MaxPool] Name: onnx::Conv_197 + onnx::Conv_197_QuantizeLinear + /backbone/layer1/layer1.0/conv1/Conv + /backbone/layer1/layer1.0/relu/Relu, LayerType: CaskConvolution, Inputs: [ { Name: /backbone/maxpool/MaxPool_output_0_QuantizeLinear_Output_1, Location: Device, Dimensions: [1,64,56,56], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], Outputs: [ { Name: /backbone/layer1/layer1.0/relu/Relu_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,64,56,56], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Int8", "Count": 36864}, Bias: {"Type": "Float", "Count": 64}, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, HasBias: 1, HasReLU: 1, TacticName: 
sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize256x64x64_stage4_warpsize4x1x1_g1_tensor16x8x32_t1r3s3, TacticValue: 0xd3d41ef6de22d9b6, StreamId: 0, Metadata: [ONNX Layer: /backbone/layer1/layer1.0/conv1/Conv][ONNX Layer: /backbone/layer1/layer1.0/relu/Relu] Name: onnx::Conv_200 + onnx::Conv_200_QuantizeLinear + /backbone/layer1/layer1.0/conv2/Conv + /backbone/layer1/layer1.0/Add + /backbone/layer1/layer1.0/relu_1/Relu, LayerType: CaskConvolution, Inputs: [ { Name: /backbone/layer1/layer1.0/relu/Relu_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,64,56,56], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }, { Name: /backbone/maxpool/MaxPool_output_0_QuantizeLinear_Output_1, Location: Device, Dimensions: [1,64,56,56], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], Outputs: [ { Name: /backbone/layer1/layer1.0/relu_1/Relu_output_0_QuantizeLinear_Output_1, Location: Device, Dimensions: [1,64,56,56], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Int8", "Count": 36864}, Bias: {"Type": "Float", "Count": 64}, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, HasBias: 1, HasReLU: 1, TacticName: sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize256x64x64_stage4_warpsize4x1x1_g1_tensor16x8x32_t1r3s3, TacticValue: 0xd3d41ef6de22d9b6, StreamId: 0, Metadata: [ONNX Layer: /backbone/layer1/layer1.0/conv2/Conv][ONNX Layer: /backbone/layer1/layer1.0/Add][ONNX Layer: /backbone/layer1/layer1.0/relu_1/Relu] Name: onnx::Conv_203 + onnx::Conv_203_QuantizeLinear + 
/backbone/layer1/layer1.1/conv1/Conv + /backbone/layer1/layer1.1/relu/Relu, LayerType: CaskConvolution, Inputs: [ { Name: /backbone/layer1/layer1.0/relu_1/Relu_output_0_QuantizeLinear_Output_1, Location: Device, Dimensions: [1,64,56,56], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], Outputs: [ { Name: /backbone/layer1/layer1.1/relu/Relu_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,64,56,56], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Int8", "Count": 36864}, Bias: {"Type": "Float", "Count": 64}, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, HasBias: 1, HasReLU: 1, TacticName: sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize256x64x64_stage4_warpsize4x1x1_g1_tensor16x8x32_t1r3s3, TacticValue: 0xd3d41ef6de22d9b6, StreamId: 0, Metadata: [ONNX Layer: /backbone/layer1/layer1.1/conv1/Conv][ONNX Layer: /backbone/layer1/layer1.1/relu/Relu] Name: onnx::Conv_206 + onnx::Conv_206_QuantizeLinear + /backbone/layer1/layer1.1/conv2/Conv + /backbone/layer1/layer1.1/Add + /backbone/layer1/layer1.1/relu_1/Relu, LayerType: CaskConvolution, Inputs: [ { Name: /backbone/layer1/layer1.1/relu/Relu_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,64,56,56], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }, { Name: /backbone/layer1/layer1.0/relu_1/Relu_output_0_QuantizeLinear_Output_1, Location: Device, Dimensions: [1,64,56,56], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], Outputs: [ { Name: /backbone/layer1/layer1.1/relu_1/Relu_output_0_QuantizeLinear_Output_1, 
Location: Device, Dimensions: [1,64,56,56], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Int8", "Count": 36864}, Bias: {"Type": "Float", "Count": 64}, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, HasBias: 1, HasReLU: 1, TacticName: sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize256x64x64_stage4_warpsize4x1x1_g1_tensor16x8x32_t1r3s3, TacticValue: 0xd3d41ef6de22d9b6, StreamId: 0, Metadata: [ONNX Layer: /backbone/layer1/layer1.1/conv2/Conv][ONNX Layer: /backbone/layer1/layer1.1/Add][ONNX Layer: /backbone/layer1/layer1.1/relu_1/Relu] Name: onnx::Conv_209 + onnx::Conv_209_QuantizeLinear + /backbone/layer2/layer2.0/conv1/Conv + /backbone/layer2/layer2.0/relu/Relu, LayerType: CaskConvolution, Inputs: [ { Name: /backbone/layer1/layer1.1/relu_1/Relu_output_0_QuantizeLinear_Output_1, Location: Device, Dimensions: [1,64,56,56], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], Outputs: [ { Name: /backbone/layer2/layer2.0/relu/Relu_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,128,28,28], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [2,2], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Int8", "Count": 73728}, Bias: {"Type": "Float", "Count": 128}, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, HasBias: 1, HasReLU: 1, TacticName: 
sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3, TacticValue: 0xea88b51105501f96, StreamId: 0, Metadata: [ONNX Layer: /backbone/layer2/layer2.0/conv1/Conv][ONNX Layer: /backbone/layer2/layer2.0/relu/Relu] Name: onnx::Conv_215 + onnx::Conv_215_QuantizeLinear + /backbone/layer2/layer2.0/downsample/downsample.0/Conv, LayerType: CaskConvolution, Inputs: [ { Name: /backbone/layer1/layer1.1/relu_1/Relu_output_0_QuantizeLinear_Output_1, Location: Device, Dimensions: [1,64,56,56], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], Outputs: [ { Name: /backbone/layer2/layer2.0/downsample/downsample.0/Conv_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,128,28,28], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [2,2], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Int8", "Count": 8192}, Bias: {"Type": "Float", "Count": 128}, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, HasBias: 1, HasReLU: 0, TacticName: sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r1s1, TacticValue: 0x601b41d38fc4645b, StreamId: 0, Metadata: [ONNX Layer: /backbone/layer2/layer2.0/downsample/downsample.0/Conv] Name: onnx::Conv_212 + onnx::Conv_212_QuantizeLinear + /backbone/layer2/layer2.0/conv2/Conv + /backbone/layer2/layer2.0/Add + /backbone/layer2/layer2.0/relu_1/Relu, LayerType: CaskConvolution, Inputs: [ { Name: /backbone/layer2/layer2.0/relu/Relu_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,128,28,28], Format/Datatype: Thirty-two 
wide channel vectorized row major Int8 format }, { Name: /backbone/layer2/layer2.0/downsample/downsample.0/Conv_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,128,28,28], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], Outputs: [ { Name: /backbone/layer2/layer2.0/relu_1/Relu_output_0_QuantizeLinear_Output_1, Location: Device, Dimensions: [1,128,28,28], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Int8", "Count": 147456}, Bias: {"Type": "Float", "Count": 128}, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, HasBias: 1, HasReLU: 1, TacticName: sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3, TacticValue: 0xea88b51105501f96, StreamId: 0, Metadata: [ONNX Layer: /backbone/layer2/layer2.0/conv2/Conv][ONNX Layer: /backbone/layer2/layer2.0/Add][ONNX Layer: /backbone/layer2/layer2.0/relu_1/Relu] Name: onnx::Conv_218 + onnx::Conv_218_QuantizeLinear + /backbone/layer2/layer2.1/conv1/Conv + /backbone/layer2/layer2.1/relu/Relu, LayerType: CaskConvolution, Inputs: [ { Name: /backbone/layer2/layer2.0/relu_1/Relu_output_0_QuantizeLinear_Output_1, Location: Device, Dimensions: [1,128,28,28], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], Outputs: [ { Name: /backbone/layer2/layer2.1/relu/Relu_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,128,28,28], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: 
[1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Int8", "Count": 147456}, Bias: {"Type": "Float", "Count": 128}, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, HasBias: 1, HasReLU: 1, TacticName: sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32, TacticValue: 0x8d50646eff0cde6d, StreamId: 0, Metadata: [ONNX Layer: /backbone/layer2/layer2.1/conv1/Conv][ONNX Layer: /backbone/layer2/layer2.1/relu/Relu] Name: onnx::Conv_221 + onnx::Conv_221_QuantizeLinear + /backbone/layer2/layer2.1/conv2/Conv + /backbone/layer2/layer2.1/Add + /backbone/layer2/layer2.1/relu_1/Relu, LayerType: CaskConvolution, Inputs: [ { Name: /backbone/layer2/layer2.1/relu/Relu_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,128,28,28], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }, { Name: /backbone/layer2/layer2.0/relu_1/Relu_output_0_QuantizeLinear_Output_1, Location: Device, Dimensions: [1,128,28,28], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], Outputs: [ { Name: /backbone/layer2/layer2.1/relu_1/Relu_output_0_QuantizeLinear_Output_1, Location: Device, Dimensions: [1,128,28,28], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Int8", "Count": 147456}, Bias: {"Type": "Float", "Count": 128}, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, HasBias: 1, HasReLU: 1, TacticName: 
sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3, TacticValue: 0xea88b51105501f96, StreamId: 0, Metadata: [ONNX Layer: /backbone/layer2/layer2.1/conv2/Conv][ONNX Layer: /backbone/layer2/layer2.1/Add][ONNX Layer: /backbone/layer2/layer2.1/relu_1/Relu] Name: onnx::Conv_224 + onnx::Conv_224_QuantizeLinear + /backbone/layer3/layer3.0/conv1/Conv + /backbone/layer3/layer3.0/relu/Relu, LayerType: CaskConvolution, Inputs: [ { Name: /backbone/layer2/layer2.1/relu_1/Relu_output_0_QuantizeLinear_Output_1, Location: Device, Dimensions: [1,128,28,28], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], Outputs: [ { Name: /backbone/layer3/layer3.0/relu/Relu_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,256,14,14], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [2,2], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Int8", "Count": 294912}, Bias: {"Type": "Float", "Count": 256}, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, HasBias: 1, HasReLU: 1, TacticName: sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize64x128x64_stage6_warpsize2x2x1_g1_tensor16x8x32_t1r3s3, TacticValue: 0xd30e9f770878c3fd, StreamId: 0, Metadata: [ONNX Layer: /backbone/layer3/layer3.0/conv1/Conv][ONNX Layer: /backbone/layer3/layer3.0/relu/Relu] Name: onnx::Conv_230 + onnx::Conv_230_QuantizeLinear + /backbone/layer3/layer3.0/downsample/downsample.0/Conv, LayerType: CaskConvolution, Inputs: [ { Name: /backbone/layer2/layer2.1/relu_1/Relu_output_0_QuantizeLinear_Output_1, Location: Device, Dimensions: 
[1,128,28,28], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], Outputs: [ { Name: /backbone/layer3/layer3.0/downsample/downsample.0/Conv_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,256,14,14], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [2,2], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Int8", "Count": 32768}, Bias: {"Type": "Float", "Count": 256}, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, HasBias: 1, HasReLU: 0, TacticName: sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize64x128x64_stage6_warpsize2x2x1_g1_tensor16x8x32_t1r1s1, TacticValue: 0x599d6bb582ecb830, StreamId: 0, Metadata: [ONNX Layer: /backbone/layer3/layer3.0/downsample/downsample.0/Conv] Name: onnx::Conv_227 + onnx::Conv_227_QuantizeLinear + /backbone/layer3/layer3.0/conv2/Conv + /backbone/layer3/layer3.0/Add + /backbone/layer3/layer3.0/relu_1/Relu, LayerType: CaskConvolution, Inputs: [ { Name: /backbone/layer3/layer3.0/relu/Relu_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,256,14,14], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }, { Name: /backbone/layer3/layer3.0/downsample/downsample.0/Conv_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,256,14,14], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], Outputs: [ { Name: /backbone/layer3/layer3.0/relu_1/Relu_output_0_QuantizeLinear_Output_1, Location: Device, Dimensions: [1,256,14,14], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], 
PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Int8", "Count": 589824}, Bias: {"Type": "Float", "Count": 256}, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, HasBias: 1, HasReLU: 1, TacticName: sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize64x128x64_stage6_warpsize2x2x1_g1_tensor16x8x32, TacticValue: 0x3818ca0093333b50, StreamId: 0, Metadata: [ONNX Layer: /backbone/layer3/layer3.0/conv2/Conv][ONNX Layer: /backbone/layer3/layer3.0/Add][ONNX Layer: /backbone/layer3/layer3.0/relu_1/Relu] Name: onnx::Conv_233 + onnx::Conv_233_QuantizeLinear + /backbone/layer3/layer3.1/conv1/Conv + /backbone/layer3/layer3.1/relu/Relu, LayerType: CaskConvolution, Inputs: [ { Name: /backbone/layer3/layer3.0/relu_1/Relu_output_0_QuantizeLinear_Output_1, Location: Device, Dimensions: [1,256,14,14], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], Outputs: [ { Name: /backbone/layer3/layer3.1/relu/Relu_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,256,14,14], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Int8", "Count": 589824}, Bias: {"Type": "Float", "Count": 256}, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, HasBias: 1, HasReLU: 1, TacticName: sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize64x128x64_stage6_warpsize2x2x1_g1_tensor16x8x32, TacticValue: 0x3818ca0093333b50, StreamId: 0, Metadata: [ONNX Layer: 
/backbone/layer3/layer3.1/conv1/Conv][ONNX Layer: /backbone/layer3/layer3.1/relu/Relu] Name: onnx::Conv_236 + onnx::Conv_236_QuantizeLinear + /backbone/layer3/layer3.1/conv2/Conv + /backbone/layer3/layer3.1/Add + /backbone/layer3/layer3.1/relu_1/Relu, LayerType: CaskConvolution, Inputs: [ { Name: /backbone/layer3/layer3.1/relu/Relu_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,256,14,14], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }, { Name: /backbone/layer3/layer3.0/relu_1/Relu_output_0_QuantizeLinear_Output_1, Location: Device, Dimensions: [1,256,14,14], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], Outputs: [ { Name: /backbone/layer3/layer3.1/relu_1/Relu_output_0_QuantizeLinear_Output_1, Location: Device, Dimensions: [1,256,14,14], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Int8", "Count": 589824}, Bias: {"Type": "Float", "Count": 256}, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, HasBias: 1, HasReLU: 1, TacticName: sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize64x128x64_stage6_warpsize2x2x1_g1_tensor16x8x32, TacticValue: 0x3818ca0093333b50, StreamId: 0, Metadata: [ONNX Layer: /backbone/layer3/layer3.1/conv2/Conv][ONNX Layer: /backbone/layer3/layer3.1/Add][ONNX Layer: /backbone/layer3/layer3.1/relu_1/Relu] Name: onnx::Conv_239 + onnx::Conv_239_QuantizeLinear + /backbone/layer4/layer4.0/conv1/Conv + /backbone/layer4/layer4.0/relu/Relu, LayerType: CaskConvolution, Inputs: [ { Name: /backbone/layer3/layer3.1/relu_1/Relu_output_0_QuantizeLinear_Output_1, Location: Device, Dimensions: 
[1,256,14,14], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], Outputs: [ { Name: /backbone/layer4/layer4.0/relu/Relu_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,512,7,7], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [2,2], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Int8", "Count": 1179648}, Bias: {"Type": "Float", "Count": 512}, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, HasBias: 1, HasReLU: 1, TacticName: sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize64x64x64_stage6_warpsize2x2x1_g1_tensor16x8x32_t1r3s3, TacticValue: 0xbb88763c3b0e94d4, StreamId: 0, Metadata: [ONNX Layer: /backbone/layer4/layer4.0/conv1/Conv][ONNX Layer: /backbone/layer4/layer4.0/relu/Relu] Name: onnx::Conv_245 + onnx::Conv_245_QuantizeLinear + /backbone/layer4/layer4.0/downsample/downsample.0/Conv, LayerType: CaskConvolution, Inputs: [ { Name: /backbone/layer3/layer3.1/relu_1/Relu_output_0_QuantizeLinear_Output_1, Location: Device, Dimensions: [1,256,14,14], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], Outputs: [ { Name: /backbone/layer4/layer4.0/downsample/downsample.0/Conv_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,512,7,7], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [2,2], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Int8", "Count": 131072}, Bias: {"Type": "Float", "Count": 512}, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, 
ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, HasBias: 1, HasReLU: 0, TacticName: sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize96x64x64_stage3_warpsize2x2x1_g1_tensor16x8x32_t1r1s1, TacticValue: 0xfa5f2e15625aa266, StreamId: 0, Metadata: [ONNX Layer: /backbone/layer4/layer4.0/downsample/downsample.0/Conv] Name: onnx::Conv_242 + onnx::Conv_242_QuantizeLinear + /backbone/layer4/layer4.0/conv2/Conv + /backbone/layer4/layer4.0/Add + /backbone/layer4/layer4.0/relu_1/Relu, LayerType: CaskConvolution, Inputs: [ { Name: /backbone/layer4/layer4.0/relu/Relu_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,512,7,7], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }, { Name: /backbone/layer4/layer4.0/downsample/downsample.0/Conv_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,512,7,7], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], Outputs: [ { Name: /backbone/layer4/layer4.0/relu_1/Relu_output_0_QuantizeLinear_Output_1, Location: Device, Dimensions: [1,512,7,7], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Int8", "Count": 2359296}, Bias: {"Type": "Float", "Count": 512}, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, HasBias: 1, HasReLU: 1, TacticName: sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize64x64x64_stage6_warpsize2x2x1_g1_tensor16x8x32, TacticValue: 0xf56c0ac895d82363, StreamId: 0, Metadata: [ONNX Layer: /backbone/layer4/layer4.0/conv2/Conv][ONNX Layer: 
/backbone/layer4/layer4.0/Add][ONNX Layer: /backbone/layer4/layer4.0/relu_1/Relu] Name: onnx::Conv_248 + onnx::Conv_248_QuantizeLinear + /backbone/layer4/layer4.1/conv1/Conv + /backbone/layer4/layer4.1/relu/Relu, LayerType: CaskConvolution, Inputs: [ { Name: /backbone/layer4/layer4.0/relu_1/Relu_output_0_QuantizeLinear_Output_1, Location: Device, Dimensions: [1,512,7,7], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], Outputs: [ { Name: /backbone/layer4/layer4.1/relu/Relu_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,512,7,7], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Int8", "Count": 2359296}, Bias: {"Type": "Float", "Count": 512}, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, HasBias: 1, HasReLU: 1, TacticName: sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize64x32x64_stage6_warpsize2x1x1_g1_tensor16x8x32_t1r3s3, TacticValue: 0x2f34f689bfca5071, StreamId: 0, Metadata: [ONNX Layer: /backbone/layer4/layer4.1/conv1/Conv][ONNX Layer: /backbone/layer4/layer4.1/relu/Relu] Name: onnx::Conv_251 + onnx::Conv_251_QuantizeLinear + /backbone/layer4/layer4.1/conv2/Conv + /backbone/layer4/layer4.1/Add + /backbone/layer4/layer4.1/relu_1/Relu, LayerType: CaskConvolution, Inputs: [ { Name: /backbone/layer4/layer4.1/relu/Relu_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,512,7,7], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }, { Name: /backbone/layer4/layer4.0/relu_1/Relu_output_0_QuantizeLinear_Output_1, Location: Device, Dimensions: [1,512,7,7], Format/Datatype: Thirty-two wide 
channel vectorized row major Int8 format }], Outputs: [ { Name: /backbone/layer4/layer4.1/relu_1/Relu_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,512,7,7], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Int8", "Count": 2359296}, Bias: {"Type": "Float", "Count": 512}, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, HasBias: 1, HasReLU: 1, TacticName: sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize64x64x64_stage6_warpsize2x2x1_g1_tensor16x8x32, TacticValue: 0xf56c0ac895d82363, StreamId: 0, Metadata: [ONNX Layer: /backbone/layer4/layer4.1/conv2/Conv][ONNX Layer: /backbone/layer4/layer4.1/Add][ONNX Layer: /backbone/layer4/layer4.1/relu_1/Relu] Name: /neck/gap/GlobalAveragePool, LayerType: CaskPooling, Inputs: [ { Name: /backbone/layer4/layer4.1/relu_1/Relu_output_0_QuantizeLinear_Output, Location: Device, Dimensions: [1,512,7,7], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], Outputs: [ { Name: /neck/Flatten_output_0, Location: Device, Dimensions: [1,512,1,1], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], ParameterType: Pooling, PoolingType: AVERAGE, WindowSize: [7,7], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], BlendFactor: 0, AverageCountExcludesPadding: 1, TacticName: sm72_xmma_pooling_IMMA_NCxHW32_gap, TacticValue: 0xa3a1a62d21de759d, StreamId: 0, Metadata: [ONNX Layer: /neck/gap/GlobalAveragePool] Name: Reformatting CopyNode for Input Tensor 0 to head.fc.weight + head.fc.weight_QuantizeLinear + transpose_before_/head/fc/Gemm + /head/fc/Gemm + 
head.fc.bias + (Unnamed Layer* 263) [Shuffle] + unsqueeze_node_after_head.fc.bias + (Unnamed Layer* 263) [Shuffle]_(Unnamed Layer* 263) [Shuffle]_output + (Unnamed Layer* 264) [ElementWise], LayerType: NoOp, Inputs: [ { Name: /neck/Flatten_output_0, Location: Device, Dimensions: [1,512,1,1], Format/Datatype: Thirty-two wide channel vectorized row major Int8 format }], Outputs: [ { Name: Reformatted Input Tensor 0 to head.fc.weight + head.fc.weight_QuantizeLinear + transpose_before_/head/fc/Gemm + /head/fc/Gemm + head.fc.bias + (Unnamed Layer* 263) [Shuffle] + unsqueeze_node_after_head.fc.bias + (Unnamed Layer* 263) [Shuffle]_(Unnamed Layer* 263) [Shuffle]_output + (Unnamed Layer* 264) [ElementWise], Location: Device, Dimensions: [1,512,1,1], Format/Datatype: Four wide channel vectorized row major Int8 format }], TacticValue: 0x0000000000000000, StreamId: 0, Metadata: Name: head.fc.weight + head.fc.weight_QuantizeLinear + transpose_before_/head/fc/Gemm + /head/fc/Gemm + head.fc.bias + (Unnamed Layer* 263) [Shuffle] + unsqueeze_node_after_head.fc.bias + (Unnamed Layer* 263) [Shuffle]_(Unnamed Layer* 263) [Shuffle]_output + (Unnamed Layer* 264) [ElementWise], LayerType: CaskConvolution, Inputs: [ { Name: Reformatted Input Tensor 0 to head.fc.weight + head.fc.weight_QuantizeLinear + transpose_before_/head/fc/Gemm + /head/fc/Gemm + head.fc.bias + (Unnamed Layer* 263) [Shuffle] + unsqueeze_node_after_head.fc.bias + (Unnamed Layer* 263) [Shuffle]_(Unnamed Layer* 263) [Shuffle]_output + (Unnamed Layer* 264) [ElementWise], Location: Device, Dimensions: [1,512,1,1], Format/Datatype: Four wide channel vectorized row major Int8 format }], Outputs: [ { Name: (Unnamed Layer* 264) [ElementWise]_out_tensor, Location: Device, Dimensions: [1,1000,1,1], Format/Datatype: Row major linear FP32 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1000, Groups: 1, Weights: 
{"Type": "Int8", "Count": 512000}, Bias: {"Type": "Float", "Count": 1000}, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, HasBias: 1, HasReLU: 0, TacticName: sm70_xmma_fprop_conv1x1_i8f32_f32_f32_nchw_vect_c_4kcrs_vect_c_4_nchw_simt_small_batch_bias_relu, TacticValue: 0xc073b0053ce90eac, StreamId: 0, Metadata: [ONNX Layer: /head/fc/Gemm][ONNX Layer: /head/fc/Gemm] Name: copied_squeeze_after_(Unnamed Layer* 264) [ElementWise], LayerType: NoOp, Inputs: [ { Name: (Unnamed Layer* 264) [ElementWise]_out_tensor, Location: Device, Dimensions: [1,1000,1,1], Format/Datatype: Row major linear FP32 }], Outputs: [ { Name: /head/fc/Gemm_output_0, Location: Device, Dimensions: [1,1000], Format/Datatype: Row major linear FP32 }], TacticValue: 0x0000000000000000, StreamId: 0, Metadata: Name: /Softmax, LayerType: CaskSoftMaxV2, Inputs: [ { Name: /head/fc/Gemm_output_0, Location: Device, Dimensions: [1,1000], Format/Datatype: Row major linear FP32 }], Outputs: [ { Name: output, Location: Device, Dimensions: [1,1000], Format/Datatype: Row major linear FP32 }], ParameterType: SoftMax, Axes: 2, HasLog: 0, TacticValue: 0x6d55c70c4c781969, StreamId: 0, Metadata: [ONNX Layer: /Softmax] Bindings: input output [08/13/2024-21:57:18] [I] Starting inference [08/13/2024-21:57:21] [I] Warmup completed 261 queries over 200 ms [08/13/2024-21:57:21] [I] Timing trace has 4506 queries over 3.00213 s [08/13/2024-21:57:21] [I] [08/13/2024-21:57:21] [I] === Trace details === [08/13/2024-21:57:21] [I] Trace averages of 10 runs: [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664587 ms - Host latency: 0.717325 ms (enqueue 0.004245 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664194 ms - Host latency: 0.715701 ms (enqueue 0.00448303 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664444 ms - Host latency: 0.717938 ms (enqueue 0.00466461 
ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664552 ms - Host latency: 0.716426 ms (enqueue 0.00436249 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664894 ms - Host latency: 0.715602 ms (enqueue 0.00428619 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66449 ms - Host latency: 0.716339 ms (enqueue 0.00437775 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.665065 ms - Host latency: 0.716844 ms (enqueue 0.00466309 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664328 ms - Host latency: 0.715485 ms (enqueue 0.00427704 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664351 ms - Host latency: 0.717271 ms (enqueue 0.00448303 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664523 ms - Host latency: 0.716425 ms (enqueue 0.0041687 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664804 ms - Host latency: 0.714319 ms (enqueue 0.00480347 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.663916 ms - Host latency: 0.715015 ms (enqueue 0.00448914 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664487 ms - Host latency: 0.715207 ms (enqueue 0.00426331 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664374 ms - Host latency: 0.717825 ms (enqueue 0.0042572 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66424 ms - Host latency: 0.715649 ms (enqueue 0.004599 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664713 ms - Host latency: 0.719965 ms (enqueue 0.00415649 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664648 ms - Host latency: 0.716223 ms (enqueue 0.00426941 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664548 ms - Host latency: 0.714993 ms (enqueue 0.00414124 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664319 ms - Host latency: 0.714633 ms (enqueue 0.00446472 ms) 
[08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664487 ms - Host latency: 0.716208 ms (enqueue 0.00427856 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664319 ms - Host latency: 0.71561 ms (enqueue 0.00441284 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664807 ms - Host latency: 0.714871 ms (enqueue 0.00427551 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664575 ms - Host latency: 0.716406 ms (enqueue 0.00466614 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664853 ms - Host latency: 0.716959 ms (enqueue 0.00445557 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664462 ms - Host latency: 0.716269 ms (enqueue 0.00423279 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664874 ms - Host latency: 0.716684 ms (enqueue 0.0046936 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66485 ms - Host latency: 0.716949 ms (enqueue 0.00473328 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664151 ms - Host latency: 0.716437 ms (enqueue 0.00447083 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664954 ms - Host latency: 0.715588 ms (enqueue 0.00446167 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664597 ms - Host latency: 0.716 ms (enqueue 0.0043396 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664517 ms - Host latency: 0.716003 ms (enqueue 0.00452576 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664261 ms - Host latency: 0.717184 ms (enqueue 0.00430908 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664877 ms - Host latency: 0.715842 ms (enqueue 0.00426941 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664801 ms - Host latency: 0.716052 ms (enqueue 0.0041687 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.665012 ms - Host latency: 0.715353 ms (enqueue 0.00447693 ms) [08/13/2024-21:57:21] 
[I] Average on 10 runs - GPU latency: 0.664731 ms - Host latency: 0.714597 ms (enqueue 0.00429382 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664615 ms - Host latency: 0.714838 ms (enqueue 0.00413818 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664487 ms - Host latency: 0.715289 ms (enqueue 0.00422668 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.665421 ms - Host latency: 0.715271 ms (enqueue 0.00435486 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664447 ms - Host latency: 0.715955 ms (enqueue 0.00420532 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664749 ms - Host latency: 0.714584 ms (enqueue 0.00425415 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664606 ms - Host latency: 0.718002 ms (enqueue 0.00419922 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664313 ms - Host latency: 0.716699 ms (enqueue 0.00472412 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664578 ms - Host latency: 0.716522 ms (enqueue 0.00422668 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66424 ms - Host latency: 0.716785 ms (enqueue 0.00438232 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664597 ms - Host latency: 0.715549 ms (enqueue 0.00418701 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664636 ms - Host latency: 0.716159 ms (enqueue 0.00436401 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664276 ms - Host latency: 0.715619 ms (enqueue 0.00426636 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664008 ms - Host latency: 0.717047 ms (enqueue 0.00420532 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664606 ms - Host latency: 0.716272 ms (enqueue 0.00418701 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664838 ms - Host latency: 0.715881 ms (enqueue 0.00440063 ms) [08/13/2024-21:57:21] [I] Average on 
10 runs - GPU latency: 0.664606 ms - Host latency: 0.715997 ms (enqueue 0.00423584 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66449 ms - Host latency: 0.716089 ms (enqueue 0.00442505 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664838 ms - Host latency: 0.715912 ms (enqueue 0.00427246 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664416 ms - Host latency: 0.716034 ms (enqueue 0.00458374 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664496 ms - Host latency: 0.715656 ms (enqueue 0.00427246 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664465 ms - Host latency: 0.716547 ms (enqueue 0.00435181 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664288 ms - Host latency: 0.716339 ms (enqueue 0.00421753 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664337 ms - Host latency: 0.716577 ms (enqueue 0.0045166 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664246 ms - Host latency: 0.715448 ms (enqueue 0.00457153 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664465 ms - Host latency: 0.715833 ms (enqueue 0.00422363 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664722 ms - Host latency: 0.717053 ms (enqueue 0.00467529 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664594 ms - Host latency: 0.715253 ms (enqueue 0.00471802 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664478 ms - Host latency: 0.717535 ms (enqueue 0.00438843 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664825 ms - Host latency: 0.71676 ms (enqueue 0.00444336 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.665033 ms - Host latency: 0.716577 ms (enqueue 0.0043457 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664325 ms - Host latency: 0.717169 ms (enqueue 0.00460815 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU 
latency: 0.664496 ms - Host latency: 0.716882 ms (enqueue 0.0043335 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664087 ms - Host latency: 0.717676 ms (enqueue 0.00438232 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664349 ms - Host latency: 0.716882 ms (enqueue 0.00432129 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664307 ms - Host latency: 0.717157 ms (enqueue 0.00472412 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664594 ms - Host latency: 0.714642 ms (enqueue 0.00432129 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664117 ms - Host latency: 0.71394 ms (enqueue 0.00421753 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.665222 ms - Host latency: 0.716309 ms (enqueue 0.00444946 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664362 ms - Host latency: 0.717242 ms (enqueue 0.00457764 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66424 ms - Host latency: 0.715814 ms (enqueue 0.00436401 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664178 ms - Host latency: 0.715564 ms (enqueue 0.0046875 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664886 ms - Host latency: 0.716638 ms (enqueue 0.00446167 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664526 ms - Host latency: 0.718701 ms (enqueue 0.00456543 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664185 ms - Host latency: 0.718103 ms (enqueue 0.00446167 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66438 ms - Host latency: 0.71637 ms (enqueue 0.00436401 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664764 ms - Host latency: 0.716229 ms (enqueue 0.0041626 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664581 ms - Host latency: 0.714856 ms (enqueue 0.00442505 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664673 ms - 
Host latency: 0.716217 ms (enqueue 0.00418701 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664575 ms - Host latency: 0.716144 ms (enqueue 0.00406494 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664819 ms - Host latency: 0.717957 ms (enqueue 0.00432739 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664606 ms - Host latency: 0.718182 ms (enqueue 0.0046814 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66449 ms - Host latency: 0.715814 ms (enqueue 0.00420532 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66416 ms - Host latency: 0.718738 ms (enqueue 0.00426025 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664697 ms - Host latency: 0.715692 ms (enqueue 0.00429077 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664709 ms - Host latency: 0.71778 ms (enqueue 0.0043457 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664929 ms - Host latency: 0.71535 ms (enqueue 0.00418701 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664612 ms - Host latency: 0.717578 ms (enqueue 0.00435181 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664246 ms - Host latency: 0.717303 ms (enqueue 0.00442505 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664807 ms - Host latency: 0.71582 ms (enqueue 0.00480347 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.665112 ms - Host latency: 0.718237 ms (enqueue 0.00420532 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66488 ms - Host latency: 0.717426 ms (enqueue 0.00421753 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664453 ms - Host latency: 0.715265 ms (enqueue 0.0041687 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66521 ms - Host latency: 0.71535 ms (enqueue 0.00446777 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664307 ms - Host latency: 0.715668 ms 
(enqueue 0.00441284 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664233 ms - Host latency: 0.71524 ms (enqueue 0.00438843 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664124 ms - Host latency: 0.716797 ms (enqueue 0.00430298 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664441 ms - Host latency: 0.717975 ms (enqueue 0.00505981 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.665076 ms - Host latency: 0.715942 ms (enqueue 0.00662231 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664807 ms - Host latency: 0.717212 ms (enqueue 0.00470581 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664996 ms - Host latency: 0.714868 ms (enqueue 0.00435791 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66416 ms - Host latency: 0.714569 ms (enqueue 0.00466919 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664508 ms - Host latency: 0.717908 ms (enqueue 0.00419312 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664563 ms - Host latency: 0.716608 ms (enqueue 0.00419922 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.6646 ms - Host latency: 0.717743 ms (enqueue 0.00444946 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664258 ms - Host latency: 0.715344 ms (enqueue 0.00465088 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664526 ms - Host latency: 0.716644 ms (enqueue 0.00455933 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.663995 ms - Host latency: 0.717456 ms (enqueue 0.00454712 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664935 ms - Host latency: 0.71629 ms (enqueue 0.00430908 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664496 ms - Host latency: 0.717145 ms (enqueue 0.00474854 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664398 ms - Host latency: 0.718146 ms (enqueue 0.00435791 
ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664337 ms - Host latency: 0.716968 ms (enqueue 0.0043396 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664453 ms - Host latency: 0.718207 ms (enqueue 0.00435181 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664105 ms - Host latency: 0.719025 ms (enqueue 0.00490112 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66441 ms - Host latency: 0.716266 ms (enqueue 0.00414429 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664594 ms - Host latency: 0.715338 ms (enqueue 0.00445557 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664453 ms - Host latency: 0.716083 ms (enqueue 0.00438232 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664874 ms - Host latency: 0.716663 ms (enqueue 0.00441895 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664838 ms - Host latency: 0.716833 ms (enqueue 0.00430908 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.6651 ms - Host latency: 0.716992 ms (enqueue 0.00435791 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66394 ms - Host latency: 0.716736 ms (enqueue 0.00422363 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664453 ms - Host latency: 0.715723 ms (enqueue 0.00452881 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664636 ms - Host latency: 0.716858 ms (enqueue 0.00478516 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664233 ms - Host latency: 0.714551 ms (enqueue 0.00443115 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664746 ms - Host latency: 0.71571 ms (enqueue 0.00441895 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664258 ms - Host latency: 0.713806 ms (enqueue 0.00458984 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664807 ms - Host latency: 0.71593 ms (enqueue 0.00418701 ms) 
[08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664197 ms - Host latency: 0.716199 ms (enqueue 0.00428467 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664209 ms - Host latency: 0.715198 ms (enqueue 0.00426025 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664783 ms - Host latency: 0.717712 ms (enqueue 0.00447998 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664539 ms - Host latency: 0.715295 ms (enqueue 0.00412598 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664587 ms - Host latency: 0.716711 ms (enqueue 0.00438232 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.663892 ms - Host latency: 0.716675 ms (enqueue 0.00426025 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664526 ms - Host latency: 0.717261 ms (enqueue 0.00454102 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664551 ms - Host latency: 0.717114 ms (enqueue 0.00402832 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664539 ms - Host latency: 0.717212 ms (enqueue 0.00384521 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664453 ms - Host latency: 0.716455 ms (enqueue 0.00380859 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664319 ms - Host latency: 0.716589 ms (enqueue 0.00396729 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.6646 ms - Host latency: 0.718091 ms (enqueue 0.00400391 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664319 ms - Host latency: 0.718433 ms (enqueue 0.00405273 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664478 ms - Host latency: 0.71615 ms (enqueue 0.0041626 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664709 ms - Host latency: 0.716284 ms (enqueue 0.00390625 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66449 ms - Host latency: 0.71593 ms (enqueue 0.00401611 ms) [08/13/2024-21:57:21] 
[I] Average on 10 runs - GPU latency: 0.663989 ms - Host latency: 0.713928 ms (enqueue 0.00408936 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664612 ms - Host latency: 0.716919 ms (enqueue 0.00394287 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664551 ms - Host latency: 0.717236 ms (enqueue 0.00385742 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664416 ms - Host latency: 0.715881 ms (enqueue 0.00411377 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664404 ms - Host latency: 0.715601 ms (enqueue 0.00389404 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664392 ms - Host latency: 0.717834 ms (enqueue 0.00396729 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664429 ms - Host latency: 0.716748 ms (enqueue 0.00384521 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664038 ms - Host latency: 0.716284 ms (enqueue 0.0038208 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664514 ms - Host latency: 0.718445 ms (enqueue 0.00401611 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664294 ms - Host latency: 0.715918 ms (enqueue 0.0039917 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664111 ms - Host latency: 0.718054 ms (enqueue 0.00396729 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664709 ms - Host latency: 0.715405 ms (enqueue 0.00400391 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664136 ms - Host latency: 0.71637 ms (enqueue 0.00397949 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664209 ms - Host latency: 0.71604 ms (enqueue 0.00401611 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664294 ms - Host latency: 0.718176 ms (enqueue 0.00421143 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664258 ms - Host latency: 0.716089 ms (enqueue 0.00397949 ms) [08/13/2024-21:57:21] [I] Average on 10 
runs - GPU latency: 0.66477 ms - Host latency: 0.716101 ms (enqueue 0.00386963 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.663635 ms - Host latency: 0.71604 ms (enqueue 0.00402832 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664197 ms - Host latency: 0.717065 ms (enqueue 0.00395508 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664282 ms - Host latency: 0.717834 ms (enqueue 0.00385742 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664416 ms - Host latency: 0.716418 ms (enqueue 0.00400391 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66416 ms - Host latency: 0.714893 ms (enqueue 0.00390625 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664368 ms - Host latency: 0.71731 ms (enqueue 0.00393066 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664856 ms - Host latency: 0.716736 ms (enqueue 0.00404053 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664465 ms - Host latency: 0.71665 ms (enqueue 0.00391846 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664368 ms - Host latency: 0.716003 ms (enqueue 0.00383301 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664197 ms - Host latency: 0.716113 ms (enqueue 0.00393066 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664734 ms - Host latency: 0.715894 ms (enqueue 0.00384521 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664331 ms - Host latency: 0.717505 ms (enqueue 0.00406494 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664246 ms - Host latency: 0.719019 ms (enqueue 0.00411377 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664832 ms - Host latency: 0.717358 ms (enqueue 0.00428467 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664612 ms - Host latency: 0.71499 ms (enqueue 0.00406494 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 
0.665271 ms - Host latency: 0.717957 ms (enqueue 0.00401611 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.665088 ms - Host latency: 0.714355 ms (enqueue 0.00412598 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664539 ms - Host latency: 0.718274 ms (enqueue 0.00395508 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664355 ms - Host latency: 0.716711 ms (enqueue 0.00408936 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66405 ms - Host latency: 0.715613 ms (enqueue 0.00404053 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664197 ms - Host latency: 0.715479 ms (enqueue 0.00394287 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664404 ms - Host latency: 0.713647 ms (enqueue 0.00405273 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664294 ms - Host latency: 0.716455 ms (enqueue 0.00405273 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664026 ms - Host latency: 0.716113 ms (enqueue 0.00386963 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.6646 ms - Host latency: 0.716199 ms (enqueue 0.00394287 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.663989 ms - Host latency: 0.716785 ms (enqueue 0.0039917 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664624 ms - Host latency: 0.715271 ms (enqueue 0.00386963 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664587 ms - Host latency: 0.717273 ms (enqueue 0.00395508 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664429 ms - Host latency: 0.716589 ms (enqueue 0.00394287 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664807 ms - Host latency: 0.716931 ms (enqueue 0.00404053 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664832 ms - Host latency: 0.71687 ms (enqueue 0.0041626 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664294 ms - Host 
latency: 0.714087 ms (enqueue 0.00397949 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.665015 ms - Host latency: 0.717578 ms (enqueue 0.00397949 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664648 ms - Host latency: 0.715613 ms (enqueue 0.00400391 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66477 ms - Host latency: 0.7151 ms (enqueue 0.00406494 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664612 ms - Host latency: 0.71477 ms (enqueue 0.0038208 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664868 ms - Host latency: 0.716028 ms (enqueue 0.00400391 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664587 ms - Host latency: 0.72002 ms (enqueue 0.0041748 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664246 ms - Host latency: 0.717371 ms (enqueue 0.00394287 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.665222 ms - Host latency: 0.716724 ms (enqueue 0.0041626 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664282 ms - Host latency: 0.715552 ms (enqueue 0.00404053 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664514 ms - Host latency: 0.715027 ms (enqueue 0.00389404 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664062 ms - Host latency: 0.715906 ms (enqueue 0.0041748 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664099 ms - Host latency: 0.716565 ms (enqueue 0.00413818 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664746 ms - Host latency: 0.715442 ms (enqueue 0.00391846 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664795 ms - Host latency: 0.716357 ms (enqueue 0.00401611 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664185 ms - Host latency: 0.71709 ms (enqueue 0.0041626 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66488 ms - Host latency: 0.718811 ms 
(enqueue 0.00390625 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664941 ms - Host latency: 0.718066 ms (enqueue 0.00406494 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.665308 ms - Host latency: 0.717017 ms (enqueue 0.00446777 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.665259 ms - Host latency: 0.716125 ms (enqueue 0.00397949 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664563 ms - Host latency: 0.717004 ms (enqueue 0.00443115 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664941 ms - Host latency: 0.717334 ms (enqueue 0.00393066 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664734 ms - Host latency: 0.716895 ms (enqueue 0.00820312 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66449 ms - Host latency: 0.716406 ms (enqueue 0.0039917 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664612 ms - Host latency: 0.715308 ms (enqueue 0.0039917 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664343 ms - Host latency: 0.71676 ms (enqueue 0.00379639 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664502 ms - Host latency: 0.718127 ms (enqueue 0.00391846 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664795 ms - Host latency: 0.717944 ms (enqueue 0.00395508 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664893 ms - Host latency: 0.718274 ms (enqueue 0.00391846 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664209 ms - Host latency: 0.714563 ms (enqueue 0.00406494 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664392 ms - Host latency: 0.716785 ms (enqueue 0.00385742 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664331 ms - Host latency: 0.715955 ms (enqueue 0.00385742 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664087 ms - Host latency: 0.715967 ms (enqueue 0.00412598 
ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664172 ms - Host latency: 0.715808 ms (enqueue 0.00426025 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.665198 ms - Host latency: 0.716907 ms (enqueue 0.00395508 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664392 ms - Host latency: 0.716907 ms (enqueue 0.0041748 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664709 ms - Host latency: 0.71687 ms (enqueue 0.00395508 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664648 ms - Host latency: 0.717993 ms (enqueue 0.00396729 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664441 ms - Host latency: 0.717517 ms (enqueue 0.00390625 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664514 ms - Host latency: 0.715393 ms (enqueue 0.00397949 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664819 ms - Host latency: 0.716211 ms (enqueue 0.00396729 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664612 ms - Host latency: 0.715381 ms (enqueue 0.00428467 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664258 ms - Host latency: 0.717358 ms (enqueue 0.00394287 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664758 ms - Host latency: 0.717029 ms (enqueue 0.00389404 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664624 ms - Host latency: 0.715747 ms (enqueue 0.00407715 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664648 ms - Host latency: 0.716895 ms (enqueue 0.00407715 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664392 ms - Host latency: 0.716553 ms (enqueue 0.00388184 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664685 ms - Host latency: 0.715295 ms (enqueue 0.0038208 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664319 ms - Host latency: 0.716223 ms (enqueue 0.00400391 ms) 
[08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66477 ms - Host latency: 0.715332 ms (enqueue 0.00395508 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664661 ms - Host latency: 0.715124 ms (enqueue 0.00427246 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664856 ms - Host latency: 0.717065 ms (enqueue 0.00406494 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664722 ms - Host latency: 0.715686 ms (enqueue 0.00391846 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.663965 ms - Host latency: 0.715076 ms (enqueue 0.00397949 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664575 ms - Host latency: 0.715833 ms (enqueue 0.00394287 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664587 ms - Host latency: 0.715112 ms (enqueue 0.00380859 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664612 ms - Host latency: 0.714294 ms (enqueue 0.00396729 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664185 ms - Host latency: 0.71532 ms (enqueue 0.00393066 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664844 ms - Host latency: 0.716504 ms (enqueue 0.00400391 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664197 ms - Host latency: 0.716541 ms (enqueue 0.00394287 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664087 ms - Host latency: 0.715576 ms (enqueue 0.00401611 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664868 ms - Host latency: 0.717383 ms (enqueue 0.00391846 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664233 ms - Host latency: 0.718164 ms (enqueue 0.00412598 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664648 ms - Host latency: 0.716296 ms (enqueue 0.00411377 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664343 ms - Host latency: 0.717542 ms (enqueue 0.00406494 ms) 
[08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664783 ms - Host latency: 0.715918 ms (enqueue 0.00385742 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664526 ms - Host latency: 0.717529 ms (enqueue 0.00413818 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.663989 ms - Host latency: 0.714978 ms (enqueue 0.00402832 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664514 ms - Host latency: 0.71665 ms (enqueue 0.00408936 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664587 ms - Host latency: 0.716577 ms (enqueue 0.00411377 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664172 ms - Host latency: 0.716382 ms (enqueue 0.00389404 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664526 ms - Host latency: 0.71731 ms (enqueue 0.0039917 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664319 ms - Host latency: 0.719067 ms (enqueue 0.00385742 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664709 ms - Host latency: 0.718542 ms (enqueue 0.00406494 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664929 ms - Host latency: 0.716431 ms (enqueue 0.00394287 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664587 ms - Host latency: 0.717261 ms (enqueue 0.00408936 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664465 ms - Host latency: 0.716724 ms (enqueue 0.00397949 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.665039 ms - Host latency: 0.71615 ms (enqueue 0.00404053 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664954 ms - Host latency: 0.715552 ms (enqueue 0.00388184 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664392 ms - Host latency: 0.715735 ms (enqueue 0.00391846 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.665125 ms - Host latency: 0.717993 ms (enqueue 0.00401611 ms) 
[08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664685 ms - Host latency: 0.717639 ms (enqueue 0.00389404 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664062 ms - Host latency: 0.715747 ms (enqueue 0.00395508 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664404 ms - Host latency: 0.71748 ms (enqueue 0.00405273 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664453 ms - Host latency: 0.714478 ms (enqueue 0.00402832 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.663843 ms - Host latency: 0.717114 ms (enqueue 0.00402832 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664795 ms - Host latency: 0.719727 ms (enqueue 0.00395508 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664526 ms - Host latency: 0.717383 ms (enqueue 0.00405273 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664502 ms - Host latency: 0.71521 ms (enqueue 0.00395508 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664355 ms - Host latency: 0.714209 ms (enqueue 0.00395508 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.663574 ms - Host latency: 0.715601 ms (enqueue 0.00405273 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664429 ms - Host latency: 0.717261 ms (enqueue 0.00380859 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.6646 ms - Host latency: 0.717188 ms (enqueue 0.0041748 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664502 ms - Host latency: 0.71477 ms (enqueue 0.00393066 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66477 ms - Host latency: 0.719336 ms (enqueue 0.00405273 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664453 ms - Host latency: 0.716821 ms (enqueue 0.00397949 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664478 ms - Host latency: 0.715771 ms (enqueue 0.00400391 ms) [08/13/2024-21:57:21] 
[I] Average on 10 runs - GPU latency: 0.663965 ms - Host latency: 0.716333 ms (enqueue 0.00402832 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664355 ms - Host latency: 0.716162 ms (enqueue 0.00412598 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66416 ms - Host latency: 0.716309 ms (enqueue 0.00378418 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664014 ms - Host latency: 0.716602 ms (enqueue 0.00388184 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664502 ms - Host latency: 0.717529 ms (enqueue 0.00402832 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664136 ms - Host latency: 0.718066 ms (enqueue 0.00393066 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.6646 ms - Host latency: 0.717212 ms (enqueue 0.00397949 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664868 ms - Host latency: 0.71792 ms (enqueue 0.00407715 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664478 ms - Host latency: 0.719458 ms (enqueue 0.00397949 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66438 ms - Host latency: 0.714502 ms (enqueue 0.00397949 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66438 ms - Host latency: 0.714087 ms (enqueue 0.00395508 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664233 ms - Host latency: 0.716797 ms (enqueue 0.00412598 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664648 ms - Host latency: 0.718066 ms (enqueue 0.00390625 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664185 ms - Host latency: 0.71748 ms (enqueue 0.00402832 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664526 ms - Host latency: 0.718481 ms (enqueue 0.00395508 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664429 ms - Host latency: 0.717139 ms (enqueue 0.00415039 ms) [08/13/2024-21:57:21] [I] Average on 10 runs 
- GPU latency: 0.664746 ms - Host latency: 0.716211 ms (enqueue 0.00405273 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.6646 ms - Host latency: 0.717017 ms (enqueue 0.00405273 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664429 ms - Host latency: 0.715381 ms (enqueue 0.00383301 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664453 ms - Host latency: 0.715894 ms (enqueue 0.00393066 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664746 ms - Host latency: 0.716626 ms (enqueue 0.00419922 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664722 ms - Host latency: 0.717407 ms (enqueue 0.00390625 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664746 ms - Host latency: 0.71748 ms (enqueue 0.00405273 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664795 ms - Host latency: 0.716675 ms (enqueue 0.00400391 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664258 ms - Host latency: 0.714575 ms (enqueue 0.00400391 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664331 ms - Host latency: 0.717578 ms (enqueue 0.00400391 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664648 ms - Host latency: 0.714429 ms (enqueue 0.00402832 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664233 ms - Host latency: 0.716235 ms (enqueue 0.00397949 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664648 ms - Host latency: 0.716089 ms (enqueue 0.00400391 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664404 ms - Host latency: 0.716406 ms (enqueue 0.00400391 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664209 ms - Host latency: 0.714526 ms (enqueue 0.00410156 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664307 ms - Host latency: 0.716113 ms (enqueue 0.00385742 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 
0.66438 ms - Host latency: 0.71665 ms (enqueue 0.00395508 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664551 ms - Host latency: 0.715356 ms (enqueue 0.00380859 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664551 ms - Host latency: 0.714966 ms (enqueue 0.00385742 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664502 ms - Host latency: 0.717285 ms (enqueue 0.00395508 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66416 ms - Host latency: 0.715723 ms (enqueue 0.00388184 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664209 ms - Host latency: 0.714673 ms (enqueue 0.0041748 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66438 ms - Host latency: 0.718579 ms (enqueue 0.00410156 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664551 ms - Host latency: 0.717139 ms (enqueue 0.00437012 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664185 ms - Host latency: 0.716187 ms (enqueue 0.00385742 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664209 ms - Host latency: 0.71377 ms (enqueue 0.00385742 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664673 ms - Host latency: 0.717139 ms (enqueue 0.00400391 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.663989 ms - Host latency: 0.714624 ms (enqueue 0.00402832 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664014 ms - Host latency: 0.71394 ms (enqueue 0.00385742 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664502 ms - Host latency: 0.716699 ms (enqueue 0.00419922 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664722 ms - Host latency: 0.717212 ms (enqueue 0.00402832 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664893 ms - Host latency: 0.71626 ms (enqueue 0.00400391 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664844 ms - Host 
latency: 0.716235 ms (enqueue 0.00395508 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.665064 ms - Host latency: 0.716089 ms (enqueue 0.00419922 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664819 ms - Host latency: 0.716675 ms (enqueue 0.00390625 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664795 ms - Host latency: 0.716138 ms (enqueue 0.00410156 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664282 ms - Host latency: 0.715137 ms (enqueue 0.00400391 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664795 ms - Host latency: 0.717212 ms (enqueue 0.00397949 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664722 ms - Host latency: 0.718457 ms (enqueue 0.00412598 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664941 ms - Host latency: 0.716479 ms (enqueue 0.00395508 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664478 ms - Host latency: 0.716113 ms (enqueue 0.00390625 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664551 ms - Host latency: 0.71438 ms (enqueue 0.00397949 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664551 ms - Host latency: 0.717114 ms (enqueue 0.00412598 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664331 ms - Host latency: 0.716406 ms (enqueue 0.00400391 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66416 ms - Host latency: 0.716284 ms (enqueue 0.00393066 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66416 ms - Host latency: 0.717041 ms (enqueue 0.00400391 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664844 ms - Host latency: 0.716309 ms (enqueue 0.00383301 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.665064 ms - Host latency: 0.717578 ms (enqueue 0.00393066 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664844 ms - Host latency: 0.719019 
ms (enqueue 0.00405273 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664648 ms - Host latency: 0.715625 ms (enqueue 0.00390625 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664453 ms - Host latency: 0.71499 ms (enqueue 0.00402832 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664209 ms - Host latency: 0.71543 ms (enqueue 0.00405273 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66416 ms - Host latency: 0.714429 ms (enqueue 0.00397949 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66394 ms - Host latency: 0.716431 ms (enqueue 0.00405273 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66438 ms - Host latency: 0.716089 ms (enqueue 0.00415039 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664453 ms - Host latency: 0.715381 ms (enqueue 0.00402832 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664795 ms - Host latency: 0.71709 ms (enqueue 0.00410156 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664038 ms - Host latency: 0.716235 ms (enqueue 0.00405273 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664575 ms - Host latency: 0.715063 ms (enqueue 0.00390625 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664429 ms - Host latency: 0.717798 ms (enqueue 0.00405273 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66438 ms - Host latency: 0.714648 ms (enqueue 0.00415039 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664233 ms - Host latency: 0.716357 ms (enqueue 0.00405273 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664429 ms - Host latency: 0.716943 ms (enqueue 0.00402832 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664648 ms - Host latency: 0.717358 ms (enqueue 0.00410156 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664941 ms - Host latency: 0.717944 ms (enqueue 0.00400391 
ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664478 ms - Host latency: 0.715479 ms (enqueue 0.00415039 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664478 ms - Host latency: 0.716919 ms (enqueue 0.00415039 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664648 ms - Host latency: 0.716431 ms (enqueue 0.00385742 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664551 ms - Host latency: 0.71665 ms (enqueue 0.00402832 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664551 ms - Host latency: 0.718457 ms (enqueue 0.00400391 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664722 ms - Host latency: 0.715723 ms (enqueue 0.00390625 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664893 ms - Host latency: 0.716431 ms (enqueue 0.00412598 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664746 ms - Host latency: 0.716968 ms (enqueue 0.00412598 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664478 ms - Host latency: 0.717139 ms (enqueue 0.00402832 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664648 ms - Host latency: 0.71875 ms (enqueue 0.00441895 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664429 ms - Host latency: 0.715356 ms (enqueue 0.0041748 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664893 ms - Host latency: 0.717993 ms (enqueue 0.00400391 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.6646 ms - Host latency: 0.717603 ms (enqueue 0.00422363 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664478 ms - Host latency: 0.715576 ms (enqueue 0.00393066 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664233 ms - Host latency: 0.715308 ms (enqueue 0.00402832 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664429 ms - Host latency: 0.715601 ms (enqueue 0.00397949 ms) 
[08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664307 ms - Host latency: 0.714868 ms (enqueue 0.00402832 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664844 ms - Host latency: 0.71665 ms (enqueue 0.00390625 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664453 ms - Host latency: 0.71543 ms (enqueue 0.00419922 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.665088 ms - Host latency: 0.715991 ms (enqueue 0.00393066 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66438 ms - Host latency: 0.717358 ms (enqueue 0.00410156 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664624 ms - Host latency: 0.716333 ms (enqueue 0.00395508 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664673 ms - Host latency: 0.716699 ms (enqueue 0.00397949 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664233 ms - Host latency: 0.717236 ms (enqueue 0.00395508 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664209 ms - Host latency: 0.717432 ms (enqueue 0.00407715 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664258 ms - Host latency: 0.714014 ms (enqueue 0.00410156 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664526 ms - Host latency: 0.71499 ms (enqueue 0.00405273 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664453 ms - Host latency: 0.716333 ms (enqueue 0.00422363 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.663647 ms - Host latency: 0.714355 ms (enqueue 0.00402832 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664844 ms - Host latency: 0.718774 ms (enqueue 0.00380859 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664404 ms - Host latency: 0.715527 ms (enqueue 0.00412598 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664624 ms - Host latency: 0.717285 ms (enqueue 0.00405273 ms) 
[08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664282 ms - Host latency: 0.716821 ms (enqueue 0.00393066 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664551 ms - Host latency: 0.717456 ms (enqueue 0.00407715 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.665039 ms - Host latency: 0.716284 ms (enqueue 0.00395508 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664087 ms - Host latency: 0.715698 ms (enqueue 0.00397949 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.665137 ms - Host latency: 0.716699 ms (enqueue 0.00402832 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664746 ms - Host latency: 0.717163 ms (enqueue 0.00400391 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664795 ms - Host latency: 0.716675 ms (enqueue 0.00385742 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664429 ms - Host latency: 0.715283 ms (enqueue 0.00400391 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664282 ms - Host latency: 0.715674 ms (enqueue 0.00415039 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664526 ms - Host latency: 0.71665 ms (enqueue 0.00412598 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66477 ms - Host latency: 0.715991 ms (enqueue 0.00407715 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664453 ms - Host latency: 0.716382 ms (enqueue 0.00400391 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66477 ms - Host latency: 0.715991 ms (enqueue 0.00471191 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664551 ms - Host latency: 0.718359 ms (enqueue 0.00390625 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.663965 ms - Host latency: 0.71499 ms (enqueue 0.00390625 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664526 ms - Host latency: 0.718286 ms (enqueue 0.00373535 ms) 
[08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664429 ms - Host latency: 0.715796 ms (enqueue 0.00397949 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.665308 ms - Host latency: 0.715723 ms (enqueue 0.00390625 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664697 ms - Host latency: 0.716138 ms (enqueue 0.00388184 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664429 ms - Host latency: 0.715918 ms (enqueue 0.00412598 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.663867 ms - Host latency: 0.715137 ms (enqueue 0.00402832 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664795 ms - Host latency: 0.717773 ms (enqueue 0.00395508 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664624 ms - Host latency: 0.715869 ms (enqueue 0.00405273 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664624 ms - Host latency: 0.717065 ms (enqueue 0.00393066 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664502 ms - Host latency: 0.715967 ms (enqueue 0.00380859 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664941 ms - Host latency: 0.718115 ms (enqueue 0.00419922 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664429 ms - Host latency: 0.715234 ms (enqueue 0.00419922 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664478 ms - Host latency: 0.718359 ms (enqueue 0.00427246 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664429 ms - Host latency: 0.718164 ms (enqueue 0.00402832 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66499 ms - Host latency: 0.717212 ms (enqueue 0.00402832 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664502 ms - Host latency: 0.716382 ms (enqueue 0.00405273 ms) [08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664551 ms - Host latency: 0.716504 ms (enqueue 0.00400391 ms) 
[08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.663892 ms - Host latency: 0.717139 ms (enqueue 0.00412598 ms)
[08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664575 ms - Host latency: 0.7198 ms (enqueue 0.00412598 ms)
[08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664478 ms - Host latency: 0.71626 ms (enqueue 0.00402832 ms)
[08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.66477 ms - Host latency: 0.718091 ms (enqueue 0.00405273 ms)
[08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664648 ms - Host latency: 0.717432 ms (enqueue 0.00410156 ms)
[08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.665308 ms - Host latency: 0.716504 ms (enqueue 0.00410156 ms)
[08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664331 ms - Host latency: 0.714355 ms (enqueue 0.00400391 ms)
[08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664258 ms - Host latency: 0.716406 ms (enqueue 0.00397949 ms)
[08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664307 ms - Host latency: 0.717822 ms (enqueue 0.00410156 ms)
[08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664722 ms - Host latency: 0.717603 ms (enqueue 0.00410156 ms)
[08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664624 ms - Host latency: 0.715796 ms (enqueue 0.00393066 ms)
[08/13/2024-21:57:21] [I] Average on 10 runs - GPU latency: 0.664282 ms - Host latency: 0.716846 ms (enqueue 0.00412598 ms)
[08/13/2024-21:57:21] [I]
[08/13/2024-21:57:21] [I] === Performance summary ===
[08/13/2024-21:57:21] [I] Throughput: 1500.93 qps
[08/13/2024-21:57:21] [I] Latency: min = 0.708862 ms, max = 0.733276 ms, mean = 0.716457 ms, median = 0.716309 ms, percentile(90%) = 0.720459 ms, percentile(95%) = 0.721252 ms, percentile(99%) = 0.722595 ms
[08/13/2024-21:57:21] [I] Enqueue Time: min = 0.00341797 ms, max = 0.0460205 ms, mean = 0.00414514 ms, median = 0.00390625 ms, percentile(90%) = 0.00476074 ms, percentile(95%) = 0.00512695 ms, percentile(99%) = 0.00723267 ms
[08/13/2024-21:57:21] [I] H2D Latency: min = 0.0419922 ms, max = 0.0587158 ms, mean = 0.0449008 ms, median = 0.0449219 ms, percentile(90%) = 0.0461121 ms, percentile(95%) = 0.0463867 ms, percentile(99%) = 0.0483398 ms
[08/13/2024-21:57:21] [I] GPU Compute Time: min = 0.661621 ms, max = 0.668213 ms, mean = 0.664518 ms, median = 0.664551 ms, percentile(90%) = 0.665771 ms, percentile(95%) = 0.666138 ms, percentile(99%) = 0.666748 ms
[08/13/2024-21:57:21] [I] D2H Latency: min = 0.00292969 ms, max = 0.0114746 ms, mean = 0.00703715 ms, median = 0.0065918 ms, percentile(90%) = 0.0107422 ms, percentile(95%) = 0.0109253 ms, percentile(99%) = 0.0111084 ms
[08/13/2024-21:57:21] [I] Total Host Walltime: 3.00213 s
[08/13/2024-21:57:21] [I] Total GPU Compute Time: 2.99432 s
[08/13/2024-21:57:21] [I] Explanations of the performance metrics are printed in the verbose logs.
[08/13/2024-21:57:21] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8602] # /usr/src/tensorrt/bin/trtexec --onnx=resnet18_quant.quant.onnx --saveEngine=resnet18_quant_int8.engine --int8 --useCudaGraph --dumpLayerInfo --profilingVerbosity=detailed --useSpinWait
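
The summary lines above share one shape (`<metric>: min = … ms, max = … ms, …`), so they can be pulled into numbers with a short script. The following is a minimal sketch, not part of trtexec: `parse_summary_line` and `STAT_RE` are hypothetical names introduced here, and the regular expressions assume the exact line layout seen in this log.

```python
import re

# Hypothetical helper (not a trtexec API): matches "stat = value ms" pairs
# as they appear in the performance summary lines of this log.
STAT_RE = re.compile(r"(min|max|mean|median|percentile\(\d+%\))\s*=\s*([0-9.]+)\s*ms")


def parse_summary_line(line):
    """Return (metric_name, {stat: milliseconds}) for a summary line such as
    '[...] [I] Latency: min = 0.7 ms, max = ...', or None if it does not match."""
    m = re.search(r"\[I\]\s+([A-Za-z0-9 ]+):\s+(min = .*)", line)
    if m is None:
        return None
    stats = {name: float(value) for name, value in STAT_RE.findall(m.group(2))}
    return m.group(1), stats


# Sample line copied from the log above.
sample = (
    "[08/13/2024-21:57:21] [I] GPU Compute Time: min = 0.661621 ms, "
    "max = 0.668213 ms, mean = 0.664518 ms, median = 0.664551 ms, "
    "percentile(90%) = 0.665771 ms, percentile(95%) = 0.666138 ms, "
    "percentile(99%) = 0.666748 ms"
)
name, stats = parse_summary_line(sample)
print(name, stats["mean"], stats["percentile(99%)"])
```

Applied to the H2D Latency, GPU Compute Time, and D2H Latency lines of this run, the three means (0.0449008 + 0.664518 + 0.00703715 ms) add up to about 0.71646 ms, which agrees with the reported mean Latency of 0.716457 ms.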