Fix perf analyzer CAPI request lifecycle #124

Tabrizian · 2022-07-04T20:05:56Z

The request object was released as soon as a response was received. This leads to a segfault if the backend still wants to use the inference request object for metrics reporting. This change also seems to have resolved the abnormally large queue time values that were observed with the C API.
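To illustrate the ordering problem described above, here is a minimal, self-contained sketch. It does not use the Triton C API and is not the code from this PR: `MockRequest`, `MockClient`, `MockServerInfer`, and `RunOnce` are hypothetical names invented for the example, and the "server" is simulated. The assumption is that the server delivers the response first, then reads the request for metrics, and only afterwards signals that the request may be released, so the client should free the request in the release step and not in the response step.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an inference request (not the Triton type).
struct MockRequest {
  long queue_ns = 0;
};

// Records lifecycle events so the ordering can be inspected afterwards.
struct Trace {
  std::vector<std::string> events;
  bool used_after_release = false;
};

// Hypothetical client that owns the request and receives two callbacks.
class MockClient {
 public:
  explicit MockClient(bool release_on_response)
      : release_on_response_(release_on_response),
        request_(std::make_unique<MockRequest>()) {}

  // Called when the response arrives.
  void OnResponse() {
    trace_.events.push_back("response");
    // Old behavior: free the request as soon as the response is received.
    if (release_on_response_) Release();
  }

  // Called when the server is completely done with the request.
  void OnRelease() {
    if (request_) Release();
  }

  MockRequest* request() { return request_.get(); }
  Trace& trace() { return trace_; }

 private:
  void Release() {
    request_.reset();
    trace_.events.push_back("release");
  }

  bool release_on_response_;
  std::unique_ptr<MockRequest> request_;
  Trace trace_;
};

// Simulated server: response first, then metrics, then release.
void MockServerInfer(MockClient& client) {
  client.OnResponse();
  if (MockRequest* r = client.request()) {
    // Metrics reporting still touches the request after the response.
    r->queue_ns = 135000;
    client.trace().events.push_back("metrics");
  } else {
    // A real backend would dereference a freed object here and may crash.
    client.trace().used_after_release = true;
  }
  client.OnRelease();
}

// Runs one simulated inference and returns the recorded trace.
Trace RunOnce(bool release_on_response) {
  MockClient client(release_on_response);
  MockServerInfer(client);
  return client.trace();
}
```

With `RunOnce(false)` the request outlives the metrics step and is freed only in the release step; with `RunOnce(true)` the simulated backend finds the request already gone, which is the situation that would segfault with a real object.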

Before:

 USING C API: only default functionalities supported
OpenLibraryHandle: /opt/tritonserver/lib/libtritonserver.so
server is alive!
*** Measurement Settings ***
  Batch size: 1
  Using "time_windows" mode for stabilization
  Measurement window: 5000 msec
  Using synchronous calls for inference
  Stabilizing using p95 latency

Request concurrency: 1
Segmentation fault (core dumped)
0704 20:04:16.679908 6192 pb_stub.cc:1006] Non-graceful termination detected.

After:

/opt/tritonserver/bin/perf_analyzer -v -p5000 -s5.0 --percentile=95 --shared-memory '"none"' -m python_zero_1_float32 -b1 -t1 --shape INPUT0:1 --service-kind triton_c_api --triton-server-directory /opt/tritonserver --model-repository /opt/tritonserver/qa/L0_perf_nomodel/models -f 22.07dev/min_latency_triton_c_api/python_sbatch1_dbatch1_instance1.csv
 USING C API: only default functionalities supported
OpenLibraryHandle: /opt/tritonserver/lib/libtritonserver.so
server is alive!
*** Measurement Settings ***
  Batch size: 1
  Using "time_windows" mode for stabilization
  Measurement window: 5000 msec
  Using synchronous calls for inference
  Stabilizing using p95 latency

Request concurrency: 1
  Pass [1] throughput: 1263.82 infer/sec. p95 latency: 846 usec
  Pass [2] throughput: 1266.1 infer/sec. p95 latency: 845 usec
  Pass [3] throughput: 1262.93 infer/sec. p95 latency: 848 usec
  Client:
    Request count: 22785
    Throughput: 1264.28 infer/sec
    p50 latency: 791 usec
    p90 latency: 833 usec
    p95 latency: 846 usec
    p99 latency: 933 usec

  Server:
    Inference count: 22760
    Execution count: 22760
    Successful request count: 22761
    Avg request latency: 759 usec (overhead 110 usec + queue 135 usec + compute input 94 usec + compute infer 349 usec + compute output 70 usec)

Inferences/Second vs. Client p95 Batch Latency
Concurrency: 1, throughput: 1264.28 infer/sec, latency 846 usec

tgerdesnv · 2022-07-05T12:24:23Z

Was this caused by the PA changes last month? Or unrelated?

Tabrizian · 2022-07-05T13:11:31Z

It is unrelated to the PA changes last month. It existed before that too.

dyastremsky

Beautiful work, Iman. Could you update the headers to include 2022? Happy to approve once done.

Fix perf analyzer CAPI request lifecycle

0bebcf7

Tabrizian requested review from jbkyang-nvi, dyastremsky and matthewkotila July 4, 2022 20:06

dyastremsky requested changes Jul 5, 2022

Update copyrights

f63bcfb

Tabrizian requested a review from dyastremsky July 5, 2022 15:49

dyastremsky approved these changes Jul 5, 2022

Tabrizian merged commit fe60703 into main Jul 5, 2022

Tabrizian deleted the imant-capi branch July 5, 2022 17:40

dyastremsky mentioned this pull request Jul 26, 2022

How ot imporve throughput on tritonserver triton-inference-server/server#4616

Closed
