Support tracing tensors in triton #3598
Conversation
qa/common/trace_summary.py
Outdated
TRITON_TYPE_TO_NUMPY = {
    1: bool,
What are these magic numbers? At the least, put comments indicating the corresponding enum name.
src/core/dynamic_batch_scheduler.h
Outdated
@@ -34,6 +34,7 @@
#include <queue>
#include <set>
#include <thread>

Remove extra lines between includes added here and elsewhere. Are you running the formatter? It should not be adding these lines.
src/core/infer_request.h
Outdated
@@ -39,6 +40,10 @@
#include "src/core/status.h"
#include "src/core/tritonserver_apis.h"

#ifdef TRITON_ENABLE_GPU
#include <cuda_runtime_api.h>
Why do you need this include?
src/core/infer_request.h
Outdated
@@ -304,7 +309,12 @@ class InferenceRequest {
  void SetTrace(std::unique_ptr<InferenceTrace>&& trace)
  {
    trace_ = std::move(trace);
#ifdef TRITON_ENABLE_TRACING
    response_factory_.SetTrace(std::move(trace_->CopyTrace()));
If we need to have the trace object be used by both request and response(s), then we should change the trace to be a shared pointer everywhere. We should not be creating copies of the trace object. The potential complication is that the activity collection and other functions may need additional synchronization as the response(s) may be reporting trace activities/tensors interleaved with each other and with the request.
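A minimal sketch of this suggestion, with simplified stand-in class shapes (the real InferenceRequest and InferenceResponseFactory have many more members): the request and its response factory hold the same shared_ptr to one trace, and activity collection takes a mutex so interleaved reporters stay safe.

```cpp
// Hedged sketch: request and response factory share one trace via
// shared_ptr, and Report() is synchronized with a mutex. These class
// shapes are illustrative stand-ins, not Triton's actual types.
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <vector>

class InferenceTrace {
 public:
  void Report(uint64_t timestamp_ns)
  {
    // Responses may report interleaved with each other and with the
    // request, so the collection needs its own synchronization.
    std::lock_guard<std::mutex> lk(mu_);
    timestamps_.push_back(timestamp_ns);
  }
  std::size_t ReportCount()
  {
    std::lock_guard<std::mutex> lk(mu_);
    return timestamps_.size();
  }

 private:
  std::mutex mu_;
  std::vector<uint64_t> timestamps_;
};

class InferenceResponseFactory {
 public:
  void SetTrace(const std::shared_ptr<InferenceTrace>& trace) { trace_ = trace; }
  std::shared_ptr<InferenceTrace> trace_;
};

class InferenceRequest {
 public:
  void SetTrace(const std::shared_ptr<InferenceTrace>& trace)
  {
    trace_ = trace;
    // Share the same object instead of copying it.
    response_factory_.SetTrace(trace_);
  }
  std::shared_ptr<InferenceTrace> trace_;
  InferenceResponseFactory response_factory_;
};
```

With shared ownership, an activity reported through the response factory is visible on the same trace object the request holds.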
src/core/infer_trace.h
Outdated
@@ -52,7 +53,31 @@ class InferenceTrace {
  {
  }

  InferenceTrace(
Make this the only constructor and change callers to explicitly pass tensor_activity_fn = nullptr when necessary
src/servers/tracer.h
Outdated
#include "triton/core/tritonserver.h"

#ifdef TRITON_ENABLE_GPU
#include <cuda_runtime_api.h>
Why did you add this? I don't see any CUDA API usage in this file.
@deadeyegoodwin Please check the latest commit. I have made the following modifications: 1) removing extra lines between includes; 2) changing the trace to be a shared pointer; 3) saving strings instead of magic numbers in trace.json.
src/core/infer_request.cc
Outdated
@@ -31,6 +31,9 @@
#include "src/core/logging.h"
#include "src/core/model.h"
#include "src/core/server.h"
#ifdef TRITON_ENABLE_GPU
#include <cuda_runtime_api.h>
Why is this include needed?
src/core/ensemble_scheduler.cc
Outdated
@@ -1222,6 +1222,8 @@ EnsembleScheduler::Enqueue(std::unique_ptr<InferenceRequest>& request)
  INFER_TRACE_ACTIVITY(
      request->Trace(), TRITONSERVER_TRACE_QUEUE_START,
      request->QueueStartNs());
  request->TraceTensor();
Surround with "#ifdef TRITON_ENABLE_TRACING"
src/core/infer_request.cc
Outdated
@@ -110,6 +113,70 @@ InferenceRequest::SetPriority(uint32_t p)
  }
}

void
Put "#ifdef TRITON_ENABLE_TRACING" around entire function
src/core/infer_request.cc
Outdated
TRITONSERVER_MemoryType memory_type;
int64_t memory_type_id;

TRITONBACKEND_InputProperties(
Don't call external TRITONBACKEND functions from here. All the information is available directly from this InferenceRequest object.
src/core/infer_response.h
Outdated
@@ -102,6 +109,11 @@ class InferenceResponseFactory {
  // Delegator to be invoked on sending responses.
  std::function<void(std::unique_ptr<InferenceResponse>&&, const uint32_t)>
      response_delegator_;

#ifdef TRITON_ENABLE_TRACING
  // Inference trace associated with this request.
... "associated with this response."
src/core/infer_response.h
Outdated
@@ -303,6 +320,11 @@ class InferenceResponse {
      response_delegator_;

  bool null_response_;

#ifdef TRITON_ENABLE_TRACING
  // Inference trace associated with this request.
... "associated with this response."
src/core/infer_trace.h
Outdated
@@ -46,13 +46,24 @@ class InferenceTrace {
  InferenceTrace(
      const TRITONSERVER_InferenceTraceLevel level, const uint64_t parent_id,
      TRITONSERVER_InferenceTraceActivityFn_t activity_fn,
      TRITONSERVER_InferenceTraceReleaseFn_t release_fn, void* userp)
      TRITONSERVER_InferenceTraceReleaseFn_t release_fn, void* userp,
      TRITONSERVER_InferenceTraceTensorActivityFn_t tensor_activity_fn =
Move the tensor_activity_fn argument to after activity_fn. You will not be able to use a default argument value (nullptr), so you will need to change the callers to explicitly pass nullptr.
@deadeyegoodwin The above comments should be resolved in the latest commits. Please check if there are any other problems.
src/servers/tracer.cc
Outdated
@@ -181,9 +182,6 @@ TraceManager::TraceRelease(TRITONSERVER_InferenceTrace* trace, void* userp)
  if (parent_id == 0) {
    delete ts;
  }

  LOG_TRITONSERVER_ERROR(
Why are you removing this?
TraceRelease is now called in the destructor of InferenceTrace, so there is no need to release the InferenceTrace again. Otherwise, it will cause a deadlock.
We need to keep this call, but some other changes are needed. Currently TRITONSERVER_InferenceTrace represents InferenceTrace, so TRITONSERVER_InferenceTrace* is a cast of InferenceTrace*. But now TRITONSERVER_InferenceTrace needs to represent a shared_ptr, so that TRITONSERVER_InferenceTrace* is a cast of a shared_ptr*. That will require changing all of the uses of TRITONSERVER_InferenceTrace.
I split the original TraceRelease into two functions: TraceRelease and TraceStreamRelease.
TraceRelease should delete the InferenceTrace object.
TraceStreamRelease should be called in the InferenceTrace destructor.
src/servers/tracer.cc
Outdated
std::stringstream ss_tmp;

// collect and serialize trace details.
ss_tmp << ",{\"id\":" << id << ",\"activity\":\""
Why not write directly to 'ss'... doing it this way introduces unnecessary copies.
@deadeyegoodwin The above comments should be resolved. Please check this PR again. Thanks for your patience.
@deadeyegoodwin I have explained both comments. Do you agree with my opinions?
I only saw one response from you, and I agree. See my comment above.

"the release timing of InferenceTrace is determined, which should be the release timing of InferenceRequest". That is not true.
BTW, the reason I couldn't see your response is because I don't think you actually posted it... note the "Pending" next to it.
@deadeyegoodwin I have made two modifications:
The #2 change looks good. I'm not sure about the #1 change. The first thing you need to do is to address the comments I made above:
@deadeyegoodwin Hi. I think I don't get your point that we should represent TRITONSERVER_InferenceTrace* as a shared_ptr*. What's the difference between a shared_ptr* and an InferenceTrace*? In the current change, I simply split the original TraceRelease into two functions: TraceRelease and TraceStreamRelease. TraceRelease will delete the InferenceTrace object. The InferenceTrace destructor will call TraceStreamRelease, and TraceStreamRelease will release the TraceStream. Since the InferenceTrace destructor releases the TraceStream, we only need to release the InferenceTrace. The grpc/http server will call trace_manager->SampleTrace or TRITONSERVER_InferenceTraceNew to get an InferenceTrace*, and call TraceRelease or TRITONSERVER_InferenceTraceDelete to delete it. Inside Triton, we will call trace->SpawnChildTrace to get a child shared_ptr, which is maintained by InferenceRequest and InferenceResponse. After the InferenceRequest and InferenceResponse are released, the InferenceTrace gets released automatically.
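The ownership scheme described here can be sketched roughly as follows. The names TraceStream and SpawnChildTrace come from the discussion, but the shapes are illustrative, not the actual implementation: the stream is released only when the last trace (parent or child) referencing it goes away.

```cpp
// Illustrative sketch of the lifetime scheme described above.
// TraceStream's destructor plays the role of TraceStreamRelease: it runs
// once the last InferenceTrace referencing the stream is destroyed.
#include <memory>
#include <utility>

struct TraceStream {
  // In the real code this would flush and close the trace output.
  ~TraceStream() = default;
};

class InferenceTrace {
 public:
  explicit InferenceTrace(std::shared_ptr<TraceStream> stream)
      : stream_(std::move(stream))
  {
  }

  // Child traces (held by InferenceRequest/InferenceResponse) share the
  // stream, so it is auto-released once request and responses are released.
  std::shared_ptr<InferenceTrace> SpawnChildTrace()
  {
    return std::make_shared<InferenceTrace>(stream_);
  }

  std::shared_ptr<TraceStream> stream_;
};
```

Destroying the parent while a child is alive keeps the stream alive; destroying the last child releases it.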
@deadeyegoodwin Hi. Please check this comment when you are free. I'm confused about our last point of divergence.
@deadeyegoodwin Hi, I have updated the cmdline and InferenceTrace. Please check the latest commit again. BTW, I have run the existing trace tests.
In TraceManager as well you need to fix the initialization and handling of TraceLevel to account for deprecated MIN and MAX and the new TIMESTAMPS and TENSORS values.
src/core/infer_trace.h
Outdated
@@ -69,6 +69,10 @@ class InferenceTrace {
  void Report(
      const TRITONSERVER_InferenceTraceActivity activity, uint64_t timestamp_ns)
  {
    if (level_ < TRITONSERVER_TRACE_LEVEL_TIMESTAMPS) {
Do not use '<'. level_ is a bitmask of enabled trace levels. So you need to mask TRACE_LEVEL_TIMESTAMPS and only report if that bit is set.
src/core/infer_trace.h
Outdated
@@ -90,6 +98,10 @@
      const int64_t* shape, uint64_t dim_count,
      TRITONSERVER_MemoryType memory_type, int64_t memory_type_id)
  {
    if (level_ < TRITONSERVER_TRACE_LEVEL_TENSORS) {
Do not use '<'. level_ is a bitmask of enabled trace levels. So you need to mask TRACE_LEVEL_TENSORS and only report if that bit is set.
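The difference between the rejected '<' comparison and the bit test can be sketched like this; the enum values follow the power-of-2 scheme discussed later in this thread, and the helper names are illustrative. With '<', a level carrying only the TENSORS bit (0x8) is not less than TIMESTAMPS (0x4), so timestamps would wrongly be reported; the mask avoids that.

```cpp
// Sketch of the bit-test the review asks for. The enum values follow the
// power-of-2 trace-level scheme; the helper names are illustrative.
#include <cstdint>

enum TraceLevel : uint32_t {
  TRACE_LEVEL_DISABLED = 0,
  TRACE_LEVEL_TIMESTAMPS = 0x4,
  TRACE_LEVEL_TENSORS = 0x8,
};

// Report timestamps only when the TIMESTAMPS bit is set in the bitmask.
bool ShouldReportTimestamps(uint32_t level)
{
  return (level & TRACE_LEVEL_TIMESTAMPS) != 0;
}

// Report tensors only when the TENSORS bit is set in the bitmask.
bool ShouldReportTensors(uint32_t level)
{
  return (level & TRACE_LEVEL_TENSORS) != 0;
}
```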
src/servers/main.cc
Outdated
@@ -462,8 +462,9 @@ std::vector<Option> options_
    {OPTION_TRACE_FILEPATH, "trace-file", Option::ArgStr,
     "Set the file where trace output will be saved."},
    {OPTION_TRACE_LEVEL, "trace-level", Option::ArgStr,
     "Set the trace level. OFF to disable tracing, MIN for minimal tracing, "
     "MAX for maximal tracing. Default is OFF."},
     "Set the trace level. OFF to disable tracing, TIMESTAMPES to trace "
We need to allow --trace-level to be specified more than once because there are multiple trace levels that can be enabled. For example, see how --backend-config is handled. TIMESTAMPS enables only timestamps and TENSORS enables only tensors.
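Accepting the flag more than once and OR-ing the results could look roughly like this; the parser function and enum values are assumptions for illustration, not the actual main.cc code.

```cpp
// Hypothetical sketch of handling a repeatable --trace-level flag by OR-ing
// each parsed value into one bitmask. Not the actual main.cc implementation.
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

enum TraceLevel : uint32_t {
  TRACE_LEVEL_DISABLED = 0,
  TRACE_LEVEL_TIMESTAMPS = 0x4,
  TRACE_LEVEL_TENSORS = 0x8,
};

uint32_t ParseTraceLevels(const std::vector<std::string>& values)
{
  uint32_t level = TRACE_LEVEL_DISABLED;
  for (const std::string& v : values) {
    if (v == "OFF") {
      level = TRACE_LEVEL_DISABLED;  // OFF overrides anything seen so far
    } else if (v == "TIMESTAMPS") {
      level |= TRACE_LEVEL_TIMESTAMPS;
    } else if (v == "TENSORS") {
      level |= TRACE_LEVEL_TENSORS;
    } else {
      throw std::invalid_argument("unknown trace level: " + v);
    }
  }
  return level;
}
```

Passing `--trace-level=TIMESTAMPS --trace-level=TENSORS` would then enable both bits.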
@@ -69,6 +69,10 @@ class InferenceTrace {
  void Report(
In the constructor you need to set level_ in a way that handles the deprecated MIN and MAX settings.
@deadeyegoodwin Besides, if we take trace_level_ as a bitmask of enabled trace levels, we have to change the definition of the old interface (though we can keep the old interface itself). However, if we take trace_level_ as an enum and use comparison operators to check the trace level, we should not have the above problem. What's your suggestion?
The trace level remains an enum. It is just that we only allow power-of-2 enum values (except for the old deprecated values). A "trace level" is still a "trace level" and it is still an enum... the documentation is already updated to indicate what the trace level enum means. On the cmdline and C API we must still accept MIN and MAX for backwards compatibility, but internally we should convert those to TIMESTAMPS, so that internally in the code we can assume that the trace level will never be MIN or MAX and can always treat it as a bitmask. There is no need to change the InferenceTraceNew APIs.
@deadeyegoodwin Please check the latest commit. I converted MIN and MAX to TIMESTAMPS on the cmdline, in the C API, and in the grpc/http server.
src/core/tritonserver.cc
Outdated
@@ -807,6 +807,10 @@ TRITONSERVER_InferenceTraceNew(
    TRITONSERVER_InferenceTraceReleaseFn_t release_fn, void* trace_userp)
{
#ifdef TRITON_ENABLE_TRACING
  if ((level == TRITONSERVER_TRACE_LEVEL_MIN) ||
This is the right idea, but level could be (TRITONSERVER_TRACE_LEVEL_TIMESTAMPS | TRITONSERVER_TRACE_LEVEL_TENSORS). So if either min or max is set you need to turn on TIMESTAMPS (level |= TRITONSERVER_TRACE_LEVEL_TIMESTAMPS), and unconditionally you need to mask out MIN and MAX.
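The normalization described here can be sketched as follows; the enum values and helper name are illustrative, but the logic is what the comment asks for: MIN or MAX turns on TIMESTAMPS, and the deprecated bits are then unconditionally masked out.

```cpp
// Sketch of the suggested normalization: deprecated MIN/MAX imply
// timestamp tracing, and the deprecated bits never survive internally.
// Enum values and the helper name are illustrative.
#include <cstdint>

enum TraceLevel : uint32_t {
  TRACE_LEVEL_MIN = 0x1,  // deprecated
  TRACE_LEVEL_MAX = 0x2,  // deprecated
  TRACE_LEVEL_TIMESTAMPS = 0x4,
  TRACE_LEVEL_TENSORS = 0x8,
};

uint32_t NormalizeTraceLevel(uint32_t level)
{
  // If either deprecated bit is set, enable timestamp tracing.
  if ((level & (TRACE_LEVEL_MIN | TRACE_LEVEL_MAX)) != 0) {
    level |= TRACE_LEVEL_TIMESTAMPS;
  }
  // Internally the level must never carry MIN or MAX.
  level &= ~(TRACE_LEVEL_MIN | TRACE_LEVEL_MAX);
  return level;
}
```

Note that other bits (e.g. TENSORS) pass through untouched, so a caller that set MAX together with TENSORS ends up with TIMESTAMPS | TENSORS.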
src/servers/main.cc
Outdated
"Set the trace level. OFF to disable tracing, TIMESTAMPES to trace "
"timestamps, TENSORS to trace both timestamps and tensors. Default is "
"OFF."},
"Specify a trace level. OFF to disable tracing, TIMESTAMPES to "
Fix spelling of TIMESTAMPS. Also you need to say something about how the flag can be specified multiple times. That is needed so that you can turn on both TIMESTAMPS and TENSORS.
Still need to fix TIMESTAMPES -> TIMESTAMPS
@deadeyegoodwin Please check the latest commit. I have masked out MIN and MAX, and added some description of the trace-level command.
Just need to fix the one spelling issue, rebuild and make sure existing tests still work and then I will try your changes in a full CI run. The last step will be to add new testing for the TENSORS trace.
@deadeyegoodwin Please check the latest commit. To make sure the existing tests still work, I removed the trace from the InferenceResponse constructor and added a SetTrace interface in InferenceResponse. BTW, I have added a test for TENSOR in
@@ -104,8 +104,8 @@ fi

set -e

# trace-rate == 1, trace-level=MIN make sure every request is traced
Keep one of the MIN and one of the MAX test cases to cover backwards compatibility
Just need a couple of tests updated. Then you need to rebase everything to the main branch and resolve any conflicts, and then I will run CI. Be sure to rebase your change in the core repo as well.
@deadeyegoodwin I have merged the main branch in the server/core repos. Please check the latest commit.
I ran the CI and Linux looks good. There is a strange Windows build failure that I will need to triage before we can merge.
I fixed the Windows build issue. It wasn't directly related to your change but was somehow triggered by it.
The CI passed so this is ready to merge. I can merge my local rebased copy of these changes but I would prefer to just merge your PR. We have other, conflicting changes pending so we need to get this merged ASAP. Are you able to rebase it soon?
@deadeyegoodwin I have merged the latest main branch of server/core and also fixed a print problem in
    (payload_state == Payload::State::EXECUTING) ||
    (payload_state == Payload::State::RELEASED));
namespace nvidia {
namespace inferenceserver {
Why did you change this formatting and other formatting within the file? This needs to be reverted.
#ifdef TRITON_ENABLE_TRACING
  request->TraceInputTensors(
      TRITONSERVER_TRACE_TENSOR_QUEUE_INPUT, "DynamicBatchScheduler Enqueue");
  request->TraceInputTensors(TRITONSERVER_TRACE_TENSOR_QUEUE_INPUT,
Move this to just after line 170; it needs to be inside of the conditional above.
I went ahead and merged. I will fix the formatting and other issues in a follow-up PR: #3867
A draft implementation of tracing tensors according to the design doc (https://docs.google.com/document/d/1yL40ctSccNMnbhiAkR-Wg6T0zSaXrfoC/edit) and the PR in the triton core repo (triton-inference-server/core#36).
The output of trace_summary.py is shown as follows.
![image](https://user-images.githubusercontent.com/5879410/142820893-0c08c723-dea3-4c78-8ca4-5a7252cc47a3.png)
![image](https://user-images.githubusercontent.com/5879410/142821142-04e01851-5ee1-4fa1-b317-a456f9082099.png)
Please check whether the implementation is reasonable. @deadeyegoodwin