Support tracing tensors in triton #3598
Conversation
qa/common/trace_summary.py
Outdated
TRITON_TYPE_TO_NUMPY = {
    1: bool,
What are these magic numbers? At the least, put comments indicating the corresponding enum name.
src/core/dynamic_batch_scheduler.h
Outdated
@@ -34,6 +34,7 @@
#include <queue>
#include <set>
#include <thread>

Remove extra lines between includes added here and elsewhere. Are you running the formatter? It should not be adding these lines.
src/core/infer_request.h
Outdated
@@ -39,6 +40,10 @@
#include "src/core/status.h"
#include "src/core/tritonserver_apis.h"

#ifdef TRITON_ENABLE_GPU
#include <cuda_runtime_api.h>
Why do you need this include?
src/core/infer_request.h
Outdated
@@ -304,7 +309,12 @@ class InferenceRequest {
  void SetTrace(std::unique_ptr<InferenceTrace>&& trace)
  {
    trace_ = std::move(trace);
#ifdef TRITON_ENABLE_TRACING
    response_factory_.SetTrace(std::move(trace_->CopyTrace()));
If we need to have the trace object be used by both request and response(s), then we should change the trace to be a shared pointer everywhere. We should not be creating copies of the trace object. The potential complication is that the activity collection and other functions may need additional synchronization as the response(s) may be reporting trace activities/tensors interleaved with each other and with the request.
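A minimal sketch of this suggestion, with simplified stand-in class shapes (the real InferenceRequest and InferenceResponseFactory have many more members): the request and its response factory hold the same shared_ptr to one trace, and activity collection takes a mutex so interleaved reporters stay safe.

```cpp
// Hedged sketch: request and response factory share one trace via
// shared_ptr, and Report() is synchronized with a mutex. These class
// shapes are illustrative stand-ins, not Triton's actual types.
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <vector>

class InferenceTrace {
 public:
  void Report(uint64_t timestamp_ns)
  {
    // Responses may report interleaved with each other and with the
    // request, so the collection needs its own synchronization.
    std::lock_guard<std::mutex> lk(mu_);
    timestamps_.push_back(timestamp_ns);
  }
  std::size_t ReportCount()
  {
    std::lock_guard<std::mutex> lk(mu_);
    return timestamps_.size();
  }

 private:
  std::mutex mu_;
  std::vector<uint64_t> timestamps_;
};

class InferenceResponseFactory {
 public:
  void SetTrace(const std::shared_ptr<InferenceTrace>& trace) { trace_ = trace; }
  std::shared_ptr<InferenceTrace> trace_;
};

class InferenceRequest {
 public:
  void SetTrace(const std::shared_ptr<InferenceTrace>& trace)
  {
    trace_ = trace;
    // Share the same object instead of copying it.
    response_factory_.SetTrace(trace_);
  }
  std::shared_ptr<InferenceTrace> trace_;
  InferenceResponseFactory response_factory_;
};
```

With shared ownership, an activity reported through the response factory is visible on the same trace object the request holds.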
src/core/infer_trace.h
Outdated
@@ -52,7 +53,31 @@ class InferenceTrace {
  {
  }

  InferenceTrace(
Make this the only constructor and change callers to explicitly pass tensor_activity_fn = nullptr when necessary
src/servers/tracer.h
Outdated
#include "triton/core/tritonserver.h"

#ifdef TRITON_ENABLE_GPU
#include <cuda_runtime_api.h>
Why did you add this? I don't see any CUDA API usage in this file.
@deadeyegoodwin Please check the latest commit. I have made the following modifications: 1) removing extra lines between includes; 2) changing the trace to be a shared pointer; 3) saving strings instead of magic numbers in trace.json.
src/core/infer_request.cc
Outdated
@@ -31,6 +31,9 @@
#include "src/core/logging.h"
#include "src/core/model.h"
#include "src/core/server.h"
#ifdef TRITON_ENABLE_GPU
#include <cuda_runtime_api.h>
Why is this include needed?
src/core/ensemble_scheduler.cc
Outdated
@@ -1222,6 +1222,8 @@ EnsembleScheduler::Enqueue(std::unique_ptr<InferenceRequest>& request)
  INFER_TRACE_ACTIVITY(
      request->Trace(), TRITONSERVER_TRACE_QUEUE_START,
      request->QueueStartNs());
  request->TraceTensor();
Surround with "#ifdef TRITON_ENABLE_TRACING"
src/core/infer_request.cc
Outdated
@@ -110,6 +113,70 @@ InferenceRequest::SetPriority(uint32_t p)
  }
}

void
Put "#ifdef TRITON_ENABLE_TRACING" around entire function
src/core/infer_request.cc
Outdated
TRITONSERVER_MemoryType memory_type;
int64_t memory_type_id;

TRITONBACKEND_InputProperties(
Don't call external TRITONBACKEND functions from here. All the information is available directly from this InferenceRequest object.
src/core/infer_response.h
Outdated
@@ -102,6 +109,11 @@ class InferenceResponseFactory {
  // Delegator to be invoked on sending responses.
  std::function<void(std::unique_ptr<InferenceResponse>&&, const uint32_t)>
      response_delegator_;

#ifdef TRITON_ENABLE_TRACING
  // Inference trace associated with this request.
... "associated with this response."
src/core/infer_response.h
Outdated
@@ -303,6 +320,11 @@ class InferenceResponse {
      response_delegator_;

  bool null_response_;

#ifdef TRITON_ENABLE_TRACING
  // Inference trace associated with this request.
... "associated with this response."
src/core/infer_trace.h
Outdated
@@ -46,13 +46,24 @@ class InferenceTrace {
  InferenceTrace(
      const TRITONSERVER_InferenceTraceLevel level, const uint64_t parent_id,
      TRITONSERVER_InferenceTraceActivityFn_t activity_fn,
      TRITONSERVER_InferenceTraceReleaseFn_t release_fn, void* userp)
      TRITONSERVER_InferenceTraceReleaseFn_t release_fn, void* userp,
      TRITONSERVER_InferenceTraceTensorActivityFn_t tensor_activity_fn =
Move the tensor_activity_fn argument to after activity_fn. You will not be able to use a default argument value (nullptr), so you will need to change the callers to explicitly pass nullptr.
@deadeyegoodwin The above comments should be resolved in the latest commits. Please check if there are any other problems.
src/servers/tracer.cc
Outdated
@@ -181,9 +182,6 @@ TraceManager::TraceRelease(TRITONSERVER_InferenceTrace* trace, void* userp)
  if (parent_id == 0) {
    delete ts;
  }

  LOG_TRITONSERVER_ERROR(
Why are you removing this?
TraceRelease is now called in the destructor of InferenceTrace, so there is no need to release the InferenceTrace again. Otherwise, it will cause a deadlock.
We need to keep this call, but some other changes are needed. Currently TRITONSERVER_InferenceTrace represents InferenceTrace, so TRITONSERVER_InferenceTrace* is a cast of InferenceTrace*. But now TRITONSERVER_InferenceTrace needs to represent a shared_ptr, so that TRITONSERVER_InferenceTrace* is a cast of a shared_ptr*. That will require changing all of the uses of TRITONSERVER_InferenceTrace.
I split the original TraceRelease into two functions: TraceRelease and TraceStreamRelease.
TraceRelease should delete the InferenceTrace object.
TraceStreamRelease should be called in the InferenceTrace destructor.
src/servers/tracer.cc
Outdated
std::stringstream ss_tmp;

// collect and serialize trace details.
ss_tmp << ",{\"id\":" << id << ",\"activity\":\""
Why not write directly to 'ss'... doing it this way introduces unnecessary copies.
@deadeyegoodwin The above comments should be resolved. Please check this PR again. Thanks for your patience.
@deadeyegoodwin I have explained both comments. Do you agree with my opinions?
I only saw one response from you, and I agree. See my comment above.

"the release timing of InferenceTrace is determined, which should be the release timing of InferenceRequest". That is not true.
BTW, the reason I couldn't see your response is because I don't think you actually posted it... note the "Pending" next to it.
@deadeyegoodwin I have made two modifications:
The #2 change looks good. I'm not sure about the #1 change. The first thing you need to do is to address the comments I made above:
@deadeyegoodwin Hi. I think I don't get your point that we should represent TRITONSERVER_InferenceTrace* as a shared_ptr*. What's the difference between a shared_ptr* and an InferenceTrace*? In the current change, I simply split the original TraceRelease into two functions: TraceRelease and TraceStreamRelease. TraceRelease will delete the InferenceTrace object. The InferenceTrace destructor will call TraceStreamRelease, and TraceStreamRelease will release the TraceStream. Since the InferenceTrace destructor releases the TraceStream, we only need to release the InferenceTrace. The grpc/http server will call trace_manager->SampleTrace or TRITONSERVER_InferenceTraceNew to get an InferenceTrace*, and call TraceRelease or TRITONSERVER_InferenceTraceDelete to delete it. Inside Triton, we will call trace->SpawnChildTrace to get a child shared_ptr, which is maintained by InferenceRequest and InferenceResponse. After the InferenceRequest and InferenceResponse are released, the InferenceTrace gets released automatically.
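The ownership scheme described here can be sketched roughly as follows. The names TraceStream and SpawnChildTrace come from the discussion, but the shapes are illustrative, not the actual implementation: the stream is released only when the last trace (parent or child) referencing it goes away.

```cpp
// Illustrative sketch of the lifetime scheme described above.
// TraceStream's destructor plays the role of TraceStreamRelease: it runs
// once the last InferenceTrace referencing the stream is destroyed.
#include <memory>
#include <utility>

struct TraceStream {
  // In the real code this would flush and close the trace output.
  ~TraceStream() = default;
};

class InferenceTrace {
 public:
  explicit InferenceTrace(std::shared_ptr<TraceStream> stream)
      : stream_(std::move(stream))
  {
  }

  // Child traces (held by InferenceRequest/InferenceResponse) share the
  // stream, so it is auto-released once request and responses are released.
  std::shared_ptr<InferenceTrace> SpawnChildTrace()
  {
    return std::make_shared<InferenceTrace>(stream_);
  }

  std::shared_ptr<TraceStream> stream_;
};
```

Destroying the parent while a child is alive keeps the stream alive; destroying the last child releases it.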
@deadeyegoodwin Hi. Please check this comment when you are free. I'm confused about our last point of divergence.
@deadeyegoodwin Hi, I have updated the cmdline and InferenceTrace. Please check the latest commit again. BTW, I have run the existing trace tests.
In TraceManager as well you need to fix the initialization and handling of TraceLevel to account for deprecated MIN and MAX and the new TIMESTAMPS and TENSORS values.
src/core/infer_trace.h
Outdated
@@ -69,6 +69,10 @@ class InferenceTrace {
  void Report(
      const TRITONSERVER_InferenceTraceActivity activity, uint64_t timestamp_ns)
  {
    if (level_ < TRITONSERVER_TRACE_LEVEL_TIMESTAMPS) {
Do not use '<'. level_ is a bitmask of enabled trace levels. So you need to mask TRACE_LEVEL_TIMESTAMPS and only report if that bit is set.
src/core/infer_trace.h
Outdated
@@ -90,6 +98,10 @@
      const int64_t* shape, uint64_t dim_count,
      TRITONSERVER_MemoryType memory_type, int64_t memory_type_id)
  {
    if (level_ < TRITONSERVER_TRACE_LEVEL_TENSORS) {
Do not use '<'. level_ is a bitmask of enabled trace levels. So you need to mask TRACE_LEVEL_TENSORS and only report if that bit is set.
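The difference between the rejected '<' comparison and the bit test can be sketched like this; the enum values follow the power-of-2 scheme discussed later in this thread, and the helper names are illustrative. With '<', a level carrying only the TENSORS bit (0x8) is not less than TIMESTAMPS (0x4), so timestamps would wrongly be reported; the mask avoids that.

```cpp
// Sketch of the bit-test the review asks for. The enum values follow the
// power-of-2 trace-level scheme; the helper names are illustrative.
#include <cstdint>

enum TraceLevel : uint32_t {
  TRACE_LEVEL_DISABLED = 0,
  TRACE_LEVEL_TIMESTAMPS = 0x4,
  TRACE_LEVEL_TENSORS = 0x8,
};

// Report timestamps only when the TIMESTAMPS bit is set in the bitmask.
bool ShouldReportTimestamps(uint32_t level)
{
  return (level & TRACE_LEVEL_TIMESTAMPS) != 0;
}

// Report tensors only when the TENSORS bit is set in the bitmask.
bool ShouldReportTensors(uint32_t level)
{
  return (level & TRACE_LEVEL_TENSORS) != 0;
}
```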
src/servers/main.cc
Outdated
@@ -462,8 +462,9 @@ std::vector<Option> options_
    {OPTION_TRACE_FILEPATH, "trace-file", Option::ArgStr,
     "Set the file where trace output will be saved."},
    {OPTION_TRACE_LEVEL, "trace-level", Option::ArgStr,
     "Set the trace level. OFF to disable tracing, MIN for minimal tracing, "
     "MAX for maximal tracing. Default is OFF."},
     "Set the trace level. OFF to disable tracing, TIMESTAMPES to trace "
We need to allow --trace-level to be specified more than once because there are multiple trace levels that can be enabled. For example, see how --backend-config is handled. TIMESTAMPS enables only timestamps and TENSORS enables only tensors.
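Accepting the flag more than once and OR-ing the results could look roughly like this; the parser function and enum values are assumptions for illustration, not the actual main.cc code.

```cpp
// Hypothetical sketch of handling a repeatable --trace-level flag by OR-ing
// each parsed value into one bitmask. Not the actual main.cc implementation.
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

enum TraceLevel : uint32_t {
  TRACE_LEVEL_DISABLED = 0,
  TRACE_LEVEL_TIMESTAMPS = 0x4,
  TRACE_LEVEL_TENSORS = 0x8,
};

uint32_t ParseTraceLevels(const std::vector<std::string>& values)
{
  uint32_t level = TRACE_LEVEL_DISABLED;
  for (const std::string& v : values) {
    if (v == "OFF") {
      level = TRACE_LEVEL_DISABLED;  // OFF overrides anything seen so far
    } else if (v == "TIMESTAMPS") {
      level |= TRACE_LEVEL_TIMESTAMPS;
    } else if (v == "TENSORS") {
      level |= TRACE_LEVEL_TENSORS;
    } else {
      throw std::invalid_argument("unknown trace level: " + v);
    }
  }
  return level;
}
```

Passing `--trace-level=TIMESTAMPS --trace-level=TENSORS` would then enable both bits.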
@@ -69,6 +69,10 @@ class InferenceTrace {
  void Report(
In the constructor you need to set level_ in a way that handles the deprecated MIN and MAX settings.
@deadeyegoodwin Besides, if we take trace_level_ as a bitmask of enabled trace levels, we have to change the definition of the old interface (though we can keep the old interface itself). However, if we take trace_level_ as an enum and use comparison operators to check the trace level, we should not have the above problem. What's your suggestion?
The trace level remains an enum. It is just that we only allow power-of-2 enum values (except for the old deprecated values). A "trace level" is still a "trace level" and it is still an enum... the documentation is already updated to indicate what the trace level enum means. On the cmdline and C API we must still accept MIN and MAX for backwards compatibility, but internally we should convert those to TIMESTAMPS, so that internally in the code we can assume that the trace level will never be MIN or MAX and can always treat it as a bitmask. There is no need to change the InferenceTraceNew APIs.
@deadeyegoodwin Please check the latest commit. I converted MIN and MAX to TIMESTAMPS on the cmdline, in the C API, and in the grpc/http server.
src/core/tritonserver.cc
Outdated
@@ -807,6 +807,10 @@ TRITONSERVER_InferenceTraceNew(
    TRITONSERVER_InferenceTraceReleaseFn_t release_fn, void* trace_userp)
{
#ifdef TRITON_ENABLE_TRACING
  if ((level == TRITONSERVER_TRACE_LEVEL_MIN) ||
This is the right idea, but level could be (TRITONSERVER_TRACE_LEVEL_TIMESTAMPS | TRITONSERVER_TRACE_LEVEL_TENSORS). So if either min or max is set you need to turn on TIMESTAMPS (level |= TRITONSERVER_TRACE_LEVEL_TIMESTAMPS), and unconditionally you need to mask out MIN and MAX.
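The normalization described here can be sketched as follows; the enum values and helper name are illustrative, but the logic is what the comment asks for: MIN or MAX turns on TIMESTAMPS, and the deprecated bits are then unconditionally masked out.

```cpp
// Sketch of the suggested normalization: deprecated MIN/MAX imply
// timestamp tracing, and the deprecated bits never survive internally.
// Enum values and the helper name are illustrative.
#include <cstdint>

enum TraceLevel : uint32_t {
  TRACE_LEVEL_MIN = 0x1,  // deprecated
  TRACE_LEVEL_MAX = 0x2,  // deprecated
  TRACE_LEVEL_TIMESTAMPS = 0x4,
  TRACE_LEVEL_TENSORS = 0x8,
};

uint32_t NormalizeTraceLevel(uint32_t level)
{
  // If either deprecated bit is set, enable timestamp tracing.
  if ((level & (TRACE_LEVEL_MIN | TRACE_LEVEL_MAX)) != 0) {
    level |= TRACE_LEVEL_TIMESTAMPS;
  }
  // Internally the level must never carry MIN or MAX.
  level &= ~(TRACE_LEVEL_MIN | TRACE_LEVEL_MAX);
  return level;
}
```

Note that other bits (e.g. TENSORS) pass through untouched, so a caller that set MAX together with TENSORS ends up with TIMESTAMPS | TENSORS.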
src/servers/main.cc
Outdated
"Set the trace level. OFF to disable tracing, TIMESTAMPES to trace "
"timestamps, TENSORS to trace both timestamps and tensors. Default is "
"OFF."},
"Specify a trace level. OFF to disable tracing, TIMESTAMPES to "
Fix spelling of TIMESTAMPS. Also you need to say something about how the flag can be specified multiple times. That is needed so that you can turn on both TIMESTAMPS and TENSORS.
Still need to fix TIMESTAMPES -> TIMESTAMPS
@deadeyegoodwin Please check the latest commit. I have masked out MIN and MAX, and added some description of the trace-level command.
Just need to fix the one spelling issue, rebuild and make sure existing tests still work and then I will try your changes in a full CI run. The last step will be to add new testing for the TENSORS trace.
@deadeyegoodwin Please check the latest commit. To make sure the existing tests still work, I removed the trace from the InferenceResponse constructor and added a SetTrace interface in InferenceResponse. BTW, I have added a test for TENSOR in
@@ -104,8 +104,8 @@ fi

set -e

# trace-rate == 1, trace-level=MIN make sure every request is traced
Keep one of the MIN and one of the MAX test cases to cover backwards compatibility
Just need a couple of tests updated. Then you need to rebase everything to the main branch and resolve any conflicts, and then I will run CI. Be sure to rebase your change in the core repo as well.
@deadeyegoodwin I have merged the main branch in the server/core repos. Please check the latest commit.
I ran the CI and Linux looks good. There is a strange Windows build failure that I will need to triage before we can merge.
I fixed the Windows build issue. It wasn't directly related to your change but was somehow triggered by it.
The CI passed so this is ready to merge. I can merge my local rebased copy of these changes but I would prefer to just merge your PR. We have other, conflicting changes pending so we need to get this merged ASAP. Are you able to rebase it soon?
@deadeyegoodwin I have merged the latest main branch of server/core and also fixed a print problem in
    (payload_state == Payload::State::EXECUTING) ||
    (payload_state == Payload::State::RELEASED));
namespace nvidia {
namespace inferenceserver {
Why did you change this formatting and other formatting within the file? This needs to be reverted.
#ifdef TRITON_ENABLE_TRACING
  request->TraceInputTensors(
      TRITONSERVER_TRACE_TENSOR_QUEUE_INPUT, "DynamicBatchScheduler Enqueue");
  request->TraceInputTensors(TRITONSERVER_TRACE_TENSOR_QUEUE_INPUT,
Move this to just after line 170; it needs to be inside of the conditional above.
I went ahead and merged. I will fix the formatting and other issues in a follow-up PR: #3867
A draft implementation of tracing tensors according to the design doc (https://docs.google.com/document/d/1yL40ctSccNMnbhiAkR-Wg6T0zSaXrfoC/edit) and the PR in the triton core repo (triton-inference-server/core#36).
The output of trace_summary.py is shown as follows.
![image](https://user-images.githubusercontent.com/5879410/142820893-0c08c723-dea3-4c78-8ca4-5a7252cc47a3.png)
![image](https://user-images.githubusercontent.com/5879410/142821142-04e01851-5ee1-4fa1-b317-a456f9082099.png)
Please check whether the implementation is reasonable. @deadeyegoodwin