Metal (Apple) GPU back-end for Tracy #793
@wolfpld I'd like to request reviews from @nosferalatu and @JamesMcCarthy44, but I can't seem to add reviewers.
I don't know how assigning reviewers works on GitHub. Mentioning people should be enough to get their attention.
Also pinging @theblackunknown for a code review.
Very nice work, Marcos.
I have made a first pass of code review, without testing so far.
The PR description and analysis are greatly appreciated!
Can you also clarify whether this code is supposed to be used with ObjC ARC or not?
I know that expectations about this behavior can differ from codebase to codebase.
```cpp
TracyMetalDebug(1<<0, TracyMetalPanic(, "MTLCounterErrorValue = 0x%llx", MTLCounterErrorValue));
TracyMetalDebug(1<<0, TracyMetalPanic(, "MTLCounterDontSample = 0x%llx", MTLCounterDontSample));
```
Do you plan on keeping all those debug points?
Coming from a Tracy Vulkan background, they look unfamiliar to me.
Yes, at least at this initial stage. Should people start reporting issues, this can help me triage the problem.
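For readers coming from the Vulkan back-end: TracyMetalDebug(mask, ...) appears to gate each debug statement behind a category bit, so the points can stay in the code and be switched on per category when triaging. A minimal sketch of such a mask-gated macro, assuming a compile-time mask; TracyMetalDebugSketch, TRACY_METAL_DEBUG_MASK_SKETCH and RecordDebugHit are hypothetical names, not the PR's actual definitions:

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical compile-time mask selecting which debug categories are active.
// Here categories 0 and 2 are enabled; category 1 is disabled.
#ifndef TRACY_METAL_DEBUG_MASK_SKETCH
#define TRACY_METAL_DEBUG_MASK_SKETCH ((1 << 0) | (1 << 2))
#endif

// Runs the wrapped statement only when its category bit is set in the mask.
// Forwarding through __VA_ARGS__ lets the statement itself contain commas.
#define TracyMetalDebugSketch(mask, ...) \
    do { if ((mask) & TRACY_METAL_DEBUG_MASK_SKETCH) { __VA_ARGS__; } } while (false)

// Counts how many debug statements actually executed, for demonstration.
static int g_debugHits = 0;

inline void RecordDebugHit(const char* what) {
    assert(what != nullptr);
    ++g_debugHits;
    std::fprintf(stderr, "TracyMetal debug: %s\n", what);
}
```

Because the mask test is a constant expression, disabled categories are typically optimized out in release builds, which keeps the cost of leaving the debug points in place low.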
```cpp
TracyMetalDebug(1<<0, TracyMetalPanic(, "Calibration: CPU timestamp (Metal): %llu", cpuTimestamp));
TracyMetalDebug(1<<0, TracyMetalPanic(, "Calibration: GPU timestamp (Metal): %llu", gpuTimestamp));

cpuTimestamp = Profiler::GetTime();
```
Is it intentional that you discard the CPU timestamp returned by the MTLDevice? Is this for consistency with other Tracy event messages?
That's what Tracy expects. The CPU timestamp reported when creating the GPU context must be the timestamp that Tracy understands. This is consistent with other backends as well.
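To illustrate why the calibration pair has to use Tracy's own clock, here is a simplified linear model (not Tracy's actual server code) in which the (CPU, GPU) pair captured at context creation anchors GPU ticks onto the CPU timeline. The struct and function names, and the nanoseconds-per-tick period, are assumptions for this sketch; it also ignores the small delay between sampling the two clocks.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical calibration pair captured when the GPU context is created.
struct CalibrationSketch {
    int64_t cpuTime;  // from the profiler's own clock, e.g. Profiler::GetTime()
    int64_t gpuTime;  // GPU timestamp sampled at (approximately) the same instant
    double  period;   // assumed CPU-clock units (e.g. nanoseconds) per GPU tick
};

// Linear mapping: cpu = cpuTime + (gpu - gpuTime) * period
inline int64_t GpuToCpuSketch(const CalibrationSketch& c, int64_t gpu) {
    assert(c.period > 0.0);
    const double deltaTicks = static_cast<double>(gpu - c.gpuTime);
    return c.cpuTime + static_cast<int64_t>(deltaTicks * c.period);
}
```

If cpuTime came from a different clock than the one Tracy uses for CPU zones (such as the CPU timestamp MTLDevice reports, whose time base Tracy does not necessarily share), every mapped GPU zone would be shifted by the offset between the two clocks.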
```cpp
t_start = m_mostRecentTimestamp + 5;
t_end = t_start + 5;
```
Does this mean you try to "patch" unresolved or not-yet-resolved timestamps?
Can't we defer their resolution instead?
We can't defer, because the main reason a timestamp can't be resolved is that the command encoder with the associated timestamp(s) was never scheduled for execution.
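A sketch of the fallback the t_start/t_end snippet implies, assuming it only runs once a query has waited past the collection timeout: resolved pairs pass through unchanged, while an unresolved pair becomes a tiny zone placed right after the most recent known timestamp, so the zone can still be closed without breaking ordering. The sentinel value, the function name, and advancing the most recent timestamp after patching are assumptions, not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical sentinel meaning "this timestamp was never resolved";
// the real back-end may represent this differently.
constexpr uint64_t kUnresolvedSketch = 0;

// Returns the zone's (begin, end) GPU timestamps. Unresolved pairs (e.g. the
// encoder was never scheduled) are replaced by a tiny zone right after the most
// recent known timestamp, mirroring t_start = recent + 5; t_end = t_start + 5.
inline std::pair<uint64_t, uint64_t> ResolveOrPatchSketch(
    uint64_t begin, uint64_t end, uint64_t& mostRecentTimestamp)
{
    if (begin != kUnresolvedSketch && end != kUnresolvedSketch) {
        assert(end >= begin && "GPU timestamps should be ordered");
        if (end > mostRecentTimestamp) mostRecentTimestamp = end;
        return {begin, end};
    }
    const uint64_t tStart = mostRecentTimestamp + 5;
    const uint64_t tEnd = tStart + 5;
    mostRecentTimestamp = tEnd;  // assumption: keeps later patched zones ordered too
    return {tStart, tEnd};
}
```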
```cpp
const bool m_active;

MetalCtx* m_ctx;
id<MTLComputeCommandEncoder> m_cmdEncoder;
```
This is currently unused because the code above is within #if 0 ... #endif.
Yes, and I'll be working on that in a subsequent PR. I don't have hardware that supports command granularity right now.
```cpp
MemWrite( &item->gpuZoneEnd.context, ctx->GetContextId() );
Profiler::QueueSerialFinish();

TracyMetalDebug(1<<2, TracyAllocN((void*)(uintptr_t)queryId, 1, "TracyMetalGpuZone"));
```
Aren't those supposed to be a TracyAllocN/TracyFreeN pair?
TracyFree happens in Collect, when the two timestamps announced here (begin and end timestamps) are resolved.
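The pairing described here amounts to debug bookkeeping keyed by query ID: an entry is added when a zone announces its timestamps (where TracyAllocN is emitted, using the query ID as a fake pointer) and removed only when Collect resolves them (where the matching free is emitted), so zones that never resolve show up as outstanding entries. The class below is a hypothetical stand-in for that idea, not Tracy's memory-profiling API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Tracks zones whose begin/end queries were announced but not yet collected.
class OutstandingZonesSketch {
public:
    // Called where the zone announces its timestamp queries (the "alloc").
    void OnZoneAnnounced(uint32_t queryId) {
        const bool inserted = m_live.insert(queryId).second;
        assert(inserted && "query ID announced twice without being collected");
        (void)inserted;
    }
    // Called from collection once both timestamps are resolved (the "free").
    void OnZoneCollected(uint32_t queryId) {
        const std::size_t erased = m_live.erase(queryId);
        assert(erased == 1 && "collected a query ID that was never announced");
        (void)erased;
    }
    std::size_t Outstanding() const { return m_live.size(); }

private:
    std::unordered_set<uint32_t> m_live;
};
```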
```cpp
private:
const bool m_active;

MetalCtx* m_ctx;
```
Alternatively, you could just store the context ID.
The other back-ends seem to all store the context pointer, so I'll stick to that.
```cpp
{
auto checkTime = std::chrono::high_resolution_clock::now();
auto requestTime = m_timestampRequestTime[k];
auto ms_in_flight = std::chrono::duration<float>(checkTime-requestTime).count()*1000.0f;
```
@wolfpld I want to remove uses of std::chrono and use what's already available in Tracy, that is, Profiler::GetTime(). I may be missing something obvious here, but how do you convert the difference between two Profiler::GetTime() samples to, say, seconds?
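The reply to this question is not part of this thread. One generic, clock-agnostic approach, shown only as a sketch, is to calibrate the tick source once against std::chrono::steady_clock and then scale tick differences by the measured rate. This assumes a monotonic tick source with a constant rate; the function names are hypothetical, and nothing here claims Tracy exposes such a helper. Note that steady_clock is guaranteed monotonic, whereas high_resolution_clock (used in the snippet above) may alias a non-steady clock.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Estimates nanoseconds per tick for an opaque tick source by sampling it
// alongside std::chrono::steady_clock over a short window.
inline double CalibrateNsPerTickSketch(const std::function<int64_t()>& getTicks,
                                       std::chrono::milliseconds window)
{
    using clock = std::chrono::steady_clock;
    const auto t0 = clock::now();
    const int64_t r0 = getTicks();
    std::this_thread::sleep_for(window);
    const auto t1 = clock::now();
    const int64_t r1 = getTicks();
    assert(r1 > r0 && "tick source must advance");
    const double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    return ns / static_cast<double>(r1 - r0);
}

// Converts a tick difference to milliseconds using the calibrated rate.
inline double TicksToMsSketch(int64_t ticks, double nsPerTick) {
    return static_cast<double>(ticks) * nsPerTick / 1.0e6;
}
```

With the rate calibrated once at startup, the in-flight check becomes TicksToMsSketch(now - requestTime, nsPerTick) > timeoutMs, where timeoutMs is whatever threshold the back-end chooses, and std::chrono stays out of the per-query path.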
@wolfpld Please review the changes to the manual, and if it looks good to you, this PR is ready to merge!
```diff
@@ -401,7 +401,8 @@ enum class GpuContextType : uint8_t
     Vulkan,
     OpenCL,
     Direct3D12,
-    Direct3D11
+    Direct3D11,
+    Metal
```
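Why appending Metal needs a version bump can be seen from the reader's side: the context type travels as a uint8_t, and a reader built before the new value existed has no way to interpret it. A minimal sketch with a simplified stand-in enum (the real GpuContextType has more entries above those shown in the hunk); DecodeContextTypeSketch is hypothetical, and Tracy presumably relies on its protocol version check at connection time rather than per-value checks like this one.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Simplified stand-in for GpuContextType; values here start at Vulkan = 0,
// unlike the real enum.
enum class GpuContextTypeSketch : uint8_t {
    Vulkan,
    OpenCL,
    Direct3D12,
    Direct3D11,
    Metal  // newly appended value
};

// A reader that only knows values up to `lastKnown` must reject anything newer.
inline std::optional<GpuContextTypeSketch> DecodeContextTypeSketch(
    uint8_t raw, GpuContextTypeSketch lastKnown)
{
    if (raw > static_cast<uint8_t>(lastKnown)) return std::nullopt;
    return static_cast<GpuContextTypeSketch>(raw);
}
```

An "old" reader (lastKnown = Direct3D11) cannot decode the Metal value, which is exactly the mismatch a protocol version bump lets the client and server detect up front.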
This requires a protocol version bump (I'll do it).
(I still need to update the manual, but I'm putting the code up for review to save some time.)
The Metal back-end in Tracy operates differently from other GPU back-ends such as Vulkan, Direct3D and OpenGL. Specifically, TracyMetalZone() must be placed around the site where a command encoder is created.
This is because not all hardware supports timestamps at command granularity; some hardware can only provide timestamps around an entire command encoder. This approach accommodates all tiers of hardware; in the future, variants of TracyMetalZone() will be added to support the usual command-level granularity of Tracy GPU back-ends.
Metal also imposes a few restrictions that make the process of requesting and collecting queries more complicated in Tracy:
Because of the limitations above, two timestamp buffers are managed internally. Once one of the buffers fills up with requests, the second buffer can start serving new requests.
Once all requests in a buffer get resolved and collected, the entire buffer is discarded and a new one allocated for future requests. (Proper cycling through a ring buffer would require bookkeeping and completion handlers to collect only the known complete queries.)
In the current implementation, there is potential for a race condition when the buffer is discarded and reallocated. In practice, the race condition will never materialize as long as TracyMetalCollect() is called frequently enough to keep the number of unresolved queries low.
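The two-buffer scheme can be modeled without any GPU objects: each buffer hands out query slots until it is full, new requests then move to the other buffer, and a buffer is discarded and replaced only after every slot handed out from it has been collected. This toy model is a sketch under stated assumptions (the class name, slot accounting, and capacity are hypothetical) and does not reproduce the real back-end's counter sample buffer handling or its race-condition window.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Toy model of two query buffers served alternately.
class DoubleQueryBuffersSketch {
public:
    explicit DoubleQueryBuffersSketch(uint32_t capacity) : m_capacity(capacity) {}

    // Reserves one query slot and returns {bufferIndex, slot}, or nullopt if
    // both buffers are full and still waiting for collection.
    std::optional<std::pair<int, uint32_t>> Request() {
        for (int attempt = 0; attempt < 2; ++attempt) {
            Buffer& b = m_buffers[m_active];
            if (b.issued < m_capacity) {
                return std::make_pair(m_active, b.issued++);
            }
            m_active ^= 1;  // active buffer is full: try the other one
        }
        return std::nullopt;
    }

    // Marks one query from `buffer` as resolved and collected. A buffer whose
    // slots were all handed out and collected is "discarded and reallocated".
    void Collect(int buffer) {
        Buffer& b = m_buffers[buffer];
        assert(b.collected < b.issued && "collecting more than was requested");
        if (++b.collected == m_capacity) {
            b = Buffer{};
        }
    }

private:
    struct Buffer { uint32_t issued = 0; uint32_t collected = 0; };
    uint32_t m_capacity;
    std::array<Buffer, 2> m_buffers{};
    int m_active = 0;
};
```

When both buffers are full and awaiting collection, Request() returns nothing, which is consistent with the need to call TracyMetalCollect() often.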
Finally, there's a timeout mechanism during timestamp collection to detect "empty" command encoders and ensure progress.
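The placement rule for TracyMetalZone() presumably follows from how Metal requests encoder-level timestamps: the sample indices are set on the pass descriptor's sampleBufferAttachments (startOfEncoderSampleIndex and endOfEncoderSampleIndex) and take effect when the encoder is created from that descriptor, so a zone opened after creation is too late. The plain C++ model below stands in for those Metal objects; all type names are hypothetical, and the real zone's mechanics may differ.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for one sample buffer attachment on a pass descriptor.
// UINT64_MAX plays the role of a "don't sample" sentinel here.
struct PassSampleAttachmentSketch {
    uint64_t startOfEncoderSampleIndex = UINT64_MAX;
    uint64_t endOfEncoderSampleIndex = UINT64_MAX;
};

// Stand-in for an encoder: it captures the sample indices at creation time.
struct EncoderSketch {
    uint64_t beginIndex;
    uint64_t endIndex;
};

inline EncoderSketch CreateEncoderSketch(const PassSampleAttachmentSketch& a) {
    return EncoderSketch{a.startOfEncoderSampleIndex, a.endOfEncoderSampleIndex};
}

// Hypothetical zone: reserves a begin/end query pair and writes it into the
// pass descriptor, which only helps if it runs before the encoder is created.
class EncoderZoneSketch {
public:
    EncoderZoneSketch(PassSampleAttachmentSketch& a, uint64_t& nextQuery)
        : m_begin(nextQuery), m_end(nextQuery + 1)
    {
        nextQuery += 2;
        a.startOfEncoderSampleIndex = m_begin;
        a.endOfEncoderSampleIndex = m_end;
    }
    uint64_t Begin() const { return m_begin; }
    uint64_t End() const { return m_end; }

private:
    uint64_t m_begin;
    uint64_t m_end;
};
```

An encoder created before the zone exists keeps the "don't sample" indices, while one created inside the zone picks up the reserved query pair.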