-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change traced flag semantics and add recorded flag #142
Change traced flag semantics and add recorded flag #142
Conversation
trace_context/HTTP_HEADER_FORMAT.md
Outdated
@@ -118,12 +118,15 @@ static final byte FLAG_TRACED = 1; // 00000001 | |||
boolean traced = (traceOptions & FLAG_TRACED) == FLAG_TRACED | |||
``` | |||
|
|||
#### Traced Flag (00000001) | |||
#### Requested Flag (00000001) | |||
When set, the least significant bit recommends the request should be traced. A caller who | |||
defers a tracing decision leaves this flag unset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add something like this:
"An implementation might still decide on it's own if this flag is respected or not. Consider a scenario when capturing a trace induces cost. An upstream caller could set this flag to maliciously induce cost in a tracing system. It's recommended that an implementation is careful about this and only respects the flag if the upstream caller is trusted."
trace_context/HTTP_HEADER_FORMAT.md
Outdated
When set, the least significant bit recommends the request should be traced. A caller who | ||
defers a tracing decision leaves this flag unset. | ||
|
||
#### Recorded Flag (00000010) | ||
When set, the least significant bit documents that the caller may have recorded trace data. A caller who does not record trace data out-of-band leaves this flag unset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be more clear about the "Definitively not recorded" and "Maybe recorded" states here? Otherwise discussion will spur again, when an implementation cannot definitively declare if it recorded a trace ("undecided").
1f6bd82
to
d9501f5
Compare
Update TraceContext options to store two bits: requested and recorded. See: w3c/trace-context#142
FWIW I don't like this for many reasons. Firstly, it is it translated in a non-user friendly way. For example, say we did have two options like sampled and debug which exist today. The way this translated in is a way that forces people to think in bit flags. The documentation doesn't show how it will look if you did have both flags set except requested/not requested. Adding "recorded" will show how this will look very odd to people. Secondly the mapping isn't semantically correct at least not in a way that matches existing stuff. For example, it is adding a trait "recorded" which is hard to propagate accurately while also muddying the "sampled" state in a way that conflates itself with "debug" or "force trace". For example, "requested" is more muddy than yes/no/abstain of something sampled it.. It doesn't naturally have any more information about user intent. "force trace" is very edge case on the other hand and more important. The whole thing seems like good intentions but missed some experience that is already in B3 and Amazon trace specifications yet muddied with this design. It is also missing real world situation where what is recorded is quite flexible with regards to sampling. The abstraction of traceparent if done well allows a simple compatible route for existing sites who have used similar mechanics for several years in various open source tools... more on this below. Finally, I'm surprised folks like @felixbarny don't speak up and "just implement it" This is a bad sign that people just accept whatever eventhough it doesn't match in a reliable way. Lacking outreach of those directly impacted, lacking folks pushing back at all, and adding rush to merge when meetings happen doesn't lead way for a solid standard, just "another standard". Please next time reach out directly especially knowing that B3 and Amazon both do not follow this, and both represent a significant stake. |
I had a clarifying question around this as well. (This might be my ignorance of the previous specs - b3, etc - or the history of this spec.) Does the
I guess I'm looking for a definition of who "someone" is. Thanks! |
In B3 and things based on it or derived from similar work, or previous
version of this (including grpc format and the original work on this
before w3c)...
sampled was not trying to explain who, rather what. For example it was
sampled upstream, or not, or no decision has been made yet.
debug/force trace in B3 and derived things is usually user
involvement, and is in a way emphasizing the sampled decision, but it
doesn't imply a user did this. It could be automation, for example. At
any rate it is edge case.
The "who" thing about the someone.. there's no prior practice I'm
aware of, at least in open source. My suggestion is to revert this
change as since this is a standard, we should do our best to leverage
what's out there... and we were, but this changed that.
|
There is no explicit |
needs clarification in a doc, I'll send a PR tomorrow morning. |
Thanks! That helps! |
This PR proposes slightly changing the semantics of the
traced
(also calledsampled
) flag in trace-options, and adding a new flag.traced?
becomesrequested?
and describes whether someone explicitly requested this trace be capturedrecorded
describes whether your parent recorded its tracing information out-of-bandBackground
There are three use cases I've heard described:
(This is pretty hard to do, since you can't actually force anything downstream to do something. Although also called "force" or "debug" trace, this is more like a "pretty-please" request or hint.)
(Because the hint cannot actually be trusted, i.e. you can't trust that ALL the upstream nodes actually recorded information, just understanding what your parent did is helpful.)
(However, at the time the request hits the proxy or load balancer, you may not know whether or not you should sample. You want to defer the decision.)
Proposal
These use cases can be solved by two different signals: (1) did someone explicitly request that I record this? (2) did my parent propagate its tracing information out of band? These signals can be sent separately as two trace-option flags.
I believe these changes address the concerns of PR #122:
The
recorded?
flag covers 0 and 1 andrequested?
covers 2.These are covered by
requested?
.The two deferred cases are covered by
10
.At @SergeyKanzhelev's suggestion, I added the table to the spec. While it describes all existing flags, those are both specific to "sampling" and may not scale well if we add more unrelated flags in the future. (On a related note, @tedsuo suggested renaming whole trace-options section to sampling-options until the point where we have unrelated flags.)