
List all the projects that are willing to change and use this format #4

Closed
bogdandrutu opened this issue Apr 10, 2017 · 18 comments

@bogdandrutu (Contributor) commented Apr 10, 2017

Here's a working list of tracing systems and constraints related to this specification:

| System | Trace ID | Span ID | Flags | Notes |
|---|---|---|---|---|
| OpenCensus | 128 bit | 64 bit | sampled | |
| Jaeger | 128 bit | 64 bit | sampled, deferred, debug | • the debug flag is set when user code does span.SetTag(SAMPLING_PRIORITY, v) with v > 0<br>• it always sets sampled: 1 (in this case the sampled bit is authoritative, not a suggestion; a suggestion would have deferred: 1)<br>• the backend will not down-sample the data any further (which it can do for normal sampled traces to shed load) |
| Tracelytics | 160 bit | 64 bit | ? | |
| Zipkin | 128 bit | 64 bit | sampled, debug | • in B3, sampling is implied as authoritative downstream, and can be null<br>• incompatible with the B3 approach of a single span per RPC, so it needs work both server-side and in all tracers |

Please note your project if it is not already on the list, so we can keep track of its requirements.

@yurishkuro (Member)

• Jaeger | 128-bit trace_id | 64-bit span_id | 3 bits used in flags (sampled, deferred, debug)

@bogdandrutu (Contributor, Author)

@yurishkuro I know what deferred means (you explained it in the doc), but can you explain the debug field as well?

@yurishkuro (Member)

  1. we set the debug flag when user code does span.SetTag(SAMPLING_PRIORITY, v) with v > 0.
  2. it always sets sampled: 1 (in our case the sampled bit is authoritative, not a suggestion; a suggestion would have deferred: 1)
  3. the backend will not down-sample the data any further (which it can do for normal sampled traces to shed load)
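
For illustration, here is a minimal sketch of that logic in Go. The flag names, bit positions, and function name are purely illustrative assumptions, not Jaeger's actual wire encoding:

```go
package main

import "fmt"

// Illustrative flag bits packed into a single byte; the actual bit
// assignments used on the wire may differ.
const (
	flagSampled  byte = 1 << 0
	flagDebug    byte = 1 << 1
	flagDeferred byte = 1 << 2
)

// applySamplingPriority mimics the behaviour described above: a user-supplied
// sampling priority > 0 forces debug, which in turn forces sampled and clears
// the "deferred" (decision-not-final) bit.
func applySamplingPriority(flags byte, priority int) byte {
	if priority > 0 {
		flags |= flagDebug | flagSampled
		flags &^= flagDeferred // the decision is now authoritative
	}
	return flags
}

func main() {
	flags := flagDeferred // sampling decision not yet made
	flags = applySamplingPriority(flags, 1)
	fmt.Printf("flags = %03b\n", flags) // 011: sampled + debug
}
```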

@cce commented Apr 11, 2017

  • Tracelytics | 160-bit trace_id | 64-bit span_id | 1-bit "sampled" flag

@codefromthecrypt

I reformatted the comments into the table above. A note from another issue: in Zipkin, support for this format wouldn't happen until users ask for it, especially as it implies a significant amount of work for tracers and the server side due to the single-host-span model. For example, this spec is incompatible with the span-per-RPC model defined in B3, as there's not enough room to store the whole context (trace, parent, and span id).

I'd expect demand to come from at least Google-related sites; whatever the demand is, it needs to be enough to motivate tracer maintainers to support two propagation formats simultaneously (B3 and this one). This labor aspect is a constraint that may not apply to others.

@wu-sheng (Contributor)

The sky-walking project is interested in this, although in order to improve the performance of the topological graph of application clusters (less backend memory required), we put more info in the propagation than others do.

Our format is traceSegmentId|spanId|applicationCode|peerHost|distributedTraceIds|sampled. traceSegmentId is an id that is unique only within the current JVM. distributedTraceIds are the real trace_ids; more than one value is allowed for batch requirements.

I have a proposal: can we design this as an extensible format? E.g. the first four fields are required, but a tracer implementation can append more fields, as sketched below.
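
For illustration, a minimal sketch of how such an extensible, pipe-delimited context could be parsed. The header layout (four required leading fields plus arbitrary vendor fields), the field names, and the example values are assumptions for the sake of the sketch, not a concrete proposal:

```go
package main

import (
	"fmt"
	"strings"
)

// Context holds the fields every implementation would be required to
// understand, plus any trailing vendor-specific fields that must be
// carried along unchanged.
type Context struct {
	TraceID      string
	ParentSpanID string
	SpanID       string
	Sampled      string
	Extra        []string // vendor extensions, propagated verbatim
}

// parse splits a hypothetical pipe-delimited header of the form
//   traceId|parentSpanId|spanId|sampled|vendorField1|vendorField2|...
func parse(header string) (Context, error) {
	parts := strings.Split(header, "|")
	if len(parts) < 4 {
		return Context{}, fmt.Errorf("need at least 4 fields, got %d", len(parts))
	}
	return Context{
		TraceID:      parts[0],
		ParentSpanID: parts[1],
		SpanID:       parts[2],
		Sampled:      parts[3],
		Extra:        parts[4:], // e.g. applicationCode, peerHost, distributedTraceIds
	}, nil
}

func main() {
	ctx, err := parse("4bf92f3577b34da6a3ce929d0e0e4736|00f067aa0ba902b7|53ce929d0e0e4736|1|appA|10.0.0.7")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", ctx)
}
```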

@beberlei

We are planning to move from a single-process tracer to distributed tracing. The format is not yet defined, so we could definitely see ourselves moving to the Trace-Context header.

In addition to trace id, span id, and flags, we could make use of additional optional key-value pairs, but I see no problem solving this with an additional vendor-specific header. We will also sign the trace context with an HMAC key: since we have different customers calling each other, we need to define the boundary. But this, too, can live in a vendor-specific header.
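
As a side note on the HMAC idea, here is a minimal sketch of signing and verifying a propagated context value with HMAC-SHA256. The context value, the key, and the idea of carrying the signature in a separate header are hypothetical; the actual scheme would be vendor-specific:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign computes an HMAC-SHA256 over the propagated context value so the
// receiving side can verify it originated within a trusted boundary.
func sign(contextValue string, key []byte) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(contextValue))
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the HMAC and compares it in constant time.
func verify(contextValue, signature string, key []byte) bool {
	expected, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(contextValue))
	return hmac.Equal(mac.Sum(nil), expected)
}

func main() {
	key := []byte("shared-secret-per-customer-boundary")
	value := "4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"

	// The signature would travel in a separate, vendor-specific header
	// (header name not specified here).
	sig := sign(value, key)
	fmt.Println(sig, verify(value, sig, key))
}
```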

One thing I find a bit problematic about "flags" is that they have to be defined up front in the spec, or they might cause incompatibilities between vendors.

@msample commented Apr 25, 2017

Hello all. We are looking at using OpenTracing, and my search for the on-wire protocol for SpanContext led me here. I was surprised not to find this part solidly defined, as it seems critical to the broad adoption of OpenTracing. Some observations, hopefully of use:

  • it seems like the minimal abstract span context should be {traceid, spanid}, and defining it is a high priority. Other things such as sampling could be other headers or optional fields defined later.
  • having just fought int64 issues between gRPC & Swagger/REST and programming languages (JavaScript 52-bit integers, anyone?), using 'string' as the type for traceid and spanid is less bug-prone and more future-safe, and size tuning can be done per installation. DB index optimization could be handled by local configuration convention (e.g. traceid strings are base64'd 128-bit little-endian ints).
  • given that HTTP/2 will likely become a dominant transport (gRPC, etc.), bit-packing the header values becomes less important thanks to HPACK compression.
  • someone mentioned HMAC: if there is a need for signatures and even encryption, there is already a good format for this, JWT (lots of debugged libraries, fairly compact). The overhead for 'Unsecured JWTs' (no signature, not encrypted) is pretty low. It's also easy to add other fields if needed.

An unsecured JWT with 'tid' and 'sid' strings might be a decent format: the spec is short, libraries abound, id size is tunable, and it's relatively future-safe (id sizes & adding fields).
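
To make that concrete, here is a minimal sketch of building such an unsecured JWT (RFC 7519, section 6: alg "none" with an empty signature part). The 'tid'/'sid' claim names follow the comment above and are illustrative, not part of any spec:

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// unsecuredJWT builds an "Unsecured JWT" (alg "none", empty signature part)
// carrying hypothetical "tid" and "sid" claims for the trace and span ids.
func unsecuredJWT(traceID, spanID string) (string, error) {
	enc := base64.RawURLEncoding
	header, err := json.Marshal(map[string]string{"alg": "none", "typ": "JWT"})
	if err != nil {
		return "", err
	}
	claims, err := json.Marshal(map[string]string{"tid": traceID, "sid": spanID})
	if err != nil {
		return "", err
	}
	// Per RFC 7519 section 6, an unsecured JWT is header.claims. with an
	// empty third (signature) segment.
	return enc.EncodeToString(header) + "." + enc.EncodeToString(claims) + ".", nil
}

func main() {
	token, err := unsecuredJWT("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
	if err != nil {
		panic(err)
	}
	fmt.Println(token)
}
```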

@msample commented Apr 25, 2017

@yurishkuro thanks, I would have posted to Stack Overflow if I had wanted to. I infer that you don't want this discussion here, so I'll leave it at this.

your answer: "The reason for this is that such standardization is not necessary as long as the target architecture is using the OpenTracing libraries from the same tracing system"

Unfortunately, that requirement is often a challenge in larger, polyglot microservices deployments that use many different libraries. Pardon my confusion - I thought a standardized wire format was the purpose of this project.

@yurishkuro (Member)

> I infer that you don't want this discussion here so I'll leave it at this.

@msample - this was a FAQ for OpenTracing; that's why I posted it to SO. The standardized wire format is the purpose of this project, TraceContext, but not of OpenTracing.

> Unfortunately, that requirement is often a challenge in larger, polyglot microservices deployments that use many different libraries

Our (Uber's) ecosystem is 2000+ microservices in 4+ languages and dozens of frameworks. But we consistently use Jaeger as the tracing backend, and Jaeger libs guarantee wire interop. Because all services and libs are instrumented with OpenTracing, we can (theoretically) switch the tracing backend & libraries. It's just rather hard in practice since we can't upgrade 2000+ microservices in lock-step. If we did try to switch, then the wire interop would become very useful indeed.

@sloev commented May 19, 2017

@yurishkuro I actually kind of agree with @msample.
We here at Trustpilot have a few thousand microservices and we are consistently using logstash for all services.

Our scenario is that we are looking to implement OpenTracing but can't afford to put vendor-specific code in clients, since updating them (if we made the wrong choice of OT backend) would be too costly (or impossible, if libs for the chosen backend did not exist).

My point is:
We already have the infrastructure to transmit logs and dispatch them via an OT client to a chosen backend, so :-)WHY:-) would we need to introduce vendor-specific code in our client libs, for any reason other than it simply not being part of the standard?

[btw. enjoyed your talk (and donuts) at osCon]

@tedsuo commented Jun 13, 2017

@sloev to clarify, I believe the approach has been to get as many OpenTracing-enabled tracers as possible to agree on a wire protocol, then "bless" that wire protocol in OpenTracing (though I doubt OT will ever require support for a particular protocol, as it's orthogonal to the instrumentation API). Trying to dictate a wire protocol to a number of disparate tracing systems that could nonetheless support a common instrumentation API was deemed a non-starter for getting that first tier of agreement.

I'm hopeful that TraceContext will succeed in getting at least all major Dapper/Zipkin-style tracers to agree on a wire protocol; that would definitely be a win.

@codefromthecrypt

I'm not sure of the connection between log dispatch and this cross-process propagation format, except that it helps ensure the ids that log messages are indexed against are coherent within a trace.

It is certainly the case that the propagation format inside a library is easier to change than the library itself. In the OpenZipkin ecosystem, for example, the binding of headers to tracing happens more often at a higher abstraction than the one the tracer was defined at. Worst case, middleware can translate things, provided the actual trace context values are compatible. More on that below.

The values in the trace context (shape of the IDs, how many, etc.) are what will ultimately decide what's compatible within this spec. There are other initiatives, for example, that accept arbitrary strings; these won't fit. Also, there are things like X-Ray, which includes a temporal part in its ID and might not fit well. So, long story short, I agree with @tedsuo that this will cover a subset of all propagation formats, hopefully collecting a stronger union than exists today.

@codefromthecrypt

The B3 format includes a debug flag, which is persisted such that it passes all the way through to the collection tier. This lets operators request that a trace be retained even in a scenario where a traffic spike or the like leads to traffic being dropped. This functionality isn't widely understood or supported consistently. From the tracing-workshop discussion, it seems this sort of flag could be accomplished in other ways, and doing so could reduce the scope of what the propagated context must include, e.g. replacing the flag with advice.

One way would be a vendor-specific means of asking the collection tier for a force-sampled trace ID; an operator could use this trace ID and the tier would let it pass. Another way, in a scenario like overload, is simply to keep retrying until a request passes collection-tier sampling. E.g. if the collector is dropping 10% of traces, it is very unlikely that all 10 tries will fail (roughly a 1-in-10^10 chance, assuming independent drops).

@tedsuo commented Jul 14, 2017

Just a note that LightStep is also interested in supporting this format. Our caveats are that we would like to see a Baggage concept, and we are concerned about a couple of things being specified too narrowly, but it is very, very close.

@codefromthecrypt commented Jul 15, 2017 via email

@AloisReitbauer (Contributor)

I would rather list all projects that have passed the tests. This is the best way to define support.
