3 changes: 2 additions & 1 deletion index.html
@@ -54,6 +54,7 @@
<section id='abstract' data-include="spec/01-abstract.md" data-include-format='markdown'></section>
<section id='sotd' data-include="spec/02-sotd.md" data-include-format='markdown'></section>

-<section data-include="spec/20-BINARY_FORMAT.md" data-include-format='markdown'></section>
+<section data-include="spec/20-binary-format.md" data-include-format='markdown'></section>
+<section data-include="spec/31-parsing-algoritm.md" data-include-format='markdown' class="informative"></section>
</body>
</html>
3 changes: 0 additions & 3 deletions spec/20-BINARY_FORMAT.md

This file was deleted.

106 changes: 106 additions & 0 deletions spec/20-binary-format.md
@@ -0,0 +1,106 @@
# Binary format

This document describes how to encode each field - `traceparent` and
`tracestate` - in binary form. The binary format should be used to encode the
values of these fields. This specification does not define how the encoded
fields are stored or sent as part of a binary payload. A basic implementation
may serialize each field as the size of the field followed by its value.
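
As an illustration of the "size followed by value" idea, here is a minimal sketch in Python. The helper names `frame` and `unframe` and the 2-byte big-endian size prefix are assumptions made for this example. This specification does not define the width or byte order of the size.

```python
import struct


def frame(value: bytes) -> bytes:
    """Prefix an encoded field with its size (assumed: 2 bytes, big endian)."""
    return struct.pack(">H", len(value)) + value


def unframe(payload: bytes, offset: int = 0):
    """Read one size-prefixed field; return (value, offset of the next field)."""
    (size,) = struct.unpack_from(">H", payload, offset)
    start = offset + 2
    if len(payload) - start < size:
        raise ValueError("payload is shorter than the declared field size")
    return payload[start:start + size], start + size
```

A payload could then carry both fields back to back, for example `frame(traceparent_bytes) + frame(tracestate_bytes)`, where both arguments are already encoded as described in this document.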

The specification operates on bytes - unsigned 8-bit integer values
from `0` to `255`. The representation of a byte as a set of bits (big or
little endian) MUST be defined by the underlying platform and is out of
scope of this specification.

## `traceparent` binary format

The field `traceparent` encodes the version of the protocol and the fields
`trace-id`, `parent-id` and `trace-flags`. Each field starts with a one-byte
field identifier, with the field value following immediately after it. Field
identifiers serve as markers for additional verification of value consistency
and may be used in the future for versioning of the `traceparent` field.

``` abnf
traceparent = version version_format
version = 1BYTE ; version is 0 in the current spec
version_format = "{ 0x0 }" trace-id "{ 0x1 }" parent-id "{ 0x2 }" trace-flags
trace-id = 16BYTES
parent-id = 8BYTES
trace-flags = 1BYTE ; only the least significant bit is used
```

An unknown field identifier (anything other than `0`, `1` and `2`) should be
treated as an invalid `traceparent`. All zeroes in `trace-id` and `parent-id`
invalidate the `traceparent` as well.

## Serialization of `traceparent`

An implementation MUST serialize fields in the order in which they appear in
the format definition. In other words, the `trace-id` field should be
serialized first, `parent-id` second and `trace-flags` third.

Field identifiers should be treated as unsigned byte numbers and should be
encoded in big-endian bit order.

The fields `trace-id` and `parent-id` are defined as byte arrays, NOT as
long numbers. The first element of an array MUST be copied first. When an
array is represented as a memory block of 16 bytes, serialization of
`trace-id` is identical to a `memcpy` call on that memory block. This
may be a concern for implementations that cast these fields to integers -
the protocol does NOT define whether those byte arrays are ordered as big
endian or little endian, or whether they have a sign bit.

If padding of the field is required (`traceparent` needs to be serialized into
a bigger buffer) - any number of bytes can be appended to the end of the
serialized value.
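
The serialization rules above can be sketched in Python as follows. The function name `serialize_traceparent`, the `buffer_size` parameter, and the choice of zero bytes for padding are assumptions for illustration. The sketch also rejects a `trace-id` or `parent-id` that consists only of zero bytes, which is one reading of the validity rule in the format section.

```python
def serialize_traceparent(trace_id: bytes, parent_id: bytes,
                          trace_flags: int, buffer_size: int = 29) -> bytes:
    """Encode a version-0 traceparent, optionally padded to buffer_size."""
    if len(trace_id) != 16 or len(parent_id) != 8:
        raise ValueError("trace-id must be 16 bytes and parent-id 8 bytes")
    if not any(trace_id) or not any(parent_id):
        raise ValueError("all-zero trace-id or parent-id is invalid")
    if not 0 <= trace_flags <= 255:
        raise ValueError("trace-flags must fit in one byte")

    out = bytearray([0])             # version byte, 0 in the current spec
    out += bytes([0]) + trace_id     # field id 0, bytes copied in array order
    out += bytes([1]) + parent_id    # field id 1
    out += bytes([2, trace_flags])   # field id 2
    if buffer_size < len(out):
        raise ValueError("buffer_size is smaller than the encoded value")
    out += bytes(buffer_size - len(out))  # padding (assumed: zero bytes)
    return bytes(out)
```

The identifiers are passed as `bytes` rather than integers, so no endianness decision is made anywhere in the sketch. The encoded value without padding is always 29 bytes long.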

## `traceparent` example

``` js
{0,
0, 75, 249, 47, 53, 119, 179, 77, 166, 163, 206, 146, 157, 0, 14, 71, 54,
1, 52, 240, 103, 170, 11, 169, 2, 183,
2, 1}
```

This corresponds to:

- `trace-id` is
`{75, 249, 47, 53, 119, 179, 77, 166, 163, 206, 146, 157, 0, 14, 71, 54}` or
`4bf92f3577b34da6a3ce929d000e4736`.
- `parent-id` is `{52, 240, 103, 170, 11, 169, 2, 183}` or `34f067aa0ba902b7`.
- `trace-flags` is `1`, meaning `recorded` is true.
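
The correspondence between the byte values and the hexadecimal strings can be checked with a few lines of Python. The slice offsets follow from the layout: one version byte, then each field preceded by its one-byte identifier. The variable names are chosen for this example only.

```python
example = bytes([
    0,
    0, 75, 249, 47, 53, 119, 179, 77, 166, 163, 206, 146, 157, 0, 14, 71, 54,
    1, 52, 240, 103, 170, 11, 169, 2, 183,
    2, 1,
])

trace_id = example[2:18]     # 16 bytes after version and field id 0
parent_id = example[19:27]   # 8 bytes after field id 1
trace_flags = example[28]    # 1 byte after field id 2

print(trace_id.hex())        # 4bf92f3577b34da6a3ce929d000e4736
print(parent_id.hex())       # 34f067aa0ba902b7
print(bool(trace_flags & 1)) # True, the least significant bit is `recorded`
```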

## `tracestate` binary format

A list of up to 32 name-value pairs. Each list member starts with the 1-byte
field identifier `0`. A list member consists of a single-byte key length
followed by the key, then a single-byte value length followed by the encoded
value. Note that a single-byte length field can express lengths from `0` to
`255`, so keys and values can be up to 255 bytes long. Length limits for keys
and values are defined by the [trace
context](https://w3c.github.io/trace-context/#header-value)
specification. Strings are transmitted in ASCII encoding.

``` abnf
tracestate = list-member 0*31( list-member )
list-member = "0" key-len key value-len value
key-len = 1BYTE ; length of the key string
value-len = 1BYTE ; length of the value string
```

A zero-length key (`key-len == 0`) indicates the end of the `tracestate`. So
when `tracestate` is serialized into a buffer that is longer than it
requires, `{ 0, 0 }` (field id `0` and key-len `0`) indicates the end of
the `tracestate`.
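
A minimal Python sketch of this encoding, including the padding case, is shown below. The function name `serialize_tracestate` and the `buffer_size` parameter are assumptions for illustration. Padding uses zero bytes, so a padded buffer naturally continues with field id `0` and key-len `0`.

```python
def serialize_tracestate(entries, buffer_size=None) -> bytes:
    """Encode (key, value) string pairs; optionally pad to buffer_size."""
    if len(entries) > 32:
        raise ValueError("tracestate holds at most 32 list members")
    out = bytearray()
    for key, value in entries:
        k = key.encode("ascii")
        v = value.encode("ascii")
        # A zero-length key would be read as the end marker, so reject it.
        if not 1 <= len(k) <= 255 or len(v) > 255:
            raise ValueError("key or value length does not fit the format")
        out += bytes([0, len(k)]) + k   # field id 0, key-len, key
        out += bytes([len(v)]) + v      # value-len, value
    if buffer_size is not None:
        if buffer_size < len(out):
            raise ValueError("buffer_size is smaller than the encoded value")
        out += bytes(buffer_size - len(out))  # zero padding ends the list
    return bytes(out)
```

For instance, `serialize_tracestate([("foo", "1")], buffer_size=8)` yields the list member followed by two zero bytes, which a reader interprets as field id `0` with key-len `0`.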

## `tracestate` example

``` js
{ 0, 3, 102, 111, 111, 16, 51, 52, 102, 48, 54, 55, 97, 97, 48, 98, 97, 57, 48, 50, 98, 55,
0, 3, 98, 97, 114, 4, 48, 46, 50, 53 }
```

This corresponds to 2 tracestate entries:

`foo=34f067aa0ba902b7,bar=0.25`
25 changes: 25 additions & 0 deletions spec/21-binary-format-rationale.md
@@ -0,0 +1,25 @@
# Rationale for decision on binary format

The binary format is similar to proto encoding, without any dependency on
the protobuf project. It uses field identifiers, encoded as bytes, in front
of field values.

## Field identifiers

The protocol uses field identifiers for fields like `trace-id`, `parent-id`,
`trace-flags` and tracestate entries. The purpose of the field
identifiers is two-fold. First, they allow existing fields to be removed
or new ones to be added going forward. Second, they provide an additional
layer of validation of the format.

## How can we add new fields

If we follow the rule that new ids are always appended at the end of the
buffer, we can add up to 127. After that we can either use varint encoding or
just reserve 255 as a continuation byte. The assumption at the moment is
that the specification will never get to this point.

## Why custom binary protocol

We did not find a non-proprietary, widely used binary protocol that could be
used in this specification.
86 changes: 86 additions & 0 deletions spec/31-parsing-algoritm.md
@@ -0,0 +1,86 @@
# De-serialization algorithms

This non-normative section describes de-serialization algorithms
that may be used to parse `traceparent` and `tracestate` field values.

## De-serialization of `traceparent`

Let's assume the algorithm takes a buffer - a byte array - and can set
and shift a cursor in the buffer, as well as check whether the end of
the buffer was reached or will be reached after reading a given number
of bytes. This algorithm can work on a stream of bytes. De-serialization
of `traceparent` MAY be done in the following sequence:

1. If the buffer is empty - RETURN invalid status `BUFFER_EMPTY`. Otherwise
   set the cursor to the first byte.
2. Read the `version` byte at the cursor position. Shift the cursor by `1` byte.
3. If at the end of the buffer - RETURN invalid status `TRACEPARENT_INCOMPLETE`.
4. **Parse `trace-id`**. Read the field identifier byte at the cursor
   position. If NOT `0` - go to step `8. Report invalid field`.
   Otherwise - shift the cursor by `1` byte and check that the remaining
   buffer size is greater than or equal to `16` bytes. If shorter - RETURN
   invalid status `TRACE_ID_TOO_SHORT`. Otherwise read the next `16` bytes
   for `trace-id` and shift the cursor to the end of those `16` bytes.
5. **Parse `parent-id`**. Read the field identifier byte at the cursor
   position. If NOT `1` - go to step `8. Report invalid field`.
   Otherwise - shift the cursor by `1` byte and check that the remaining
   buffer size is greater than or equal to `8` bytes. If shorter - RETURN
   invalid status `PARENT_ID_TOO_SHORT`. Otherwise read the next `8` bytes
   for `parent-id` and shift the cursor to the end of those `8` bytes.
6. **Parse `trace-flags`**. Read the field identifier byte at the cursor
   position. If NOT `2` - go to step `8. Report invalid field`.
   Otherwise - shift the cursor by `1` byte and check the remaining size of
   the buffer. If at the end of the buffer - RETURN invalid status.
   Otherwise - read the `trace-flags` byte. The least significant bit
   represents the `recorded` value.
7. RETURN status `OK` if `version` is `0` or status `DOWNGRADED_TO_ZERO`
   otherwise.
8. **Report invalid field**. If `version` is `0` - RETURN invalid status
   `INVALID_FIELD_ID`. If `version` has any other value - RETURN invalid
   status `INCOMPATIBLE_VERSION`.

_Note_ that the invalid status names are given for readability and are not
part of the specification.

_Note_ that parsing should not treat any additional bytes at the end of the
buffer as an invalid status. Those bytes can be added for padding purposes.
Optionally, an implementation can check that the buffer is at least `29` bytes
long as a very first step if this check is not expensive.
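
The `traceparent` steps above can be sketched in Python as follows. The function name `parse_traceparent`, the result dictionary, and the status name `TRACE_FLAGS_MISSING` (the step for `trace-flags` does not name its status) are assumptions for illustration. The sketch also assumes that a buffer ending right before a field identifier is reported with that field's "too short" status, a case the steps do not spell out.

```python
# (field id, name, size in bytes, status when the buffer is too short)
FIELDS = (
    (0, "trace-id", 16, "TRACE_ID_TOO_SHORT"),
    (1, "parent-id", 8, "PARENT_ID_TOO_SHORT"),
    (2, "trace-flags", 1, "TRACE_FLAGS_MISSING"),  # hypothetical status name
)


def parse_traceparent(buf: bytes):
    """Return (status, fields); fields is None for every invalid status."""
    if len(buf) == 0:
        return "BUFFER_EMPTY", None
    version, pos = buf[0], 1
    if pos == len(buf):
        return "TRACEPARENT_INCOMPLETE", None

    parsed = {"version": version}
    for field_id, name, size, short_status in FIELDS:
        if pos == len(buf):              # assumption: no identifier byte left
            return short_status, None
        if buf[pos] != field_id:         # step 8, report invalid field
            status = "INVALID_FIELD_ID" if version == 0 else "INCOMPATIBLE_VERSION"
            return status, None
        pos += 1
        if len(buf) - pos < size:
            return short_status, None
        parsed[name] = bytes(buf[pos:pos + size])
        pos += size

    # Bytes after trace-flags are ignored, they may be padding.
    parsed["recorded"] = bool(parsed["trace-flags"][0] & 1)
    return ("OK" if version == 0 else "DOWNGRADED_TO_ZERO"), parsed
```

The sketch follows the listed steps only, so it does not check for an all-zero `trace-id` or `parent-id`. A caller that needs that validation would add it after a successful parse.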

## De-serialization of `tracestate`

Let's assume the algorithm takes a buffer - a byte array - and can set
and shift a cursor in the buffer, as well as check whether the end of
the buffer was reached or will be reached after reading a given number
of bytes. The algorithm also uses the `version` value parsed from
`traceparent`. If `version` was not given - value `0` SHOULD be used. This
algorithm can work on a stream of bytes. De-serialization of `tracestate`
MAY be done in the following sequence:

1. If at the end of the buffer - RETURN status `OK`. Otherwise set the
   cursor to the first byte.
2. **Parse `list-member` field identifier**. Read the field identifier
   byte at the cursor position and shift the cursor by `1` byte. If NOT `0`
   and `version` is `0` - RETURN invalid status `INVALID_FIELD_ID`. If NOT
   `0` and `version` has any other value - RETURN invalid status
   `INCOMPATIBLE_VERSION`.
3. **Parse key**.
   1. If at the end of the buffer - RETURN status `OK`. This situation
      indicates that the `tracestate` value was padded with `0`.
   2. Read the `key-len` byte. Shift the cursor by `1` byte. If the value of
      `key-len` is `0` - RETURN status `OK`. This situation indicates an
      explicit end of the `tracestate`.
   3. Check that the buffer has `key-len` more bytes. If not - RETURN
      `KEY_TOO_SHORT`.
   4. Read `key-len` bytes as `key`. Shift the cursor by `key-len` bytes.
4. **Parse value**.
   1. If at the end of the buffer - RETURN status `INCOMPLETE_LIST_MEMBER`.
   2. Read the `value-len` byte. Shift the cursor by `1` byte. If the value of
      `value-len` is `0` - add `list-member` with the `key` and empty
      `value` to the `tracestate` list. RETURN status `OK`.
   3. Check that the buffer has `value-len` more bytes. If not - RETURN
      `VALUE_TOO_SHORT`.
   4. Read `value-len` bytes as `value`. Shift the cursor by `value-len`
      bytes.
   5. Add `list-member` with the `key` and `value` to the `tracestate`
      list.
5. Go to step `2. Parse list-member field identifier`.
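
The `tracestate` steps above can be sketched in Python as follows. The function name `parse_tracestate` and the `(status, entries)` return shape are assumptions for illustration. The sketch also assumes that reaching the end of the buffer right after a complete list member means `OK`, so that an unpadded value parses cleanly. Non-ASCII bytes would raise `UnicodeDecodeError`, which the sketch does not handle.

```python
def parse_tracestate(buf: bytes, version: int = 0):
    """Return (status, entries) where entries is a list of (key, value)."""
    entries = []
    pos = 0
    while True:
        # Step 1, and an assumption: end of buffer after a full member is OK.
        if pos == len(buf):
            return "OK", entries
        # Step 2: list-member field identifier.
        if buf[pos] != 0:
            status = "INVALID_FIELD_ID" if version == 0 else "INCOMPATIBLE_VERSION"
            return status, entries
        pos += 1
        # Step 3: key.
        if pos == len(buf):
            return "OK", entries          # value was padded with a single 0
        key_len = buf[pos]
        pos += 1
        if key_len == 0:
            return "OK", entries          # explicit end of the tracestate
        if len(buf) - pos < key_len:
            return "KEY_TOO_SHORT", entries
        key = bytes(buf[pos:pos + key_len]).decode("ascii")
        pos += key_len
        # Step 4: value.
        if pos == len(buf):
            return "INCOMPLETE_LIST_MEMBER", entries
        value_len = buf[pos]
        pos += 1
        if value_len == 0:
            entries.append((key, ""))
            return "OK", entries          # as written in step 4.2
        if len(buf) - pos < value_len:
            return "VALUE_TOO_SHORT", entries
        value = bytes(buf[pos:pos + value_len]).decode("ascii")
        pos += value_len
        entries.append((key, value))
```

Entries parsed before an error are still returned together with the invalid status. Whether a caller keeps or discards them is a design choice left open here.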