diff --git a/index.html b/index.html index c15e23e..909c6c0 100644 --- a/index.html +++ b/index.html @@ -54,6 +54,7 @@
-
+
+
\ No newline at end of file diff --git a/spec/20-BINARY_FORMAT.md b/spec/20-BINARY_FORMAT.md deleted file mode 100644 index 9e1e4f6..0000000 --- a/spec/20-BINARY_FORMAT.md +++ /dev/null @@ -1,3 +0,0 @@ -# Binary format - -TBD \ No newline at end of file diff --git a/spec/20-binary-format.md b/spec/20-binary-format.md new file mode 100644 index 0000000..01462f3 --- /dev/null +++ b/spec/20-binary-format.md @@ -0,0 +1,106 @@ +# Binary format + +This document describes how to encode each field - `traceparent` and +`tracestate` - in binary form. The binary format should be used to encode the values of these +fields. This specification does not define how these fields should be stored +and sent as a part of a binary payload. A basic implementation may serialize +each field as the size of the field followed by the value. + +The specification operates on bytes - unsigned 8-bit integer values +representing values from `0` to `255`. The representation of a byte as a set of +bits (big or little endian) MUST be defined by the underlying platform and +is out of scope of this specification. + +## `Traceparent` binary format + +The field `traceparent` encodes the version of the protocol and the fields +`trace-id`, `parent-id` and `trace-flags`. Each field starts with a one-byte +field identifier, with the field value following immediately after it. Field +identifiers are used as markers for additional verification of the value +consistency and may be used in the future for versioning of the `traceparent` +field. + +``` abnf +traceparent = version version_format +version = 1BYTE ; version is 0 in the current spec +version_format = "{ 0x0 }" trace-id "{ 0x1 }" parent-id "{ 0x2 }" trace-flags +trace-id = 16BYTES +parent-id = 8BYTES +trace-flags = 1BYTE ; only the least significant bit is used +``` + +An unknown field identifier (anything other than `0`, `1` and `2`) should be treated as +an invalid `traceparent`. An all-zero `trace-id` or an all-zero `parent-id` invalidates the +`traceparent` as well. 
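The `traceparent` layout in the ABNF above can be sketched in Python as follows. This is a minimal illustration, not part of the specification: the function name `encode_traceparent` and its parameters are hypothetical, and the all-zero check reflects how this sketch reads the note above (either identifier being all zeroes is rejected). The sample values are the hex identifiers used in the `traceparent` example of this document.

```python
def encode_traceparent(trace_id: bytes, parent_id: bytes,
                       trace_flags: int, version: int = 0) -> bytes:
    """Serialize traceparent as: version, {0} trace-id, {1} parent-id, {2} trace-flags.

    Hypothetical helper for illustration; names are not defined by the spec.
    """
    if len(trace_id) != 16:
        raise ValueError("trace-id must be exactly 16 bytes")
    if len(parent_id) != 8:
        raise ValueError("parent-id must be exactly 8 bytes")
    # Assumption of this sketch: an all-zero trace-id or parent-id is invalid.
    if not any(trace_id) or not any(parent_id):
        raise ValueError("trace-id and parent-id must not be all zeroes")
    if not 0 <= trace_flags <= 255 or not 0 <= version <= 255:
        raise ValueError("version and trace-flags must each fit in one byte")

    return (
        bytes([version])
        + bytes([0]) + trace_id       # field identifier 0, then 16 raw bytes
        + bytes([1]) + parent_id      # field identifier 1, then 8 raw bytes
        + bytes([2, trace_flags])     # field identifier 2, then the flags byte
    )


encoded = encode_traceparent(
    bytes.fromhex("4bf92f3577b34da6a3ce929d000e4736"),
    bytes.fromhex("34f067aa0ba902b7"),
    trace_flags=1,
)
print(list(encoded))
```

The identifiers are passed as `bytes` rather than integers so that the sketch never has to pick an endianness for them.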
+ +## Serialization of `traceparent` + +Implementations MUST serialize fields in the defined field order. +In other words, the `trace-id` field should be serialized first, `parent-id` +second and `trace-flags` third. + +Field identifiers should be treated as unsigned byte numbers and should be +encoded in big-endian bit order. + +Fields `trace-id` and `parent-id` are defined as byte arrays, NOT +long numbers. The first element of an array MUST be copied first. When an array is +represented as a memory block of 16 bytes - serialization of `trace-id` +would be identical to a `memcpy` call on that memory block. This +may be a concern for implementations casting these fields to integers - +the protocol does NOT define whether those byte arrays are ordered as big +endian or little endian, or whether they have a sign bit. + +If padding of the field is required (`traceparent` needs to be serialized into +a bigger buffer) - any number of bytes can be appended to the end of the +serialized value. + +## `traceparent` example + +``` js +{0, + 0, 75, 249, 47, 53, 119, 179, 77, 166, 163, 206, 146, 157, 0, 14, 71, 54, + 1, 52, 240, 103, 170, 11, 169, 2, 183, + 2, 1} +``` + +This corresponds to: + +- `trace-id` is + `{75, 249, 47, 53, 119, 179, 77, 166, 163, 206, 146, 157, 0, 14, 71, 54}` or + `4bf92f3577b34da6a3ce929d000e4736`. +- `parent-id` is `{52, 240, 103, 170, 11, 169, 2, 183}` or `34f067aa0ba902b7`. +- `trace-flags` is `1`, meaning `recorded` is true. + +## `tracestate` binary format + +A list of up to 32 name-value pairs. Each list member starts with the 1 byte field +identifier `0`. The format of a list member is a single byte key length followed +by the key value and a single byte value length followed by the encoded +value. Note, a single byte length field allows keys and values up to 255 +bytes long. The length limits for keys and values are defined by the [trace +context](https://w3c.github.io/trace-context/#header-value) +specification. Strings are transmitted in ASCII encoding. 
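The list-member layout described in the paragraph above can be sketched in Python as follows. This is only an illustration under stated assumptions: `encode_tracestate` is a hypothetical name, keys are required to be non-empty because a zero `key-len` is used elsewhere in this document as an end marker, and lengths are capped at what a single byte can express.

```python
from typing import Sequence, Tuple


def encode_tracestate(entries: Sequence[Tuple[str, str]]) -> bytes:
    """Serialize (key, value) pairs as: {0} key-len key value-len value.

    Hypothetical helper for illustration; names are not defined by the spec.
    """
    if len(entries) > 32:
        raise ValueError("tracestate holds at most 32 list members")

    out = bytearray()
    for key, value in entries:
        key_bytes = key.encode("ascii")      # strings are transmitted as ASCII
        value_bytes = value.encode("ascii")
        # Assumption: an empty key is not allowed, since key-len 0 marks the end.
        if not 1 <= len(key_bytes) <= 255:
            raise ValueError("key length must be between 1 and 255 bytes")
        if len(value_bytes) > 255:
            raise ValueError("value length must not exceed 255 bytes")
        out.append(0)                        # list-member field identifier
        out.append(len(key_bytes))           # key-len
        out += key_bytes
        out.append(len(value_bytes))         # value-len
        out += value_bytes
    return bytes(out)


encoded_state = encode_tracestate([("foo", "34f067aa0ba902b7"), ("bar", "0.25")])
print(list(encoded_state))
```

Encoding with `"ascii"` makes non-ASCII input fail loudly with `UnicodeEncodeError` instead of silently producing multi-byte sequences.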
+ +``` abnf +tracestate = list-member 0*31( list-member ) +list-member = "0" key-len key value-len value +key-len = 1BYTE ; length of the key string +value-len = 1BYTE ; length of the value string +``` + +A zero length key (`key-len == 0`) indicates the end of the `tracestate`. So when +`tracestate` is serialized into a buffer that is longer than it +requires - `{ 0, 0 }` (field id `0` and key-len `0`) will indicate the end of +the `tracestate`. + +## `tracestate` example + +``` js +{ 0, 3, 102, 111, 111, 16, 51, 52, 102, 48, 54, 55, 97, 97, 48, 98, 97, 57, 48, 50, 98, 55, + 0, 3, 98, 97, 114, 4, 48, 46, 50, 53, } + +``` + +This corresponds to 2 tracestate entries: + +`foo=34f067aa0ba902b7,bar=0.25` diff --git a/spec/21-binary-format-rationale.md b/spec/21-binary-format-rationale.md new file mode 100644 index 0000000..a612360 --- /dev/null +++ b/spec/21-binary-format-rationale.md @@ -0,0 +1,25 @@ +# Rationale for decision on binary format + +The binary format is similar to proto encoding without any reference to the +protobuf project. It uses one-byte field identifiers in front of field +values. + +## Field identifiers + +The protocol uses field identifiers for fields like `trace-id`, `parent-id`, +`trace-flags` and tracestate entries. The purpose of the field +identifiers is two-fold. First, they allow existing fields to be removed or +new ones to be added going forward. Second, they provide an additional layer of +validation of the format. + +## How can we add new fields + +If we follow the rule that we always append the new ids at the end of the +buffer we can add up to 127. After that we can either use varint encoding or +just reserve 255 as a continuation byte. The assumption at the moment is +that the specification will never get to this point. + +## Why custom binary protocol + +We didn't find a non-proprietary, widely used binary protocol that can be +used in this specification. 
diff --git a/spec/31-parsing-algoritm.md b/spec/31-parsing-algoritm.md new file mode 100644 index 0000000..2c962f1 --- /dev/null +++ b/spec/31-parsing-algoritm.md @@ -0,0 +1,86 @@ +# De-serialization algorithms + +This is a non-normative section that describes de-serialization algorithms +that may be used to parse `traceparent` and `tracestate` field values. + +## De-serialization of `traceparent` + +Let's assume the algorithm takes a buffer - a byte array - and can set +and shift a cursor in the buffer as well as validate whether the end of +the buffer was reached or will be reached after reading the given number +of bytes. This algorithm can work on a stream of bytes. De-serialization +of `traceparent` MAY be done in the following sequence: + +1. If the buffer is empty - RETURN invalid status `BUFFER_EMPTY`. Otherwise set a cursor to + the first byte. +2. Read the `version` byte at the cursor position. Shift the cursor by `1` byte. +3. If at the end of the buffer - RETURN invalid status `TRACEPARENT_INCOMPLETE`. +4. **Parse `trace-id`**. Read the field identifier byte at the cursor + position. If NOT `0` - go to step `8. Report invalid field`. + Otherwise - check that the remaining buffer size is greater than or equal to `16` + bytes. If shorter - RETURN invalid status `TRACE_ID_TOO_SHORT`. + Otherwise read the next `16` bytes for `trace-id` and shift the cursor to + the end of those `16` bytes. +5. **Parse `parent-id`**. Read the field identifier byte at the cursor + position. If NOT `1` - go to step `8. Report invalid field`. + Otherwise - check that the remaining buffer size is greater than or equal to `8` + bytes. If shorter - RETURN invalid status `PARENT_ID_TOO_SHORT`. + Otherwise read the next `8` bytes for `parent-id` and shift the cursor + to the end of those `8` bytes. +6. **Parse `trace-flags`**. Read the field identifier byte at the cursor + position. If NOT `2` - go to step `8. Report invalid field`. + Otherwise - check the remaining size of the buffer. If at the end of + the buffer - RETURN invalid status. 
Otherwise - read the + `trace-flags` byte. The least significant bit represents the `recorded` + value. +7. RETURN status `OK` if `version` is `0` or status `DOWNGRADED_TO_ZERO` + otherwise. +8. **Report invalid field**. If `version` is `0` RETURN invalid status + `INVALID_FIELD_ID`. If `version` has any other value - RETURN invalid status + `INCOMPATIBLE_VERSION`. + +_Note_ that invalid status names are given for readability and are not part of the +specification. + +_Note_ that parsing should not treat any additional bytes at the end of the +buffer as an invalid status. Those bytes can be added for padding purposes. +Optionally, an implementation can check that the buffer is at least `29` bytes long as +a very first step if this check is not expensive. + +## De-serialization of `tracestate` + +Let's assume the algorithm takes a buffer - a byte array - and can set +and shift a cursor in the buffer as well as validate whether the end of +the buffer was reached or will be reached after reading the given number +of bytes. The algorithm also uses the `version` value parsed from `traceparent`. +If `version` was not given - value `0` SHOULD be used. This algorithm +can work on a stream of bytes. De-serialization of `tracestate` MAY be +done in the following sequence: + +1. If at the end of the buffer - RETURN status `OK`. Otherwise set a + cursor to the first byte. +2. **Parse `list-member` field identifier**. Read the field identifier + byte at the cursor position and shift the cursor by `1` byte. If NOT `0` + and `version` is `0` RETURN invalid status `INVALID_FIELD_ID`. If NOT + `0` and `version` has any other value - RETURN invalid status `INCOMPATIBLE_VERSION`. +3. **Parse key**. + 1. If at the end of the buffer - RETURN status `OK`. This situation + indicates that the `tracestate` value was padded with `0`. + 2. Read the `key-len` byte. Shift the cursor by `1` byte. If the value of + `key-len` is `0` - RETURN status `OK`. This situation indicates an + explicit end of the `tracestate`. + 3. Check that the buffer has `key-len` more bytes. 
If not - RETURN + `KEY_TOO_SHORT`. + 4. Read `key-len` bytes as `key`. Shift cursor to `key-len` bytes. +4. **Parse value**. + 1. If at the end of the buffer - RETURN status `INCOMPLETE_LIST_MEMBER`. + 2. Read the `value-len` byte. Shift cursor to `1` byte. If the value of + `value-len` is `0` - add `list-member` with the `key` and empty + `value` to the `tracestate` list. RETURN status `OK`. + 3. Check that buffer has `value-len` more bytes. If not - RETURN + `VALUE_TOO_SHORT`. + 4. Read `value-len` bytes as `value`. Shift cursor to `value-len` + bytes. + 5. Add `list-member` with the `key` and `value` to the `tracestate` + list. +5. Go to step `2. Parse list-member field identifier`.
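The `tracestate` steps above can be sketched in Python roughly as follows. This is a non-authoritative illustration: `parse_tracestate` is a hypothetical name, status names are returned as plain strings, and keys and values are kept as raw `bytes`. One assumption goes beyond the written steps: when the buffer is exhausted right after a complete list member, the sketch returns `OK` instead of trying to read another field identifier.

```python
from typing import List, Tuple


def parse_tracestate(buf: bytes, version: int = 0) -> Tuple[str, List[Tuple[bytes, bytes]]]:
    """Return (status, entries) following the numbered steps above (hypothetical helper)."""
    entries: List[Tuple[bytes, bytes]] = []
    cursor = 0
    if cursor >= len(buf):                       # step 1: empty buffer
        return "OK", entries
    while True:
        field_id = buf[cursor]                   # step 2: list-member field identifier
        cursor += 1
        if field_id != 0:
            status = "INVALID_FIELD_ID" if version == 0 else "INCOMPATIBLE_VERSION"
            return status, entries
        if cursor >= len(buf):                   # step 3.1: padded with a single 0
            return "OK", entries
        key_len = buf[cursor]                    # step 3.2
        cursor += 1
        if key_len == 0:
            return "OK", entries
        if cursor + key_len > len(buf):          # step 3.3
            return "KEY_TOO_SHORT", entries
        key = buf[cursor:cursor + key_len]       # step 3.4
        cursor += key_len
        if cursor >= len(buf):                   # step 4.1
            return "INCOMPLETE_LIST_MEMBER", entries
        value_len = buf[cursor]                  # step 4.2
        cursor += 1
        if value_len == 0:
            # As written in step 4.2, an empty value ends parsing with OK.
            entries.append((key, b""))
            return "OK", entries
        if cursor + value_len > len(buf):        # step 4.3
            return "VALUE_TOO_SHORT", entries
        value = buf[cursor:cursor + value_len]   # step 4.4
        cursor += value_len
        entries.append((key, value))             # step 4.5
        # Assumption of this sketch: nothing left after a full member means OK.
        if cursor >= len(buf):
            return "OK", entries
        # step 5: loop back to step 2


EXAMPLE = bytes([0, 3, 102, 111, 111, 16, 51, 52, 102, 48, 54, 55, 97, 97,
                 48, 98, 97, 57, 48, 50, 98, 55,
                 0, 3, 98, 97, 114, 4, 48, 46, 50, 53])
status, members = parse_tracestate(EXAMPLE)
print(status, members)
```

Returning the entries collected so far together with an error status is a design choice of this sketch; a stricter implementation might discard them.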