Refine Metrics Protobuf definitions
This change applies the refinement approach already used for the Traces
Protobuf definitions as part of open-telemetry/oteps#59, which yielded
significant performance improvements there.

Notable changes are:
- Replace google.protobuf.Timestamp with an int64 time in Unix epoch nanoseconds.
- Eliminate unnecessary messages and move their fields into the containing message.
- Replace oneof with a set of regular fields.

A simple benchmark in Go shows the following improvement over OpenCensus
Metrics encoding and decoding:

```
BenchmarkEncode/OpenCensus/Metrics-8         	       2	 645252504 ns/op
BenchmarkEncode/OTLP/Metrics-8               	       4	 288457433 ns/op
BenchmarkDecode/OpenCensus/Metrics-8                   1	1154650804 ns/op
BenchmarkDecode/OTLP/Metrics-8                         3	 475370617 ns/op
```

Encoding is about 2.2 times faster and decoding about 2.4 times faster.
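The ratios follow directly from the ns/op column; a trivial Go check of the arithmetic (the `speedup` helper is just an illustrative name):

```go
package main

import "fmt"

// speedup returns how many times faster the new run is, given the ns/op
// figures that `go test -bench` reports for the old and new encodings.
func speedup(oldNsPerOp, newNsPerOp float64) float64 {
	return oldNsPerOp / newNsPerOp
}

func main() {
	// ns/op values taken from the benchmark output in this commit message.
	encode := speedup(645252504, 288457433)
	decode := speedup(1154650804, 475370617)
	fmt.Printf("encode: %.2fx, decode: %.2fx\n", encode, decode) // encode: 2.24x, decode: 2.43x
}
```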

The benchmarks encode and decode 500 batches of 2 metrics: one int64 Gauge with 5 time series,
and one Histogram of doubles with 1 time series and a single bucket. Each time series of
both metrics contains 5 data points, and both metrics have 2 labels.
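The workload shape can be made concrete with a small Go sketch. The structs below are hypothetical, simplified stand-ins (not the generated OTLP or OpenCensus types, and not the linked benchmark code); the sketch only builds the batches and counts the data points, 500 batches of 30 points each:

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the generated types; only the
// fields needed to describe the workload shape are modelled.
type Point struct {
	TimestampUnixNano int64
	Int64Value        int64   // used by the gauge
	BucketCounts      []int64 // used by the histogram; one count per bucket
}

type TimeSeries struct {
	LabelValues []string
	Points      []Point
}

type Metric struct {
	Name       string
	LabelKeys  []string
	TimeSeries []TimeSeries
}

// makeMetric builds a metric with 2 labels, the given number of time series,
// and pointsPerSeries points per series. buckets == 0 means a gauge.
func makeMetric(name string, series, pointsPerSeries, buckets int) Metric {
	m := Metric{Name: name, LabelKeys: []string{"key1", "key2"}}
	for s := 0; s < series; s++ {
		ts := TimeSeries{LabelValues: []string{"value1", fmt.Sprint(s)}}
		for p := 0; p < pointsPerSeries; p++ {
			pt := Point{TimestampUnixNano: int64(p) * 1000000000}
			if buckets > 0 {
				pt.BucketCounts = make([]int64, buckets)
			} else {
				pt.Int64Value = int64(p)
			}
			ts.Points = append(ts.Points, pt)
		}
		m.TimeSeries = append(m.TimeSeries, ts)
	}
	return m
}

// makeWorkload returns the given number of batches, each holding one int64
// gauge (5 series) and one single-bucket histogram (1 series).
func makeWorkload(batches int) [][]Metric {
	out := make([][]Metric, 0, batches)
	for b := 0; b < batches; b++ {
		out = append(out, []Metric{
			makeMetric("gauge_int64", 5, 5, 0),
			makeMetric("histogram_double", 1, 5, 1),
		})
	}
	return out
}

// countPoints totals the data points across all batches.
func countPoints(batches [][]Metric) int {
	n := 0
	for _, batch := range batches {
		for _, m := range batch {
			for _, ts := range m.TimeSeries {
				n += len(ts.Points)
			}
		}
	}
	return n
}

func main() {
	w := makeWorkload(500)
	fmt.Println(len(w), countPoints(w)) // 500 15000
}
```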

Benchmark source code is available at:
https://github.com/tigrannajaryan/exp-otelproto/blob/master/encodings/encoding_test.go
Tigran Najaryan committed Oct 24, 2019
1 parent 7b88843 commit 35e109f
Showing 1 changed file with 138 additions and 119 deletions.
opentelemetry/proto/metrics/v1/metrics.proto: 257 changes (138 additions, 119 deletions)
@@ -16,8 +16,6 @@ syntax = "proto3";

package opentelemetry.proto.metrics.v1;

import "google/protobuf/timestamp.proto";
import "google/protobuf/wrappers.proto";
import "opentelemetry/proto/resource/v1/resource.proto";

option java_multiple_files = true;
@@ -26,31 +24,32 @@ option java_outer_classname = "MetricsProto";

// Defines a Metric which has one or more timeseries.
message Metric {
// The descriptor of the Metric.
// metric_descriptor describes the Metric.
MetricDescriptor metric_descriptor = 1;

// One or more timeseries for a single metric, where each timeseries has
// timeseries is one or more TimeSeries for a single metric, where each timeseries has
// one or more points.
repeated TimeSeries timeseries = 2;

// The resource for the metric. If unset, it may be set to a default value
// provided for a sequence of messages in an RPC stream.
// resource that is associated with this metric. Optional. If not set, this metric
// should be part of a ResourceMetrics message that does include the resource
// information, unless resource information is unknown.
opentelemetry.proto.resource.v1.Resource resource = 3;
}

// Defines a metric type and its schema.
message MetricDescriptor {
// The metric type, including its DNS name prefix. It must be unique.
// name of the metric, including its DNS name prefix. It must be unique.
string name = 1;

// A detailed description of the metric, which can be used in documentation.
// description of the metric, which can be used in documentation.
string description = 2;

// The unit in which the metric value is reported. Follows the format
// unit in which the metric value is reported. Follows the format
// described by http://unitsofmeasure.org/ucum.html.
string unit = 3;

// The kind of metric. It describes how the data is reported.
// Type of metric. It describes how the data is reported.
//
// A gauge is an instantaneous measurement of a value.
//
@@ -63,36 +62,44 @@ message MetricDescriptor {
// Do not use this default value.
UNSPECIFIED = 0;

// Integer gauge. The value can go both up and down.
// Integer gauge. The value can go both up and down over time.
// Corresponding value is stored in Point.int64_value.
GAUGE_INT64 = 1;

// Floating point gauge. The value can go both up and down.
// Floating point gauge. The value can go both up and down over time.
// Corresponding value is stored in Point.double_value.
GAUGE_DOUBLE = 2;

// Histogram gauge measurement. The count and sum can go both up and
// down. Recorded values are always >= 0.
// Histogram gauge measurement.
// Used in scenarios like a snapshot of time the current items in a queue
// have spent there.
// Corresponding value is stored in Point.histogram_value. The count and sum of the
// histogram can go both up and down over time. Recorded values are always >= 0.
GAUGE_HISTOGRAM = 3;

// Integer counter measurement. The value cannot decrease, if resets
// then the start_time should also be reset.
// Integer counter measurement. The value cannot decrease; if value is reset then
// Point.start_time_unixnano should also be reset.
// Corresponding value is stored in Point.int64_value.
COUNTER_INT64 = 4;

// Floating point counter measurement. The value cannot decrease, if
// resets then the start_time should also be reset. Recorded values are
// always >= 0.
// Corresponding value is stored in Point.double_value.
COUNTER_DOUBLE = 5;

// Histogram cumulative measurement. The count and sum cannot decrease,
// if resets then the start_time should also be reset.
// Histogram cumulative measurement.
// Corresponding value is stored in Point.histogram_value. The count and sum of the
// histogram cannot decrease; if values are reset then Point.start_time_unixnano
// should also be reset.
CUMULATIVE_HISTOGRAM = 6;

// Some frameworks implemented Histograms as a summary of observations
// Summary value. Some frameworks implemented Histograms as a summary of observations
// (usually things like request durations and response sizes). While it
// also provides a total count of observations and a sum of all observed
// values, it calculates configurable percentiles over a sliding time
// window. This is not recommended, since it cannot be aggregated.
// window.
// Corresponding value is stored in Point.summary_value.
SUMMARY = 7;
}
Type type = 4;
@@ -101,7 +108,7 @@ message MetricDescriptor {
repeated string label_keys = 5;
}

// A collection of data points that describes the time-varying values
// TimeSeries is a collection of data points that describes the time-varying values
// of a metric.
message TimeSeries {
// The set of label values that uniquely identify this timeseries. Applies to
@@ -110,87 +117,104 @@ message TimeSeries {
repeated LabelValue label_values = 1;

// The data points of this timeseries. Point.value type MUST match the
// MetricDescriptor.type.
// MetricDescriptor.type (see matching types in MetricDescriptor.Type comments).
repeated Point points = 2;
}

// LabelValue is a value of a label.
message LabelValue {
// The value for the label.
string value = 1;
// If false the value field is ignored and considered not set.

// If true the value field is ignored and considered not set.
// This is used to differentiate a missing label from an empty string.
bool has_value = 2;
bool value_unspecified = 2;
}

// A timestamped measurement.
// Point is a timestamped measurement.
message Point {
// Must be present for counter/cumulative metrics. The time when the
// cumulative value was reset to zero. The cumulative value is over the time
// interval (start_timestamp, timestamp]. If not specified, the backend can
// use the previous recorded value.
google.protobuf.Timestamp start_timestamp = 1;
// start_time_unixnano is the time when the cumulative value was reset to zero.
// Must be present for counter/cumulative metrics. The cumulative value is over the time
// interval (start_time_unixnano, timestamp_unixnano].
// Value is UNIX Epoch time in nanoseconds since 00:00:00 UTC on 1 January 1970.
int64 start_time_unixnano = 1;

// The moment when this point was recorded.
// If not specified, the timestamp will be decided by the backend.
google.protobuf.Timestamp timestamp = 2;
// start_time_unspecified must be set to true if start_time_unixnano value is unspecified.
// In that case the backend can use the previous recorded value for start_time_unixnano.
bool start_time_unspecified = 2;

// The actual point value.
oneof value {
// A 64-bit integer.
int64 int64_value = 3;
// timestamp_unixnano is the moment when this point was recorded.
// Value is UNIX Epoch time in nanoseconds since 00:00:00 UTC on 1 January 1970.
int64 timestamp_unixnano = 3;

// A 64-bit double-precision floating-point number.
double double_value = 4;
// timestamp_unspecified must be set to true if timestamp_unixnano value is unspecified.
// In that case timestamp_unixnano value will be decided by the backend.
bool timestamp_unspecified = 4;

// A histogram value.
HistogramValue histogram_value = 5;
// ValueType is the enumeration of possible types that Point's value can have.
enum ValueType {
INT64 = 0;
DOUBLE = 1;
HISTOGRAM = 2;
SUMMARY = 3;
};

// A summary value. This is not recommended, since it cannot be aggregated.
SummaryValue summary_value = 6;
}
// type of the value.
ValueType type = 5;

// Only one of the following fields is supposed to contain data (determined by `type` field value).
// This is deliberately not using Protobuf `oneof` for performance reasons (verified by benchmarks).

// A 64-bit integer.
int64 int64_value = 6;

// A 64-bit double-precision floating-point number.
double double_value = 7;

// A histogram value.
HistogramValue histogram_value = 8;

// A summary value. This is not recommended, since it cannot be aggregated.
SummaryValue summary_value = 9;
}

// Histogram contains summary statistics for a population of values. It may
// optionally contain the distribution of those values across a set of buckets.
message HistogramValue {
// The number of values in the population. Must be non-negative. This value
// must equal the sum of the values in bucket_counts if a histogram is
// count is the number of values in the population. Must be non-negative. This value
// must be equal to the sum of the "count" fields in buckets if a histogram is
// provided.
int64 count = 1;

// The sum of the values in the population. If count is zero then this field
// must be zero.
// sum of the values in the population. If count is zero then this field
// must be zero. This value must be equal to the sum of the "sum" fields in buckets if
// a histogram is provided.
double sum = 2;

// A Histogram may optionally contain the distribution of the values in the
// population. The bucket boundaries are described by BucketOptions.
message BucketOptions {
oneof type {
// Bucket with explicit bounds.
Explicit explicit = 1;
}

// Specifies a set of buckets with arbitrary upper-bounds.
// This defines size(bounds) + 1 (= N) buckets. The boundaries for bucket
// index i are:
//
// [0, bucket_bounds[i]) for i == 0
// [bucket_bounds[i-1], bucket_bounds[i]) for 0 < i < N-1
// [bucket_bounds[i], +infinity) for i == N-1
message Explicit {
// The values must be strictly increasing and > 0.
repeated double bounds = 1;
}

// TODO: If OpenMetrics decides to support (a, b] intervals we should add
// support for these by defining a boolean value here which decides what
// type of intervals to use.
}
// A histogram may optionally contain the distribution of the values in the population.
// If that is the case then "bucket_bounds" and "buckets" fields below both must
// be defined. Otherwise both fields must be omitted in which case the distribution of
// values in the histogram is unknown and only the total count and sum are known.

// bucket_bounds is an optional field. If present specifies buckets with explicitly
// defined bounds. The bucket boundaries are described by this field.
//
// This defines size(bucket_bounds) + 1 (= N) buckets. The boundaries for bucket
// at index i are:
//
// [0, bucket_bounds[i]) for i == 0
// [bucket_bounds[i-1], bucket_bounds[i]) for 0 < i < N-1
// [bucket_bounds[i], +infinity) for i == N-1
// The values in bucket_bounds array must be strictly increasing and > 0.
// Don't change bucket boundaries within a TimeSeries if your backend doesn't
// support this.
BucketOptions bucket_options = 3;
// If bucket_bounds field is unspecified then bucket bounds are not explicitly defined.
// TODO: If OpenMetrics decides to support (a, b] intervals we should add
// support for these by defining a boolean value which decides what type of
// intervals to use.
repeated double bucket_bounds = 3;

// Bucket contains values for a bucket.
message Bucket {
// The number of values in each bucket of the histogram, as described in
// bucket_bounds.
@@ -199,67 +223,62 @@ message HistogramValue {
// Exemplars are example points that may be used to annotate aggregated
// Histogram values. They are metadata that gives information about a
// particular value added to a Histogram bucket.
message Exemplar {
// Value of the exemplar point. It determines which bucket the exemplar
// belongs to.
double value = 1;

// The observation (sampling) time of the above value.
google.protobuf.Timestamp timestamp = 2;
// Value of the exemplar point. It determines which bucket the exemplar belongs to.
double exemplar_value = 2;

// Contextual information about the example value.
map<string, string> attachments = 3;
}
// The observation (sampling) time of the above value.
int64 exemplar_timestamp_unixnano = 3;

// Exemplars are example points that may be used to annotate aggregated
// Histogram values.
Exemplar exemplar = 2;
// timestamp_unspecified must be set to true if timestamp_unixnano value is unspecified.
// In that case timestamp_unixnano value will be decided by the backend.
bool exemplar_timestamp_unspecified = 4;

// exemplar_attachments are contextual information about the example value.
repeated StringKeyValuePair exemplar_attachments = 5;
}

// The sum of the values in the Bucket counts must equal the value in the
// count field of the histogram.
// buckets is an optional field contains the values of histogram for each bucket.
//
// The sum of the values in the buckets "count" field must equal the value in the
// count field of HistogramValue.
//
// The number of elements in buckets array must be by one greater than the
// number of elements in bucket_bounds array.
repeated Bucket buckets = 4;
}

// StringKeyValuePair is a pair of key/value strings.
message StringKeyValuePair {
string key = 1;
string value = 2;
}

// The start_timestamp only applies to the count and sum in the SummaryValue.
message SummaryValue {
// The total number of recorded values since start_time. Optional since
// some systems don't expose this.
google.protobuf.Int64Value count = 1;
int64 count = 1;

// The total sum of recorded values since start_time. Optional since some
// systems don't expose this. If count is zero then this field must be zero.
// This field must be unset if the sum is not available.
google.protobuf.DoubleValue sum = 2;

// The values in this message can be reset at arbitrary unknown times, with
// the requirement that all of them are reset at the same time.
message Snapshot {
// The number of values in the snapshot. Optional since some systems don't
// expose this.
google.protobuf.Int64Value count = 1;

// The sum of values in the snapshot. Optional since some systems don't
// expose this. If count is zero then this field must be zero or not set
// (if not supported).
google.protobuf.DoubleValue sum = 2;

// Represents the value at a given percentile of a distribution.
message ValueAtPercentile {
// The percentile of a distribution. Must be in the interval
// (0.0, 100.0].
double percentile = 1;

// The value at the given percentile of a distribution.
double value = 2;
}

// A list of values at different percentiles of the distribution calculated
// from the current snapshot. The percentiles must be strictly increasing.
repeated ValueAtPercentile percentile_values = 3;
double sum = 2;

// count_and_sum_unspecified must be set to true if count and sum values are unknown or
// unspecified.
bool count_and_sum_unspecified = 3;

// Represents the value at a given percentile of a distribution.
message ValueAtPercentile {
// The percentile of a distribution. Must be in the interval
// (0.0, 100.0].
double percentile = 1;

// The value at the given percentile of a distribution.
double value = 2;
}

// Values calculated over an arbitrary time window.
Snapshot snapshot = 3;
// A list of values at different percentiles of the distribution calculated
// from the current snapshot. The percentiles must be strictly increasing.
repeated ValueAtPercentile percentile_values = 4;
}
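With oneof gone, a consumer selects a Point's value through the new `type` field, and the MetricDescriptor.Type comments fix which field each metric type uses. A minimal Go sketch of that rule follows; all type and constant names are hand-written, hypothetical stand-ins, not protoc-gen-go output:

```go
package main

import "fmt"

// DescriptorType mirrors MetricDescriptor.Type (hand-written stand-in).
type DescriptorType int32

const (
	Unspecified DescriptorType = iota
	GaugeInt64
	GaugeDouble
	GaugeHistogram
	CounterInt64
	CounterDouble
	CumulativeHistogram
	Summary
)

// ValueType mirrors Point.ValueType; INT64 is the proto3 zero value.
type ValueType int32

const (
	ValueInt64 ValueType = iota
	ValueDouble
	ValueHistogram
	ValueSummary
)

// expectedValueType encodes the "Corresponding value is stored in ..."
// comments: which Point field each descriptor type uses.
var expectedValueType = map[DescriptorType]ValueType{
	GaugeInt64:          ValueInt64,
	GaugeDouble:         ValueDouble,
	GaugeHistogram:      ValueHistogram,
	CounterInt64:        ValueInt64,
	CounterDouble:       ValueDouble,
	CumulativeHistogram: ValueHistogram,
	Summary:             ValueSummary,
}

// Point keeps only the discriminator and scalar fields of the new message.
type Point struct {
	Type        ValueType
	Int64Value  int64
	DoubleValue float64
}

// checkPoint verifies that the point's type matches its descriptor, as
// TimeSeries.points requires. UNSPECIFIED has no entry and is rejected.
func checkPoint(d DescriptorType, p Point) error {
	want, ok := expectedValueType[d]
	if !ok {
		return fmt.Errorf("descriptor type %d has no value type", d)
	}
	if p.Type != want {
		return fmt.Errorf("point type %d does not match descriptor type %d", p.Type, d)
	}
	return nil
}

// scalarValue reads the field selected by Type. Without oneof, all value
// fields exist side by side, so Type is the only reliable selector.
func scalarValue(p Point) (float64, error) {
	switch p.Type {
	case ValueInt64:
		return float64(p.Int64Value), nil
	case ValueDouble:
		return p.DoubleValue, nil
	default:
		return 0, fmt.Errorf("point type %d is not a scalar", p.Type)
	}
}

func main() {
	p := Point{Type: ValueDouble, DoubleValue: 2.5}
	if err := checkPoint(CounterDouble, p); err != nil {
		fmt.Println(err)
		return
	}
	v, _ := scalarValue(p)
	fmt.Println(v) // 2.5
}
```

Because INT64 is the enum's zero value, a Point whose `type` was never set reads as an int64 point. The same lack of presence information for proto3 scalars is why the new schema carries explicit flags such as `timestamp_unspecified` instead of treating 0 as "not set".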

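The explicit-bounds bucket layout described in the `bucket_bounds` comment can be sketched in Go. This assumes the last bucket is [bucket_bounds[N-2], +infinity): the comment writes it as [bucket_bounds[i], +infinity) for i == N-1, but `bucket_bounds` has only N-1 elements, so by analogy with the middle case its lower bound must be the preceding element. `bucketIndex` and `bucketCounts` are hypothetical helper names:

```go
package main

import (
	"fmt"
	"sort"
)

// bucketIndex returns the bucket that value falls into, given len(bounds)+1
// buckets with inclusive lower and exclusive upper bounds. bounds must be
// strictly increasing and > 0; an empty slice means one single bucket.
func bucketIndex(bounds []float64, value float64) int {
	// The first bound strictly greater than value closes value's bucket.
	return sort.Search(len(bounds), func(i int) bool { return bounds[i] > value })
}

// bucketCounts tallies values into len(bounds)+1 buckets, matching the rule
// that buckets has one more element than bucket_bounds.
func bucketCounts(bounds, values []float64) []int64 {
	counts := make([]int64, len(bounds)+1)
	for _, v := range values {
		counts[bucketIndex(bounds, v)]++
	}
	return counts
}

func main() {
	bounds := []float64{1, 5, 10}
	values := []float64{0.5, 1, 4.9, 5, 12, 100}
	fmt.Println(bucketCounts(bounds, values)) // [1 2 1 2]
}
```

The returned counts always sum to len(values), which is the invariant HistogramValue.count must satisfy against the bucket counts; with empty bounds every value lands in bucket 0, the single-bucket case used in the benchmark.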