native tag-based metrics for graphite/carbon
Go Shell
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.
LICENSE add notice and license Oct 2, 2014
TODO code cleanup Dec 24, 2014


graphite protocol extension

Standard graphite protocol (proto1)

string.based.metric[...] value unix_timestamp

"string", "based", "metric" being the nodes of the metric, and metric names usually being unorganized, unstandardized and lacking information.

Carbon-tagger implements metrics 2.0 which aims to make metrics entirely self describing, structured, and standardized.

It does this by using an axtended graphite protocol (backwards compatible):

  • nodes can be old-style values, or new-style key=val or key_is_val tag pairs. (Some versions of graphite-web/graphite-api struggle with equals signs, so the latter format is recommended)
  • if there's a "=" or "is" in one or more of the nodes, we'll try to parse as proto2 and add it to the index if below conditions are met.
  • there must be a tag pair with unit as tag key.
  • there must be at least one other tag.
  • you can freely choose the order of the nodes for every metric, but when you change the order, you change the metric key.
  • old-style nodes (i.e. not "key=val" or key_is_val format) within a proto2 metric implicitly get an "nX" tag key where X is the node position in the string, starting from 1.

You'll probably want to follow the metrics naming conventions, specifically apply the correct units

So this defines a metric as a set of tags, and while we're at it, it also specifies the metric_id that will be used by the current carbon and related tools. carbon-tagger will maintain a database of metrics and their tags, and pass it on (unaltered) to a daemon like carbon-cache or carbon-relay. So the protocol is fully backwards compatible.

This is metrics 2.0 but

  • using dots as delimiters until we can fix graphite.
  • you can use units like "Mbps" or "Errps" to mean "Mb/s" and "Err/s". Graphite treats slashes as delimiters. Carbon-tagger will set the proper unit tag.


  • Indexes metrics 2.0 full (_id and tag)
  • legacy metrics, just the _id, so you can search for it. (empty tags property) it's up to a tool like graph-explorer to create or update documents for legacy metrics with tags enabled.

how does this affect the rest of my stack?

  • carbon-relay, carbon-cache: unaffected, they receive the same data as usual, the identifiers just look a little different.
  • graphite-web: unaffected. metrics 2.0 still show up in the tree of the graphite composer, although that's an inferior UI model that I want to phase out. ideally, dashboards leverage the tag database, like graph-explorer does.
  • aggregators like statsd need proto2 support. statsdaemon is a drop-in statsd replacement that for metrics in metrics2.0 format correctly represents its aggregations and statistical summaries by updating the key/value pairs of metric names, as opposed to the traditional prefix/suffix "features" which leave metrics even vaguer than they were.

why do you process the same metrics every time they are submitted?

to have a realtime database. you could get all metricnames later and process them offline, which lowers resource usage but has higher delays

internal metrics

are in proto2 format and are submitted to a carbon endpoint (typically your relay) they are also available on the http address at /debug/vars2


currently, not very optimized at all! but it's probably speedy enough, and there's a big buffer that smoothens the effect of new metrics

  • I reach about 15k metrics/s processing speed, even when the temp buffer is full and it's syncing to ES. in fact, i don't see a discernable difference between buffer full (unblocked) and buffer full (blocked) probably carbon-cache (whisper) was being the bottleneck?

  • space used: 176B/metric (21M for 125k metrics, twice that if we'd enable indexing/analyzing)


  • make sure the bulk thing uses 'create' and doesn't update/replace the doc every time
  • it seems like ES doesn't contain all metrics (on 2M unique inserts, ES' count is 1889300)
  • better mapping, _source, type analyzing?
  • don't store legacy metrics with empty tags, if already exists. now it will temporarily break graph-explorer's structured metrics (until the latter indexes again)

future optimisations

  • populate an ES cache from disk on program start
  • forward_lines channel buffering, so that runtime doesn't have to switch Gs all the time?

if/when ES performance becomes an issue, consider:

  • edge-ngrams/reverse edge-ngrams to do it at index time
  • use prefix match with regular/reverse fields
  • query_string maybe


if you already have a working Go setup, adjust accordingly:

mkdir -p ~/go/
export GOPATH=~/go
go get
go get
go build