GEQL

Dieter Plaetinck edited this page Apr 29, 2014 · 14 revisions

Introduction

The Graph-Explorer Query Language (GEQL) is designed to:

  • be minimal, use a simple syntax and get a lot done with little input.
  • let you compose graphs from metrics in a flexible way: you can use tags and pattern matching to filter, group, process and aggregate targets and manipulate how the graph gets displayed.
  • let you create custom views of the exact information you need, and let you compare and correlate across different aspects.

At the most basic level, you start by typing patterns that select the metrics you're looking for. You can then extend the query with statements that have special meanings.

Query execution

  • start from the query input you provide
  • parse out any special statements (see below)
  • split up the remainder into separate patterns (white space as boundary), each of which must match on its own.
  • prefix a pattern with ! to negate it
  • any pattern that has : or = inside it matches on tags, like so:
    • =<val> : a tag must have value <val>
    • <key>= : a tag with key <key> must exist
    • <key>=<val> : a tag with key <key> must exist and have value <val>
    • :<val> : a tag with value matching regex <val> must exist
    • <key>: : a tag with key matching regex <key> must exist
    • <key>:<val> : a tag with key <key> must exist and its value must match the regex <val>
  • any other pattern is treated as a regular expression and gets matched on the metric as well as tags.
  • matching targets are collected, grouped into graphs, aggregated and rendered

note:

  • order between patterns is irrelevant
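
The matching rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described on this page, not Graph-Explorer's actual code: the function names are made up, a target is assumed to be a metric name plus a dict of tags, and checking '=' before ':' as well as matching plain patterns against both tag keys and tag values are assumptions.

```python
import re

def match_pattern(pattern, metric, tags):
    """Return True if a single pattern matches a target (metric name + tag dict)."""
    negate = pattern.startswith('!')
    if negate:
        pattern = pattern[1:]
    if '=' in pattern:
        # '=' forms compare literally
        key, val = pattern.split('=', 1)
        if key and val:
            hit = tags.get(key) == val          # <key>=<val>
        elif key:
            hit = key in tags                   # <key>=
        else:
            hit = val in tags.values()          # =<val>
    elif ':' in pattern:
        # ':' forms use regular expressions
        key, val = pattern.split(':', 1)
        if key and val:
            hit = key in tags and re.search(val, tags[key]) is not None  # <key>:<val>
        elif key:
            hit = any(re.search(key, k) for k in tags)                   # <key>:
        else:
            hit = any(re.search(val, v) for v in tags.values())          # :<val>
    else:
        # plain regex: assumed to be tried on the metric, tag keys and tag values
        candidates = [metric, *tags.keys(), *tags.values()]
        hit = any(re.search(pattern, c) for c in candidates)
    return hit != negate

def match_query(patterns, metric, tags):
    """Every white-space separated pattern must match on its own; order is irrelevant."""
    return all(match_pattern(p, metric, tags) for p in patterns.split())
```

For instance, match_query('cpu server=web1 !server=db1', 'cpu.total', {'server': 'web1', 'unit': 'B/s'}) matches, while adding unit=ms to the query makes it fail.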

Special statements

Unless mentioned otherwise, these statements are all optional (i.e. have default values), can occur anywhere within the query and the values must not contain white space.

||<events query>

If anthracite events are enabled, you can use this pattern to customize which events to retrieve. It must appear at the end of the GEQL query.

default value: '*' (retrieve all events)

The format is a Lucene query, which allows wildcards, fuzzy terms, booleans, etc. In particular, "" means retrieve no events, so if you end your GEQL query with "||" it will show no events. By default, the search covers all fields, but with something like tags:initiator=sclm you can search in tags.
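
As a rough sketch of how the events part could be separated from the rest of the query (the function name is hypothetical, and this is not necessarily how Graph-Explorer does it):

```python
def split_events_query(query, default='*'):
    """Split a GEQL query into (metrics part, events query).

    Without '||' the default events query applies; a trailing '||'
    yields an empty events query, i.e. no events.
    """
    head, sep, events = query.partition('||')
    if not sep:
        return query.strip(), default
    return head.strip(), events.strip()
```

With this, 'cpu_state server=web1' yields the events query '*', 'cpu_state ||' yields '', and 'cpu_state || tags:initiator=sclm' yields 'tags:initiator=sclm'.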

graph|list|lines|stack

default: graph. This statement goes at the beginning of the query.

  • graph (default): builds and shows the graphs from your input (in lines mode, unless preferences alter that)
  • list: shows all matching targets (not the matching graphs)
  • lines: graph, but enforce lines mode
  • stack: graph, but enforce stack mode

group by <tagspec> and GROUP BY <tagspec>

<tagspec> is a list like so: foo[=][,bar[=][,baz[=][...]]] basically a comma-separated list of tags with optional '=' suffix to denote soft or hard (see below).

By default, grouping is by unit= and server. The tag unit is hard, meaning a <tag>= pattern is added to the query so that only targets that have the tag are shown. The tag server is soft, so no pattern is added and targets without this tag will show up too.

You can affect this in two ways:

  • specify group by <tagspec> to keep the standard hard tags and replace all soft tags with foo, bar, etc.
  • specify GROUP BY <tagspec> to replace the original list entirely and only group by foo, bar, etc.
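
The difference between the two forms can be illustrated with a small sketch. The function name and the return shape are assumptions made for the example; the default list is the one given above.

```python
DEFAULT_GROUP_BY = ['unit=', 'server']  # default from the text above: unit is hard, server is soft

def resolve_group_by(statement, tagspec, default=DEFAULT_GROUP_BY):
    """Return (hard_tags, soft_tags) after applying a group by / GROUP BY statement."""
    requested = [t for t in tagspec.split(',') if t]
    if statement == 'GROUP BY':
        # replace the original list entirely
        spec = requested
    elif statement == 'group by':
        # keep the standard hard tags, replace the soft ones
        spec = [t for t in default if t.endswith('=')] + requested
    else:
        raise ValueError('unknown statement: %r' % statement)
    hard = [t[:-1] for t in spec if t.endswith('=')]
    soft = [t for t in spec if not t.endswith('=')]
    return hard, soft
```

So resolve_group_by('group by', 'type') keeps unit as a hard tag and groups softly by type, while resolve_group_by('GROUP BY', 'core,server=') drops unit and makes server hard.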

for more control over different buckets, this construct supports bucketing

For example, the cpu plugin yields targets with tags:

  • target_type: gauge_pct (all of them)
  • what: cpu_state (all of them)
  • type: something like iowait
  • server: the host name
  • core: core name (core0, etc)

With these targets:

  • default: grouping by target_type=, what= and server. So you get a graph for each server, showing the different types for all different cores.
  • group by type shows for each type (iowait, idle, etc) a graph comparing all cores on all servers
  • group by core,server shows a graph for each core on each server.

(A good way to try this out is to query for cpu_state and filter on a server name so you only get a few hosts.) This statement has no effect in 'list' mode.
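
To make the cpu example concrete, here is a minimal sketch of grouping targets into graphs by their tag values. The target data (two servers, two cores, two types) is invented for illustration, and group_into_graphs is a hypothetical helper, not part of Graph-Explorer.

```python
from collections import defaultdict
from itertools import product

def group_into_graphs(targets, group_by):
    """Put targets that share the same values for the group_by tags on one graph."""
    graphs = defaultdict(list)
    for tags in targets:
        key = tuple(tags.get(k) for k in group_by)
        graphs[key].append(tags)
    return dict(graphs)

# made-up targets shaped like the cpu plugin's output
targets = [
    {'target_type': 'gauge_pct', 'what': 'cpu_state', 'type': t, 'server': s, 'core': c}
    for t, s, c in product(['iowait', 'idle'], ['web1', 'web2'], ['core0', 'core1'])
]

default = group_into_graphs(targets, ['target_type', 'what', 'server'])
by_type = group_into_graphs(targets, ['target_type', 'what', 'type'])
by_core_server = group_into_graphs(targets, ['target_type', 'what', 'core', 'server'])
```

Here default has one graph per server, by_type has one graph per type with all cores of all servers on it, and by_core_server has one graph per core per server.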

avg by <tagspec>

not in list mode. <tagspec> is a list like so: foo[,bar][...]

causes targets on every graph to be averaged together and shown as one, if they have the same tag key/values but different values for the tags in <tagspec>.

for more control over different buckets, this construct supports bucketing

sum by <tagspec>

not in list mode. <tagspec> is a list like so: foo[,bar][...]

causes targets on every graph to be summed together and shown as one, if they have the same tag key/values but different values for the tags in <tagspec>.

for more control over different buckets, this construct supports bucketing
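
Which targets get merged by avg by and sum by can be sketched as follows. This only illustrates the selection rule described above (same tags, ignoring the ones in <tagspec>); the helper name and the (tags, values) representation of a target are assumptions, and the arithmetic is done locally purely for the sake of the example.

```python
from collections import defaultdict

def aggregate_by(targets, agg_tags, how='sum'):
    """Merge targets whose tags are equal once the tags in agg_tags are ignored.

    targets is a list of (tags, values) pairs with equally long value lists.
    """
    buckets = defaultdict(list)
    for tags, values in targets:
        key = tuple(sorted((k, v) for k, v in tags.items() if k not in agg_tags))
        buckets[key].append(values)
    result = []
    for key, series in buckets.items():
        merged = [sum(points) for points in zip(*series)]
        if how == 'avg':
            merged = [m / len(series) for m in merged]
        result.append((dict(key), merged))
    return result

targets = [
    ({'server': 'web1', 'core': 'core0', 'type': 'iowait'}, [1.0, 3.0]),
    ({'server': 'web1', 'core': 'core1', 'type': 'iowait'}, [3.0, 5.0]),
    ({'server': 'web2', 'core': 'core0', 'type': 'iowait'}, [10.0, 20.0]),
]
averaged = aggregate_by(targets, ['core'], how='avg')
summed = aggregate_by(targets, ['core'], how='sum')
```

With avg by core, the two web1 cores collapse into one line per server; web2 has only one core in this sample, so its line is unchanged.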

from <word>

default: '-24hours'. accepts anything graphite accepts which is a whole lot (no effect in 'list' mode)

to <word>

default: 'now'. accepts anything graphite accepts (see above) (no effect in 'list' mode)
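
Since both values are handed to graphite, one way to picture them is as the from and until parameters of a graphite render request. The sketch below assumes that to maps onto graphite's until parameter; the base URL and target name are placeholders.

```python
from urllib.parse import urlencode

def render_url(base, targets, from_='-24hours', to='now'):
    """Build a graphite render URL; GEQL's `to` is assumed to become `until`."""
    params = [('target', t) for t in targets]
    params += [('from', from_), ('until', to), ('format', 'json')]
    return '%s/render?%s' % (base, urlencode(params))

# placeholder host and metric name
url = render_url('http://graphite.example.com', ['servers.web1.cpu.iowait'], from_='-2days')
```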

min <val>

set the min value of the Y-axis. can be float or int, and recognizes SI and IEC prefixes, e.g. "10", "100.25", "10G", "100Mi", etc.

max <val>

set the max value of the Y-axis. see above.
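
A value with an SI or IEC prefix could be turned into a number roughly like this. The exact set of prefixes Graph-Explorer accepts is not listed on this page, so the tables below are an assumption, as is the function name.

```python
import re

# assumed prefix tables; the real accepted set may differ
SI = {'k': 1e3, 'M': 1e6, 'G': 1e9, 'T': 1e12, 'P': 1e15}
IEC = {'Ki': 2**10, 'Mi': 2**20, 'Gi': 2**30, 'Ti': 2**40, 'Pi': 2**50}

def parse_axis_value(text):
    """Parse values such as '10', '100.25', '10G' or '100Mi' into a float."""
    m = re.fullmatch(r'(-?\d+(?:\.\d+)?)([A-Za-z]*)', text)
    if not m:
        raise ValueError('not a valid value: %r' % text)
    number, prefix = m.groups()
    if not prefix:
        return float(number)
    factor = {**SI, **IEC}.get(prefix)
    if factor is None:
        raise ValueError('unknown prefix: %r' % prefix)
    return float(number) * factor
```

For example, '10G' becomes 10 * 10^9 and '100Mi' becomes 100 * 2^20.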

limit <number>

default: 500
limits the number of rendered targets (to avoid killing your browser and/or graphite server). 0 means no limit (no effect in 'list' mode). Note that a rendered target can be an aggregate of N metrics.

Note: a targets-per-graph limit would be very useful, as that seems to be a key performance indicator: more targets on a graph make graphite's response slower and seem to hang firefox, whereas spreading those same targets out over more graphs, with fewer targets on each, is less troublesome. You can see the problem, if you have enough statsd timers, by comparing catchall_statsd unit=ms with catchall_statsd unit=ms group by type. I'm unsure what's the best way to introduce all these different limits and how to name them.

avg over <timespec>

instead of showing the metric directly, show the moving average. timespec looks like [amount]unit

example: avg over 30M to show the moving average over 30 minutes.

note: the points up until the first datapoint where a full period is covered are empty. With the default time range, avg over 1d therefore shows nothing, and you need something like avg over 1d from -2days.
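
The empty leading points are easy to see in a plain moving average. The sketch below is a generic implementation over a list of datapoints, with the window expressed as a number of points; it is meant to illustrate the note above, not to mirror how the average is actually computed for the graph.

```python
def moving_average(points, window):
    """Moving average over `window` points; None until a full window is covered."""
    out = []
    running = 0.0
    for i, p in enumerate(points):
        running += p
        if i >= window:
            running -= points[i - window]  # drop the point that left the window
        out.append(running / window if i >= window - 1 else None)
    return out

smoothed = moving_average([1, 2, 3, 4, 5], 3)
```

Here the first two entries of smoothed are None, because a full window of 3 points is only available from the third datapoint onwards. The same happens when the window is as long as the requested time range: almost everything is empty.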

Inline unit conversion

When you specify something like unit=B/M or unit=B/h, the requested metric is automatically shown per the new time interval, by rescaling B/s metrics appropriately. This is currently only supported for metrics that are per second. It only looks at the divisor, not the dividend, and doesn't take prefixes into account. None of that is needed, because no-one ever uses 'ks' (kiloseconds) and the graphs automatically display appropriate prefixes: there is no need to type 'TB/d' for a 'B/s' metric, just make it 'B/d', and if the values are in the terabyte range the graph will show the appropriate prefix. It follows that this only needs to be done for rates. For metrics that are not expressed per unit of time, the automatic prefixes on the graph suffice.
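
The rescaling itself amounts to multiplying per-second values by the number of seconds in the requested interval. A minimal sketch, using only the divisors mentioned on this page (s, M, h, d) and a made-up function name:

```python
# seconds per requested divisor; only the letters used on this page
SECONDS_PER = {'s': 1, 'M': 60, 'h': 3600, 'd': 86400}

def rescale_per_second(values, requested_unit):
    """Rescale per-second values to the divisor of requested_unit, e.g. 'B/M'.

    Only the divisor is looked at; the dividend and any prefixes are ignored.
    """
    _dividend, divisor = requested_unit.split('/', 1)
    factor = SECONDS_PER[divisor]
    return [v * factor for v in values]
```

So a B/s series of [1, 2.5] becomes [60, 150.0] when unit=B/M is requested, and [3600, 9000.0] for unit=B/h.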

Bucketing

Constructs like sum by, avg by and group by typically list one or more tag keys, and they operate by creating buckets for every given tag key. For example, avg by hostname,type will by default create one bucket for hostname and one for type, and each bucket captures all possible values for its key, so that you average across all values of hostname and type.

But this bucketing is controllable. You can type something like avg by hostname:web|db,type. Now for the hostname averaging there will be 3 buckets: tags matching 'web', tags matching 'db', and all others. So instead of one graph line, you will have 3, according to the chosen buckets.
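
A bucketed tagspec such as hostname:web|db,type could be interpreted along these lines. The helper names and the bucket labels are invented for the example, and "matching" is assumed here to mean a simple substring match; the real matching rule may differ.

```python
ALL = '<all>'      # label for the single catch-all bucket (made up)
OTHER = '<other>'  # label for values matching none of the given buckets (made up)

def parse_bucketed_tagspec(tagspec):
    """'hostname:web|db,type' -> {'hostname': ['web', 'db'], 'type': []}"""
    spec = {}
    for item in tagspec.split(','):
        key, _, patterns = item.partition(':')
        spec[key] = patterns.split('|') if patterns else []
    return spec

def bucket_for(value, patterns):
    """Pick the bucket a tag value falls into (substring match is an assumption)."""
    if not patterns:
        return ALL
    for p in patterns:
        if p in value:
            return p
    return OTHER

spec = parse_bucketed_tagspec('hostname:web|db,type')
buckets = {bucket_for(h, spec['hostname']) for h in ['web1', 'web2', 'db1', 'cache1']}
```

For these four host names, buckets contains 'web', 'db' and the catch-all for everything else, which corresponds to the 3 lines mentioned above.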