Runtime fields

A runtime field is a field that is evaluated at query time. Runtime fields enable you to:

  • Add fields to existing documents without reindexing your data

  • Start working with your data without understanding how it’s structured

  • Override the value returned from an indexed field at query time

  • Define fields for a specific use without modifying the underlying schema

You access runtime fields from the search API like any other field, and Elasticsearch sees runtime fields no differently. You can define runtime fields in the index mapping or in the search request. The choice is yours, which is part of the inherent flexibility of runtime fields.

Use the fields parameter on the _search API to retrieve the values of runtime fields. Runtime fields won’t display in _source, but the fields API works for all fields, even those that were not sent as part of the original _source.

Runtime fields are useful when working with log data (see examples), especially when you’re unsure about the data structure. Your search speed decreases, but your index size is much smaller and you can more quickly process logs without having to index them.

Benefits

Because runtime fields aren’t indexed, adding a runtime field doesn’t increase the index size. You define runtime fields directly in the index mapping, saving storage costs and increasing ingestion speed. You can more quickly ingest data into the Elastic Stack and access it right away. When you define a runtime field, you can immediately use it in search requests, aggregations, filtering, and sorting.

If you change a runtime field into an indexed field, you don’t need to modify any queries that refer to the runtime field. Better yet, you can refer to some indices where the field is a runtime field, and other indices where the field is an indexed field. You have the flexibility to choose which fields to index and which ones to keep as runtime fields.

At its core, the most important benefit of runtime fields is the ability to add fields to documents after you’ve ingested them. This capability simplifies mapping decisions because you don’t have to decide how to parse your data up front, and can use runtime fields to amend the mapping at any time. Using runtime fields allows for a smaller index and faster ingest time, which together use fewer resources and reduce your operating costs.

Incentives

Runtime fields can replace many of the ways you can use scripting with the _search API. How you use a runtime field is impacted by the number of documents that the included script runs against. For example, if you’re using the fields parameter on the _search API to retrieve the values of a runtime field, the script runs only against the top hits just like script fields do.

You can use script fields to access values in _source and return calculated values based on a script evaluation. Runtime fields have the same capabilities, but provide greater flexibility because you can query and aggregate on runtime fields in a search request. Script fields can only fetch values.

Similarly, you could write a script query that filters documents in a search request based on a script. Runtime fields provide a very similar feature that is more flexible. You write a script to create field values and they are available everywhere, such as fields, all queries, and aggregations.

You can also use scripts to sort search results, but that same script works exactly the same in a runtime field.

If you move a script from any of these sections in a search request to a runtime field that is computing values from the same number of documents, the performance should be about the same. The performance for these features is largely dependent upon the calculations that the included script is running and how many documents the script runs against.

Compromises

Runtime fields use less disk space and provide flexibility in how you access your data, but can impact search performance based on the computation defined in the runtime script.

To balance search performance and flexibility, index fields that you’ll frequently search for and filter on, such as a timestamp. Elasticsearch automatically uses these indexed fields first when running a query, resulting in a fast response time. You can then use runtime fields to limit the number of fields that Elasticsearch needs to calculate values for. Using indexed fields in tandem with runtime fields provides flexibility in the data that you index and how you define queries for other fields.

Use the asynchronous search API to run searches that include runtime fields. This method of search helps to offset the performance impacts of computing values for runtime fields in each document containing that field. If the query can’t return the result set synchronously, you’ll get results asynchronously as they become available.

Important
Queries against runtime fields are considered expensive. If search.allow_expensive_queries is set to false, expensive queries are not allowed and Elasticsearch will reject any queries against runtime fields.

Map a runtime field

You map runtime fields by adding a runtime section under the mapping definition and defining a Painless script. This script has access to the entire context of a document, including the original _source via params._source and any mapped fields plus their values. At query time, the script runs and generates values for each scripted field that is required for the query.

Emitting runtime field values

When defining a Painless script to use with runtime fields, you must include the emit method to emit calculated values.

For example, the script in the following request calculates the day of the week from the @timestamp field, which is defined as a date type, and uses emit to return the calculated value.

PUT my-index-000001/
{
  "mappings": {
    "runtime": {
      "day_of_week": {
        "type": "keyword",
        "script": {
          "source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
        }
      }
    },
    "properties": {
      "@timestamp": {"type": "date"}
    }
  }
}

A field in the runtime section can have any of these data types:

  • boolean

  • composite

  • date

  • double

  • geo_point

  • ip

  • keyword

  • long

  • lookup

Runtime fields with a type of date can accept the format parameter exactly as the date field type does.

Runtime fields with a type of lookup allow retrieving fields from related indices. See retrieve fields from related indices.

If dynamic field mapping is enabled where the dynamic parameter is set to runtime, new fields are automatically added to the index mapping as runtime fields:

PUT my-index-000001
{
  "mappings": {
    "dynamic": "runtime",
    "properties": {
      "@timestamp": {
        "type": "date"
      }
    }
  }
}

Define runtime fields without a script

Runtime fields typically include a Painless script that manipulates data in some way. However, there are instances where you might define a runtime field without a script. For example, if you want to retrieve a single field from _source without making changes, you don’t need a script. You can just create a runtime field without a script, such as day_of_week:

PUT my-index-000001/
{
  "mappings": {
    "runtime": {
      "day_of_week": {
        "type": "keyword"
      }
    }
  }
}

When no script is provided, Elasticsearch implicitly looks in _source at query time for a field with the same name as the runtime field, and returns a value if one exists. If a field with the same name doesn’t exist, the response doesn’t include any values for that runtime field.

Whenever possible, retrieve field values through doc_values. Accessing doc_values with a runtime field is faster than retrieving values from _source because of how data is loaded from Lucene.

However, there are cases where retrieving fields from _source is necessary. For example, text fields do not have doc_values available by default, so you have to retrieve values from _source. In other instances, you might choose to disable doc_values on a specific field.

Note
You can alternatively prefix the field you want to retrieve values for with params._source (such as params._source.day_of_week). For simplicity, defining a runtime field in the mapping definition without a script is the recommended option, whenever possible.

Ignoring script errors on runtime fields

Scripts can throw errors at runtime, e.g. on accessing missing or invalid values in documents or because of performing invalid operations. The on_script_error parameter can be used to control error behaviour when this happens. Setting this parameter to continue will have the effect of silently ignoring all errors on this runtime field. The default fail value will cause a shard failure which gets reported in the search response.
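
The difference between the two on_script_error values can be pictured with a small model. The following Python sketch is only an illustration and does not use any Elasticsearch API: the evaluate function and the sample documents are hypothetical, and it assumes that an ignored error simply leaves that document without a value for the field.

```python
def evaluate(docs, script, on_script_error="fail"):
    """Hypothetical model of running a runtime field script over documents."""
    values = []
    for doc in docs:
        try:
            values.append(script(doc))
        except Exception:
            if on_script_error == "continue":
                # Assumption: the error is ignored and the document
                # gets no value for this field.
                continue
            # "fail" (the default): the error propagates, which stands in
            # for the shard failure reported in the search response.
            raise
    return values


def parse_voltage(doc):
    # Raises ValueError for values that are not numeric.
    return float(doc["voltage"])


sample_docs = [{"voltage": "5.2"}, {"voltage": "n/a"}, {"voltage": "4.0"}]
print(evaluate(sample_docs, parse_voltage, on_script_error="continue"))
```

With "continue" the malformed document is skipped, while the default "fail" raises on the first error.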

Updating and removing runtime fields

You can update or remove runtime fields at any time. To replace an existing runtime field, add a new runtime field to the mappings with the same name. To remove a runtime field from the mappings, set the value of the runtime field to null:

PUT my-index-000001/_mapping
{
 "runtime": {
   "day_of_week": null
 }
}

Downstream impacts

Updating or removing a runtime field while a dependent query is running can return inconsistent results. Each shard might have access to different versions of the script, depending on when the mapping change takes effect.

Warning
Existing queries or visualizations in Kibana that rely on runtime fields can fail if you remove or update the field. For example, a bar chart visualization that uses a runtime field of type ip will fail if the type is changed to boolean, or if the runtime field is removed.

Define runtime fields in a search request

You can specify a runtime_mappings section in a search request to create runtime fields that exist only as part of the query. You specify a script as part of the runtime_mappings section, just as you would if adding a runtime field to the mappings.

Defining a runtime field in a search request uses the same format as defining a runtime field in the index mapping. Just copy the field definition from the runtime section of the index mapping to the runtime_mappings section of the search request.

The following search request adds a day_of_week field to the runtime_mappings section. The field values will be calculated dynamically, and only within the context of this search request:

GET my-index-000001/_search
{
  "runtime_mappings": {
    "day_of_week": {
      "type": "keyword",
      "script": {
        "source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
      }
    }
  },
  "aggs": {
    "day_of_week": {
      "terms": {
        "field": "day_of_week"
      }
    }
  }
}

Create runtime fields that use other runtime fields

You can even define runtime fields in a search request that return values from other runtime fields. For example, let’s say you bulk index some sensor data:

POST my-index-000001/_bulk?refresh=true
{"index":{}}
{"@timestamp":1516729294000,"model_number":"QVKC92Q","measures":{"voltage":"5.2","start": "300","end":"8675309"}}
{"index":{}}
{"@timestamp":1516642894000,"model_number":"QVKC92Q","measures":{"voltage":"5.8","start": "300","end":"8675309"}}
{"index":{}}
{"@timestamp":1516556494000,"model_number":"QVKC92Q","measures":{"voltage":"5.1","start": "300","end":"8675309"}}
{"index":{}}
{"@timestamp":1516470094000,"model_number":"QVKC92Q","measures":{"voltage":"5.6","start": "300","end":"8675309"}}
{"index":{}}
{"@timestamp":1516383694000,"model_number":"HG537PU","measures":{"voltage":"4.2","start": "400","end":"8625309"}}
{"index":{}}
{"@timestamp":1516297294000,"model_number":"HG537PU","measures":{"voltage":"4.0","start": "400","end":"8625309"}}

You realize after indexing that your numeric data was mapped as type text. You want to aggregate on the measures.start and measures.end fields, but the aggregation fails because you can’t aggregate on fields of type text. Runtime fields to the rescue! You can add runtime fields with the same name as your indexed fields and modify the data type:

PUT my-index-000001/_mapping
{
  "runtime": {
    "measures.start": {
      "type": "long"
    },
    "measures.end": {
      "type": "long"
    }
  }
}

Runtime fields take precedence over fields defined with the same name in the index mappings. This flexibility allows you to shadow existing fields and calculate a different value, without modifying the field itself. If you made a mistake in your index mapping, you can use runtime fields to calculate values that override values in the mapping during the search request.

Now, you can easily run an average aggregation on the measures.start and measures.end fields:

GET my-index-000001/_search
{
  "aggs": {
    "avg_start": {
      "avg": {
        "field": "measures.start"
      }
    },
    "avg_end": {
      "avg": {
        "field": "measures.end"
      }
    }
  }
}

The response includes the aggregation results without changing the values for the underlying data:

{
  "aggregations" : {
    "avg_start" : {
      "value" : 333.3333333333333
    },
    "avg_end" : {
      "value" : 8658642.333333334
    }
  }
}
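
The averages in this response can be checked by hand. The Python sketch below is not part of the original example. It takes the string values exactly as they were sent in the bulk request, converts them to integers (a rough stand-in for what the script-less long runtime fields do at query time), and averages them. The docs list and the average helper are illustrative names.

```python
# measures.start and measures.end as strings, as sent in the bulk request:
# four QVKC92Q documents followed by two HG537PU documents.
docs = (
    [{"start": "300", "end": "8675309"}] * 4
    + [{"start": "400", "end": "8625309"}] * 2
)


def average(field):
    """Parse the string values of one field as integers and average them."""
    values = [int(doc[field]) for doc in docs]
    return sum(values) / len(values)


avg_start = average("start")
avg_end = average("end")
print(avg_start, avg_end)
```

The underlying _source values stay strings throughout; only the values read at query time are treated as numbers.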

Further, you can define a runtime field as part of a search query that calculates a value, and then run a stats aggregation on that field in the same query.

The duration runtime field doesn’t exist in the index mapping, but we can still search and aggregate on that field. The following query returns the calculated value for the duration field and runs a stats aggregation to compute statistics over numeric values extracted from the aggregated documents.

GET my-index-000001/_search
{
  "runtime_mappings": {
    "duration": {
      "type": "long",
      "script": {
        "source": """
          emit(doc['measures.end'].value - doc['measures.start'].value);
          """
      }
    }
  },
  "aggs": {
    "duration_stats": {
      "stats": {
        "field": "duration"
      }
    }
  }
}

Even though the duration runtime field only exists in the context of a search query, you can search and aggregate on that field. This flexibility is incredibly powerful, enabling you to rectify mistakes in your index mappings and dynamically complete calculations all within a single search request.

{
  "aggregations" : {
    "duration_stats" : {
      "count" : 6,
      "min" : 8624909.0,
      "max" : 8675009.0,
      "avg" : 8658309.0,
      "sum" : 5.1949854E7
    }
  }
}
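
As a sanity check on these statistics, the same subtraction can be reproduced outside Elasticsearch. This Python sketch is an illustration only; the readings list restates the bulk data as numbers and the variable names are made up for the example.

```python
# (measures.start, measures.end) for the six sensor documents.
readings = [(300, 8675309)] * 4 + [(400, 8625309)] * 2

# Same calculation as the duration script: end minus start.
durations = [end - start for start, end in readings]

stats = {
    "count": len(durations),
    "min": min(durations),
    "max": max(durations),
    "avg": sum(durations) / len(durations),
    "sum": sum(durations),
}
print(stats)
```

The sum of 51949854 is the same number that the response prints in scientific notation as 5.1949854E7.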

Override field values at query time

If you create a runtime field with the same name as a field that already exists in the mapping, the runtime field shadows the mapped field. At query time, Elasticsearch evaluates the runtime field, calculates a value based on the script, and returns the value as part of the query. Because the runtime field shadows the mapped field, you can override the value returned in search without modifying the mapped field.

For example, let’s say you indexed the following documents into my-index-000001:

POST my-index-000001/_bulk?refresh=true
{"index":{}}
{"@timestamp":1516729294000,"model_number":"QVKC92Q","measures":{"voltage":5.2}}
{"index":{}}
{"@timestamp":1516642894000,"model_number":"QVKC92Q","measures":{"voltage":5.8}}
{"index":{}}
{"@timestamp":1516556494000,"model_number":"QVKC92Q","measures":{"voltage":5.1}}
{"index":{}}
{"@timestamp":1516470094000,"model_number":"QVKC92Q","measures":{"voltage":5.6}}
{"index":{}}
{"@timestamp":1516383694000,"model_number":"HG537PU","measures":{"voltage":4.2}}
{"index":{}}
{"@timestamp":1516297294000,"model_number":"HG537PU","measures":{"voltage":4.0}}

You later realize that the HG537PU sensors aren’t reporting their true voltage. The true values are supposed to be 1.7 times higher than the reported (and indexed) values! Instead of reindexing your data, you can define a script in the runtime_mappings section of the _search request to shadow the voltage field and calculate a new value at query time.

If you search for documents where the model number matches HG537PU:

GET my-index-000001/_search
{
  "query": {
    "match": {
      "model_number": "HG537PU"
    }
  }
}

The response includes indexed values for documents matching model number HG537PU:

{
  ...
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0296195,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_id" : "F1BeSXYBg_szTodcYCmk",
        "_score" : 1.0296195,
        "_source" : {
          "@timestamp" : 1516383694000,
          "model_number" : "HG537PU",
          "measures" : {
            "voltage" : 4.2
          }
        }
      },
      {
        "_index" : "my-index-000001",
        "_id" : "l02aSXYBkpNf6QRDO62Q",
        "_score" : 1.0296195,
        "_source" : {
          "@timestamp" : 1516297294000,
          "model_number" : "HG537PU",
          "measures" : {
            "voltage" : 4.0
          }
        }
      }
    ]
  }
}

The following request defines a runtime field whose script checks whether the value of the model_number field is HG537PU. For each matching document, the script multiplies the value of the voltage field by 1.7. For all other documents, it emits the voltage unchanged.

Using the fields parameter on the _search API, you can retrieve the value that the script calculates for the measures.voltage field for documents matching the search request:

POST my-index-000001/_search
{
  "runtime_mappings": {
    "measures.voltage": {
      "type": "double",
      "script": {
        "source":
        """if (doc['model_number.keyword'].value.equals('HG537PU'))
        {emit(1.7 * params._source['measures']['voltage']);}
        else{emit(params._source['measures']['voltage']);}"""
      }
    }
  },
  "query": {
    "match": {
      "model_number": "HG537PU"
    }
  },
  "fields": ["measures.voltage"]
}

Looking at the response, the calculated values for measures.voltage on each result are 7.14 and 6.8. That’s more like it! The runtime field calculated this value as part of the search request without modifying the mapped value, which is still returned in _source:

{
  ...
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0296195,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_id" : "F1BeSXYBg_szTodcYCmk",
        "_score" : 1.0296195,
        "_source" : {
          "@timestamp" : 1516383694000,
          "model_number" : "HG537PU",
          "measures" : {
            "voltage" : 4.2
          }
        },
        "fields" : {
          "measures.voltage" : [
            7.14
          ]
        }
      },
      {
        "_index" : "my-index-000001",
        "_id" : "l02aSXYBkpNf6QRDO62Q",
        "_score" : 1.0296195,
        "_source" : {
          "@timestamp" : 1516297294000,
          "model_number" : "HG537PU",
          "measures" : {
            "voltage" : 4.0
          }
        },
        "fields" : {
          "measures.voltage" : [
            6.8
          ]
        }
      }
    ]
  }
}
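
The branching in the script is easy to check by hand. The following Python sketch mirrors the logic of the runtime script on plain dictionaries shaped like _source. It is an illustration rather than anything Elasticsearch runs, and corrected_voltage is a hypothetical helper name.

```python
def corrected_voltage(source):
    """Mirror the runtime script: scale HG537PU readings by 1.7."""
    voltage = source["measures"]["voltage"]
    if source["model_number"] == "HG537PU":
        return 1.7 * voltage
    # Every other model keeps its reported voltage.
    return voltage


hits = [
    {"model_number": "HG537PU", "measures": {"voltage": 4.2}},
    {"model_number": "HG537PU", "measures": {"voltage": 4.0}},
    {"model_number": "QVKC92Q", "measures": {"voltage": 5.2}},
]

# Round to two decimals to sidestep floating-point noise when comparing.
corrected = [round(corrected_voltage(hit), 2) for hit in hits]
print(corrected)
```

The third document shows the else branch: a QVKC92Q reading passes through unchanged.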

Retrieve a runtime field

Use the fields parameter on the _search API to retrieve the values of runtime fields. Runtime fields won’t display in _source, but the fields API works for all fields, even those that were not sent as part of the original _source.

Define a runtime field to calculate the day of week

For example, the following request adds a runtime field called day_of_week. The runtime field includes a script that calculates the day of the week based on the value of the @timestamp field. We’ll include "dynamic":"runtime" in the request so that new fields are added to the mapping as runtime fields.

PUT my-index-000001/
{
  "mappings": {
    "dynamic": "runtime",
    "runtime": {
      "day_of_week": {
        "type": "keyword",
        "script": {
          "source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
        }
      }
    },
    "properties": {
      "@timestamp": {"type": "date"}
    }
  }
}

Ingest some data

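Let’s ingest some sample data, which adds two fields to each document: @timestamp and message. Because the mapping sets dynamic to runtime, message is added to the mapping as a runtime field rather than as an indexed field.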

POST /my-index-000001/_bulk?refresh
{ "index": {}}
{ "@timestamp": "2020-06-21T15:00:01-05:00", "message" : "211.11.9.0 - - [2020-06-21T15:00:01-05:00] \"GET /english/index.html HTTP/1.0\" 304 0"}
{ "index": {}}
{ "@timestamp": "2020-06-21T15:00:01-05:00", "message" : "211.11.9.0 - - [2020-06-21T15:00:01-05:00] \"GET /english/index.html HTTP/1.0\" 304 0"}
{ "index": {}}
{ "@timestamp": "2020-04-30T14:30:17-05:00", "message" : "40.135.0.0 - - [2020-04-30T14:30:17-05:00] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{ "index": {}}
{ "@timestamp": "2020-04-30T14:30:53-05:00", "message" : "232.0.0.0 - - [2020-04-30T14:30:53-05:00] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{ "index": {}}
{ "@timestamp": "2020-04-30T14:31:12-05:00", "message" : "26.1.0.0 - - [2020-04-30T14:31:12-05:00] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{ "index": {}}
{ "@timestamp": "2020-04-30T14:31:19-05:00", "message" : "247.37.0.0 - - [2020-04-30T14:31:19-05:00] \"GET /french/splash_inet.html HTTP/1.0\" 200 3781"}
{ "index": {}}
{ "@timestamp": "2020-04-30T14:31:27-05:00", "message" : "252.0.0.0 - - [2020-04-30T14:31:27-05:00] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{ "index": {}}
{ "@timestamp": "2020-04-30T14:31:29-05:00", "message" : "247.37.0.0 - - [2020-04-30T14:31:29-05:00] \"GET /images/hm_brdl.gif HTTP/1.0\" 304 0"}
{ "index": {}}
{ "@timestamp": "2020-04-30T14:31:29-05:00", "message" : "247.37.0.0 - - [2020-04-30T14:31:29-05:00] \"GET /images/hm_arw.gif HTTP/1.0\" 304 0"}
{ "index": {}}
{ "@timestamp": "2020-04-30T14:31:32-05:00", "message" : "247.37.0.0 - - [2020-04-30T14:31:32-05:00] \"GET /images/nav_bg_top.gif HTTP/1.0\" 200 929"}
{ "index": {}}
{ "@timestamp": "2020-04-30T14:31:43-05:00", "message" : "247.37.0.0 - - [2020-04-30T14:31:43-05:00] \"GET /french/images/nav_venue_off.gif HTTP/1.0\" 304 0"}

Search for the calculated day of week

The following request uses the search API to retrieve the day_of_week field that the original request defined as a runtime field in the mapping. The value for this field is calculated dynamically at query time without reindexing documents or indexing the day_of_week field. This flexibility allows you to modify the mapping without changing any field values.

GET my-index-000001/_search
{
  "fields": [
    "@timestamp",
    "day_of_week"
  ],
  "_source": false
}

The previous request returns the day_of_week field for all matching documents. We can define another runtime field called client_ip that operates on the message field and will further refine the query:

PUT /my-index-000001/_mapping
{
  "runtime": {
    "client_ip": {
      "type": "ip",
      "script" : {
        "source" : "String m = doc[\"message\"].value; int end = m.indexOf(\" \"); emit(m.substring(0, end));"
      }
    }
  }
}
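
The string handling in the client_ip script is simple enough to try outside Elasticsearch. The following Python sketch takes everything before the first space of a log line, as the script does, and additionally validates the result with the standard ipaddress module. That validation step is an addition for illustration and is not part of the script; the function name client_ip is reused here only for readability.

```python
import ipaddress


def client_ip(message):
    """Return the text before the first space, checked to be an IP address."""
    end = message.index(" ")  # raises ValueError if the line has no space
    candidate = message[:end]
    ipaddress.ip_address(candidate)  # raises ValueError if not a valid IP
    return candidate


line = '211.11.9.0 - - [2020-06-21T15:00:01-05:00] "GET /english/index.html HTTP/1.0" 304 0'
print(client_ip(line))
```

A log line that doesn’t start with an address followed by a space would make this helper raise, which is the kind of case the on_script_error parameter is meant for on the Elasticsearch side.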

Run another query, but search for a specific IP address using the client_ip runtime field:

GET my-index-000001/_search
{
  "size": 1,
  "query": {
    "match": {
      "client_ip": "211.11.9.0"
    }
  },
  "fields" : ["*"]
}

This time, the query matches only two documents, and because size is set to 1 the response returns just one of them. The value for day_of_week (Sunday) was calculated at query time using the runtime script defined in the mapping, and the result includes only documents matching the 211.11.9.0 IP address.

{
  ...
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_id" : "oWs5KXYB-XyJbifr9mrz",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2020-06-21T15:00:01-05:00",
          "message" : "211.11.9.0 - - [2020-06-21T15:00:01-05:00] \"GET /english/index.html HTTP/1.0\" 304 0"
        },
        "fields" : {
          "@timestamp" : [
            "2020-06-21T20:00:01.000Z"
          ],
          "client_ip" : [
            "211.11.9.0"
          ],
          "message" : [
            "211.11.9.0 - - [2020-06-21T15:00:01-05:00] \"GET /english/index.html HTTP/1.0\" 304 0"
          ],
          "day_of_week" : [
            "Sunday"
          ]
        }
      }
    ]
  }
}
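
The day_of_week value in this response can be reproduced outside Elasticsearch. The following Python sketch is not part of the original example. It assumes that the script sees @timestamp converted to UTC, which is consistent with the 2020-06-21T20:00:01.000Z value in the fields section of the response, and the helper name day_of_week is reused only for readability.

```python
from datetime import datetime, timezone

# English day names indexed by datetime.weekday() (Monday == 0).
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def day_of_week(timestamp):
    """Return the full day name of an ISO 8601 timestamp, evaluated in UTC."""
    moment = datetime.fromisoformat(timestamp).astimezone(timezone.utc)
    return DAY_NAMES[moment.weekday()]


print(day_of_week("2020-06-21T15:00:01-05:00"))
```

Under the UTC assumption, a timestamp late in the local evening, such as 2020-06-21T22:00:00-05:00, already falls on Monday, so the calculated day can differ from the local calendar day.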

Retrieve fields from related indices

Experimental: this functionality may be changed or removed in a future release.

The fields parameter on the _search API can also be used to retrieve fields from the related indices via runtime fields with a type of lookup.

Note
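Fields that are retrieved by runtime fields of type lookup can be used to enrich the hits in a search response. It’s not possible to query or aggregate on these fields.

For example, index one document into an ip_location index and two documents into a logs index, then search logs with a runtime field of type lookup:

POST ip_location/_doc?refresh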
{
  "ip": "192.168.1.1",
  "country": "Canada",
  "city": "Montreal"
}

PUT logs/_doc/1?refresh
{
  "host": "192.168.1.1",
  "message": "the first message"
}

PUT logs/_doc/2?refresh
{
  "host": "192.168.1.2",
  "message": "the second message"
}

POST logs/_search
{
  "runtime_mappings": {
    "location": {
        "type": "lookup", (1)
        "target_index": "ip_location", (2)
        "input_field": "host", (3)
        "target_field": "ip", (4)
        "fetch_fields": ["country", "city"] (5)
    }
  },
  "fields": [
    "host",
    "message",
    "location"
  ],
  "_source": false
}

  1. Defines a runtime field in the main search request with a type of lookup that retrieves fields from the target index using term queries.

  2. The target index that the lookup query executes against

  3. A field on the main index whose values are used as the input values of the lookup term query

  4. A field on the lookup index that the lookup query searches against

  5. A list of fields to retrieve from the lookup index. See the fields parameter of a search request.

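The above search returns the country and city from the ip_location index for each IP address of the returned search hits. A hit whose host value has no match in ip_location, such as 192.168.1.2, is returned without a location field.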

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "logs",
        "_id": "1",
        "_score": 1.0,
        "fields": {
          "host": [ "192.168.1.1" ],
          "location": [
            {
              "city": [ "Montreal" ],
              "country": [ "Canada" ]
            }
          ],
          "message": [ "the first message" ]
        }
      },
      {
        "_index": "logs",
        "_id": "2",
        "_score": 1.0,
        "fields": {
          "host": [ "192.168.1.2" ],
          "message": [ "the second message" ]
        }
      }
    ]
  }
}

The responses of lookup fields are grouped to maintain the independence of each document from the lookup index. The lookup query for each input value is expected to match at most one document on the lookup index. If the lookup query matches more than one document, then a random document will be selected.
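
The enrichment that a lookup field performs can be pictured as a join on one key. The following Python sketch models the example above with in-memory lists. It is an illustration only: the lookup and enrich functions are hypothetical names, and where several documents match, this sketch takes the first one, whereas Elasticsearch selects a random one.

```python
ip_location = [{"ip": "192.168.1.1", "country": "Canada", "city": "Montreal"}]
logs = [
    {"host": "192.168.1.1", "message": "the first message"},
    {"host": "192.168.1.2", "message": "the second message"},
]


def lookup(input_value, target_index, target_field, fetch_fields):
    """Return the fetched fields of the first document matching the term."""
    for doc in target_index:
        if doc.get(target_field) == input_value:
            return {name: [doc[name]] for name in fetch_fields if name in doc}
    return None


def enrich(hit):
    """Build a fields section, adding location only when the lookup matches."""
    fields = {"host": [hit["host"]], "message": [hit["message"]]}
    location = lookup(hit["host"], ip_location, "ip", ["country", "city"])
    if location is not None:
        fields["location"] = [location]
    return fields


for hit in logs:
    print(enrich(hit))
```

The second log entry has no counterpart in ip_location, so its fields section has no location key, matching the response shown above.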

Index a runtime field

Runtime fields are defined by the context where they run. For example, you can define runtime fields in the context of a search query or within the runtime section of an index mapping. If you decide to index a runtime field for greater performance, just move the full runtime field definition (including the script) to the context of an index mapping. Elasticsearch automatically uses these indexed fields to drive queries, resulting in a fast response time. This capability means you can write a script only once, and apply it to any context that supports runtime fields.

Note
Indexing a composite runtime field is currently not supported.

You can then use runtime fields to limit the number of fields that Elasticsearch needs to calculate values for. Using indexed fields in tandem with runtime fields provides flexibility in the data that you index and how you define queries for other fields.

Important
After indexing a runtime field, you cannot update the included script. If you need to change the script, create a new field with the updated script.

For example, let’s say your company wants to replace some old pressure valves. The connected sensors are only capable of reporting a fraction of the true readings. Rather than outfit the pressure valves with new sensors, you decide to calculate the values based on reported readings. Based on the reported data, you define the following fields in your mapping for my-index-000001:

PUT my-index-000001/
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date"
      },
      "temperature": {
        "type": "long"
      },
      "voltage": {
        "type": "double"
      },
      "node": {
        "type": "keyword"
      }
    }
  }
}

You then bulk index some sample data from your sensors. This data includes voltage readings for each sensor:

POST my-index-000001/_bulk?refresh=true
{"index":{}}
{"timestamp": 1516729294000, "temperature": 200, "voltage": 5.2, "node": "a"}
{"index":{}}
{"timestamp": 1516642894000, "temperature": 201, "voltage": 5.8, "node": "b"}
{"index":{}}
{"timestamp": 1516556494000, "temperature": 202, "voltage": 5.1, "node": "a"}
{"index":{}}
{"timestamp": 1516470094000, "temperature": 198, "voltage": 5.6, "node": "b"}
{"index":{}}
{"timestamp": 1516383694000, "temperature": 200, "voltage": 4.2, "node": "c"}
{"index":{}}
{"timestamp": 1516297294000, "temperature": 202, "voltage": 4.0, "node": "c"}

After talking to a few site engineers, you realize that the sensors should be reporting at least double the current values, but potentially higher. You create a runtime field named voltage_corrected that retrieves the current voltage and multiplies it by 2:

PUT my-index-000001/_mapping
{
  "runtime": {
    "voltage_corrected": {
      "type": "double",
      "script": {
        "source": """
        emit(doc['voltage'].value * params['multiplier'])
        """,
        "params": {
          "multiplier": 2
        }
      }
    }
  }
}

You retrieve the calculated values using the fields parameter on the _search API:

GET my-index-000001/_search
{
  "fields": [
    "voltage_corrected",
    "node"
  ],
  "size": 2
}

After reviewing the sensor data and running some tests, you determine that the multiplier for reported sensor data should be 4. To gain greater performance, you decide to index the voltage_corrected runtime field with the new multiplier parameter.

In a new index, copy the voltage_corrected runtime field definition into the mappings of the new index. It’s that simple! This example reuses the name my-index-000001, so delete the earlier index first or choose a different name. You can add an optional parameter named on_script_error that determines whether to reject the entire document if the script throws an error at index time (the default).

PUT my-index-000001/
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date"
      },
      "temperature": {
        "type": "long"
      },
      "voltage": {
        "type": "double"
      },
      "node": {
        "type": "keyword"
      },
      "voltage_corrected": {
        "type": "double",
        "on_script_error": "fail", (1)
        "script": {
          "source": """
        emit(doc['voltage'].value * params['multiplier'])
        """,
          "params": {
            "multiplier": 4
          }
        }
      }
    }
  }
}

  1. Causes the entire document to be rejected if the script throws an error at index time. Setting the value to ignore will register the field in the document’s _ignored metadata field and continue indexing.

Bulk index some sample data from your sensors into the my-index-000001 index:

POST my-index-000001/_bulk?refresh=true
{ "index": {}}
{ "timestamp": 1516729294000, "temperature": 200, "voltage": 5.2, "node": "a"}
{ "index": {}}
{ "timestamp": 1516642894000, "temperature": 201, "voltage": 5.8, "node": "b"}
{ "index": {}}
{ "timestamp": 1516556494000, "temperature": 202, "voltage": 5.1, "node": "a"}
{ "index": {}}
{ "timestamp": 1516470094000, "temperature": 198, "voltage": 5.6, "node": "b"}
{ "index": {}}
{ "timestamp": 1516383694000, "temperature": 200, "voltage": 4.2, "node": "c"}
{ "index": {}}
{ "timestamp": 1516297294000, "temperature": 202, "voltage": 4.0, "node": "c"}

You can now retrieve calculated values in a search query, and find documents based on precise values. The following range query returns all documents where the calculated voltage_corrected is greater than or equal to 16, but less than or equal to 20. Again, use the fields parameter on the _search API to retrieve the fields you want:

POST my-index-000001/_search
{
  "query": {
    "range": {
      "voltage_corrected": {
        "gte": 16,
        "lte": 20,
        "boost": 1.0
      }
    }
  },
  "fields": ["voltage_corrected", "node"]
}

The response includes the voltage_corrected field for the documents that match the range query, based on the calculated value of the included script:

{
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_id" : "yoSLrHgBdg9xpPrUZz_P",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : 1516383694000,
          "temperature" : 200,
          "voltage" : 4.2,
          "node" : "c"
        },
        "fields" : {
          "voltage_corrected" : [
            16.8
          ],
          "node" : [
            "c"
          ]
        }
      },
      {
        "_index" : "my-index-000001",
        "_id" : "y4SLrHgBdg9xpPrUZz_P",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : 1516297294000,
          "temperature" : 202,
          "voltage" : 4.0,
          "node" : "c"
        },
        "fields" : {
          "voltage_corrected" : [
            16.0
          ],
          "node" : [
            "c"
          ]
        }
      }
    ]
  }
}
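
To check why only two documents match, you can reproduce the arithmetic outside {es}. The following Python sketch is illustrative only: it applies the same multiplier to the sample voltages from the bulk request and filters on the same bounds. The function name and data layout are assumptions made for this example.

```python
# (voltage, node) pairs copied from the sample bulk request.
SAMPLES = [(5.2, "a"), (5.8, "b"), (5.1, "a"), (5.6, "b"), (4.2, "c"), (4.0, "c")]


def corrected_in_range(samples, multiplier=4, gte=16, lte=20):
    """Return (voltage_corrected, node) pairs whose corrected value is in [gte, lte]."""
    matches = []
    for voltage, node in samples:
        corrected = voltage * multiplier  # same expression as the mapping script
        if gte <= corrected <= lte:
            matches.append((round(corrected, 1), node))
    return matches


print(corrected_in_range(SAMPLES))
```

Only the two readings from node c fall inside the range, which agrees with the two hits in the response.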

Explore your data with runtime fields

Consider a large set of log data that you want to extract fields from. Indexing the data is time-consuming and uses a lot of disk space, and you just want to explore the data structure without committing to a schema up front.

You know that your log data contains specific fields that you want to extract. In this case, we want to focus on the @timestamp and message fields. By using runtime fields, you can define scripts to calculate values at search time for these fields.

Define indexed fields as a starting point

You can start with a simple example by adding the @timestamp and message fields to the my-index-000001 mapping as indexed fields. To remain flexible, use wildcard as the field type for message:

PUT /my-index-000001/
{
  "mappings": {
    "properties": {
      "@timestamp": {
        "format": "strict_date_optional_time||epoch_second",
        "type": "date"
      },
      "message": {
        "type": "wildcard"
      }
    }
  }
}

Ingest some data

After mapping the fields you want to retrieve, index a few records from your log data into {es}. The following request uses the bulk API to index raw log data into my-index-000001. Instead of indexing all of your log data, you can use a small sample to experiment with runtime fields.

The final document is not in a valid Apache log format, but we can account for that scenario in our script.

POST /my-index-000001/_bulk?refresh
{"index":{}}
{"timestamp":"2020-04-30T14:30:17-05:00","message":"40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:30:53-05:00","message":"232.0.0.0 - - [30/Apr/2020:14:30:53 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:12-05:00","message":"26.1.0.0 - - [30/Apr/2020:14:31:12 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:19-05:00","message":"247.37.0.0 - - [30/Apr/2020:14:31:19 -0500] \"GET /french/splash_inet.html HTTP/1.0\" 200 3781"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:22-05:00","message":"247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:27-05:00","message":"252.0.0.0 - - [30/Apr/2020:14:31:27 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:28-05:00","message":"not a valid apache log"}

At this point, you can view how {es} stores your raw data.

GET /my-index-000001

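The mapping contains the two fields you defined, @timestamp and message. It also contains a timestamp field, which {es} mapped dynamically as a date because the sample documents use timestamp rather than @timestamp.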

{
  "my-index-000001" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "@timestamp" : {
          "type" : "date",
          "format" : "strict_date_optional_time||epoch_second"
        },
        "message" : {
          "type" : "wildcard"
        },
        "timestamp" : {
          "type" : "date"
        }
      }
    },
    ...
  }
}

Define a runtime field with a grok pattern

If you want to retrieve results that include clientip, you can add that field as a runtime field in the mapping. The following runtime script defines a grok pattern that extracts structured fields out of a single text field within a document. A grok pattern is like a regular expression that supports aliased expressions that you can reuse.

The script matches on the %{COMMONAPACHELOG} log pattern, which understands the structure of Apache logs. If the pattern matches (clientip != null), the script emits the value of the matching IP address. If the pattern doesn’t match, the script emits no value for that document instead of failing.

PUT my-index-000001/_mappings
{
  "runtime": {
    "http.client_ip": {
      "type": "ip",
      "script": """
        String clientip=grok('%{COMMONAPACHELOG}').extract(doc["message"].value)?.clientip;
        if (clientip != null) emit(clientip); (1)
      """
    }
  }
}
  1. This condition ensures that the script doesn’t crash even if the pattern of the message doesn’t match.
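
The match-then-guard logic of this script can be sketched in plain Python. The regular expression below is a simplified stand-in for %{COMMONAPACHELOG}, not the real grok pattern, and client_ips is a hypothetical helper that returns a list to mimic emitting zero or one value.

```python
import re

# Simplified stand-in for %{COMMONAPACHELOG}; the real grok pattern is more thorough.
APACHE_LINE = re.compile(
    r'^(?P<clientip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)$'
)


def client_ips(message):
    """Return the emitted values: one IP on a match, nothing otherwise."""
    match = APACHE_LINE.match(message)
    clientip = match.group("clientip") if match else None
    # Same guard as the runtime script: only emit when a value was extracted.
    return [clientip] if clientip is not None else []


valid = '40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] "GET /images/hm_bg.jpg HTTP/1.0" 200 24736'
print(client_ips(valid))
print(client_ips("not a valid apache log"))
```

The second call returns an empty list, which is the sketch's equivalent of a document that has no value for the runtime field.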

Alternatively, you can define the same runtime field but in the context of a search request. The type and the script are exactly the same as in the definition added previously to the index mapping. Only the field name differs: http.clientip here instead of http.client_ip. Just copy that definition into the search request under the runtime_mappings section and include a query that matches on the runtime field. This query returns the same results as a query on the http.client_ip runtime field defined in your index mappings, but the field exists only in the context of this specific search:

GET my-index-000001/_search
{
  "runtime_mappings": {
    "http.clientip": {
      "type": "ip",
      "script": """
        String clientip=grok('%{COMMONAPACHELOG}').extract(doc["message"].value)?.clientip;
        if (clientip != null) emit(clientip);
      """
    }
  },
  "query": {
    "match": {
      "http.clientip": "40.135.0.0"
    }
  },
  "fields" : ["http.clientip"]
}

Define a composite runtime field

You can also define a composite runtime field to emit multiple fields from a single script. You can define a set of typed subfields and emit a map of values. At search time, each subfield retrieves the value associated with its name in the map. This means that you only need to specify your grok pattern one time and can return multiple values:

PUT my-index-000001/_mappings
{
  "runtime": {
    "http": {
      "type": "composite",
      "script": "emit(grok(\"%{COMMONAPACHELOG}\").extract(doc[\"message\"].value))",
      "fields": {
        "clientip": {
          "type": "ip"
        },
        "verb": {
          "type": "keyword"
        },
        "response": {
          "type": "long"
        }
      }
    }
  }
}
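
One way to picture how the emitted map feeds the declared subfields is the following Python sketch. It is an illustration under stated assumptions, not the {es} implementation: SUBFIELD_TYPES and composite_fields are hypothetical names, and the input dictionary stands in for the values a grok match might produce.

```python
# Declared subfields and a Python type standing in for each mapping type.
SUBFIELD_TYPES = {"clientip": str, "verb": str, "response": int}


def composite_fields(extracted, prefix="http"):
    """Map one emitted dict onto the declared subfields, skipping undeclared keys."""
    return {
        f"{prefix}.{name}": [cast(extracted[name])]
        for name, cast in SUBFIELD_TYPES.items()
        if name in extracted
    }


emitted = {"clientip": "40.135.0.0", "verb": "GET", "response": "200", "bytes": "24736"}
print(composite_fields(emitted))
```

In this sketch the bytes key is dropped because no subfield with that name is declared, and response is converted to a number because its subfield is typed as long.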

Search for a specific IP address

Using the http.clientip runtime field, you can define a simple query to run a search for a specific IP address and return all related fields.

GET my-index-000001/_search
{
  "query": {
    "match": {
      "http.clientip": "40.135.0.0"
    }
  },
  "fields" : ["*"]
}

The API returns the following result. Because http is a composite runtime field, the response includes each of the subfields under fields, including any associated values that match the query. Without building your data structure in advance, you can search and explore your data in meaningful ways to experiment and determine which fields to index.

{
  ...
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_id" : "sRVHBnwBB-qjgFni7h_O",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : "2020-04-30T14:30:17-05:00",
          "message" : "40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"
        },
        "fields" : {
          "http.verb" : [
            "GET"
          ],
          "http.clientip" : [
            "40.135.0.0"
          ],
          "http.response" : [
            200
          ],
          "message" : [
            "40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"
          ],
          "http.client_ip" : [
            "40.135.0.0"
          ],
          "timestamp" : [
            "2020-04-30T19:30:17.000Z"
          ]
        }
      }
    ]
  }
}

Also, remember that if statement in the script?

if (clientip != null) emit(clientip);

If the script didn’t include this condition, the query would fail on any shard that contains a document whose message doesn’t match the pattern. By including this condition, the query skips data that doesn’t match the grok pattern.

Search for documents in a specific range

You can also run a range query that operates on the timestamp field. The following query returns any documents where the timestamp is greater than or equal to 2020-04-30T14:31:27-05:00:

GET my-index-000001/_search
{
  "query": {
    "range": {
      "timestamp": {
        "gte": "2020-04-30T14:31:27-05:00"
      }
    }
  }
}

The response includes the document where the log format doesn’t match, but the timestamp falls within the defined range.

{
  ...
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_id" : "hdEhyncBRSB6iD-PoBqe",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : "2020-04-30T14:31:27-05:00",
          "message" : "252.0.0.0 - - [30/Apr/2020:14:31:27 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"
        }
      },
      {
        "_index" : "my-index-000001",
        "_id" : "htEhyncBRSB6iD-PoBqe",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : "2020-04-30T14:31:28-05:00",
          "message" : "not a valid apache log"
        }
      }
    ]
  }
}
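
The range comparison works on absolute instants, so the UTC offset in each timestamp matters. As a rough check outside {es}, this Python sketch filters the sample timestamps against the same bound and converts one of them to UTC. The helper names are assumptions made for this example.

```python
from datetime import datetime, timezone

# Timestamps copied from the sample log documents.
TIMESTAMPS = [
    "2020-04-30T14:30:17-05:00", "2020-04-30T14:30:53-05:00",
    "2020-04-30T14:31:12-05:00", "2020-04-30T14:31:19-05:00",
    "2020-04-30T14:31:22-05:00", "2020-04-30T14:31:27-05:00",
    "2020-04-30T14:31:28-05:00",
]


def at_or_after(timestamps, bound):
    """Keep ISO 8601 timestamps that are >= bound, comparing absolute instants."""
    limit = datetime.fromisoformat(bound)
    return [ts for ts in timestamps if datetime.fromisoformat(ts) >= limit]


def to_utc(ts):
    """Render a timestamp in UTC with millisecond precision and a Z suffix."""
    utc = datetime.fromisoformat(ts).astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


print(at_or_after(TIMESTAMPS, "2020-04-30T14:31:27-05:00"))
print(to_utc("2020-04-30T14:30:17-05:00"))
```

The filter keeps the last two timestamps, and the UTC conversion shows why a value written with a -05:00 offset appears five hours later when it is returned through the fields parameter.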

Define a runtime field with a dissect pattern

If you don’t need the power of regular expressions, you can use dissect patterns instead of grok patterns. Dissect patterns match on fixed delimiters but are typically faster than grok.

You can use dissect to achieve the same results as parsing the Apache logs with a grok pattern. Instead of matching on a log pattern, you include the parts of the string that you want to discard. Paying special attention to the parts of the string you want to discard will help build successful dissect patterns.

PUT my-index-000001/_mappings
{
  "runtime": {
    "http.client.ip": {
      "type": "ip",
      "script": """
        String clientip=dissect('%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{status} %{size}').extract(doc["message"].value)?.clientip;
        if (clientip != null) emit(clientip);
      """
    }
  }
}

Similarly, you can define a dissect pattern to extract the HTTP response code:

PUT my-index-000001/_mappings
{
  "runtime": {
    "http.responses": {
      "type": "long",
      "script": """
        String response=dissect('%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}').extract(doc["message"].value)?.response;
        if (response != null) emit(Integer.parseInt(response));
      """
    }
  }
}
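
To make the delimiter-based matching concrete, here is a minimal Python sketch of the idea behind a dissect pattern: the literal text between the %{key} markers is used to cut the string into named pieces. It is a simplified illustration, not the {es} dissect implementation, and it ignores key modifiers. The dissect function and PATTERN constant are hypothetical names.

```python
import re

PATTERN = ('%{clientip} %{ident} %{auth} [%{@timestamp}] '
           '"%{verb} %{request} HTTP/%{httpversion}" %{status} %{size}')


def dissect(pattern, text):
    """Split text on the literals between %{key} markers; None if one is missing."""
    parts = re.split(r"%\{([^}]*)\}", pattern)  # literals at even indexes, keys at odd
    literals, keys = parts[0::2], parts[1::2]
    if not text.startswith(literals[0]):
        return None
    pos, fields = len(literals[0]), {}
    for key, delimiter in zip(keys, literals[1:]):
        # An empty delimiter (after the last key) means "take the rest of the text".
        end = text.find(delimiter, pos) if delimiter else len(text)
        if end == -1:
            return None
        fields[key] = text[pos:end]
        pos = end + len(delimiter)
    return fields


line = '247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] "GET /images/hm_nbg.jpg HTTP/1.0" 304 0'
print(dissect(PATTERN, line))
print(dissect(PATTERN, "not a valid apache log"))
```

Because the sketch only searches for fixed strings, it never needs a regular expression engine for the log line itself, which is the intuition behind dissect typically being faster than grok.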

You can then run a query to retrieve a specific HTTP response using the http.responses runtime field. Use the fields parameter of the _search request to indicate which fields you want to retrieve:

GET my-index-000001/_search
{
  "query": {
    "match": {
      "http.responses": "304"
    }
  },
  "fields" : ["http.client_ip","timestamp","http.verb"]
}

The response includes a single document where the HTTP response is 304:

{
  ...
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_id" : "A2qDy3cBWRMvVAuI7F8M",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : "2020-04-30T14:31:22-05:00",
          "message" : "247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0"
        },
        "fields" : {
          "http.verb" : [
            "GET"
          ],
          "http.client_ip" : [
            "247.37.0.0"
          ],
          "timestamp" : [
            "2020-04-30T19:31:22.000Z"
          ]
        }
      }
    ]
  }
}