---
description: Vector configuration
---

# Configuration

This section covers configuring Vector and creating pipelines like the example shown below. Vector requires only a single TOML configuration file, which you can specify via the `--config` flag when starting Vector:

```bash
vector --config /etc/vector/vector.toml
```

## Example

{% code-tabs %} {% code-tabs-item title="vector.toml" %}

```toml
# Set global options
data_dir = "/var/lib/vector"

# Ingest data by tailing one or more files
[sources.apache_logs]
  type         = "file"
  include      = ["/var/log/apache2/*.log"]    # supports globbing
  ignore_older = 86400                         # 1 day

# Structure and parse the data
[transforms.apache_parser]
  inputs       = ["apache_logs"]
  type         = "regex_parser"                # fast/powerful regex
  regex        = '^(?P<host>[\w\.]+) - (?P<user>[\w]+) (?P<bytes_in>[\d]+) \[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$'

# Sample the data to save on cost
[transforms.apache_sampler]
  inputs       = ["apache_parser"]
  type         = "sampler"
  hash_field   = "request_id"                  # sample _entire_ requests
  rate         = 50                            # only keep 50%

# Send structured data to a short-term storage
[sinks.es_cluster]
  inputs       = ["apache_sampler"]            # only take sampled data
  type         = "elasticsearch"
  host         = "http://79.12.221.222:9200"   # local or external host
  index        = "vector-%Y-%m-%d"             # daily indices

# Send structured data to a cost-effective long-term storage
[sinks.s3_archives]
  inputs       = ["apache_parser"]             # don't sample for S3
  type         = "aws_s3"
  region       = "us-east-1"
  bucket       = "my-log-archives"
  key_prefix   = "date=%Y-%m-%d"               # daily partitions, hive friendly format
  batch_size   = 10000000                      # 10mb uncompressed
  gzip         = true                          # compress final objects
  encoding     = "ndjson"                      # newline-delimited JSON
```

{% endcode-tabs-item %} {% endcode-tabs %}

## Global Options

| Key | Type | Description |
| :--- | :--- | :--- |
| `data_dir` | `string` | **Optional.** The directory used for persisting Vector state, such as on-disk buffers, file checkpoints, and more. Make sure the Vector process has write permissions to this directory. No default. Example: `"/var/lib/vector"`. See [Data Directory](#data-directory) for more info. |

## Sources

| Name | Description |
| :--- | :--- |
| `file` | Ingests data through one or more local files and outputs log events. |
| `statsd` | Ingests data through the StatsD UDP protocol and outputs metric events. |
| `stdin` | Ingests data through standard input (STDIN) and outputs log events. |
| `syslog` | Ingests data through the Syslog protocol (RFC 5424) and outputs log events. |
| `tcp` | Ingests data through the TCP protocol and outputs log events. |
| `udp` | Ingests data through the UDP protocol and outputs log events. |
| `vector` | Ingests data from another upstream Vector instance and outputs log events. |

+ request a new source

## Transforms

| Name | Description |
| :--- | :--- |
| `add_fields` | Accepts log events and allows you to add one or more fields. |
| `coercer` | Accepts log events and allows you to coerce event fields into fixed types. |
| `field_filter` | Accepts log and metric events and allows you to filter events by a field's value. |
| `grok_parser` | Accepts log events and allows you to parse a field value with Grok. |
| `json_parser` | Accepts log events and allows you to parse a field value as JSON. |
| `log_to_metric` | Accepts log events and allows you to convert logs into one or more metrics. |
| `lua` | Accepts log events and allows you to transform events with a full embedded Lua engine. |
| `regex_parser` | Accepts log events and allows you to parse a field's value with a regular expression. |
| `remove_fields` | Accepts log and metric events and allows you to remove one or more event fields. |
| `sampler` | Accepts log events and allows you to sample events at a configurable rate. |
| `tokenizer` | Accepts log events and allows you to tokenize a field's value by splitting on whitespace, ignoring special wrapping characters, and zipping the tokens into ordered field names. |

+ request a new transform

## Sinks

| Name | Description |
| :--- | :--- |
| `aws_cloudwatch_logs` | Batches log events to AWS CloudWatch Logs via the PutLogEvents API endpoint. |
| `aws_kinesis_streams` | Batches log events to AWS Kinesis Data Streams via the PutRecords API endpoint. |
| `aws_s3` | Batches log events to AWS S3 via the PutObject API endpoint. |
| `blackhole` | Streams log and metric events to a blackhole that simply discards data, designed for testing and benchmarking purposes. |
| `clickhouse` | Batches log events to ClickHouse via the HTTP interface. |
| `console` | Streams log and metric events to the console (STDOUT or STDERR). |
| `elasticsearch` | Batches log events to Elasticsearch via the _bulk API endpoint. |
| `http` | Batches log events to a generic HTTP endpoint. |
| `kafka` | Streams log events to Apache Kafka via the Kafka protocol. |
| `prometheus` | Exposes metric events via a Prometheus metrics endpoint. |
| `splunk_hec` | Batches log events to a Splunk HTTP Event Collector. |
| `tcp` | Streams log events to a TCP connection. |
| `vector` | Streams log events to another downstream Vector instance. |

+ request a new sink

## How It Works

### Composition

The primary purpose of the configuration file is to compose pipelines. Pipelines are formed by connecting sources, transforms, and sinks through the `inputs` option.

Notice in the example above how each `inputs` entry references the ID assigned to a previous source or transform.
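
As a minimal sketch with hypothetical component IDs, a pipeline is simply a chain of IDs through `inputs`:

```toml
[sources.my_source]                  # hypothetical IDs throughout
  type = "stdin"                     # read raw lines from STDIN

[transforms.my_transform]
  inputs = ["my_source"]             # consume the source above
  type   = "json_parser"             # parse each line as JSON

[sinks.my_sink]
  inputs = ["my_transform"]          # consume the transform's output
  type   = "console"                 # write events to STDOUT
```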

### Config File Location

The location of your Vector configuration file depends on your platform and operating system. For most Linux-based systems the file can be found at `/etc/vector/vector.toml`.

### Data Directory

Vector requires a `data_dir` value for on-disk operations. Currently, the only feature using this directory is Vector's on-disk buffering. Buffers are memory-based by default, but if you switch them to disk-based you'll need to specify a `data_dir`.
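
As a sketch, assuming a hypothetical sink ID and an illustrative size cap (the exact buffer options may vary by version), a disk-based buffer is configured roughly like this:

```toml
data_dir = "/var/lib/vector"          # disk buffer files are persisted here

[sinks.my_sink_id]                    # hypothetical sink
  inputs = ["apache_parser"]
  type   = "console"

  [sinks.my_sink_id.buffer]
    type     = "disk"                 # default is "memory"; "disk" requires data_dir
    max_size = 104900000              # illustrative cap, in bytes (~100mb)
```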

### Environment Variables

Vector will interpolate environment variables within your configuration file with the following syntax:

{% code-tabs %} {% code-tabs-item title="vector.toml" %}

```toml
[transforms.add_host]
    type = "add_fields"
    
    [transforms.add_host.fields]
        host = "${HOSTNAME}"
```

{% endcode-tabs-item %} {% endcode-tabs %}

The entire `${HOSTNAME}` reference is replaced with the variable's value, hence the requirement of quotes around the definition so the interpolated result remains a valid TOML string.
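
For instance, assuming `HOSTNAME` is set to the hypothetical value `my.host.com`, the configuration above resolves to:

```toml
[transforms.add_host]
    type = "add_fields"

    [transforms.add_host.fields]
        host = "my.host.com"          # "${HOSTNAME}" replaced with its value
```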

#### Escaping

You can escape environment variables by preceding them with an extra `$` character. For example, `$${HOSTNAME}` will be treated literally in the above environment variable example, as the sketch below shows.
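
A minimal sketch of the escaped form, reusing the hypothetical `add_host` transform from above:

```toml
[transforms.add_host.fields]
    host = "$${HOSTNAME}"             # resolves to the literal string "${HOSTNAME}"
```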

### Format

The Vector configuration file uses the TOML format, chosen for its simplicity, explicitness, and relaxed whitespace parsing. For more information, please refer to the excellent TOML documentation.

### Template Syntax

Select configuration options support Vector's template syntax to produce dynamic values derived from the event's data. There are two special syntaxes:

1. Strftime specifiers. Ex: `date=%Y/%m/%d`
2. Event fields. Ex: `{{ field_name }}`

Each is described in more detail below.

#### Strftime specifiers

For simplicity, Vector allows you to supply strftime specifiers directly as part of the value to produce formatted timestamp values based off of the event's `timestamp` field.

For example, given the following log event:

```
LogEvent {
    "timestamp": chrono::DateTime<2019-05-02T00:23:22Z>,
    "message": "message",
    "host": "my.host.com"
}
```

And the following configuration:

{% code-tabs %} {% code-tabs-item title="vector.toml" %}

```toml
[sinks.my_s3_sink_id]
  type = "aws_s3"
  key_prefix = "date=%Y-%m-%d"
```

{% endcode-tabs-item %} {% endcode-tabs %}

Vector would produce the following value for the `key_prefix` field:

```text
date=2019-05-02
```

This effectively enables time partitioning for the `aws_s3` sink.

#### Event fields

In addition to formatting the `timestamp` field, Vector allows you to directly access event fields with the `{{ <field-name> }}` syntax.

For example, given the following log event:

```
LogEvent {
    "timestamp": chrono::DateTime<2019-05-02T00:23:22Z>,
    "message": "message",
    "application_id": 1
}
```

And the following configuration:

{% code-tabs %} {% code-tabs-item title="vector.toml" %}

```toml
[sinks.my_s3_sink_id]
  type = "aws_s3"
  key_prefix = "application_id={{ application_id }}/date=%Y-%m-%d"
```

{% endcode-tabs-item %} {% endcode-tabs %}

Vector would produce the following value for the `key_prefix` field:

```text
application_id=1/date=2019-05-02
```

This effectively enables application-specific time partitioning.

### Value Types

All TOML value types are supported. For convenience, this includes strings, integers, floats, booleans, dates/times, arrays, and tables.
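
A short sketch of each type in valid TOML (the option names here are illustrative, not real Vector options):

```toml
string_example   = "apache_logs"          # string
integer_example  = 86400                  # integer
float_example    = 0.5                    # float
boolean_example  = true                   # boolean
datetime_example = 2019-05-02T00:23:22Z   # offset date-time
array_example    = ["in", "out"]          # array

[table_example]                           # table
  nested_key = "value"
```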
