---
description: Deploying and running Vector as a service
---
When Vector serves as a service, its purpose is to efficiently receive, aggregate, and route data downstream. In this scenario, Vector is the primary service on the host and should take full advantage of all resources.
When Vector is deployed as a service, it receives data over the network from
upstream clients or services. Relevant sources include the `vector`, `syslog`,
and `tcp` sources.
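As a minimal sketch of what receiving data over the network can look like, the configuration below declares one source of each type. The source names (`upstream_vector`, `upstream_syslog`, `upstream_tcp`) and the ports are hypothetical, and the `address` and `mode` option names are assumed from Vector's source reference; check them against the version you run.

{% code-tabs %}
{% code-tabs-item title="vector.toml" %}
```toml
# Hypothetical source names and ports; adjust to your environment.

# Receives events forwarded by upstream Vector instances.
[sources.upstream_vector]
type = "vector"
address = "0.0.0.0:9000"

# Receives syslog messages over TCP (option names assumed).
[sources.upstream_syslog]
type = "syslog"
mode = "tcp"
address = "0.0.0.0:514"

# Receives newline-delimited data over a raw TCP socket.
[sources.upstream_tcp]
type = "tcp"
address = "0.0.0.0:9001"
```
{% endcode-tabs-item %}
{% endcode-tabs %}

Binding to `0.0.0.0` accepts connections on all interfaces; restrict the address if only specific networks should reach Vector.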
Vector is designed, by default, to take full advantage of all system
resources, which is usually preferred in the service role.
As a result, there is nothing special you need to do to improve performance.
To ensure Vector does not lose data between restarts, switch the buffer to
disk for all relevant sinks. You can do this by adding a `[buffer]` table
(that is, `[sinks.<sink-id>.buffer]`) to each of your configured sinks. In
addition, we recommend specifying an explicit `data_dir` for Vector's buffers.
For example:
{% code-tabs %} {% code-tabs-item title="vector.toml" %}
data_dir = "/var/lib/vector"
[sinks.backups]
type = "s3"
# ...
[sinks.backups.buffer]
type = "disk"
max_size = 5000000000 # 5gb
{% endcode-tabs-item %} {% endcode-tabs %}
{% hint style="warning" %}
Please make sure that the Vector user has write access to the specified
`data_dir`.
{% endhint %}
Please note that on-disk buffers carry a performance hit of roughly 3X compared to in-memory buffers. We believe this is a worthwhile tradeoff to ensure data is not lost across restarts.
By default, Vector is tuned for performance; no extra system-level configuration steps are necessary.
The hardware needed is highly dependent on your configuration and data volume. Typically, Vector is CPU bound and not memory bound, especially if all buffers are configured to use the disk. Our benchmarks should give you a general idea of resource usage in relation to specific pipelines and data volume.
Vector benefits greatly from parallel processing: the more cores, the better.
For example, if you're on AWS, the `c5d.*` instances offer the best value
given their optimization towards CPU and the fact that they include a fast
local NVMe drive for on-disk buffers.
If you've configured on-disk buffers, then memory should not be your
bottleneck. If you opted to keep buffers in memory, make sure you have at
least 2X your cumulative buffer size available. For example, if you have an
`elasticsearch` sink and an `s3` sink configured to use 100 MB and 1 GB
respectively, you should ensure you have at least 2.2 GB (1.1 GB * 2) of
memory available.
If you've configured on-disk buffers, then we recommend using local NVMe SSD
drives when possible so that disk IO does not become your bottleneck. For
example, if you're on AWS, choose an instance that includes a local NVMe SSD
drive, such as the `c5d.*` instances. The disk should be at least 3 times
your cumulative buffer size.
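As a rough illustration of the disk sizing rule, the sketch below assumes two sinks with on-disk buffers of 100 MB and 1 GB. The sink names `search` and `backups` are hypothetical, and the remaining sink options are elided as `# ...`. The cumulative buffer size is 1.1 GB, so the volume that holds `data_dir` would need at least 3.3 GB (1.1 GB * 3).

{% code-tabs %}
{% code-tabs-item title="vector.toml" %}
```toml
# Assumed to live on the local NVMe volume.
data_dir = "/var/lib/vector"

[sinks.search]
type = "elasticsearch"
# ...

[sinks.search.buffer]
type = "disk"
max_size = 100000000 # 100 MB

[sinks.backups]
type = "s3"
# ...

[sinks.backups.buffer]
type = "disk"
max_size = 1000000000 # 1 GB

# Cumulative buffer size: 100 MB + 1 GB = 1.1 GB
# Minimum disk size:      1.1 GB * 3   = 3.3 GB
```
{% endcode-tabs-item %}
{% endcode-tabs %}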
If you've configured Vector to receive data over the network, then you'll
benefit from load balancing. Select sinks, such as the `http`, `tcp`, and
`vector` sinks, offer built-in load balancing. This is a very rudimentary
form of load balancing that requires all clients to know about the available
downstream hosts. A more formal load balancing strategy is outside the scope
of this document, but is typically achieved with services such as AWS ELB,
HAProxy, and NGINX.
Vector can be reloaded to apply configuration changes. This is the recommended strategy and should be used over restarting when possible.
To update Vector, you'll need to restart the process. As with any service, restarting without disruption is achieved through higher-level design decisions, such as load balancing.