Commit c31dc9a

admin: backpressure: updating for style and consistency
Signed-off-by: Lynette Miles <lynette.miles@chronosphere.io>
1 parent 39f0d28 commit c31dc9a

1 file changed

+110
-59
lines changed

administration/backpressure.md

Lines changed: 110 additions & 59 deletions
@@ -2,68 +2,119 @@
22

33
<img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=63e37cfe-9ce3-4a18-933a-76b9198958c1" />
44

5-
Under certain scenarios it is possible for logs or data to be ingested or created faster than the ability to flush it to some destinations. One such common scenario is when reading from big log files, especially with a large backlog, and dispatching the logs to a backend over the network, which takes time to respond. This generates backpressure leading to high memory consumption in the service.
6-
7-
In order to avoid backpressure, Fluent Bit implements a mechanism in the engine that restricts the amount of data that an input plugin can ingest, this is done through the configuration parameters **Mem\_Buf\_Limit** and **storage.Max\_Chunks\_Up**.
8-
9-
As described in the [Buffering](../concepts/buffering.md) concepts section, Fluent Bit offers two modes for data handling: in-memory only (default) and in-memory + filesystem \(optional\).
10-
11-
The default `storage.type memory` buffer can be restricted with **Mem\_Buf\_Limit**. If memory reaches this limit and you reach a backpressure scenario, you will not be able to ingest more data until the data chunks that are in memory can be flushed. The input will be paused and Fluent Bit will [emit](https://github.com/fluent/fluent-bit/blob/v2.0.0/src/flb_input_chunk.c#L1334) a `[warn] [input] {input name or alias} paused (mem buf overlimit)` log message. Depending on the input plugin in use, this might lead to discard incoming data \(e.g: TCP input plugin\). The tail plugin can handle pause without data loss; it will store its current file offset and resume reading later. When buffer memory is available, the input will resume collecting/accepting logs and Fluent Bit will [emit](https://github.com/fluent/fluent-bit/blob/v2.0.0/src/flb_input_chunk.c#L1277) a `[info] [input] {input name or alias} resume (mem buf overlimit)` message.
12-
13-
This risk of data loss can be mitigated by configuring secondary storage on the filesystem using the `storage.type` of `filesystem` \(as described in [Buffering & Storage](buffering-and-storage.md)\). Initially, logs will be buffered to *both* memory and filesystem. When the `storage.max_chunks_up` limit is reached, all the new data will be stored safely only in the filesystem. Fluent Bit will stop enqueueing new data in memory and will only buffer to the filesystem. Please note that when `storage.type filesystem` is set, the `Mem_Buf_Limit` setting no longer has any effect, instead, the `[SERVICE]` level `storage.max_chunks_up` setting controls the size of the memory buffer.
14-
15-
## Mem\_Buf\_Limit
16-
17-
This option is disabled by default and can be applied to all input plugins. Please note that `Mem_Buf_Limit` only applies with the default `storage.type memory`. Let's explain its behavior using the following scenario:
18-
19-
* Mem\_Buf\_Limit is set to 1MB \(one megabyte\)
20-
* input plugin tries to append 700KB
21-
* engine route the data to an output plugin
22-
* output plugin backend \(HTTP Server\) is down
23-
* engine scheduler will retry the flush after 10 seconds
24-
* input plugin tries to append 500KB
25-
26-
At this exact point, the engine will **allow** appending those 500KB of data into the memory; in total it will have 1.2MB of data buffered. The limit is permissive and will allow a single write past the limit, but once the limit is **exceeded** the following actions are taken:
27-
28-
* block local buffers for the input plugin \(cannot append more data\)
29-
* notify the input plugin invoking a **pause** callback
30-
31-
The engine will protect itself and will not append more data coming from the input plugin in question; note that it is the responsibility of the plugin to keep state and decide what to do in that _paused_ state.
32-
33-
After some time, usually measured in seconds, if the scheduler was able to flush the initial 700KB of data or it has given up after retrying, that amount of memory is released and the following actions will occur:
34-
35-
* Upon data buffer release \(700KB\), the internal counters get updated
36-
* Counters now are set at 500KB
37-
* Since 500KB is &lt; 1MB it checks the input plugin state
38-
* If the plugin is paused, it invokes a **resume** callback
39-
* input plugin can continue appending more data
40-
41-
## storage.max\_chunks\_up
42-
43-
Please note that when `storage.type filesystem` is set, the `Mem_Buf_Limit` setting no longer has any effect, instead, the `[SERVICE]` level `storage.max_chunks_up` setting controls the size of the memory buffer.
44-
45-
The setting behaves similarly to the above scenario with `Mem_Buf_Limit` when the non-default `storage.pause_on_chunks_overlimit` is enabled.
46-
47-
When (default) `storage.pause_on_chunks_overlimit` is disabled, the input will not pause when the memory limit is reached. Instead, it will switch to only buffering logs in the filesystem. The disk spaced used for filesystem buffering can be limited with `storage.total_limit_size`.
48-
49-
See the [Buffering & Storage](buffering-and-storage.md) docs for more information.
50-
51-
## About pause and resume Callbacks
52-
53-
Each plugin is independent and not all of them implements the **pause** and **resume** callbacks. As said, these callbacks are just a notification mechanism for the plugin.
54-
55-
One example of a plugin that implements these callbacks and keeps state correctly is the [Tail Input](../pipeline/inputs/tail.md) plugin. When the **pause** callback is triggered, it pauses its collectors and stops appending data. Upon **resume**, it resumes the collectors and continues ingesting data. Tail will track the current file offset when it pauses and resume at the same position. If the file has not been deleted or moved, it can still be read.
56-
57-
With the default `storage.type memory` and `Mem_Buf_Limit`, the following log messages will be emitted for pause and resume:
58-
59-
```
5+
It's possible for logs or data to be ingested or created faster than the ability to
6+
flush it to some destinations. A common scenario is when reading from big log files,
7+
especially with a large backlog, and dispatching the logs to a backend over the
8+
network, which takes time to respond. This generates backpressure leading to high
9+
memory consumption in the service.
10+
11+
To avoid backpressure, Fluent Bit implements a mechanism in the engine that restricts
12+
the amount of data an input plugin can ingest. Restriction is done through the
13+
configuration parameters `Mem_Buf_Limit` and `storage.Max_Chunks_Up`.
14+
15+
As described in the [Buffering](../concepts/buffering.md) concepts section, Fluent
16+
Bit offers two modes for data handling: in-memory only (default) and in-memory +
17+
filesystem (optional).
18+
19+
The default `storage.type memory` buffer can be restricted with `Mem_Buf_Limit`. If
20+
memory reaches this limit and you reach a backpressure scenario, you won't be able
21+
to ingest more data until the data chunks that are in memory can be flushed. The
22+
input pauses and Fluent Bit
23+
[emits](https://github.com/fluent/fluent-bit/blob/v2.0.0/src/flb_input_chunk.c#L1334)
24+
a `[warn] [input] {input name or alias} paused (mem buf overlimit)` log message.
25+
26+
Depending on the input plugin in use, this might lead to discarding incoming data (for
27+
example, TCP input plugin). The tail plugin can handle pause without data loss; it
28+
stores its current file offset and resumes reading later. When buffer memory is
29+
available, the input resumes accepting logs. Fluent Bit
30+
[emits](https://github.com/fluent/fluent-bit/blob/v2.0.0/src/flb_input_chunk.c#L1277)
31+
a `[info] [input] {input name or alias} resume (mem buf overlimit)` message.
32+
33+
Mitigate the risk of data loss by configuring secondary storage on the filesystem
34+
using the `storage.type` of `filesystem` (as described in [Buffering &
35+
Storage](buffering-and-storage.md)). Initially, logs will be buffered to both memory
36+
and the filesystem. When the `storage.max_chunks_up` limit is reached, all new data
37+
will be stored safely in the filesystem. Fluent Bit stops queueing new data in memory
38+
and only buffers to the filesystem. When `storage.type filesystem` is set, the
39+
`Mem_Buf_Limit` setting no longer has any effect. Instead, the `[SERVICE]` level
40+
`storage.max_chunks_up` setting controls the size of the memory buffer.
41+
42+
## `Mem_Buf_Limit`
43+
44+
`Mem_Buf_Limit` only applies with the default `storage.type memory`. This option is
45+
disabled by default and can be applied to all input plugins.
46+
47+
As an example situation:
48+
49+
- `Mem_Buf_Limit` is set to `1MB`.
50+
- The input plugin tries to append 700KB.
51+
- The engine routes the data to an output plugin.
52+
- The output plugin backend (HTTP Server) is down.
53+
- The engine scheduler retries the flush after 10 seconds.
54+
- The input plugin tries to append 500KB.
55+
56+
In this situation, the engine allows appending those 500KB of data into the memory,
57+
with a total of 1.2 MB of data buffered. The limit is permissive and will
58+
allow a single write past the limit. Once the limit is exceeded, the following
59+
actions are taken:
60+
61+
- Block local buffers for the input plugin (can't append more data).
62+
- Notify the input plugin, invoking a `pause` callback.
63+
64+
The engine protects itself and won't append more data coming from the input plugin in
65+
question. It's the responsibility of the plugin to keep state and decide what to do
66+
in a `paused` state.
67+
68+
In a few seconds, if the scheduler was able to flush the initial 700KB of data or it
69+
has given up after retrying, that amount of memory is released and the following
70+
actions occur:
71+
72+
- Upon data buffer release (700KB), the internal counters get updated.
73+
- Counters now are set at 500KB.
74+
- Because 500KB is less than 1MB, the engine checks the input plugin state.
75+
- If the plugin is paused, it invokes a `resume` callback.
76+
- The input plugin can continue appending more data.
77+
78+
## `storage.max_chunks_up`
79+
80+
The `[SERVICE]` level `storage.max_chunks_up` setting controls the size of the memory
81+
buffer. When `storage.type filesystem` is set, the `Mem_Buf_Limit` setting no longer
82+
has an effect.
83+
84+
The setting behaves similarly to the `Mem_Buf_Limit` scenario when the non-default
85+
`storage.pause_on_chunks_overlimit` is enabled.
86+
87+
When (default) `storage.pause_on_chunks_overlimit` is disabled, the input won't pause
88+
when the memory limit is reached. Instead, it switches to only buffering logs in
89+
the filesystem. Limit the disk space used for filesystem buffering with
90+
`storage.total_limit_size`.
91+
92+
See [Buffering & Storage](buffering-and-storage.md) docs for more information.
93+
94+
## About pause and resume callbacks
95+
96+
Each plugin is independent and not all of them implement `pause` and `resume`
97+
callbacks. These callbacks are a notification mechanism for the plugin.
98+
99+
One example of a plugin that implements these callbacks and keeps state correctly is
100+
the [Tail Input](../pipeline/inputs/tail.md) plugin. When the `pause` callback
101+
triggers, it pauses its collectors and stops appending data. Upon `resume`, it
102+
resumes the collectors and continues ingesting data. Tail tracks the current file
103+
offset when it pauses, and resumes at the same position. If the file hasn't been
104+
deleted or moved, it can still be read.
105+
106+
With the default `storage.type memory` and `Mem_Buf_Limit`, the following log
107+
messages emit for `pause` and `resume`:
108+
109+
```text
60110
[warn] [input] {input name or alias} paused (mem buf overlimit)
61111
[info] [input] {input name or alias} resume (mem buf overlimit)
62112
```
63113

64-
With `storage.type filesystem` and `storage.max_chunks_up`, the following log messages will be emitted for pause and resume:
114+
With `storage.type filesystem` and `storage.max_chunks_up`, the following log
115+
messages emit for `pause` and `resume`:
65116

66-
```
67-
[input] {input name or alias} paused (storage buf overlimit
68-
[input] {input name or alias} resume (storage buf overlimit
117+
```text
118+
[input] {input name or alias} paused (storage buf overlimit)
119+
[input] {input name or alias} resume (storage buf overlimit)
69120
```
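
The `Mem_Buf_Limit` scenario described in the changed file (a 1MB limit, a 700KB append, then a 500KB append) can be sketched as a small simulation. This is a toy model only, not Fluent Bit code: the `InputBuffer` class, its method names, and the `events` list are hypothetical, and 1MB is treated as 1000KB so the total matches the 1.2MB stated in the text.

```python
class InputBuffer:
    """Toy model of one input's in-memory buffer (hypothetical, not Fluent Bit code)."""

    def __init__(self, limit_kb):
        self.limit_kb = limit_kb   # stands in for Mem_Buf_Limit
        self.used_kb = 0           # data currently buffered in memory
        self.paused = False
        self.events = []           # record of pause/resume notifications

    def append(self, size_kb):
        """Try to buffer size_kb of data; return True if it was accepted."""
        if self.paused:
            return False           # local buffers are blocked while paused
        self.used_kb += size_kb    # permissive: this write may cross the limit
        if self.used_kb > self.limit_kb:
            self.paused = True
            self.events.append("pause")
        return True

    def release(self, size_kb):
        """Free size_kb after a flush succeeds or retries are exhausted."""
        self.used_kb -= size_kb
        if self.paused and self.used_kb < self.limit_kb:
            self.paused = False
            self.events.append("resume")


buf = InputBuffer(limit_kb=1000)   # 1MB, treated here as 1000KB
buf.append(700)                    # accepted, under the limit
buf.append(500)                    # accepted, 1200KB buffered, input pauses
buf.append(100)                    # rejected while paused
buf.release(700)                   # 500KB left, input resumes
print(buf.used_kb, buf.events)
```

The check runs after the write rather than before it, which is what lets a single append cross the limit before the input pauses. What a real plugin does with rejected data while paused depends on the plugin, as the text notes for the TCP and tail inputs.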
