
Commit b78cfe9

administration: migrate pending content
Signed-off-by: Eduardo Silva <eduardo@treasure-data.com>
1 parent 75f37fc commit b78cfe9

File tree

6 files changed, +445 -4 lines changed

administration/backpressure.md

Lines changed: 36 additions & 0 deletions
# Backpressure
In certain environments it is common to see that logs or data are being ingested faster than they can be flushed to some destinations. A common case is reading from big log files and dispatching the logs to a backend over the network, which takes some time to respond. This generates backpressure, leading to high memory consumption in the service.
In order to avoid backpressure, Fluent Bit implements a mechanism in the engine that restricts the amount of data that an input plugin can ingest; this is done through the configuration parameter **Mem\_Buf\_Limit**.
## Mem\_Buf\_Limit
This option is disabled by default and can be applied to all input plugins. Let's explain its behavior using the following scenario:
* Mem\_Buf\_Limit is set to 1MB \(one megabyte\)
* the input plugin tries to append 700KB
* the engine routes the data to an output plugin
* the output plugin backend \(HTTP Server\) is down
* the engine scheduler will retry the flush after 10 seconds
* the input plugin tries to append 500KB
At this exact point, the engine will still **allow** appending those 500KB of data; in total we have 1.2MB. The option works in a permissive mode until the limit is reached; once the limit is **exceeded**, the following actions are taken:
* block local buffers for the input plugin \(it cannot append more data\)
* notify the input plugin by invoking a **pause** callback
The engine will protect itself and will not append more data coming from the input plugin in question. Note that it is the plugin's responsibility to keep its state and decide what to do in that _paused_ state.
After some seconds, if the scheduler was able to flush the initial 700KB of data, or it gave up after retrying, that amount of memory is released and internally the following actions happen:
* Upon data buffer release \(700KB\), the internal counters get updated
* The counters are now set to 500KB
* Since 500KB is < 1MB, the engine checks the input plugin state
* If the plugin is paused, it invokes a **resume** callback
* The input plugin can continue appending more data
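The scenario above maps to a configuration like the following sketch \(the tail input and the path are illustrative; the relevant key is **Mem\_Buf\_Limit**\):

```text
[INPUT]
    Name          tail
    Path          /var/log/app/*.log
    Mem_Buf_Limit 1MB
```

With this limit in place, once the engine holds more than 1MB of unflushed data from this input, the plugin is paused until buffers are released.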
## About pause and resume Callbacks
Each plugin is independent and not all of them implement the **pause** and **resume** callbacks. As said, these callbacks are just a notification mechanism for the plugin.
One plugin that implements these callbacks and keeps a good state is the [Tail Input](../input/tail.md) plugin. When the **pause** callback is triggered, it stops its collectors and stops appending data. Upon **resume**, it re-enables the collectors.
Lines changed: 67 additions & 1 deletion
# Fluent Bit and Buffering
The end-goal of [Fluent Bit](https://fluentbit.io) is to collect, parse, filter and ship logs to a central place. In this workflow there are many phases, and one of the critical pieces is the ability to do _buffering_: a mechanism to place processed data into a temporary location until it is ready to be shipped.
By default, when Fluent Bit processes data it uses memory as a primary and temporary place to store the record logs, but there are certain scenarios where it would be ideal to have a persistent buffering mechanism based on the filesystem to provide aggregation and data safety capabilities.
Starting with Fluent Bit v1.0, we introduced a new _storage layer_ that can work either in memory or in the file system. Input plugins can be configured to use one or the other on demand at start time.
## Configuration
The storage layer configuration takes place in two areas:
- Service Section
- Input Section
The Service section configures a global environment for the storage layer, and then each Input section defines which mechanism to use.
### Service Section Configuration
| Key | Description | Default |
| --- | --- | --- |
| storage.path | Set an optional location in the file system to store streams and chunks of data. If this parameter is not set, Input plugins can only use in-memory buffering. | |
| storage.sync | Configure the synchronization mode used to store the data in the file system. It can take the values _normal_ or _full_. | normal |
| storage.checksum | Enable the data integrity check when writing and reading data from the filesystem. The storage layer uses the CRC32 algorithm. | Off |
| storage.backlog.mem_limit | If _storage.path_ is set, Fluent Bit will look for data chunks that were not delivered and are still in the storage layer; these are called _backlog_ data. This option sets a hint of the maximum amount of memory to use when processing these records. | 5M |
A Service section will look like this:
```
[SERVICE]
    flush                     1
    log_level                 info
    storage.path              /var/log/flb-storage/
    storage.sync              normal
    storage.checksum          off
    storage.backlog.mem_limit 5M
```
That configuration sets an optional buffering mechanism whose root path for data is _/var/log/flb-storage/_; it will use _normal_ synchronization mode, no checksum, and up to a maximum of 5MB of memory when processing backlog data.
### Input Section Configuration
Optionally, any Input plugin can configure its storage preference; the following table describes the options available:
| Key | Description | Default |
| --- | --- | --- |
| storage.type | Specify the buffering mechanism to use. It can be _memory_ or _filesystem_. | memory |
The following example configures a service that offers filesystem buffering capabilities and two Input plugins, the first using memory buffering and the second the filesystem:
```
[SERVICE]
    flush                     1
    log_level                 info
    storage.path              /var/log/flb-storage/
    storage.sync              normal
    storage.checksum          off
    storage.backlog.mem_limit 5M

[INPUT]
    name          cpu
    storage.type  filesystem

[INPUT]
    name          mem
    storage.type  memory
```

administration/memory-management.md

Lines changed: 36 additions & 1 deletion
# Memory Usage
In certain scenarios it would be ideal to estimate how much memory Fluent Bit could be using; this is very useful for containerized environments where memory limits are a must.
In order to estimate this we will assume that the input plugins have set the **Mem\_Buf\_Limit** option \(you can learn more about it in the [Backpressure](backpressure.md) section\).
## Estimating
Input plugins append data independently, so in order to do an estimation a limit should be imposed through the **Mem\_Buf\_Limit** option. If the limit was set to _10MB_, we need to estimate that in the worst case the output plugin will likely use up to _20MB_.
Fluent Bit has an internal binary representation for the data being processed, but when this data reaches an output plugin, the plugin will likely create its own representation in a new memory buffer for processing. The best examples are the [InfluxDB](../output/influxdb.md) and [Elasticsearch](../output/elasticsearch.md) output plugins; both need to convert the binary representation to their respective custom JSON formats before talking to their backend servers.
So, if we impose a limit of _10MB_ for the input plugins and consider the worst-case scenario of the output plugins consuming _20MB_ extra, as a minimum we need \(_30MB_ x 1.2\) = **36MB**.
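As a rough sketch, the arithmetic above can be captured in a few lines \(the 2x output factor and the 1.2 overhead multiplier are the assumptions stated in this section, not values reported by Fluent Bit\):

```python
def estimate_memory_mb(mem_buf_limit_mb: float) -> float:
    """Rough worst-case memory estimate following this section's rule of thumb."""
    output_mb = 2 * mem_buf_limit_mb              # output plugins may hold ~2x in their own format
    return (mem_buf_limit_mb + output_mb) * 1.2   # 20% overhead margin

# A 10MB Mem_Buf_Limit suggests provisioning about 36MB.
print(estimate_memory_mb(10))
```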
## Glibc and Memory Fragmentation
It is well known that in intensive environments, where memory allocations happen at a high rate, the default memory allocator provided by Glibc could lead to high fragmentation, reported as high memory usage by the service.
It's strongly suggested that in any production environment Fluent Bit should be built with [jemalloc](http://jemalloc.net/) enabled \(e.g. `-DFLB_JEMALLOC=On`\). Jemalloc is an alternative memory allocator that can reduce fragmentation \(among other things\), resulting in better performance.
You can check if Fluent Bit has been built with Jemalloc using the following command:
```text
$ bin/fluent-bit -h | grep JEMALLOC
```
The output should look like this:
```text
Build Flags =  JSMN_PARENT_LINKS JSMN_STRICT FLB_HAVE_TLS FLB_HAVE_SQLDB
FLB_HAVE_TRACE FLB_HAVE_FLUSH_LIBCO FLB_HAVE_VALGRIND FLB_HAVE_FORK
FLB_HAVE_PROXY_GO FLB_HAVE_JEMALLOC JEMALLOC_MANGLE FLB_HAVE_REGEX
FLB_HAVE_C_TLS FLB_HAVE_SETJMP FLB_HAVE_ACCEPT4 FLB_HAVE_INOTIFY
```
If the FLB\_HAVE\_JEMALLOC option is listed in _Build Flags_, everything is fine.

administration/monitoring.md

Lines changed: 189 additions & 0 deletions
# Monitoring

Fluent Bit comes with a built-in HTTP Server that can be used to query internal information and monitor metrics of each running plugin.
## Getting Started {#getting_started}
To get started, the first step is to enable the HTTP Server from the configuration file:
```text
[SERVICE]
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_PORT    2020

[INPUT]
    Name cpu

[OUTPUT]
    Name  stdout
    Match *
```
The above configuration snippet will instruct Fluent Bit to start its HTTP Server on TCP port 2020, listening on all network interfaces:
```text
$ bin/fluent-bit -c fluent-bit.conf
Fluent-Bit v0.14.x
Copyright (C) Treasure Data

[2017/10/27 19:08:24] [ info] [engine] started
[2017/10/27 19:08:24] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
```
Now a simple **curl** command is enough to gather some information:
```text
$ curl -s http://127.0.0.1:2020 | jq
{
  "fluent-bit": {
    "version": "0.13.0",
    "edition": "Community",
    "flags": [
      "FLB_HAVE_TLS",
      "FLB_HAVE_METRICS",
      "FLB_HAVE_SQLDB",
      "FLB_HAVE_TRACE",
      "FLB_HAVE_HTTP_SERVER",
      "FLB_HAVE_FLUSH_LIBCO",
      "FLB_HAVE_SYSTEMD",
      "FLB_HAVE_VALGRIND",
      "FLB_HAVE_FORK",
      "FLB_HAVE_PROXY_GO",
      "FLB_HAVE_REGEX",
      "FLB_HAVE_C_TLS",
      "FLB_HAVE_SETJMP",
      "FLB_HAVE_ACCEPT4",
      "FLB_HAVE_INOTIFY"
    ]
  }
}
```
Note that we are piping the _curl_ command output into the _jq_ program, which helps to make the JSON data easy to read from the terminal; Fluent Bit itself does not aim to do JSON pretty-printing.
## REST API Interface {#rest_api}
Fluent Bit aims to expose useful interfaces for monitoring; as of Fluent Bit v0.14 the following endpoints are available:
| URI | Description | Data Format |
| :--- | :--- | :--- |
| / | Fluent Bit build information | JSON |
| /api/v1/uptime | Get uptime information in seconds and human-readable format | JSON |
| /api/v1/metrics | Internal metrics per loaded plugin | JSON |
| /api/v1/metrics/prometheus | Internal metrics per loaded plugin, ready to be consumed by a Prometheus Server | Prometheus Text 0.0.4 |
76+
## Uptime Example
Query the service uptime with the following command:
```
$ curl -s http://127.0.0.1:2020/api/v1/uptime | jq
```
It should print output similar to this:
```json
{
  "uptime_sec": 8950000,
  "uptime_hr": "Fluent Bit has been running: 103 days, 14 hours, 6 minutes and 40 seconds"
}
```
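The `uptime_hr` string can be derived from `uptime_sec`. A small sketch \(a hypothetical helper, not part of Fluent Bit\) that reproduces the example above:

```python
def uptime_hr(uptime_sec: int) -> str:
    # Break the total seconds into days, hours, minutes and seconds.
    days, rem = divmod(uptime_sec, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return (f"Fluent Bit has been running: {days} days, {hours} hours, "
            f"{minutes} minutes and {seconds} seconds")

# 8950000 seconds is 103 days, 14 hours, 6 minutes and 40 seconds.
print(uptime_hr(8950000))
```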
## Metrics Examples
Query internal metrics in JSON format with the following command:
```bash
$ curl -s http://127.0.0.1:2020/api/v1/metrics | jq
```
It should print output similar to this:
```json
{
  "input": {
    "cpu.0": {
      "records": 8,
      "bytes": 2536
    }
  },
  "output": {
    "stdout.0": {
      "proc_records": 5,
      "proc_bytes": 1585,
      "errors": 0,
      "retries": 0,
      "retries_failed": 0
    }
  }
}
```
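Since the endpoint returns plain JSON, it is easy to post-process. A sketch \(the counter names follow the example above; the helper is illustrative, not part of Fluent Bit\) that flattens the per-plugin counters:

```python
import json

def flatten_metrics(payload: str) -> dict:
    """Flatten /api/v1/metrics JSON into {'input.cpu.0.records': 8, ...}."""
    metrics = json.loads(payload)
    flat = {}
    for direction, plugins in metrics.items():       # "input" / "output"
        for plugin, counters in plugins.items():     # e.g. "cpu.0"
            for name, value in counters.items():     # e.g. "records"
                flat[f"{direction}.{plugin}.{name}"] = value
    return flat

sample = '{"input": {"cpu.0": {"records": 8, "bytes": 2536}}}'
print(flatten_metrics(sample))
```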
#### Metrics in Prometheus format
Query internal metrics in Prometheus Text 0.0.4 format:
```bash
$ curl -s http://127.0.0.1:2020/api/v1/metrics/prometheus
```
This time the same metrics will be in Prometheus format instead of JSON:
```
fluentbit_input_records_total{name="cpu.0"} 57 1509150350542
fluentbit_input_bytes_total{name="cpu.0"} 18069 1509150350542
fluentbit_output_proc_records_total{name="stdout.0"} 54 1509150350542
fluentbit_output_proc_bytes_total{name="stdout.0"} 17118 1509150350542
fluentbit_output_errors_total{name="stdout.0"} 0 1509150350542
fluentbit_output_retries_total{name="stdout.0"} 0 1509150350542
fluentbit_output_retries_failed_total{name="stdout.0"} 0 1509150350542
```
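The Prometheus text format is line-oriented, so it can also be parsed with minimal code. A hedged sketch \(it handles only the simple `name{labels} value timestamp` sample lines shown above\):

```python
def parse_prom_line(line: str) -> tuple:
    """Parse one 'metric{labels} value timestamp' sample line."""
    name_part, value, timestamp = line.rsplit(" ", 2)
    name, _, labels = name_part.partition("{")
    return name, labels.rstrip("}"), float(value), int(timestamp)

line = 'fluentbit_input_records_total{name="cpu.0"} 57 1509150350542'
print(parse_prom_line(line))
```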
### Configuring Aliases
By default, configured plugins at runtime get an internal name in the format _plugin_name.ID_. For monitoring purposes this can be confusing if many plugins of the same type were configured. To distinguish them, each configured input or output section can get an _alias_ that will be used as the parent name for the metric.
The following example sets an alias on the INPUT section, which is using the [CPU](../input/cpu.md) input plugin:
```
[SERVICE]
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_PORT    2020

[INPUT]
    Name  cpu
    Alias server1_cpu

[OUTPUT]
    Name  stdout
    Alias raw_output
    Match *
```
Now when querying the metrics we get the aliases in place of the plugin names:
```json
{
  "input": {
    "server1_cpu": {
      "records": 8,
      "bytes": 2536
    }
  },
  "output": {
    "raw_output": {
      "proc_records": 5,
      "proc_bytes": 1585,
      "errors": 0,
      "retries": 0,
      "retries_failed": 0
    }
  }
}
```
Lines changed: 41 additions & 1 deletion
# Scheduler
[Fluent Bit](https://fluentbit.io) has an Engine that helps to coordinate data ingestion from input plugins and calls the _Scheduler_ to decide when it is time to flush the data through one or multiple output plugins. The Scheduler flushes new data at a fixed interval of seconds and schedules retries when asked.
Once an output plugin gets called to flush some data, after processing that data it can notify the Engine using one of three possible return statuses:
- OK
- Retry
- Error
If the return status was __OK__, it means the plugin was successfully able to process and flush the data. If it returned an __Error__ status, it means that an unrecoverable error happened and the Engine should not try to flush that data again. If a __Retry__ was requested, the _Engine_ will ask the _Scheduler_ to retry flushing that data; the Scheduler will decide how many seconds to wait before that happens.
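As an illustrative sketch \(a simplified model, not Fluent Bit's actual C implementation\), the Engine's handling of these statuses could look like this:

```python
# Illustrative model of how an engine might react to a flush status.
def handle_flush_status(status: str, retries_done: int, retry_limit) -> str:
    """Return the next action: 'done', 'drop' or 'schedule_retry'."""
    if status == "OK":
        return "done"                 # data flushed, chunk can be released
    if status == "Error":
        return "drop"                 # unrecoverable, never retried
    # status == "Retry": honor Retry_Limit (False means unlimited retries)
    if retry_limit is not False and retries_done >= retry_limit:
        return "drop"
    return "schedule_retry"           # the Scheduler picks the wait time

print(handle_flush_status("Retry", retries_done=0, retry_limit=2))
```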
## Configuring Retries
The Scheduler provides a simple configuration option called __Retry_Limit__, which can be set independently on each output section. This option allows you to disable retries or impose a limit to try N times and then discard the data after reaching that limit:
| | Value | Description |
| ----------- | ----- | ----------- |
| Retry_Limit | N | Integer value to set the maximum number of retries allowed. N must be >= 1 \(default: 2\). |
| Retry_Limit | False | When Retry_Limit is set to False, there is no limit for the number of retries the Scheduler can do. |
### Example
The following example configures two outputs, where the HTTP plugin has an unlimited number of retries and the Elasticsearch plugin has a limit of 5:
```
[OUTPUT]
    Name        http
    Host        192.168.5.6
    Port        8080
    Retry_Limit False

[OUTPUT]
    Name            es
    Host            192.168.5.20
    Port            9200
    Logstash_Format On
    Retry_Limit     5
```
