Commit 2a0c790

pipeline: filter: migrate docs

Signed-off-by: Eduardo Silva <eduardo@treasure-data.com>

1 parent 1787fd8 · commit 2a0c790

10 files changed: +1396 −1 lines

pipeline/filters/aws-metadata.md

Lines changed: 41 additions & 1 deletion

# AWS

The _AWS Filter_ enriches logs with AWS metadata. Currently the plugin adds the EC2 instance ID and availability zone to log records. To use this plugin, you must be running in EC2 and have the [instance metadata service enabled](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html).

## Configuration Parameters

The plugin supports the following configuration parameters:

| Key | Value Format | Description |
| :--- | :--- | :--- |
| imds_version | VERSION | Specify which version of the instance metadata service to use. Valid values are 'v1' or 'v2'; 'v2' is the default. |

Note: _If you run Fluent Bit in a container, you may have to use instance metadata v1._ The plugin behaves the same regardless of which version is used.

## Usage

### Command Line

```text
$ bin/fluent-bit -i dummy -F aws -m '*' -o stdout

[2020/01/17 07:57:17] [ info] [engine] started (pid=32744)
[0] dummy.0: [1579247838.000171227, {"message"=>"dummy", "az"=>"us-west-2b", "ec2_instance_id"=>"i-06bc83dbc2ac2fdf8"}]
[1] dummy.0: [1579247839.000125097, {"message"=>"dummy", "az"=>"us-west-2b", "ec2_instance_id"=>"i-06bc87dbc2ac3fdf8"}]
```

### Configuration File

```python
[INPUT]
    Name dummy
    Tag  dummy

[FILTER]
    Name         aws
    Match        *
    imds_version v1

[OUTPUT]
    Name  stdout
    Match *
```

pipeline/filters/grep.md

Lines changed: 95 additions & 0 deletions

# Grep

The _Grep Filter_ plugin allows you to match or exclude specific records based on regular expression patterns.

## Configuration Parameters

The plugin supports the following configuration parameters:

| Key | Value Format | Description |
| :--- | :--- | :--- |
| Regex | FIELD REGEX | Keep records whose field matches the regular expression. |
| Exclude | FIELD REGEX | Exclude records whose field matches the regular expression. |

## Getting Started

To start filtering records, you can run the filter from the command line or through the configuration file. The following example assumes that you have a file called _lines.txt_ with the following content:

```text
aaa
aab
bbb
ccc
ddd
eee
fff
ggg
```

### Command Line

> Note: the command-line mode requires special attention to quote the regular expressions properly. Using a configuration file is recommended.

The following command will load the _tail_ plugin and read the content of the _lines.txt_ file. Then the _grep_ filter will apply a regular expression rule over the _log_ field \(created by the tail plugin\) and only _pass_ the records whose field value starts with _aa_:

```text
$ bin/fluent-bit -i tail -p 'path=lines.txt' -F grep -p 'regex=log aa' -m '*' -o stdout
```

### Configuration File

```python
[INPUT]
    Name tail
    Path lines.txt

[FILTER]
    Name  grep
    Match *
    Regex log aa

[OUTPUT]
    Name  stdout
    Match *
```

The filter allows multiple rules, which are applied in order; you can have as many _Regex_ and _Exclude_ entries as required.
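For example, a sketch combining both rule types \(values taken from the _lines.txt_ sample above\) first keeps the records starting with _aa_ and then drops the _aab_ line, so only _aaa_ passes:

```python
[FILTER]
    Name    grep
    Match   *
    # rules are applied in order: keep lines starting with 'aa'...
    Regex   log ^aa
    # ...then exclude the exact value 'aab'
    Exclude log ^aab$
```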
### Nested fields example

Currently, nested fields are not supported. If you have records in the following format:

```json
{
  "kubernetes": {
    "pod_name": "myapp-0",
    "namespace_name": "default",
    "pod_id": "216cd7ae-1c7e-11e8-bb40-000c298df552",
    "labels": {
      "app": "myapp"
    },
    "host": "minikube",
    "container_name": "myapp",
    "docker_id": "370face382c7603fdd309d8c6aaaf434fd98b92421ce7c7c8aafe7697d4aa362"
  }
}
```

and you want to exclude records that match a given nested field \(for example `kubernetes.labels.app`\), you could use a combination of the [nest](https://docs.fluentbit.io/manual/v/1.0/filter/nest) and grep filters. Here is an example that will exclude records that match `kubernetes.labels.app: myapp`:

```python
[FILTER]
    Name         nest
    Match        *
    Operation    lift
    Nested_under kubernetes

[FILTER]
    Name         nest
    Match        *
    Operation    lift
    Nested_under labels

[FILTER]
    Name    grep
    Match   *
    Exclude app myapp
```

pipeline/filters/kubernetes.md

Lines changed: 198 additions & 0 deletions

# Kubernetes

The Fluent Bit _Kubernetes Filter_ allows you to enrich your log files with Kubernetes metadata.

When Fluent Bit is deployed in Kubernetes as a DaemonSet and configured to read the log files from the containers \(using the tail or systemd input plugins\), this filter performs the following operations:

* Analyze the Tag and extract the following metadata:
  * Pod Name
  * Namespace
  * Container Name
  * Container ID
* Query the Kubernetes API Server to obtain extra metadata for the Pod in question:
  * Pod ID
  * Labels
  * Annotations

The data is cached locally in memory and appended to each record.
## Configuration Parameters

The plugin supports the following configuration parameters:

| Key | Description | Default |
| :--- | :--- | :--- |
| Buffer\_Size | Set the buffer size for the HTTP client when reading responses from the Kubernetes API server. The value must conform to the [Unit Size](../configuration/unit_sizes.md) specification. | 32k |
| Kube\_URL | API Server endpoint. | https://kubernetes.default.svc:443 |
| Kube\_CA\_File | CA certificate file. | /var/run/secrets/kubernetes.io/serviceaccount/ca.crt |
| Kube\_CA\_Path | Absolute path to scan for certificate files. | |
| Kube\_Token\_File | Token file. | /var/run/secrets/kubernetes.io/serviceaccount/token |
| Kube\_Tag\_Prefix | When the source records come from the Tail input plugin, this option specifies the prefix used in the Tail configuration. | kube.var.log.containers. |
| Merge\_Log | When enabled, check if the `log` field content is a JSON string map; if so, append the map fields as part of the log structure. | Off |
| Merge\_Log\_Key | When `Merge_Log` is enabled, the filter assumes the `log` field of the incoming message is a JSON string and makes a structured representation of it at the same level as the `log` field in the map. If `Merge_Log_Key` is set \(a string name\), all the new structured fields taken from the original `log` content are inserted under the new key instead. | |
| Merge\_Log\_Trim | When `Merge_Log` is enabled, trim \(remove possible \n or \r\) field values. | On |
| Merge\_Parser | Optional parser name to specify how to parse the data contained in the _log_ key. Recommended for developers or testing only. | |
| Keep\_Log | When `Keep_Log` is disabled, the `log` field is removed from the incoming message once it has been successfully merged \(`Merge_Log` must be enabled as well\). | On |
| tls.debug | Debug level between 0 \(nothing\) and 4 \(every detail\). | -1 |
| tls.verify | When enabled, turns on certificate validation when connecting to the Kubernetes API server. | On |
| Use\_Journal | When enabled, the filter reads logs coming in Journald format. | Off |
| Regex\_Parser | Set an alternative parser to process the record Tag and extract pod\_name, namespace\_name, container\_name and docker\_id. The parser must be registered in a [parsers file](https://github.com/fluent/fluent-bit/blob/master/conf/parsers.conf) \(refer to the parser _filter-kube-test_ as an example\). | |
| K8S-Logging.Parser | Allow Kubernetes Pods to suggest a pre-defined parser \(read more about it in the Kubernetes Annotations section\). | Off |
| K8S-Logging.Exclude | Allow Kubernetes Pods to exclude their logs from the log processor \(read more about it in the Kubernetes Annotations section\). | Off |
| Labels | Include Kubernetes resource labels in the extra metadata. | On |
| Annotations | Include Kubernetes resource annotations in the extra metadata. | On |
| Kube\_meta\_preload\_cache\_dir | If set, Kubernetes metadata can be cached/pre-loaded from files in JSON format in this directory, named as namespace-pod.meta. | |
| Dummy\_Meta | If set, use dummy metadata \(for test/dev purposes\). | Off |
## Processing the 'log' value

The Kubernetes Filter provides several ways to process the data contained in the _log_ key. The following explanation of the workflow assumes that your original Docker parser defined in _parsers.conf_ is as follows:

```
[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On
```

> Since Fluent Bit v1.2 we do not suggest the use of decoders \(Decode\_Field\_As\) if you are using an Elasticsearch database in the output, to avoid data type conflicts.

To process the _log_ key, it's __mandatory to enable__ the _Merge\_Log_ configuration property in this filter; then the following processing order applies:

- If a Pod suggests a parser, the filter will use that parser to process the content of _log_.
- If the option _Merge\_Parser_ was set and the Pod did not suggest a parser, process the _log_ content using the parser suggested in the configuration.
- If the Pod did not suggest a parser and no _Merge\_Parser_ is set, try to handle the content as JSON.

If _log_ value processing fails, the value is left untouched. The order above is not chained: it's exclusive, and the filter will try only one of the options above, __not__ all of them.
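As a sketch of the second case \(the parser and key names here are illustrative and reuse values that appear elsewhere in this document\), a filter section that forces the _docker_ parser onto the _log_ key and nests the merged fields under a new key could look like:

```python
[FILTER]
    Name          kubernetes
    Match         kube.*
    # mandatory for any processing of the 'log' key
    Merge_Log     On
    # place the structured fields under 'log_processed' instead of the top level
    Merge_Log_Key log_processed
    # used only when the Pod does not suggest a parser via annotations
    Merge_Parser  docker
```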
## Kubernetes Annotations

A flexible feature of the Fluent Bit Kubernetes filter is that it allows Kubernetes Pods to suggest certain behaviors for the log processor pipeline when processing the records. At the moment it supports:

- Suggesting a pre-defined parser
- Requesting to exclude logs

The following annotations are available:

| Annotation | Description | Default |
| :--- | :--- | :--- |
| fluentbit.io/parser[_stream][-container] | Suggest a pre-defined parser. The parser must already be registered by Fluent Bit. This option will only be processed if the Fluent Bit configuration \(Kubernetes Filter\) has enabled the option _K8S-Logging.Parser_. If present, the stream \(stdout or stderr\) restricts the annotation to that specific stream. If present, the container restricts the annotation to a specific container in a Pod. | |
| fluentbit.io/exclude[_stream][-container] | Request Fluent Bit to exclude \(or not\) the logs generated by the Pod. This option will only be processed if the Fluent Bit configuration \(Kubernetes Filter\) has enabled the option _K8S-Logging.Exclude_. | False |
### Annotation Examples in Pod definition

#### Suggest a parser

The following Pod definition runs a Pod that emits Apache logs to the standard output. In the Annotations it suggests that the data should be processed using the pre-defined parser called _apache_:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: apache-logs
  labels:
    app: apache-logs
  annotations:
    fluentbit.io/parser: apache
spec:
  containers:
  - name: apache
    image: edsiper/apache_logs
```
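Following the `[_stream][-container]` pattern from the annotations table above, a stream-specific variant of the annotations block could look like the following sketch \(the exact annotation keys should be verified against your Fluent Bit version\):

```yaml
  annotations:
    # assumed key form: apply the 'apache' parser only to the stdout stream
    fluentbit.io/parser_stdout: apache
```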
#### Request to exclude logs

There are certain situations where the user would like to request that the log processor simply skip the logs from the Pod in question:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: apache-logs
  labels:
    app: apache-logs
  annotations:
    fluentbit.io/exclude: "true"
spec:
  containers:
  - name: apache
    image: edsiper/apache_logs
```

Note that the annotation value is boolean, which can take _true_ or _false_, and __must__ be quoted.
## Workflow of Tail + Kubernetes Filter

The Kubernetes Filter depends on either the [Tail](../input/tail.md) or [Systemd](../input/systemd.md) input plugins to process and enrich records with Kubernetes metadata. Here we will explain the workflow of Tail and how its configuration is correlated with the Kubernetes filter. Consider the following configuration example \(just for demo purposes, not production\):

```
[INPUT]
    Name   tail
    Tag    kube.*
    Path   /var/log/containers/*.log
    Parser docker

[FILTER]
    Name            kubernetes
    Match           kube.*
    Kube_URL        https://kubernetes.default.svc:443
    Kube_CA_File    /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
    Kube_Tag_Prefix kube.var.log.containers.
    Merge_Log       On
    Merge_Log_Key   log_processed
```
In the input section, the [Tail](../input/tail.md) plugin will monitor all files ending in _.log_ in the path _/var/log/containers/_. For every file it will read every line and apply the docker parser. Then the records are emitted to the next step with an expanded Tag.

Tail supports Tag expansion, which means that if a Tag contains a star character \(\*\), the star is replaced with the absolute path of the monitored file, so if your file name and path is:

```
/var/log/container/apache-logs-annotated_default_apache-aeeccc7a9f00f6e4e066aeff0434cf80621215071f1b20a51e8340aa7c35eac6.log
```

then the Tag for every record of that file becomes:

```
kube.var.log.containers.apache-logs-annotated_default_apache-aeeccc7a9f00f6e4e066aeff0434cf80621215071f1b20a51e8340aa7c35eac6.log
```

> Note that slashes are replaced with dots.
When the [Kubernetes Filter](kubernetes.md) runs, it will try to match all records that start with _kube._ \(note the ending dot\), so records from the file mentioned above will hit the matching rule and the filter will try to enrich the records.

The Kubernetes Filter does not care where the logs come from, but it does care about the absolute name of the monitored file, because that information contains the pod name and the namespace name that are used to retrieve the metadata associated with the running Pod from the Kubernetes Master/API Server.

If the configuration property __Kube\_Tag\_Prefix__ was configured \(available in Fluent Bit >= 1.1.x\), it will use that value to remove the prefix that was appended to the Tag in the previous Input section. Note that the configuration property defaults to _kube.var.log.containers._, so the previous Tag content will be transformed from:

```
kube.var.log.containers.apache-logs-annotated_default_apache-aeeccc7a9f00f6e4e066aeff0434cf80621215071f1b20a51e8340aa7c35eac6.log
```

to:

```
apache-logs-annotated_default_apache-aeeccc7a9f00f6e4e066aeff0434cf80621215071f1b20a51e8340aa7c35eac6.log
```

> The transformation above does not modify the original Tag; it just creates a new representation for the filter to perform metadata lookup.
That new value is used by the filter to look up the pod name and namespace; for that purpose it uses an internal regular expression:

```
(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$
```

> If you want to know more details, check the source code of that definition [here](https://github.com/fluent/fluent-bit/blob/master/plugins/filter_kubernetes/kube_regex.h#L26).

You can see how this operation is performed on the [Rubular](https://rubular.com/r/HZz3tYAahj6JCd) web site; check the following demo link:

- [https://rubular.com/r/HZz3tYAahj6JCd](https://rubular.com/r/HZz3tYAahj6JCd)
#### Custom Regex

Under certain uncommon conditions, a user might want to alter that hard-coded regular expression; for that purpose the option __Regex\_Parser__ can be used \(documented on top\).
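As a sketch \(the parser name and simplified pod-name pattern are illustrative; the named capture groups are kept identical to the built-in regular expression shown above, and the `[PARSER]` entry belongs in the parsers file while the `[FILTER]` entry belongs in the main configuration\):

```python
[PARSER]
    # hypothetical custom Tag parser: same capture groups as the built-in
    # regex, but with a simplified pod name pattern
    Name    kube-custom-tag
    Format  regex
    Regex   (?<pod_name>[^_]+)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$

[FILTER]
    Name         kubernetes
    Match        kube.*
    Regex_Parser kube-custom-tag
```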
#### Final Comments

At this point the filter is able to gather the values of _pod\_name_ and _namespace_. With that information it will check in the local cache \(an internal hash table\) whether metadata for that key pair exists; if so, it will enrich the record with the metadata value, otherwise it will connect to the Kubernetes Master/API Server and retrieve that information.
