
Commit 73e2233

stream-processing: getting-started: hands-on: general cleanup
Signed-off-by: Alexa Kreizinger <alexakreizinger@gmail.com>
1 parent cba7fe8 commit 73e2233

1 file changed: 29 additions, 54 deletions
@@ -1,42 +1,43 @@
-# Hands On! 101
+# Tutorial
 
-This article goes through very specific and simple steps to learn how Stream Processor works. For simplicity it uses a custom Docker image that contains the relevant components for testing.
+Follow this tutorial to learn more about stream processing.
 
 ## Requirements
 
-The following tutorial requires the following software components:
+This tutorial requires the following components:
 
-* [Fluent Bit](https://fluentbit.io) >= v1.2.0
-* [Docker Engine](https://www.docker.com/products/docker-engine) (not mandatory if you already have Fluent Bit binary installed in your system)
+* Fluent Bit
+* [Docker Engine](https://www.docker.com/products/docker-engine)
+* A stream processing [sample file](https://raw.githubusercontent.com/fluent/fluent-bit-docs/37b477786d6e28eb223e08611c26ec93671a34ac/stream-processing/samples/sp-samples-1k.log)
 
-In addition download the following data [sample file](https://raw.githubusercontent.com/fluent/fluent-bit-docs/37b477786d6e28eb223e08611c26ec93671a34ac/stream-processing/samples/sp-samples-1k.log) (130KB).
+## Steps
 
-## Stream Processing using the command line
-
-For all next steps we will run Fluent Bit from the command line, and for simplicity we will use the official Docker image.
+These steps use the official Fluent Bit Docker image.
 
 ### 1. Fluent Bit version
 
+Run the following command to confirm that Fluent Bit is installed and up-to-date:
+
 ```bash
 $ docker run -ti fluent/fluent-bit:1.8.2 /fluent-bit/bin/fluent-bit --version
 Fluent Bit v1.8.2
 ```
 
 ### 2. Parse sample files
 
-The samples file contains JSON records. On this command, we are appending the Parsers configuration file and instructing _tail_ input plugin to parse the content as _json_:
+The sample file contains JSON records. Run the following command to append the `parsers.conf` file and instruct the Tail input plugin to parse content as JSON:
 
 ```bash
 $ docker run -ti -v `pwd`/sp-samples-1k.log:/sp-samples-1k.log \
     fluent/fluent-bit:1.8.2 \
     /fluent-bit/bin/fluent-bit -R /fluent-bit/etc/parsers.conf \
     -i tail -p path=/sp-samples-1k.log \
     -p parser=json \
-    -p read_from_head=true \
+    -p read_from_head=true \
     -o stdout -f 1
 ```
 
-The command above will simply print the parsed content to the standard output interface. The content will print the _Tag_ associated to each record and an array with two fields: record timestamp and record map:
+This command prints the parsed content to the standard output interface. The parsed content includes a tag associated with each record and an array with two fields: a timestamp and a record map:
 
 ```text
 Fluent Bit v1.8.2
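
The sample file itself contains line-delimited JSON, one record per line. A hypothetical peek at the file, with key names taken from the parsed record shown in the next hunk and illustrative values:

```bash
# Sketch only: inspect the first record of the sample file. The key names
# match the parsed output in this tutorial; the exact values in the file
# may differ from this illustrative line.
$ head -n 1 sp-samples-1k.log
{"date": "22/abr/2019:12:43:52 -0600", "ip": "132.113.203.169", "word": "fendered", "country": "United States", "flag": true, "num": 53}
```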
@@ -58,11 +59,9 @@ Fluent Bit v1.8.2
 [5] tail.0: [1557322456.315550927, {"date"=>"22/abr/2019:12:43:52 -0600", "ip"=>"132.113.203.169", "word"=>"fendered", "country"=>"United States", "flag"=>true, "num"=>53}]
 ```
 
-As of now there is no Stream Processing, on step #3 we will start doing some basic queries.
-
-### 3. Selecting specific record keys
+### 3. Select specific record keys
 
-This command introduces a Stream Processor (SP) query through the **-T** option and changes the output plugin to _null_, this is done with the purpose of obtaining the SP results in the standard output interface and avoid confusions in the terminal.
+Run the following command to create a stream processor query using the `-T` flag and change the output to the Null plugin. This sends the stream processing results to the standard output interface and avoids cluttering the terminal.
 
 ```bash
 $ docker run -ti -v `pwd`/sp-samples-1k.log:/sp-samples-1k.log \
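
The hunk boundary here elides the middle of the step 3 command, including the stream processor query passed through `-T`. A minimal sketch of such a query, assuming the Tail input's default stream name `tail.0` (the removed F.A.Q. at the end of this diff explains where stream names come from):

```sql
-- Sketch only: select two keys from records whose 'country' key is 'Chile'.
-- STREAM:tail.0 assumes the default instance name of the Tail input plugin.
SELECT word, num FROM STREAM:tail.0 WHERE country='Chile';
```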
@@ -77,7 +76,7 @@ $ docker run -ti -v `pwd`/sp-samples-1k.log:/sp-samples-1k.log \
     -o null -f 1
 ```
 
-The query above aims to retrieve all records that a key named _country_ value matches the value _Chile_, and for each match compose and output a record using only the key fields _word_ and _num_:
+The previous query aims to retrieve all records for which the `country` key contains the value `Chile`. For each match, it composes and outputs a record that only contains the keys `word` and `num`:
 
 ```text
 [0] [1557322913.263534, {"word"=>"Candide", "num"=>94}]
@@ -87,9 +86,9 @@ The query above aims to retrieve all records that a key named _country_ value ma
 [0] [1557322913.263706, {"word"=>"decasyllables", "num"=>76}]
 ```
 
-### 4. Calculate Average Value
+### 4. Calculate average value
 
-The following query is similar to the one in the previous step, but this time we will use the aggregation function called AVG() to get the average value of the records ingested:
+Run the following command to use the `AVG` aggregation function to get the average value of ingested records:
 
 ```bash
 $ docker run -ti -v `pwd`/sp-samples-1k.log:/sp-samples-1k.log \
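
The `-T` query for step 4 also falls between hunks. A minimal sketch under the same `tail.0` stream-name assumption:

```sql
-- Sketch only: AVG() aggregates the 'num' key over each chunk of
-- ingested records that matches the condition.
SELECT AVG(num) FROM STREAM:tail.0 WHERE country='Chile';
```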
@@ -104,7 +103,7 @@ $ docker run -ti -v `pwd`/sp-samples-1k.log:/sp-samples-1k.log \
     -o null -f 1
 ```
 
-output:
+The previous query yields the following output:
 
 ```text
 [0] [1557323573.940149, {"AVG(num)"=>61.230770}]
@@ -114,11 +113,13 @@ output:
 [0] [1557323573.945130, {"AVG(num)"=>99.000000}]
 ```
 
-why did we get multiple records? Answer: When Fluent Bit processes the data, records come in chunks and the Stream Processor runs the process over chunks of data, so the input plugin ingested 5 chunks of records and SP processed the query for each chunk independently. To process multiple chunks at once we have to group results during windows of time.
+{% hint style="info" %}
+The resulting output contains multiple records because Fluent Bit processes data in chunks, and the stream processor processes each chunk independently. To process multiple chunks at the same time, you can group results using time windows.
+{% endhint %}
 
-### 5. Grouping Results and Window
+### 5. Group results with time windows
 
-Grouping results aims to simplify data processing and when used in a defined window of time we can achieve great things. The next query group the results by _country_ and calculate the average of _num_ value, the processing window is 1 second which basically means: process all incoming chunks coming within 1 second window:
+Grouping results within a time window simplifies data processing. Run the following command to group results by `country` and calculate the average of `num` with a one-second processing window:
 
 ```bash
 $ docker run -ti -v `pwd`/sp-samples-1k.log:/sp-samples-1k.log \
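
The step 5 query is likewise elided, but the statement in the F.A.Q. section removed at the end of this diff uses the same window and grouping clauses. A sketch along those lines, with the default `tail.0` stream name assumed:

```sql
-- Sketch only: a 1-second tumbling window groups all chunks arriving
-- within the window, then AVG(num) is computed per country group.
SELECT country, AVG(num) FROM STREAM:tail.0
WINDOW TUMBLING (1 SECOND)
WHERE country='Chile'
GROUP BY country;
```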
@@ -136,17 +137,17 @@ $ docker run -ti -v `pwd`/sp-samples-1k.log:/sp-samples-1k.log \
     -o null -f 1
 ```
 
-output:
+The previous query yields the following output:
 
 ```text
 [0] [1557324239.003211, {"country"=>"Chile", "AVG(num)"=>53.164558}]
 ```
 
-### 6. Ingest Stream Processor results as new Stream of Data
+### 6. Ingest stream processor results as a new stream of data
 
-Now we see a more real-world use case. Sending data results to the standard output interface is good for learning purposes, but now we will instruct the Stream Processor to ingest results as part of Fluent Bit data pipeline and attach a Tag to them.
+Next, instruct the stream processor to ingest results as part of the Fluent Bit data pipeline and assign a tag to each record.
 
-This can be done using the **CREATE STREAM** statement that will also tag results with **sp-results** value. Note that output plugin parameter is now _stdout_ matching all records tagged with _sp-results_:
+Run the following command, which uses a `CREATE STREAM` statement to tag results with the `sp-results` tag, then outputs records with that tag to standard output:
 
 ```bash
 $ docker run -ti -v `pwd`/sp-samples-1k.log:/sp-samples-1k.log \
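
The `CREATE STREAM` statement for step 6 sits between hunks as well; the removed F.A.Q. example below contains a matching statement. A sketch with the default `tail.0` stream name assumed in place of the `samples` alias:

```sql
-- Sketch only: CREATE STREAM re-ingests the query results into the
-- pipeline, tagged 'sp-results' so the stdout output plugin
-- (-m 'sp-results') matches them.
CREATE STREAM results WITH (tag='sp-results')
AS
  SELECT country, AVG(num) FROM STREAM:tail.0
  WINDOW TUMBLING (1 SECOND)
  WHERE country='Chile'
  GROUP BY country;
```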
@@ -166,34 +167,8 @@ $ docker run -ti -v `pwd`/sp-samples-1k.log:/sp-samples-1k.log \
     -o stdout -m 'sp-results' -f 1
 ```
 
-output:
+The previous query yields the following results:
 
 ```text
 [0] sp-results: [1557325032.000160100, {"country"=>"Chile", "AVG(num)"=>53.164558}]
 ```
-
-## F.A.Q
-
-### Where STREAM name comes from?
-
-Fluent Bit have the notion of streams, and every input plugin instance gets a default name. You can override that behavior by setting an alias. Check the **alias** parameter and new **stream** name in the following example:
-
-```bash
-$ docker run -ti -v `pwd`/sp-samples-1k.log:/sp-samples-1k.log \
-    fluent/fluent-bit:1.8.2 \
-    /fluent-bit/bin/fluent-bit \
-    -R /fluent-bit/etc/parsers.conf \
-    -i tail \
-    -p path=/sp-samples-1k.log \
-    -p parser=json \
-    -p read_from_head=true \
-    -p alias=samples \
-    -T "CREATE STREAM results WITH (tag='sp-results') \
-        AS \
-        SELECT country, AVG(num) FROM STREAM:samples \
-        WINDOW TUMBLING (1 SECOND) \
-        WHERE country='Chile' \
-        GROUP BY country;" \
-    -o stdout -m 'sp-results' -f 1
-```
-