Commit

[EV-4885] Add example to flow log page. Improve aggr desc.
dimitri-nicolo committed May 28, 2024
1 parent 9c13bd4 commit 9e251c0
Showing 8 changed files with 280 additions and 16 deletions.
6 changes: 3 additions & 3 deletions calico-cloud/visibility/elastic/flow/aggregation.mdx
@@ -33,8 +33,8 @@ The following table summarizes the aggregation levels by flow log traffic:
|-----------|-------------------------------------|-------------------------------------------------------------------|
| 0 | | No aggregation |
| 1 | AnyProcessInSameSourcePod | Identity fields below source pod level are masked out. It means that if multiple processes or containers, within the same source pod, perform the same operation, the events are aggregated. |
| 2 | AnyProcessInSameSourcePodPrefix | Identity fields below source pod-prefix level are masked out. It means that if multiple processes or containers, within pods with the same prefix, perform the same operation, the events are aggregated. |
| 3 | AnyProcessInSamePodPrefix | Identity fields below source and destination pod-prefix level are masked out. It means that if multiple processes or containers, within pods with the same prefix, perform the same operation towards pods with the same prefix, the events are aggregated. |
| 2 | AnyProcessInSameSourcePodPrefix | In addition to the above, source pod names are aggregated based on their shared prefixes. This means that flows, to the same destination, from pods within the same Deployment/ReplicaSet are aggregated together. |
| 3 | AnyProcessInSamePodPrefix | This level of aggregation builds on the previous two levels and also groups destination pod names based on their shared prefixes. |

### Understanding aggregation level differences

@@ -45,7 +45,7 @@ type minimizes the flow logs generated for traffic coming from different contain
and port. The two flows originating from `client-a` without aggregation are combined into one.

In Kubernetes, ReplicaSets and StatefulSets can automatically create names for pods. For example, the pods `nginx-1` and `nginx-2` are created by the
ReplicaSet nginx. The ReplicaSet name is considered a pod-prefix and is used to aggregate flow log entries (indicated with an asterisk * at the end
ReplicaSet `nginx`. The ReplicaSet name is considered a pod-prefix and is used to aggregate flow log entries (indicated with an asterisk * at the end
of the name). Flow logs originating from pods with the same prefix will be aggregated as long as the traffic is on the same protocol, and destined
towards the same IP, and destination port. The three flow logs without aggregation originating from `client-a` and `client-b` are combined into a
single flow log. This aggregation level is called `AnyProcessInSameSourcePodPrefix`.
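
The prefix-based grouping described above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the `Flow` tuple, the `pod_prefix` helper, the destination IP `10.0.0.5`, and the rule of dropping the last hyphen-separated segment of a pod name are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    # Hypothetical, simplified flow record used only for this illustration.
    source_pod: str
    dest_ip: str
    dest_port: int
    proto: str
    packets: int


def pod_prefix(pod_name: str) -> str:
    """Assumed rule: drop the last '-'-separated segment and append '*'."""
    head, _, _ = pod_name.rpartition("-")
    return f"{head}-*" if head else pod_name


def aggregate_by_source_prefix(flows):
    """Combine flows that share source prefix, destination IP, port, and protocol."""
    totals = defaultdict(int)
    for f in flows:
        key = (pod_prefix(f.source_pod), f.dest_ip, f.dest_port, f.proto)
        totals[key] += f.packets
    return dict(totals)


# Three unaggregated flows from client-a and client-b to the same destination.
flows = [
    Flow("client-a", "10.0.0.5", 80, "tcp", 3),
    Flow("client-a", "10.0.0.5", 80, "tcp", 2),
    Flow("client-b", "10.0.0.5", 80, "tcp", 4),
]
aggregated = aggregate_by_source_prefix(flows)
print(aggregated)
```

With these inputs the three flows collapse into a single entry keyed by the `client-*` prefix, which mirrors the behavior described for `AnyProcessInSameSourcePodPrefix`.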
66 changes: 66 additions & 0 deletions calico-cloud/visibility/elastic/flow/datatypes.mdx
@@ -101,3 +101,69 @@ Where,
action for the tier. In this case, the `<policy name>` is selected arbitrarily from the set of policies within
the tier that apply to the endpoint.
* `-2` means "unknown". The rule index was not recorded.

### Flow log example, with **no aggregation**

When the aggregation level is set to 'no aggregation,' the flow log appears as follows:

```json
{
  "start_time": 1597166083,
  "end_time": 1597166383,
  "source_ip": "192.168.47.9",
  "source_name": "access-6b687c8dcb-zn5s2",
  "source_name_aggr": "access-6b687c8dcb-*",
  "source_namespace": "policy-demo",
  "source_port": 42106,
  "source_type": "wep",
  "source_labels": {
    "labels": [
      "pod-template-hash=6b687c8dcb",
      "app=access"
    ]
  },
  "dest_ip": "192.168.138.79",
  "dest_name": "nginx-86c57db685-h6792",
  "dest_name_aggr": "nginx-86c57db685-*",
  "dest_namespace": "policy-demo",
  "dest_port": 80,
  "dest_type": "wep",
  "dest_labels": {
    "labels": [
      "pod-template-hash=86c57db685",
      "app=nginx"
    ]
  },
  "proto": "tcp",
  "action": "allow",
  "reporter": "dst",
  "policies": {
    "all_policies": [
      "0|default|policy-demo/default.access-nginx|allow"
    ]
  },
  "bytes_in": 388,
  "bytes_out": 1113,
  "num_flows": 1,
  "num_flows_started": 1,
  "num_flows_completed": 1,
  "packets_in": 6,
  "packets_out": 5,
  "http_requests_allowed_in": 0,
  "http_requests_denied_in": 0,
  "original_source_ips": null,
  "num_original_source_ips": 0,
  "host": "bz-n8kf-kadm-node-1",
  "@timestamp": 1597166383000
}
```

The aggregation interval is the time between **`start_time`** (1597166083) and **`end_time`** (1597166383), which is 300 seconds in this example. Workload
endpoints with the common prefix **`access-6b687c8dcb-`** in the **`policy-demo`** namespace connect to endpoints/pods with the prefix
**`nginx-86c57db685-`** exposing a service on port 80. The source endpoints have labels **`app: access`** and
**`pod-template-hash: 6b687c8dcb`**, and the destination has labels **`app: nginx`** and **`pod-template-hash: 86c57db685`**. This log
represents an incoming connection reported by the destination node (**`reporter`** is `dst`), with the connection allowed by a policy (**`action`** is `allow`). There are 6 incoming packets and 5
outgoing packets. The **`num_flows`** field shows how many flows were aggregated during the interval. As aggregation levels increase, more flows might be
grouped together, depending on your data. At higher aggregation levels, some fields may show **`null`** values, such as **`source_port`** or
**`source_ip`**, indicating that flows from the same source performing the same operation have been aggregated into a single flow log. For more details on
aggregation levels, see [configure flow log aggregation](./aggregation.mdx).
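
As a rough illustration of reading these fields programmatically, the following Python sketch loads a trimmed copy of the example record and derives the aggregation interval and a one-line summary. The helper names `interval_seconds` and `summarize` are hypothetical and exist only for this example; the field names come from the record shown above.

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the example record; only the fields used below are kept.
record = json.loads("""
{
  "start_time": 1597166083,
  "end_time": 1597166383,
  "source_name_aggr": "access-6b687c8dcb-*",
  "dest_name_aggr": "nginx-86c57db685-*",
  "dest_port": 80,
  "proto": "tcp",
  "action": "allow",
  "reporter": "dst",
  "num_flows": 1,
  "packets_in": 6,
  "packets_out": 5
}
""")


def interval_seconds(rec: dict) -> int:
    """Length of the aggregation interval covered by one flow log, in seconds."""
    return rec["end_time"] - rec["start_time"]


def summarize(rec: dict) -> str:
    """One-line, human-readable summary of a flow log record."""
    start = datetime.fromtimestamp(rec["start_time"], tz=timezone.utc)
    return (
        f"{start:%Y-%m-%d %H:%M:%S} UTC +{interval_seconds(rec)}s "
        f"{rec['source_name_aggr']} -> "
        f"{rec['dest_name_aggr']}:{rec['dest_port']}/{rec['proto']} "
        f"{rec['action']} (reporter={rec['reporter']}, flows={rec['num_flows']})"
    )


print(interval_seconds(record))  # 300
print(summarize(record))
```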
@@ -32,9 +32,9 @@ The following table summarizes the aggregation levels by flow log traffic:
| **Level** | **Name** | **Description** |
|-----------|-------------------------------------|-------------------------------------------------------------------|
| 0 | | No aggregation |
| 1 | AnyProcessInSameSourcePod | Identity fields below source pod level are masked out. It means that if multiple processes or containers, within the same source pod, perform the same operation, the events are aggregated. |
| 2 | AnyProcessInSameSourcePodPrefix | Identity fields below source pod-prefix level are masked out. It means that if multiple processes or containers, within pods with the same prefix, perform the same operation, the events are aggregated. |
| 3 | AnyProcessInSamePodPrefix | Identity fields below source and destination pod-prefix level are masked out. It means that if multiple processes or containers, within pods with the same prefix, perform the same operation towards pods with the same prefix, the events are aggregated. |
| 1 | AnyProcessInSameSourcePod | Identity fields below source pod level are masked out. It means that if multiple processes or containers, within the same source pod, perform the same operation, the events are aggregated. |
| 2 | AnyProcessInSameSourcePodPrefix | In addition to the above, source pod names are aggregated based on their shared prefixes. This means that flows, to the same destination, from pods within the same Deployment/ReplicaSet are aggregated together. |
| 3 | AnyProcessInSamePodPrefix | This level of aggregation builds on the previous two levels and also groups destination pod names based on their shared prefixes. |

### Understanding aggregation level differences

@@ -45,7 +45,7 @@ type minimizes the flow logs generated for traffic coming from different contain
and port. The two flows originating from `client-a` without aggregation are combined into one.

In Kubernetes, ReplicaSets and StatefulSets can automatically create names for pods. For example, the pods `nginx-1` and `nginx-2` are created by the
ReplicaSet nginx. The ReplicaSet name is considered a pod-prefix and is used to aggregate flow log entries (indicated with an asterisk * at the end
ReplicaSet `nginx`. The ReplicaSet name is considered a pod-prefix and is used to aggregate flow log entries (indicated with an asterisk * at the end
of the name). Flow logs originating from pods with the same prefix will be aggregated as long as the traffic is on the same protocol, and destined
towards the same IP, and destination port. The three flow logs without aggregation originating from `client-a` and `client-b` are combined into a
single flow log. This aggregation level is called `AnyProcessInSameSourcePodPrefix`.
@@ -101,3 +101,69 @@ Where,
action for the tier. In this case, the `<policy name>` is selected arbitrarily from the set of policies within
the tier that apply to the endpoint.
* `-2` means "unknown". The rule index was not recorded.

### Flow log example, with **no aggregation**

When the aggregation level is set to 'no aggregation,' the flow log appears as follows:

```json
{
  "start_time": 1597166083,
  "end_time": 1597166383,
  "source_ip": "192.168.47.9",
  "source_name": "access-6b687c8dcb-zn5s2",
  "source_name_aggr": "access-6b687c8dcb-*",
  "source_namespace": "policy-demo",
  "source_port": 42106,
  "source_type": "wep",
  "source_labels": {
    "labels": [
      "pod-template-hash=6b687c8dcb",
      "app=access"
    ]
  },
  "dest_ip": "192.168.138.79",
  "dest_name": "nginx-86c57db685-h6792",
  "dest_name_aggr": "nginx-86c57db685-*",
  "dest_namespace": "policy-demo",
  "dest_port": 80,
  "dest_type": "wep",
  "dest_labels": {
    "labels": [
      "pod-template-hash=86c57db685",
      "app=nginx"
    ]
  },
  "proto": "tcp",
  "action": "allow",
  "reporter": "dst",
  "policies": {
    "all_policies": [
      "0|default|policy-demo/default.access-nginx|allow"
    ]
  },
  "bytes_in": 388,
  "bytes_out": 1113,
  "num_flows": 1,
  "num_flows_started": 1,
  "num_flows_completed": 1,
  "packets_in": 6,
  "packets_out": 5,
  "http_requests_allowed_in": 0,
  "http_requests_denied_in": 0,
  "original_source_ips": null,
  "num_original_source_ips": 0,
  "host": "bz-n8kf-kadm-node-1",
  "@timestamp": 1597166383000
}
```

The aggregation interval is the time between **`start_time`** (1597166083) and **`end_time`** (1597166383), which is 300 seconds in this example. Workload
endpoints with the common prefix **`access-6b687c8dcb-`** in the **`policy-demo`** namespace connect to endpoints/pods with the prefix
**`nginx-86c57db685-`** exposing a service on port 80. The source endpoints have labels **`app: access`** and
**`pod-template-hash: 6b687c8dcb`**, and the destination has labels **`app: nginx`** and **`pod-template-hash: 86c57db685`**. This log
represents an incoming connection reported by the destination node (**`reporter`** is `dst`), with the connection allowed by a policy (**`action`** is `allow`). There are 6 incoming packets and 5
outgoing packets. The **`num_flows`** field shows how many flows were aggregated during the interval. As aggregation levels increase, more flows might be
grouped together, depending on your data. At higher aggregation levels, some fields may show **`null`** values, such as **`source_port`** or
**`source_ip`**, indicating that flows from the same source performing the same operation have been aggregated into a single flow log. For more details on
aggregation levels, see [configure flow log aggregation](./aggregation.mdx).
10 changes: 5 additions & 5 deletions calico-enterprise/visibility/elastic/flow/aggregation.mdx
@@ -32,9 +32,9 @@ The following table summarizes the aggregation levels by flow log traffic:
| **Level** | **Name** | **Description** |
|-----------|-------------------------------------|-------------------------------------------------------------------|
| 0 | | No aggregation |
| 1 | AnyProcessInSameSourcePod | Identity fields below source pod level are masked out. It means that if multiple processes or containers, within the same source pod, perform the same operation, the events are aggregated. |
| 2 | AnyProcessInSameSourcePodPrefix | Identity fields below source pod-prefix level are masked out. It means that if multiple processes or containers, within pods with the same prefix, perform the same operation, the events are aggregated. |
| 3 | AnyProcessInSamePodPrefix | Identity fields below source and destination pod-prefix level are masked out. It means that if multiple processes or containers, within pods with the same prefix, perform the same operation towards pods with the same prefix, the events are aggregated. |
| 1 | AnyProcessInSameSourcePod | Identity fields below source pod level are masked out. It means that if multiple processes or containers, within the same source pod, perform the same operation, the events are aggregated. |
| 2 | AnyProcessInSameSourcePodPrefix | In addition to the above, source pod names are aggregated based on their shared prefixes. This means that flows, to the same destination, from pods within the same Deployment/ReplicaSet are aggregated together. |
| 3 | AnyProcessInSamePodPrefix | This level of aggregation builds on the previous two levels and also groups destination pod names based on their shared prefixes. |

### Understanding aggregation level differences

@@ -45,14 +45,14 @@ type minimizes the flow logs generated for traffic coming from different contain
and port. The two flows originating from `client-a` without aggregation are combined into one.

In Kubernetes, ReplicaSets and StatefulSets can automatically create names for pods. For example, the pods `nginx-1` and `nginx-2` are created by the
ReplicaSet nginx. The ReplicaSet name is considered a pod-prefix and is used to aggregate flow log entries (indicated with an asterisk * at the end
ReplicaSet `nginx`. The ReplicaSet name is considered a pod-prefix and is used to aggregate flow log entries (indicated with an asterisk * at the end
of the name). Flow logs originating from pods with the same prefix will be aggregated as long as the traffic is on the same protocol, and destined
towards the same IP, and destination port. The three flow logs without aggregation originating from `client-a` and `client-b` are combined into a
single flow log. This aggregation level is called `AnyProcessInSameSourcePodPrefix`.

Finally, with `AnyProcessInSamePodPrefix` we combine source and destination pods that are part of the same ReplicaSets. With level 3, the flow logs
are aggregated by the destination port and protocol, as long as they originate from pods with the same pod-prefix and destined for pods of the same
pod-prefix. All logs previously distinct, are aggregated with into a single flow log (see the last row).
pod-prefix. All previously distinct logs are aggregated into a single flow log (see the last row).

| | | **Src Traffic** | | | **Dst Traffic** | | | **Packet counts** | |
|--------------------------|-----------|----------|---------|----------|----------|---------|----------|------------|-------------|
66 changes: 66 additions & 0 deletions calico-enterprise/visibility/elastic/flow/datatypes.mdx
@@ -101,3 +101,69 @@ Where,
action for the tier. In this case, the `<policy name>` is selected arbitrarily from the set of policies within
the tier that apply to the endpoint.
* `-2` means "unknown". The rule index was not recorded.

### Flow log example, with **no aggregation**

When the aggregation level is set to 'no aggregation,' the flow log appears as follows:

```json
{
  "start_time": 1597166083,
  "end_time": 1597166383,
  "source_ip": "192.168.47.9",
  "source_name": "access-6b687c8dcb-zn5s2",
  "source_name_aggr": "access-6b687c8dcb-*",
  "source_namespace": "policy-demo",
  "source_port": 42106,
  "source_type": "wep",
  "source_labels": {
    "labels": [
      "pod-template-hash=6b687c8dcb",
      "app=access"
    ]
  },
  "dest_ip": "192.168.138.79",
  "dest_name": "nginx-86c57db685-h6792",
  "dest_name_aggr": "nginx-86c57db685-*",
  "dest_namespace": "policy-demo",
  "dest_port": 80,
  "dest_type": "wep",
  "dest_labels": {
    "labels": [
      "pod-template-hash=86c57db685",
      "app=nginx"
    ]
  },
  "proto": "tcp",
  "action": "allow",
  "reporter": "dst",
  "policies": {
    "all_policies": [
      "0|default|policy-demo/default.access-nginx|allow"
    ]
  },
  "bytes_in": 388,
  "bytes_out": 1113,
  "num_flows": 1,
  "num_flows_started": 1,
  "num_flows_completed": 1,
  "packets_in": 6,
  "packets_out": 5,
  "http_requests_allowed_in": 0,
  "http_requests_denied_in": 0,
  "original_source_ips": null,
  "num_original_source_ips": 0,
  "host": "bz-n8kf-kadm-node-1",
  "@timestamp": 1597166383000
}
```

The aggregation interval is the time between **`start_time`** (1597166083) and **`end_time`** (1597166383), which is 300 seconds in this example. Workload
endpoints with the common prefix **`access-6b687c8dcb-`** in the **`policy-demo`** namespace connect to endpoints/pods with the prefix
**`nginx-86c57db685-`** exposing a service on port 80. The source endpoints have labels **`app: access`** and
**`pod-template-hash: 6b687c8dcb`**, and the destination has labels **`app: nginx`** and **`pod-template-hash: 86c57db685`**. This log
represents an incoming connection reported by the destination node (**`reporter`** is `dst`), with the connection allowed by a policy (**`action`** is `allow`). There are 6 incoming packets and 5
outgoing packets. The **`num_flows`** field shows how many flows were aggregated during the interval. As aggregation levels increase, more flows might be
grouped together, depending on your data. At higher aggregation levels, some fields may show **`null`** values, such as **`source_port`** or
**`source_ip`**, indicating that flows from the same source performing the same operation have been aggregated into a single flow log. For more details on
aggregation levels, see [configure flow log aggregation](./aggregation.mdx).
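
Each entry in **`all_policies`** is a pipe-delimited string. The sketch below splits the entry from the example record into its parts. The `PolicyHit` type, the `parse_policy_entry` helper, and the field labels (`position`, `tier`, `name`, `action`) are assumptions inferred from the shape of the example string, not names defined by the product.

```python
from typing import NamedTuple


class PolicyHit(NamedTuple):
    # Field names are illustrative labels inferred from the example string.
    position: int
    tier: str
    name: str
    action: str


def parse_policy_entry(entry: str) -> PolicyHit:
    """Split an all_policies entry of the assumed form 'position|tier|name|action'."""
    parts = entry.split("|")
    if len(parts) < 4:
        raise ValueError(f"unexpected policy entry: {entry!r}")
    position, tier, name, action = parts[:4]  # any extra trailing fields are ignored
    return PolicyHit(int(position), tier, name, action)


hit = parse_policy_entry("0|default|policy-demo/default.access-nginx|allow")
print(hit.tier, hit.name, hit.action)
```

Using a named tuple keeps the parsed parts readable when filtering logs, for example to select only entries whose `action` is `allow`.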