[EV-4885] Add example to flow log page. Improve aggr desc.
dimitri-nicolo committed May 29, 2024
1 parent 9c13bd4 commit 58110f2
Showing 8 changed files with 272 additions and 20 deletions.
8 changes: 4 additions & 4 deletions calico-cloud/visibility/elastic/flow/aggregation.mdx
@@ -33,8 +33,8 @@ The following table summarizes the aggregation levels by flow log traffic:
|-----------|-------------------------------------|-------------------------------------------------------------------|
| 0 | | No aggregation |
| 1 | AnyProcessInSameSourcePod | Identity fields below source pod level are masked out. It means that if multiple processes or containers, within the same source pod, perform the same operation, the events are aggregated. |
-| 2 | AnyProcessInSameSourcePodPrefix | Identity fields below source pod-prefix level are masked out. It means that if multiple processes or containers, within pods with the same prefix, perform the same operation, the events are aggregated. |
-| 3 | AnyProcessInSamePodPrefix | Identity fields below source and destination pod-prefix level are masked out. It means that if multiple processes or containers, within pods with the same prefix, perform the same operation towards pods with the same prefix, the events are aggregated. |
+| 2 | AnyProcessInSameSourcePodPrefix | In addition to the above, source pod names are aggregated based on their shared prefixes. This means that flows, to the same destination, from pods within the same Deployment/ReplicaSet are aggregated together. |
+| 3 | AnyProcessInSamePodPrefix | This level of aggregation builds on the previous two levels and also groups destination pod names based on their shared prefixes. |

### Understanding aggregation level differences

@@ -45,7 +45,7 @@ type minimizes the flow logs generated for traffic coming from different contain
and port. The two flows originating from `client-a` without aggregation are combined into one.

In Kubernetes, ReplicaSets and StatefulSets can automatically create names for pods. For example, the pods `nginx-1` and `nginx-2` are created by the
-ReplicaSet nginx. The ReplicaSet name is considered a pod-prefix and is used to aggregate flow log entries (indicated with an asterisk * at the end
+ReplicaSet `nginx`. The ReplicaSet name is considered a pod-prefix and is used to aggregate flow log entries (indicated with an asterisk * at the end
of the name). Flow logs originating from pods with the same prefix will be aggregated as long as the traffic is on the same protocol, and destined
towards the same IP, and destination port. The three flow logs without aggregation originating from `client-a` and `client-b` are combined into a
single flow log. This aggregation level is called `AnyProcessInSameSourcePodPrefix`.
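
The grouping rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Felix implements it: the records are invented by hand, `aggregate_by_source_prefix` and `records_merged` are hypothetical names, and the IP address and ports are placeholder values. Records are merged when they share the source pod-prefix, protocol, destination IP, and destination port.

```python
from collections import defaultdict

# Hypothetical flow records, written by hand for illustration only.
# Field names mirror flow log fields; all values are invented.
flows = [
    {"source_name": "client-a", "source_name_aggr": "client-*", "source_port": 45001,
     "proto": "tcp", "dest_ip": "10.0.0.10", "dest_port": 80, "packets_out": 4},
    {"source_name": "client-a", "source_name_aggr": "client-*", "source_port": 45002,
     "proto": "tcp", "dest_ip": "10.0.0.10", "dest_port": 80, "packets_out": 6},
    {"source_name": "client-b", "source_name_aggr": "client-*", "source_port": 51000,
     "proto": "tcp", "dest_ip": "10.0.0.10", "dest_port": 80, "packets_out": 5},
]


def aggregate_by_source_prefix(records):
    """Merge records that share source prefix, protocol, destination IP and port."""
    merged = defaultdict(lambda: {"records_merged": 0, "packets_out": 0})
    for record in records:
        # Source pod name and source port are deliberately left out of the key,
        # so flows that differ only in those fields land in the same bucket.
        key = (record["source_name_aggr"], record["proto"],
               record["dest_ip"], record["dest_port"])
        merged[key]["records_merged"] += 1
        merged[key]["packets_out"] += record["packets_out"]
    return dict(merged)


aggregated = aggregate_by_source_prefix(flows)
for key, totals in aggregated.items():
    print(key, totals)
```

With this sample data, the three input records collapse into a single entry, which mirrors the "three flow logs combined into a single flow log" case in the text. Adding the destination pod-prefix to the key in place of the destination IP would approximate the next aggregation level.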
@@ -87,7 +87,7 @@ kubectl get felixconfiguration -o yaml
Before [changing the default aggregation level](../../../reference/resources/felixconfig.mdx#aggregationkind), note the following:

- Although any change in aggregation level affects flow log volume, lowering the aggregation number (especially to `0` for no aggregation) will cause significant impacts to log storage. If you allow more flow logs, ensure that you provision more log storage.
-- Verify that the parameters that you want to see in your aggregation level, are not already [filtered](filtering.mdx)
+- Verify that the parameters that you want to see in your aggregation level, are not already [filtered](filtering.mdx).

### Troubleshoot logs with aggregation levels

63 changes: 63 additions & 0 deletions calico-cloud/visibility/elastic/flow/datatypes.mdx
@@ -101,3 +101,66 @@ Where,
action for the tier. In this case, the `<policy name>` is selected arbitrarily from the set of policies within
the tier that apply to the endpoint.
* `-2` means "unknown". The rule index was not recorded.

### Flow log example, with **no aggregation**

A flow log with aggregation level 0, **`no aggregation`**, might look like:

```
{
  "start_time": 1597166083,
  "end_time": 1597166383,
  "source_ip": "192.168.47.9",
  "source_name": "access-6b687c8dcb-zn5s2",
  "source_name_aggr": "access-6b687c8dcb-*",
  "source_namespace": "policy-demo",
  "source_port": 42106,
  "source_type": "wep",
  "source_labels": {
    "labels": [
      "pod-template-hash=6b687c8dcb",
      "app=access"
    ]
  },
  "dest_ip": "192.168.138.79",
  "dest_name": "nginx-86c57db685-h6792",
  "dest_name_aggr": "nginx-86c57db685-*",
  "dest_namespace": "policy-demo",
  "dest_port": 80,
  "dest_type": "wep",
  "dest_labels": {
    "labels": [
      "pod-template-hash=86c57db685",
      "app=nginx"
    ]
  },
  "proto": "tcp",
  "action": "allow",
  "reporter": "dst",
  "policies": {
    "all_policies": [
      "0|default|policy-demo/default.access-nginx|allow"
    ]
  },
  "bytes_in": 388,
  "bytes_out": 1113,
  "num_flows": 1,
  "num_flows_started": 1,
  "num_flows_completed": 1,
  "packets_in": 6,
  "packets_out": 5,
  "http_requests_allowed_in": 0,
  "http_requests_denied_in": 0,
  "original_source_ips": null,
  "num_original_source_ips": 0,
  "host": "bz-n8kf-kadm-node-1",
  "@timestamp": 1597166383000
}
```

The log shows an incoming connection reported by the "Destination" node, allowed by a policy on port 80. The flows in the log are grouped using a
5-minute aggregation interval, calculated as **`end_time`** - **`start_time`**. During this interval, one flow (**`"num_flows": 1`**) was recorded. At
higher aggregation levels, flows from endpoints performing the same operation and originating from the same Deployment/ReplicaSet are grouped into a
single log. In this example, the common source endpoints are prefixed with **`access-6b687c8dcb-`**. Parameters like **`source_ip`** may be dropped
and set to **`null`** depending on the aggregation level. As aggregation levels increase, more flows will be grouped together based on your data. For
more details on aggregation levels, see [configure flow log aggregation](./aggregation.mdx).
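
As a rough check of the reading above, the following Python sketch parses a trimmed copy of the example log and derives the values the paragraph mentions. The `summarize` helper and the keys of its result are hypothetical names chosen for this illustration; only the flow log field names and values come from the example.

```python
import json

# Trimmed copy of the example flow log; only the fields used below are kept.
raw_log = """
{
  "start_time": 1597166083,
  "end_time": 1597166383,
  "source_name": "access-6b687c8dcb-zn5s2",
  "source_name_aggr": "access-6b687c8dcb-*",
  "dest_port": 80,
  "action": "allow",
  "reporter": "dst",
  "num_flows": 1
}
"""


def summarize(log):
    """Derive the values discussed in the text from a single flow log entry."""
    # start_time and end_time are Unix timestamps in seconds.
    interval_seconds = log["end_time"] - log["start_time"]
    # Drop the trailing asterisk to recover the shared pod-name prefix.
    prefix = log["source_name_aggr"].rstrip("*")
    return {
        "interval_seconds": interval_seconds,
        "source_prefix": prefix,
        "source_matches_prefix": log["source_name"].startswith(prefix),
        "reported_by_destination": log["reporter"] == "dst",
        "num_flows": log["num_flows"],
    }


summary = summarize(json.loads(raw_log))
print(summary)
```

The difference between the two timestamps is 300 seconds, which is the 5-minute interval referred to in the text.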
@@ -32,9 +32,9 @@ The following table summarizes the aggregation levels by flow log traffic:
| **Level** | **Name** | **Description** |
|-----------|-------------------------------------|-------------------------------------------------------------------|
| 0 | | No aggregation |
-| 1 | AnyProcessInSameSourcePod | Identity fields below source pod level are masked out. It means that if multiple processes or containers, within the same source pod, perform the same operation, the events are aggregated. |
-| 2 | AnyProcessInSameSourcePodPrefix | Identity fields below source pod-prefix level are masked out. It means that if multiple processes or containers, within pods with the same prefix, perform the same operation, the events are aggregated. |
-| 3 | AnyProcessInSamePodPrefix | Identity fields below source and destination pod-prefix level are masked out. It means that if multiple processes or containers, within pods with the same prefix, perform the same operation towards pods with the same prefix, the events are aggregated. |
+| 1 | AnyProcessInSameSourcePod | Identity fields below source pod level are masked out. It means that if multiple processes or containers, within the same source pod, perform the same operation, the events are aggregated. |
+| 2 | AnyProcessInSameSourcePodPrefix | In addition to the above, source pod names are aggregated based on their shared prefixes. This means that flows, to the same destination, from pods within the same Deployment/ReplicaSet are aggregated together. |
+| 3 | AnyProcessInSamePodPrefix | This level of aggregation builds on the previous two levels and also groups destination pod names based on their shared prefixes. |

### Understanding aggregation level differences

@@ -45,7 +45,7 @@ type minimizes the flow logs generated for traffic coming from different contain
and port. The two flows originating from `client-a` without aggregation are combined into one.

In Kubernetes, ReplicaSets and StatefulSets can automatically create names for pods. For example, the pods `nginx-1` and `nginx-2` are created by the
-ReplicaSet nginx. The ReplicaSet name is considered a pod-prefix and is used to aggregate flow log entries (indicated with an asterisk * at the end
+ReplicaSet `nginx`. The ReplicaSet name is considered a pod-prefix and is used to aggregate flow log entries (indicated with an asterisk * at the end
of the name). Flow logs originating from pods with the same prefix will be aggregated as long as the traffic is on the same protocol, and destined
towards the same IP, and destination port. The three flow logs without aggregation originating from `client-a` and `client-b` are combined into a
single flow log. This aggregation level is called `AnyProcessInSameSourcePodPrefix`.
@@ -87,7 +87,7 @@ kubectl get felixconfiguration -o yaml
Before [changing the default aggregation level](../../../reference/resources/felixconfig.mdx#aggregationkind), note the following:

- Although any change in aggregation level affects flow log volume, lowering the aggregation number (especially to `0` for no aggregation) will cause significant impacts to log storage. If you allow more flow logs, ensure that you provision more log storage.
-- Verify that the parameters that you want to see in your aggregation level, are not already [filtered](filtering.mdx)
+- Verify that the parameters that you want to see in your aggregation level, are not already [filtered](filtering.mdx).

### Troubleshoot logs with aggregation levels

@@ -101,3 +101,66 @@ Where,
action for the tier. In this case, the `<policy name>` is selected arbitrarily from the set of policies within
the tier that apply to the endpoint.
* `-2` means "unknown". The rule index was not recorded.

### Flow log example, with **no aggregation**

A flow log with aggregation level 0, **`no aggregation`**, might look like:

```
{
  "start_time": 1597166083,
  "end_time": 1597166383,
  "source_ip": "192.168.47.9",
  "source_name": "access-6b687c8dcb-zn5s2",
  "source_name_aggr": "access-6b687c8dcb-*",
  "source_namespace": "policy-demo",
  "source_port": 42106,
  "source_type": "wep",
  "source_labels": {
    "labels": [
      "pod-template-hash=6b687c8dcb",
      "app=access"
    ]
  },
  "dest_ip": "192.168.138.79",
  "dest_name": "nginx-86c57db685-h6792",
  "dest_name_aggr": "nginx-86c57db685-*",
  "dest_namespace": "policy-demo",
  "dest_port": 80,
  "dest_type": "wep",
  "dest_labels": {
    "labels": [
      "pod-template-hash=86c57db685",
      "app=nginx"
    ]
  },
  "proto": "tcp",
  "action": "allow",
  "reporter": "dst",
  "policies": {
    "all_policies": [
      "0|default|policy-demo/default.access-nginx|allow"
    ]
  },
  "bytes_in": 388,
  "bytes_out": 1113,
  "num_flows": 1,
  "num_flows_started": 1,
  "num_flows_completed": 1,
  "packets_in": 6,
  "packets_out": 5,
  "http_requests_allowed_in": 0,
  "http_requests_denied_in": 0,
  "original_source_ips": null,
  "num_original_source_ips": 0,
  "host": "bz-n8kf-kadm-node-1",
  "@timestamp": 1597166383000
}
```

The log shows an incoming connection reported by the "Destination" node, allowed by a policy on port 80. The flows in the log are grouped using a
5-minute aggregation interval, calculated as **`end_time`** - **`start_time`**. During this interval, one flow (**`"num_flows": 1`**) was recorded. At
higher aggregation levels, flows from endpoints performing the same operation and originating from the same Deployment/ReplicaSet are grouped into a
single log. In this example, the common source endpoints are prefixed with **`access-6b687c8dcb-`**. Parameters like **`source_ip`** may be dropped
and set to **`null`** depending on the aggregation level. As aggregation levels increase, more flows will be grouped together based on your data. For
more details on aggregation levels, see [configure flow log aggregation](./aggregation.mdx).
12 changes: 6 additions & 6 deletions calico-enterprise/visibility/elastic/flow/aggregation.mdx
@@ -32,9 +32,9 @@ The following table summarizes the aggregation levels by flow log traffic:
| **Level** | **Name** | **Description** |
|-----------|-------------------------------------|-------------------------------------------------------------------|
| 0 | | No aggregation |
-| 1 | AnyProcessInSameSourcePod | Identity fields below source pod level are masked out. It means that if multiple processes or containers, within the same source pod, perform the same operation, the events are aggregated. |
-| 2 | AnyProcessInSameSourcePodPrefix | Identity fields below source pod-prefix level are masked out. It means that if multiple processes or containers, within pods with the same prefix, perform the same operation, the events are aggregated. |
-| 3 | AnyProcessInSamePodPrefix | Identity fields below source and destination pod-prefix level are masked out. It means that if multiple processes or containers, within pods with the same prefix, perform the same operation towards pods with the same prefix, the events are aggregated. |
+| 1 | AnyProcessInSameSourcePod | Identity fields below source pod level are masked out. It means that if multiple processes or containers, within the same source pod, perform the same operation, the events are aggregated. |
+| 2 | AnyProcessInSameSourcePodPrefix | In addition to the above, source pod names are aggregated based on their shared prefixes. This means that flows, to the same destination, from pods within the same Deployment/ReplicaSet are aggregated together. |
+| 3 | AnyProcessInSamePodPrefix | This level of aggregation builds on the previous two levels and also groups destination pod names based on their shared prefixes. |

### Understanding aggregation level differences

@@ -45,14 +45,14 @@ type minimizes the flow logs generated for traffic coming from different contain
and port. The two flows originating from `client-a` without aggregation are combined into one.

In Kubernetes, ReplicaSets and StatefulSets can automatically create names for pods. For example, the pods `nginx-1` and `nginx-2` are created by the
-ReplicaSet nginx. The ReplicaSet name is considered a pod-prefix and is used to aggregate flow log entries (indicated with an asterisk * at the end
+ReplicaSet `nginx`. The ReplicaSet name is considered a pod-prefix and is used to aggregate flow log entries (indicated with an asterisk * at the end
of the name). Flow logs originating from pods with the same prefix will be aggregated as long as the traffic is on the same protocol, and destined
towards the same IP, and destination port. The three flow logs without aggregation originating from `client-a` and `client-b` are combined into a
single flow log. This aggregation level is called `AnyProcessInSameSourcePodPrefix`.

Finally, with `AnyProcessInSamePodPrefix` we combine source and destination pods that are part of the same ReplicaSets. With level 3, the flow logs
are aggregated by the destination port and protocol, as long as they originate from pods with the same pod-prefix and destined for pods of the same
-pod-prefix. All logs previously distinct, are aggregated with into a single flow log (see the last row).
+pod-prefix. All logs previously distinct, are aggregated into a single flow log (see the last row).

| | | **Src Traffic** | | | **Dst Traffic** | | | **Packet counts** | |
|--------------------------|-----------|----------|---------|----------|----------|---------|----------|------------|-------------|
@@ -87,7 +87,7 @@ kubectl get felixconfiguration -o yaml
Before [changing the default aggregation level](../../../reference/resources/felixconfig.mdx#aggregationkind), note the following:

- Although any change in aggregation level affects flow log volume, lowering the aggregation number (especially to `0` for no aggregation) will cause significant impacts to log storage. If you allow more flow logs, ensure that you provision more log storage.
-- Verify that the parameters that you want to see in your aggregation level, are not already [filtered](filtering.mdx)
+- Verify that the parameters that you want to see in your aggregation level, are not already [filtered](filtering.mdx).

### Troubleshoot logs with aggregation levels

63 changes: 63 additions & 0 deletions calico-enterprise/visibility/elastic/flow/datatypes.mdx
@@ -101,3 +101,66 @@ Where,
action for the tier. In this case, the `<policy name>` is selected arbitrarily from the set of policies within
the tier that apply to the endpoint.
* `-2` means "unknown". The rule index was not recorded.

### Flow log example, with **no aggregation**

A flow log with aggregation level 0, **`no aggregation`**, might look like:

```
{
  "start_time": 1597166083,
  "end_time": 1597166383,
  "source_ip": "192.168.47.9",
  "source_name": "access-6b687c8dcb-zn5s2",
  "source_name_aggr": "access-6b687c8dcb-*",
  "source_namespace": "policy-demo",
  "source_port": 42106,
  "source_type": "wep",
  "source_labels": {
    "labels": [
      "pod-template-hash=6b687c8dcb",
      "app=access"
    ]
  },
  "dest_ip": "192.168.138.79",
  "dest_name": "nginx-86c57db685-h6792",
  "dest_name_aggr": "nginx-86c57db685-*",
  "dest_namespace": "policy-demo",
  "dest_port": 80,
  "dest_type": "wep",
  "dest_labels": {
    "labels": [
      "pod-template-hash=86c57db685",
      "app=nginx"
    ]
  },
  "proto": "tcp",
  "action": "allow",
  "reporter": "dst",
  "policies": {
    "all_policies": [
      "0|default|policy-demo/default.access-nginx|allow"
    ]
  },
  "bytes_in": 388,
  "bytes_out": 1113,
  "num_flows": 1,
  "num_flows_started": 1,
  "num_flows_completed": 1,
  "packets_in": 6,
  "packets_out": 5,
  "http_requests_allowed_in": 0,
  "http_requests_denied_in": 0,
  "original_source_ips": null,
  "num_original_source_ips": 0,
  "host": "bz-n8kf-kadm-node-1",
  "@timestamp": 1597166383000
}
```

The log shows an incoming connection reported by the "Destination" node, allowed by a policy on port 80. The flows in the log are grouped using a
5-minute aggregation interval, calculated as **`end_time`** - **`start_time`**. During this interval, one flow (**`"num_flows": 1`**) was recorded. At
higher aggregation levels, flows from endpoints performing the same operation and originating from the same Deployment/ReplicaSet are grouped into a
single log. In this example, the common source endpoints are prefixed with **`access-6b687c8dcb-`**. Parameters like **`source_ip`** may be dropped
and set to **`null`** depending on the aggregation level. As aggregation levels increase, more flows will be grouped together based on your data. For
more details on aggregation levels, see [configure flow log aggregation](./aggregation.mdx).