Update network flow visibility document #1677

Merged
merged 1 commit into from Dec 23, 2020

Conversation

srikartati
Member

@srikartati srikartati commented Dec 18, 2020

Document flow aggregator feature details.

This will be checked in after PR #1671

@codecov-io

codecov-io commented Dec 18, 2020

Codecov Report

Merging #1677 (10d21d1) into master (9d3d10b) will decrease coverage by 22.42%.
The diff coverage is 39.53%.

@@             Coverage Diff             @@
##           master    #1677       +/-   ##
===========================================
- Coverage   63.31%   40.89%   -22.43%     
===========================================
  Files         170      106       -64     
  Lines       14250    13113     -1137     
===========================================
- Hits         9023     5362     -3661     
- Misses       4292     7275     +2983     
+ Partials      935      476      -459     

| Flag | Coverage Δ |
|----------------|------------|
| kind-e2e-tests | ? |
| unit-tests | 40.89% <39.53%> (-0.39%) ⬇️ |

| Impacted Files | Coverage Δ |
|----------------|------------|
| cmd/antrea-agent/agent.go | 0.00% <0.00%> (ø) |
| pkg/agent/agent.go | 12.20% <0.00%> (-36.51%) ⬇️ |
| pkg/agent/agent_linux.go | 0.00% <0.00%> (-100.00%) ⬇️ |
| ...g/agent/cniserver/interface_configuration_linux.go | 5.79% <0.00%> (-14.44%) ⬇️ |
| pkg/agent/cniserver/pod_configuration.go | 34.52% <0.00%> (-19.24%) ⬇️ |
| pkg/agent/config/node_config.go | 0.00% <0.00%> (-100.00%) ⬇️ |
| ...gent/controller/noderoute/node_route_controller.go | 32.57% <0.00%> (-13.90%) ⬇️ |
| ...agent/controller/traceflow/traceflow_controller.go | 0.00% <0.00%> (-82.70%) ⬇️ |
| pkg/agent/flowexporter/connections/conntrack.go | 60.86% <0.00%> (-13.05%) ⬇️ |
| pkg/agent/proxy/proxier_linux.go | 0.00% <0.00%> (-25.00%) ⬇️ |

... and 164 more

@srikartati srikartati force-pushed the update_flow_agg_doc branch 2 times, most recently from 94b03a3 to 0bed060 Compare December 18, 2020 22:26
@srikartati srikartati added this to the Antrea v0.12.0 release milestone Dec 18, 2020
docs/network-flow-visibility.md
parameters are set to 5s and 12, respectively. `flowCollectorAddr` is a required
parameter that is necessary for the Flow Exporter feature to work.
Please note that the default value for `flowCollectorAddr` is `"flow-aggregator.flow-aggregator.svc:4739:tcp"`,
which is based of DNS name, if Flow Aggregator service is deployed with the Name
Contributor

"which uses the DNS name of the flow aggregator Service"?

### Configuration

The following configuration parameters have to be provided through the Flow Aggregator
configMap. `externalFlowCollectorAddr` is a required parameter that is necessary
Contributor

ConfigMap

### Configuration

To enable the Flow Exporter feature at the Antrea Agent, the following config
parameters have to be set in the Antrea Agent ConfigMap as shown below. We provide
some examples for the parameter values in the following snippet.
parameters have to be set in the Antrea Agent ConfigMap as shown below.

nit: a bit redundant considering "the following" at the beginning of the sentence

Currently, the Flow Exporter feature provides visibility for Pod-to-Pod, Pod-to-Service
and Pod-to-External network flows along with the associated statistics such as data
throughput (bits per second), packet throughput (packets per second), cumulative byte
count, cumulative packet count etc. Pod-To-Service flow visibility is supported

I would prefer we specify the exact set of stats that are supported if not done elsewhere; then we can remove "etc".

Member Author

done.


# Provide the IPFIX collector address as a string with format <HOST>:[<PORT>][:<PROTO>].
# HOST can either be the DNS name or the IP of the Flow Collector. For example,
# "flow-aggregator.flow-aggregator.svc" can be provided as DNS name to connect

a DNS name
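
As an illustration of the address format quoted above, here is a minimal Python sketch of parsing `<HOST>:[<PORT>][:<PROTO>]`. The function name `parse_collector_addr`, the fallback to port 4739 (the IANA-registered IPFIX port) and the fallback to `tcp` when a part is omitted are assumptions made for this example, not Antrea's actual parsing code. Bracketed IPv6 literals are not handled.

```python
from typing import NamedTuple

# Assumed fallbacks for this sketch; 4739 is the IANA-registered IPFIX port.
DEFAULT_PORT = 4739
DEFAULT_PROTO = "tcp"


class CollectorAddr(NamedTuple):
    host: str
    port: int
    proto: str


def parse_collector_addr(addr: str) -> CollectorAddr:
    """Split "<HOST>:[<PORT>][:<PROTO>]" into its parts (hypothetical helper)."""
    parts = addr.split(":")
    if len(parts) > 3 or not parts[0]:
        raise ValueError(f"invalid collector address: {addr!r}")
    host = parts[0]
    # An empty or missing port/proto falls back to the assumed defaults.
    port = int(parts[1]) if len(parts) > 1 and parts[1] else DEFAULT_PORT
    proto = parts[2].lower() if len(parts) > 2 and parts[2] else DEFAULT_PROTO
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    if proto not in ("tcp", "udp"):
        raise ValueError(f"unsupported protocol: {proto!r}")
    return CollectorAddr(host, port, proto)


if __name__ == "__main__":
    print(parse_collector_addr("flow-aggregator.flow-aggregator.svc:4739:tcp"))
```

Returning a named tuple keeps the three parts explicit for callers, and rejecting unknown protocols early avoids a confusing connection failure later.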


```yaml
flow-aggregator.conf: |
  # Provide the flow collector address as string with format <IP>:<port>[:<proto>], where proto is tcp or udp.
```

a string

@srikartati srikartati changed the base branch from feature/flow-aggregator to master December 21, 2020 04:27
@srikartati
Member Author

@jianjuns @kais66 Addressed comments.

and Pod-to-External network flows along with the associated statistics such as data
throughput (bits per second), packet throughput (packets per second), cumulative byte
count, cumulative packet count etc. Pod-To-Service flow visibility is supported
only [when Antrea Proxy enabled](feature-gates.md). In the future, we will extend
Contributor

not addressed yet

maybe specify that this is now the default: Pod-To-Service flow visibility is supported only [when Antrea Proxy is enabled](feature-gates.md), which is the case by default starting with Antrea v0.11.

IPFIX protocol, and for this purpose we use the [go-ipfix](https://github.com/vmware/go-ipfix) library.
Connections from the connection store are exported to the [Flow Aggregator
service](#flow-aggregator) using the IPFIX protocol, and for this purpose we use
the IPFIX exporter process from [go-ipfix](https://github.com/vmware/go-ipfix) library.
Contributor

from the ...

@@ -67,25 +77,34 @@ some examples for the parameter values in the following snippet.
# Service traffic.
AntreaProxy: true
Contributor

this is slightly outdated now that AntreaProxy is enabled by default

flow records that are exported from any given Antrea Agent, the Flow Exporter only
provides the information of Kubernetes entities that are local to the Antrea Agent.
In other words, flow records are only complete for intra-Node flows, but incomplete
for inter-Node flows.
Contributor

Maybe add an extra sentence at the end here: `It is the responsibility of the [Flow Aggregator](#flow-aggregator) to correlate flows from source and destination Nodes and produce complete flow records.`

for inter-Node flows.

Flow Exporter is supported in IPv4 clusters, IPv6 clusters and dual-stack clusters.
Please note that Flow Aggregator is only supported for IPv4 clusters. We plan to
Contributor

"in IPv4 clusters" to match the previous sentence?

### IPFIX Information Elements (IEs) in an Aggregated Flow Record

In addition to IPFIX information elements provided in the [above section](#ipfix-information-elements-ies-in-a-flow-record),
the Flow Aggregator adds following fields to the flow records.
Contributor

adds the following fields

Comment on lines +256 to +279
| IPFIX Information Element | Enterprise ID | Field ID | Type |
|-------------------------------------------|---------------|----------|-------------|
| packetTotalCountFromSourceNode | 56506 | 120 | unsigned64 |
| octetTotalCountFromSourceNode | 56506 | 121 | unsigned64 |
| packetDeltaCountFromSourceNode | 56506 | 122 | unsigned64 |
| octetDeltaCountFromSourceNode | 56506 | 123 | unsigned64 |
| reversePacketTotalCountFromSourceNode | 56506 | 124 | unsigned64 |
| reverseOctetTotalCountFromSourceNode | 56506 | 125 | unsigned64 |
| reversePacketDeltaCountFromSourceNode | 56506 | 126 | unsigned64 |
| reverseOctetDeltaCountFromSourceNode | 56506 | 127 | unsigned64 |
| packetTotalCountFromDestinationNode | 56506 | 128 | unsigned64 |
| octetTotalCountFromDestinationNode | 56506 | 129 | unsigned64 |
| packetDeltaCountFromDestinationNode | 56506 | 130 | unsigned64 |
| octetDeltaCountFromDestinationNode | 56506 | 131 | unsigned64 |
| reversePacketTotalCountFromDestinationNode| 56506 | 132 | unsigned64 |
| reverseOctetTotalCountFromDestinationNode | 56506 | 133 | unsigned64 |
| reversePacketDeltaCountFromDestinationNode| 56506 | 134 | unsigned64 |
| reverseOctetDeltaCountFromDestinationNode | 56506 | 135 | unsigned64 |
Contributor

This is actually the first time I am looking at this, and I am actually struggling a bit to understand what these do?

How are they different from existing IEs (packetTotalCount, reversePacketTotalCount, etc)? What does reversePacketTotalCountFromSourceNode mean?

Member Author

For inter-Node flows, there is one stream of flow records from the source Node and another stream from the destination Node. Aggregating stats across these two streams is not straightforward, and the two may differ in some scenarios, such as intermittent losses or a processing bottleneck at the receiver.

To avoid losing information, the `*SourceNode` fields hold aggregated statistics from the first stream of flow records and the `*DestinationNode` fields hold aggregated statistics from the second stream.

Contributor

Is this standard in IPFIX mediation? If not, are we sure this is the best way to handle this case?

Anyway, it is independent of this PR so we can discuss further later.
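
The explanation in this thread can be sketched in Python as follows. For an inter-Node flow the aggregator receives one record stream per Node, and rather than merging the counters it keeps both views side by side. The element names come from the table above; `AggregatedFlow`, `update`, and the input record layout (a dict carrying `packetTotalCount`, `octetTotalCount` and their `reverse*` counterparts) are assumptions made for illustration, and the delta counters are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict

# Counters carried by each per-Node record in this sketch (delta counters omitted).
COUNTERS = (
    "packetTotalCount",
    "octetTotalCount",
    "reversePacketTotalCount",
    "reverseOctetTotalCount",
)


@dataclass
class AggregatedFlow:
    """Hypothetical aggregated record holding both Nodes' views of one flow."""

    fields: Dict[str, int] = field(default_factory=dict)

    def update(self, record: Dict[str, int], from_source_node: bool) -> None:
        suffix = "FromSourceNode" if from_source_node else "FromDestinationNode"
        for name in COUNTERS:
            # Total counts are cumulative, so the latest value replaces the old one.
            self.fields[name + suffix] = record[name]


if __name__ == "__main__":
    flow = AggregatedFlow()
    # The two Nodes may report different totals, e.g. after packet loss in transit.
    flow.update({"packetTotalCount": 100, "octetTotalCount": 8000,
                 "reversePacketTotalCount": 90, "reverseOctetTotalCount": 7000}, True)
    flow.update({"packetTotalCount": 98, "octetTotalCount": 7840,
                 "reversePacketTotalCount": 90, "reverseOctetTotalCount": 7000}, False)
    print(flow.fields)
```

Keeping the two sets of counters separate means a consumer can still compare what the source Node sent with what the destination Node observed.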

#### Storage of Flow Records

Flow Aggregator stores the received flow records from Antrea Agents in a hash map,
where the flow key is five-tuple of a network connection. Five-tuple consists of Source IP,
Contributor

is the 5-tuple

#### Storage of Flow Records

Flow Aggregator stores the received flow records from Antrea Agents in a hash map,
where the flow key is five-tuple of a network connection. Five-tuple consists of Source IP,
Contributor

I have never seen it spelled "five-tuple" :)

Member Author

I thought that would be more formal. I just searched for "five-tuple" on Google Scholar, and there are 10K results :)
Changed to "5-tuple", the easier convention.
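
A minimal Python sketch of the storage scheme in the quoted text: a hash map whose key is the connection 5-tuple, conventionally the source IP, destination IP, source port, destination port and transport protocol. `FlowKey`, `FlowStore`, and the dict-based record layout are hypothetical names used only for illustration; the record keys are the standard IANA IPFIX element names.

```python
from typing import Dict, List, NamedTuple


class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IANA protocol number, e.g. 6 for TCP, 17 for UDP


class FlowStore:
    """Hypothetical store: records sharing a 5-tuple land under the same key."""

    def __init__(self) -> None:
        self._records: Dict[FlowKey, List[dict]] = {}

    def add(self, record: dict) -> FlowKey:
        key = FlowKey(
            record["sourceIPv4Address"],
            record["destinationIPv4Address"],
            record["sourceTransportPort"],
            record["destinationTransportPort"],
            record["protocolIdentifier"],
        )
        self._records.setdefault(key, []).append(record)
        return key

    def get(self, key: FlowKey) -> List[dict]:
        return self._records.get(key, [])

    def __len__(self) -> int:
        return len(self._records)
```

Because the records exported by the source Node and by the destination Node describe the same connection, they map to the same key, which is what makes correlating them a simple lookup.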

from the source Node, where the flow originates from, and another one from the destination
Node, where the destination Pod resides. Both the flow records contain incomplete
information as mentioned [here](#types-of-flows-and-associated-information). Flow
Aggregator provides the support for the correlation of the flow records from the
Contributor

provides support

Member Author

@srikartati srikartati left a comment

Thanks for the comments @antoninbas. Addressed them.

and Pod-to-External network flows along with the associated statistics such as data
throughput (bits per second), packet throughput (packets per second), cumulative byte
count, cumulative packet count etc. Pod-To-Service flow visibility is supported
only [when Antrea Proxy enabled](feature-gates.md). In the future, we will extend
Member Author

Missed it here. There was extended in the context of IPv6 cluster support. Changed it there. Now done.

Comment on lines 208 to 220
If you are deploying the Flow Aggregator Service on a [vagrant setup](../test/e2e/README.md),
you can use following command that deploys Antrea and Flow Aggregator in one go
with the required configuration:

```shell
./infra/vagrant/push_antrea.sh -fc <externalFlowCollectorAddr>
```
Member Author

I added the manifest deployment steps here.

As there are a few steps involved in trying this feature, I thought pointing to a single script could be appealing for users, even though it is a Vagrant setup. I moved these to a different section, which I feel is more appropriate.

@srikartati srikartati force-pushed the update_flow_agg_doc branch 3 times, most recently from 8ad2365 to 5a05eda Compare December 21, 2020 21:36
Contributor

@jianjuns jianjuns left a comment

LGTM.

Connections from the connection store are exported to a flow collector using the
IPFIX protocol, and for this purpose we use the [go-ipfix](https://github.com/vmware/go-ipfix) library.
Connections from the connection store are exported to the [Flow Aggregator
service](#flow-aggregator) using the IPFIX protocol, and for this purpose we use
Contributor

service -> Service

Contributor

@antoninbas antoninbas left a comment

LGTM

@srikartati
Member Author

/skip-all


| IPFIX Information Element | Enterprise ID | Field ID | Type |
|----------------------------|---------------|----------|-------------|
| originalExporterIPv4Address| 0 | 403 | ipv4Address |
Contributor

we also support originalExporterIPv6Address here.

Member Author

Let's add it when we support IPv6?

Contributor

Sure. Just realized even though it is the exporter address, it is still in flow aggregator scope. (sorry for the last minute review :))


![Flow Exporter Design](assets/flow_exporter.svg)
![Antrea Flow Visibility Design](assets/flow_visibility.svg)
Contributor

what does the arrow direction between flow exporter and conntrack mean? I would suppose flow exporter to get data from conntrack

Member Author

Yes, it signifies that flow exporter is polling conntrack module to get flow data.
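
The polling relationship described in this reply can be sketched as below. This is only an illustration in Python: `ConnectionStore`, `poll_once`, `run`, and the injected `dump` callable are hypothetical, and the real Flow Exporter reads from the conntrack module rather than from a Python function.

```python
import time
from typing import Callable, Dict, Iterable, Tuple

Conn = Tuple[str, str, int, int, int]  # 5-tuple identifying a connection


class ConnectionStore:
    """Hypothetical store that pulls connection data from a dump function."""

    def __init__(self, dump: Callable[[], Iterable[Tuple[Conn, int]]]) -> None:
        self._dump = dump
        self.packets: Dict[Conn, int] = {}

    def poll_once(self) -> int:
        """Pull the current entries from the source; return how many were seen."""
        seen = 0
        for conn, packet_count in self._dump():
            # Counts are cumulative, so the newest value overwrites the old one.
            self.packets[conn] = packet_count
            seen += 1
        return seen

    def run(self, interval_s: float, iterations: int) -> None:
        """Poll a fixed number of times, sleeping between polls."""
        for _ in range(iterations):
            self.poll_once()
            time.sleep(interval_s)
```

The data flows from the connection-tracking source into the store, but the exporter side initiates each transfer, which is one reading of the arrow direction discussed above.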

@antoninbas antoninbas merged commit 6d6eb4a into antrea-io:master Dec 23, 2020