Design of sart

Overview

Sart is a Kubernetes network load balancer and CNI plugin that uses BGP and is written in Rust. The project is inspired by MetalLB and Coil.

Warning

This project is experimental.

Programs

Sart consists of the following programs.

  • sart
    • CLI tool to control and describe sartd.
  • sartd
    • Daemon program that provides the following features.
      • bgp: BGP daemon
      • fib: FIB daemon
      • agent: Kubernetes agent running as DaemonSet
      • controller: Kubernetes custom controller running as Deployment
  • sart-cni
    • CLI tool for CNI
    • gRPC client to communicate with sartd agent

BGP

Sart's BGP feature (sartd-bgp) is based on RFC 4271 - A Border Gateway Protocol 4 (BGP-4). Sartd-bgp is configurable via a gRPC interface.

Features

Sartd-bgp has the following features. At present it supports only the minimal feature set needed to work as a BGP speaker.

  • eBGP
  • iBGP
  • 4 Bytes ASN
  • Policy injection
  • Multiple paths
  • IPv6 features
  • BGP unnumbered
  • Route Reflector

Architecture

This figure shows a basic model of sartd-bgp.

[Figure: model]

Sartd-bgp uses an event-driven model.

Peer handling

Each peer has one event handler that waits for events, and channels through which those events are delivered. The events include the following.

  • Message
  • Timer
  • Rib
  • Admin

Once an event is generated, it is queued to the channel. The event handler then retrieves and processes the event and, depending on the result, moves the BGP FSM to its next state.
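The queue-and-handle loop described above can be sketched with a standard-library channel. This is a minimal sketch, not sartd-bgp's implementation: the `Peer` struct, the `AdminCommand` type, the event payloads, and the two-state `State` enum are all hypothetical stand-ins, and a real daemon would most likely use an async runtime rather than a blocking receive.

```rust
use std::sync::mpsc;

/// Hypothetical administrative commands carried by an Admin event.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AdminCommand {
    Start,
    Stop,
}

/// The four event kinds named in the text; the payload types are assumptions.
#[derive(Debug)]
enum Event {
    Message(Vec<u8>),
    Timer(&'static str),
    Rib(String),
    Admin(AdminCommand),
}

/// Reduced stand-in for the BGP FSM state (only two states for brevity).
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Idle,
    Connect,
}

struct Peer {
    state: State,
    rx: mpsc::Receiver<Event>,
}

impl Peer {
    /// Process one event and return the state the FSM should move to.
    fn handle(&self, event: &Event) -> State {
        match (self.state, event) {
            // An administrative start moves an idle peer forward.
            (State::Idle, Event::Admin(AdminCommand::Start)) => State::Connect,
            // An administrative stop always returns the peer to Idle.
            (_, Event::Admin(AdminCommand::Stop)) => State::Idle,
            // Every other event leaves the state unchanged in this sketch.
            (state, _) => state,
        }
    }

    /// Retrieve queued events one by one until every sender is dropped.
    fn run(&mut self) -> State {
        while let Ok(event) = self.rx.recv() {
            self.state = self.handle(&event);
        }
        self.state
    }
}
```

Any number of producers (message readers, timers, the RIB manager, the admin API) can hold a clone of the sender, while the single receiver keeps state changes for one peer serialized.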

Finite State Machine

BGP has an independent FSM for each peer. This diagram shows how the BGP state changes in response to events. Please refer to RFC 4271, Section 8.2.2 for details such as the event numbers.

stateDiagram-v2
    Idle --> Connect: 1
    Idle --> Idle: others
    Connect --> Connect: 1,9,14
    Connect --> Active: 18
    Connect --> OpenSent: 16,17
    Connect --> Idle: others
    Active --> OpenSent: 16,17
    Active --> OpenConfirm: 19
    Active --> Connect: 9
    Active --> Active: 1,14
    Active --> Idle: others
    OpenSent --> OpenSent: 1,14,16,17
    OpenSent --> Active: 18
    OpenSent --> OpenConfirm: 19
    OpenSent --> Idle: others
    OpenConfirm --> OpenConfirm: 1,11,14,16,17
    OpenConfirm --> Established: 26
    OpenConfirm --> Idle: others
    Established --> Established: 1,11,26,27
    Established --> Idle: others
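The diagram above can be written as a single transition function. The following is a minimal sketch that transcribes only the arrows shown in the diagram; the `State` enum and `next_state` function are names chosen for illustration, and the side effects that RFC 4271 attaches to each transition (timers, sending messages, releasing resources) are left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Idle,
    Connect,
    Active,
    OpenSent,
    OpenConfirm,
    Established,
}

/// Return the next state for an RFC 4271 event number, following the diagram.
fn next_state(state: State, event: u8) -> State {
    use State::*;
    match (state, event) {
        (Idle, 1) => Connect,
        (Connect, 1 | 9 | 14) => Connect,
        (Connect, 18) => Active,
        (Connect, 16 | 17) => OpenSent,
        (Active, 16 | 17) => OpenSent,
        (Active, 19) => OpenConfirm,
        (Active, 9) => Connect,
        (Active, 1 | 14) => Active,
        (OpenSent, 1 | 14 | 16 | 17) => OpenSent,
        (OpenSent, 18) => Active,
        (OpenSent, 19) => OpenConfirm,
        (OpenConfirm, 1 | 11 | 14 | 16 | 17) => OpenConfirm,
        (OpenConfirm, 26) => Established,
        (Established, 1 | 11 | 26 | 27) => Established,
        // "others" in the diagram: every remaining pair falls back to Idle.
        _ => Idle,
    }
}
```

For example, feeding the events 1, 16, 19, 26 in order takes a peer from Idle through Connect, OpenSent, and OpenConfirm to Established.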

RIB(Routing Information Base)

This figure shows how sartd-bgp handles routing information. The RIB-Manager has an event handler and waits for RIB events from peers.

When Adj-RIB-In receives paths, the peer filters them with its input policies and publishes an event to store them in the Loc-RIB. When the Loc-RIB receives paths, it selects a new best path and, if the best path has changed, publishes an event to all peers. When a peer receives that event from the RIB-Manager, it filters the paths with its output policies and stores them in Adj-RIB-Out.

[Figure: rib-model]
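The "store, reselect, publish only on change" step of the Loc-RIB can be sketched as follows. This is an illustration under stated assumptions, not sartd-bgp's code: the `Path` and `LocRib` types are hypothetical, input and output policies are omitted, and best-path selection is reduced to "shortest AS path wins", which is only one step of the real BGP decision process.

```rust
use std::collections::HashMap;

/// Hypothetical path record; real paths carry many more attributes.
#[derive(Debug, Clone, PartialEq)]
struct Path {
    peer: String,
    prefix: String,
    as_path: Vec<u32>,
}

#[derive(Default)]
struct LocRib {
    /// All known paths, grouped by prefix.
    paths: HashMap<String, Vec<Path>>,
}

impl LocRib {
    /// Current best path for a prefix (assumption: shortest AS path wins,
    /// and the earliest stored path wins a tie).
    fn best(&self, prefix: &str) -> Option<&Path> {
        self.paths.get(prefix)?.iter().min_by_key(|p| p.as_path.len())
    }

    /// Store a path that passed the input policies. Returns the new best
    /// path only when it differs from the previous one, which is the case
    /// in which an event would be published to all peers.
    fn insert(&mut self, path: Path) -> Option<Path> {
        let prefix = path.prefix.clone();
        let old = self.best(&prefix).cloned();
        let entry = self.paths.entry(prefix.clone()).or_default();
        // A peer contributes at most one path per prefix.
        entry.retain(|p| p.peer != path.peer);
        entry.push(path);
        let new = self.best(&prefix).cloned();
        if new != old {
            new
        } else {
            None
        }
    }
}
```

Returning `None` when the best path is unchanged keeps peers from being notified about paths that would never reach their Adj-RIB-Out.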

FIB

TBD

Kubernetes

Sart has a Kubernetes network load-balancer feature named sartd-kubernetes.

Sartd-kubernetes is a Kubernetes custom controller.

Components

Sartd-kubernetes consists of the following two components.

  • controller
  • agent

controller runs as a Deployment and manages resources that are not node-local, such as Service and EndpointSlice. It also manages the admission webhooks and is responsible for controlling load-balancer addresses.

agent runs as a DaemonSet on each node and controls node-local resources. It interacts with the node-local BGP speaker to announce load-balancer addresses to external networks.

Custom Resource Definition

Sartd-kubernetes defines the following custom resources.

  • AddressPool
  • AddressBlock
  • ClusterBGP
  • NodeBGP
  • BGPPeer
  • BGPPeerTemplate
  • BGPAdvertisement

The API group and version of these CRDs are sart.terassyi.net and v1alpha2.

This figure shows the relationships between CRDs.

[Figure: kubernetes-model]

AddressPool

AddressPool defines a CIDR whose addresses are available for allocation. The type parameter specifies what the pool is used for, and the allocType parameter selects the allocation method.

apiVersion: sart.terassyi.net/v1alpha2
kind: AddressPool
metadata:
  name: default-lb-pool
spec:
  cidr: 10.0.1.0/24
  type: service
  allocType: bit
  blockSize: 24
  autoAssign: true

AddressBlock

AddressBlock defines a subset of an AddressPool, and its CIDR must be contained in the AddressPool's CIDR. The allocator entity is tied to the AddressBlock.

It is automatically generated by the AddressPool.

If the parent AddressPool's type is service, the pool has exactly one AddressBlock.

apiVersion: sart.terassyi.net/v1alpha2
kind: AddressBlock
metadata:
  name: non-default-lb-pool
spec:
  autoAssign: false
  cidr: 10.0.100.0/24
  nodeRef: null
  poolRef: non-default-lb-pool
  type: service
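The containment rule for an AddressBlock can be checked with plain integer arithmetic. The sketch below is an illustration only: the function names are hypothetical, it handles IPv4 only, and it is not taken from the sart code base.

```rust
use std::net::Ipv4Addr;

/// Network mask for a prefix length in 0..=32.
fn mask(len: u8) -> u32 {
    if len == 0 {
        0
    } else {
        u32::MAX << (32 - u32::from(len))
    }
}

/// Parse "a.b.c.d/len" into (network address, prefix length).
fn parse_cidr(cidr: &str) -> Option<(u32, u8)> {
    let (addr, len) = cidr.split_once('/')?;
    let addr: Ipv4Addr = addr.parse().ok()?;
    let len: u8 = len.parse().ok()?;
    if len > 32 {
        return None;
    }
    // Clear the host bits so that "10.0.1.7/24" normalizes to 10.0.1.0.
    Some((u32::from(addr) & mask(len), len))
}

/// Some(true) when `block` lies entirely inside `pool`,
/// None when either string is not a valid IPv4 CIDR.
fn contains(pool: &str, block: &str) -> Option<bool> {
    let (pool_net, pool_len) = parse_cidr(pool)?;
    let (block_net, block_len) = parse_cidr(block)?;
    // The block must be at least as specific as the pool and share its prefix.
    Some(block_len >= pool_len && (block_net & mask(pool_len)) == pool_net)
}
```

With the values from the example above, `contains("10.0.100.0/24", "10.0.100.0/24")` holds, which is the expected case for a pool of type service where the block covers the whole pool.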

ClusterBGP

ClusterBGP defines a cluster-wide BGP configuration. The nodes it applies to are selected with nodeSelector.

It can template the node-local ASN and Router-Id settings, and it also provides a way to template BGP peer settings.

apiVersion: sart.terassyi.net/v1alpha2
kind: ClusterBGP
metadata:
  name: clusterbgp-sample-a
spec:
  nodeSelector:
    bgp: a
  asnSelector:
    from: label
  routerIdSelector:
    from: internalAddress
  speaker:
    path: 127.0.0.1:5000
  peers:
    - peerTemplateRef: bgppeertemplate-sample
      nodeBGPSelector:
        bgp: a

NodeBGP

NodeBGP defines a node local BGP configuration.

It is automatically generated by ClusterBGP and managed by the agent.

apiVersion: sart.terassyi.net/v1alpha2
kind: NodeBGP
metadata:
  labels:
    bgp: a
  name: sart-worker
spec:
  asn: 65000
  peers:
  - name: bgppeertemplate-sample-sart-worker-65000-9.9.9.9
    ...
  routerId: 172.18.0.2
  speaker:
    path: 127.0.0.1:5000

BGPPeer

BGPPeer defines a local BGP peer abstraction. This resource is created by NodeBGP, and we can also create it manually.

It is managed by the agent.

apiVersion: sart.terassyi.net/v1alpha2
kind: BGPPeer
metadata:
  labels:
    bgp: a
    bgppeer.sart.terassyi.net/node: sart-worker
  name: bgppeer-sample-sart-worker
spec:
  addr: 9.9.9.9
  asn: 65000
  nodeBGPRef: sart-worker
  speaker:
    path: 127.0.0.1:5000

BGPPeerTemplate

BGPPeerTemplate defines a template BGP peer configuration.

apiVersion: sart.terassyi.net/v1alpha2
kind: BGPPeerTemplate
metadata:
  name: bgppeertemplate-sample
spec:
  asn: 65000
  addr: 9.9.9.9
  groups:
    - to-router0

BGPAdvertisement

BGPAdvertisement defines a prefix announced by BGPPeers. This is automatically generated when the controller allocates new addresses.

apiVersion: sart.terassyi.net/v1alpha2
kind: BGPAdvertisement
metadata:
  labels:
    kubernetes.io/service-name: app-svc-cluster
  name: app-svc-cluster-hv4l5-10.0.1.0
  namespace: test
spec:
  attrs: null
  cidr: 10.0.1.0/32
  protocol: ipv4
  type: service

IP Address Management(IPAM)

Service type LoadBalancer

Sartd-kubernetes provides IPAM for Services of type LoadBalancer.

AddressPool

To allocate IP addresses to LoadBalancers, we create pools of IP addresses. To create a pool for LoadBalancers, we have to apply an AddressPool resource with type: service.

We can create multiple pools.

Choosing pools

We can create multiple pools in one cluster. If there are multiple pools, we can choose the pool from which the controller allocates addresses for the LoadBalancer. To choose a pool, we have to add the annotation.

When we specify multiple pools in the annotation, addresses are assigned from each of the specified pools.

We can also define a default assignable pool by setting autoAssign: true in its spec. To use the pool whose autoAssign is true, we can omit its name from the annotation. Only one default assignable pool can exist.

When we specify multiple pools and one of them is the default assignable pool, we must still specify its name.
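The pool selection rules above can be summarized in a small function. This is a sketch under assumptions: the `Pool` struct and `choose_pools` function are hypothetical, and `requested` stands for the pool names already parsed out of the annotation, whose key and format are not shown here.

```rust
/// Hypothetical view of an AddressPool: only the fields the rules need.
#[derive(Debug, Clone, PartialEq)]
struct Pool {
    name: String,
    auto_assign: bool,
}

/// Resolve the pools to allocate from.
/// `requested` holds the pool names taken from the annotation, if any.
fn choose_pools<'a>(pools: &'a [Pool], requested: &[&str]) -> Result<Vec<&'a Pool>, String> {
    if requested.is_empty() {
        // No annotation: fall back to the single default assignable pool.
        return pools
            .iter()
            .find(|p| p.auto_assign)
            .map(|p| vec![p])
            .ok_or_else(|| "no default assignable pool".to_string());
    }
    // Annotation present: only the listed pools are used, so the default
    // pool takes part only when its name is listed explicitly.
    requested
        .iter()
        .map(|name| {
            pools
                .iter()
                .find(|p| p.name == *name)
                .ok_or_else(|| format!("unknown pool: {}", name))
        })
        .collect()
}
```

Treating an unknown pool name as an error, rather than silently skipping it, is a design choice of this sketch and not something the text specifies.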

Requesting specific addresses

We can request specific addresses for the LoadBalancer. These are also specified with an annotation.

We can request multiple addresses from one pool or from multiple pools.

AddressBlock

AddressBlock is automatically generated by AddressPool and is a subset of a pool. For a LoadBalancer pool (type: service), only one AddressBlock is created per pool. Therefore, the CIDR field of the AddressBlock must be equal to the AddressPool's CIDR.

Its allocator is managed by the controller.

Allocating and Releasing LoadBalancer addresses

The sartd-kubernetes controller watches Service and EndpointSlice resources. When it detects an event for a LoadBalancer, it triggers the reconciliation logic.

First, when the LoadBalancer is created, the controller picks addresses from the desired pools: the requested addresses if any were requested, or automatically chosen ones otherwise. If the addresses are valid, the controller allocates them and updates the status (status.loadBalancer.ingress[].ip) of the Service resource.
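Picking either a requested address or the next free one can be sketched with a bitmap-style allocator. Everything here is an assumption made for illustration: the `BitAllocator` type is hypothetical, the reading of allocType: bit as "one flag per address, lowest free address first" is a guess, the flags are stored as a `Vec<bool>` rather than packed bits, and only IPv4 blocks from /16 to /32 are handled.

```rust
use std::net::Ipv4Addr;

/// One flag per address in the block; `true` means allocated.
struct BitAllocator {
    base: u32,
    used: Vec<bool>,
}

impl BitAllocator {
    /// `base` is the network address of the block.
    fn new(base: Ipv4Addr, prefix_len: u8) -> Self {
        assert!((16..=32).contains(&prefix_len), "sketch supports /16 to /32 only");
        let size = 1usize << (32 - usize::from(prefix_len));
        BitAllocator { base: u32::from(base), used: vec![false; size] }
    }

    /// Allocate the requested address, or the lowest free one when no
    /// address is requested. Returns None when the request is outside the
    /// block, already taken, or the block is full.
    fn allocate(&mut self, requested: Option<Ipv4Addr>) -> Option<Ipv4Addr> {
        let index = match requested {
            Some(addr) => {
                let offset = u32::from(addr).checked_sub(self.base)? as usize;
                if offset >= self.used.len() || self.used[offset] {
                    return None;
                }
                offset
            }
            None => self.used.iter().position(|used| !*used)?,
        };
        self.used[index] = true;
        Some(Ipv4Addr::from(self.base + index as u32))
    }

    /// Release an address; returns true when it was allocated before.
    fn release(&mut self, addr: Ipv4Addr) -> bool {
        match u32::from(addr).checked_sub(self.base) {
            Some(offset) if (offset as usize) < self.used.len() => {
                std::mem::replace(&mut self.used[offset as usize], false)
            }
            _ => false,
        }
    }
}
```

On a fresh 10.0.1.0/24 block, the first automatic allocation in this sketch yields 10.0.1.0, and a released address becomes available again for the next automatic allocation.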

Announcing addresses

The controller creates BGPAdvertisement resources based on the allocated addresses and the externalTrafficPolicy of the LoadBalancer.

When externalTrafficPolicy is Cluster, the advertisement must target all peers that match the group label. If no group label is specified, all existing peers are selected.

When externalTrafficPolicy is Local, the advertisement must target the peers on the nodes where backend pods exist. The target peers must be updated whenever the scheduling of the backend pods changes.
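The two selection rules can be sketched as one function. The `Peer` struct, the `TrafficPolicy` enum, and the `target_peers` function are hypothetical names; the sketch applies the group filter only for Cluster and the node filter only for Local, because the text does not say whether the group label also applies in the Local case.

```rust
/// Hypothetical view of a BGPPeer: its name, the node it runs on, its groups.
#[derive(Debug, Clone, PartialEq)]
struct Peer {
    name: String,
    node: String,
    groups: Vec<String>,
}

enum TrafficPolicy {
    Cluster,
    Local,
}

/// Select the peers an advertisement should target.
/// `group` is the optional group label; `backend_nodes` lists the nodes
/// that currently run backend pods.
fn target_peers<'a>(
    peers: &'a [Peer],
    policy: &TrafficPolicy,
    group: Option<&str>,
    backend_nodes: &[&str],
) -> Vec<&'a Peer> {
    peers
        .iter()
        .filter(|p| match policy {
            // Cluster: every peer in the group, or every peer without a group.
            TrafficPolicy::Cluster => match group {
                Some(g) => p.groups.iter().any(|x| x.as_str() == g),
                None => true,
            },
            // Local: only peers on nodes that host a backend pod.
            TrafficPolicy::Local => backend_nodes.iter().any(|n| *n == p.node.as_str()),
        })
        .collect()
}
```

Because the Local result depends on `backend_nodes`, the function would have to be re-evaluated on every EndpointSlice change, which matches the requirement that target peers follow backend pod scheduling.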

CNI for Kubernetes

Sart has a CNI plugin feature for Kubernetes. This feature is named sart-cni.

Components

The programs related to sart-cni are listed below.

  • controller
  • agent
  • bgp
  • sartcni

controller and agent are the same programs used in sartd-kubernetes. sartcni is the CNI interface that is called by the kubelet (or the container runtime).

Architecture

Like sartd-kubernetes, sart-cni has an architecture based on Custom Resource Definitions (CRDs), and it shares most of its resources with sartd-kubernetes.

The following figure shows the CRDs model of sart-cni.

[Figure: kubernetes-cni-model.drawio.svg]

The difference from the sartd-kubernetes model is that AddressBlock resources belong to nodes. In addition, the agent creates a BGPAdvertisement resource that targets only the peers on the agent's own node.

IP Address Management(IPAM)

Sart-cni has the IPAM feature for Pods.

AddressPool

To allocate IP addresses to pods, we have to create at least one AddressPool resource with type: pod. We can also create multiple pools for pods.

Choosing pools

As described above, we can create multiple pools in one cluster. If there are multiple pools, we can choose which pool to use for each pod. To choose a pool, we have to add the annotation to the pod.

Unlike the LoadBalancer case, we cannot specify multiple pools for one pod.

A pod must be assigned an address even when the annotation is not specified. Therefore, we can make a pool default assignable by setting autoAssign: true in its spec. Only one default pool can exist in a cluster.

AddressBlock

AddressBlock is automatically created by AddressPool and is a subset of its pool. For a pool for Pods (type: pod), AddressBlocks are created according to the number of nodes, and each block belongs to a node.

A BGPAdvertisement is created per AddressBlock, and its CIDR field is the same as the AddressBlock's CIDR.

When a pod is scheduled on a node, its address is picked from the AddressBlock that belongs to that node.
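Carving per-node blocks out of a pool comes down to offset arithmetic. The sketch below assumes that blockSize is the prefix length of each block, which is consistent with the service example where cidr is a /24 and blockSize is 24. The `nth_block` function is hypothetical, the pool 10.1.0.0/24 and block length /29 in the usage note are made-up values, and how sart actually maps nodes to block indexes is not covered.

```rust
use std::net::Ipv4Addr;

/// Return the `index`-th block of prefix length `block_len` inside the pool
/// as (network address, prefix length). `pool` must be the pool's network
/// address. Returns None when the block would fall outside the pool.
fn nth_block(pool: Ipv4Addr, pool_len: u8, block_len: u8, index: u32) -> Option<(Ipv4Addr, u8)> {
    if pool_len > block_len || block_len > 32 {
        return None;
    }
    // Number of blocks that fit into the pool.
    let blocks = 1u64 << (block_len - pool_len);
    if u64::from(index) >= blocks {
        return None;
    }
    // Number of addresses per block.
    let block_size = 1u64 << (32 - block_len);
    let net = u64::from(u32::from(pool)) + u64::from(index) * block_size;
    Some((Ipv4Addr::from(net as u32), block_len))
}
```

For a pool 10.1.0.0/24 split into /29 blocks, index 0 gives 10.1.0.0/29, index 1 gives 10.1.0.8/29, and index 32 is rejected because only 32 blocks fit.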

Internal

Sart-cni is implemented according to the CNI Specification. The supported CNI versions are v1.0.0, v0.4.0 and v0.3.1.

The program that satisfies this specification is sartcni, a stand-alone executable binary. It is called by the kubelet when a pod is created and delegates the request to sartd-agent via the gRPC API, as shown in the following figure.

[Figure: cni-internal.drawio.svg]

The flow is as follows.

  1. The admin creates the AddressPool.
  2. The user creates a pod.
  3. Kubelet on a node creates container processes.
  4. The kubelet (or the container runtime) calls the sart-cni binary to set up the interface in the container.
  5. Sart-cni calls the gRPC API of the sartd-agent on the same node to configure the interface.
  6. Sartd-agent gets the desired pool to assign an IP address from and looks up the block on the node.
  7. If no block exists on the node, sartd-agent requests an allocation and sart-controller creates the block.
  8. Sartd-agent assigns an IP address to the pod from the block and configures the interface and routing information.
  9. Sartd-agent returns the result to sart-cni, and sart-cni propagates the result to the caller.
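Steps 4 and 5 above start with the plugin reading its parameters from the environment, as the CNI specification defines (CNI_COMMAND, CNI_CONTAINERID, CNI_NETNS, CNI_IFNAME). The sketch below shows only that parsing step for the ADD command. The `AgentRequest` struct and both function names are hypothetical, the real gRPC message is not shown in this document, and the network configuration that the runtime passes on stdin is ignored.

```rust
/// Commands defined by CNI specification v1.0.0.
#[derive(Debug, PartialEq)]
enum Command {
    Add,
    Del,
    Check,
    Version,
}

/// Parse the value of the CNI_COMMAND environment variable.
fn parse_command(value: &str) -> Option<Command> {
    match value {
        "ADD" => Some(Command::Add),
        "DEL" => Some(Command::Del),
        "CHECK" => Some(Command::Check),
        "VERSION" => Some(Command::Version),
        _ => None,
    }
}

/// Hypothetical request that the plugin would forward to sartd-agent.
#[derive(Debug, PartialEq)]
struct AgentRequest {
    container_id: String,
    netns: String,
    ifname: String,
}

/// Build the request for an ADD call. `get` looks up an environment
/// variable; in a real binary it could wrap `std::env::var`.
fn build_add_request(get: impl Fn(&str) -> Option<String>) -> Result<AgentRequest, String> {
    let require = |key: &str| get(key).ok_or_else(|| format!("{} is not set", key));
    let raw = require("CNI_COMMAND")?;
    match parse_command(&raw) {
        Some(Command::Add) => Ok(AgentRequest {
            container_id: require("CNI_CONTAINERID")?,
            netns: require("CNI_NETNS")?,
            ifname: require("CNI_IFNAME")?,
        }),
        Some(other) => Err(format!("{:?} is not covered in this sketch", other)),
        None => Err(format!("unsupported command: {}", raw)),
    }
}
```

Passing the lookup as a closure keeps the parsing logic testable without touching the process environment; a real binary could pass `|key| std::env::var(key).ok()`.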