Commit fdeb548

Replace StorageVersion API with Aggregated Discovery to fetch resources served by peer apiservers
1 parent 489c055 commit fdeb548

1 file changed (+80, -88 lines)

keps/sig-api-machinery/4020-unknown-version-interoperability-proxy/README.md

```diff
@@ -24,7 +24,6 @@ tags, and then generate with `hack/update-toc.sh`.
 - [Risks and Mitigations](#risks-and-mitigations)
 - [Design Details](#design-details)
 - [Aggregation Layer](#aggregation-layer)
-- [StorageVersion enhancement needed](#storageversion-enhancement-needed)
 - [Identifying destination apiserver's network location](#identifying-destination-apiservers-network-location)
 - [Proxy transport between apiservers and authn](#proxy-transport-between-apiservers-and-authn)
 - [Discovery Merging](#discovery-merging)
```

```diff
@@ -130,41 +129,46 @@ incorrectly or objects being garbage collected mistakenly.
 
 ### Goals
 
+- Ensure that a request for built-in resources is handled by an apiserver that is
+capable of serving that resource (if one exists)
+- In the failure case (e.g. network not routable between apiservers), ensure that
+unreachable resources are served 503 and not 404.
 - Ensure discovery reports the same set of resources everywhere (not just group
-versions, as it does today)
+versions, as it does today)
 - Ensure that every resource in discovery can be accessed successfully
-- In the failure case (e.g. network not routable between apiservers), ensure
-that unreachable resources are served 503 and not 404.
 
 ### Non-Goals
 
 - Lock particular clients to particular versions
 
 ## Proposal
 
-We will use the existing `StorageVersion` API to figure out which group, versions,
-and resources an apiserver can serve.
+We will use the existing [Aggregated Discovery](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/3352-aggregated-discovery/README.md)
+mechanism to fetch which group, versions and resources an apiserver can serve.
 
 API server change:
 
 - A new handler is added to the stack:
-
-- If the request is for a group/version/resource the apiserver doesn't have
-locally (we can use the StorageVersion API), it will proxy the request to
-one of the apiservers that is listed in the [ServerStorageVersion](https://github.com/kubernetes/kubernetes/blob/release-1.27/pkg/apis/apiserverinternal/types.go#L64)
-object. If an apiserver fails
-to respond, then we will return a 503 (there is a small
-possibility of a race between the controller registering the apiserver
-with the resources it can serve and receiving a request for a resource
-that is not yet available on that apiserver).
+If a request targets a group/version/resource the apiserver doesn't serve locally
+(requiring a discovery request, which could be optimized with caching), the
+apiserver will consult its informer cache of agg-discovery reported by peer apiservers.
+This cache is populated and updated by an informer on apiserver lease objects.
+The informer's event handler performs remote discovery calls to each peer apiserver
+when its lease object is added or updated, ensuring the cache reflects the current
+state of each peer's served resources. The apiserver uses this cache to identify
+which peer serves the requested resource.
+
+- Once it figures out a suitable peer to route the request to, it will proxy the
+request to that server. If that apiserver fails to respond, then we will return
+a 503 (there is a small possibility of a race between the controller registering
+the apiserver with the resources it can serve and receiving a request for a
+resource that is not yet available on that apiserver).
 
 - Discovery merging:
 
 - During upgrade or downgrade, it may be the case that no apiserver has a
 complete list of available resources. To fix the problems mentioned, it's
-necessary that discovery exactly matches the capability of the system. So,
-we will use the storage version objects to reconstruct a merged discovery
-document and serve that in all apiservers.
+necessary that discovery exactly matches the capability of the system.
 
 Why so much work?
 
```

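The Proposal replaces StorageVersion lookups with Aggregated Discovery. The minimal Go sketch below illustrates what "fetching which group, versions and resources an apiserver can serve" involves. It assumes the `apidiscovery.k8s.io/v2` wire format, requested through the `Accept` header shown. The trimmed structs and the helpers `newPeerDiscoveryRequest` and `servedGVRs` are hypothetical, not taken from the KEP or from client-go, and the peer address is a placeholder.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"sort"
)

// GVR identifies a group/version/resource. The core group has an empty Group.
type GVR struct{ Group, Version, Resource string }

// Hypothetical, trimmed mirror of an apidiscovery.k8s.io/v2
// APIGroupDiscoveryList; encoding/json ignores every field not listed here.
type groupDiscoveryList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"` // group name ("" for the core group)
		} `json:"metadata"`
		Versions []struct {
			Version   string `json:"version"`
			Resources []struct {
				Resource string `json:"resource"`
			} `json:"resources"`
		} `json:"versions"`
	} `json:"items"`
}

// Accept header asking for the aggregated (v2) discovery format.
const aggregatedDiscoveryAccept = "application/json;g=apidiscovery.k8s.io;v=v2;as=APIGroupDiscoveryList"

// newPeerDiscoveryRequest builds (but does not send) a discovery request to a
// peer. peerAddr is an assumed "host:port" for that peer.
func newPeerDiscoveryRequest(peerAddr string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, "https://"+peerAddr+"/apis", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", aggregatedDiscoveryAccept)
	return req, nil
}

// servedGVRs flattens one aggregated discovery document into a set of GVRs.
func servedGVRs(body []byte) (map[GVR]bool, error) {
	var list groupDiscoveryList
	if err := json.Unmarshal(body, &list); err != nil {
		return nil, err
	}
	out := make(map[GVR]bool)
	for _, g := range list.Items {
		for _, v := range g.Versions {
			for _, r := range v.Resources {
				out[GVR{g.Metadata.Name, v.Version, r.Resource}] = true
			}
		}
	}
	return out, nil
}

func main() {
	body := []byte(`{"kind":"APIGroupDiscoveryList","items":[
	  {"metadata":{"name":"apps"},"versions":[{"version":"v1","resources":[
	    {"resource":"deployments"},{"resource":"statefulsets"}]}]}]}`)
	gvrs, err := servedGVRs(body)
	if err != nil {
		panic(err)
	}
	var names []string
	for k := range gvrs {
		names = append(names, k.Group+"/"+k.Version+"/"+k.Resource)
	}
	sort.Strings(names)
	fmt.Println(names) // [apps/v1/deployments apps/v1/statefulsets]

	req, _ := newPeerDiscoveryRequest("10.0.0.2:6443")
	fmt.Println(req.URL.String()) // https://10.0.0.2:6443/apis
}
```

Under the same assumptions, the same flattening could also fill the LocalDiscovery cache from a loopback discovery call. Older clusters served the `v2beta1` version of this format, so a real client would need to negotiate which version it asks for.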
```diff
@@ -276,76 +280,59 @@ This might be a good place to talk about core concepts and how they relate.
 ![Alt text](https://user-images.githubusercontent.com/26771552/244544622-8ade44db-b22b-4f26-880d-3eee5bc1f913.png?raw=true "Optional Title")
 
 1. A new filter will be added to the [handler chain] of the aggregation layer.
-This filter will maintain an internal map with
-the key being the group-version-resource and the value being a list of server
-IDs of apiservers that are capable of serving
-that group-version-resource
-1. This internal map is populated using an informer for StorageVersion objects.
-An event handler will be added for this
-informer that will get the apiserver ID of the requested group-version-resource
-and update the internal map accordingly
+This filter will maintain the following internal caches:
+- a map that stores the resources served by the local apiserver for a
+group-version (LocalDiscovery cache). This will be done via a discovery call
+using a loopback client. A post-start hook will populate this cache, guaranteeing
+the apiserver has a complete view of its served resources before processing
+any incoming requests.
+- an informer cache of resources served by each peer apiserver in the
+cluster (PeerAggregatedDiscovery cache). This cache will be updated by an informer
+on [apiserver identity Lease objects](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/1965-kube-apiserver-identity/README.md#proposal).
+The informer's event handler will make discovery calls to peer apiservers
+whose lease objects are created or updated (as a result of change in [holderIdentity](https://github.com/kubernetes/kubernetes/blob/release-1.32/staging/src/k8s.io/api/coordination/v1/types.go#L58) implying a server restart).
 
 2. This filter will pass on the request to the next handler in the local aggregator
 chain, if:
 1. It is a non resource request
-2. The StorageVersion informer cache hasn't synced yet or if `StorageVersionManager.Completed()`
-has returned false. We
-will serve error 503 in this case
-3. The request has a header `X-Kubernetes-APIServer-Rerouted:true` that
-indicates that this request has been proxied once
-already. If for some reason the resource is not found locally, we will serve
-error 503
-4. No StorageVersion was retrieved for it, meaning the request is for an
-aggregated API or for a custom resource
-5. If the local apiserver ID is found in the list of serviceable-by server IDs
-from the internal map
-
-3. If the local apiserver ID is not found in the list of serviceable-by server
-IDs, a random apiserver ID will be selected
-from the retrieved list and the request will be proxied to this apiserver
-
-4. If there is no apiserver ID retrieved for the requested GVR, we will serve
-404 with error `GVR <group_version_resource> is
-not served by anything in this cluster`
+2. The LocalDiscovery cache or the apiserver identity lease informer hasn't synced
+yet. We will serve error 503 in this case
+3. The request has a header `X-Kubernetes-APIServer-Rerouted:true` that indicates
+that this request has been proxied once already. If for some reason the
+resource is not found locally, we will serve error 503
+4. The requested resource was listed in the LocalDiscovery cache
+5. No other peer apiservers were found to exist in the cluster
+
+3. If the requested resource was not found in the LocalDiscovery cache, we will
+try to fetch the resource from the PeerAggregatedDiscovery cache. The request
+will then be proxied to any peer apiserver, selected randomly, that's found to be
+able to serve the resource as indicated in the PeerAggregatedDiscovery cache.
+1. There is a possibility of a race condition regarding creation/update of an
+aggregated resource or a CRD and its registration in the LocalDiscovery cache.
+If such a resource is not found in LocalDiscovery cache but is found in the PeerAggregatedDiscovery
+cache, we will always route the request for this resource to the suitable peer.
+
+4. If there is no eligible apiserver found in the PeerAggregatedDiscovery cache
+for the requested resource, we will pass on the request to the next handler in the
+handler chain. This will either
+
+- be eventually handled by the apiextensions-apiserver or the
+aggregated-apiserver if the request was for a custom resource or an
+aggregated resource which was created/updated after we established both the
+LocalDiscovery and the PeerAggregatedDiscovery caches
+- be returned with a 404 Not Found error for cases when the resource doesn't exist
+in the cluster
 
 5. If the proxy call fails for network issues or any reason, we serve 503 with
-error `Error while proxying request to
-destination apiserver`
+error `Error while proxying request to destination apiserver`
 
 6. We will also add a poststarthook for the apiserver to ensure that it does not
-start serving requests until we are done
-creating/updating SV objects
+start serving requests until
+* we have populated the LocalDiscovery cache
+* apiserver identity informer is synced
 
 [handler chain]:https://github.com/kubernetes/kubernetes/blob/fc8f5a64106c30c50ee2bbcd1d35e6cd05f63b00/staging/src/k8s.io/apiserver/pkg/server/config.go#L639
 
-#### StorageVersion enhancement needed
-
-StorageVersion API currently tells us whether a particular StorageVersion can be
-read from etcd by the listed apiserver. We
-will enhance this API to also include apiserver ID of the server that can serve
-this StorageVersion.
-
-With the enhancement, the new [ServerStorageVersion](https://github.com/kubernetes/kubernetes/blob/release-1.27/pkg/apis/apiserverinternal/types.go#L62-L73)
-object will have this structure
-
-```
-type ServerStorageVersion struct {
-	// The ID of the reporting API server.
-	APIServerID string
-
-	// The API server encodes the object to this version
-	// when persisting it in the backend (e.g., etcd).
-	EncodingVersion string
-
-	// The API server can decode objects encoded in these versions.
-	// The encodingVersion must be included in the decodableVersions.
-	DecodableVersions []string
-
-	// Versions that can be served by the reporting API server
-	ServedVersions []string
-}
-```
-
 #### Identifying destination apiserver's network location
 
 We will be performing dual writes of the ip and port information of the apiservers
```

```diff
@@ -643,8 +630,9 @@ Any change of default behavior may be surprising to users or break existing
 automations, so be extremely careful here.
 -->
 
-Yes, requests for built-in resources at the time when a cluster is at mixed versions will be served with a default 503 error
-instead of a 404 error, if the request is unable to be served.
+Yes, requests for built-in resources at the time when a cluster is at mixed versions
+will be served with a default 503 error instead of a 404 error, if the request
+is unable to be served.
 
 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
 
```

```diff
@@ -823,11 +811,8 @@ This section must be completed when targeting beta to a release.
 
 No, but it does depend on
 
-- the `StorageVersion` feature that generates objects with a `storageVersion.status.serverStorageVersions[*].apiServerID`
-field which is used to find the remote apiserver's network location.
-- `APIServerIdentity` feature in kube-apiserver that creates a lease object for
-APIServerIdentity which we will use to store
-the network location of the remote apiserver for visibility/debugging
+- `APIServerIdentity` feature in kube-apiserver that creates a lease object for APIServerIdentity
+which we will use to store the network location of the remote apiserver for visibility/debugging
 
 <!--
 Think about both cluster-level services (e.g. metrics-server) as well
```

```diff
@@ -871,7 +856,15 @@ Focusing mostly on:
 heartbeats, leader election, etc.)
 -->
 
-No.
+Yes, enabling this feature will result in new API calls. Specifically:
+
+- Discovery calls via a loopback client: The local apiserver will use a loopback
+client to discover the resources it serves for each group-version. This should
+only happen once upon server startup.
+- Remote discovery calls to peer apiservers: The event handler for apiserver identity
+lease informer will make remote discovery calls to each peer apiserver whose
+- identity lease is created
+- identity lease is updated as a result of change in [holderIdentity](https://github.com/kubernetes/kubernetes/blob/release-1.32/staging/src/k8s.io/api/coordination/v1/types.go#L58) implying a server restart
 
 ###### Will enabling / using this feature result in introducing new API types?
 
```

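The remote discovery calls listed above are driven by apiserver identity Lease events. Below is a minimal sketch of the trigger logic. It assumes the Lease name identifies the peer and that an unchanged `holderIdentity` means a routine renewal rather than a restart. `lease`, `peerRefresher` and its `fetch` callback are hypothetical stand-ins. In a real informer, `onAdd` and `onUpdate` would be wired through client-go's `cache.ResourceEventHandlerFuncs` (`AddFunc`, `UpdateFunc`).

```go
package main

import "fmt"

// lease is a hypothetical stand-in for a coordination.k8s.io/v1 Lease that
// keeps only the object name and spec.holderIdentity.
type lease struct {
	Name           string
	HolderIdentity *string
}

func holder(l *lease) string {
	if l == nil || l.HolderIdentity == nil {
		return ""
	}
	return *l.HolderIdentity
}

// peerRefresher decides when a peer's discovery document must be fetched.
// fetch is a hypothetical callback that performs the remote discovery call
// and stores the result in the PeerAggregatedDiscovery cache.
type peerRefresher struct {
	localID string
	fetch   func(peerID string)
}

// onAdd: a newly seen identity lease (other than our own) means a peer to discover.
func (p *peerRefresher) onAdd(l *lease) {
	if l.Name != p.localID {
		p.fetch(l.Name)
	}
}

// onUpdate: renewals keep holderIdentity unchanged and trigger nothing;
// a new holderIdentity is taken to mean the peer restarted.
func (p *peerRefresher) onUpdate(oldL, newL *lease) {
	if newL.Name != p.localID && holder(oldL) != holder(newL) {
		p.fetch(newL.Name)
	}
}

func main() {
	var calls []string
	r := &peerRefresher{localID: "apiserver-a", fetch: func(id string) { calls = append(calls, id) }}
	h1, h2 := "apiserver-b_1", "apiserver-b_2"

	r.onAdd(&lease{Name: "apiserver-a"})                      // own lease: ignored
	r.onAdd(&lease{Name: "apiserver-b", HolderIdentity: &h1}) // new peer: fetch
	r.onUpdate(&lease{Name: "apiserver-b", HolderIdentity: &h1},
		&lease{Name: "apiserver-b", HolderIdentity: &h1}) // renewal: ignored
	r.onUpdate(&lease{Name: "apiserver-b", HolderIdentity: &h1},
		&lease{Name: "apiserver-b", HolderIdentity: &h2}) // restart: fetch
	fmt.Println(calls) // [apiserver-b apiserver-b]
}
```

Lease heartbeats update the object regularly. Filtering on `holderIdentity` keeps those routine updates from turning into a steady stream of discovery calls.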
```diff
@@ -916,9 +909,8 @@ Think about adding additional work or introducing new steps in between
 [existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
 -->
 
-When handling a request in the handler chain of the kube-aggregator, the
-StorageVersion informer will be used to look up
-which API servers can serve the requested resource.
+The Local Discovery and Remote Discovery caches should take care of not causing delays while
+handling a request.
 
 ###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
 
```
