 - [Proxy transport between apiservers and authn](#proxy-transport-between-apiservers-and-authn)
 - [Discovery Merging](#discovery-merging)
@@ -130,41 +129,46 @@ incorrectly or objects being garbage collected mistakenly.

 ### Goals

+- Ensure that a request for built-in resources is handled by an apiserver that is
+capable of serving that resource (if one exists)
+- In the failure case (e.g. network not routable between apiservers), ensure that
+unreachable resources are served 503 and not 404.
 - Ensure discovery reports the same set of resources everywhere (not just group
-versions, as it does today)
+versions, as it does today)
 - Ensure that every resource in discovery can be accessed successfully
-- In the failure case (e.g. network not routable between apiservers), ensure
-that unreachable resources are served 503 and not 404.

 ### Non-Goals

 - Lock particular clients to particular versions

 ## Proposal

-We will use the existing `StorageVersion` API to figure out which group, versions,
-and resources an apiserver can serve.
+We will use the existing [Aggregated Discovery](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/3352-aggregated-discovery/README.md)
+mechanism to fetch which group, versions and resources an apiserver can serve.

 API server change:

 - A new handler is added to the stack:
-
-- If the request is for a group/version/resource the apiserver doesn't have
-locally (we can use the StorageVersion API), it will proxy the request to
-one of the apiservers that is listed in the [ServerStorageVersion](https://github.com/kubernetes/kubernetes/blob/release-1.27/pkg/apis/apiserverinternal/types.go#L64)
-object. If an apiserver fails
-to respond, then we will return a 503 (there is a small
-possibility of a race between the controller registering the apiserver
-with the resources it can serve and receiving a request for a resource
-that is not yet available on that apiserver).
+If a request targets a group/version/resource the apiserver doesn't serve locally
+(requiring a discovery request, which could be optimized with caching), the
+apiserver will consult its informer cache of agg-discovery reported by peer apiservers.
+This cache is populated and updated by an informer on apiserver lease objects.
+The informer's event handler performs remote discovery calls to each peer apiserver
+when its lease object is added or updated, ensuring the cache reflects the current
+state of each peer's served resources. The apiserver uses this cache to identify
+which peer serves the requested resource.
+
+- Once it figures out a suitable peer to route the request to, it will proxy the
+request to that server. If that apiserver fails to respond, then we will return
+a 503 (there is a small possibility of a race between the controller registering
+the apiserver with the resources it can serve and receiving a request for a
+resource that is not yet available on that apiserver).

 - Discovery merging:

 - During upgrade or downgrade, it may be the case that no apiserver has a
 complete list of available resources. To fix the problems mentioned, it's
-necessary that discovery exactly matches the capability of the system. So,
-we will use the storage version objects to reconstruct a merged discovery
-document and serve that in all apiservers.
+necessary that discovery exactly matches the capability of the system.
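
As an illustration of the routing behaviour proposed in the added lines above, the following is a minimal Go sketch, not the KEP's implementation. The names `GVR`, `Router`, `Decision`, `Route`, and `StatusForProxyError` are hypothetical; the two maps stand in for the apiserver's local view of served resources and for the informer-backed cache of peer discovery data.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"net/http"
	"sort"
)

// GVR identifies a group/version/resource (hypothetical type for this sketch).
type GVR struct{ Group, Version, Resource string }

// Router holds the two views the proposal describes (hypothetical type).
type Router struct {
	Local map[GVR]bool            // resources this apiserver serves itself
	Peers map[string]map[GVR]bool // peer ID -> resources reported by that peer's discovery
}

// Decision is the outcome of routing a single request.
type Decision struct {
	ServeLocally bool
	PeerID       string // set only when the request should be proxied
}

// errPeerUnreachable is an example proxy failure used in main.
var errPeerUnreachable = errors.New("peer unreachable")

// eligiblePeers lists, in sorted order, the peers whose cached discovery includes gvr.
func (r *Router) eligiblePeers(gvr GVR) []string {
	var ids []string
	for id, served := range r.Peers {
		if served[gvr] {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids
}

// Route prefers local handling and otherwise picks a random eligible peer.
// A zero Decision means no apiserver is known to serve gvr.
func (r *Router) Route(gvr GVR) Decision {
	if r.Local[gvr] {
		return Decision{ServeLocally: true}
	}
	peers := r.eligiblePeers(gvr)
	if len(peers) == 0 {
		return Decision{}
	}
	return Decision{PeerID: peers[rand.Intn(len(peers))]}
}

// StatusForProxyError maps a failed proxy attempt to 503 rather than 404.
func StatusForProxyError(err error) int {
	if err != nil {
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

func main() {
	deployments := GVR{"apps", "v1", "deployments"}
	widgets := GVR{"example.io", "v1", "widgets"}
	r := &Router{
		Local: map[GVR]bool{deployments: true},
		Peers: map[string]map[GVR]bool{"apiserver-b": {widgets: true}},
	}
	fmt.Printf("%+v\n", r.Route(deployments))
	fmt.Printf("%+v\n", r.Route(widgets))
	fmt.Println(StatusForProxyError(errPeerUnreachable))
}
```

The sketch keeps the decision separate from the proxying itself, so the choice of peer can be exercised without any network calls.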

 Why so much work?

@@ -276,76 +280,59 @@ This might be a good place to talk about core concepts and how they relate.
 1. A new filter will be added to the [handler chain] of the aggregation layer.
-This filter will maintain an internal map with
-the key being the group-version-resource and the value being a list of server
-IDs of apiservers that are capable of serving
-that group-version-resource
-1. This internal map is populated using an informer for StorageVersion objects.
-An event handler will be added for this
-informer that will get the apiserver ID of the requested group-version-resource
-and update the internal map accordingly
+This filter will maintain the following internal caches:
+- a map that stores the resources served by the local apiserver for a
+group-version (LocalDiscovery cache). This will be done via a discovery call
+using a loopback client. A post-start hook will populate this cache, guaranteeing
+the apiserver has a complete view of its served resources before processing
+any incoming requests.
+- an informer cache of resources served by each peer apiserver in the
+cluster (PeerAggregatedDiscovery cache). This cache will be updated by an informer
+on [apiserver identity Lease objects](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/1965-kube-apiserver-identity/README.md#proposal).
+The informer's event handler will make discovery calls to peer apiservers
+whose lease objects are created or updated (as a result of a change in [holderIdentity](https://github.com/kubernetes/kubernetes/blob/release-1.32/staging/src/k8s.io/api/coordination/v1/types.go#L58), implying a server restart).

 2. This filter will pass on the request to the next handler in the local aggregator
 chain, if:
 1. It is a non resource request
-2. The StorageVersion informer cache hasn't synced yet or if `StorageVersionManager.Completed()`
-has returned false. We
-will serve error 503 in this case
-3. The request has a header `X-Kubernetes-APIServer-Rerouted:true` that
-indicates that this request has been proxied once
-already. If for some reason the resource is not found locally, we will serve
-error 503
-4. No StorageVersion was retrieved for it, meaning the request is for an
-aggregated API or for a custom resource
-5. If the local apiserver ID is found in the list of serviceable-by server IDs
-from the internal map
-
-3. If the local apiserver ID is not found in the list of serviceable-by server
-IDs, a random apiserver ID will be selected
-from the retrieved list and the request will be proxied to this apiserver
-
-4. If there is no apiserver ID retrieved for the requested GVR, we will serve
-404 with error `GVR <group_version_resource> is
-not served by anything in this cluster`
+2. The LocalDiscovery cache or the apiserver identity lease informer hasn't synced
+yet. We will serve error 503 in this case
+3. The request has a header `X-Kubernetes-APIServer-Rerouted:true` that indicates
+that this request has been proxied once already. If for some reason the
+resource is not found locally, we will serve error 503
+4. The requested resource was listed in the LocalDiscovery cache
+5. No other peer apiservers were found to exist in the cluster
+
+3. If the requested resource was not found in the LocalDiscovery cache, we will
+try to fetch the resource from the PeerAggregatedDiscovery cache. The request
+will then be proxied to any peer apiserver, selected randomly, that is found to be
+able to serve the resource as indicated in the PeerAggregatedDiscovery cache.
+1. There is a possibility of a race condition regarding creation/update of an
+aggregated resource or a CRD and its registration in the LocalDiscovery cache.
+If such a resource is not found in the LocalDiscovery cache but is found in the PeerAggregatedDiscovery
+cache, we will always route the request for this resource to the suitable peer.
+
+4. If there is no eligible apiserver found in the PeerAggregatedDiscovery cache
+for the requested resource, we will pass on the request to the next handler in the
+handler chain. This will either
+
+- be eventually handled by the apiextensions-apiserver or the
+aggregated-apiserver if the request was for a custom resource or an
+aggregated resource which was created/updated after we established both the
+LocalDiscovery and the PeerAggregatedDiscovery caches
+- be returned with a 404 Not Found error for cases when the resource doesn't exist
+in the cluster

 5. If the proxy call fails for network issues or any reason, we serve 503 with
-error `Error while proxying request to
-destination apiserver`
+error `Error while proxying request to destination apiserver`

 6. We will also add a poststarthook for the apiserver to ensure that it does not
StorageVersion API currently tells us whether a particular StorageVersion can be
-read from etcd by the listed apiserver. We
-will enhance this API to also include apiserver ID of the server that can serve
-this StorageVersion.
-
-With the enhancement, the new [ServerStorageVersion](https://github.com/kubernetes/kubernetes/blob/release-1.27/pkg/apis/apiserverinternal/types.go#L62-L73)
-object will have this structure
-
-```
-type ServerStorageVersion struct {
-// The ID of the reporting API server.
-APIServerID string
-
-// The API server encodes the object to this version
-// when persisting it in the backend (e.g., etcd).
-EncodingVersion string
-
-// The API server can decode objects encoded in these versions.
-// The encodingVersion must be included in the decodableVersions.
-DecodableVersions []string
-
-// Versions that can be served by the reporting API server
We will be performing dual writes of the ip and port information of the apiservers
@@ -643,8 +630,9 @@ Any change of default behavior may be surprising to users or break existing
 automations, so be extremely careful here.
 -->

-Yes, requests for built-in resources at the time when a cluster is at mixed versions will be served with a default 503 error
-instead of a 404 error, if the request is unable to be served.
+Yes, requests for built-in resources at the time when a cluster is at mixed versions
+will be served with a default 503 error instead of a 404 error, if the request
+is unable to be served.

 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

@@ -823,11 +811,8 @@ This section must be completed when targeting beta to a release.

 No, but it does depend on

-- the `StorageVersion` feature that generates objects with a `storageVersion.status.serverStorageVersions[*].apiServerID`
-field which is used to find the remote apiserver's network location.
-- `APIServerIdentity` feature in kube-apiserver that creates a lease object for
-APIServerIdentity which we will use to store
-the network location of the remote apiserver for visibility/debugging
+- `APIServerIdentity` feature in kube-apiserver that creates a lease object for APIServerIdentity
+which we will use to store the network location of the remote apiserver for visibility/debugging

 <!--
 Think about both cluster-level services (e.g. metrics-server) as well
@@ -871,7 +856,15 @@ Focusing mostly on:
 heartbeats, leader election, etc.)
 -->

-No.
+Yes, enabling this feature will result in new API calls. Specifically:
+
+- Discovery calls via a loopback client: The local apiserver will use a loopback
+client to discover the resources it serves for each group-version. This should
+only happen once upon server startup.
+- Remote discovery calls to peer apiservers: The event handler for apiserver identity
+lease informer will make remote discovery calls to each peer apiserver whose
+- identity lease is created
+- identity lease is updated as a result of change in [holderIdentity](https://github.com/kubernetes/kubernetes/blob/release-1.32/staging/src/k8s.io/api/coordination/v1/types.go#L58) implying a server restart
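
To make the trigger condition in the list above concrete, here is a minimal Go sketch under stated assumptions; it is not code from the KEP. `Lease` is a simplified stand-in for the `coordination.k8s.io/v1` Lease object (only the name and `holderIdentity` are modelled), and `PeerCache`, `NewPeerCache`, `OnLeaseEvent`, and the injected `fetch` function are hypothetical. A real implementation would presumably register equivalent logic through client-go informer event handlers and make the remote discovery call where `fetch` is invoked.

```go
package main

import (
	"fmt"
	"sync"
)

// Lease is a simplified stand-in for an apiserver identity Lease.
type Lease struct {
	Name           string
	HolderIdentity *string
}

// holder returns the lease's holderIdentity, or "" when it is unset.
func holder(l *Lease) string {
	if l == nil || l.HolderIdentity == nil {
		return ""
	}
	return *l.HolderIdentity
}

// needsDiscovery reports whether a lease event should trigger a remote
// discovery call: the lease was created, or its holderIdentity changed.
func needsDiscovery(old, cur *Lease) bool {
	if cur == nil {
		return false
	}
	if old == nil {
		return true
	}
	return holder(old) != holder(cur)
}

// PeerCache maps a peer (keyed by lease name) to the resources it reported.
type PeerCache struct {
	mu        sync.Mutex
	resources map[string][]string
	fetch     func(peer string) ([]string, error) // hypothetical discovery call
}

// NewPeerCache builds an empty cache around the given discovery function.
func NewPeerCache(fetch func(peer string) ([]string, error)) *PeerCache {
	return &PeerCache{resources: map[string][]string{}, fetch: fetch}
}

// OnLeaseEvent handles both add (old == nil) and update events.
// On a fetch error the previous cache entry is left untouched.
func (c *PeerCache) OnLeaseEvent(old, cur *Lease) error {
	if !needsDiscovery(old, cur) {
		return nil
	}
	res, err := c.fetch(cur.Name)
	if err != nil {
		return err
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.resources[cur.Name] = res
	return nil
}

// Resources returns the cached resources for a peer.
func (c *PeerCache) Resources(peer string) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.resources[peer]
}

func main() {
	calls := 0
	cache := NewPeerCache(func(peer string) ([]string, error) {
		calls++
		return []string{"apps/v1/deployments"}, nil
	})
	first, second := "holder-1", "holder-2"
	created := &Lease{Name: "apiserver-b", HolderIdentity: &first}
	renewed := &Lease{Name: "apiserver-b", HolderIdentity: &first}
	restarted := &Lease{Name: "apiserver-b", HolderIdentity: &second}

	_ = cache.OnLeaseEvent(nil, created)       // created: fetch
	_ = cache.OnLeaseEvent(created, renewed)   // same holder: skipped
	_ = cache.OnLeaseEvent(renewed, restarted) // holder changed: fetch
	fmt.Println("discovery calls:", calls, "cached:", cache.Resources("apiserver-b"))
}
```

Filtering on `holderIdentity` means that lease updates which leave the holder unchanged do not generate additional discovery traffic.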

 ###### Will enabling / using this feature result in introducing new API types?

@@ -916,9 +909,8 @@ Think about adding additional work or introducing new steps in between