Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add extended wildcard support for sotw #4

Closed
wants to merge 6 commits into from

Conversation

valerian-roche
Copy link
Owner

@valerian-roche valerian-roche commented May 16, 2022

This PR is implementing the new wildcard flow for sotw-xds
In order to limit the implementer complexity, I merge the logic of delta stream state with the delta stream state logic
At this stage the per-resource versioning is still ignored to avoid further impact, but this would be a natural next step to limit the amount of resources sent to envoy when only some resources are updated. It will anyway also require properly implementing the proper behavior of partial replies for non (lds|cds)

@valerian-roche
Copy link
Owner Author

This is passing tests but no unit tests added yet

@valerian-roche valerian-roche changed the base branch from vr/wildcard to main June 29, 2022 21:25
…ard mode

Signed-off-by: Valerian Roche <valerian.roche@datadoghq.com>
Signed-off-by: Valerian Roche <valerian.roche@datadoghq.com>
Signed-off-by: Valerian Roche <valerian.roche@datadoghq.com>
Signed-off-by: Valerian Roche <valerian.roche@datadoghq.com>
@valerian-roche valerian-roche marked this pull request as ready for review July 5, 2022 21:56
@@ -122,6 +122,9 @@ type RawResponse struct {
// Resources to be included in the response.
Resources []types.ResourceWithTTL

// NextVersionMap consists of updated version mappings after this response is applied
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is a bit unclear to me, as it doesn't tell what exactly is in the key and value. Suggestion:

Suggested change
// NextVersionMap consists of updated version mappings after this response is applied
// NextVersionMap maps the resource name to the empty string for resources that were updated?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

if len(staleResources) == 0 {
resources = make([]types.ResourceWithTTL, 0, len(cache.resources))
for _, resource := range cache.resources {
resources := make([]types.ResourceWithTTL, 0, len(staleResources))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't you losing the behavior of sending all resources when staleResources is empty? IIUC you still rely on it e.g. at L341 in the wildcard case.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I'm following. respond is never called on wildcard streams
On non-wildcard streams we should not return anything is staleResources is empty

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I'm following. respond is never called on wildcard streams

Makes sense, I did not see that when I read the code and there's no indication that respond shouldn't be called in with wildcard. Documenting respond might help readers to understand its purpose.

@@ -320,34 +329,56 @@ func (cache *LinearCache) CreateWatch(request *Request, streamState stream.Strea
cache.mu.Lock()
defer cache.mu.Unlock()

// If the version is not up to date, check whether any requested resource has
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you move this block? I intuitively would have left it at L305 to

  1. make the diff smaller
  2. treat err != nil closer to where it is returned.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it back. Found it made it harder to follow objects as they are declared before other unrelated logic

if err != nil {
// The request does not include a version or the version could not be parsed.
// It will send all resources matching the request with no regards to the version.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid future tense in technical documentation. https://developers.google.com/style/tense

Suggested change
// It will send all resources matching the request with no regards to the version.
// Send all resources matching the request with no regards to the version.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

// been updated between the last version and the current version. This avoids the problem
// of sending empty updates whenever an irrelevant resource changes.
stale := false
var staleResources map[string]struct{} // empty means all
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the change from []string{} to map[string]struct{}? If I read the code right, you didn't change any access pattern, did you? It might make sense to change the signature of GetSubscribedResourceNames to return a slice instead of the map directly.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason is that I am merging objects in-between the delta and sotw servers (which are rigorously identical outside of types and subscribe/unsubscribe
Here I aligned on delta as delta really uses the map for search. I can do a separate PR for that but given the review pace I'm not sure what would be preferred

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think review time is generally proportional to the size of PRs. The smaller the PRs, the faster the reviews. This is true in every project I see that actually does code review (and not just rubber stamping like in certain repos...).

// superset checks that all resources are listed in the names set.
func superset(names map[string]bool, resources map[string]types.ResourceWithTTL) error {
func superset(names map[string]struct{}, resources map[string]types.ResourceWithTTL) error {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about map[string]bool -> map[string]struct{}, you may prefer the former but I'd at least separate the diffs, and in general wouldn't bother changing that in somebody else's project since it's a matter of preference, especially if you're trying to optimize for fast reviews.

Comment on lines 588 to 598
// This code is duplicated from sotw/server.go
state := stream.NewStreamState(len(request.ResourceNames) == 0, nil)
wantedResources := make(map[string]struct{}, len(request.ResourceNames))
for _, resourceName := range request.ResourceNames {
if resourceName == "*" {
state.SetWildcard(true)
continue
}
wantedResources[resourceName] = struct{}{}
}
state.SetSubscribedResourceNames(wantedResources)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't this be extracted to a constructor, e.g.

// NewStreamState creates a new stream state based the set of requested resources.
func NewStreamState(resources []string) {
    ....
}

That way you don't have to duplicate the code in sotw.server.go.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior is actually different in delta, so it's a bit complex to do
The xDS protocol created a weird exception on this behavior depending on the delta vs. sotw

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll see how this can be updated, but not sure I can fully hide this complexity

response: responder,
})
if w.nonce != "" && w.nonce != nonce {
// The request received does not match the current state of the type, we ignore it and keep our current watch
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The request received does not match the current state of the type, we ignore it and keep our current watch
// The request received does not match the current state for the resource type, we ignore it and keep our current watch

Comment on lines 242 to 244
// The xDS protocol states that if there has never been any resource set, the request should be considered wildcard
// This would theoretically require keeping track on whether we ever became non-empty.
// As it is also technically allowed to return resources which have not been subscribed to, it is a best effort here.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand this comment... Is the scenario you're thinking of:

- Stream requests some resources (not considered wildcard)?
- Then unsubscribes those resources, hence ending up with no resources
In theory this should not be considered wildcard, but for us it does and it's fine since we only end up sending too unrequested resources?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case is:

  • stream is not wildcard
  • then is made wildcard
  • then has no resource at all
    According to xDS this should be considered as "no resources", but this requires keeping a history of whether we ever left legacy mode (i.e. did we ever have resources set in any way, including *)
    As the protocol also states that it's okay to return too much, I chose to ignore this case (which should anyway basically never occur in envoy today)

Comment on lines 248 to 262
// When resources are provided, they may still include the wildcard symbol '*', as well as potentially other resources
// This allows the client to subscribe/unsubscribe to wildcard during the stream lifespan.
wantsWildcard := false
wantedResources := make(map[string]struct{}, len(resources))
for _, resourceName := range resources {
// We do not track '*' as a resource name to avoid confusion in further processing and rely on the IsWildcard method instead
if resourceName == "*" {
wantsWildcard = true
continue
}
wantedResources[resourceName] = struct{}{}
}
streamState.SetWildcard(wantsWildcard)
streamState.SetSubscribedResourceNames(wantedResources)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this logic be encapsulated inside streamstate?

Signed-off-by: Valerian Roche <valerian.roche@datadoghq.com>
@atollena
Copy link

atollena commented Jul 7, 2022

A few more things to make reviewing easier, beside the code:

  • Use the DataDog fork rather than your personal one. That makes it more obvious that this is code we use in production at DataDog and that it was written in a professional context, potentially with help from other engineers, solving a company problem, by opposition with something you hacked on the side for your side project.

  • Link to the relevant Github issue. If there is no issue, creating one would be worth it (ideally that should be done before opening the PR).

  • Edit the PR description to make it more structured. Generally the PR description should be structured as:

    1. what problem is this PR solving?
    2. how is it solving the problem? should readers know something before diving into the code?
    3. what's the results of this PR? performance gain, bug fixed, new feature implemented.
  • Here is my reaction reading the description:

Add extended wildcard support for sotw

good.

This PR is implementing the new wildcard flow for sotw-xds

First on the form: fix punctuation by adding periods at end of sentence, this shows that you proof-read yourself and that you value your reader's time.

Then: what is "the new wildcard flow"? Links to the relevant sections.

Suggestion:

In version 1.21.0, Envoy added support for * resource to make it a resource that can be subscribed to and unsubscribed
from at any time. See the Envoy 1.21.0 changelog [1] and the patch to the protocol overview [2].

This PR implements this behavior change in the state of the world xDS server (sotw). In order to limit the implementer complexity, I merge the logic of delta stream state with the delta stream state logic. At this stage the per-resource versioning is still ignored to avoid further impact, but this would be a natural next step to limit the amount of resources sent to envoy when only some resources are updated. It will anyway also require properly implementing the proper behavior of partial replies for non (lds|cds).

  • Add a description of what users can and cannot do before and after this commit.

[1] https://www.envoyproxy.io/docs/envoy/latest/version_history/v1.21/v1.21.0#incompatible-behavior-changes
[2] envoyproxy/envoy@b164962

@@ -122,6 +122,9 @@ type RawResponse struct {
// Resources to be included in the response.
Resources []types.ResourceWithTTL

// NextVersionMap maps the resource name to the empty string for resources that were included in the response.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's not always the empty string, right? Sometimes it's the version number, like at https://github.com/valerian-roche/go-control-plane/pull/4/files#diff-56212115e92b3629f0824a8e684c2ba8e1d70afc055edd1dd936aea206ccf707R457 ? How do readers differentiate those cases?

@valerian-roche
Copy link
Owner Author

Now handled on upstream repo

valerian-roche added a commit that referenced this pull request Jan 17, 2024
[sotw][linear] Fix missing watch cleanup in linear cache for sotw watches subscribing to multiple resources
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants