Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(config): add wildcard expansion to inputs #6170

Merged
merged 5 commits into from Jan 27, 2021
Merged

Conversation

lukesteensen
Copy link
Member

Closes #2550

This allows for very simple wildcards in input lists. The * must be the last character in the string and we do a simple prefix match against all valid inputs (i.e. self is excluded). We could choose to expand this to a more full-featured glob syntax in the future, but there were enough questions there that this seemed like a better start. It is also my expectation that this should cover >95% of valuable use cases, so it may well be all we ever need.

Signed-off-by: Luke Steensen luke.steensen@gmail.com

Closes #2550

This allows for very simple wildcards in input lists. The `*` must be
the last character in the string and we do a simple prefix match against
all valid inputs (i.e. self is excluded). We could choose to expand this
to a more full-featured glob syntax in the future, but there were enough
questions there that this seemed like a better start. It is also my
expectation that this should cover >95% of valuable uses cases, so it
may well be all we ever need.

Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
@lukesteensen
Copy link
Member Author

One other note: this is technically a breaking change since we don't limit characters in component names and someone could have a source named foo* and references to it would now expand to any inputs starting with foo. I don't feel like this is super likely and I chose not to add any limitations on components names in this PR, but it is definitely worth considering and at least highlighting in the release notes.

Copy link
Member

@bruceg bruceg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty straightforward, but I have a couple of questions about the "shape" of the code.

@@ -82,3 +84,139 @@ pub(super) fn expand_macros(config: &mut Config) -> Result<(), Vec<String>> {
Ok(())
}
}

/// Expand trailing `*` wildcards in input lists
fn expand_wildcards(config: &mut Config) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this (and fn expand_macros it appears) make more sense to be part of impl Config since the only data used comes from the single parameter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent is that this is only callable as part of the process of compiling a ConfigBuilder into a Config (i.e. private to this module). That could be much clearer, and probably should be. The original refactor to force a single path for building Configs wasn't taken quite as far as it could have been and this is one place it shows. I'll look into making this a bit clearer.

}
}

fn expand_inner(inputs: &mut Vec<String>, name: &str, candidates: &[String]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest expand_wildcards or expand_names instead.

Could this be reworked to drop the mut parameter and return impl Iterator<Item = String> (or IntoIterator), and then call names.extend(expand_wildcards(name, &candidates));? I see the non-wildcard case complicates that.

This could also generate the list of sources on the fly, without creating the intermediate vector or cloning any strings until necessary:

    config
        .sources
        .keys()
        .chain(self.transforms.key())
        .filter(|candidate| candidate.starts_with(prefix) && candidate != name)
        .map(|candidate| candidate.clone())

Which prompts the question of it it too could be part of impl Config.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be reworked to drop the mut parameter and return impl Iterator<Item = String> (or IntoIterator), and then call names.extend(expand_wildcards(name, &candidates));?

I think the main bit that doesn't quite fit is that you need to remove the wildcard itself from the list, not just add the inputs that it expands into. There are definitely some helpers that could be extracted onto Config or similar though.

Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Copy link
Member

@bruceg bruceg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just some bikeshedding over how best to implement expand_wildcards_inner

Comment on lines 105 to 119
fn expand_wildcards_inner(inputs: &mut Vec<String>, name: &str, candidates: &[String]) {
let raw_inputs = inputs.drain(..).collect::<Vec<_>>();
for raw_input in raw_inputs {
if raw_input.ends_with('*') {
let prefix = &raw_input[0..raw_input.len() - 1];
for input in candidates {
if input.starts_with(prefix) && input != name {
inputs.push(input.clone())
}
}
} else {
inputs.push(raw_input);
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By breaking out the actual expansion, I think this can be simplified into some straighter iteration, which suggests the input doesn't need to be mut at all:

Suggested change
fn expand_wildcards_inner(inputs: &mut Vec<String>, name: &str, candidates: &[String]) {
let raw_inputs = inputs.drain(..).collect::<Vec<_>>();
for raw_input in raw_inputs {
if raw_input.ends_with('*') {
let prefix = &raw_input[0..raw_input.len() - 1];
for input in candidates {
if input.starts_with(prefix) && input != name {
inputs.push(input.clone())
}
}
} else {
inputs.push(raw_input);
}
}
}
fn expand_wildcards_inner(inputs: &mut Vec<String>, name: &str, candidates: &[String]) {
inputs = inputs.iter()
.map(|raw_input| expand_wildcards(&raw_input, name, candidates))
.flatten()
.collect();
}
// It would be nice to avoid the extra Vec allocation here, but it's not performance critical
fn expand_wildcard(raw_input: &String, name: &str, candidates: &[String]) -> Vec<String> {
if raw_input.ends_with('*') {
let prefix = &raw_input[0..raw_input.len() - 1];
candidates
.iter()
.filter(|input| input.starts_with(prefix) && input != name)
.cloned()
.collect()
} else {
vec![raw_input.into()]
}
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I played around with this a bit and I'm honestly not finding much that I like better than the current state. What you have here is almost better, but it starts to get awkward with borrowing and mutability when you do inputs = inputs.iter().... I think what we have is good enough as-is, so I'm going to resist the temptation to spend any more time on it 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough

Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
@lukesteensen lukesteensen requested a review from a team January 27, 2021 21:11
@lukesteensen
Copy link
Member Author

Everything other than the benchmarks has passed, the benchmarks have been running for two hours, and this doesn't touch any performance-sensitive code, so I'm going to go ahead and merge.

@lukesteensen lukesteensen merged commit e777720 into master Jan 27, 2021
@lukesteensen lukesteensen deleted the input-wildcards branch January 27, 2021 23:24
@binarylogic
Copy link
Contributor

A small thing, but we should have documented this feature. See 55da1ed (#6434).

jszwedko added a commit that referenced this pull request Jun 15, 2021
#6170 had added support for
wildcards in identifiers in vector configs, but only as the last
character. This PR expands support to allow glob matching in the same
way that `vector tap` does which allows for `*` to appear anywhere in
the input name.

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
jszwedko added a commit that referenced this pull request Jun 15, 2021
#6170 had added support for
wildcards in identifiers in vector configs, but only as the last
character. This PR expands support to allow glob matching in the same
way that `vector tap` does which allows for `*` to appear anywhere in
the input name.

Closes #6786

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
jszwedko added a commit that referenced this pull request Jun 16, 2021
* enhancement(configuration): Allow wildcards anywhere in identifiers

#6170 had added support for
wildcards in identifiers in vector configs, but only as the last
character. This PR expands support to allow glob matching in the same
way that `vector tap` does which allows for `*` to appear anywhere in
the input name.

Closes #6786

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wildcards for input list ?
3 participants