New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(config): add wildcard expansion to inputs #6170
Conversation
Closes #2550 This allows for very simple wildcards in input lists. The `*` must be the last character in the string and we do a simple prefix match against all valid inputs (i.e. self is excluded). We could choose to expand this to a more full-featured glob syntax in the future, but there were enough questions there that this seemed like a better start. It is also my expectation that this should cover >95% of valuable uses cases, so it may well be all we ever need. Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
One other note: this is technically a breaking change since we don't limit characters in component names and someone could have a source named |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty straightforward, but I have a couple of questions about the "shape" of the code.
src/config/compiler.rs
Outdated
@@ -82,3 +84,139 @@ pub(super) fn expand_macros(config: &mut Config) -> Result<(), Vec<String>> { | |||
Ok(()) | |||
} | |||
} | |||
|
|||
/// Expand trailing `*` wildcards in input lists | |||
fn expand_wildcards(config: &mut Config) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this (and fn expand_macros
it appears) make more sense to be part of impl Config
since the only data used comes from the single parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent is that this is only callable as part of the process of compiling a ConfigBuilder
into a Config
(i.e. private to this module). That could be much clearer, and probably should be. The original refactor to force a single path for building Config
s wasn't taken quite as far as it could have been and this is one place it shows. I'll look into making this a bit clearer.
src/config/compiler.rs
Outdated
} | ||
} | ||
|
||
fn expand_inner(inputs: &mut Vec<String>, name: &str, candidates: &[String]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest expand_wildcards
or expand_names
instead.
Could this be reworked to drop the mut
parameter and return impl Iterator<Item = String>
(or IntoIterator
), and then call names.extend(expand_wildcards(name, &candidates));
? I see the non-wildcard case complicates that.
This could also generate the list of sources on the fly, without creating the intermediate vector or cloning any strings until necessary:
config
.sources
.keys()
.chain(self.transforms.key())
.filter(|candidate| candidate.starts_with(prefix) && candidate != name)
.map(|candidate| candidate.clone())
Which prompts the question of it it too could be part of impl Config
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be reworked to drop the
mut
parameter and returnimpl Iterator<Item = String>
(orIntoIterator
), and then callnames.extend(expand_wildcards(name, &candidates));
?
I think the main bit that doesn't quite fit is that you need to remove the wildcard itself from the list, not just add the inputs that it expands into. There are definitely some helpers that could be extracted onto Config
or similar though.
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just some bikeshedding over how best to implement expand_wildcards_inner
src/config/compiler.rs
Outdated
fn expand_wildcards_inner(inputs: &mut Vec<String>, name: &str, candidates: &[String]) { | ||
let raw_inputs = inputs.drain(..).collect::<Vec<_>>(); | ||
for raw_input in raw_inputs { | ||
if raw_input.ends_with('*') { | ||
let prefix = &raw_input[0..raw_input.len() - 1]; | ||
for input in candidates { | ||
if input.starts_with(prefix) && input != name { | ||
inputs.push(input.clone()) | ||
} | ||
} | ||
} else { | ||
inputs.push(raw_input); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By breaking out the actual expansion, I think this can be simplified into some straighter iteration, which suggests the input doesn't need to be mut
at all:
fn expand_wildcards_inner(inputs: &mut Vec<String>, name: &str, candidates: &[String]) { | |
let raw_inputs = inputs.drain(..).collect::<Vec<_>>(); | |
for raw_input in raw_inputs { | |
if raw_input.ends_with('*') { | |
let prefix = &raw_input[0..raw_input.len() - 1]; | |
for input in candidates { | |
if input.starts_with(prefix) && input != name { | |
inputs.push(input.clone()) | |
} | |
} | |
} else { | |
inputs.push(raw_input); | |
} | |
} | |
} | |
fn expand_wildcards_inner(inputs: &mut Vec<String>, name: &str, candidates: &[String]) { | |
inputs = inputs.iter() | |
.map(|raw_input| expand_wildcards(&raw_input, name, candidates)) | |
.flatten() | |
.collect(); | |
} | |
// It would be nice to avoid the extra Vec allocation here, but it's not performance critical | |
fn expand_wildcard(raw_input: &String, name: &str, candidates: &[String]) -> Vec<String> { | |
if raw_input.ends_with('*') { | |
let prefix = &raw_input[0..raw_input.len() - 1]; | |
candidates | |
.iter() | |
.filter(|input| input.starts_with(prefix) && input != name) | |
.cloned() | |
.collect() | |
} else { | |
vec![raw_input.into()] | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I played around with this a bit and I'm honestly not finding much that I like better than the current state. What you have here is almost better, but it starts to get awkward with borrowing and mutability when you do inputs = inputs.iter()...
. I think what we have is good enough as-is, so I'm going to resist the temptation to spend any more time on it 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Everything other than the benchmarks has passed, the benchmarks have been running for two hours, and this doesn't touch any performance-sensitive code, so I'm going to go ahead and merge. |
A small thing, but we should have documented this feature. See |
#6170 had added support for wildcards in identifiers in vector configs, but only as the last character. This PR expands support to allow glob matching in the same way that `vector tap` does which allows for `*` to appear anywhere in the input name. Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
#6170 had added support for wildcards in identifiers in vector configs, but only as the last character. This PR expands support to allow glob matching in the same way that `vector tap` does which allows for `*` to appear anywhere in the input name. Closes #6786 Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
* enhancement(configuration): Allow wildcards anywhere in identifiers #6170 had added support for wildcards in identifiers in vector configs, but only as the last character. This PR expands support to allow glob matching in the same way that `vector tap` does which allows for `*` to appear anywhere in the input name. Closes #6786 Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
Closes #2550
This allows for very simple wildcards in input lists. The
*
must be the last character in the string and we do a simple prefix match against all valid inputs (i.e. self is excluded). We could choose to expand this to a more full-featured glob syntax in the future, but there were enough questions there that this seemed like a better start. It is also my expectation that this should cover >95% of valuable use cases, so it may well be all we ever need.Signed-off-by: Luke Steensen luke.steensen@gmail.com