feat(pipelines transform): load and handle pipelines tranforms #9733

jdrouet · 2021-10-21T09:11:24Z

Resources

Design document

How it works

The pipelines transform will expand into the following graph (the chart source is included at the end of this description).

What's left to do

implement loading configuration
implement configuration expanding
add transform documentation

What's left to be done (other PR)

automatic namespacing
add filter at pipeline level
add route mode
add traces path

For those who want to change the chart

graph TD
    SOURCE_A[sources.a] --> FILTER_LOG
    SOURCE_B[sources.b] --> FILTER_LOG
    SOURCE_A --> FILTER_METRIC
    SOURCE_B --> FILTER_METRIC
    subgraph Pipelines Transform
    subgraph Log Route
    FILTER_LOG[transforms.foo.logs.filter] --> LOG_A_FILTER

    subgraph Log Pipeline A 
    LOG_A_FILTER[transforms.foo.logs.pipelines.a.filter] --> LOG_A_TRANSFORM_0
    LOG_A_TRANSFORM_0[transforms.foo.logs.pipelines.a.transforms.0] --> LOG_A_TRANSFORM_1
    LOG_A_TRANSFORM_1[transforms.foo.logs.pipelines.a.transforms.1] --> LOG_A_TRANSFORM_ALIAS
    LOG_A_TRANSFORM_ALIAS{{transforms.foo.logs.pipelines.a}}
    end

    subgraph Log Pipeline B
    LOG_A_TRANSFORM_ALIAS --> LOG_B_FILTER
    LOG_B_FILTER[transforms.foo.logs.pipelines.b.filter] --> LOG_B_TRANSFORM_0
    LOG_B_TRANSFORM_0[transforms.foo.logs.pipelines.b.transforms.0] --> LOG_B_TRANSFORM_1
    LOG_B_TRANSFORM_1[transforms.foo.logs.pipelines.b.transforms.1] --> LOG_B_TRANSFORM_ALIAS
    LOG_B_TRANSFORM_ALIAS{{transforms.foo.logs.pipelines.b}}
    end

    LOG_B_TRANSFORM_ALIAS --> LOG_ALIAS{{transforms.foo.logs}}
    end

    subgraph Metric Route
    FILTER_METRIC[transforms.foo.metrics.filter]--> METRIC_A_FILTER

    subgraph Metric Pipeline A 
    METRIC_A_FILTER[transforms.foo.metrics.pipelines.a.filter] --> METRIC_A_TRANSFORM_0
    METRIC_A_TRANSFORM_0[transforms.foo.metrics.pipelines.a.transforms.0] --> METRIC_A_TRANSFORM_1
    METRIC_A_TRANSFORM_1[transforms.foo.metrics.pipelines.a.transforms.1] --> METRIC_A_TRANSFORM_ALIAS
    METRIC_A_TRANSFORM_ALIAS{{transforms.foo.metrics.pipelines.a}}
    end
    subgraph Metric Pipeline B
    METRIC_A_TRANSFORM_ALIAS --> METRIC_B_FILTER
    METRIC_B_FILTER[transforms.foo.metrics.pipelines.b.filter] --> METRIC_B_TRANSFORM_0
    METRIC_B_TRANSFORM_0[transforms.foo.metrics.pipelines.b.transforms.0] --> METRIC_B_TRANSFORM_1
    METRIC_B_TRANSFORM_1[transforms.foo.metrics.pipelines.b.transforms.1] --> METRIC_B_TRANSFORM_ALIAS
    METRIC_B_TRANSFORM_ALIAS{{transforms.foo.metrics.pipelines.b}}
    end

    METRIC_B_TRANSFORM_ALIAS --> METRIC_ALIAS{{transforms.foo.metrics}}
    end

    LOG_ALIAS --> AGGREGATE
    METRIC_ALIAS --> AGGREGATE
    AGGREGATE{{transforms.foo}}
    end

netlify · 2021-10-21T09:11:31Z

✔️ Deploy Preview for vector-project canceled.

🔨 Explore the source changes: 975d3ac

🔍 Inspect the deploy log: https://app.netlify.com/sites/vector-project/deploys/6183b343ad5373000715f3ba

tests/behavior/transforms/pipelines_complete.toml

Signed-off-by: Jérémie Drouet <jeremie.drouet@gmail.com>

Signed-off-by: Jérémie Drouet <jeremie.drouet@datadoghq.com>

tobz

Overall, this seems reasonable. 👍🏻

My biggest complaint is just a lack of documentation which made it hard to understand what is going on when looking only at the code. It's not a blocker, but I do think some more documentation should be added either in this PR or a subsequent one.

lib/vector-core/src/transform/config.rs

src/transforms/pipelines/mod.rs

tobz · 2021-11-03T14:16:15Z

src/transforms/pipelines/mod.rs

+        // This is a hack around the issue of cloning
+        // trait objects. So instead to clone the config
+        // we first serialize it into JSON, then back from
+        // JSON. Originally we used TOML here but TOML does not
+        // support serializing `None`.


This should already be possible, since we require that TransformConfig is a supertrait of dyn_clone::DynClone, and we use the dyn_clone::clone_trait_object! macro to define a Clone impl for Box<dyn TransformConfig>:

vector/lib/vector-core/src/transform/config.rs

Lines 45 to 69 in d849e22

pub trait TransformConfig: core::fmt::Debug + Send + Sync + dyn_clone::DynClone {
    async fn build(&self, globals: &TransformContext)
        -> crate::Result<crate::transform::Transform>;

    fn input_type(&self) -> DataType;

    fn output_type(&self) -> DataType;

    fn named_outputs(&self) -> Vec<String> {
        Vec::new()
    }

    fn transform_type(&self) -> &'static str;

    /// Allows a transform configuration to expand itself into multiple "child"
    /// transformations to replace it. This allows a transform to act as a macro
    /// for various patterns.
    fn expand(
        &mut self,
    ) -> crate::Result<Option<(IndexMap<String, Box<dyn TransformConfig>>, ExpandType)>> {
        Ok(None)
    }
}

dyn_clone::clone_trait_object!(TransformConfig);

Should be as simple as deriving Clone on PipelineConfig. Does doing that return a specific error?
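
For readers unfamiliar with the pattern, here is a minimal std-only sketch of what the `dyn_clone` supertrait and macro provide, written by hand. It is not Vector's code: `Config`, `clone_box`, `Remap` and `Pipeline` are hypothetical stand-ins for `TransformConfig` and `PipelineConfig`, and the hand-rolled `clone_box` method replaces the `dyn_clone` crate so the example has no dependencies.

```rust
use std::fmt::Debug;

// Hypothetical stand-in for `TransformConfig`; not Vector's actual trait.
trait Config: Debug {
    fn transform_type(&self) -> &'static str;

    // Hand-rolled equivalent of what the `dyn_clone::DynClone` supertrait provides.
    fn clone_box(&self) -> Box<dyn Config>;
}

// Roughly what `dyn_clone::clone_trait_object!` generates: once the boxed
// trait object is `Clone`, any struct holding it can derive `Clone`.
impl Clone for Box<dyn Config> {
    fn clone(&self) -> Self {
        self.clone_box()
    }
}

// Hypothetical concrete transform configuration.
#[derive(Debug, Clone)]
struct Remap {
    source: String,
}

impl Config for Remap {
    fn transform_type(&self) -> &'static str {
        "remap"
    }

    fn clone_box(&self) -> Box<dyn Config> {
        Box::new(self.clone())
    }
}

// Hypothetical stand-in for `PipelineConfig`: deriving `Clone` works because
// `Box<dyn Config>` implements `Clone` above, with no serialization round trip.
#[derive(Debug, Clone)]
struct Pipeline {
    name: String,
    transforms: Vec<Box<dyn Config>>,
}
```

If deriving `Clone` on the real struct fails, the compiler error should point at whichever field is not `Clone`, which is the information asked for above.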

jdrouet · Nov 3, 2021

It's how it is currently done in the ConfigBuilder struct. I didn't want to reinvent the wheel on this.
Maybe we should create an issue related to this and handle it in a separate PR.

jdrouet · Nov 4, 2021

I created this issue #9898

src/transforms/pipelines/mod.rs

leebenson

Echoing @tobz: it'd be useful to add some additional commentary around some of the struct changes and functions, to provide context for future readers.

I found myself jumping around the code a little, trying to figure out why a change was brought in - particularly around the transform Noop and the properties added to ExpandType.

I'd also like sign-off from @lukesteensen before merging.

lukesteensen

Definitely would echo the desire for some docs on how this all fits together. This takes our macro expansion quite a bit further than anything else, and it's difficult to keep straight how all of the new things fit together to result in the chart in the PR description.

In particular, with things like Expander, EventRouterConfig, and EventFilterConfig, it seems like there are things that exist purely as intermediate expansions. I would like very much to simplify this by directly representing more of these concepts in the topology, but it would currently be difficult to refactor without worrying that something isn't going to be expanded in exactly the same way. Documentation and tests for how things are meant to expand would help that quite a bit.

Overall though, this is very neat and seems like it should work!

src/transforms/noop.rs

Signed-off-by: Jérémie Drouet <jeremie.drouet@datadoghq.com>

bits-bot · 2021-11-03T17:24:04Z

All committers have signed the CLA.

src/transforms/pipelines/mod.rs

Signed-off-by: Jérémie Drouet <jeremie.drouet@datadoghq.com>

lukesteensen · 2021-11-04T21:50:04Z

lib/vector-core/src/transform/config.rs

+    /// This way of expanding will duplicate the inputs for every expanded node.
+    /// If `aggregates` is set to `true`, then a `Noop` transform will be added
+    /// so that you can use the original component name as an input.


This makes sense! If I were to tweak the wording, I'd say something like

Duplicate the inputs onto every expanded node, fanning out so that each node receives inputs in parallel. If aggregates is set to true, then a Noop transform will be added such that each expanded node's output is fanned back in to pass through that node, which can then be used as an input for other components.
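
To make the fan-out and fan-in concrete, here is a small sketch of the wiring that wording describes. It is an illustration under assumptions, not Vector's topology code: `expand_parallel`, its parameters and the `name.child` naming are hypothetical, and the result is just a map from each node to the inputs it reads from.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: maps each node produced by a parallel expansion to
/// the list of inputs it reads from.
fn expand_parallel(
    name: &str,
    inputs: &[&str],
    children: &[&str],
    aggregates: bool,
) -> BTreeMap<String, Vec<String>> {
    let mut graph: BTreeMap<String, Vec<String>> = BTreeMap::new();
    let mut child_ids: Vec<String> = Vec::new();

    // Fan out: every expanded node receives a copy of the original inputs.
    for child in children {
        let id = format!("{}.{}", name, child);
        graph.insert(id.clone(), inputs.iter().map(|s| s.to_string()).collect());
        child_ids.push(id);
    }

    // Fan in: a Noop-like node carrying the original name reads from every
    // expanded node, so other components can keep using `name` as an input.
    if aggregates {
        graph.insert(name.to_string(), child_ids);
    }

    graph
}
```

With `aggregates` set to `false`, downstream components would have to reference each expanded node individually, since no node with the original name exists.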

lukesteensen · 2021-11-04T22:23:34Z

lib/vector-core/src/transform/config.rs

+    /// This ways of expanding will take all the components and chain then in order.
+    /// The first node will be renamed `component_name.0` and so on.
+    /// If `alias` is set to `true, then a `Noop` transform will be added as the
+    /// last component and named `component_name` so that it can be used as an input.


Same here, just some small tweaks:

Chain components together one after another. Components will be named according to this order (e.g. component_name.0 and so on). If alias is set to true, then a Noop transform will be added as the last component and given the raw component_name identifier so that it can be used as an input for other components.
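
The same kind of sketch for the serial case, again under assumptions: `expand_serial` and its parameters are hypothetical, nodes are named by index as in the doc comment (`component_name.0` and so on), and the real implementation may key nodes differently.

```rust
/// Hypothetical helper: returns `(node id, inputs)` pairs for a serial
/// expansion of `count` components, in chain order.
fn expand_serial(
    name: &str,
    inputs: &[&str],
    count: usize,
    alias: bool,
) -> Vec<(String, Vec<String>)> {
    let mut nodes = Vec::new();
    // The first node reads the original inputs.
    let mut previous: Vec<String> = inputs.iter().map(|s| s.to_string()).collect();

    for index in 0..count {
        let id = format!("{}.{}", name, index);
        nodes.push((id.clone(), previous));
        // Each following node reads only from the node before it.
        previous = vec![id];
    }

    // The Noop-like alias closes the chain under the raw component name.
    if alias {
        nodes.push((name.to_string(), previous));
    }

    nodes
}
```

For two components named under `foo` with `alias` enabled, this yields `foo.0` reading the original inputs, `foo.1` reading `foo.0`, and `foo` reading `foo.1`.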

lukesteensen · 2021-11-04T22:25:09Z

src/transforms/pipelines/mod.rs

+/// This represent the configuration of a single pipeline,
+/// not the pipelines transform itself.


s/represent/represents/ and I'd say "not the pipelines transform itself, which can contain multiple individual pipelines" just to be more clear.

lukesteensen · 2021-11-05T14:48:17Z

src/transforms/pipelines/mod.rs

@@ -43,6 +45,7 @@ impl PipelineConfig {
    }
 }

+/// This represent an ordered list of pipelines depending on the event type.


This could use a bit more elaboration. From the description alone I'm still not sure what exactly it does.

lukesteensen · 2021-11-05T14:49:07Z

src/transforms/pipelines/mod.rs

@@ -1,3 +1,61 @@
+/// This pipelines transform is a bit complex and needs a simple example.
+///
+/// If we take the following example in consideration


I would word this:

If we consider the following example:

lukesteensen · 2021-11-05T14:50:21Z

src/transforms/pipelines/mod.rs

+/// The pipelines transform will first expand into 2 parallel transforms for `logs` and
+/// `metrics`. A `Noop` transform will be also added to aggregate `logs` and `metrics`
+/// into a single transform and to be able to use the transform name (`my_pipelines`) as an input.
+///
+/// Then the `logs` group of pipelines will be expanded into a `EventFilter` followed by
+/// a series `PipelineConfig` via the `EventRouter` transform. At the end, a `Noop` alias is added
+/// to be able to refer `logs` as `my_pipelines.logs`.
+/// Same thing for the `metrics` group of pipelines.
+///
+/// Each pipeline will then be expanded into a list of its transforms and at the end of each
+/// expansion, a `Noop` transform will be added to use the `pipeline` name as an alias
+/// (`my_pipelines.logs.transforms.foo`).


This is really helpful! Thanks for adding this.
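
As a complement to the doc comment, the sketch below lists the component ids that one event type expands into, following the naming used in the chart of the PR description (`foo.logs.filter`, `foo.logs.pipelines.a.transforms.0`, and so on). It is only an illustration: `expanded_ids` and its parameters are hypothetical, and the ids are derived from the chart rather than from the implementation.

```rust
/// Hypothetical helper: lists, in chain order, the component ids produced for
/// one event type (e.g. "logs"). Each pipeline is given as
/// `(pipeline name, number of transforms)`.
fn expanded_ids(transform: &str, event_type: &str, pipelines: &[(&str, usize)]) -> Vec<String> {
    let route = format!("{}.{}", transform, event_type);

    // The event type group starts with a filter on the event type.
    let mut ids = vec![format!("{}.filter", route)];

    for (pipeline, transform_count) in pipelines {
        let base = format!("{}.pipelines.{}", route, pipeline);
        // Each pipeline starts with its own filter...
        ids.push(format!("{}.filter", base));
        // ...followed by its transforms, numbered in order...
        for index in 0..*transform_count {
            ids.push(format!("{}.transforms.{}", base, index));
        }
        // ...and ends with a Noop alias carrying the pipeline name.
        ids.push(base);
    }

    // A final Noop alias lets other components refer to the whole group.
    ids.push(route);
    ids
}
```

A test comparing such a list against the actual expanded topology would be one way to pin down the expansion behaviour that the reviewers asked to have documented.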

binarylogic · 2021-11-10T17:18:02Z

website/cue/reference/components/transforms/pipelines.cue

+	examples: [
+		{
+			title: "Filter by log level and reformat"
+			configuration: """


Just noting, this should be structured data, not a string. @jdrouet was there a precedent in another component for using strings here? I can't find one.

jdrouet force-pushed the jdrouet/pipeline-structure branch 3 times, most recently from 5e23f55 to 6bb01bb Compare October 28, 2021 09:58

jdrouet force-pushed the jdrouet/pipeline-structure branch from a79a9e2 to cb81bdb Compare October 28, 2021 13:09

lucperkins reviewed Oct 28, 2021

tests/behavior/transforms/pipelines_complete.toml

jdrouet force-pushed the jdrouet/pipeline-structure branch from 15e4292 to 5efe832 Compare October 29, 2021 07:36

jdrouet changed the title ~~feat(pipelines transform): create config structure~~ feat(pipelines transform): load and handle pipelines tranforms Nov 2, 2021

jdrouet requested review from spencergilbert and lucperkins November 2, 2021 09:33

jdrouet marked this pull request as ready for review November 2, 2021 09:34

jdrouet requested review from lukesteensen, leebenson, binarylogic and jszwedko November 2, 2021 10:24

lucperkins mentioned this pull request Nov 2, 2021

fix(external docs): Fix CUE issues in pipeline transform docs #9867

Closed

jdrouet force-pushed the jdrouet/pipeline-structure branch from c5b4af9 to e891617 Compare November 3, 2021 07:27

jdrouet and others added 13 commits November 3, 2021 07:27

feat(pipelines transform): create config structure

1769f6b

Signed-off-by: Jérémie Drouet <jeremie.drouet@gmail.com>

make it work with configs

dd7aec6

Signed-off-by: Jérémie Drouet <jeremie.drouet@datadoghq.com>

recursive expansion

4ad8be6

Signed-off-by: Jérémie Drouet <jeremie.drouet@datadoghq.com>

make the graph works

0f03a0c

Signed-off-by: Jérémie Drouet <jeremie.drouet@datadoghq.com>

refactor serial expansion with alias flag

fef9434

Signed-off-by: Jérémie Drouet <jeremie.drouet@datadoghq.com>

remove manual aliases

70216b7

Signed-off-by: Jérémie Drouet <jeremie.drouet@datadoghq.com>

remove hack

4e4b07d

Signed-off-by: Jérémie Drouet <jeremie.drouet@datadoghq.com>

add transforms-pipelines to default features

e3252c5

Signed-off-by: Jérémie Drouet <jeremie.drouet@datadoghq.com>

add some pipelines behavior tests

b034960

Signed-off-by: Jérémie Drouet <jeremie.drouet@datadoghq.com>

remove redundant lifetime

3f1bc20

Signed-off-by: Jérémie Drouet <jeremie.drouet@datadoghq.com>

update clippy

cd771a3

Signed-off-by: Jérémie Drouet <jeremie.drouet@datadoghq.com>

fix tests

1d03541

Signed-off-by: Jérémie Drouet <jeremie.drouet@datadoghq.com>

fix behavior tests

4deec08

Signed-off-by: Jérémie Drouet <jeremie.drouet@datadoghq.com>

tobz approved these changes Nov 3, 2021

leebenson approved these changes Nov 3, 2021

lukesteensen approved these changes Nov 3, 2021

src/transforms/noop.rs

jdrouet force-pushed the jdrouet/pipeline-structure branch from 659a965 to 32e3b56 Compare November 3, 2021 15:46

jdrouet added 4 commits November 3, 2021 16:00

use indoc for configuration sample

1f47724

Signed-off-by: Jérémie Drouet <jeremie.drouet@datadoghq.com>

move config and remove useless comments

38281ce

Signed-off-by: Jérémie Drouet <jeremie.drouet@datadoghq.com>

add some doc around ExpandType

0512d1a

Signed-off-by: Jérémie Drouet <jeremie.drouet@datadoghq.com>

add some more doc

47ef9d3

Signed-off-by: Jérémie Drouet <jeremie.drouet@datadoghq.com>

jdrouet force-pushed the jdrouet/pipeline-structure branch from 32e3b56 to 47ef9d3 Compare November 3, 2021 16:01

jdrouet force-pushed the jdrouet/pipeline-structure branch from 7142659 to cf0c101 Compare November 4, 2021 09:00

vladimir-dd reviewed Nov 4, 2021

src/transforms/pipelines/mod.rs

vladimir-dd reviewed Nov 4, 2021

src/transforms/pipelines/mod.rs

jdrouet force-pushed the jdrouet/pipeline-structure branch from cf0c101 to 2959fca Compare November 4, 2021 10:12

add some doc and an expansion example

975d3ac

Signed-off-by: Jérémie Drouet <jeremie.drouet@datadoghq.com>

jdrouet force-pushed the jdrouet/pipeline-structure branch from 2959fca to 975d3ac Compare November 4, 2021 10:17

vladimir-dd approved these changes Nov 4, 2021

jdrouet enabled auto-merge (squash) November 4, 2021 10:22

jdrouet merged commit 54deade into master Nov 4, 2021

jdrouet deleted the jdrouet/pipeline-structure branch November 4, 2021 13:06

lukesteensen reviewed Nov 5, 2021

jdrouet mentioned this pull request Nov 9, 2021

chore(doc): fix some doc wording in pipelines transform #9965

Merged

binarylogic reviewed Nov 10, 2021

This was referenced Nov 10, 2021

Consider adding a "noop" transform #5061

Open

Enable a single Vector instance to run multiple, independent pipelines natively #8216

Closed
