Set runtime schema definitions in the topology #16732

Closed
Tracked by #15453
fuchsnj opened this issue Mar 8, 2023 · 0 comments · Fixed by #17692

fuchsnj commented Mar 8, 2023

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

Depends on #16730.

Sinks now use semantic meanings when log namespacing is enabled, so the schema definition must be accessible on each event at runtime for those meanings to be looked up. `EventMetadata` already has a place for the definition, but nothing currently sets it. Once the issue linked above is done, it should be fairly straightforward to keep one definition for each source and a map of definitions (`OutputId -> Definition`) for each transform, so that the schema definition can be set on each event after a component generates it. The definitions should be stored in an `Arc` so they are cheap to clone into the metadata of every event.
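
A minimal sketch of the storage described above, assuming simplified stand-in types. `Definition`, `OutputId`, `EventMetadata`, `SourceSchema` and `TransformSchema` here are hypothetical shapes written for illustration, not Vector's actual structs. The point it shows is that attaching a definition to an event only clones an `Arc`, which bumps a reference count instead of copying the definition.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-ins for Vector's real types; names and fields are
// assumptions made for illustration only.
#[derive(Debug, Default, PartialEq)]
pub struct Definition {
    /// Maps a semantic meaning (e.g. "message") to a field path.
    pub meanings: HashMap<String, String>,
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct OutputId {
    pub component: String,
    pub port: Option<String>,
}

#[derive(Debug, Default)]
pub struct EventMetadata {
    pub schema_definition: Arc<Definition>,
}

/// A source keeps a single shared definition.
pub struct SourceSchema {
    pub definition: Arc<Definition>,
}

/// A transform keeps one shared definition per output.
pub struct TransformSchema {
    pub definitions: HashMap<OutputId, Arc<Definition>>,
}

impl SourceSchema {
    /// Attaching only increments the Arc's reference count.
    pub fn attach(&self, metadata: &mut EventMetadata) {
        metadata.schema_definition = Arc::clone(&self.definition);
    }
}

impl TransformSchema {
    /// Returns false (and leaves the metadata untouched) when no
    /// definition is registered for `output`.
    pub fn attach(&self, output: &OutputId, metadata: &mut EventMetadata) -> bool {
        match self.definitions.get(output) {
            Some(definition) => {
                metadata.schema_definition = Arc::clone(definition);
                true
            }
            None => false,
        }
    }
}
```

Every event emitted by the same source ends up pointing at the same allocation, so the per-event cost stays constant regardless of how large the definition is.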

fuchsnj added the `type: feature` label Mar 8, 2023
StephenWakely self-assigned this Apr 5, 2023
github-merge-queue bot pushed a commit that referenced this issue Jun 29, 2023
closes #16732

In order for sinks to use semantic meaning, they need a mapping of
meanings to fields. This is included in the schema definition of events,
but the exact definition that needs to be used depends on the path the
event took to get to the sink. The schema definition of an event is
tracked at runtime so this can be determined.
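
As a rough illustration of why a sink needs the definition, the sketch below resolves a meaning to a field name and then reads that field from the event. All types and the `value_for_meaning` helper are hypothetical simplifications (flat string fields, string paths) and are not taken from Vector's code.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical, simplified types; a real schema definition carries much
// more than a meaning-to-field map.
pub struct Definition {
    meanings: HashMap<String, String>,
}

impl Definition {
    pub fn new(pairs: &[(&str, &str)]) -> Self {
        let meanings = pairs
            .iter()
            .map(|(meaning, field)| (meaning.to_string(), field.to_string()))
            .collect();
        Self { meanings }
    }

    /// Field name registered for `meaning`, if any.
    pub fn meaning_path(&self, meaning: &str) -> Option<&str> {
        self.meanings.get(meaning).map(String::as_str)
    }
}

pub struct Event {
    pub fields: HashMap<String, String>,
    pub schema_definition: Arc<Definition>,
}

/// What a sink might do: resolve the meaning to a field through the
/// definition attached to the event, then read that field.
pub fn value_for_meaning<'a>(event: &'a Event, meaning: &str) -> Option<&'a str> {
    let field = event.schema_definition.meaning_path(meaning)?;
    event.fields.get(field).map(String::as_str)
}
```

Two events that reached the sink by different paths can carry different definitions, so the same meaning may resolve to a different field for each of them.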

A `parent_id` was added to event metadata to track the previous
component that an event came from, which lets the topology select the
correct schema definition to attach to events.

For sources, there is only one definition that can be attached (for each
port). This is automatically attached in the topology layer (after an
event is emitted by a source), so there is no additional work in each
source to support this.
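
The source case could be sketched as a thin wrapper that the topology applies to each output port. `SourcePort`, `finalize` and the use of a plain string for `parent_id` are assumptions for illustration; only the idea (one definition per port, attached after the source emits) comes from the description above.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical types for illustration only.
#[derive(Debug, Default, PartialEq)]
pub struct Definition {
    pub meanings: HashMap<String, String>,
}

#[derive(Debug, Default)]
pub struct EventMetadata {
    pub schema_definition: Arc<Definition>,
    /// The component that last emitted this event, if any.
    pub parent_id: Option<Arc<str>>,
}

#[derive(Debug, Default)]
pub struct Event {
    pub metadata: EventMetadata,
}

/// Topology-side wrapper around one output port of a source.
pub struct SourcePort {
    component_id: Arc<str>,
    definition: Arc<Definition>,
}

impl SourcePort {
    pub fn new(component_id: &str, definition: Definition) -> Self {
        Self {
            component_id: Arc::from(component_id),
            definition: Arc::new(definition),
        }
    }

    /// Applied to every batch the source emits on this port, so the
    /// source implementation never handles schema definitions itself.
    pub fn finalize(&self, events: &mut [Event]) {
        for event in events.iter_mut() {
            event.metadata.schema_definition = Arc::clone(&self.definition);
            event.metadata.parent_id = Some(Arc::clone(&self.component_id));
        }
    }
}
```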

For transforms, it's slightly more complicated. The schema definition
depends on both the output port _and_ the component the event came from.
A map is generated at Vector startup, and the correct definition is
obtained from that at runtime. This also happens in the topology layer
so transforms don't need to worry about this.
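
For transforms, the lookup described above might look roughly like the following. The key shape (output port plus parent component id), the `TransformDefinitions` type and its method names are assumptions; the sketch only demonstrates building the map once and resolving a definition per event at runtime.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical types for illustration only.
#[derive(Debug, Default, PartialEq)]
pub struct Definition {
    pub meanings: HashMap<String, String>,
}

/// Key: (output port of the transform, id of the upstream component).
type DefinitionKey = (Option<String>, String);

pub struct TransformDefinitions {
    by_port_and_parent: HashMap<DefinitionKey, Arc<Definition>>,
}

impl TransformDefinitions {
    /// Built once at startup from (port, parent id, definition) entries.
    pub fn build(entries: Vec<(Option<&str>, &str, Definition)>) -> Self {
        let by_port_and_parent = entries
            .into_iter()
            .map(|(port, parent, definition)| {
                (
                    (port.map(str::to_string), parent.to_string()),
                    Arc::new(definition),
                )
            })
            .collect();
        Self { by_port_and_parent }
    }

    /// Runtime lookup using the event's parent id and the port it is
    /// being emitted on. This sketch allocates a key per lookup for
    /// simplicity; a production version would want to avoid that.
    pub fn lookup(&self, port: Option<&str>, parent_id: &str) -> Option<Arc<Definition>> {
        let key = (port.map(str::to_string), parent_id.to_string());
        self.by_port_and_parent.get(&key).cloned()
    }
}
```

Because the lookup lives outside the transform, the transform only has to preserve the event's metadata for the correct definition to be selected on output.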

Previously the `remap` transform had custom code to support runtime
schema definitions (for the VRL meaning functions). This was removed
since it's now handled automatically.

The `reduce` and `lua` transforms are special cases because there is no
clear "path" that an event takes through them. In `reduce`, multiple
events (from different inputs) can be merged into one. In `lua`, output
events may not be related to input events at all. In these cases the
schema definitions of all inputs are merged, so the schema definition
map holds the same value for every input, and the topology arbitrarily
picks one.
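
The special case can be sketched as follows, under the assumption that merging is a simple union of meanings where the first registration wins. That merge rule, and the `merge_for_all_inputs` and `pick_any` helpers, are illustrative inventions and are not Vector's actual merge semantics.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

// Hypothetical types for illustration only.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Definition {
    pub meanings: BTreeMap<String, String>,
}

/// Merge the definitions of all inputs and register the merged result
/// under every input id, so each key maps to the same shared value.
pub fn merge_for_all_inputs(
    inputs: &BTreeMap<String, Definition>,
) -> BTreeMap<String, Arc<Definition>> {
    let mut merged = Definition::default();
    for definition in inputs.values() {
        for (meaning, field) in &definition.meanings {
            // Simplifying assumption: the first registration of a meaning wins.
            merged
                .meanings
                .entry(meaning.clone())
                .or_insert_with(|| field.clone());
        }
    }
    let merged = Arc::new(merged);
    inputs
        .keys()
        .map(|input| (input.clone(), Arc::clone(&merged)))
        .collect()
}

/// All values are identical, so any entry is a valid choice.
pub fn pick_any(definitions: &BTreeMap<String, Arc<Definition>>) -> Option<Arc<Definition>> {
    definitions.values().next().cloned()
}
```

Since every entry points at the same merged definition, which one is picked has no observable effect on the events the transform emits.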

---------

Signed-off-by: Stephen Wakely <fungus.humungus@gmail.com>
Co-authored-by: Stephen Wakely <fungus.humungus@gmail.com>
Nithiya2021 pushed a commit to Nithiya2021/vector that referenced this issue Jan 19, 2024
closes vectordotdev/vector#16732
