Set runtime schema definitions in the topology #16732

fuchsnj · 2023-03-08T22:05:51Z

A note for the community

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

depends on #16730

Since semantic meanings are now used in sinks (when log namespacing is enabled), the schema definition needs to be accessible on each event at runtime so that meanings can be looked up. A field for it already exists in EventMetadata, but it is not currently being set. Once the issue linked above is done, it should be fairly straightforward to keep one definition for each source and a map of definitions (OutputId -> Definition) for each transform, so that the schema definition can be set on each event after it is emitted by a component. The definitions should be stored in an Arc so they are cheap to clone into the metadata of every event.
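A minimal sketch of the bookkeeping proposed above, written in Rust since Vector is a Rust project. Every type here (`Definition`, `OutputId`, `EventMetadata`, `LogEvent`, `SourceOutput`, `TransformOutput`) is a hypothetical stand-in, not Vector's actual API. It only illustrates one shared definition per source, an `OutputId -> Definition` map per transform, and cheap `Arc` clones into per-event metadata.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a schema definition (semantic meaning -> field path).
#[derive(Debug, PartialEq)]
pub struct Definition {
    pub meanings: HashMap<String, String>,
}

/// Hypothetical component output identifier: component name plus optional named port.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct OutputId {
    pub component: String,
    pub port: Option<String>,
}

/// Hypothetical, simplified event metadata carrying a shared schema definition.
#[derive(Debug, Default)]
pub struct EventMetadata {
    pub schema_definition: Option<Arc<Definition>>,
}

#[derive(Debug, Default)]
pub struct LogEvent {
    pub metadata: EventMetadata,
}

/// Hypothetical per-source state: a single definition built once at startup.
pub struct SourceOutput {
    pub definition: Arc<Definition>,
}

impl SourceOutput {
    /// Attach the source's definition to an event it just emitted.
    /// Cloning the Arc only increments a reference count; the definition is not copied.
    pub fn attach(&self, event: &mut LogEvent) {
        event.metadata.schema_definition = Some(Arc::clone(&self.definition));
    }
}

/// Hypothetical per-transform state: definitions keyed by the upstream output.
pub struct TransformOutput {
    pub definitions: HashMap<OutputId, Arc<Definition>>,
}

impl TransformOutput {
    /// Attach the definition registered for `upstream`; returns false if none is known.
    pub fn attach(&self, upstream: &OutputId, event: &mut LogEvent) -> bool {
        match self.definitions.get(upstream) {
            Some(def) => {
                event.metadata.schema_definition = Some(Arc::clone(def));
                true
            }
            None => false,
        }
    }
}
```

Using `Arc` means thousands of events can point at the same definition without allocating per event; the trade-off is an atomic reference-count update on each clone.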


closes #16732

In order for sinks to use semantic meaning, they need a mapping of meanings to fields. This mapping is included in the schema definition of events, but the exact definition to use depends on the path the event took to reach the sink. The schema definition of each event is therefore tracked at runtime so that it can be determined.

A `parent_id` was added to event metadata to track the previous component an event came from, which lets the topology select the correct schema definition to attach to events.

For sources, there is only one definition that can be attached (per output port). It is attached automatically in the topology layer after a source emits an event, so individual sources need no additional work to support this.

For transforms, it's slightly more complicated: the schema definition depends on both the output port _and_ the component the event came from. A map is generated at Vector startup, and the correct definition is looked up from it at runtime. This also happens in the topology layer, so transforms don't need to handle it.

Previously the `remap` transform had custom code to support runtime schema definitions (for the VRL meaning functions). This was removed since it's now handled automatically.

The `reduce` and `lua` transforms are special cases because there is no clear "path" that an event takes through the topology: in `reduce`, multiple events from different inputs can be merged, and in `lua`, output events may not be related to input events at all. In these cases the schema definition map has the same value for every input (they are all merged), so the topology arbitrarily picks one, since they are all identical.

Signed-off-by: Stephen Wakely <fungus.humungus@gmail.com>
Co-authored-by: Stephen Wakely <fungus.humungus@gmail.com>
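The transform lookup described above could look roughly like the following sketch. It assumes a hypothetical per-transform table keyed first by output port and then by the `parent_id` stored in event metadata; `Definition`, `OutputId`, `EventMetadata`, and `TransformDefinitions` are illustrative names, not types from the Vector codebase. The fallback for events without a usable parent is likewise an assumption about how "arbitrarily pick one" might be realized for `reduce` and `lua`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical, simplified schema definition.
#[derive(Debug, PartialEq)]
pub struct Definition {
    pub name: String,
}

/// Hypothetical identifier of an upstream component output.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct OutputId(pub String);

/// Hypothetical metadata: the previous component plus the attached definition.
#[derive(Debug, Default)]
pub struct EventMetadata {
    pub parent_id: Option<OutputId>,
    pub schema_definition: Option<Arc<Definition>>,
}

/// Hypothetical table built at startup for one transform:
/// output port -> (parent output -> definition).
pub struct TransformDefinitions {
    pub by_port: HashMap<String, HashMap<OutputId, Arc<Definition>>>,
}

impl TransformDefinitions {
    /// Called by the topology layer after the transform emits an event on `port`.
    pub fn attach(&self, port: &str, meta: &mut EventMetadata) {
        // Unknown port: leave the metadata untouched.
        let per_parent = match self.by_port.get(port) {
            Some(map) => map,
            None => return,
        };
        let chosen = match &meta.parent_id {
            // Normal case: pick the definition for the component the event came from.
            Some(parent) => per_parent.get(parent),
            // Assumed fallback (e.g. merged `reduce` output or a new `lua` event):
            // all entries hold the same merged definition, so any one will do.
            None => per_parent.values().next(),
        };
        if let Some(def) = chosen {
            meta.schema_definition = Some(Arc::clone(def));
        }
    }
}
```

Because every entry for a merged transform points at the same definition, it does not matter which one `values().next()` returns, even though `HashMap` iteration order is unspecified.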


fuchsnj added the type: feature label (A value-adding code addition that introduces new functionality.) on Mar 8, 2023

fuchsnj mentioned this issue Mar 8, 2023

Tracking issue: Add Log Namespace support to sinks #15453

Closed

24 tasks

StephenWakely self-assigned this Apr 5, 2023

StephenWakely mentioned this issue Apr 5, 2023

chore(topology): Transform outputs hash table of OutputId -> Definition #17059

Merged

fuchsnj mentioned this issue Jun 22, 2023

feat: track runtime schema definitions for log events #17692

Merged

dsmith3197 closed this as completed in #17692 Jun 29, 2023
