Log schemas #3910

vector-vic · 2020-09-16T00:34:25Z

A common need for Vector users is the ability to map data according to different schemas. This is a key requirement for Vector since it aims to be schema-, standard-, and vendor-neutral. To deliver on this claim, Vector must not only support a variety of schemas independently, it must also help translate data between them.

Use Cases

Transitioning to Vector

Schemas create heavy lock-in because most downstream systems depend on the schema in use. To name a few:

Alerts.
Graphs and dashboards.
Storage systems.
Humans.

Changing a schema can break all of these, which is usually not acceptable. To prevent this, Vector must adopt the user's current schema in a way that downstream dependencies do not notice.

Transitioning Vendors

The use case above illustrates the need for Vector to support a single schema at a time, but there are cases where a user needs to support more than one, for example when transitioning vendors. In that case Vector must not only support the "read" schema but also transform the data to an entirely new "write" schema.

For example, if a user is transitioning from Splunk to Elasticsearch, Vector must ingest the data under the Splunk Common Information Model and transform it to the Elastic Common Schema.
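To make the read/write idea concrete, here is a rough sketch of a rename-based translation from a flat "read" schema to a nested "write" schema. The four field pairs are only an illustrative subset chosen for this example, not an authoritative CIM-to-ECS mapping, and `CIM_TO_ECS`, `set_path`, and `cim_to_ecs` are hypothetical names, not anything Vector provides.

```python
# Illustrative subset only; NOT an authoritative CIM-to-ECS mapping.
CIM_TO_ECS = {
    "src_ip": "source.ip",
    "dest_ip": "destination.ip",
    "user": "user.name",
    "action": "event.action",
}


def set_path(target, dotted_path, value):
    """Write value into a nested dict at a dotted path such as 'source.ip'."""
    *parents, leaf = dotted_path.split(".")
    node = target
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def cim_to_ecs(event):
    """Return a new event with mapped fields renamed and nested.

    Unmapped fields are copied unchanged under their original key.
    Name collisions between unmapped and mapped fields are not handled.
    """
    out = {}
    for name, value in event.items():
        if name in CIM_TO_ECS:
            set_path(out, CIM_TO_ECS[name], value)
        else:
            out[name] = value
    return out


if __name__ == "__main__":
    print(cim_to_ecs({"src_ip": "10.0.0.1", "user": "alice", "bytes": 42}))
```

A real translation would also need to handle type conversions and fields that have no counterpart in the target schema, which a plain rename table cannot express.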

Automatic Dashboards, Alerts, & Insights

A benefit of using a vendor's agent is that it unlocks automatic dashboards, alerts, and other features. This not only saves considerable time and effort, it also lets you delegate the management of these things to your chosen vendor. For example, I assume that DataDog and their community continually improve their dashboards. In this case, it's very important that Vector can transparently adopt the DataDog schema so that DataDog Vector users receive the same benefit. It also relieves us of having to maintain these entities ourselves.

Schemas

Proposal

In short, I'm proposing that we attach the known schema to each Vector event during ingestion. This would allow us to look up fields and map them across schemas. There are a lot of small details to discuss, which we can cover in an RFC. To name a few:
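As a rough illustration of what "attach the schema to the event" could enable, the sketch below tags each event with a schema name and resolves fields by their semantic meaning. Everything here is an assumption made for the example: the `legacy` and `modern` schemas, their field names, and the `Event`, `lookup`, and `convert` helpers are hypothetical and do not describe Vector's actual design.

```python
from dataclasses import dataclass, field

# Hypothetical registry: schema name -> semantic meaning -> concrete field name.
SCHEMAS = {
    "legacy": {"timestamp": "ts", "message": "msg", "host": "hostname"},
    "modern": {"timestamp": "@timestamp", "message": "message", "host": "host.name"},
}


@dataclass
class Event:
    schema: str  # name of the schema attached at ingestion
    fields: dict = field(default_factory=dict)


def lookup(event, meaning):
    """Fetch a value by semantic meaning, whatever the event's schema calls it."""
    return event.fields.get(SCHEMAS[event.schema][meaning])


def convert(event, target):
    """Re-key fields into the target schema.

    Fields the source schema does not know are passed through unchanged;
    meanings the target schema lacks are dropped.
    """
    source_map, target_map = SCHEMAS[event.schema], SCHEMAS[target]
    meaning_of = {name: meaning for meaning, name in source_map.items()}
    out = {}
    for name, value in event.fields.items():
        meaning = meaning_of.get(name)
        if meaning is None:
            out[name] = value
        elif meaning in target_map:
            out[target_map[meaning]] = value
    return Event(schema=target, fields=out)
```

With this shape, a sink that wants "the timestamp" calls `lookup(event, "timestamp")` and never needs to know whether the source called it `ts` or `@timestamp`.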

How would Vector detect the schema?
Should Vector strictly enforce the schema? Ex: Not allowing users to add fields that would violate the schema.
Should Vector reject data at the source-level that does not conform to the chosen schema?
Should Vector adopt a default schema? Ex: OpenTelemetry.
What happens when the user has a custom schema that we know nothing about? Ex: require them to manually map data when necessary.
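To help frame the enforcement questions, here is one possible shape for a strict versus lenient check at ingestion. It is only a sketch under assumed names: the three-field `SCHEMA`, the `SchemaViolation` exception, and the `violations` and `ingest` functions are hypothetical, and whether Vector should behave this way is exactly what the RFC would need to decide.

```python
class SchemaViolation(Exception):
    """Raised in strict mode when an event does not conform."""


# Hypothetical schema: field name -> required Python type.
SCHEMA = {"timestamp": str, "message": str, "status": int}


def violations(event):
    """List every way the event departs from SCHEMA."""
    problems = [f"unknown field: {name}" for name in event if name not in SCHEMA]
    problems += [
        f"wrong type for {name}: expected {expected.__name__}"
        for name, expected in SCHEMA.items()
        if name in event and not isinstance(event[name], expected)
    ]
    return problems


def ingest(event, strict):
    """Strict mode rejects nonconforming events; lenient mode passes them through."""
    problems = violations(event)
    if strict and problems:
        raise SchemaViolation("; ".join(problems))
    return event
```

Missing fields are deliberately not treated as violations here; whether a schema can mark fields as required is another detail left open.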

vector-vic · 2020-09-16T00:34:26Z

Link to feature: https://timber.productboard.com/feature-board/planning/features/5154387

jszwedko · 2022-12-29T23:57:25Z

Closing since this was just used for tracking.

polarathene · 2023-11-26T07:39:07Z

@jszwedko you may want to update this docs page which links here to track progress?:

jszwedko · 2023-11-28T19:33:40Z

> @jszwedko you may want to update this docs page which links here to track progress?:

Thanks for pointing that out! I opened #19256

vector-vic added the domain: processing, Epic, and type: enhancement labels Sep 16, 2020

vector-vic changed the title ~~Map fields in sinks - Automatically (CIM)~~ Send logs - Map fields automatically (CIM) Sep 16, 2020

binarylogic mentioned this issue Sep 22, 2020

feat(new sink): New sematext_metrics sink #3501

Merged

vector-vic changed the title ~~Send logs - Map fields automatically (CIM)~~ Process logs - Map fields automatically (CIM) Sep 25, 2020

binarylogic mentioned this issue Oct 1, 2020

Remap inspired reduce transform to solve multiline merging #4258

Closed

vector-vic changed the title ~~Process logs - Map fields automatically (CIM)~~ Process logs - Schemas Oct 3, 2020

This was referenced Oct 9, 2020

New get_schema_value remap function #4477

Closed

Ability to set the _id field in the elasticsearch sink (deduping events) #4479

Closed

ECS log schema support #2423

Open

jszwedko mentioned this issue Oct 27, 2020

Better handling of timestamp field for sinks #4774

Open

JeanMertz mentioned this issue Nov 6, 2020

feat(remap): compile-time program result type checking #4902

Merged

binarylogic changed the title ~~Process logs - Schemas~~ Log schemas Feb 21, 2021

JeanMertz mentioned this issue Sep 30, 2021

chore(schema): schema support RFC #9388

Closed

jszwedko closed this as completed Dec 29, 2022
