
Log schemas #3910

Closed
vector-vic opened this issue Sep 16, 2020 · 4 comments
Labels
domain: processing Anything related to processing Vector's events (parsing, merging, reducing, etc.) Epic Larger, user-centric issue that contains multiple sub-issues type: enhancement A value-adding code change that enhances its existing functionality.

Comments

@vector-vic

vector-vic commented Sep 16, 2020

A common need for Vector users is the ability to map data according to different schemas. This is a key requirement for Vector, since it aims to be schema-, standard-, and vendor-neutral. To deliver on this claim, Vector must not only support a variety of schemas independently but also assist in the interchange between them.

Use Cases

Transitioning to Vector

Schemas create very heavy lock-in, because most downstream systems depend on them. To name a few:

  1. Alerts.

  2. Graphs and dashboards.

  3. Storage systems.

  4. Humans.

Changing a schema can break all of these, which is usually not acceptable. To prevent this, Vector must adopt the user's current schema in a way that downstream dependencies do not notice.

Transitioning Vendors

The use case above illustrates the need for Vector to support a single schema at a time, but some users need to support several, for example when transitioning vendors. In that case, Vector must not only support the "read" schema but also transform the data to an entirely new "write" schema.

For example, if a user is transitioning from Splunk to Elasticsearch, Vector must ingest the data under the Splunk Common Information Model and transform it to the Elastic Common Schema.
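As a rough illustration of what that transformation involves, the sketch below renames a handful of flat Splunk CIM fields to their nested Elastic Common Schema equivalents. The mapping table is a small hand-picked subset chosen for illustration, not a complete or authoritative CIM-to-ECS mapping, and `cim_to_ecs` is a hypothetical helper, not a Vector API. Keeping unmapped fields under ECS `labels` is likewise an assumption made for the sketch.

```python
# Hypothetical subset of a Splunk CIM -> Elastic Common Schema field mapping.
# Illustrative only; a real mapping would be far larger and data-model specific.
CIM_TO_ECS = {
    "src_ip": "source.ip",
    "src_port": "source.port",
    "dest_ip": "destination.ip",
    "dest_port": "destination.port",
    "user": "user.name",
    "action": "event.action",
}


def set_path(target: dict, dotted: str, value) -> None:
    """Write value into target at a dotted path, creating nested dicts as needed."""
    *parents, leaf = dotted.split(".")
    node = target
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def cim_to_ecs(event: dict) -> dict:
    """Translate a flat CIM-style event into a nested ECS-style event.

    Fields without a known mapping are kept under "labels" (an assumption
    for this sketch) rather than silently dropped.
    """
    out: dict = {}
    for field, value in event.items():
        ecs_path = CIM_TO_ECS.get(field)
        if ecs_path is not None:
            set_path(out, ecs_path, value)
        else:
            out.setdefault("labels", {})[field] = value
    return out


if __name__ == "__main__":
    cim_event = {"src_ip": "10.0.0.5", "dest_port": 443, "action": "allowed", "vendor_product": "fw01"}
    print(cim_to_ecs(cim_event))
```

Even this toy version shows why the mapping is more than renaming: CIM fields are flat while ECS nests them, and some policy is needed for fields that have no counterpart in the target schema.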

Automatic Dashboards, Alerts, & Insights

A benefit of using a vendor's agent is that it unlocks automatic dashboards, alerts, and other features. This not only saves a considerable amount of time and effort but also lets you delegate the management of these things to your chosen vendor. For example, I assume that DataDog and its community continually improve their dashboards. In this case, it's very important that Vector can transparently adopt the DataDog schema so that Vector users sending data to DataDog receive the same benefit. It also spares us from having to maintain these assets ourselves.

Schemas

  1. Elastic Common Schema

  2. Splunk Common Information Model (CIM)

  3. OpenTelemetry Log Data Model

  4. GELF

  5. DataDog's reserved log attributes

  6. ...and more

Proposal

In short, I'm proposing that we attach the known schema to each Vector event during ingestion. This would allow us to look up fields and map them across schemas. There are many details to discuss, which we can cover in an RFC. To name a few:

  1. How would Vector detect the schema?

  2. Should Vector strictly enforce the schema? Ex: Not allowing users to add fields that would violate the schema.

  3. Should Vector reject data at the source-level that does not conform to the chosen schema?

  4. Should Vector adopt a default schema? Ex: OpenTelemetry.

  5. What happens when the user has a custom schema that we know nothing about? Ex: require them to manually map data when necessary.
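To make the proposal a little more concrete, here is a minimal sketch, assuming a hypothetical in-memory registry (`SCHEMAS`) that records, for each known schema, where a semantic concept such as "source IP" lives. Each `Event` carries the schema attached at ingestion; `lookup` resolves a concept through that schema and `convert` re-expresses an event in another one. None of these names come from Vector. Raising an error for an unregistered schema is just one possible answer to question 5, and dropping fields the registry does not know about is likewise a simplification, not a decision.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

# Hypothetical registry: for each known schema, the field path of each concept.
# Names and paths are illustrative assumptions, not Vector's implementation.
SCHEMAS: dict[str, dict[str, str]] = {
    "splunk_cim": {"source_ip": "src_ip", "destination_ip": "dest_ip", "user": "user"},
    "ecs": {"source_ip": "source.ip", "destination_ip": "destination.ip", "user": "user.name"},
}


@dataclass
class Event:
    schema: str  # schema attached when the event was ingested
    fields: dict[str, Any] = field(default_factory=dict)


def _get_path(data: Any, dotted: str) -> Any:
    """Read a dotted path from nested dicts; return None if any hop is missing."""
    node = data
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def _set_path(data: dict, dotted: str, value: Any) -> None:
    """Write a value at a dotted path, creating nested dicts as needed."""
    *parents, leaf = dotted.split(".")
    for key in parents:
        data = data.setdefault(key, {})
    data[leaf] = value


def lookup(event: Event, concept: str) -> Any:
    """Resolve a semantic concept using the schema attached to the event."""
    if event.schema not in SCHEMAS:
        # Unknown/custom schema: the user would have to supply a mapping.
        raise KeyError(f"no mapping registered for schema {event.schema!r}")
    path = SCHEMAS[event.schema].get(concept)
    return None if path is None else _get_path(event.fields, path)


def convert(event: Event, target: str) -> Event:
    """Re-express the concepts the registry knows about in the target schema."""
    if target not in SCHEMAS:
        raise KeyError(f"no mapping registered for schema {target!r}")
    out = Event(schema=target)
    for concept, path in SCHEMAS[target].items():
        value = lookup(event, concept)
        if value is not None:
            _set_path(out.fields, path, value)
    return out
```

For example, `convert(Event("splunk_cim", {"src_ip": "10.0.0.5", "user": "alice"}), "ecs")` yields an event whose fields are `{"source": {"ip": "10.0.0.5"}, "user": {"name": "alice"}}`. Routing every translation through shared concepts means supporting a new schema only requires one registry entry rather than a mapping to every other schema.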

@vector-vic vector-vic added domain: processing Anything related to processing Vector's events (parsing, merging, reducing, etc.) Epic Larger, user-centric issue that contains multiple sub-issues type: enhancement A value-adding code change that enhances its existing functionality. labels Sep 16, 2020
@vector-vic
Author

@vector-vic vector-vic changed the title Map fields in sinks - Automatically (CIM) Send logs - Map fields automatically (CIM) Sep 16, 2020
@vector-vic vector-vic changed the title Send logs - Map fields automatically (CIM) Process logs - Map fields automatically (CIM) Sep 25, 2020
@vector-vic vector-vic changed the title Process logs - Map fields automatically (CIM) Process logs - Schemas Oct 3, 2020
@binarylogic binarylogic changed the title Process logs - Schemas Log schemas Feb 21, 2021
@jszwedko
Member

Closing since this was just used for tracking.

@polarathene

@jszwedko you may want to update this docs page, which links here to track progress?


@jszwedko
Member

Thanks for pointing that out! I opened #19256

4 participants
@jszwedko @polarathene @vector-vic and others