Log schemas #3910
Labels
domain: processing
Anything related to processing Vector's events (parsing, merging, reducing, etc.)
Epic
Larger, user-centric issue that contains multiple sub-issues
type: enhancement
A value-adding code change that enhances its existing functionality.
Comments
vector-vic added the domain: processing, Epic, and type: enhancement labels on Sep 16, 2020
vector-vic changed the title from "Map fields in sinks - Automatically (CIM)" to "Send logs - Map fields automatically (CIM)" on Sep 16, 2020
vector-vic changed the title from "Send logs - Map fields automatically (CIM)" to "Process logs - Map fields automatically (CIM)" on Sep 25, 2020
vector-vic changed the title from "Process logs - Map fields automatically (CIM)" to "Process logs - Schemas" on Oct 3, 2020
This was referenced Oct 9, 2020
Closing since this was just used for tracking.
@jszwedko you may want to update this docs page which links here to track progress?:
Thanks for pointing that out! I opened #19256
A common need for Vector users is the ability to map data according to different schemas. This is a key requirement for Vector since it aims to be schema-, standard-, and vendor-neutral. To deliver on this claim, Vector must not only support a variety of schemas independently but also assist in the interchange between them.
Use Cases
Transitioning to Vector
Schemas create very heavy lock-in because most downstream systems depend on them. To name a few:
Alerts.
Graphs and dashboards.
Storage.
Humans.
Changing a schema can break all of these, which is usually unacceptable. To prevent this, Vector must adopt the user's current schema in a way that downstream dependencies do not notice.
Transitioning Vendors
The use case above illustrates the need for Vector to support a single schema at a time, but there are cases where a user needs to support several, for example when transitioning vendors. In that case Vector must not only support the "read" schema but also transform the data to an entirely new "write" schema.
For example, if a user is transitioning from Splunk to Elasticsearch, Vector must ingest the data under the Splunk Common Information Model and transform it to the Elastic Common Schema.
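To make the "read" and "write" schema idea concrete, here is a minimal sketch of such a translation in Python. It is not how Vector implements anything. The `CIM_TO_ECS` table is a small illustrative subset I picked (CIM `src`, `dest`, and `user` mapped to ECS `source.address`, `destination.address`, and `user.name`), and the helpers `set_path` and `cim_to_ecs` are hypothetical names. Fields with no known mapping are returned separately rather than guessed at.

```python
# Illustrative subset only; not an official or complete CIM-to-ECS mapping.
CIM_TO_ECS = {
    "src": "source.address",
    "dest": "destination.address",
    "user": "user.name",
}


def set_path(event, dotted_path, value):
    """Set a value in a nested dict using a dotted path such as 'source.address'."""
    node = event
    *parents, leaf = dotted_path.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def cim_to_ecs(cim_event):
    """Translate a flat CIM-style event into a nested ECS-style event.

    Returns a pair: the translated event and a dict of fields that had
    no entry in the mapping table and were therefore left untranslated.
    """
    ecs_event = {}
    unmapped = {}
    for field_name, value in cim_event.items():
        target = CIM_TO_ECS.get(field_name)
        if target is None:
            unmapped[field_name] = value
        else:
            set_path(ecs_event, target, value)
    return ecs_event, unmapped
```

Returning the unmapped fields separately keeps the decision about custom fields (drop, pass through, or map by hand) with the caller, which is one of the open questions for the proposal.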
Automatic Dashboards, Alerts, & Insights
A benefit of using a vendor's agent is that it unlocks automatic dashboards, alerts, and other features. This not only saves a considerable amount of time and effort, but also lets you effectively delegate the management of these things to your chosen vendor. For example, I assume that DataDog and their community continually improve their dashboards. In this case, it's very important that Vector can transparently adopt the DataDog schema so that DataDog Vector users receive the same benefit. It also relieves us of having to maintain these entities ourselves.
Schemas
Elastic Common Schema
Splunk Common Information Model (CIM)
OpenTelemetry Log Data Model
GELF
DataDog's reserved log attributes
...and more
Proposal
In short, I'm proposing that we attach the known schema to each Vector event during ingestion. This would allow us to look up fields and map them across schemas. There are a lot of little details to discuss, which we can cover in an RFC. To name a few:
How would Vector detect the schema?
Should Vector strictly enforce the schema? Ex: Not allowing users to add fields that would violate the schema.
Should Vector reject data at the source-level that does not conform to the chosen schema?
Should Vector adopt a default schema? Ex: OpenTelemetry.
What happens when the user has a custom schema that we know nothing about? Ex: require them to manually map data when necessary.
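As a rough illustration of the proposal (not a design for the RFC), the sketch below tags each event with a schema name at ingestion and uses a per-schema table to look up fields by meaning and to convert between schemas. Everything here is an assumption made for the example: the `SCHEMAS` registry, the `Event` class, and the `ingest`, `lookup`, and `convert` functions are hypothetical, events are kept as flat dicts (so `host.name` is a literal key), and only three fields per schema are listed. Values are copied as-is, so any format differences between schemas (such as timestamp encoding) are not handled.

```python
from dataclasses import dataclass, field

# Hypothetical registry: schema name -> {semantic meaning -> field name}.
# Only a few well-known fields are listed, purely for illustration.
SCHEMAS = {
    "ecs": {"message": "message", "timestamp": "@timestamp", "host": "host.name"},
    "gelf": {"message": "short_message", "timestamp": "timestamp", "host": "host"},
}


@dataclass
class Event:
    schema: str  # the schema attached at ingestion
    fields: dict = field(default_factory=dict)


def ingest(raw, schema):
    """Attach a known schema to incoming data; reject unknown schema names."""
    if schema not in SCHEMAS:
        raise ValueError(f"unknown schema: {schema}")
    return Event(schema=schema, fields=dict(raw))


def lookup(event, meaning):
    """Look up a field by its meaning, whatever the event's schema calls it."""
    name = SCHEMAS[event.schema].get(meaning)
    return event.fields.get(name) if name is not None else None


def convert(event, target):
    """Rename known fields to the target schema; keep unknown fields unchanged."""
    target_map = SCHEMAS[target]
    # Invert the source table: field name -> semantic meaning.
    reverse = {name: meaning for meaning, name in SCHEMAS[event.schema].items()}
    out = {}
    for name, value in event.fields.items():
        meaning = reverse.get(name)
        if meaning in target_map:
            out[target_map[meaning]] = value
        else:
            out[name] = value
    return Event(schema=target, fields=out)
```

In this sketch unknown fields pass through untouched, which corresponds to the "custom schema" question above: a stricter variant could instead reject them or require a manual mapping.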