-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(codecs): Add RFC for "Framing and Codecs - Sinks" #9864
chore(codecs): Add RFC for "Framing and Codecs - Sinks" #9864
Conversation
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
✔️ Deploy Preview for vector-project canceled. 🔨 Explore the source changes: f820c10 🔍 Inspect the deploy log: https://app.netlify.com/sites/vector-project/deploys/61dd84ebb4122d0007f09cb2 |
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
|
||
Users should be able to set `encoding` and `framing` options on sinks, analogously to the `decoding` and `framing` options on sources. These options uniformly control how the event _payload_ is encoded. This distinction is important, as encoding for the sink specific _protocol_ and the event _payload_ are separate concerns. The payload should still be encoded according to the sink's protocol, and the sink should provide additional options if there are multiple protocols to choose from, e.g. under a `protocol` key. | ||
|
||
The fields containing event transformations in sinks on the current `encoding` options (`schema`, `only_fields`, `except_fields`, `timestamp_format`) should be moved to a dedicated option, e.g. `schema` or `transform`. Thus, this would introduce a breaking change in configurations. However, migration would be relatively straight-forward by nesting these options under the new key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one question that I think needs to be answered here is: will the schema work land before this happens? If it doesn't, then we probably need a way handle parsing these out of the existing encoding
configuration field. We could deprecate setting them via encoding
at the same time, but we would have to provide a deprecation window regardless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I would suggest we continue to support the current config interface for now and focus on organizing the internal implementation for now. Then once both this work and the schema work is further along we can tackle changing the external API with a bit more confidence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a good plan 👍 Should be able to support existing config options without carrying over too much baggage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified in 21f9c36.
```rust | ||
pub trait SerializerConfig: Debug + DynClone + Send + Sync { | ||
/// Builds a serializer from this configuration. | ||
/// | ||
/// Fails if the configuration is invalid. | ||
fn build(&self) -> crate::Result<BoxedSerializer>; | ||
} | ||
``` | ||
|
||
```rust | ||
pub trait FramingConfig: Debug + DynClone + Send + Sync { | ||
/// Builds a framer from this configuration. | ||
/// | ||
/// Fails if the configuration is invalid. | ||
fn build(&self) -> crate::Result<BoxedFramer>; | ||
} | ||
``` | ||
|
||
These can be build to form an `Encoder`: | ||
|
||
```rust | ||
/// An encoder that can encode structured events to byte messages. | ||
pub struct Encoder { | ||
serializer: BoxedSerializer, | ||
framer: BoxedFramer, | ||
} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I'd like to see here is if we can actually make SerializerConfig
/FramingConfig
provide their outputs generically. For example:
pub trait SerializerConfig {
type Serializer;
fn build(&self) -> crate::Result<Self::Serializer>;
}
pub trait FramingConfig {
type Framer;
fn build(&self) -> crate::Result<Self::Framer>;
}
pub struct Encoder<S, F> {
serializer: S,
framer: F,
}
This is sort of an implementation detail, but we should avoid boxing individual components where possible, and prefer to box at a higher level. It might even be possible to entirely avoid boxing. I'm bringing this up here on the off-chance that not using boxed inner fields would meaningfully change how we have to approach the implementation.
Since codecs touch every single event entering and exiting Vector, we should take care to avoid boxing where possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this is a very interesting point.
Since the source side is structured very similarly, I tried to apply your suggestion to the FramingConfig
and ParserConfig
.
It fails to compile since dyn_clone::clone_trait_object
needs to specify the associated type. Cloning a decoder/encoder is a rather common operation, it happens for each message or stream to reset the codec state.
I'm not sure if it's possible to circumvent this without having a Box
on the inner type, but I'd be happy for any suggestions.
|
||
Users should be able to set `encoding` and `framing` options on sinks, analogously to the `decoding` and `framing` options on sources. These options uniformly control how the event _payload_ is encoded. This distinction is important, as encoding for the sink specific _protocol_ and the event _payload_ are separate concerns. The payload should still be encoded according to the sink's protocol, and the sink should provide additional options if there are multiple protocols to choose from, e.g. under a `protocol` key. | ||
|
||
The fields containing event transformations in sinks on the current `encoding` options (`schema`, `only_fields`, `except_fields`, `timestamp_format`) should be moved to a dedicated option, e.g. `schema` or `transform`. Thus, this would introduce a breaking change in configurations. However, migration would be relatively straight-forward by nesting these options under the new key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I would suggest we continue to support the current config interface for now and focus on organizing the internal implementation for now. Then once both this work and the schema work is further along we can tackle changing the external API with a bit more confidence.
Co-authored-by: Toby Lawrence <tobz@users.noreply.github.com>
Co-authored-by: Toby Lawrence <tobz@users.noreply.github.com>
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
423bdec
to
1f7c05c
Compare
Co-authored-by: Luke Steensen <luke.steensen@gmail.com>
…` which calls `apply_rules` Signed-off-by: Pablo Sichert <mail@pablosichert.com>
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
…n of attack Signed-off-by: Pablo Sichert <mail@pablosichert.com>
Thanks to @fuchsnj for noting that all sinks using vector/src/sinks/util/encoding/mod.rs Lines 245 to 257 in 8ce0e67
|
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
Co-authored-by: Stephen Wakely <fungus.humungus@gmail.com>
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
… be replaced by the schema work Signed-off-by: Pablo Sichert <mail@pablosichert.com>
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
} | ||
``` | ||
|
||
`Encoder` implements `tokio_util::codec::Encoder<Event>`. Internally, events first go through the `Serializer` which implements `tokio_util::codec::Encoder<Event>` and are then handed over to the `Framer` which implements `tokio_util::codec::Encoder<Bytes>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you imagine this will work in the same way for sinks that don't use one of the pieces (e.g. framing)?
It seems like sinks that entirely do their own serialization and framing (e.g. datadog_logs) would have an easy enough time opting out entirely, but would something like kafka have to pay some performance/complexity overhead of dealing with the possibility of a framer even though that's built into the protocol?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I see your point. The Kafka sink would still be able to add a SerializerConfig
only, to build the Serializer
without wrapping it in an Encoder
.
Otherwise could be worth to do some micro benchmarks for how much of an impact going through one no-op framing operation has. Might not be significant, as the cost only occurs once per frame.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified in d7c9cbf.
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
…e operations Signed-off-by: Pablo Sichert <mail@pablosichert.com>
Modified the RFC to reflect the changes to the |
Closes #8621.
Rendered version 👀.