Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(codecs): Add RFC for "Framing and Codecs - Sinks" #9864

Merged
merged 18 commits into from
Jan 11, 2022

Conversation

pablosichert
Copy link
Contributor

@pablosichert pablosichert commented Nov 2, 2021

Signed-off-by: Pablo Sichert <mail@pablosichert.com>
@netlify
Copy link

netlify bot commented Nov 2, 2021

✔️ Deploy Preview for vector-project canceled.

🔨 Explore the source changes: f820c10

🔍 Inspect the deploy log: https://app.netlify.com/sites/vector-project/deploys/61dd84ebb4122d0007f09cb2

@bits-bot
Copy link

bits-bot commented Nov 3, 2021

CLA assistant check
All committers have signed the CLA.

@tobz tobz added the RFC label Nov 4, 2021
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
@pablosichert pablosichert marked this pull request as ready for review November 5, 2021 14:02
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
rfcs/2021-10-29-8621-framing-and-codecs-sinks.md Outdated Show resolved Hide resolved
rfcs/2021-10-29-8621-framing-and-codecs-sinks.md Outdated Show resolved Hide resolved

Users should be able to set `encoding` and `framing` options on sinks, analogously to the `decoding` and `framing` options on sources. These options uniformly control how the event _payload_ is encoded. This distinction is important, as encoding for the sink specific _protocol_ and the event _payload_ are separate concerns. The payload should still be encoded according to the sink's protocol, and the sink should provide additional options if there are multiple protocols to choose from, e.g. under a `protocol` key.

The fields containing event transformations in sinks on the current `encoding` options (`schema`, `only_fields`, `except_fields`, `timestamp_format`) should be moved to a dedicated option, e.g. `schema` or `transform`. Thus, this would introduce a breaking change in configurations. However, migration would be relatively straight-forward by nesting these options under the new key.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one question that I think needs to be answered here is: will the schema work land before this happens? If it doesn't, then we probably need a way handle parsing these out of the existing encoding configuration field. We could deprecate setting them via encoding at the same time, but we would have to provide a deprecation window regardless.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I would suggest we continue to support the current config interface for now and focus on organizing the internal implementation for now. Then once both this work and the schema work is further along we can tackle changing the external API with a bit more confidence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a good plan 👍 Should be able to support existing config options without carrying over too much baggage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I came up with a wrapper that is able to apply the existing transformation behavior to the new encoding options while being fully backwards compatible to the old sink-specific encoding options: #10253.

For usage, have a look at #10684.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified in 21f9c36.

Comment on lines 51 to 77
```rust
pub trait SerializerConfig: Debug + DynClone + Send + Sync {
/// Builds a serializer from this configuration.
///
/// Fails if the configuration is invalid.
fn build(&self) -> crate::Result<BoxedSerializer>;
}
```

```rust
pub trait FramingConfig: Debug + DynClone + Send + Sync {
/// Builds a framer from this configuration.
///
/// Fails if the configuration is invalid.
fn build(&self) -> crate::Result<BoxedFramer>;
}
```

These can be build to form an `Encoder`:

```rust
/// An encoder that can encode structured events to byte messages.
pub struct Encoder {
serializer: BoxedSerializer,
framer: BoxedFramer,
}
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I'd like to see here is if we can actually make SerializerConfig/FramingConfig provide their outputs generically. For example:

pub trait SerializerConfig {
    type Serializer;
    
    fn build(&self) -> crate::Result<Self::Serializer>;
}

pub trait FramingConfig {
    type Framer;
    
    fn build(&self) -> crate::Result<Self::Framer>;
}

pub struct Encoder<S, F> {
    serializer: S,
    framer: F,
}

This is sort of an implementation detail, but we should avoid boxing individual components where possible, and prefer to box at a higher level. It might even be possible to entirely avoid boxing. I'm bringing this up here on the off-chance that not using boxed inner fields would meaningfully change how we have to approach the implementation.

Since codecs touch every single event entering and exiting Vector, we should take care to avoid boxing where possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this is a very interesting point.

Since the source side is structured very similarly, I tried to apply your suggestion to the FramingConfig and ParserConfig.

It fails to compile since dyn_clone::clone_trait_object needs to specify the associated type. Cloning a decoder/encoder is a rather common operation, it happens for each message or stream to reset the codec state.

I'm not sure if it's possible to circumvent this without having a Box on the inner type, but I'd be happy for any suggestions.


Users should be able to set `encoding` and `framing` options on sinks, analogously to the `decoding` and `framing` options on sources. These options uniformly control how the event _payload_ is encoded. This distinction is important, as encoding for the sink specific _protocol_ and the event _payload_ are separate concerns. The payload should still be encoded according to the sink's protocol, and the sink should provide additional options if there are multiple protocols to choose from, e.g. under a `protocol` key.

The fields containing event transformations in sinks on the current `encoding` options (`schema`, `only_fields`, `except_fields`, `timestamp_format`) should be moved to a dedicated option, e.g. `schema` or `transform`. Thus, this would introduce a breaking change in configurations. However, migration would be relatively straight-forward by nesting these options under the new key.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I would suggest we continue to support the current config interface for now and focus on organizing the internal implementation for now. Then once both this work and the schema work is further along we can tackle changing the external API with a bit more confidence.

rfcs/2021-10-29-8621-framing-and-codecs-sinks.md Outdated Show resolved Hide resolved
rfcs/2021-10-29-8621-framing-and-codecs-sinks.md Outdated Show resolved Hide resolved
pablosichert and others added 4 commits November 12, 2021 14:20
Co-authored-by: Toby Lawrence <tobz@users.noreply.github.com>
Co-authored-by: Toby Lawrence <tobz@users.noreply.github.com>
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
@pablosichert pablosichert force-pushed the pablosichert/framing-and-codecs-sinks-rfc branch from 423bdec to 1f7c05c Compare November 16, 2021 18:45
pablosichert and others added 4 commits November 16, 2021 19:49
Co-authored-by: Luke Steensen <luke.steensen@gmail.com>
…` which calls `apply_rules`

Signed-off-by: Pablo Sichert <mail@pablosichert.com>
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
…n of attack

Signed-off-by: Pablo Sichert <mail@pablosichert.com>
@pablosichert
Copy link
Contributor Author

Thanks to @fuchsnj for noting that all sinks using EncodingConfig* reshape event with apply_rules via the generic implementation in

impl<E, T> Encoder<T> for E
where
E: EncodingConfiguration,
E::Codec: Encoder<T>,
T: VisitLogMut,
{
fn encode_input(&self, mut input: T, writer: &mut dyn io::Write) -> io::Result<usize> {
input.visit_logs_mut(|log| {
self.apply_rules(log);
});
self.codec().encode_input(input, writer)
}
}

Signed-off-by: Pablo Sichert <mail@pablosichert.com>
Co-authored-by: Stephen Wakely <fungus.humungus@gmail.com>
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
… be replaced by the schema work

Signed-off-by: Pablo Sichert <mail@pablosichert.com>
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
}
```

`Encoder` implements `tokio_util::codec::Encoder<Event>`. Internally, events first go through the `Serializer` which implements `tokio_util::codec::Encoder<Event>` and are then handed over to the `Framer` which implements `tokio_util::codec::Encoder<Bytes>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you imagine this will work in the same way for sinks that don't use one of the pieces (e.g. framing)?

It seems like sinks that entirely do their own serialization and framing (e.g. datadog_logs) would have an easy enough time opting out entirely, but would something like kafka have to pay some performance/complexity overhead of dealing with the possibility of a framer even though that's built into the protocol?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I see your point. The Kafka sink would still be able to add a SerializerConfig only, to build the Serializer without wrapping it in an Encoder.

Otherwise could be worth to do some micro benchmarks for how much of an impact going through one no-op framing operation has. Might not be significant, as the cost only occurs once per frame.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified in d7c9cbf.

Signed-off-by: Pablo Sichert <mail@pablosichert.com>
@pablosichert pablosichert enabled auto-merge (squash) January 11, 2022 09:00
…e operations

Signed-off-by: Pablo Sichert <mail@pablosichert.com>
@pablosichert
Copy link
Contributor Author

Modified the RFC to reflect the changes to the Encoder/Serializer/Framer discussed in #10108 (comment).

@pablosichert pablosichert merged commit 2f2733b into master Jan 11, 2022
@pablosichert pablosichert deleted the pablosichert/framing-and-codecs-sinks-rfc branch January 11, 2022 13:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Framing and Codecs - Sinks RFC
7 participants