Reapproach the log event data model and nesting strategy #704
Comments
How about approaching it the other way around? Treat events as structured and provide a
@MarkusH yes, it's starting to seem like we'd be better off directly modeling nested data. We initially avoided this to keep things simpler, but it's causing significant complexity around the edges.
@lukesteensen assigning this to you for now to create the spec.
Here is an initial proposal which considers using actually nested events in the internal representation.

### Current approach

Currently we have two internal data types,

### Proposed changes in Rust types

In order to avoid the need to do flattening/unflattening, we can extend:

```rust
// stays the same
pub struct LogEvent {
    fields: HashMap<Atom, Value>,
}

// stays the same
pub struct Value {
    value: ValueKind,
    explicit: bool,
}

// two new variants are added
pub enum ValueKind {
    Bytes(Bytes),
    Integer(i64),
    Float(f64),
    Boolean(bool),
    Timestamp(DateTime<Utc>),
    // nested object
    Object(HashMap<Atom, ValueKind>),
    // nested array
    Array(Vec<ValueKind>),
}
```

Note that nested objects and arrays contain values of `ValueKind`.

### Proposed changes in ProtoBuf definitions

The corresponding ProtoBuf definitions to match the Rust definitions from the previous section are the following:

```proto
// stays the same
message Log {
  map<string, Value> fields = 1;
}

message ValueArray {
  // note: packed encoding applies only to scalar numeric types in
  // proto3, so a repeated message field like this one is never packed
  repeated ValueKind array = 1;
}

message ValueMap {
  map<string, ValueKind> fields = 1;
}

message ValueKind {
  oneof kind {
    bytes raw_bytes = 1;
    google.protobuf.Timestamp timestamp = 2;
    // field number 3 is skipped so the numbering stays aligned
    // with Value, where 3 is taken by `explicit`
    int64 integer = 4;
    double float = 5;
    bool boolean = 6;
    ValueArray array = 7;
    ValueMap map = 8;
  }
}

message Value {
  // just placing ValueKind here might be better, but it would break
  // wire compatibility with the current version of the protocol
  oneof kind {
    bytes raw_bytes = 1;
    google.protobuf.Timestamp timestamp = 2;
    int64 integer = 4;
    double float = 5;
    bool boolean = 6;
    ValueArray array = 7;
    ValueMap map = 8;
  }
  bool explicit = 3;
}
```

The schema above keeps backward compatibility between the new and current versions of the protocol as long as an event doesn't contain nested arrays or maps. If we want 100% compatibility, we would need to do flattening/unflattening when serializing/deserializing messages to ProtoBuf.

### Proposed changes in configuration

There should be no changes in configuration, as our configs already support nested fields. We use "." as a separator for nested fields in config now, and that should continue working.

### Proposed changes in transforms

Right now, the Lua transform gets flat objects with keys looking like
Nice! This looks like a solid start. There are a couple of other ideas I've had that might be worth exploring as well.

First, I think we can totally drop the concept of

Second, I was wondering if it'd be worth adding a string type based on the

Finally, as we discussed in Slack, it may be worth re-evaluating our use of Protocol Buffers here. It adds a decent amount of complexity and we don't get a lot of benefit from it yet. I do think we should focus on the Rust side of things for now, and then potentially look into this as a followup.
I fully agree with this.
Do you mean to have two field types,
I support the idea of keeping re-evaluation of the encoding separate from the current issue.
Yep, exactly. Currently, we use bytes for everything, even when we know from the source that the data is valid UTF-8.
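A hedged sketch of what a dedicated UTF-8 variant alongside raw bytes could look like; the `Text` variant name and the `from_source_line` helper are hypothetical, not existing code:

```rust
// Hypothetical shape of the string-type idea: sources that have
// already validated their input can record that fact in the type.
#[derive(Debug, PartialEq)]
pub enum ValueKind {
    Bytes(Vec<u8>), // arbitrary, possibly non-UTF-8 data
    Text(String),   // guaranteed valid UTF-8 (hypothetical variant)
}

// Illustrative conversion: keep the Bytes fallback, but upgrade to
// Text whenever the data turns out to be valid UTF-8.
pub fn from_source_line(line: &[u8]) -> ValueKind {
    match std::str::from_utf8(line) {
        Ok(s) => ValueKind::Text(s.to_string()),
        Err(_) => ValueKind::Bytes(line.to_vec()),
    }
}

fn main() {
    assert_eq!(from_source_line(b"hello"), ValueKind::Text("hello".to_string()));
    assert_eq!(from_source_line(&[0xff, 0xfe]), ValueKind::Bytes(vec![0xff, 0xfe]));
}
```

Downstream consumers could then match on `Text` and skip re-validation entirely.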
This is relevant: #1532
See #678 (comment)
We need to rethink how we're handling nested log event data. The above-referenced PR introduces very complicated "unflatten" code that was concerning to the rest of the team. While it works, it is a red flag that we might be better off reapproaching the internal log event data model to store data in a nested fashion, avoiding the need to unflatten data entirely.
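For context, a toy illustration of the kind of unflattening at issue; this is not the project's code, and the real version must handle arbitrary depth, arrays, and literal dots in field names, which is where the complexity comes from:

```rust
use std::collections::HashMap;

// Toy unflattening for a single level of nesting: dotted keys such
// as "http.status" are rebuilt into {"http": {"status": ...}}.
// Keys without a dot land under the "" parent here for simplicity.
pub fn unflatten(flat: HashMap<String, i64>) -> HashMap<String, HashMap<String, i64>> {
    let mut nested: HashMap<String, HashMap<String, i64>> = HashMap::new();
    for (key, value) in flat {
        match key.split_once('.') {
            Some((parent, child)) => {
                nested
                    .entry(parent.to_string())
                    .or_default()
                    .insert(child.to_string(), value);
            }
            None => {
                nested
                    .entry(String::new())
                    .or_default()
                    .insert(key.clone(), value);
            }
        }
    }
    nested
}

fn main() {
    let mut flat = HashMap::new();
    flat.insert("http.status".to_string(), 200);
    flat.insert("http.bytes".to_string(), 1024);
    let nested = unflatten(flat);
    assert_eq!(nested["http"]["status"], 200);
    assert_eq!(nested["http"]["bytes"], 1024);
}
```

Storing events nested from the start makes this reconstruction step, and its edge cases, unnecessary.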