Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhancement(remap): Implement parse_aws_cloudtrail_logs remap function #7488

Closed

Conversation

pablosichert
Copy link
Contributor

Closes #4908.

Signed-off-by: Pablo Sichert <mail@pablosichert.com>
@pablosichert pablosichert force-pushed the pablosichert/parse-aws-cloudtrail-logs-vrl-function branch from fa89fab to c46d9ee Compare May 17, 2021 20:51
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
Comment on lines 6 to 660
statistics.baseline_duration.into(),
);

value.into()
}
}

#[derive(Clone, Copy, Debug, Serialize, Deserialize)]
#[serde(rename_all(deserialize = "camelCase"))]
struct AwsCloudTrailLogsRecordInsightDetailsInsightContextStatisticsValue {
average: f64,
}

impl From<AwsCloudTrailLogsRecordInsightDetailsInsightContextStatisticsValue> for Value {
fn from(
statistics: AwsCloudTrailLogsRecordInsightDetailsInsightContextStatisticsValue,
) -> Self {
let mut value = BTreeMap::<String, Value>::new();

value.insert("average".to_owned(), statistics.average.into());

value.into()
}
}

#[derive(Clone, Debug, Serialize, Deserialize)]
#[serde(rename_all(deserialize = "camelCase"))]
struct AwsCloudTrailLogsRecordInsightDetailsInsightContextAttributions {
attribute: AwsCloudTrailLogsRecordInsightDetailsInsightContextAttributionsAttribute,
insight: Vec<AwsCloudTrailLogsRecordInsightDetailsInsightContextAttributionsInsight>,
baseline: Vec<AwsCloudTrailLogsRecordInsightDetailsInsightContextAttributionsBaseline>,
}

impl From<AwsCloudTrailLogsRecordInsightDetailsInsightContextAttributions> for Value {
fn from(attributions: AwsCloudTrailLogsRecordInsightDetailsInsightContextAttributions) -> Self {
let mut value = BTreeMap::<String, Value>::new();

value.insert("attribute".to_owned(), attributions.attribute.into());
value.insert("insight".to_owned(), attributions.insight.into());
value.insert("baseline".to_owned(), attributions.baseline.into());

value.into()
}
}

#[derive(Clone, Copy, Debug, Serialize, Deserialize)]
#[serde(rename_all(deserialize = "camelCase"))]
enum AwsCloudTrailLogsRecordInsightDetailsInsightContextAttributionsAttribute {
UserIdentityArn,
UserAgent,
ErrorCode,
}

impl From<AwsCloudTrailLogsRecordInsightDetailsInsightContextAttributionsAttribute> for Value {
fn from(
attribute: AwsCloudTrailLogsRecordInsightDetailsInsightContextAttributionsAttribute,
) -> Self {
format!("{:?}", attribute).into()
}
}

#[derive(Clone, Debug, Serialize, Deserialize)]
#[serde(rename_all(deserialize = "camelCase"))]
struct AwsCloudTrailLogsRecordInsightDetailsInsightContextAttributionsInsight {
value: String,
average: f64,
}

impl From<AwsCloudTrailLogsRecordInsightDetailsInsightContextAttributionsInsight> for Value {
fn from(
insight: AwsCloudTrailLogsRecordInsightDetailsInsightContextAttributionsInsight,
) -> Self {
let mut value = BTreeMap::<String, Value>::new();

value.insert("value".to_owned(), insight.value.into());
value.insert("average".to_owned(), insight.average.into());

value.into()
}
}

#[derive(Clone, Debug, Serialize, Deserialize)]
#[serde(rename_all(deserialize = "camelCase"))]
struct AwsCloudTrailLogsRecordInsightDetailsInsightContextAttributionsBaseline {
value: String,
average: f64,
}

impl From<AwsCloudTrailLogsRecordInsightDetailsInsightContextAttributionsBaseline> for Value {
fn from(
baseline: AwsCloudTrailLogsRecordInsightDetailsInsightContextAttributionsBaseline,
) -> Self {
let mut value = BTreeMap::<String, Value>::new();

value.insert("value".to_owned(), baseline.value.into());
value.insert("average".to_owned(), baseline.average.into());

value.into()
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of "dumb" conversion code here to produce a VRL Value.

The alternative would be to

  1. deserialize to AwsCloudTrailLogs
  2. serialize to JSON
  3. deserialze VRL Value

But those string operations are certainly more costly than the manual conversion code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...continuing #7488 (comment):

The boilerplate isn't there to deserialize, but to produce a vrl Value, since (to my knowledge) there exist is no direct conversion between the deserialized AwsCloudTrailLogs to Value without going through JSON serialization.

Alternatively, I could first deserialize to serde_json::Value and then "hand-pick" and rename the desired fields when moving to vrl Value. However, having those type-safe structs seemed to be the better alternative to me.

If you have another approach that I can't see right now, I'd be happy to omit the manual code 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, I could first deserialize to serde_json::Value and then "hand-pick" and rename the desired fields when moving to vrl Value. However, having those type-safe structs seemed to be the better alternative to me.

Yeah, this is what I was thinking, that you could just deserialize to serde_json::Value and then convert that directly to a vrl::Value via the From implementation you added here. I think the type-safe deserializing is useful if we were going to be modifying the data before converting it to a vrl::Value, but I'm not seeing us do that anywhere so I'm not sure what it is buying us. I could definitely be missing something though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, we do the same in parse_aws_cloudwatch_log_subscription_message:

https://github.com/timberio/vector/blob/1e54aca6970a3f141749bd56d8cd8c98ba6e3fab/lib/vrl/stdlib/src/parse_aws_cloudwatch_log_subscription_message.rs#L75-L89

In that case the destructuring/mapping to Value is more trivial (no optional values, only one nested struct), so we don't need a From implementation there.

What I like about the structs though is that you have a declarative description of what the format is, instead of needing to read the transformation code to understand what it ends up like.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 true. I'm ok if you want to leave it; it just seemed like it wasn't strictly needed. We may want to do additional transformations in the future anyway.

Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This generally looks good to me. I am wondering if we need all of the boilerplate deserialization code though, since we aren't doing any additional transformations of it that I can see. Am I missing something, or would what you have here be roughly similar to deserializing to serde_json::Value and then just using the From<serde_json::Value> for Value you added?

lib/vrl/compiler/src/value/convert.rs Outdated Show resolved Hide resolved
lib/vrl/stdlib/src/parse_aws_cloudtrail_logs.rs Outdated Show resolved Hide resolved
lib/vrl/stdlib/src/parse_aws_cloudtrail_logs.rs Outdated Show resolved Hide resolved
@pablosichert
Copy link
Contributor Author

pablosichert commented May 19, 2021

I am wondering if we need all of the boilerplate deserialization code though, since we aren't doing any additional transformations of it that I can see. Am I missing something, or would what you have here be roughly similar to deserializing to serde_json::Value and then just using the From<serde_json::Value> for Value you added?

I left a comment regarding that here: #7488 (comment), I probably posted while you were already doing the review 😄

Going to add further comments there to keep the discussion at one place.

Signed-off-by: Pablo Sichert <mail@pablosichert.com>
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
… string

Signed-off-by: Pablo Sichert <mail@pablosichert.com>
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
Signed-off-by: Pablo Sichert <mail@pablosichert.com>
…aws-cloudtrail-logs-vrl-function

Signed-off-by: Pablo Sichert <mail@pablosichert.com>
@pablosichert pablosichert marked this pull request as ready for review May 26, 2021 21:25
@pablosichert pablosichert requested review from a team and StephenWakely and removed request for a team May 26, 2021 21:25

#[derive(Clone, Debug, Deserialize)]
#[serde(rename_all(deserialize = "camelCase"))]
struct AwsCloudTrailLogsRecordUserIdentitySessionContextWebIdFederationData {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think since this struct is private it would be ok to drop the AwsCloudTrailLogs prefix. Although maybe there is a prize for the longest struct name in the Rust ecosystem, so I understand if you want to keep it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well... 😆 The only thing I can say here is that I'd love nested struct declarations in Rust to keep the namespace clean!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting Enterprise Java(TM) flashbacks

@StephenWakely
Copy link
Contributor

If the log message is just json, what is this function providing that couldn't be done with the existing parse_json function?

One thing I can think of is that this will error if the input is not a strict cloudtrail format and we have some decent typedefs coming out of the function, but is that enough of an advantage to provide a new function that we will then need to maintain?

Or is that actually a disadvantage, I'm wondering about the 'eventVersion' field. Does that imply that we could potentially need to handle different versions of log messages and are the structs we have here flexible enough to handle all of them?

@pablosichert
Copy link
Contributor Author

If the log message is just json, what is this function providing that couldn't be done with the existing parse_json function?

One thing I can think of is that this will error if the input is not a strict cloudtrail format and we have some decent typedefs coming out of the function, but is that enough of an advantage to provide a new function that we will then need to maintain?

@StephenWakely that's a good question. The parse_json would work, but as you mentioned the validation and type definitions could be benefitial.

Or is that actually a disadvantage, I'm wondering about the 'eventVersion' field. Does that imply that we could potentially need to handle different versions of log messages and are the structs we have here flexible enough to handle all of them?

I've optimistically disregarded the version, such that newer fields are optional. I don't see much value in having separate structs by version, as we don't have any means to transfer this information into remap.

I share your scepsis of the net value of this function, and looking back at the use case described in the issue make it seem like there was more a problem of malformed JSON than anything else.

@pablosichert
Copy link
Contributor Author

Closed, further context: #4908 (comment).

@pablosichert pablosichert deleted the pablosichert/parse-aws-cloudtrail-logs-vrl-function branch June 16, 2021 14:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handle parsing of CloudTrail logs from S3
4 participants