Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding interpolating accessors #456

Merged
merged 2 commits into from
Jul 1, 2022
Merged

Adding interpolating accessors #456

merged 2 commits into from
Jul 1, 2022

Conversation

WireBaron
Copy link
Contributor

This change adds the following interpolating accessors:

  • interpolated_duration_in to state_agg
  • interpolated_average to time_weight
  • interpolated_delta and interpolated_rate to counter_agg and gauge_agg

These accessors all take an interval lower bound and interval duration along with the previous and next matching aggregate and compute the result using the computed boundary point. These accessors can be used in a postgres window function to give correct results for data that's been grouped into separate time intervals, such as with time_bucket.

Fixes #440

@@ -219,7 +219,7 @@ impl TimeWeightMethod {
Ok(pt)
}

fn weighted_sum(&self, first: TSPoint, second: TSPoint) -> f64 {
pub fn weighted_sum(&self, first: TSPoint, second: TSPoint) -> f64 {
debug_assert!(second.ts > first.ts);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why reserve this for debug builds? Now that it's public, can we reconsider?

Also, would an error message help? Should users see this from bad input, or only if we have a bug?

extension/src/counter_agg.rs Outdated Show resolved Hide resolved
};
time_weighted_average::TimeWeightMethod::Linear
.interpolate(first, Some(self.first), interval_start)
.expect("unable to interpolate lower bound")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interpolate returns for two different reasons; would it be helpful for us to show the user that distinction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this upper vs. lower bound? We do have a separate panic message for the cases.

let states_len = states.len() as u64;
let durations_len = durations.len() as u64;
let mut first_state = durations.len();
let mut last_state = durations.len();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These read a little confusing to me. When durations.is_empty() we're setting them both to 0 via len() but it might be clearer if that were explicit.

Actually, I'm pretty sure I see two different factories here: one requiring a non-empty list and first and last Records, and one Default impl with all fields 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this was more about initializing them to values that would fail the assertion at line 77 if we didn't overwrite them inside the loop. However, you are right that this becomes easier to follow if we handle the empty case separately at the top of this function. The existing behavior allows for calling new with empty arguments, so I don't want to move that into a Default as part of this change.

extension/src/state_aggregate.rs Outdated Show resolved Hide resolved
extension/src/state_aggregate.rs Outdated Show resolved Hide resolved
extension/src/state_aggregate.rs Outdated Show resolved Hide resolved
extension/src/state_aggregate.rs Outdated Show resolved Hide resolved
extension/src/state_aggregate.rs Outdated Show resolved Hide resolved
@@ -71,6 +110,85 @@ pub mod toolkit_experimental {
let end = record.state_end as usize;
&self.states_as_str()[beg..end]
}

#[allow(clippy::unnecessary_unwrap)] // following clippy would make this much more convoluted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I find this one very hard to follow. I think clippy is only picking up on a symptom of the larger problem. I'm still trying to wrap my head around it, so I certainly don't have a suggest yet :)

One clue I have: do prev and next represent optional parameters coming in from SQL? Meaning that the only legal inputs are these?

  • (None, None)
  • (Some(prev), None)
  • (Some(prev), Some(next))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly correct, (None, Some(next)) is also valid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think I understand this method now, and have two suggestions:

  • My bigger problem is this is very big and complex. I think it would help quite a bit if you split out private interpolate_first and interpolate_last methods to be called here.

  • Addressing clippy's complaint is a simple change; instead of:

            let first = if interval_start < self.first_time && prev.is_some() {
                let prev = prev.unwrap();

do:

            match prev {
                Some(prev) if interval_start < self.first_time => {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed the clippy complaint as you mentioned, great suggestion.

For the interpolate_first/last I found that the code I was trying to refactor out wasn't well encapsulated due to the need to potentially add to the states string that was being built up for the result. I left it as it is for now, since creating a interpolate_start_and_update_state_string doesn't seem much of an improvement, but it does seem like this could use a refactor.

states_len: 0,
states: states.into_bytes().into(),
durations_len: 0,
durations: (&*durations).into(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look pretty weird, given that states and durations are both empty here. Why not Slice::Slice(&[])?

At that point, you're looking at a big pile of default values, which is why I suggested impl Default.

This doesn't need to hold up merge. But if you want, I can push a branch demonstrating what it would look like.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, that does look much nicer with the Slice::Slice(&[])

For the default, are you suggesting that we move this instantiation code here into a default impl and just return default here. Or you suggesting that we implement a default and don't allow new to be called with empty containers? I assumed it's the second one, which does make the calling code a bit more complicated, but given that we only use this in one place that's probably not a big deal.

Copy link
Contributor

@epgts epgts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

first_time: first.map_or(0, |s| s.time),
last_time: last.map_or(0, |s| s.time),
first_state: first_state as u32,
last_state: last_state as u32,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get that, but this is giving me the willies over far too many overflow bugs that look like this. This looks a whole lot like all the qmail vulnerabilities that came from its 32-bit assumptions no longer holding after the rise of amd64 (and therefore mass market 64-bit).

Can we u32::try_from().expect("len must fit into u32") in at least some boundary places? It may be out of scope for this branch as I don't think this is a new pattern for us. Should I file an issue about it?

@@ -71,6 +110,85 @@ pub mod toolkit_experimental {
let end = record.state_end as usize;
&self.states_as_str()[beg..end]
}

#[allow(clippy::unnecessary_unwrap)] // following clippy would make this much more convoluted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think I understand this method now, and have two suggestions:

  • My bigger problem is this is very big and complex. I think it would help quite a bit if you split out private interpolate_first and interpolate_last methods to be called here.

  • Addressing clippy's complaint is a simple change; instead of:

            let first = if interval_start < self.first_time && prev.is_some() {
                let prev = prev.unwrap();

do:

            match prev {
                Some(prev) if interval_start < self.first_time => {

extension/src/state_aggregate.rs Outdated Show resolved Hide resolved
next: Option<StateAgg>,
) -> crate::raw::Interval {
match aggregate {
None => todo!("either 0 or interval if matches prev last state"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we planning to address this as part of this work, or at a later date? It might help to keep an issue open for it and reference that issue here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually an interesting question. We can figure out how to handle a null aggregate in this case, just implement the todo above. However, this null aggregate is going to be the prev for the next group, and was the next for the previous group for that matter. In both of those cases, we don't actually have the information we need to correctly handle an empty group appearing in bucketing, and will return an incorrect answer.

So, I think I want to deal with this by generating an error in this case here, as it's the only place we can identify that we're likely going to generate the wrong answer for some of our buckets.

@WireBaron
Copy link
Contributor Author

bors r+

@bors
Copy link
Contributor

bors bot commented Jul 1, 2022

@bors bors bot merged commit 6d2b748 into main Jul 1, 2022
@bors bors bot deleted the window_interpolate branch July 1, 2022 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Build support for using window functions to avoid gaps caused by postgres grouping
2 participants