Adding interpolating accessors #456

WireBaron · 2022-06-24T16:38:34Z

This change adds the following interpolating accessors:

interpolated_duration_in to state_agg
interpolated_average to time_weight
interpolated_delta and interpolated_rate to counter_agg and gauge_agg

These accessors all take an interval's lower bound and duration, along with the previous and next matching aggregates, and compute the result using the computed boundary points. They can be used in a PostgreSQL window function to give correct results for data that's been grouped into separate time intervals, such as with time_bucket.
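As a rough illustration of what "computing a boundary point" can mean in the linear case, here is a minimal sketch. It assumes a stand-in `Point` type and a hypothetical `interpolate_boundary` helper; neither is taken from this PR, and state_agg in particular may use a different rule.

```rust
/// Stand-in for a timestamped value; not the toolkit's actual type.
#[derive(Clone, Copy, Debug)]
struct Point {
    ts: i64,
    val: f64,
}

/// Hypothetical helper: linearly interpolates a value at `boundary`,
/// using the last point of the previous bucket and the first point of
/// the current bucket.
fn interpolate_boundary(prev_last: Point, first: Point, boundary: i64) -> Point {
    assert!(
        prev_last.ts < first.ts && prev_last.ts <= boundary && boundary <= first.ts,
        "boundary must lie between the two points"
    );
    // Fraction of the way from prev_last to first at which the boundary sits.
    let frac = (boundary - prev_last.ts) as f64 / (first.ts - prev_last.ts) as f64;
    Point {
        ts: boundary,
        val: prev_last.val + frac * (first.val - prev_last.val),
    }
}
```

With the boundary point in hand, an accessor can treat it as an extra first (or last) point of the bucket before computing its result.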

Fixes #440

epgts · 2022-06-28T23:45:57Z

crates/time-weighted-average/src/lib.rs

@@ -219,7 +219,7 @@ impl TimeWeightMethod {
        Ok(pt)
    }

-    fn weighted_sum(&self, first: TSPoint, second: TSPoint) -> f64 {
+    pub fn weighted_sum(&self, first: TSPoint, second: TSPoint) -> f64 {
        debug_assert!(second.ts > first.ts);


Why reserve this for debug builds? Now that it's public, can we reconsider?

Also, would an error message help? Should users see this from bad input, or only if we have a bug?
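For reference, a minimal sketch of what an always-on check with a message could look like. The `TSPoint` below is a stand-in with assumed field names, and `linear_weighted_sum` is a hypothetical free function using the trapezoid rule, not the crate's actual method.

```rust
/// Stand-in for the crate's `TSPoint`; field names are assumed.
#[derive(Clone, Copy, Debug)]
struct TSPoint {
    ts: i64,
    val: f64,
}

/// Area under the straight line between two points (trapezoid rule).
/// `assert!` runs in release builds too, unlike `debug_assert!`.
fn linear_weighted_sum(first: TSPoint, second: TSPoint) -> f64 {
    assert!(
        second.ts > first.ts,
        "points out of order: first.ts = {}, second.ts = {}",
        first.ts,
        second.ts
    );
    let duration = (second.ts - first.ts) as f64;
    (first.val + second.val) / 2.0 * duration
}
```

If bad input from users can reach this path, returning a `Result` would probably be friendlier than panicking; the assertion form fits best when a failure can only mean a bug on our side.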

epgts · 2022-06-28T23:58:18Z

extension/src/counter_agg.rs

+            };
+            time_weighted_average::TimeWeightMethod::Linear
+                .interpolate(first, Some(self.first), interval_start)
+                .expect("unable to interpolate lower bound")


interpolate returns for two different reasons; would it be helpful for us to show the user that distinction?

Is this upper vs. lower bound? We do have a separate panic message for the cases.

epgts · 2022-06-29T05:35:06Z

extension/src/state_aggregate.rs

            let states_len = states.len() as u64;
            let durations_len = durations.len() as u64;
+            let mut first_state = durations.len();
+            let mut last_state = durations.len();


These read as a little confusing to me. When durations.is_empty() we're setting them both to 0 via len(), but it might be clearer if that were explicit.

Actually, I'm pretty sure I see two different factories here: one requiring a non-empty list and first and last Records, and one Default impl with all fields 0.

Hmm, this was more about initializing them to values that would fail the assertion at line 77 if we didn't overwrite them inside the loop. However, you are right that this becomes easier to follow if we handle the empty case separately at the top of this function. The existing behavior allows for calling new with empty arguments, so I don't want to move that into a Default as part of this change.

epgts · 2022-06-29T14:48:51Z

extension/src/state_aggregate.rs

@@ -71,6 +110,85 @@ pub mod toolkit_experimental {
            let end = record.state_end as usize;
            &self.states_as_str()[beg..end]
        }
+
+        #[allow(clippy::unnecessary_unwrap)] // following clippy would make this much more convoluted 


Yeah, I find this one very hard to follow. I think clippy is only picking up on a symptom of the larger problem. I'm still trying to wrap my head around it, so I certainly don't have a suggestion yet :)

One clue I have: do prev and next represent optional parameters coming in from SQL? Meaning that the only legal inputs are these?

(None, None)

(Some(prev), None)

(Some(prev), Some(next))

This is mostly correct, (None, Some(next)) is also valid.

OK, I think I understand this method now, and have two suggestions:

My bigger problem is this is very big and complex. I think it would help quite a bit if you split out private interpolate_first and interpolate_last methods to be called here.

Addressing clippy's complaint is a simple change; instead of:

    let first = if interval_start < self.first_time && prev.is_some() {
        let prev = prev.unwrap();

do:

    match prev {
        Some(prev) if interval_start < self.first_time => {

I fixed the clippy complaint as you mentioned, great suggestion.

For the interpolate_first/last I found that the code I was trying to refactor out wasn't well encapsulated, due to the need to potentially add to the states string that was being built up for the result. I left it as it is for now, since creating an interpolate_start_and_update_state_string doesn't seem like much of an improvement, but it does seem like this could use a refactor.
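A self-contained sketch of the match-guard pattern discussed in this thread. The function name, parameters, and string states are illustrative stand-ins rather than the PR's actual code.

```rust
/// Chooses where the interpolated window starts and which state applies there.
/// `prev_last_state` stands in for the last state of the previous aggregate.
fn starting_state<'a>(
    interval_start: i64,
    first_time: i64,
    prev_last_state: Option<&'a str>,
    own_first_state: &'a str,
) -> (i64, &'a str) {
    match prev_last_state {
        // The guard replaces `cond && prev.is_some()` followed by `prev.unwrap()`.
        Some(prev_state) if interval_start < first_time => (interval_start, prev_state),
        // No previous aggregate, or the interval does not begin before our data.
        _ => (first_time, own_first_state),
    }
}
```

Binding the value in the pattern keeps the `Option` check and the use of its contents in one place, so there is no `unwrap` left for clippy to flag.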

epgts · 2022-06-30T16:40:47Z

extension/src/state_aggregate.rs

+                        states_len: 0,
+                        states: states.into_bytes().into(),
+                        durations_len: 0,
+                        durations: (&*durations).into(),


These look pretty weird, given that states and durations are both empty here. Why not Slice::Slice(&[])?

At that point, you're looking at a big pile of default values, which is why I suggested impl Default.

This doesn't need to hold up merge. But if you want, I can push a branch demonstrating what it would look like.

You're right, that does look much nicer with the Slice::Slice(&[])

For the default, are you suggesting that we move this instantiation code into a Default impl and just return default here? Or are you suggesting that we implement Default and don't allow new to be called with empty containers? I assumed it's the second one, which does make the calling code a bit more complicated, but given that we only use this in one place that's probably not a big deal.

epgts

Thanks!

epgts · 2022-06-30T16:43:44Z

extension/src/state_aggregate.rs

+                    first_time: first.map_or(0, |s| s.time),
+                    last_time: last.map_or(0, |s| s.time),
+                    first_state: first_state as u32,
+                    last_state: last_state as u32,


I get that, but this is giving me the willies over far too many overflow bugs that look like this. This looks a whole lot like all the qmail vulnerabilities that came from its 32-bit assumptions no longer holding after the rise of amd64 (and therefore mass market 64-bit).

Can we u32::try_from().expect("len must fit into u32") in at least some boundary places? It may be out of scope for this branch as I don't think this is a new pattern for us. Should I file an issue about it?
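To make the difference concrete, a small sketch contrasting the checked conversion with an `as` cast. The helper names are hypothetical; only `u32::try_from` and `expect` come from the comment above.

```rust
use std::convert::TryFrom;

/// Converts a length to `u32`, panicking loudly instead of silently truncating.
fn checked_len(len: usize) -> u32 {
    u32::try_from(len).expect("len must fit into u32")
}

/// What `as` does with an out-of-range value: it wraps, with no error.
fn truncating_len(len: u64) -> u32 {
    len as u32
}
```

For example, `truncating_len(u64::from(u32::MAX) + 1)` quietly yields 0, while `u32::try_from` on the same value returns an `Err` that `expect` turns into a panic with the given message.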

epgts · 2022-06-30T16:51:07Z

extension/src/state_aggregate.rs

+    next: Option<StateAgg>,
+) -> crate::raw::Interval {
+    match aggregate {
+        None => todo!("either 0 or interval if matches prev last state"),


Are we planning to address this as part of this work, or at a later date? It might help to keep an issue open for it and reference that issue here.

This is actually an interesting question. We could figure out how to handle a null aggregate in this case and just implement the todo above. However, this null aggregate is going to be the prev for the next group, and was the next for the previous group for that matter. In both of those cases, we don't actually have the information we need to correctly handle an empty group appearing in bucketing, and will return an incorrect answer.

So, I think I want to deal with this by generating an error in this case here, as it's the only place we can identify that we're likely going to generate the wrong answer for some of our buckets.
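A minimal sketch of the "generate an error" option, assuming a hypothetical error enum and a plain `i64` standing in for the aggregate's duration. How the error would actually be surfaced from the extension is not shown here.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum InterpolateError {
    /// The bucket being evaluated contained no aggregate at all.
    EmptyBucket,
}

/// Returns the duration for a bucket, or an error when the bucket is empty,
/// instead of guessing a value that neighboring buckets could not reproduce.
fn duration_or_error(aggregate: Option<i64>) -> Result<i64, InterpolateError> {
    match aggregate {
        None => Err(InterpolateError::EmptyBucket),
        Some(duration) => Ok(duration),
    }
}
```

Failing here keeps the wrong answer from silently propagating into the adjacent buckets, which is the concern raised above.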

WireBaron · 2022-07-01T18:25:06Z

bors r+

bors · 2022-07-01T18:40:55Z

Build succeeded:

WireBaron requested review from rtwalker, JLockerman, thatzopoulos, davidkohn88 and epgts June 24, 2022 16:38

epgts reviewed Jun 29, 2022

epgts reviewed Jun 30, 2022

epgts approved these changes Jul 1, 2022

Brian Rowe added 2 commits July 1, 2022 11:06

adding first and last points to state_agg

147b397

adding interpolating functionality and accessors to aggregates

aad4ecf

WireBaron force-pushed the window_interpolate branch from 03ae67b to aad4ecf Compare July 1, 2022 18:07

bors bot merged commit 6d2b748 into main Jul 1, 2022

bors bot deleted the window_interpolate branch July 1, 2022 18:40

davidkohn88 mentioned this pull request Sep 27, 2022

[Bug]: <LOCF has not correct behaviour> #544

Closed
