Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support explicit NULL values in timevectors #429

Merged
merged 4 commits into from
Jun 3, 2022
Merged

Conversation

WireBaron
Copy link
Contributor

This change modifies the timevector structure by replacing the is_sorted field with a set of bit flags covering both whether a timevector is known to be sorted and whether it may contain NULL values. It also adds a bitvector with one bit per entry indicating whether that entry is NULL.

Fixes #425

@rtwalker
Copy link
Contributor

I think that the update tests failed because this branch is too far behind main. We've landed several changes to the update-tester crate in the meantime

@WireBaron WireBaron force-pushed the br/timevector_explicit_null branch from 12ff8dc to f23066b Compare May 25, 2022 23:07
Copy link
Contributor

@epgts epgts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me but do you want to add some test cases with null values? I may have missed them; I thought I only saw existing tests without nulls updated.

Also could use a rustfmt :)

@WireBaron
Copy link
Contributor Author

Looks good to me but do you want to add some test cases with null values? I may have missed them; I thought I only saw existing tests without nulls updated.

Also could use a rustfmt :)

I did add a NULL value to the test in the sort pipeline element since I was most concerned about that behavior. It did suss out a few issues, but I'm not sure adding another test would exercise this code any differently.

@rtwalker
Copy link
Contributor

rtwalker commented Jun 2, 2022

Looks like you need another rebase onto main now that we've upgraded pgx

@WireBaron WireBaron force-pushed the br/timevector_explicit_null branch from 17dd718 to 75ca499 Compare June 2, 2022 17:36
flags: time_vector::FLAG_IS_SORTED,
internal_padding: [0; 3],
points: (&*downsampled).into(),
null_val: std::vec::from_elem(0_u8, (downsampled.len() + 7) / 8).into()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is from_elem()? I've never heard of the function before, and I can't find any refrences to it in the docs...

If it's what I think it is, it's probably more idiomatic to do

null_val: vec![0_u8; (downsampled.len() + 7) / 8].into()

|| state.points.as_slice().last().unwrap().ts
> series.points.as_slice().first().unwrap().ts)
{
state.flags ^= FLAG_IS_SORTED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is ^= really better than state.flags &= !FLAG_IS_SORTED here?

points: [TSPoint; self.num_points],
null_val: [u8; (self.num_points + 7)/ 8], // bit vector, must be last element for alignment purposes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, if only div_ceil() was stable, then we could use that instead of doing this manually. I do wonder if we should extract this out into a type, or at least extract out (num_points + 7)/ 8 into a function

Comment on lines 25 to 26
// Bit flags stored in Timevector flags
const FLAG_HAS_NULLS : u8 = 0x01;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bitflags library may be useful for this

Copy link
Contributor

@JLockerman JLockerman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're going to eventually want to encapsulate the nulls bitmap a bit better, and probably clean the text output a bit, but it's good enough for now

@WireBaron
Copy link
Contributor Author

bors r+

@bors
Copy link
Contributor

bors bot commented Jun 3, 2022

Build succeeded:

@bors bors bot merged commit 9a8357d into main Jun 3, 2022
@bors bors bot deleted the br/timevector_explicit_null branch June 3, 2022 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Explicit NULL values in Timevectors
4 participants