Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle incorrect VZV bytes representation #3883

Merged
merged 3 commits into from
Aug 17, 2023

Conversation

robertbastian
Copy link
Member

@robertbastian robertbastian commented Aug 17, 2023

The empty VZV is currently either [], when created with VZV::new/VZS::new_empty, or [0,0,0,0] when created with VZV::from(&[]). This breaks the byte equality invariant that PartialEq, among others, relies on.

This fix makes VZS::new_empty return [0,0,0,0]. However, as there is data with the [] out in the wild, we probably have to adapt at least PartialEq to handle it.

This fix make VZV::from return [] for empty input, and updates the PartialEq implementation to deal with the non-canonical representation.

@Manishearth
Copy link
Member

Manishearth commented Aug 17, 2023

IMO empty VZVs should canonically be the empty slice

@robertbastian robertbastian changed the title Remove incorrect VZV bytes representation Handle incorrect VZV bytes representation Aug 17, 2023
@@ -451,8 +455,14 @@ where
{
#[inline]
fn eq(&self, other: &VarZeroVec<'b, T, F>) -> bool {
// VarULE has an API guarantee that this is equivalent
// to `T::VarULE::eq()`
// VZV::from_elements used to produce a non-canonical representation of the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine allowing this to slip

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though this is ok too, note that this also breaks the VarULE "equality is byte equality" invariant on VZS

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we remove this we should also special case is_empty to check for empty length

Copy link
Member Author

@robertbastian robertbastian Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_empty should already work, because it checks VZComponent's indices slice, which gets special-case initialised.

note that this also breaks the VarULE "equality is byte equality" invariant on VZS

We already broke that invariant

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that testdata CI is failing so the incorrect representation is out there

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, that's fine by me then, since it's legacy-only. We may wish to deprecate this in the long run.

sffc
sffc previously approved these changes Aug 17, 2023
@sffc
Copy link
Member

sffc commented Aug 17, 2023

I slightly lean toward [0,0,0,0] being canonical because it's nice that a VZV is guaranteed to have these 4 length bytes (no special cases) but not a strong opinion. It's nice that [] saves us some space.

Manishearth
Manishearth previously approved these changes Aug 17, 2023
@robertbastian robertbastian merged commit 2f806bc into unicode-org:main Aug 17, 2023
26 checks passed
@robertbastian robertbastian deleted the vzv branch August 17, 2023 16:52
@Manishearth Manishearth mentioned this pull request Sep 21, 2023
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants