Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix empty keys in BlobDataProvider #3551

Merged
merged 2 commits into from
Jun 21, 2023
Merged

Conversation

robertbastian
Copy link
Member

Fixes #3545

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solution works because the empty string is never equal to a locale, because a locale always has a language subtag. It does seem like a fairly narrow change, which is good.

Comment on lines +83 to +84
for (locale, idx) in cursor.iter1_copied() {
debug_assert!(idx < self.buffers.len() || locale == Index32U8::SENTINEL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: This is problematic because a newer blob won't pass the invariants of an older blob reader

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally ok with breaking debug assertions for new-blob-old-reader.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also instead set the index to 0; it's fine because it'll never be loaded anyway (and if someone does attempt to load it raw, they'll get a deserialize error or potentially GIGO behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that for a blob with just an empty key, 0 is also invalid as there will be zero buffers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm willing to have debug assertions break in that relatively niche case

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(but also I'm not against breaking debug assertions with the =len sentinel in the first place. would prefer to avoid it, really do not care about the empty-blob case)

Comment on lines +83 to +84
for (locale, idx) in cursor.iter1_copied() {
debug_assert!(idx < self.buffers.len() || locale == Index32U8::SENTINEL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally ok with breaking debug assertions for new-blob-old-reader.

Comment on lines +83 to +84
for (locale, idx) in cursor.iter1_copied() {
debug_assert!(idx < self.buffers.len() || locale == Index32U8::SENTINEL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also instead set the index to 0; it's fine because it'll never be loaded anyway (and if someone does attempt to load it raw, they'll get a deserialize error or potentially GIGO behavior.

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok to use .len()

@robertbastian robertbastian merged commit f43d9f6 into unicode-org:main Jun 21, 2023
23 checks passed
@robertbastian robertbastian deleted the blob branch June 22, 2023 12:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

create dictionary based WordSegmenter returns MissingDataKey, segmenter/dictionary/w_auto@1
3 participants