Ensure that canonically equivalent strings have the same width #37

Merged

@Jules-Bertholet Jules-Bertholet commented Apr 22, 2024

Before this PR, this crate could assign different widths to the NFC and NFD forms of the same string. This PR fixes that by treating every character with the Grapheme_Extend property as zero-width, and adds a test so that this invariant continues to hold.
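To illustrate the invariant, here is a minimal, std-only sketch. It does not use this crate's tables or API: `toy_width` and `is_zero_width_toy` are hypothetical helpers that only treat the Combining Diacritical Marks block (U+0300..=U+036F, all nonspacing marks and therefore Grapheme_Extend) as zero-width and give every other character width 1. Under that assumption, the precomposed and decomposed spellings of "é" come out with equal width.

```rust
/// Hypothetical toy stand-in for a width table; NOT the crate's implementation.
/// Only the Combining Diacritical Marks block is treated as zero-width here.
fn is_zero_width_toy(c: char) -> bool {
    ('\u{0300}'..='\u{036F}').contains(&c)
}

/// Toy width: 0 for the combining marks above, 1 for everything else.
fn toy_width(s: &str) -> usize {
    s.chars()
        .map(|c| if is_zero_width_toy(c) { 0 } else { 1 })
        .sum()
}

fn main() {
    let nfc = "\u{00E9}"; // U+00E9 LATIN SMALL LETTER E WITH ACUTE (precomposed)
    let nfd = "e\u{0301}"; // 'e' followed by U+0301 COMBINING ACUTE ACCENT

    // Canonically equivalent spellings should be assigned the same width.
    assert_eq!(toy_width(nfc), toy_width(nfd));
    println!("nfc = {}, nfd = {}", toy_width(nfc), toy_width(nfd));
}
```

A test of the real property would presumably normalize each input both ways and compare the crate's widths; the sketch above only shows why a zero-width combining mark is what makes the two forms agree.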

There are also 8 characters to which we manually assign width 0. I think these should have been Grapheme_Extend under UAX29's rules, because they decompose under NFD to characters with that property; I've reported the issue to Unicode.

Unfortunately, width_cjk() is still not closed under canonical equivalence.

@Manishearth Manishearth merged commit 9c4477c into unicode-rs:master Apr 22, 2024
2 checks passed
@Jules-Bertholet Jules-Bertholet deleted the canonical-equivalence branch April 22, 2024 19:13