feat: check file name uniqueness with Unicode canonical case fold normalization #1409

Merged: rdeltour merged 1 commit into release/v5.0.0 from feat/unicode-normalization-check on Dec 6, 2022

Conversation

rdeltour
Member

@rdeltour rdeltour commented Dec 2, 2022

This commit changes the OCF container file name uniqueness check to perform the Unicode canonical case fold normalization step defined in https://www.w3.org/TR/charmod-norm/#CanonicalFoldNormalizationStep

That is, we normalize the file name to NFD then apply full case folding before checking for uniqueness.
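
The step described above can be sketched as follows. This is an illustration in Python, not the Java code from this commit; `canonical_case_fold` and `find_collisions` are hypothetical names, and the final NFD pass is an assumption taken from Unicode's definition of canonical caseless matching (case folding can produce characters that are no longer in NFD).

```python
import unicodedata


def canonical_case_fold(name: str) -> str:
    """Return a comparison key: NFD, then full case folding, then NFD again."""
    decomposed = unicodedata.normalize("NFD", name)
    folded = decomposed.casefold()  # str.casefold applies full case folding
    # Assumption: re-normalize, since folding may yield non-NFD sequences.
    return unicodedata.normalize("NFD", folded)


def find_collisions(names):
    """Return (first_seen, duplicate) pairs whose comparison keys are equal."""
    seen = {}
    collisions = []
    for name in names:
        key = canonical_case_fold(name)
        if key in seen:
            collisions.append((seen[key], name))
        else:
            seen[key] = name
    return collisions


if __name__ == "__main__":
    # Hypothetical container entries, for illustration only.
    entries = ["OEBPS/Straße.xhtml", "OEBPS/STRASSE.xhtml", "OEBPS/nav.xhtml"]
    print(find_collisions(entries))
```

With the sample entries, the first two names share a key and are reported as one colliding pair; `OEBPS/nav.xhtml` is unique.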

Previously the behavior was:

  • we checked for uniqueness of the lower case form (String.toLowerCase)
  • we checked for uniqueness of the NFC-normalized lower case

This was flawed, since String.toLowerCase is not equivalent to Unicode full case folding.
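
A small sketch of why lowercasing is not enough. It uses Python's `str.lower` as a stand-in for Java's `String.toLowerCase` (an analogy, not the code touched by this commit) and `str.casefold` for full case folding; the helper names and file names are hypothetical.

```python
def same_by_lowercase(a: str, b: str) -> bool:
    """Compare names by their lower case form (the old approach)."""
    return a.lower() == b.lower()


def same_by_case_folding(a: str, b: str) -> bool:
    """Compare names after full case folding."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    # Lowercasing keeps "ß" as is, while full case folding maps it to "ss".
    print(same_by_lowercase("Straße.xhtml", "STRASSE.xhtml"))     # False
    print(same_by_case_folding("Straße.xhtml", "STRASSE.xhtml"))  # True
```

The two names are distinct under the lower case comparison but identical under full case folding, so a check based only on lowercasing would miss the collision.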

Also, previously, only a warning (OPF-061) was reported when names were not unique after NFC normalization. This is now an error, using the same code as the other uniqueness failures (OPF-060).

This commit removes OPF-061, which is no longer used.

Fixes #1246

@rdeltour rdeltour added this to the v5.0.0-beta milestone Dec 2, 2022
@rdeltour rdeltour self-assigned this Dec 2, 2022
Base automatically changed from feat/ocf-filename-character-check to release/v5.0.0 December 6, 2022 12:13
@rdeltour rdeltour merged commit 111e772 into release/v5.0.0 Dec 6, 2022
@rdeltour rdeltour deleted the feat/unicode-normalization-check branch December 6, 2022 12:14