Provide methods for finding the normalized prefix of input #4256
hsivonen added the C-collator (Component: Collation, normalization) and U-gecko (User: Gecko) labels on Nov 7, 2023
CC @CanadaHonk
CanadaHonk pushed a commit to CanadaHonk/icu4x that referenced this issue on Nov 20, 2023:
Closes unicode-org#4256. No UTF16 tests or fuzzing yet.
CanadaHonk pushed a commit to CanadaHonk/icu4x that referenced this issue on Nov 20, 2023:
Closes unicode-org#4256. No UTF16 tests or fuzzing yet. Also added UTF8 variant to FFI as `is_normalized_up_to`.
CanadaHonk pushed a commit to CanadaHonk/icu4x that referenced this issue on Nov 20, 2023:
Closes unicode-org#4256. Added UTF8 variant to FFI as `is_normalized_up_to`. No UTF16 tests or fuzzing yet.
CanadaHonk pushed a commit to CanadaHonk/icu4x that referenced this issue on Apr 23, 2024:
Closes unicode-org#4256. Added UTF8 variant to FFI as `is_normalized_up_to`. No UTF16 tests or fuzzing yet.
`ComposingNormalizer` and `DecomposingNormalizer` currently provide methods `is_normalized()`, `is_normalized_utf8()`, and `is_normalized_utf16()`. If the return value is false and the application then decides to normalize, the normalization-related data structure lookups are done twice for the (potential) already-normalized prefix.

`ComposingNormalizer` and `DecomposingNormalizer` should provide methods `is_normalized_up_to()`, `is_normalized_utf8_up_to()`, and `is_normalized_utf16_up_to()` that return `usize` such that the return value is the largest possible (but no larger than the length of input) with which the following assert passes:

Then this should become a valid alternative implementation of `is_normalized()`:

Gecko use case: https://searchfox.org/mozilla-central/rev/e94bcd536a2a4caad0597d1b2d624342e6a389c4/intl/components/src/String.h#132

(Note that ICU4X deliberately doesn't implement quick check, which Gecko currently uses for the prefix computation.)
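
For illustration, the contract requested above could be sketched roughly as follows. This is not ICU4X code: `ToyNormalizer` is a hypothetical stand-in in which "normalized" only means "contains no ASCII uppercase letters", chosen so the example compiles without any normalization data. It shows the prefix property, `is_normalized()` derived from the prefix length, and a `normalize()` that examines the already-normalized prefix only once.

```rust
/// Hypothetical stand-in for a normalizer; NOT the ICU4X API.
/// Assumption: "normalized" here just means "no ASCII uppercase letters".
struct ToyNormalizer;

impl ToyNormalizer {
    /// Byte length of the longest prefix of `input` that is already normalized.
    /// The result is never larger than `input.len()`.
    fn is_normalized_up_to(&self, input: &str) -> usize {
        input
            .char_indices()
            .find(|&(_, c)| c.is_ascii_uppercase())
            .map_or(input.len(), |(i, _)| i)
    }

    /// `is_normalized()` expressed through the prefix method:
    /// the input is normalized iff the normalized prefix covers all of it.
    fn is_normalized(&self, input: &str) -> bool {
        self.is_normalized_up_to(input) == input.len()
    }

    /// Copies the already-normalized prefix verbatim and only transforms the tail.
    /// Real Unicode normalization needs extra care at the head/tail boundary
    /// (e.g. combining marks at the start of the tail); this toy ignores that.
    fn normalize(&self, input: &str) -> String {
        let (head, tail) = input.split_at(self.is_normalized_up_to(input));
        let mut out = String::with_capacity(input.len());
        out.push_str(head);
        out.extend(tail.chars().map(|c| c.to_ascii_lowercase()));
        out
    }
}
```

With this sketch, for any `input` the prefix `&input[..n.is_normalized_up_to(input)]` passes `is_normalized()`, and a caller that goes on to normalize does not have to re-examine that prefix.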