
ICU-23401 Correctly implement rule LB3, thereby removing a few more states#3975

Draft
eggrobin wants to merge 13 commits into unicode-org:main from eggrobin:LB3

Conversation

@eggrobin
Member

@eggrobin eggrobin commented May 8, 2026

The line.txt rules assign rule statuses that distinguish mandatory breaks (status UBRK_LINE_HARD=100) from break opportunities (status UBRK_LINE_SOFT=0).
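For illustration only, here is a minimal sketch of how a caller might interpret these statuses, assuming the status value has already been obtained from a line break iterator (e.g. via `ubrk_getRuleStatus` in ICU4C or `BreakIterator.getRuleStatus()` in ICU4J). The `[value, limit)` ranges mirror the `UBRK_LINE_*` tags in ICU's `ubrk.h`; the function name `classify_line_break` is hypothetical.

```python
# Line-break rule-status ranges, mirroring the UBRK_LINE_* tags in ICU's ubrk.h.
UBRK_LINE_SOFT = 0
UBRK_LINE_SOFT_LIMIT = 100
UBRK_LINE_HARD = 100
UBRK_LINE_HARD_LIMIT = 200


def classify_line_break(status: int) -> str:
    """Hypothetical helper: map a line-break rule status to 'hard' or 'soft'."""
    if UBRK_LINE_HARD <= status < UBRK_LINE_HARD_LIMIT:
        return "hard"  # mandatory break
    if UBRK_LINE_SOFT <= status < UBRK_LINE_SOFT_LIMIT:
        return "soft"  # break opportunity
    raise ValueError(f"unexpected line-break status {status}")


print(classify_line_break(100))
print(classify_line_break(0))
```

Checking ranges rather than exact values keeps the caller correct if rules ever tag breaks with other values inside the same range.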

However, they incorrectly return UBRK_LINE_SOFT at the end of text unless the last character is a hard line break, contrary to https://www.unicode.org/reports/tr14/#LB3, which makes the end of text a mandatory break (! eot). This is unlikely to matter to callers, since any actual implementation would most likely special-case the start and end of text, but it forces the state machine to split states unnecessarily.
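The difference can be illustrated with a toy model of the status of the break at the end of text. This is not ICU's rule engine: it only looks at the line-break class of the last character, and the function name and `lb3` flag are hypothetical. Without LB3, the status depends on whether the last character is a hard line break (BK, CR, LF, NL), so the state machine has to remember that; with LB3, the end of text is always a mandatory break.

```python
from typing import List

SOFT, HARD = 0, 100  # UBRK_LINE_SOFT, UBRK_LINE_HARD
HARD_LINE_BREAKS = {"BK", "CR", "LF", "NL"}  # classes that force a break after them


def end_of_text_status(classes: List[str], lb3: bool) -> int:
    """Toy model: status of the break at eot for a non-empty sequence of line-break classes.

    lb3=False models the current behaviour (soft unless the last character is a hard
    line break); lb3=True models UAX #14 LB3 (! eot).
    """
    if not classes:
        raise ValueError("text must be non-empty")
    if lb3:
        return HARD  # LB3: always break at the end of text, independent of context
    return HARD if classes[-1] in HARD_LINE_BREAKS else SOFT


print(end_of_text_status(["AL", "AL"], lb3=False))
print(end_of_text_status(["AL", "AL"], lb3=True))
```

In the `lb3=True` branch the result no longer depends on the preceding class, which is why states that differed only in their end-of-text status can be merged.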

Correcting the rules to treat eot as a hard break removes four states from the state machine, since it no longer needs to distinguish eot from LF and similar characters: eggrobin/unicodetools@496e9a0...cb6c98a

Checklist

  • Required: Issue filed: ICU-23401
  • Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-NNNNN Fix xyz"
  • Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-NNNNN Fix xyz"
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable
  • Approver: Feel free to merge on my behalf

eggrobin added a commit to eggrobin/unicodetools that referenced this pull request May 8, 2026
@eggrobin eggrobin changed the title from "LB3" to "ICU-23376 Correctly implement rule LB3, thereby removing a few more states" May 11, 2026
@eggrobin eggrobin changed the title from "ICU-23376 Correctly implement rule LB3, thereby removing a few more states" to "ICU-23401 Correctly implement rule LB3, thereby removing a few more states" May 11, 2026
