Skip to content

Move UAX#14 defines to line.toml#1568

Merged
makotokato merged 8 commits intounicode-org:mainfrom
makotokato:line-toml
Feb 15, 2022
Merged

Move UAX#14 defines to line.toml#1568
makotokato merged 8 commits intounicode-org:mainfrom
makotokato:line-toml

Conversation

@makotokato
Copy link
Copy Markdown
Member

  • Generate UAX#14 table using toml file like UAX#29.
  • Remove python tool to generate UAX#14 machine state table

@makotokato makotokato requested a review from aethanyc as a code owner February 1, 2022 15:21
@aethanyc aethanyc linked an issue Feb 1, 2022 that may be closed by this pull request
Copy link
Copy Markdown
Contributor

@aethanyc aethanyc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm excited to see python script gets converted to TOML! I assume this doesn't change the behavior?

Suggestion:

  1. Remove rule_table.rs since it's longer used.
  2. How about renaming line_breaker.rs to line.rs to match other breaker implementation like word.rs, etc?

Comment thread experimental/segmenter/build.rs Outdated
Comment thread experimental/segmenter/build.rs Outdated
break_state = true

[[rules]]
left = [ "Any" ]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Is this # LB31?

Comment thread experimental/segmenter/build.rs Outdated
Comment thread experimental/segmenter/build.rs Outdated
assert_eq!(is_break(BB, AL), false);
// LB21
assert_eq!(is_break(AL, BA), false);
assert_eq!(is_break(BB, AL), false);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a reasonable change to reflect LB21: BB ×.

Comment thread experimental/segmenter/build.rs Outdated
@aethanyc aethanyc added the waiting-on-author PRs waiting for action from the author for >7 days label Feb 8, 2022
@jira-pull-request-webhook
Copy link
Copy Markdown

Notice: the branch changed across the force-push!

  • experimental/segmenter/src/line.rs is different
  • experimental/segmenter/src/provider.rs is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@makotokato
Copy link
Copy Markdown
Member Author

I assume this doesn't change the behavior?

Yes, tests are passed although some unused states are different, but it is unused.

@makotokato
Copy link
Copy Markdown
Member Author

And I also add multi thread version of data generation. But I will improve more after this.

@aethanyc aethanyc removed the waiting-on-author PRs waiting for action from the author for >7 days label Feb 14, 2022
aethanyc
aethanyc previously approved these changes Feb 14, 2022
Copy link
Copy Markdown
Contributor

@aethanyc aethanyc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I also add multi thread version of data generation. But I will improve more after this.

It's nice to speed up bulid.rs as the interim developer efficiency. However, I'd feel we probably shouldn't spend more time on the performance of bulid.rs because we really should migrate the segmenter data generation into icu4x data generation tool, i.e. https://github.com/unicode-org/icu4x/tree/main/tools/datagen. Ideally, the segmenter data generation can be speed up on top of the work in #1600.

@jira-pull-request-webhook
Copy link
Copy Markdown

Notice: the branch changed across the force-push!

  • experimental/segmenter/README.md is different
  • experimental/segmenter/src/lib.rs is different
  • experimental/segmenter/src/line.rs is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@makotokato makotokato merged commit 48738cb into unicode-org:main Feb 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use Rust instead of Python to generate UAX14 line break tables

2 participants