Move UAX#14 defines to line.toml #1568
makotokato
commented
Feb 1, 2022
- Generate the UAX#14 table from a TOML file, like UAX#29.
- Remove the Python tool that generated the UAX#14 state machine table.
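As a rough illustration of what generating a pair table from TOML-style rules can look like, here is a minimal sketch. The `Rule` struct, `build_table`, and `matches` are hypothetical names, not the code in this PR. The sketch also assumes that `"Any"` acts as a wildcard and that the first matching rule wins, which may differ from how the real generator resolves overlapping rules.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory form of one `[[rules]]` entry (left, right, break_state).
struct Rule {
    left: Vec<&'static str>,
    right: Vec<&'static str>,
    break_state: bool,
}

/// True if `class` is listed in `pattern`, or if the pattern contains the
/// wildcard "Any" (an assumption made for this sketch).
fn matches(pattern: &[&'static str], class: &str) -> bool {
    pattern.iter().any(|&p| p == "Any" || p == class)
}

/// Expands the rules into a (left, right) -> break table.
/// Assumption: the first rule that matches a pair decides its state.
fn build_table(
    classes: &[&'static str],
    rules: &[Rule],
) -> HashMap<(&'static str, &'static str), bool> {
    let mut table = HashMap::new();
    for &l in classes {
        for &r in classes {
            let hit = rules
                .iter()
                .find(|rule| matches(&rule.left, l) && matches(&rule.right, r));
            if let Some(rule) = hit {
                table.insert((l, r), rule.break_state);
            }
        }
    }
    table
}
```

With rules for "no break before BA", "no break after BB", and a final catch-all "Any, Any, break" entry, the table would mark AL followed by BA and BB followed by AL as non-breaking, and AL followed by AL as breaking.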
aethanyc
left a comment
I'm excited to see the Python script converted to TOML! I assume this doesn't change the behavior?
Suggestion:
- Remove `rule_table.rs` since it's no longer used.
- How about renaming `line_breaker.rs` to `line.rs` to match the other breaker implementations like `word.rs`, etc.?
break_state = true

[[rules]]
left = [ "Any" ]
Suggestion: Is this # LB31?
assert_eq!(is_break(BB, AL), false);
// LB21
assert_eq!(is_break(AL, BA), false);
assert_eq!(is_break(BB, AL), false);
This seems like a reasonable change to reflect LB21: BB ×.
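For reference, LB21 in UAX #14 reads "× BA, × HY, × NS, BB ×": no break before BA, HY, or NS, and no break after BB. A minimal sketch of that single rule follows. The `Class` enum and the `lb21` function are hypothetical and cover only the classes needed here; they are not the segmenter's actual `is_break` implementation.

```rust
/// A tiny, hypothetical subset of the UAX #14 line break classes.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Class {
    AL,
    BA,
    BB,
    HY,
    NS,
}

/// Returns `Some(false)` when LB21 prohibits a break between `left` and
/// `right`, and `None` when LB21 does not apply to the pair.
fn lb21(left: Class, right: Class) -> Option<bool> {
    use Class::*;
    match (left, right) {
        // × BA, × HY, × NS: no break before these classes.
        (_, BA) | (_, HY) | (_, NS) => Some(false),
        // BB ×: no break after BB.
        (BB, _) => Some(false),
        _ => None,
    }
}
```

Under this sketch, both pairs checked in the test above, (AL, BA) and (BB, AL), fall under LB21 and are non-breaking, while (AL, AL) is left to other rules.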
Force-pushed from 0c6ae63 to 0fefbeb.
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot
Yes, the tests pass. Some unused states are different, but that has no effect because they are unused.

I also added a multi-threaded version of the data generation. I will improve it further after this.
aethanyc
left a comment
> And I also add multi thread version of data generation. But I will improve more after this.
It's nice to speed up build.rs for interim developer efficiency. However, I feel we probably shouldn't spend more time on the performance of build.rs, because we really should migrate the segmenter data generation into the ICU4X data generation tool, i.e. https://github.com/unicode-org/icu4x/tree/main/tools/datagen. Ideally, the segmenter data generation can be sped up on top of the work in #1600.
Force-pushed from 9554373 to ccdcf98.
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot