fix(itn): set 0to9 for measure & money#109
Merged
xingchensong merged 1 commit into master on Sep 11, 2023
xingchensong (Member, Author) commented on Sep 10, 2023

@duj12 please review this PR.

From now on, enable_standalone_number=True and enable_0_to_9=False will be our default.
xingchensong commented on Sep 11, 2023
Comment on lines 100 to -89
    if self.enable_0_to_9:
        cardinal |= number
    else:
        number_two_plus = (digits + digits.plus) | teen | tens | hundred | thousand | ten_thousand  # noqa
        cardinal |= number_two_plus
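To make the effect of this branch concrete, here is a toy pure-Python stand-in. It is not the pynini grammar from this PR: `parse_up_to_99` and `to_cardinal` are hypothetical helpers that only understand spoken numbers from 0 to 99, and they do not cover digit sequences such as `digits + digits.plus`. The sketch only mirrors the idea that single-digit numbers are left in spoken form when `enable_0_to_9` is off, while larger numbers are still converted.

```python
# Toy illustration only; names and coverage are assumptions, not the PR's code.
DIGITS = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}


def parse_up_to_99(spoken: str) -> int:
    """Parse a spoken Chinese number in 0..99, e.g. '二十三' -> 23."""
    if spoken in DIGITS:
        return DIGITS[spoken]
    if "十" not in spoken:
        raise ValueError(f"unsupported number: {spoken!r}")
    head, _, tail = spoken.partition("十")
    tens = DIGITS[head] if head else 1   # a bare '十' means 10
    ones = DIGITS[tail] if tail else 0
    return tens * 10 + ones


def to_cardinal(spoken: str, enable_0_to_9: bool) -> str:
    """Convert to digits, but keep 0..9 spoken when enable_0_to_9 is False."""
    value = parse_up_to_99(spoken)
    if value < 10 and not enable_0_to_9:
        return spoken  # mirrors the `number_two_plus` branch: no rule matches
    return str(value)
```

With the flag off, `to_cardinal("五", False)` leaves the input unchanged, whereas two-digit inputs such as "二十三" are converted either way.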
xingchensong commented on Sep 11, 2023
Comment on lines +32 to +34
    yyyy = digit + (digit | zero)**3  # 二零零八年 (the year 2008)
    yyy = digit + (digit | zero)**2  # 公元一六八年 (the year 168 AD)
    yy = (digit | zero)**2  # 零八年奥运会 (the '08 Olympics)

This bug was fixed in passing in order to get the unit tests to pass. The unit tests are now split into four groups, and without adding yy here, one of those groups fails:
| standalone number | 0to9 |
|---|---|
| yes | yes |
| yes | no |
| no | no |
| no | yes |
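The three year shapes from the diff (`yyyy`, `yyy`, `yy`) can be approximated with regular expressions to see which spoken strings each one accepts. This is a rough sketch under assumptions: it supposes that `digit` covers 一 to 九 and `zero` covers 零, while the real definitions live elsewhere in the grammar and may include more characters. `year_shape` is a hypothetical helper, not part of the project.

```python
import re

DIGIT = "一二三四五六七八九"  # assumed contents of `digit`
ZERO = "零"                  # assumed contents of `zero`

# Regex counterparts of the three grammar lines in the diff.
YYYY = re.compile(f"[{DIGIT}][{DIGIT}{ZERO}]{{3}}")  # digit + (digit | zero)**3
YYY = re.compile(f"[{DIGIT}][{DIGIT}{ZERO}]{{2}}")   # digit + (digit | zero)**2
YY = re.compile(f"[{DIGIT}{ZERO}]{{2}}")             # (digit | zero)**2


def year_shape(spoken: str):
    """Return which year pattern fully matches `spoken`, or None."""
    for name, pattern in (("yyyy", YYYY), ("yyy", YYY), ("yy", YY)):
        if pattern.fullmatch(spoken):
            return name
    return None
```

Only `yy` allows a leading 零, which is why a two-character year such as "零八" needs the extra pattern.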
xingchensong commented on Sep 11, 2023
      # 百分之三十, 百分三十, 百分之百 (30%, 30%, 100%)
      percent = ((sign + delete('的').ques).ques + delete('百分') +
    -            delete('之').ques + (number | cross('百', '100'))
    +            delete('之').ques + (Cardinal().number | cross('百', '100'))

Here Cardinal().number is used instead of number because percentages should not depend on whether 0-to-9 conversion is enabled; for example, "百分之二" should always be converted to "2%".
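As a toy illustration of that design choice, the sketch below converts a spoken percentage without consulting any 0-to-9 flag. It is plain Python rather than the pynini rule in the diff, it ignores the optional sign, and it only handles values from 1 to 99 plus the special case 百 (100). `percent_to_written` and `parse_1_to_99` are hypothetical names.

```python
import re

DIGITS = {"一": 1, "二": 2, "三": 3, "四": 4, "五": 5,
          "六": 6, "七": 7, "八": 8, "九": 9}

# '百分', an optional '之', then the number part.
PERCENT = re.compile(r"^百分之?(.+)$")


def parse_1_to_99(spoken: str) -> int:
    """Parse a spoken number in 1..99, e.g. '三十' -> 30."""
    if spoken in DIGITS:
        return DIGITS[spoken]
    head, sep, tail = spoken.partition("十")
    if not sep:
        raise ValueError(f"unsupported number: {spoken!r}")
    tens = DIGITS[head] if head else 1
    ones = DIGITS[tail] if tail else 0
    return tens * 10 + ones


def percent_to_written(spoken: str) -> str:
    """Convert e.g. '百分之二' to '2%'; single digits are always converted."""
    match = PERCENT.match(spoken)
    if match is None:
        return spoken  # not a percentage, leave untouched
    body = match.group(1)
    value = 100 if body == "百" else parse_1_to_99(body)
    return f"{value}%"
```

Note that the function takes no `enable_0_to_9` argument at all, which is the point of the change: "百分之二" becomes "2%" regardless of how standalone cardinals are configured.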
xingchensong commented on Sep 11, 2023
                          help='enable standalone number')
      parser.add_argument('--enable_0_to_9', type=str,
    -                     default='True',
    +                     default='False',

From now on, the default mode is enable_standalone_number = True, enable_0_to_9 = False.
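Because the flag is declared with `type=str`, its value arrives as the string `'True'` or `'False'` and has to be turned into a boolean somewhere. How the project does this is not shown in the excerpt; the sketch below uses a hypothetical `str2bool` helper, an assumed `'True'` default for `--enable_standalone_number`, and an invented help string for `--enable_0_to_9`.

```python
import argparse


def str2bool(text: str) -> bool:
    """Hypothetical helper: map 'True'/'False'-style strings to bool."""
    lowered = text.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise ValueError(f"not a boolean string: {text!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Default of 'True' is assumed from the comment above, not from the diff.
    parser.add_argument('--enable_standalone_number', type=str,
                        default='True',
                        help='enable standalone number')
    # Help text here is a placeholder; only the default comes from the diff.
    parser.add_argument('--enable_0_to_9', type=str,
                        default='False',
                        help='enable conversion of 0 to 9')
    return parser


args = build_parser().parse_args([])
enable_standalone_number = str2bool(args.enable_standalone_number)
enable_0_to_9 = str2bool(args.enable_0_to_9)
```

A plain `bool(args.enable_0_to_9)` would be wrong here, since any non-empty string, including `'False'`, is truthy in Python.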
Comment on lines +48 to +112
    class TestNormalizerDisablestandalonenumberEnable0to9:

        normalizer = InverseNormalizer(
            overwrite_cache=True,
            enable_standalone_number=False,
            enable_0_to_9=True)

        normalizer_cases = chain(
            parse_test_case('data/char.txt'),
            parse_test_case('data/date.txt'),
            parse_test_case('data/fraction.txt'),
            parse_test_case('data/math.txt'),
            parse_test_case('data/measure.txt'),
            parse_test_case('data/money.txt'),
            parse_test_case('data/time.txt'),
            parse_test_case('data/whitelist.txt'),
            parse_test_case('data/normalizer_disable_standalone_number_enable_0_to_9.txt'))

        @pytest.mark.parametrize("spoken, written", normalizer_cases)
        def test_normalizer(self, spoken, written):
            assert self.normalizer.normalize(spoken) == written


    class TestNormalizerEnablestandalonenumberDisable0to9:

        normalizer = InverseNormalizer(
            overwrite_cache=True,
            enable_standalone_number=True,
            enable_0_to_9=False)

        normalizer_cases = chain(
            parse_test_case('data/char.txt'),
            parse_test_case('data/date.txt'),
            parse_test_case('data/fraction.txt'),
            parse_test_case('data/math.txt'),
            parse_test_case('data/money.txt'),
            parse_test_case('data/time.txt'),
            parse_test_case('data/whitelist.txt'),
            parse_test_case('data/normalizer_enable_standalone_number_disable_0_to_9.txt'))

        @pytest.mark.parametrize("spoken, written", normalizer_cases)
        def test_normalizer(self, spoken, written):
            assert self.normalizer.normalize(spoken) == written


    class TestNormalizerDisablestandalonenumberDisable0to9:

        normalizer = InverseNormalizer(
            overwrite_cache=True,
            enable_standalone_number=False,
            enable_0_to_9=False)

        normalizer_cases = chain(
            parse_test_case('data/char.txt'),
            parse_test_case('data/date.txt'),
            parse_test_case('data/fraction.txt'),
            parse_test_case('data/math.txt'),
            parse_test_case('data/money.txt'),
            parse_test_case('data/time.txt'),
            parse_test_case('data/whitelist.txt'),
            parse_test_case('data/normalizer_disable_standalone_number_disable_0_to_9.txt'))

        @pytest.mark.parametrize("spoken, written", normalizer_cases)
        def test_normalizer(self, spoken, written):
            assert self.normalizer.normalize(spoken) == written
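The test classes in this hunk differ only in their two flags and in the flag-specific case file, so the four-way split can be described as a small matrix. The sketch below is one possible way to enumerate it and is not how the PR organizes the tests. `flag_case_file` and `build_matrix` are hypothetical helpers, the enable/enable file name is extrapolated from the naming pattern of the other three, and `measure.txt` is left out of the shared list because only the first class in the excerpt loads it.

```python
from itertools import product

# Case files loaded by all three classes shown in the excerpt.
SHARED_CASE_FILES = [
    f"data/{name}.txt"
    for name in ("char", "date", "fraction", "math",
                 "money", "time", "whitelist")
]


def flag_case_file(enable_standalone_number: bool, enable_0_to_9: bool) -> str:
    """Build the flag-specific case file name used by each test class."""
    standalone = "enable" if enable_standalone_number else "disable"
    zero_to_nine = "enable" if enable_0_to_9 else "disable"
    return (f"data/normalizer_{standalone}_standalone_number_"
            f"{zero_to_nine}_0_to_9.txt")


def build_matrix():
    """Enumerate the four flag combinations with their case files."""
    matrix = []
    for standalone, zero_to_nine in product([True, False], repeat=2):
        matrix.append({
            "enable_standalone_number": standalone,
            "enable_0_to_9": zero_to_nine,
            "case_files": SHARED_CASE_FILES
            + [flag_case_file(standalone, zero_to_nine)],
        })
    return matrix
```

Each entry carries the keyword arguments that would be passed to `InverseNormalizer` together with the list of files to feed through `parse_test_case`.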
pengzhendong approved these changes on Sep 11, 2023
