fix(itn): set 0to9 for measure & money #109

Merged
xingchensong merged 1 commit into master from xcsong-fix-0to9
Sep 11, 2023

@xingchensong
Member

image

@xingchensong
Member Author

@duj12 please review this PR

@xingchensong
Member Author

From now on, we will have enable_standalone_number=True && enable_0_to_9=False as our default.

Comment on lines 100 to -89
        if self.enable_0_to_9:
            cardinal |= number
        else:
            number_two_plus = (digits + digits.plus) | teen | tens | hundred | thousand | ten_thousand  # noqa
            cardinal |= number_two_plus
Member Author

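Previously, enable_0_to_9 only took effect in the cardinal tagger. When a measure word is involved (e.g. “九天”), measure uses cardinal.number rather than cardinal.tagger, so even with enable_0_to_9=False, “九天” was still converted to “9天”.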
image
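
The effect described in this comment can be illustrated with a toy model in plain Python. This is not the project's FST code: `number`, `tag_cardinal`, `tag_measure_before`, and `tag_measure_after` are hypothetical helpers, and the digit table only covers single characters. It is only meant to show why a flag checked in one tagger is bypassed when another rule reuses the unrestricted number grammar.

```python
# Toy model only: hypothetical helpers, not the WeTextProcessing API.
DIGITS = {"零": "0", "一": "1", "二": "2", "三": "3", "四": "4",
          "五": "5", "六": "6", "七": "7", "八": "8", "九": "9"}


def number(text):
    """Unrestricted grammar: converts every Chinese digit it sees."""
    return "".join(DIGITS.get(ch, ch) for ch in text)


def tag_cardinal(text, enable_0_to_9):
    """The flag is honoured here: a lone digit stays as written."""
    if not enable_0_to_9 and len(text) == 1 and text in DIGITS:
        return text
    return number(text)


def tag_measure_before(text, enable_0_to_9):
    """Old behaviour: reuses `number` directly, so the flag is ignored."""
    return number(text[:-1]) + text[-1]


def tag_measure_after(text, enable_0_to_9):
    """Fixed behaviour: routes the numeric part through the flag check."""
    return tag_cardinal(text[:-1], enable_0_to_9) + text[-1]


print(tag_measure_before("九天", enable_0_to_9=False))
print(tag_measure_after("九天", enable_0_to_9=False))
```

In this sketch the "before" variant ignores its flag argument entirely, which mirrors the situation the comment describes for measure.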

Comment thread itn/chinese/rules/date.py
Comment on lines +32 to +34
yyyy = digit + (digit | zero)**3 # 二零零八年 (the year 2008)
yyy = digit + (digit | zero)**2 # 公元一六八年 (the year 168 AD)
yy = (digit | zero)**2 # 零八年奥运会 (the '08 Olympics)
Member Author

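This is a bug fixed in passing so that the unit tests pass. The unit tests are now split into four groups, and if yy is not added here, one of those groups fails: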

| standalone number | 0to9 |
| --- | --- |
| yes | yes |
| yes | no |
| no | no |
| no | yes |
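
The four groups in the table correspond to every combination of the two boolean flags. A minimal sketch of enumerating them, assuming only the flag names used in this PR; the `CONFIGS` name is hypothetical, and the actual test file defines one class per combination rather than a list like this:

```python
from itertools import product

# Every (enable_standalone_number, enable_0_to_9) pair, four in total.
CONFIGS = [
    {"enable_standalone_number": standalone, "enable_0_to_9": zero_to_nine}
    for standalone, zero_to_nine in product([True, False], repeat=2)
]

for config in CONFIGS:
    print(config)
```

Each dictionary could be passed as keyword arguments to a normalizer constructor, so that every rule is exercised under all four settings.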

  # 百分之三十, 百分三十, 百分之百
  percent = ((sign + delete('的').ques).ques + delete('百分') +
-            delete('之').ques + (number | cross('百', '100'))
+            delete('之').ques + (Cardinal().number | cross('百', '100'))
Member Author

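Cardinal().number is used here instead of number because percentages should not depend on whether the value falls in 0~9. For example, “百分之二” should always be converted to “2%”.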

Comment thread itn/main.py
                         help='enable standalone number')
     parser.add_argument('--enable_0_to_9', type=str,
-                        default='True',
+                        default='False',
Member Author

From now on, the default mode is enable_standalone_number = True, enable_0_to_9 = False.
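
Because the flag is declared with `type=str`, its value arrives as the string 'True' or 'False' and has to be converted before use (note that `bool('False')` is `True` in Python). A minimal sketch of one way to do this, assuming a hypothetical `str2bool` helper and assuming `--enable_standalone_number` is declared the same way with default 'True'; how itn/main.py actually converts these values is not shown in this diff:

```python
import argparse


def str2bool(value):
    """Hypothetical helper: map 'True'/'False' style strings to bool."""
    return value.strip().lower() in ("true", "1", "yes")


parser = argparse.ArgumentParser()
# Assumed declaration, mirroring the style of the hunk above.
parser.add_argument('--enable_standalone_number', type=str, default='True',
                    help='enable standalone number')
parser.add_argument('--enable_0_to_9', type=str, default='False')

args = parser.parse_args([])  # empty list: fall back to the defaults
enable_standalone_number = str2bool(args.enable_standalone_number)
enable_0_to_9 = str2bool(args.enable_0_to_9)
print(enable_standalone_number, enable_0_to_9)
```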

Comment on lines +48 to +112
class TestNormalizerDisablestandalonenumberEnable0to9:

    normalizer = InverseNormalizer(
        overwrite_cache=True,
        enable_standalone_number=False,
        enable_0_to_9=True)

    normalizer_cases = chain(
        parse_test_case('data/char.txt'),
        parse_test_case('data/date.txt'),
        parse_test_case('data/fraction.txt'),
        parse_test_case('data/math.txt'),
        parse_test_case('data/measure.txt'),
        parse_test_case('data/money.txt'),
        parse_test_case('data/time.txt'),
        parse_test_case('data/whitelist.txt'),
        parse_test_case('data/normalizer_disable_standalone_number_enable_0_to_9.txt'))

    @pytest.mark.parametrize("spoken, written", normalizer_cases)
    def test_normalizer(self, spoken, written):
        assert self.normalizer.normalize(spoken) == written


class TestNormalizerEnablestandalonenumberDisable0to9:

    normalizer = InverseNormalizer(
        overwrite_cache=True,
        enable_standalone_number=True,
        enable_0_to_9=False)

    normalizer_cases = chain(
        parse_test_case('data/char.txt'),
        parse_test_case('data/date.txt'),
        parse_test_case('data/fraction.txt'),
        parse_test_case('data/math.txt'),
        parse_test_case('data/money.txt'),
        parse_test_case('data/time.txt'),
        parse_test_case('data/whitelist.txt'),
        parse_test_case('data/normalizer_enable_standalone_number_disable_0_to_9.txt'))

    @pytest.mark.parametrize("spoken, written", normalizer_cases)
    def test_normalizer(self, spoken, written):
        assert self.normalizer.normalize(spoken) == written


class TestNormalizerDisablestandalonenumberDisable0to9:

    normalizer = InverseNormalizer(
        overwrite_cache=True,
        enable_standalone_number=False,
        enable_0_to_9=False)

    normalizer_cases = chain(
        parse_test_case('data/char.txt'),
        parse_test_case('data/date.txt'),
        parse_test_case('data/fraction.txt'),
        parse_test_case('data/math.txt'),
        parse_test_case('data/money.txt'),
        parse_test_case('data/time.txt'),
        parse_test_case('data/whitelist.txt'),
        parse_test_case('data/normalizer_disable_standalone_number_disable_0_to_9.txt'))

    @pytest.mark.parametrize("spoken, written", normalizer_cases)
    def test_normalizer(self, spoken, written):
        assert self.normalizer.normalize(spoken) == written
Member Author

Added three more test groups for comparison.

@xingchensong xingchensong merged commit f60d9e4 into master Sep 11, 2023
@xingchensong xingchensong deleted the xcsong-fix-0to9 branch September 11, 2023 02:10
2 participants