Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICU-22285 omit the gb2312 & big5han collation tailorings by default #2365

Merged
merged 1 commit into from Mar 14, 2023

Conversation

markusicu
Copy link
Member

This reduces the size of coll/zh.res from 918660 bytes to 682904 bytes.

Checklist
  • Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-22285
  • Required: The PR title must be prefixed with a JIRA Issue number.
  • Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
  • Required: Each commit message must be prefixed with a JIRA Issue number.
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable

@FrankYFTang
Copy link
Contributor

./main/tests/collate/src/com/ibm/icu/dev/test/collator/CollationServiceTest.java
./main/tests/collate/src/com/ibm/icu/dev/test/util/ICUResourceBundleCollationTest.java
./main/tests/core/src/com/ibm/icu/dev/test/util/TestLocaleValidity.java

need to be changes to remove the test against big5han and gb2312han

@FrankYFTang
Copy link
Contributor

https://github.com/unicode-org/icu/actions/runs/4410702565/jobs/7728445138#step:7:1

name "*.txt" -exec grep -l FAILED {} \;`;
Testsuite: com.ibm.icu.dev.test.util.ICUResourceBundleCollationTest
Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.043 sec

Testcase: TestFunctionalEquivalent took 0.01 sec
	FAILED
16:  Error, expected  Equiv=false		zh_TW_STROKE@collation=big5han		--> zh@collation=big5han,  but got false zh@collation=stroke
junit.framework.AssertionFailedError: 16:  Error, expected  Equiv=false		zh_TW_STROKE@collation=big5han		--> zh@collation=big5han,  but got false zh@collation=stroke
	at com.ibm.icu.dev.test.AbstractTestLog.errln(AbstractTestLog.java:50)
	at com.ibm.icu.dev.test.util.ICUResourceBundleCollationTest.getFunctionalEquivalentTestCases(ICUResourceBundleCollationTest.java:1[7](https://github.com/unicode-org/icu/actions/runs/4410702565/jobs/7728445138#step:7:8)9)
	at com.ibm.icu.dev.test.util.ICUResourceBundleCollationTest.TestFunctionalEquivalent(ICUResourceBundleCollationTest.java:7[8](https://github.com/unicode-org/icu/actions/runs/4410702565/jobs/7728445138#step:7:9))
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

Testcase: TestOpen took 0 sec
Testcase: TestKeywordValues took 0 sec
Testcase: TestGetWithFallback took 0.004 sec

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4j/main/shared/data/icudata.jar is now changed in the branch
  • icu4j/main/tests/collate/src/com/ibm/icu/dev/test/util/ICUResourceBundleCollationTest.java is now changed in the branch
  • icu4j/maven-build/maven-icu4j-datafiles/src/main/resources/com/ibm/icu/impl/data/icudt73b/coll/zh.res is now changed in the branch

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@FrankYFTang
Copy link
Contributor

looks like you have merge conflict and need a rebase/merge

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4j/main/shared/data/icudata.jar is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@markusicu
Copy link
Member Author

Amended with Java changes, and rebased.

Copy link
Contributor

@echeran echeran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@markusicu markusicu merged commit 2d9fa3f into unicode-org:main Mar 14, 2023
79 checks passed
@markusicu markusicu deleted the omit-han-charset-collations branch March 14, 2023 22:20
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this was merged with me as assignee in less than 24 hours before I had a chance to open and review it)

The docs do not explain how to reset the rules back to include big5han and gb2312han. If the idea is to add a rule such as "+/collations", please document that, and also test it by adding that rule to icu/.ci-builds/data-filter.json and then run a test to ensure that the collations work by adding it here:

https://github.com/unicode-org/icu/blob/main/.ci-builds/.azure-pipelines.yml#L62

@markusicu
Copy link
Member Author

The docs do not explain how to reset the rules back to include big5han and gb2312han. If the idea is to add a rule such as "+/collations", please document that,

Ok --> PR #2371

and also test it by adding that rule to icu/.ci-builds/data-filter.json and then run a test to ensure that the collations work by adding it here:

https://github.com/unicode-org/icu/blob/main/.ci-builds/.azure-pipelines.yml#L62

I tested it manually and verified that coll/zh.res is back to its old size, but I don't know how to write a test that works with and without these tailorings in the data, and tells us something useful in CI. Ideas?

@sffc
Copy link
Member

sffc commented Mar 15, 2023

Ideas?

Pass a config somehow to the test? Lots of ways to do that: environment variable at runtime, -D variable at compile time, new argument to intltest, existing argument to intltest, ...

I think it's worthwhile testing. Otherwise there's no code anywhere that verifies that these collations even work or run to completion; we may as well delete them entirely.

@markusicu
Copy link
Member Author

Pass a config somehow to the test? Lots of ways to do that: environment variable at runtime, -D variable at compile time, new argument to intltest, existing argument to intltest, ...

I might create a follow-up ticket for that.

I think it's worthwhile testing. Otherwise there's no code anywhere that verifies that these collations even work or run to completion; we may as well delete them entirely.

There is a lot that we don't test. In particular, we don't test each collation tailoring for whether it's there and does something interesting. Doing that would require more data-driven testing, because hardcoding such detailed behaviors would be too brittle.

webkit-commit-queue pushed a commit to shvaikalesh/WebKit that referenced this pull request Feb 7, 2024
… expectations

https://bugs.webkit.org/show_bug.cgi?id=268879
<rdar://problem/122437088>

Reviewed by Yusuke Suzuki and Mark Lam.

These collation types were deprecated for a long time and not available in Chrome / Firefox [1],
and finally removed in ICU 74 [2].

[1]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Locale/getCollations#supported_collation_types
[2]: unicode-org/icu#2365

* JSTests/stress/intl-enumeration.js:
* JSTests/stress/intl-locale-info.js:

Canonical link: https://commits.webkit.org/274236@main
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants