Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable supported ECI charsets dynamically #1625

Merged
merged 3 commits into from
May 22, 2023

Conversation

gredler
Copy link
Contributor

@gredler gredler commented May 20, 2023

Back in 2021, commit 895775e removed support for 5 ECIs in order to fix a fuzz test failure. While it is true that some JVMs do not support some of these character sets, many of them do. By not supporting these ECIs, ZXing is unable to decode barcodes encoded in other systems which do support them. In my specific case, this is causing issues while fuzz testing Okapi Barcode, where the fuzz tests involve encoding barcodes via Okapi and decoding them via ZXing.

With this change, instead of completely disabling ECIs 8, 10, 12, 13 and 16, the system will dynamically decide which ECIs to support based on the character sets supported by the underlying runtime. This allows restricted JVMs to function correctly without undue burden on more common, less restricted JVMs.

Support is also added (via the system property zxing.remove.eci.charsets) to force specific ECIs / charsets to remain unsupported, in order to simulate more restricted runtimes. This flexibility is used during the core unit tests to ensure reproducible builds regardless of the JVM used (and HighLevelEncodeTestCase.testECIs(), updated after the removal of the 5 ECIs, expects that they are unsupported).

private static final Map<Integer,CharacterSetECI> VALUE_TO_ECI = new HashMap<>();
private static final Map<String,CharacterSetECI> NAME_TO_ECI = new HashMap<>();
static {
String removeProp = System.getProperty("zxing.remove.eci.charsets", "");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of special-casing tests, I wonder if we should just test somehow for support of each encoding? It might be some overhead to check every single one when most are almost never used; maybe smarter to just catch and exception somewhere and abort. I'm a little hesitant to test differently from how the library would run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the proposed change, CharacterSetECI is both checking for support of each encoding (via Charset.isSupported()), and also allowing an override via the new property. Is this what you meant? Or were you referring to possibly checking for support of each encoding in the unit tests and adjusting test assertions according to the runtime being used to build? This second option seemed a bit more brittle to me, as opposed to just having all builds on all JVMs simulate a more restricted JVM (via the new property).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could also remove the override property completely, and adjust HighLevelEncodeTestCase.testECIs() to assume that ECI 8 (ISO-8859-6) is supported on the JVM running the test -- this test was added after ECI 8 support was removed, so currently assumes the use of ECI 24 (Cp1256) in its place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I would just skip the system property. If it can support whatever the JVM supports without failing otherwise, that's the goal and that's fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think we're good to go. Please have a look at the latest diff and let me know if you have any other concerns. Thanks!

@@ -500,7 +500,7 @@ public void testECIs() {
"\u062C\u064E\u0651\u0627\u0635 (\u02BE\u0101\u1E63) \"pear\", suggested to have originated from Hebrew " +
"\u05D0\u05B7\u05D2\u05B8\u05BC\u05E1 (ag\u00E1s)"));
assertEquals("239 209 151 206 214 92 122 140 35 158 144 162 52 205 55 171 137 23 67 206 218 175 147 113 15 254" +
" 116 33 241 25 231 186 14 212 64 253 151 252 159 33 41 241 27 231 83 171 53 209 35 25 134 6 42 33 35 239 184" +
" 116 33 241 9 231 186 14 206 64 248 144 252 159 33 41 241 27 231 83 171 53 209 35 25 134 6 42 33 35 239 184" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK, this change is slightly concerning. Do you know why it happens? I can probably go figure it out. Presumably because some new charset was chosen

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct 241 + 25 = ECI 24 (Cp1256), and 241 + 9 = ECI 8 (ISO-8859-6). This test was added after ECI 8 was commented out, so ECI 24 was the next-best choice at the time. By adding ECI 8 back in, this test now prefers it over ECI 24.

@srowen srowen merged commit b892c40 into zxing:master May 22, 2023
5 checks passed
@srowen srowen added this to the 3.5.2 milestone May 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

2 participants