Enable supported ECI charsets dynamically #1625

gredler · 2023-05-20T03:35:09Z

Back in 2021, commit 895775e removed support for 5 ECIs in order to fix a fuzz test failure. While it is true that some JVMs do not support some of these character sets, many of them do. By not supporting these ECIs, ZXing is unable to decode barcodes encoded in other systems which do support them. In my specific case, this is causing issues while fuzz testing Okapi Barcode, where the fuzz tests involve encoding barcodes via Okapi and decoding them via ZXing.

With this change, instead of completely disabling ECIs 8, 10, 12, 13 and 16, the system will dynamically decide which ECIs to support based on the character sets supported by the underlying runtime. This allows restricted JVMs to function correctly without undue burden on more common, less restricted JVMs.

Support is also added (via the system property zxing.remove.eci.charsets) to force specific ECIs / charsets to remain unsupported, in order to simulate more restricted runtimes. This flexibility is used during the core unit tests to ensure reproducible builds regardless of the JVM used (and HighLevelEncodeTestCase.testECIs(), updated after the removal of the 5 ECIs, expects that they are unsupported).

srowen · 2023-05-20T03:53:53Z

core/src/main/java/com/google/zxing/common/CharacterSetECI.java

  private static final Map<Integer,CharacterSetECI> VALUE_TO_ECI = new HashMap<>();
  private static final Map<String,CharacterSetECI> NAME_TO_ECI = new HashMap<>();
  static {
+    String removeProp = System.getProperty("zxing.remove.eci.charsets", "");


Instead of special-casing tests, I wonder if we should just test somehow for support of each encoding? It might be some overhead to check every single one when most are almost never used; maybe smarter to just catch and exception somewhere and abort. I'm a little hesitant to test differently from how the library would run.

With the proposed change, CharacterSetECI is both checking for support of each encoding (via Charset.isSupported()), and also allowing an override via the new property. Is this what you meant? Or were you referring to possibly checking for support of each encoding in the unit tests and adjusting test assertions according to the runtime being used to build? This second option seemed a bit more brittle to me, as opposed to just having all builds on all JVMs simulate a more restricted JVM (via the new property).

I could also remove the override property completely, and adjust HighLevelEncodeTestCase.testECIs() to assume that ECI 8 (ISO-8859-6) is supported on the JVM running the test -- this test was added after ECI 8 support was removed, so currently assumes the use of ECI 24 (Cp1256) in its place.

OK. I would just skip the system property. If it can support whatever the JVM supports without failing otherwise, that's the goal and that's fine.

OK, I think we're good to go. Please have a look at the latest diff and let me know if you have any other concerns. Thanks!

srowen · 2023-05-22T22:15:16Z

core/src/test/java/com/google/zxing/datamatrix/encoder/HighLevelEncodeTestCase.java

@@ -500,7 +500,7 @@ public void testECIs() {
        "\u062C\u064E\u0651\u0627\u0635 (\u02BE\u0101\u1E63) \"pear\", suggested to have originated from Hebrew " +
        "\u05D0\u05B7\u05D2\u05B8\u05BC\u05E1 (ag\u00E1s)"));
    assertEquals("239 209 151 206 214 92 122 140 35 158 144 162 52 205 55 171 137 23 67 206 218 175 147 113 15 254" +
-        " 116 33 241 25 231 186 14 212 64 253 151 252 159 33 41 241 27 231 83 171 53 209 35 25 134 6 42 33 35 239 184" +
+        " 116 33 241 9 231 186 14 206 64 248 144 252 159 33 41 241 27 231 83 171 53 209 35 25 134 6 42 33 35 239 184" +


Ah OK, this change is slightly concerning. Do you know why it happens? I can probably go figure it out. Presumably because some new charset was chosen

Correct 241 + 25 = ECI 24 (Cp1256), and 241 + 9 = ECI 8 (ISO-8859-6). This test was added after ECI 8 was commented out, so ECI 24 was the next-best choice at the time. By adding ECI 8 back in, this test now prefers it over ECI 24.

gredler added 2 commits May 19, 2023 22:41

Enable supported ECI charsets dynamically

b8ddbf2

Remove use of Java 9+ API

eaf438a

srowen reviewed May 20, 2023

View reviewed changes

Remove property-based override for dynamic ECI charsets

253d947

srowen reviewed May 22, 2023

View reviewed changes

srowen merged commit b892c40 into zxing:master May 22, 2023
5 checks passed

srowen assigned gredler May 22, 2023

srowen added the enhancement label May 22, 2023

srowen added this to the 3.5.2 milestone May 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable supported ECI charsets dynamically #1625

Enable supported ECI charsets dynamically #1625

gredler commented May 20, 2023

srowen May 20, 2023

gredler May 20, 2023

gredler May 20, 2023

srowen May 20, 2023

gredler May 22, 2023

srowen May 22, 2023

gredler May 22, 2023

Enable supported ECI charsets dynamically #1625

Enable supported ECI charsets dynamically #1625

Conversation

gredler commented May 20, 2023

srowen May 20, 2023

Choose a reason for hiding this comment

gredler May 20, 2023

Choose a reason for hiding this comment

gredler May 20, 2023

Choose a reason for hiding this comment

srowen May 20, 2023

Choose a reason for hiding this comment

gredler May 22, 2023

Choose a reason for hiding this comment

srowen May 22, 2023

Choose a reason for hiding this comment

gredler May 22, 2023

Choose a reason for hiding this comment