PDF Barcode generation giving strange results #166

Closed
ashore8093 opened this Issue Jun 5, 2014 · 26 comments

ashore8093 commented Jun 5, 2014

Platform: Android 4.0+
Zxing Version: 3.0.1
Barcode Format: PDF417

Since the CP437 encoding isn't available on Android, I've tried setting different encodings to render a PDF417 barcode, but the encoder prepends extra characters that vary with the encoding.

US-ASCII
Prepends with 'A'

ISO-8859-1
Prepends with 'AB'

UTF-8
Prepends with 'A\s'

UTF-16 prepends stranger characters and actually drops a character from the original source.

I'm using the following to set the character set:
hints.put(EncodeHintType.CHARACTER_SET, "US-ASCII");

Is there any way to prevent the rendered barcode from prefixing these characters?

srowen Jun 5, 2014

Contributor

I believe you're seeing the ECI segment. This is how the character set is encoded in the barcode. It's required to interpret it correctly if you're not using the default encoding, so no I don't think it's optional.

@srowen srowen closed this Jun 5, 2014

ashore8093 Jun 5, 2014

Should this actually show up in the scanned data though? I would think the ECI would be just metadata that tells the barcode scanner how to decode the payload which would not include the ECI segment.

srowen Jun 5, 2014

Contributor

No, but how are you scanning it? With the library? I thought it handled or at least skipped ECI, but off the top of my head I am not sure.

ashore8093 Jun 5, 2014

Scanning with the Barcode Scanner app on Android and at the retailer (Walgreens in this case). We also use the library to scan within our app and all give the same results.

micjahn Jun 5, 2014

Contributor

No, the ECI segments are interpreted as text. zxing doesn't skip or handle it correctly in the decoder part.

ashore8093 Jun 5, 2014

Making sure I'm reading this right. You're saying it's not a bug in encoding the barcode but rather in scanning and decoding a PDF417 barcode that uses an encoding other than the default?

If that's the case, then it should work correctly at the retailer, which uses a commercial barcode scanner. That doesn't seem to be the case, though.

srowen Jun 5, 2014

Contributor

I need to have a look to verify, but that could well be the case. ECI is an optional part of PDF417 it seems, but seems essential if you use non-standard encoding. It may be that the device also does not parse ECI for what it is.

@srowen srowen reopened this Jun 5, 2014

@srowen srowen self-assigned this Jun 5, 2014

@srowen srowen added this to the 3.1.1 milestone Jun 5, 2014

srowen Jun 5, 2014

Contributor

Yeah, this is going to need a bit of work. The decoder does not read the ECI. I can make it at least consume the ECI segment without using it, but that's only part of the work. Another commit coming later.
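
As a rough illustration of what "consume the ECI segment without using it" could look like, here is a toy sketch in plain Java. It is not ZXing's decoder. The class and method names are made up, and the three marker codewords (925, 926, 927) and the number of codewords that follow each are assumptions from memory of the PDF417 symbology that should be checked against the spec.

```java
import java.util.ArrayList;
import java.util.List;

final class EciSkipSketch {
  // Assumed ECI marker codewords; verify against ISO/IEC 15438 before relying on them.
  static final int ECI_USER_DEFINED = 925;    // assumed: followed by 1 codeword
  static final int ECI_GENERAL_PURPOSE = 926; // assumed: followed by 2 codewords
  static final int ECI_CHARSET = 927;         // assumed: followed by 1 codeword

  /** Returns the codewords with every ECI marker and its argument codewords removed. */
  static List<Integer> stripEci(int[] codewords) {
    List<Integer> data = new ArrayList<>();
    int i = 0;
    while (i < codewords.length) {
      int cw = codewords[i++];
      switch (cw) {
        case ECI_CHARSET:
        case ECI_USER_DEFINED:
          i += 1; // skip the single argument codeword
          break;
        case ECI_GENERAL_PURPOSE:
          i += 2; // skip both argument codewords
          break;
        default:
          data.add(cw); // ordinary data codeword, keep it
      }
    }
    return data;
  }

  public static void main(String[] args) {
    // An ECI marker with one argument, then two data codewords.
    System.out.println(stripEci(new int[] {927, 26, 354, 359}));
  }
}
```

Skipping the segment only stops it from leaking into the decoded text. Actually applying the announced character set to the following byte data is the remaining part of the work mentioned above.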

ashore8093 Jun 5, 2014

Thanks Sean! Please let me know if you need any additional information from me.

srowen Jun 6, 2014

Contributor

@micjahn By the way do you have a reference that Cp437 is the default encoding for PDF417? I looked in the spec again and don't see this. I see a mention or two online but not sure if it's reliable? One online decoder I used doesn't seem to apply Cp437 at all but it could be wrong.

That's not directly relevant to the issue since no matter what the default, it should both be consistent and settable. But if the default weren't actually Cp437 that would make it easy to change to a consistent state (basically, ISO-8859-1 interpretation) and then worry about the setting later.

This change is a bit hard to implement since the encoding has to be applied any time byte mode is latched to.

micjahn Jun 6, 2014

Contributor

I found a hint in the ISO spec 15438:2006(E), annex B (default character set table).
It says that ISO 8859-1 is the default encoding, not Cp437. Perhaps that differs from the older (and cancelled) 2001 version.
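
To make the difference between the two candidate defaults concrete, here is a small sketch in plain Java (no ZXing involved; the class and method names are invented for illustration). It decodes the same bytes as ISO-8859-1 and, only if the runtime provides it, as Cp437. ASCII-only payloads like the ones in this issue decode identically under both; only bytes above 0x7F differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DefaultCharsetSketch {

  /** Interprets raw barcode bytes with the given charset. */
  static String decode(byte[] data, Charset charset) {
    return new String(data, charset);
  }

  /** Cp437 is an optional charset, so check before asking for it. */
  static boolean hasCp437() {
    return Charset.isSupported("Cp437");
  }

  public static void main(String[] args) {
    byte[] ascii = "LYL010262785711".getBytes(StandardCharsets.US_ASCII);
    byte[] high = {(byte) 0x43, (byte) 0x61, (byte) 0x66, (byte) 0xE9}; // "Caf" + byte 0xE9

    System.out.println(decode(ascii, StandardCharsets.ISO_8859_1));
    System.out.println(decode(high, StandardCharsets.ISO_8859_1)); // 0xE9 is U+00E9 here

    if (hasCp437()) {
      Charset cp437 = Charset.forName("Cp437");
      System.out.println(decode(ascii, cp437)); // ASCII range is shared, same text
      System.out.println(decode(high, cp437));  // 0xE9 maps to a different character
    } else {
      System.out.println("Cp437 is not available on this runtime");
    }
  }
}
```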

srowen Jun 6, 2014

Contributor

Yeah you're right. I am not sure Cp437 is actually the default. Well, if I addressed that much, then the encoder/decoder would be back in sync and probably match everyone's expectation. I might start there. Will wait for comments from anyone else that knows more about the default encoding.

micjahn Jun 6, 2014

Contributor

It would be cool to switch to the more common ISO 8859-1 code page. That would also be a good reason to release a new version of zxing.net soon. Cp437 has caused some trouble for users of the library.

srowen Jun 8, 2014

Contributor

OK I think these commits address the issue, and also fix the related issue that Cp437 seems to not have been the right default anyway.

@srowen srowen closed this Jun 8, 2014

ashore8093 Jun 9, 2014

That did seem to fix the issue with some barcodes although some are still giving me trouble. It seems that if there are non-numeric characters in the data (usually at the beginning) and the data is sufficiently long (greater than 15 characters) weird things start happening.

This for example will decode just fine but will not encode properly
LYL01026278571111222233333

This however decodes and encodes just fine
LYL010262785711

This won't decode properly
L0102627857111

The problem seems more prevalent when the characters are at the beginning of the data, but this will also fail to encode correctly:
0LYL1026278571111222233333

srowen Jun 9, 2014

Contributor

What do you mean that it does not encode? An error? And are you setting a charset?

ashore8093 Jun 9, 2014

The rendered barcode does not reflect the data encoded. So encoding 'LYL01026278571111222233333' will generate a barcode that decodes into 'CQYL01026278571111222233333'

micjahn Jun 9, 2014

Contributor

The PDF417 decoder doesn't correctly interpret the MODE_SHIFT_TO_BYTE_COMPACTION_MODE.
With the following patch the complete roundtrip encoder-decoder should work for all samples above:

 .../zxing/pdf417/decoder/DecodedBitStreamParser.java      | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
diff --git a/core/src/main/java/com/google/zxing/pdf417/decoder/DecodedBitStreamParser.java b/core/src/main/java/com/google/zxing/pdf417/decoder/DecodedBitStreamParser.java
index ab97a55..750b412 100644
--- a/core/src/main/java/com/google/zxing/pdf417/decoder/DecodedBitStreamParser.java
+++ b/core/src/main/java/com/google/zxing/pdf417/decoder/DecodedBitStreamParser.java
@@ -501,6 +501,21 @@ final class DecodedBitStreamParser {
           count = 0;
         }
       }
+    } else if (mode == MODE_SHIFT_TO_BYTE_COMPACTION_MODE) {
+      if (codeIndex < codewords[0]) {
+        int code = codewords[codeIndex++];
+        if (code == TEXT_COMPACTION_MODE_LATCH ||
+            code == BYTE_COMPACTION_MODE_LATCH ||
+            code == NUMERIC_COMPACTION_MODE_LATCH ||
+            code == BYTE_COMPACTION_MODE_LATCH_6 ||
+            code == BEGIN_MACRO_PDF417_CONTROL_BLOCK ||
+            code == BEGIN_MACRO_PDF417_OPTIONAL_FIELD ||
+            code == MACRO_PDF417_TERMINATOR) {
+          codeIndex--;
+        } else {
+          result.append(code);
+        }
+      }
     }
     return codeIndex;
   }

srowen Jun 10, 2014

Contributor

Is this a patch against HEAD? That doesn't look like the current last lines of this method. I don't know if my last change is relevant to this change though. You mean you have to handle a shift to byte compaction while in byte compaction?

srowen Jun 10, 2014

Contributor

OK or put another way, it looks like if MODE_SHIFT_TO_BYTE_COMPACTION_MODE is encountered in the main loop, it should not go into the byteCompaction method at all since it actually does nothing with it and is not triggering a real latch to byte mode. Instead it should just decode the next codeword as a single character.

So at about DecodedBitStreamParser:114:

        case MODE_SHIFT_TO_BYTE_COMPACTION_MODE:
          result.append((char) codewords[codeIndex++]);
          break;

It seems to give the desired behavior but does it make sense?

micjahn Jun 10, 2014

Contributor

In my opinion it makes sense and I would go with your version.
The codeword MODE_SHIFT_TO_BYTE_COMPACTION_MODE (913) is defined in the spec as a temporary switch from text compaction to byte compaction for the next codeword only. After that, it reverts directly to the previous text compaction sub-mode.
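
The shift behaviour described here can be sketched with a toy decoder. This is not ZXing's DecodedBitStreamParser; the class name is invented, and the model is deliberately reduced: it assumes text compaction stays in the alpha sub-mode, where each codeword carries two values (codeword / 30 and codeword % 30), values 0 to 25 map to A to Z, 26 is a space, and the sub-mode switch values 27 to 29 are simply ignored.

```java
final class ShiftToByteSketch {
  static final int SHIFT_TO_BYTE = 913; // one-codeword shift, as described above

  /** Decodes alpha-sub-mode text codewords, honouring single-codeword byte shifts. */
  static String decode(int[] codewords) {
    StringBuilder out = new StringBuilder();
    int i = 0;
    while (i < codewords.length) {
      int cw = codewords[i++];
      if (cw == SHIFT_TO_BYTE) {
        // Exactly one following codeword is a raw byte value; then text mode resumes.
        if (i < codewords.length) {
          out.append((char) codewords[i++]);
        }
      } else if (cw < 900) {
        appendAlpha(out, cw / 30); // high value
        appendAlpha(out, cw % 30); // low value
      } else {
        throw new IllegalArgumentException("Codeword not handled by this sketch: " + cw);
      }
    }
    return out.toString();
  }

  private static void appendAlpha(StringBuilder out, int value) {
    if (value < 26) {
      out.append((char) ('A' + value));
    } else if (value == 26) {
      out.append(' ');
    }
    // 27..29 are sub-mode switches in the real symbology; ignored in this sketch.
  }

  public static void main(String[] args) {
    // 913 shifts for one byte (76 = 'L'); 359 = 30 * 11 + 29, i.e. 'L' plus a pad value.
    System.out.println(decode(new int[] {913, 76, 359}));
  }
}
```

The point of the sketch is the first branch: the shift consumes one codeword as a byte and falls straight back into the text loop, without entering any byte-compaction routine.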

srowen added a commit that referenced this issue Jun 10, 2014

srowen Jun 10, 2014

Contributor

I will apply this change.

There was no previous compaction mode in this case; it was the first code word. I think the default is text compaction and that's OK. The encoder here latches to byte compaction three times to encode the first three chars of LYL01026278571111222233333 and this sounds weird.

Stepping through the encoder, the logic seems to be:

  • No consecutive digits to start, so numeric compaction is out
  • 3 consecutive text characters to start, followed by at least 13 digits, all of which are text-encodable. But text compaction is skipped because it sees a lot of numeric characters coming up within 5 chars, presumably so that it can latch to numeric soon and take advantage of numeric compaction
  • Byte compaction sees at least 5 text-encodable characters to start, and so concludes that it should not begin byte compaction here
  • It actually rejects all compactions and so defaults to shift to byte mode for 1 character

I think byte compaction is assuming text compaction should take over in cases where text compaction won't.
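
The lookahead walked through above can be approximated with a short sketch. This is a simplified reconstruction for illustration, not the code in PDF417HighLevelEncoder: the names are invented, "text-encodable" is approximated as printable ASCII plus tab, LF and CR, the thresholds of 13 digits and 5 text characters are the ones mentioned in this thread, and the separate byte-compaction check is collapsed into a single fallback.

```java
final class CompactionChoiceSketch {
  static final int NUMERIC_THRESHOLD = 13; // digit run length that triggers numeric compaction
  static final int TEXT_THRESHOLD = 5;     // text run length that triggers text compaction

  static boolean isDigit(char ch) {
    return ch >= '0' && ch <= '9';
  }

  // Assumption: printable ASCII plus tab, LF and CR count as text-encodable.
  static boolean isText(char ch) {
    return ch == '\t' || ch == '\n' || ch == '\r' || (ch >= 32 && ch <= 126);
  }

  /** Number of consecutive digits starting at start. */
  static int digitRun(String msg, int start) {
    int n = 0;
    while (start + n < msg.length() && isDigit(msg.charAt(start + n))) {
      n++;
    }
    return n;
  }

  /** Text characters from start, stopping before any digit run long enough for numeric mode. */
  static int textRun(String msg, int start) {
    int idx = start;
    while (idx < msg.length()) {
      int digits = digitRun(msg, idx);
      if (digits >= NUMERIC_THRESHOLD) {
        break; // leave the long digit run to numeric compaction
      }
      if (digits > 0) {
        idx += digits; // short digit runs are encoded as text
        continue;
      }
      if (!isText(msg.charAt(idx))) {
        break;
      }
      idx++;
    }
    return idx - start;
  }

  /** Picks a mode for the position start; the fallback is the one-character byte shift. */
  static String chooseMode(String msg, int start) {
    if (digitRun(msg, start) >= NUMERIC_THRESHOLD) {
      return "NUMERIC";
    }
    if (textRun(msg, start) >= TEXT_THRESHOLD) {
      return "TEXT";
    }
    return "SHIFT_TO_BYTE";
  }

  public static void main(String[] args) {
    System.out.println(chooseMode("LYL010262785711", 0));
    System.out.println(chooseMode("LYL01026278571111222233333", 0));
  }
}
```

Under these assumptions, LYL010262785711 has only 12 trailing digits, so the whole string counts as text. LYL01026278571111222233333 has a text run of just 3 before the long digit run, so it falls through to the byte shift, which lines up with the inputs reported above as working and failing.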

I propose chopping out this block of text at about PDF417HighLevelEncoder:557

      int textCount = 0;
      while (textCount < 5 && isText(ch)) {
        textCount++;
        int i = idx + textCount;
        if (i >= len) {
          break;
        }
        ch = msg.charAt(i);
      }
      if (textCount >= 5) {
        return idx - startpos;
      }

ashore8093 Jun 10, 2014

I think this is on the right track but with Sean's most recent commit I'm still experiencing the same issue. I was playing around with the source and I was able to get it to work by changing the threshold for consecutive text from 5 to just 1 text character to avoid the byte compaction altogether. I don't understand the code well enough to know if this is an appropriate fix.

if (t > 0 || n == len) {
    if (encodingMode != TEXT_COMPACTION) {
        sb.append((char) LATCH_TO_TEXT);
        encodingMode = TEXT_COMPACTION;
        textSubMode = SUBMODE_ALPHA; // start with submode alpha after latch
    }
    textSubMode = encodeText(msg, p, t, sb, textSubMode);
    p += t;
}

ashore8093 Jun 10, 2014

Just saw Sean's comment "I propose chopping out this block of text at about PDF417HighLevelEncoder:557". I tried this fix and it seems to work as well.

srowen added a commit that referenced this issue Jun 10, 2014

Issue #166 let byte compaction proceed in cases where it 'thought' text compaction would take over, but it had not

androidneha Sep 12, 2017

You can try this:

import java.io.UnsupportedEncodingException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
import android.util.Log;

try {
    // Encode a String into bytes
    String inputString = "00oDL9CAU7408tue0920171130";
    byte[] input = inputString.getBytes("UTF-8");

    // Compress the bytes
    byte[] output = new byte[100];
    Deflater compresser = new Deflater();
    compresser.setInput(input);
    compresser.finish();
    int compressedDataLength = compresser.deflate(output);
    compresser.end();

    // Decompress the bytes
    Inflater decompresser = new Inflater();
    decompresser.setInput(output, 0, compressedDataLength);
    byte[] result = new byte[100];
    int resultLength = decompresser.inflate(result);
    decompresser.end();

    // Decode the bytes into a String
    String outputString = new String(result, 0, resultLength, "UTF-8");

    // Log the round-tripped text and the compressed size
    // (concatenating the byte array itself would only print its identity)
    Log.e("outputString", outputString + " / " + compressedDataLength + " bytes");
} catch (UnsupportedEncodingException ex) {
    // Thrown if the runtime does not know the "UTF-8" charset name
    Log.e("outputString", ex.toString());
} catch (DataFormatException ex) {
    // Thrown by inflate() if the compressed data is malformed
    Log.e("outputString", ex.toString());
}
