Fix/utf8 fh decoding #28

Merged
u8array merged 4 commits into main from fix/utf8-fh-decoding
May 8, 2026

@u8array u8array commented May 8, 2026

No description provided.

u8array added 2 commits May 8, 2026 16:26
Previously decodeFH() converted each hex pair to a character via
String.fromCharCode, mangling multi-byte UTF-8 sequences (e.g. _C3_A4
became 'Ã¤' instead of 'ä'). Since the generator emits ^CI28, third-party
ZPL using ^FH for non-ASCII glyphs round-tripped through mojibake.

Collect contiguous escape pairs into a Uint8Array and decode the run via
TextDecoder('utf-8'); invalid byte sequences fall back to U+FFFD.
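
The difference between the two approaches can be reproduced in a few lines. This is a minimal sketch rather than the code from the PR: `perByte` and `asUtf8` are hypothetical helper names, and the bytes are the `_C3_A4` example from the commit message.

```typescript
// Bytes for the escape run _C3_A4 (the UTF-8 encoding of 'ä').
const bytes = new Uint8Array([0xc3, 0xa4]);

// Old behaviour (sketch): one character per byte, which reads UTF-8 as Latin-1.
function perByte(run: Uint8Array): string {
  return Array.from(run, (b) => String.fromCharCode(b)).join("");
}

// New behaviour (sketch): decode the whole run at once.
function asUtf8(run: Uint8Array): string {
  return new TextDecoder("utf-8").decode(run);
}

const mangled = perByte(bytes); // two characters: U+00C3 U+00A4
const decoded = asUtf8(bytes); // one character: U+00E4
// A lone 0xE4 is not valid UTF-8, so it decodes to U+FFFD.
const invalid = asUtf8(new Uint8Array([0xe4]));
```
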
Replace inner matchAll + impossible-case fallback with a stride loop:
the outer regex already guarantees the run is a sequence of fixed-width
{delim}XX pairs, so byte offsets are computable directly. One regex
allocation per call, fixed Uint8Array, no defensive ?? fallback.
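
The stride-loop approach described above might look roughly like the following. The signature, the default delimiter `_`, and the `decoder` parameter are assumptions made for illustration; the actual `decodeFH` in `src/lib/zplParser.ts` is not shown in this conversation and may differ.

```typescript
// Hypothetical sketch of a ^FH decoder; signature and names are assumptions.
function decodeFH(
  text: string,
  delim: string = "_",
  decoder: TextDecoder = new TextDecoder("utf-8"),
): string {
  // Escape the delimiter so characters such as '$' or '\' match literally.
  const d = delim.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // One regex per call: a run is one or more contiguous {delim}XX pairs.
  const runs = new RegExp(`(?:${d}[0-9A-Fa-f]{2})+`, "g");
  const stride = delim.length + 2;

  return text.replace(runs, (run) => {
    // The regex guarantees fixed-width pairs, so the byte count is exact.
    const bytes = new Uint8Array(run.length / stride);
    for (let i = 0; i < bytes.length; i++) {
      const start = i * stride + delim.length;
      bytes[i] = parseInt(run.slice(start, start + 2), 16);
    }
    // Invalid sequences come back as U+FFFD (TextDecoder's default mode).
    return decoder.decode(bytes);
  });
}
```

With these assumptions, `decodeFH("Gr_C3_B6_C3_9Fe")` decodes the two escape pairs for `ö` and the two for `ß` as a single four-byte run, and an incomplete pair such as `_4` is left in the text untouched.
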

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the decodeFH function to support multi-byte UTF-8 characters by grouping contiguous hex escape sequences and decoding them via TextDecoder, and adds corresponding test cases. However, feedback indicates that hardcoding the UTF-8 decoder introduces a regression for labels using single-byte encodings (such as CP1252), where valid characters may be incorrectly replaced with the Unicode replacement character. It is recommended to track the active encoding from the ^CI command to maintain compatibility across different ZPL configurations.

Comment thread src/lib/zplParser.ts Outdated
The ^FH decoder was hardcoded to UTF-8, which broke single-byte
encodings (^CI27 / ^CI0..13) where bytes like 0xE4 (= ä in CP1252) are
valid characters but are invalid as standalone UTF-8 and decoded to U+FFFD.

Track ^CI state in the parser and map known values to TextDecoder labels:
  - ^CI28        → utf-8
  - ^CI27        → windows-1252
  - ^CI0..^CI13  → windows-1252 (ASCII-compatible legacy variants)
  - others       → keep current decoder, surface as partial import

Decoders are cached to avoid per-field allocation. Default remains UTF-8
to preserve round-trip fidelity for this app's own generator output.
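
The mapping and the decoder cache could be sketched as below. Only the name `ciToEncoding` and the label values come from the commit messages in this PR; the `CiEncoding` shape, the `supported` flag (one assumed way to surface a partial import), and `getDecoder` are hypothetical. Returning `'utf-8'` for unsupported values follows the later commit message in this conversation.

```typescript
// Hypothetical return shape; the PR's actual type is not shown.
interface CiEncoding {
  label: string; // label passed to TextDecoder
  supported: boolean; // false: caller may flag the import as partial
}

function ciToEncoding(n: number): CiEncoding {
  if (n === 28) return { label: "utf-8", supported: true };
  if (n === 27) return { label: "windows-1252", supported: true };
  // ^CI0..^CI13: treated as ASCII-compatible legacy variants.
  if (Number.isInteger(n) && n >= 0 && n <= 13) {
    return { label: "windows-1252", supported: true };
  }
  return { label: "utf-8", supported: false };
}

// One TextDecoder per label, created lazily and reused across fields.
const decoderCache = new Map<string, TextDecoder>();

function getDecoder(label: string): TextDecoder {
  let decoder = decoderCache.get(label);
  if (!decoder) {
    decoder = new TextDecoder(label);
    decoderCache.set(label, decoder);
  }
  return decoder;
}
```

Under this sketch, the byte 0xE4 decodes to `ä` through `getDecoder(ciToEncoding(27).label)`, while the same byte through the UTF-8 decoder yields U+FFFD.
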

@u8array u8array commented May 8, 2026

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request enhances the ZPL parser's ^FH hex escape decoding by supporting multi-byte UTF-8 sequences and character set selection via the ^CI command. It introduces a TextDecoder cache and updates the decoding logic to process contiguous hex escape runs as byte sequences. Feedback was provided regarding the ^CI command handler, which currently fails to reset the decoder to UTF-8 when an unsupported encoding is encountered, contradicting the intended fallback behavior described in the documentation.

Comment thread src/lib/zplParser.ts
Previously an unknown ^CI N kept whatever decoder was active before,
contradicting the comment that promised a UTF-8 fallback. If ^CI27
preceded the unknown command, CP1252 stayed silently active.

Always rebind to ciToEncoding's label (which is 'utf-8' for the
unsupported branch) so behaviour matches the documentation and is
predictable regardless of prior state.
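
The behavioural difference can be illustrated with a small state holder. This is a sketch under assumptions: `FhState`, `handleCiOld`, and `handleCi` are hypothetical names, and `ciToEncoding` here is a simplified stand-in that returns only the label.

```typescript
// Simplified stand-in for the mapping described in this PR (an assumption).
function ciToEncoding(n: number): string {
  if (n === 27 || (n >= 0 && n <= 13)) return "windows-1252";
  return "utf-8"; // ^CI28 and every unsupported value
}

class FhState {
  // Default stays UTF-8, matching the generator's ^CI28 output.
  label = "utf-8";

  // Before the fix (sketch): unknown values left the previous label active.
  handleCiOld(n: number): void {
    const known = n === 28 || n === 27 || (n >= 0 && n <= 13);
    if (known) this.label = ciToEncoding(n);
  }

  // After the fix (sketch): always rebind, independent of prior state.
  handleCi(n: number): void {
    this.label = ciToEncoding(n);
  }
}

const before = new FhState();
before.handleCiOld(27);
before.handleCiOld(99); // label is still windows-1252

const after = new FhState();
after.handleCi(27);
after.handleCi(99); // label is reset to utf-8
```
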
@u8array u8array merged commit b7d1026 into main May 8, 2026
2 checks passed
@u8array u8array deleted the fix/utf8-fh-decoding branch May 8, 2026 15:48