
Legacy from 1.0: Steps for doing encoding and decoding  #18

xml-project opened this issue Dec 31, 2016 · 1 comment



commented Dec 31, 2016

Steps for doing encoding and decoding 

This is an aggregation of the discussion on "Steps for doing encoding and decoding" (Issue 134 of the xproc/1.0-specification)

Opened by: ndw on 2015-02-09, 12:44h

ndw said on 2015-02-09, 12:44h:

Allowing non-XML to flow through the pipeline will be handy, but encoding/decoding may still be necessary at the boundaries. We could have p:encode/p:decode steps that convert to and from base64. They could also handle hexBinary, maybe uuencode, etc.
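The proposed steps amount to converting raw bytes to and from a text-safe form that can travel inside an XML document. Here is a minimal Python sketch of that idea. The `encode`/`decode` helpers are hypothetical and are not part of any XProc processor. Only base64 and hexBinary are covered.

```python
import base64

# Hypothetical helpers mirroring the proposed p:encode/p:decode steps.
# Only "base64" and "hexBinary" are sketched; uuencode is left out.

def encode(data: bytes, encoding: str = "base64") -> str:
    """Turn raw bytes into ASCII text that can be embedded in XML."""
    if encoding == "base64":
        return base64.b64encode(data).decode("ascii")
    if encoding == "hexBinary":
        # XML Schema's canonical hexBinary form uses upper-case hex digits.
        return data.hex().upper()
    raise ValueError(f"unsupported encoding: {encoding}")


def decode(text: str, encoding: str = "base64") -> bytes:
    """Inverse of encode(): recover the original bytes from their text form."""
    if encoding == "base64":
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(text, validate=True)
    if encoding == "hexBinary":
        return bytes.fromhex(text)
    raise ValueError(f"unsupported encoding: {encoding}")


if __name__ == "__main__":
    payload = "Grüße".encode("utf-8")
    for enc in ("base64", "hexBinary"):
        text = encode(payload, enc)
        assert decode(text, enc) == payload  # lossless round trip
        print(enc, text)
```

With steps like these, a pipeline could carry binary content as text and recover the exact original bytes at the boundary.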

jallabine said on 2015-03-24, 14:03h:

That would be great! I need that feature for a natural language processing pipeline that reads and creates text files containing German umlauts. The individual transformations (mostly XSLT, plus one carried out in an external application) work fine on their own. But when I combine them in an XProc pipeline, the umlauts are distorted, even though the encoding appears to be UTF-8 (according to Notepad++). If there is any way to fix that, I would be delighted. Thanks a lot!

ndw said on 2015-06-11, 12:49h:

That sounds like a bug. @jallabine, can you send me an example that demonstrates it?

On 2015-06-11, 13:01h: xquery added the steps label.

ndw said on 2015-10-07, 14:37h:

Nudge. @jallabine, can you provide more detail?

On 2015-10-07, 14:37h: ndw added this to the XProc 2.0 LC milestone.

jallabine said on 2015-12-22, 13:32h:

Dear Norman, 

I’m so sorry I could not reply to your e-mail earlier – lots of work and family issues. 

My pipeline contains one p:exec step integrating an external application named TreeTagger (which is a part-of-speech tagger for several languages). This application receives text files, tokenizes them and adds POS information related to the respective words. Since my mother tongue is German, I used German texts with some umlauts for processing. 

Initially, the result text is fine, but in the following steps the umlauts appear distorted in a way that suggests an encoding problem. In the meantime, I have noticed two things:

First, it apparently is a matter of display, because Notepad++ (on a Windows machine) shows "UTF-8" in the status bar, and it actually IS possible to process the text files further. But the characters are then not counted correctly, and I absolutely need correct counts for a later XSLT step that sorts strings by length. (Just to mention it: there is no alternative to using text files.)
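One common way to get wrong character counts is for UTF-8 bytes to be read back with a single-byte Windows code page such as cp1252. Each umlaut then turns into two characters. The Python sketch below assumes exactly that mismatch. It illustrates the symptom and is not a diagnosis of this pipeline. The helper `as_misread` and the sample words are made up for illustration.

```python
# Assumption: text written as UTF-8 is later decoded as cp1252
# (the usual Windows "ANSI" code page). Hypothetical helper and data.

def as_misread(word: str) -> str:
    """Encode as UTF-8, then decode with the wrong (cp1252) code page."""
    return word.encode("utf-8").decode("cp1252")

words = ["Mittagessen", "Übergröße"]

# Sorting by length, as the later XSLT step does.
correct = sorted(words, key=len)
garbled = sorted((as_misread(w) for w in words), key=len)

for w in words:
    # Each umlaut or ß becomes two characters, e.g. "ä" -> "Ã¤".
    print(w, len(w), "->", as_misread(w), len(as_misread(w)))

print(correct)
print(garbled)
```

Here "Übergröße" grows from 9 to 12 characters once misread, so it sorts after "Mittagessen" (11) instead of before it. A length-based sort over such text silently changes order.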

Second, the problem seems to vanish if I add an encoding attribute with the value "utf-8" to each p:store step that follows the TreeTagger p:exec step. As far as I understand, XProc is UTF-8 based in its own right; in any case, the workaround works, which is all I needed in the first place. It would still be interesting to know why the issue arose.

I wish you, your family and friends a merry Christmas and a happy New Year, and thanks for the good work you have been doing so far in developing the XProc standard! 

Best regards, 


word b sign, Sabine Mahr, Graduate Translator, Technical Writer, Schoenbacher Hauptstr. 57, D-35745 Herborn


Phone +49-(0)2777/911184, Fax +49-(0)2777/911185, sabine.mahr@wordbsign.com

From: Norman Walsh []
Sent: Wednesday, 7 October 2015 16:38
To: xproc/specification specification@noreply.github.com
Cc: jallabine sabine.mahr@wordbsign.com
Subject: Re: [specification] Steps for doing encoding and decoding (#134)

Nudge. @jallabine, can you provide more detail?


ndw transferred this issue from xproc/3.0-specification on Nov 1, 2018.



commented Jun 10, 2019

Absent concrete use cases and/or requirements, we're going to close this without action.

ndw closed this on Jun 10, 2019.
