Rewrite p:escape-markup and p:unescape-markup #313

Closed

ndw commented Jan 1, 2020

This PR attempts to fix #14 but it does so in a radical way: I've entirely changed the semantics of both steps!

We need these to support, for example, JSON documents that have escaped HTML in string values. However, the complexity that @xatapult notes in issue 14 is a direct consequence of the XProc 1.0 requirement that the input and output had to remain XML even when escaping and unescaping markup. That's silly in XProc 3.0, so I've removed it. Escaping takes XML or HTML and produces text. Unescaping takes text and produces XML or HTML. In order to make that work in the general case, I had to add a wrapper option to p:unescape-markup, but I think that's consistent with what we've done in other places.
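To make the JSON use case concrete, here is a minimal Python sketch of the "text in, markup out" and "markup in, text out" directions described above. It only illustrates the idea and does not model XProc semantics: the sample JSON document and the function names `escape_markup` and `unescape_markup` are hypothetical, and Python's standard XML parser stands in for whatever an XProc processor would use.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical JSON document whose string value carries markup as plain text.
doc = json.loads('{"title": "Note", "body": "<p>Hello <b>world</b></p>"}')


def unescape_markup(text: str) -> ET.Element:
    """Text in, XML out: parse a string as a single XML element."""
    return ET.fromstring(text)


def escape_markup(element: ET.Element) -> str:
    """XML in, text out: serialize an element back to a string."""
    return ET.tostring(element, encoding="unicode")


body = unescape_markup(doc["body"])   # now a real element tree
round_tripped = escape_markup(body)   # and back to plain text
```

Neither direction needs the result to stay wrapped in XML, which is the simplification proposed here.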

(If we were inventing these steps now, we might call them p:parse and p:serialize or something, but I'm inclined to leave their names alone.)

I'd like at least two other editors to approve this before we merge it. And, obviously, if anyone objects I won't merge it until we've resolved the objections.

ndw requested a review from xproc/spec-authors Jan 1, 2020

xml-project left a comment

Have to think about that. Not sure what the difference from p:cast-content-type (xml->text) is.
Will read it more carefully tomorrow.

ndw commented Jan 1, 2020

I don't think there is any difference going from XML to text.

Going from text to XML, the difference is the ability to handle results that would not be well-formed XML (because they have multiple top-level elements). I think that's important, though maybe it's only really important for text to HTML where it wouldn't necessarily be an error anyway.
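A short Python sketch of the well-formedness problem, using the standard library parser purely as an illustration: text with two top-level elements is rejected when parsed as a document, but parses once it is enclosed in a wrapper element. The helper `parse_with_wrapper` and the default wrapper name are hypothetical, not the proposed option's actual definition.

```python
import xml.etree.ElementTree as ET


def parse_with_wrapper(text: str, wrapper: str = "wrapper") -> ET.Element:
    """Enclose the text in a single root element before parsing it."""
    return ET.fromstring(f"<{wrapper}>{text}</{wrapper}>")


fragment = "<p>one</p><p>two</p>"  # two top-level elements

# Parsing the bare fragment fails: a document may have only one root element.
try:
    ET.fromstring(fragment)
    parsed_bare = True
except ET.ParseError:
    parsed_bare = False

# With a wrapper, the same text parses and both elements survive as children.
wrapped = parse_with_wrapper(fragment)
child_tags = [child.tag for child in wrapped]
```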

I suppose I could be persuaded that cast-content-type adequately covers the cases where these steps are required and we should remove them both. But if we decide to keep one, I think we should keep both.

ndw commented Jan 2, 2020

Per the 2 January 2020 editors' call, the decision is to remove these steps: you can get the equivalent behavior with p:cast-content-type.

ndw commented Jan 3, 2020

Overtaken by #329

ndw closed this Jan 3, 2020