Rewrite p:escape-markup and p:unescape-markup #313
This PR attempts to fix #14, but it does so in a radical way: I've entirely changed the semantics of both steps!
We need these to support, for example, JSON documents that have escaped HTML in string values. However, the complexity that @xatapult notes in issue 14 is a direct consequence of the XProc 1.0 requirement that the input and output had to remain XML even when escaping and unescaping markup. That's silly in XProc 3.0, so I've removed it. Escaping takes XML or HTML and produces text. Unescaping takes text and produces XML or HTML. In order to make that work in the general case, I had to add a
(If we were inventing these steps now, we might call them p:parse and p:serialize or something, but I'm inclined to leave their names alone.)
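To make the proposed "markup in, text out" and "text in, markup out" semantics concrete, here is a minimal Python sketch. It uses `xml.etree.ElementTree` as a stand-in for an XProc processor. The function names `escape_markup` and `unescape_markup` are hypothetical and are not the XProc steps' API. The example covers the JSON case from the description, where a string value holds escaped HTML.

```python
import json
import xml.etree.ElementTree as ET


def escape_markup(node: ET.Element) -> str:
    """Hypothetical analogue of the new escaping semantics: XML in, text out."""
    return ET.tostring(node, encoding="unicode")


def unescape_markup(text: str) -> ET.Element:
    """Hypothetical analogue of the new unescaping semantics: text in, XML out."""
    return ET.fromstring(text)


# A JSON document whose string value carries escaped markup.
doc = json.loads('{"body": "<p>Hello <b>world</b></p>"}')

tree = unescape_markup(doc["body"])  # text -> element tree
print(tree.tag)                      # p
print(escape_markup(tree))           # <p>Hello <b>world</b></p>
```

Neither function has to keep an XML wrapper around its result, which is the XProc 1.0 constraint the PR removes.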
I'd like at least two other editors to approve this before we merge it. And, obviously, if anyone objects I won't merge it until we've resolved the objections.
I don't think there is any difference going from XML to text.
Going from text to XML, the difference is the ability to handle results that would not be well-formed XML (because they have multiple top-level elements). I think that's important, though maybe it's only really important for text to HTML, where it wouldn't necessarily be an error anyway.
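The multiple-top-level-elements case can be illustrated with a short Python sketch. It is again only an analogy, using `xml.etree.ElementTree` rather than an XProc processor. A plain XML parse rejects the text, but wrapping it in a synthetic root recovers the sequence of elements. The `unescape_fragment` helper and the `wrapper` element name are hypothetical.

```python
import xml.etree.ElementTree as ET


def unescape_fragment(text: str) -> list[ET.Element]:
    """Hypothetical helper: parse text that may hold several top-level elements."""
    # Wrap in a synthetic root so the parser sees a single document element,
    # then hand back that root's children as a sequence.
    # Note: any text before the first element (wrapper.text) is discarded here.
    wrapper = ET.fromstring(f"<wrapper>{text}</wrapper>")
    return list(wrapper)


text = "<p>one</p><p>two</p>"

try:
    ET.fromstring(text)  # a plain parse requires exactly one document element
except ET.ParseError as err:
    print("not well-formed:", err)

print([node.tag for node in unescape_fragment(text)])  # ['p', 'p']
```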
I suppose I could be persuaded that cast-content-type adequately covers the cases where these steps are required and we should remove them both. But if we decide to keep one, I think we should keep both.