
p:www-form-urldecode: relax restriction on names of www-form-urlencoded parameters #22

Open
Conal-Tuohy opened this issue Dec 18, 2016 · 10 comments

@Conal-Tuohy

commented Dec 18, 2016

The p:www-form-urldecode step provides a way to decode submissions produced by HTML forms, but because only names that match the xs:NCName pattern are allowed, the step can't be used to decode certain form submissions which are themselves perfectly valid as HTTP POST payloads or as the query component of a URI.

For example, the step could not parse submissions from the following form, because the input's name contains a space:

<form action="/my-form-handling-xproc-pipeline" method="get">
   <div>
      <input type="text" name="example input"/>
      <button type="submit">submit example</button>
   </div>
</form>

Submitting this form would produce the request URI /my-form-handling-xproc-pipeline?example+input= which p:www-form-urldecode would be unable to parse.
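For comparison only, here is a minimal sketch of how a general-purpose form decoder treats that request URI. It is written in Python rather than XProc and uses the standard library's `urllib.parse`; it only illustrates that the submission decodes cleanly to a parameter name containing a space, and says nothing about how an XProc processor implements the step.

```python
from urllib.parse import urlsplit, parse_qsl

# Request URI produced by submitting the example form with an empty text field.
uri = "/my-form-handling-xproc-pipeline?example+input="

# Extract the query component and decode it as application/x-www-form-urlencoded.
# keep_blank_values=True retains parameters whose value is the empty string.
query = urlsplit(uri).query
pairs = parse_qsl(query, keep_blank_values=True)

print(pairs)
```

The single decoded pair has the name `example input`, which is not an xs:NCName because of the space.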

This leaves XProc developers without a convenient method to handle arbitrary forms or URIs with arbitrary query components.

The inverse step, p:www-form-urlencode, is also unnecessarily restricted, but in practical terms this restriction is less of an obstacle, since it's easier to perform the encoding than the decoding.

The specification for the p:www-form-urldecode step says:

It is a dynamic error (err:XC0061) if the name of any encoded parameter name is not a valid xs:NCName. In other words, this step can only decode simple name/value pairs where the names do not contain colons or any characters that cannot be used in XML names.

However, the parameter names submitted from an HTML form are the values of name attributes of controls on that form, which are defined as CDATA, a much broader space than xs:NCName.

Although a c:param-set element which contains a c:param whose name is not a valid xs:NCName will not be usable as the input to a parameters port, this requirement is not a good reason to impose the xs:NCName restriction on the output of the p:www-form-urldecode step. If a pipeline author wants to use parameters produced by p:www-form-urldecode as the input to a parameter port, they can always filter the parameters first. Invalidly-named parameters should only cause an exception when they are used as input to a parameter-type port.
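The "filter the parameters first" idea can be sketched outside XProc as follows. This is a Python illustration under a stated assumption: the name check is a simplified, ASCII-only approximation of xs:NCName (the real production also admits many non-ASCII characters), and the helper names `is_simple_ncname` and `split_by_name` are hypothetical.

```python
import re

# Simplified NCName check (assumption: ASCII letters, digits, '_', '-', '.' only;
# the real xs:NCName production also admits many non-ASCII characters).
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def is_simple_ncname(name: str) -> bool:
    """True if the whole name matches the simplified NCName pattern."""
    return NCNAME_ASCII.fullmatch(name) is not None

def split_by_name(params: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Return (usable, rejected): names that pass the check, and those that do not."""
    usable = {k: v for k, v in params.items() if is_simple_ncname(k)}
    rejected = {k: v for k, v in params.items() if k not in usable}
    return usable, rejected

# Decoded form parameters; two names are not NCNames (a space, a colon).
decoded = {"example input": "x", "page": "2", "dc:title": "y"}
usable, rejected = split_by_name(decoded)
print(usable, rejected)
```

Only `usable` would then be passed on to something with the stricter naming rule; the decoder itself never has to fail.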

@Conal-Tuohy

Author

commented Dec 18, 2016

It seems to me that a web-friendly language like XProc should provide this as a built-in primitive. It's always possible to write your own step to decode www-form-urlencoded data, but it's quite complex since you have to both URL-decode (easy) and UTF-8-decode (not so easy) the www-form-urlencoded parameters.
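To make the two decoding stages concrete, here is a small Python sketch (not XProc). The function name `decode_component` is hypothetical; `unquote_to_bytes` is from the Python standard library. In a language without a bytes type and a built-in UTF-8 decoder, the second stage is the part that has to be written by hand.

```python
from urllib.parse import unquote_to_bytes

def decode_component(encoded: str) -> str:
    # In form encoding, a literal '+' stands for a space. This must happen
    # before percent-decoding so that an escaped plus (%2B) is left alone.
    spaced = encoded.replace("+", " ")
    # Stage 1 (URL-decode): turn each %XX escape into the byte it represents.
    raw = unquote_to_bytes(spaced)
    # Stage 2 (UTF-8-decode): interpret the resulting byte sequence as UTF-8 text.
    return raw.decode("utf-8")

print(decode_component("caf%C3%A9+au+lait"))
```

Here `%C3%A9` is two escapes that together form the single character `é`, which is why the percent-decoding and the character decoding cannot be done one escape at a time.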

@ndw

Collaborator

commented Dec 23, 2016

I agree that we should eliminate this restriction.

In XProc 1.0, we turn parameters into an XML document. In 1.1, we don't have to do that. The logical thing to do in 1.1 is use maps and maps don't care about spaces in keys.

@Conal-Tuohy

Author

commented Dec 23, 2016

Would this solution (of using a map) also mean that steps would have to be allowed to produce maps rather than only documents? Alternatively, I suppose, a new www-form-urldecode step could encode the map as a document à la json-to-xml().

Another alternative would be to drop the step and replace it with a new xpath extension function that returned a map.

@ndw

Collaborator

commented Dec 23, 2016

I'm gravitating towards the idea that a "document" in V.next can be any XDM value, including maps. Or any arbitrary non-XML value. So a document becomes something like a wrapper around a MIME type and a bag of bits.

Making XML fairly transparent seems obvious and necessary. I'd like to make it possible for other kinds of data (JSON, specifically, but equally RDF or CSV) to be equally transparent between steps designed to process them.

@Conal-Tuohy

Author

commented Dec 23, 2016

While I fully approve of steps being able to pass any kind of XDM item, I'm actually starting to think that in this particular case an extension function would be the better solution anyway (rather than a step).

We already have a huge number of handy xpath functions to parse strings, and the lack of a standard function to enable this, and hence the necessity for the p:www-form-urldecode step, is the real oddity here I think.

An extension function www-form-urldecode() would allow HTML form parameters to be handled conveniently within a more complex xpath expression. Having to "wire up" a step is a lot clunkier.

<p:when test="p:www-form-urldecode(substring-after($uri, '?'))('My Parameter')='foo'">
   <ex:foo/>
</p:when>

@ndw

Collaborator

commented Dec 23, 2016

That is a very good idea, especially given that variables can now hold maps.

I'll give this a try in the 9.7 branch.

@Conal-Tuohy

Author

commented Sep 30, 2017

It seems to me that www-form-urlencoded parameters would often be used in a URI, and would therefore need to be preceded by a base URI, or at least by ?. The proposed p:www-form-urlencode step doesn't make that convenient: you would need to insert the base URI in a separate step. I suggest dropping the step p:www-form-urlencode in favour of an extension function, e.g.

<p:string-replace 
   match="//html:a[1]/@href" 
   replace="concat($my-base-uri, '?', p:www-form-urlencode($parameters))"
/>

In fact I wonder if it's even worth having that extension function? We could get by with XPath 3.1 maps and the encode-for-uri function. e.g.

<p:string-replace 
   match="//html:a[1]/@href" 
   replace="
      concat(
         $my-base-uri, '?', 
         string-join(
            for $key in map:keys($parameters) return concat(
               encode-for-uri($key), 
               '=', 
               encode-for-uri($parameters($key))
            ),
            '&amp;'
         )
   )"
/>

Is the syntactic sugar worth it? Maybe it is.
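For readers who want to see what the hand-written expression above produces, here is an equivalent sketch in Python (not XProc). The function name `build_uri` and the example base URI are hypothetical. Python's `quote(..., safe="")` is used as a stand-in for encode-for-uri: both leave only unreserved characters unescaped, so a space becomes `%20` rather than the `+` that HTML form submission would produce.

```python
from urllib.parse import quote

def build_uri(base_uri: str, parameters: dict[str, str]) -> str:
    # Percent-encode each key and value, join them as key=value pairs with '&',
    # and append the result to the base URI after a '?'.
    query = "&".join(
        f"{quote(key, safe='')}={quote(value, safe='')}"
        for key, value in parameters.items()
    )
    return f"{base_uri}?{query}"

# Hypothetical base URI and parameters, for illustration only.
print(build_uri("http://example.org/search", {"My Parameter": "foo & bar"}))
```

The `%20` versus `+` difference is harmless for most servers, but it is one detail a dedicated encoding function or step could settle once instead of in every pipeline.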

@ndw

Collaborator

commented Jun 12, 2018

Yes, we should remove the restriction. Exactly how: TBD.

@ndw transferred this issue from xproc/3.0-specification Nov 1, 2018

@ndw

Collaborator

commented Jun 10, 2019

Further consensus on 10 June 2019: we'll leave them as steps, for consistency with the rest of the design in 3.0; encoding will take a map and return a string; decoding will take a string and return a map.

@Conal-Tuohy

Author

commented Jun 19, 2019

I presume www-form-urldecode will produce a JSON document consisting of a map of strings to arrays of strings? i.e. map(xs:string, array(xs:string))
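As an analogy for that shape (and not a statement of what the XProc step returns), Python's standard `parse_qs` decodes a query string to a dictionary from strings to lists of strings, which is what allows repeated parameter names to be kept. The sketch below also shows the reverse direction with `urlencode(..., doseq=True)`.

```python
from urllib.parse import parse_qs, urlencode

# Decoding: string in, map of name -> list of values out.
# A repeated name ('colour') yields a list with more than one entry.
decoded = parse_qs("colour=red&colour=blue&example+input=", keep_blank_values=True)
print(decoded)

# Encoding: map in, string out. doseq=True expands each list into repeated pairs.
encoded = urlencode(decoded, doseq=True)
print(encoded)
```

A map of strings to single strings would be simpler to use, but it would have to drop or merge the second `colour` value.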
