Introduce a variant of url.searchParams that operates according to URL rules instead of <form> rules #491

Open
domenic opened this issue May 4, 2020 · 7 comments
Labels
addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest topic: api

Comments

@domenic
Member

domenic commented May 4, 2020

Problem

See background in #18 and #478.

URLSearchParams was designed to hold application/x-www-form-urlencoded data, i.e. the data that is sent to a server when submitting an HTML <form>. It was not designed to hold URL query data.

Unfortunately, it was misnamed URLSearchParams instead of ApplicationXWWWFormURLEncodedParams. And, even more unfortunately, a property named searchParams was added to the URL class, which is an instance of the URLSearchParams class. Any attempt to read from the searchParams property will give misleading information about the URL. And any attempt to manipulate it will change the contents of your URL's query string in unintended ways, converting values from a query string serialization (of the type produced by the URL parser) into an application/x-www-form-urlencoded serialization.

Some examples of how url.searchParams does not allow faithful introspection into the URL record:

const urlA = new URL('http://localhost:9999/segment?foo=bar/baz? boo');
const urlB = new URL('http://localhost:9999/segment?foo=bar%2Fbaz%3F%20boo');

// Not equal:
console.log(urlA.href); // "http://localhost:9999/segment?foo=bar/baz?%20boo"
console.log(urlB.href); // "http://localhost:9999/segment?foo=bar%2Fbaz%3F%20boo"

console.log(urlA.search); // "?foo=bar/baz?%20boo"
console.log(urlB.search); // "?foo=bar%2Fbaz%3F%20boo"

// Equal:
console.log(urlA.searchParams.get("foo")); // "bar/baz? boo"
console.log(urlB.searchParams.get("foo")); // "bar/baz? boo"

// Equal, but both different from search:
console.log(urlA.searchParams.toString()); // "foo=bar%2Fbaz%3F+boo"
console.log(urlB.searchParams.toString()); // "foo=bar%2Fbaz%3F+boo"

Some examples of how using url.searchParams for mutation will cause unintended changes to your URL record:

const url = new URL('http://httpbin.org/anything?a=~');

console.log(url.href);   // "http://httpbin.org/anything?a=~"
console.log(url.search); // "?a=~"

// This should be a no-op, but it is not:
url.searchParams.set("a", url.searchParams.get("a"));

console.log(url.href); // "http://httpbin.org/anything?a=%7E"

const url = new URL('http://httpbin.org/anything?a=~');

console.log(url.href);   // "http://httpbin.org/anything?a=~"
console.log(url.search); // "?a=~"

// This should not change the value of a, but it does:
url.searchParams.set("b", "d");

console.log(url.href); // "http://httpbin.org/anything?a=%7E&b=d"

const url = new URL('http://httpbin.org/anything?a=b c');

console.log(url.href);   // "http://httpbin.org/anything?a=b%20c"
console.log(url.search); // "?a=b%20c"

// This should be a no-op (sorting a single-element set), but it is not:
url.searchParams.sort();

console.log(url.href); // "http://httpbin.org/anything?a=b+c"
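
For contrast, here is a minimal sketch of one way to add a parameter today without triggering that reserialization: edit url.search as a string instead of going through url.searchParams. The helper name appendRawParam is hypothetical, and using encodeURIComponent for the new pair is an assumption, not something the URL Standard prescribes.

```javascript
// Hypothetical helper (not a standard API): appends one pair by editing
// url.search as a string, so existing parameters are not reserialized.
function appendRawParam(url, name, value) {
  // Assumption: encodeURIComponent is an acceptable encoding for the new pair.
  const pair = `${encodeURIComponent(name)}=${encodeURIComponent(value)}`;
  url.search = url.search === "" ? `?${pair}` : `${url.search}&${pair}`;
}

const url = new URL("http://httpbin.org/anything?a=~");
appendRawParam(url, "b", "d");

// The existing "a=~" is left untouched, unlike with url.searchParams.set().
console.log(url.href); // "http://httpbin.org/anything?a=~&b=d"
```

This only covers appending; replacing or deleting a parameter this way means hand-parsing the string, which is the kind of work the proposal below aims to avoid.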

Solution

In #478 (comment) I proposed four solutions to this problem. In response, @ricea (Chromium) and @achristensen07 (WebKit) indicated they were "in favor of maintaining the status quo". I interpret this as meaning that any changes to either the URL query string parser/serializer, or the application/x-www-form-urlencoded parser/serializer, or the URLSearchParams class and url.searchParams member, are not on the table.

Given these constraints, it seems the only thing we could do is propose a new non-breaking addition to the API. As such, I propose a URLQueryParams class and a corresponding url.queryParams member, which are identical to URLSearchParams and url.searchParams, except that they use the URL parsing/serialization rules instead of the application/x-www-form-urlencoded rules. (Alternate names include url.realSearchParams or url.searchParams2.)
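
To make the difference concrete, here is a rough sketch of what reading parameters under URL rules might look like. Everything in it is an assumption for illustration: queryEntries and percentDecode are hypothetical names, and the exact decoding a real URLQueryParams would apply is not specified in this proposal. The sketch splits on & and = and percent-decodes, but does not treat + as a space.

```javascript
// Assumption: plain percent-decoding, with "+" left alone.
// Malformed sequences fall back to the raw text instead of throwing.
function percentDecode(text) {
  try {
    return decodeURIComponent(text);
  } catch {
    return text;
  }
}

// Hypothetical helper, not part of any spec: lists the key-value pairs in
// url.search without applying application/x-www-form-urlencoded decoding.
function queryEntries(url) {
  const query = url.search.startsWith("?") ? url.search.slice(1) : url.search;
  return query
    .split("&")
    .filter((pair) => pair !== "")
    .map((pair) => {
      const i = pair.indexOf("=");
      const name = i === -1 ? pair : pair.slice(0, i);
      const value = i === -1 ? "" : pair.slice(i + 1);
      return [percentDecode(name), percentDecode(value)];
    });
}

const url = new URL("http://example.com/?q=a+b&r=c%20d");
console.log(url.searchParams.get("q")); // "a b" (form rules turn "+" into a space)
console.log(queryEntries(url)); // [["q", "a+b"], ["r", "c d"]]
```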

With that added, we could effectively deprecate url.searchParams (i.e., state loudly in the spec and MDN that using it will give unreliable results and mess up your URLs), and note that URLSearchParams is useful for representing <form> serialization, but not useful for manipulating URL search parameters.

(Optionally, we might want to define url.query / location.query / workerLocation.query as aliases for the corresponding .search properties, to fully align on the "query" naming and obsolete the "search" naming. But that's separable.)

@annevk
Member

annevk commented May 5, 2020

The whole parameter-based format came from forms and is reflected in URLs when you use GET. Where does the parameter-based format that does not originate from forms come from?

@annevk annevk added addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest topic: api labels May 5, 2020
@mpriour

mpriour commented May 5, 2020

@annevk The non-form format comes from APIs and application-level routers.

@domenic
Member Author

domenic commented May 5, 2020

Where does the parameter-based format that does not originate from forms come from?

Are you asking where the rules in https://url.spec.whatwg.org/#query-state came from? I mean, you wrote them down :). I'd presume from one of the RFCs.

@annevk
Member

annevk commented May 5, 2020

I'm saying that the URL query never had an official key-value format. As far as I know application/x-www-form-urlencoded is the only thing that does something with & and =. I think there might be some servers that also do something with ; though, but I'm not sure how official that ever was.

@karwa
Contributor

karwa commented Nov 2, 2022

I'm experimenting with a new query params interface for my URL library, and I figured some of the ideas might be useful for future web APIs.

The model I'm going with is that, rather than "query parameters", this is modelled as a key-value string within an opaque URL component. A handful of components don't have any defined internal structure - that includes the query, but also the fragment, and we also have opaque hosts and paths. Technically, you could encode a key-value string in any of them.

Media fragments are an example of key-value pairs within the fragment:

http://www.example.com/example.ogv#track=audio&t=10,20

As is OAuth 2.0:

If the resource owner grants the access request, the authorization
server issues an access token and delivers it to the client by adding
the following parameters to the fragment component of the redirection
URI using the "application/x-www-form-urlencoded" format

http://example.com/cb#access_token=2YotnFZFEjr1zCsicMWpAA&state=xyz&token_type=example&expires_in=3600
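
Since this format is explicitly application/x-www-form-urlencoded, today's URLSearchParams can read it once the leading # is stripped. A small sketch (the variable names are only for illustration):

```javascript
const redirect = new URL(
  "http://example.com/cb#access_token=2YotnFZFEjr1zCsicMWpAA&state=xyz&token_type=example&expires_in=3600"
);

// url.hash includes the leading "#", so drop it before parsing.
const fragmentParams = new URLSearchParams(redirect.hash.slice(1));

console.log(fragmentParams.get("access_token")); // "2YotnFZFEjr1zCsicMWpAA"
console.log(fragmentParams.get("expires_in")); // "3600"
```

Writing changes back means assigning fragmentParams.toString() to redirect.hash by hand; there is no fragment counterpart to url.searchParams that stays in sync.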

I also found an app called FoxyProxy (😅) which allows issuing commands via key-value pairs in a URL with an opaque path. For example:

proxy:host=foo.com&port=999&action=add

So I think there is a general problem: these opaque URL components exist so that developers can encode custom structured data in their own identifiers. Key-value strings are just one example; comma-separated lists are another kind of structure that could be better supported. There really aren't great APIs for reading and manipulating these kinds of things in general - there's one kind of key-value string in one component which does have great, convenient APIs, and everything else is sort of forgotten about. People tend to hack stuff together to make up for it, and it's quite awkward and easy to make mistakes.

In Swift (the language my library is for), I'm able to define a schema object which allows customising which characters are interpreted as delimiters, which are written as delimiters, as well as options for escaping. Users can then do something like this:

var url = WebURL("http://example.com")!

// 'commaSeparated' is a user-defined schema object.
// Even though it is a custom KVP schema operating in the fragment,
// the API generalises so it is still super-easy to read/write key-value pairs.

url.withMutableKeyValuePairs(in: .fragment, schema: .commaSeparated) { kvps in
  kvps += [
    "foo": "bar",
    "baz": "qux"
  ]
}
print(url) // "http://example.com/#foo:bar,baz:qux"
           //                     ^^^^^^^^^^^^^^^^

// You can also just get a view object, which doesn't need any awkward nesting.
// Again, generalises so it is really easy to use even for non-query params.

let kvps = url.keyValuePairs(in: .fragment, schema: .commaSeparated)
kvps["foo"]  // "bar"

The goal of an API like this which can scale to different use-cases is that it allows people to do more advanced things with URLs, more easily. It could also be an idea for the web, and the various places web technologies are used.
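
As a rough idea of how that could translate to JavaScript, here is a minimal read-only sketch. It is not the WebURL API and not a proposed web API: readKeyValuePairs, pairDelimiter and keyValueDelimiter are hypothetical names, and the sketch does no percent-decoding or escaping, which a real schema would need to define.

```javascript
// Hypothetical sketch: reads key-value pairs out of any opaque component
// string, with the delimiters supplied by a caller-defined "schema" object.
function readKeyValuePairs(
  component,
  { pairDelimiter = "&", keyValueDelimiter = "=" } = {}
) {
  const pairs = new Map();
  for (const pair of component.split(pairDelimiter)) {
    if (pair === "") continue; // skip empty segments such as "a=1&&b=2"
    const i = pair.indexOf(keyValueDelimiter);
    const key = i === -1 ? pair : pair.slice(0, i);
    const value = i === -1 ? "" : pair.slice(i + keyValueDelimiter.length);
    // Assumption: the first occurrence of a key wins.
    if (!pairs.has(key)) pairs.set(key, value);
  }
  return pairs;
}

// A user-defined schema, mirroring the comma-separated Swift example.
const commaSeparated = { pairDelimiter: ",", keyValueDelimiter: ":" };

const example = new URL("http://example.com#foo:bar,baz:qux");
const kvps = readKeyValuePairs(example.hash.slice(1), commaSeparated);

console.log(kvps.get("foo")); // "bar"
console.log(kvps.get("baz")); // "qux"
```

With the default delimiters the same function reads an &/= string from the query, the fragment, or an opaque path, which is the generalisation being described.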

I also quite like that it dilutes the idea of "query parameters" as being some kind of special, proper URL component - they end up being just one expression of a general ability to encode opaque data.

@annevk
Member

annevk commented Nov 3, 2022

While in some sense appealing, I think there's real value in not generalizing delimiters as it makes it easier to interoperate across disparate endpoints.

@karwa
Contributor

karwa commented Nov 3, 2022

I agree; & and = are the default delimiters, and the vast majority of developers will never need to change them (or create a custom schema at all; they can use the built-in ones for form-encoding or percent-encoding). But I do think it has some small amount of value - ; (semicolon) can be used sometimes as a delimiter between pairs, and I've seen applications which try all sorts of more exotic things. The spotify: URL scheme actually uses the same delimiter between keys and values as it does between key-value pairs!

spotify:user:<username>:playlist:<id> 
spotify:search:<text>

The Node.js querystring API also has parse and stringify methods that allow specifying custom delimiters, both between pairs and between a key and its value.
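
For instance, a short sketch of those two calls with ; between pairs and : between key and value (the input string is made up for illustration):

```javascript
const querystring = require("querystring");

// parse(str, sep, eq): sep separates pairs, eq separates key from value.
const parsed = querystring.parse("foo:bar;baz:qux", ";", ":");
console.log(parsed.foo); // "bar"
console.log(parsed.baz); // "qux"

// stringify(obj, sep, eq) takes the same delimiters in the same order.
const serialized = querystring.stringify({ foo: "bar", baz: "qux" }, ";", ":");
console.log(serialized); // "foo:bar;baz:qux"
```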

If you did need to parse/create an existing URL format which uses a different delimiter, it is not entirely trivial to do, and I think it's the kind of thing a URL API could and probably should help you do correctly. Even if we also advise that most users stick with the default.
