Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URL.canParse() #713

Closed
hemanth opened this issue Oct 17, 2022 · 22 comments · Fixed by #763
Closed

URL.canParse() #713

hemanth opened this issue Oct 17, 2022 · 22 comments · Fixed by #763
Labels
addition/proposal New features or enhancements topic: api

Comments

@hemanth
Copy link

hemanth commented Oct 17, 2022

It would be good to have a URL.isValid method?

Today we have to do something like:

const isValidUrl = (s) => {
    try {
      new URL(s);
      return true;
    } catch (err) {
      return false;
    }
};

The proposal is to have something like:

URL.isValid(<URL>, [protocol1, protocol2]);
URL.isValid('https://h3manth.com'); // true
URL.isValid('https://h3manth.com', ['file']); // false 
@jasnell
Copy link
Collaborator

jasnell commented Oct 17, 2022

"Valid" here will need to be defined as the spec defines a clear notion of a "validation error". Specifically, a URL can parse successfully but still be considered "invalid" per the spec. Does "isValid(...) === true" mean that the URL parses successfully or does it mean it parsed without any validation errors?

@jasnell
Copy link
Collaborator

jasnell commented Oct 17, 2022

As far as bikeshedding the API is concerned, I would likely prefer something more like:

URL.test(input[, base]);

With the semantics specifically only limited to: Will this parse successfully.

@domenic
Copy link
Member

domenic commented Oct 18, 2022

Please see https://whatwg.org/faq#adding-new-features, especially step 1 and 2.

@annevk annevk added needs implementer interest Moving the issue forward requires implementers to express interest addition/proposal New features or enhancements topic: validation Pertaining to the rules for URL writing and validity (as opposed to parsing) labels Oct 18, 2022
@annevk
Copy link
Member

annevk commented Oct 18, 2022

With the semantics specifically only limited to: Will this parse successfully.

How would that be different from what new URL() does?

@jasnell
Copy link
Collaborator

jasnell commented Oct 18, 2022

How would that be different from what new URL() does?

The key difference would be no allocations, in particular no Error object created if it does not parse, and therefore no expensive stack capture.

@hemanth
Copy link
Author

hemanth commented Nov 4, 2022

@domenic Should we provide more use-cases here? All that we are trying to solve here is to have a convenient way to identify if a given URL is valid or not.

@jasnell test reminds me of RegExp.prototype.test like testing against something, isValid sounds more intuitive?

Either way if we agree this is a problem space, happy to discuss more.

@annevk
Copy link
Member

annevk commented Nov 5, 2022

@hemanth well it would help if you would first clarify what semantics you are after as validity and "can parse" are very different things. (But yeah, depending on the answer, use cases and demonstrating there is a need would help.)

@jasnell
Copy link
Collaborator

jasnell commented Nov 5, 2022

The primary use case I've encountered for this is really functions like the isValid() in the OP at the top. Specifically, code that wishes to verify that a given input can be handled as a URL (can parse) without incurring the additional cost of creating either a URL object or an Error object with an expensive to generate stack. I've encountered this pattern a number of times.

Most often, I'd say, the check is performed using a regex, which is problematic in that it doesn't match the parsing semantics and can be difficult to make correct. There's an old module on npm called is-url that has over 2.5 million downloads per week to address this specific problem using an incomplete regex approach.

For me, I would 100% support an API that checks if an input can be parsed as a URL. I have no use case for further validity checks.

Alternatively, I would be good with a new URL constructor variation that did not throw on unparseable input so as to avoid the expensive operation to acquire the stack.

@hemanth
Copy link
Author

hemanth commented Nov 5, 2022

@annevk My thoughts resonated completely with @jasnell. There is also valid-url another node module which is dependent on regex and got around 2,788,820 downloads last week.

This is a very common pattern across multiple code base and would be beneficial to multiple developers to have a connivence method with in the standards.

If there is a template repo that I can fork and add more structure to the challenge and propose potential solutions, I am happy to collaborate.

@annevk annevk removed the topic: validation Pertaining to the rules for URL writing and validity (as opposed to parsing) label Nov 7, 2022
@annevk
Copy link
Member

annevk commented Nov 7, 2022

See also #372 although that would not return a boolean presumably.

Given that there's popular libraries that do something like this this is worth doing I think, but this is not validation. We should either call this something like canParse and return a boolean or parse and return null or a URL object.

@domenic
Copy link
Member

domenic commented Nov 7, 2022

It's not clear to me solving this at the level of a new API is the best idea. Wouldn't it be better to make try / catch faster? In particular, if you use catch { } instead of catch (e) { }, that should be a reliable signal to the engine that there's no need to construct a stack, which is apparently expensive (benchmarks pending).

@ljharb
Copy link

ljharb commented Nov 7, 2022

That wouldn’t likely help you if an error is caught by a calling function, or an async function, or any of the Promise methods.

@domenic
Copy link
Member

domenic commented Nov 8, 2022

Right, you would put the catch there immediately surrounding the new URL call, just like you would put the URL.isParseable call there immediately surrounding the string input.

@jasnell
Copy link
Collaborator

jasnell commented Nov 8, 2022

I'd argue that it's definitely better to make capturing that stack more efficient just in general but I don't think that eliminates to usefulness of the proposed new API. For instance, suppose I am filtering a list of strings to acquire only the ones that are parseable URLs e.g. array.filter(URL.isParseable) is a lot more ergonomic than having to wrap the check in another function that includes a try/catch for every iteration.

@ljharb
Copy link

ljharb commented Nov 8, 2022

@domenic that would certainly be an interesting unobservable optimization, for a lot of patterns - but i don't think the real problem here is performance, it's ergonomics. try/catching something is a wildly icky way to get a boolean.

@karwa
Copy link
Contributor

karwa commented Nov 8, 2022

For the ergonomics of try/catch, perhaps that's something that should be solved at the language level? For instance, some languages have a try? feature which returns nil on error. Would this be useful for other, non-URL APIs?

@ljharb
Copy link

ljharb commented Nov 8, 2022

There's not that many predicates that need to be implemented in the form of a try/catch, so i'm not sure if that'd be worth additional syntax.

@annevk annevk changed the title Proposal: URL.isValid URL.canParse() Mar 6, 2023
@annevk
Copy link
Member

annevk commented Mar 6, 2023

@domenic do your comments above mean you disagree with #372 (comment) now?

I find @jasnell's comment above about being able to use a non-throwing method in a number of scenarios compelling.

Also, https://www.npmjs.com/package/valid-url, https://www.npmjs.com/package/is-url-superb, and https://www.npmjs.com/package/is-url suggest an authoritative method would be useful for the ecosystem.

@valenting @ricea @achristensen07 thoughts?

@ricea
Copy link

ricea commented Mar 6, 2023

I'm not convinced that canParse() is the check that the people using those libraries really want. For example, we can parse a::<!>:: as a URL, but you probably wouldn't want to accept that in an input field.

In the absence of benchmarks I'm also not convinced by the performance argument.

Edit: Actually, even if there were benchmarks, why do we think this is performance critical? Who needs to verify whether a million URLs parse anyway?

annevk added a commit that referenced this issue Mar 16, 2023
@annevk annevk mentioned this issue Mar 16, 2023
4 tasks
@annevk
Copy link
Member

annevk commented Mar 16, 2023

I chatted with @ricea about this and convinced him those libraries do in fact want this functionality.

PR: #763.

@annevk annevk removed the needs implementer interest Moving the issue forward requires implementers to express interest label Mar 20, 2023
@yisibl
Copy link

yisibl commented Apr 17, 2024

If you have the following code, the url actually needs to be parsed twice.

const url = 'http://foo.com'
let parsed_url = null
if (URL.canParse(url)) {
  parsed_url = new URL(url)
}

Looking back, now that we've added URL.parse() to return null if it's invalid(#825), why do we need URL.canParse()?

@annevk
Copy link
Member

annevk commented Apr 17, 2024

I'm not a big fan of folks resurrecting closed issues. Please consider filing a new issue in the future if there's something you wish to discuss.

As for your question, I don't think that's the only case where URL.canParse() is useful. E.g., if you have to decide whether to store a given string based on whether or not it parses as a URL. URL.canParse() is nice there as it doesn't require allocating any URL object.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
addition/proposal New features or enhancements topic: api
Development

Successfully merging a pull request may close this issue.

8 participants