Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IPv4 parser "invents" a valid ip from doi prefix; is this / should this be correct behaviour? #761

Closed
mvolz opened this issue Mar 13, 2023 · 2 comments

Comments

@mvolz
Copy link

mvolz commented Mar 13, 2023

I'm not sure whether this is a spec issue or a node implementation issue but the results is the same in node and the browser. Basically the new standard converts the non-ip 10.1000 into the valid IP 10.0.3.232

In the old standard it is left as is.

As part of our tests we have this bad input we want to reject:

https://10.1000/f<script>alert(1);</script>
(This is supposed to represent a valid yet potentially harmful DOI prepended by the wrong protocol)

In browser:
https://jsdom.github.io/whatwg-url/#url=aHR0cHM6Ly8xMC4xMDAwL2Y8c2NyaXB0PmFsZXJ0KDEpOzwvc2NyaXB0Pg==&base=YWJvdXQ6Ymxhbms=

In Node:

Both url.parse and new URL are fine with this as an origin, which is fine, we deal with that downstream. What's odd is how it's parsed:

With the updated api it "invents" an IP address for it, so the output is

URL {
  href: 'https://10.0.3.232/f%3Cscript%3Ealert(1);%3C/script%3E',
  origin: 'https://10.0.3.232',
  protocol: 'https:',
  username: '',
  password: '',
  host: '10.0.3.232',
  hostname: '10.0.3.232',
  port: '',
  pathname: '/f%3Cscript%3Ealert(1);%3C/script%3E',
  search: '',
  searchParams: URLSearchParams {},
  hash: ''
}

The old url.parse output keeps the pseudo IP address as it is ->

Url {
  protocol: 'https:',
  slashes: true,
  auth: null,
  host: '10.1000',
  port: null,
  hostname: '10.1000',
  hash: null,
  search: null,
  query: null,
  pathname: '/f%3Cscript%3Ealert(1);%3C/script%3E',
  path: '/f%3Cscript%3Ealert(1);%3C/script%3E',
  href: 'https://10.1000/f%3Cscript%3Ealert(1);%3C/script%3E'
}

Should it actually be doing this? Obviously dois shouldn't be prepended with the wrong protocol, but wondering if instead the behaviour should be to throw an error instead of accepting it but converting it to something unrecognisable?

I know consecutive 0s are allowed to be omitted, but 10.0.0.1000 isn't valid either because there are 4 digits in the last place (the minimum length for the second number in doi prefixes precisely to prevent confusion with ip addresses)... anyone know what's happening with the parser :)?

@annevk
Copy link
Member

annevk commented Mar 13, 2023

Heya, this is intentional. See https://www.w3.org/Bugs/Public/show_bug.cgi?id=26431 for some background. https://url.spec.whatwg.org/#example-host-parsing has some other examples.

@annevk annevk closed this as completed Mar 13, 2023
@annevk annevk closed this as not planned Won't fix, can't repro, duplicate, stale Mar 13, 2023
@karwa
Copy link
Contributor

karwa commented Mar 13, 2023

For compatibility, the IP address parser supports the same formats as inet_aton. This includes the following format:

a.b

Part a specifies the first byte of the binary address. Part b is interpreted as a 24-bit value that defines the rightmost three bytes of the binary address. This notation is suitable for specifying (outmoded) Class C network addresses.

Since the second number is a 24-bit integer, 1000 is a valid value for it to have.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants