Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a more strict hostname validation (isValid) #100

Closed
wants to merge 2 commits into from

Conversation

remusao
Copy link
Collaborator

@remusao remusao commented Sep 2, 2017

Attempt at addressing #59 and #73 (it should be trivial to allow _ as well, but we should probably discuss of a possible API for that there => #99).

This function implements a more strict validation of the hostnames. It checks:

  • maximum total size (255)
  • maximum size of a label (63)
  • allowed characters: [0-9a-zA-Z] for first character and [0-9a-zA-Z-] for the rest
  • Forbid two consecutive dots

As far as I know that should be it, but let me know if you see some corner cases that are not covered.

Regarding performance, this one times at:

tldjs#isValid x 8,112,000 ops/sec ±0.26% (96 runs sampled)

This is about x6 slower than the previous implementation (which did not check much). So I'd say it's still fast enough (compared to cleanHost ~ 360,000 ops/sec or getDomain ~ 120,000 ops/sec). I also compared it with a more naive validation regex (which does not check every thing) which times at 9,574,128 ops/sec.[1]


[1] /^(([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\-]*[A-Za-z0-9])$/ (does not check maximum size, maximum label size, forbid trailing dot).

@remusao
Copy link
Collaborator Author

remusao commented Sep 2, 2017

I changed the regex to also check the size of the labels, and it becomes slower than the implementation of this PR. (for the record)

Also, not sure what's happening with the failing tests. I could not yet reproduce locally.

@remusao
Copy link
Collaborator Author

remusao commented Sep 4, 2017

The latest implementation handles a few more corner cases and is faster:

tldjs#isValid x 10,525,000 ops/sec ±0.40% (94 runs sampled)

@remusao
Copy link
Collaborator Author

remusao commented Sep 6, 2017

@oncletom Actually, I was wondering if we could add this check in the cleanHostValue, before doing anything, to check if the argument value is already a valid hostname. If not, we try to extract one. What do you think?

@thom4parisot
Copy link
Owner

Thanks @remusao, excellent work as usual 😄

I agree with the cleanHostValue integration — it would fail sooner and would prevent to clean something which does not have to be cleaned 👍

@remusao
Copy link
Collaborator Author

remusao commented Sep 8, 2017

@oncletom We might want to close this one in favor of #103. What do you think? I also implemented the validation there + integration with cleanHostValue and a few other changes.

@thom4parisot
Copy link
Owner

Agreed 👍

@remusao remusao deleted the isValid branch January 10, 2019 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants