Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discuss use of cleanHost by default, and API in general #99

Closed
remusao opened this issue Sep 2, 2017 · 3 comments
Closed

Discuss use of cleanHost by default, and API in general #99

remusao opened this issue Sep 2, 2017 · 3 comments

Comments

@remusao
Copy link
Collaborator

remusao commented Sep 2, 2017

@oncletom @ctavan

As discussed in #97, let's open a discussion around the use of cleanHostValue in tld.js, and the API in general. As a reminder, I summarize below the points raised so far:

  • The mandatory use of cleanHostValue (which is the current performance bottleneck of most functions in the public API) in each function is not optimal (we will call it more than once if we use getDomain, once in getDomain and once in getPublicSuffix). Also, a user of the library has no way to bypass this step, in case we already have a hostname at hand and don't want to pay the price of parsing/cleaning the input.
  • @ctavan mentioned that the API can be confusing sometimes, and it's not clear what can be given as argument of the function: a url, a valid hostname?

I would like to propose a solution to this:

  1. Make all the functions part of the API accept a valid hostname, that would make it clear that we are interested in knowing the domain, public suffix, tld of a particular domain, with no additional processing (URL parsing, etc.).
  2. Give access to a extractHostname (the current cleanHostValue) function that can be applied before-hand to URLs to get a hostname as expected by the functions exposed by tld.js.
  3. Optionally, we could implement a fast check (e.g.: isValid), that would be performed on the input of the functions, to make sure that the input is indeed a valid hostname, and we could either raise an exception or return null.

Also, I would like to mention a last point. I have had a look at other libraries offering public suffix parsing, and a common API is to offer parse function taking a hostname as input, and returning some object of the form:

{
  domain: ...,
  publicsuffix: ...,
  subdomain: ...,
}

I find that this greatly simplifies the API. Although I'm not sure it's something we want to do, we could easily add this function in the public API, and express the other functions in terms of this one:

getPublicSuffix(hostname) => parse(hostname).publicsuffix;
getDomain(hostname) => parse(hostname).domain;
getSubDomain(hostname) => parse(hostname).subdomain;

Having one function implementing the logic would remove some redundancy in the code and offer a super simple API for the users of the library. That's just a thought and I'm not sure if this is a worthy addition, but I wanted to mention this possibility.

What do you guys think?

@chrmod
Copy link
Contributor

chrmod commented Sep 4, 2017

We had some discussion on this topic with @remusao. I'm dumping the summary for review:

From the user standpoint, the single polymorphic API may be the best choice. Being able to pass the full URL or just a hostname is not only convenient but also makes code less error prone.

parse(hostname) and parse(url) in the end require just a fast argument test to identify the type we deal with.

Question: if we detailed function should be still exposed, should they be:

  • polymorphic like parse
  • require an extractHostname, and work only with hostnames
  • to have the URL typed versions, maybe getPublicSuffixFromUrl etc

@thom4parisot
Copy link
Owner

I really like this approach and before you posted the issue, I was sketching something very similar.

At this point, I am almost regretting we don't have ESModules in Node yet, because I would have even done something like this:

import tld, {tldExists} from 'tldjs';

tld('www.google.com').publicSuffix;
tldExists('www.google.com');

@remusao
Copy link
Collaborator Author

remusao commented Oct 26, 2017

I will close this issue as the API now provides both a parse method, a way to customize extractHostname and individual methods to get specific information on an URL (getDomain, tldExists, etc.). It should enough to allow users to customize the library to their needs. Feel free to re-open if you have any remarks or suggestions related to this issue.

@remusao remusao closed this as completed Oct 26, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants