aryasaatvik/neotld

neotld

A modern ESM fork of tld.js for working with domain names, subdomains and well-known TLDs.

It accurately answers questions such as "what is mail.google.com's domain?", "what is a.b.ide.kyoto.jp's subdomain?" and "is https://big.data's TLD a well-known one?".

Because it relies on Mozilla's public suffix list, now is a good time to say thank you Mozilla!

Install

# Regular install
npm install neotld
pnpm add neotld

# You can update the list of well-known TLDs during the install
pnpm add neotld --neotld-update-rules

The latter is useful if you rely heavily on an up-to-date list of TLDs. You can review the recent changes (also available as an Atom feed) to get a better idea of what is going on in the Public Suffix world.

Using It

import { parse, tldExists } from 'neotld';

// Checking only if TLD exists in URL or hostname
console.log(tldExists('google.com')); // true
console.log(tldExists('example.invalid')); // false

// Retrieving hostname-related information for a given URL
parse('http://www.writethedocs.org/conf/eu/2017/');

API

parse()

Returns detailed information about a URL or hostname:

import { parse } from 'neotld';

parse('https://spark-public.s3.amazonaws.com/data/file.csv');
// { hostname: 'spark-public.s3.amazonaws.com',
//   isValid: true,
//   isIp: false,
//   tldExists: true,
//   publicSuffix: 's3.amazonaws.com',
//   domain: 'spark-public.s3.amazonaws.com',
//   subdomain: ''
// }

parse('gopher://domain.unknown/');
// { hostname: 'domain.unknown',
//   isValid: true,
//   isIp: false,
//   tldExists: false,
//   publicSuffix: 'unknown',
//   domain: 'domain.unknown',
//   subdomain: ''
// }

parse('https://192.168.0.0');
// { hostname: '192.168.0.0',
//   isValid: true,
//   isIp: true,
//   tldExists: false,
//   publicSuffix: null,
//   domain: null,
//   subdomain: null
// }

| Property Name | Type | Description |
| --- | --- | --- |
| hostname | String | |
| isValid | Boolean | Is the hostname valid according to the RFC? |
| tldExists | Boolean | Is the TLD well-known or not? |
| publicSuffix | String | |
| domain | String | |
| subdomain | String | |
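
To make the relationship between publicSuffix, domain and subdomain concrete, here is a minimal sketch of a longest-matching-suffix lookup. It is not neotld's implementation: `RULES` is a tiny hand-picked sample standing in for the public suffix list, `splitHostname` is a hypothetical helper, and wildcard and exception rules are ignored.

```javascript
// Tiny hand-picked sample of public suffix rules (assumption: the real list is far larger).
const RULES = new Set(['com', 'uk', 'co.uk', 'jp', 'kyoto.jp', 'ide.kyoto.jp', 's3.amazonaws.com']);

// Hypothetical helper: split a clean hostname using the longest matching rule.
function splitHostname(hostname) {
  const labels = hostname.toLowerCase().split('.');

  // When no rule matches, fall back to treating the last label as the suffix.
  let suffixStart = labels.length - 1;

  // Candidates are tried from longest to shortest, so the first hit is the longest match.
  for (let i = 0; i < labels.length; i++) {
    if (RULES.has(labels.slice(i).join('.'))) {
      suffixStart = i;
      break;
    }
  }

  const publicSuffix = labels.slice(suffixStart).join('.');

  // A hostname that is itself a public suffix has no registrable domain.
  if (suffixStart === 0) {
    return { publicSuffix, domain: null, subdomain: null };
  }

  // The domain is the suffix plus one more label; everything before it is the subdomain.
  const domain = labels.slice(suffixStart - 1).join('.');
  const subdomain = labels.slice(0, suffixStart - 1).join('.');
  return { publicSuffix, domain, subdomain };
}

console.log(splitHostname('a.b.ide.kyoto.jp'));
console.log(splitHostname('spark-public.s3.amazonaws.com'));
```

With this sample rule set, `a.b.ide.kyoto.jp` splits into the suffix `ide.kyoto.jp`, the domain `b.ide.kyoto.jp` and the subdomain `a`.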

Single purpose methods

These methods are shorthands if you want to retrieve only a single value.

tldExists()

Checks whether the TLD of a given hostname is well-known. The input must be parseable with URL.parse.

import { tldExists } from 'neotld';

tldExists('google.com');      // returns `true`
tldExists('google.local');    // returns `false` (not an explicitly registered TLD)
tldExists('com');             // returns `true`
tldExists('uk');              // returns `true`
tldExists('co.uk');           // returns `true` (because `uk` is a valid TLD)
tldExists('amazon.fancy.uk'); // returns `true` (still because `uk` is a valid TLD)
tldExists('amazon.co.uk');    // returns `true` (still because `uk` is a valid TLD)
tldExists('https://user:password@example.co.uk:8080/some/path?and&query#hash'); // returns `true`

getDomain()

Returns the fully qualified domain from a given string. The input must be parseable with URL.parse.

import { getDomain } from 'neotld';

getDomain('google.com');        // returns `google.com`
getDomain('fr.google.com');     // returns `google.com`
getDomain('fr.google.google');  // returns `google.google`
getDomain('foo.google.co.uk');  // returns `google.co.uk`
getDomain('t.co');              // returns `t.co`
getDomain('fr.t.co');           // returns `t.co`
getDomain('https://user:password@example.co.uk:8080/some/path?and&query#hash'); // returns `example.co.uk`

getSubdomain()

Returns the complete subdomain for a given string — parseable with require('url').parse.

import { getSubdomain } from 'neotld';

getSubdomain('google.com');             // returns ``
getSubdomain('fr.google.com');          // returns `fr`
getSubdomain('google.co.uk');           // returns ``
getSubdomain('foo.google.co.uk');       // returns `foo`
getSubdomain('moar.foo.google.co.uk');  // returns `moar.foo`
getSubdomain('t.co');                   // returns ``
getSubdomain('fr.t.co');                // returns `fr`
getSubdomain('https://user:password@secure.example.co.uk:443/some/path?and&query#hash'); // returns `secure`

getPublicSuffix()

Returns the public suffix for a given string. The input must be parseable with URL.parse.

import { getPublicSuffix } from 'neotld';

getPublicSuffix('google.com');       // returns `com`
getPublicSuffix('fr.google.com');    // returns `com`
getPublicSuffix('google.co.uk');     // returns `co.uk`
getPublicSuffix('s3.amazonaws.com'); // returns `s3.amazonaws.com`
getPublicSuffix('tld.is.unknown');   // returns `unknown`

isValidHostname()

Checks if the given string is a valid hostname according to RFC 1035. It does not check if the TLD is well-known.

import { isValidHostname } from 'neotld';

isValidHostname('google.com');      // returns `true`
isValidHostname('.google.com');     // returns `false`
isValidHostname('my.fake.domain');  // returns `true`
isValidHostname('localhost');       // returns `false`
isValidHostname('https://user:password@example.co.uk:8080/some/path?and&query#hash'); // returns `false`
isValidHostname('192.168.0.0');     // returns `true`
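
For readers unfamiliar with the hostname syntax rules, the sketch below checks only label syntax: each label is 1 to 63 characters of letters, digits and hyphens, with no leading or trailing hyphen, and the whole name is at most 255 characters. It is an illustration, not neotld's validator. It accepts labels that start with a digit (a later relaxation of RFC 1035), and it accepts single-label names such as `localhost`, which the examples above report as invalid. `sketchIsValidHostname` is a hypothetical name.

```javascript
// One label: 1-63 characters, letters/digits/hyphens, no leading or trailing hyphen.
const LABEL = /^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/i;

// Hypothetical helper: syntax-only check, with no lookup against any TLD list.
function sketchIsValidHostname(value) {
  // RFC 1035 caps a full name at 255 octets.
  if (typeof value !== 'string' || value.length === 0 || value.length > 255) {
    return false;
  }
  // An empty label (leading dot, double dot) fails the label pattern.
  return value.split('.').every((label) => LABEL.test(label));
}

console.log(sketchIsValidHostname('my.fake.domain')); // true
console.log(sketchIsValidHostname('.google.com'));    // false, the first label is empty
```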

Troubleshooting

Retrieving subdomain of localhost and custom hostnames

The neotld methods getDomain and getSubdomain are designed to work only with known and valid TLDs. This way, you can trust what a domain is.

localhost is a valid hostname but not a TLD. However, you can instantiate your own flavour of neotld with additional valid hosts:

import neotld from 'neotld';

neotld.getDomain('localhost');           // returns null
neotld.getSubdomain('vhost.localhost');  // returns null

const myNeotld = neotld.fromUserSettings({
  validHosts: ['localhost']
});

myNeotld.getDomain('localhost');           // 'localhost'
myNeotld.getSubdomain('vhost.localhost');  // 'vhost'
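
One way to picture what a list of valid hosts does is to treat each entry as a domain in its own right and match hostnames against it. The sketch below is an assumption about the behaviour, written to reproduce the two results shown above. `withValidHosts` is a hypothetical factory, and neotld's real handling of `validHosts` may differ.

```javascript
// Hypothetical factory: resolve hostnames against a user-supplied list of valid hosts only.
function withValidHosts(validHosts) {
  // Find the configured host that the hostname equals or ends with, or null if none does.
  const match = (hostname) =>
    validHosts.find((host) => hostname === host || hostname.endsWith('.' + host)) ?? null;

  return {
    getDomain: (hostname) => match(hostname),
    getSubdomain(hostname) {
      const host = match(hostname);
      if (host === null) return null;
      // Strip the matched host and the dot that precedes it.
      return hostname === host ? '' : hostname.slice(0, -(host.length + 1));
    },
  };
}

const custom = withValidHosts(['localhost']);
console.log(custom.getDomain('localhost'));          // 'localhost'
console.log(custom.getSubdomain('vhost.localhost')); // 'vhost'
```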

Updating TLD Rules

Many libraries offer a list of TLDs. But are they up to date? And how do you update them?

neotld bundles a list of known TLDs, but this list can become outdated. This is especially true if the package has not been updated on npm for a while.

Fortunately, even if I am travelling, have lost my Internet connection, or you manage your own list, you can update it yourself, painlessly.

How? By passing the --neotld-update-rules flag to your npm install command:

# anytime you reinstall your project
npm install --neotld-update-rules

# or if you add the dependency to your project
npm install --save neotld --neotld-update-rules

Open an issue to request an update of the bundled TLDs.

Contributing

Open a pull request (with tested code) to include your work in this main project. Some issues may be waiting for help, so feel free to lend a hand with code or ideas.

Performance

neotld is fast, but keep in mind that speed may vary depending on your use case. Because the library tries to be smart, performance can differ drastically depending on the input (it is faster with an already cleaned hostname than with a random URL).

On an Intel i7-6600U (2.60-3.40 GHz):

For already cleaned hostnames

| Method | ops/sec |
| --- | --- |
| isValidHostname | ~8,700,000 |
| extractHostname | ~8,100,000 |
| tldExists | ~2,000,000 |
| getPublicSuffix | ~1,130,000 |
| getDomain | ~1,000,000 |
| getSubdomain | ~1,000,000 |
| parse | ~850,000 |

For random URLs

| Method | ops/sec |
| --- | --- |
| isValidHostname | ~25,400,000 |
| extractHostname | ~400,000 |
| tldExists | ~310,000 |
| getPublicSuffix | ~240,000 |
| getDomain | ~240,000 |
| getSubdomain | ~240,000 |
| parse | ~230,000 |

You can measure the performance of neotld on your hardware by running the following command:

npm run benchmark

Note: if this is not fast enough for your use case, keep in mind that you can provide your own extractHostname function (which is the bottleneck in this benchmark) to neotld.
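
As an illustration of what a custom extraction function might look like, the sketch below returns clean hostnames untouched and only falls back to the standard `URL` parser when the input contains URL-like characters. This is a hypothetical replacement written for this README, not the function neotld ships, and how such a function is handed to the library is not shown here.

```javascript
// Characters that indicate the input is more than a bare hostname.
const NEEDS_PARSING = /[:/?#@]/;

// Hypothetical replacement for hostname extraction.
function extractHostname(input) {
  const value = input.trim().toLowerCase();

  // Fast path: nothing URL-like in the input, so return it as is.
  if (!NEEDS_PARSING.test(value)) return value;

  try {
    // Add a scheme when it is missing so that the URL parser accepts the input.
    const withScheme = value.includes('://') ? value : `http://${value}`;
    return new URL(withScheme).hostname;
  } catch {
    // The input could not be parsed as a URL.
    return null;
  }
}

console.log(extractHostname('Example.CO.UK'));
console.log(extractHostname('https://user:password@secure.example.co.uk:443/some/path?and&query#hash'));
```

The fast path matters because, according to the figures above, extraction from random URLs is far slower than handling already cleaned hostnames.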

License

MIT License.
