Merge pull request #108 from remusao/handle-ip
Add ip validation
Thomas Parisot committed Feb 2, 2018
2 parents 2663273 + 67558dd commit ac2768d
Showing 8 changed files with 420 additions and 62 deletions.
90 changes: 63 additions & 27 deletions README.md
@@ -2,7 +2,7 @@

> `tld.js` is a Node.js module written in JavaScript to work against complex domain names, subdomains and well-known TLDs.
It answers with accuracy to questions like _what is `mail.google.com` domain?_, _what is `a.b.ide.kyoto.jp` subdomain?_ and _is `https://big.data` TLD a well-known one?_.
It answers with accuracy to questions like _what is `mail.google.com`'s domain?_, _what is `a.b.ide.kyoto.jp`'s subdomain?_ and _is `https://big.data`'s TLD a well-known one?_.

`tld.js` [runs fast](#performances), is fully tested and is safe to use in the browser (with [browserify][], webpack and others). Because it relies on Mozilla's [public suffix list][], now is a good time to say _thank you_ Mozilla!

@@ -43,23 +43,33 @@ This methods returns handy **properties about a URL or a hostname**.
const tldjs = require('tldjs');

tldjs.parse('https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv');
// {
// "hostname": "spark-public.s3.amazonaws.com",
// "isValid": true,
// "tldExists": true,
// "publicSuffix": "s3.amazonaws.com",
// "domain": "spark-public.s3.amazonaws.com",
// "subdomain": ""
// { hostname: 'spark-public.s3.amazonaws.com',
// isValid: true,
// isIp: false,
// tldExists: true,
// publicSuffix: 's3.amazonaws.com',
// domain: 'spark-public.s3.amazonaws.com',
// subdomain: ''
// }

tldjs.parse('gopher://domain.unknown/');
// {
// "hostname": "domain.unknown",
// "isValid": true,
// "tldExists": false,
// "publicSuffix": "unknown",
// "domain": "domain.unknown",
// "subdomain": ""
// { hostname: 'domain.unknown',
// isValid: true,
// isIp: false,
// tldExists: false,
// publicSuffix: 'unknown',
// domain: 'domain.unknown',
// subdomain: ''
// }

tldjs.parse('https://192.168.0.0')
// { hostname: '192.168.0.0',
// isValid: true,
// isIp: true,
// tldExists: false,
// publicSuffix: null,
// domain: null,
// subdomain: null
// }
```
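
As a hedged illustration of how a caller might consume these fields, the sketch below checks `isValid` and `isIp` before touching `domain`. The `describe` helper is hypothetical and not part of `tld.js`; the result objects are copied from the sample outputs above rather than produced by calling the library.

```javascript
// Hypothetical helper (not part of tld.js): summarise an object shaped like
// the result of tldjs.parse().
function describe(result) {
  if (!result.isValid) {
    return 'invalid input';
  }
  if (result.isIp) {
    // For IP addresses, publicSuffix, domain and subdomain are null.
    return 'IP address ' + result.hostname;
  }
  return 'domain ' + result.domain + (result.tldExists ? '' : ' (unknown TLD)');
}

// Result objects copied from the sample outputs in this README.
var ipResult = {
  hostname: '192.168.0.0',
  isValid: true,
  isIp: true,
  tldExists: false,
  publicSuffix: null,
  domain: null,
  subdomain: null
};

var unknownTldResult = {
  hostname: 'domain.unknown',
  isValid: true,
  isIp: false,
  tldExists: false,
  publicSuffix: 'unknown',
  domain: 'domain.unknown',
  subdomain: ''
};

console.log(describe(ipResult));         // IP address 192.168.0.0
console.log(describe(unknownTldResult)); // domain domain.unknown (unknown TLD)
```

Checking `isIp` first matters because `domain` is `null` for IP addresses, so code that reads `result.domain` unconditionally would need a null check anyway.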

@@ -154,6 +164,7 @@ isValid('.google.com'); // returns `false`
isValid('my.fake.domain'); // returns `true`
isValid('localhost'); // returns `false`
isValid('https://user:password@example.co.uk:8080/some/path?and&query#hash'); // returns `true`
isValid('192.168.0.0') // returns `true`
```

# Troubleshooting
@@ -209,23 +220,49 @@ Issues may be awaiting for help so feel free to give a hand, with code or ideas.

# Performances

```
While interpreting the results, keep in mind that each "op" reported by the benchmark is processing 24 domains
tldjs#isValid x 230,353 ops/sec ±10.99% (44 runs sampled)
tldjs#extractHostname x 42,333 ops/sec ±2.82% (85 runs sampled)
tldjs#tldExists x 15,083 ops/sec ±8.76% (54 runs sampled)
tldjs#getPublicSuffix x 14,334 ops/sec ±8.00% (80 runs sampled)
tldjs#getDomain x 15,092 ops/sec ±1.92% (84 runs sampled)
tldjs#getSubdomain x 13,202 ops/sec ±3.66% (72 runs sampled)
tldjs#parse x 8,561 ops/sec ±11.78% (55 runs sampled)
```
`tld.js` is fast, but keep in mind that it might vary depending on your own
use-case. Because the library tried to be smart, the speed can be drastically
different depending on the input (it will be faster if you provide an already
cleaned hostname, compared to a random URL).

On an Intel i7-6600U (2,60-3,40 GHz):

## For already cleaned hostnames

| Methods | ops/sec |
| --- | --- |
| `isValid` | ~`8,700,000` |
| `extractHostname` | ~`8,100,000` |
| `tldExists` | ~`2,000,000` |
| `getPublicSuffix` | ~`1,130,000` |
| `getDomain` | ~`1,000,000` |
| `getSubdomain` | ~`1,000,000` |
| `parse` | ~`850,000` |


## For random URLs

| Methods | ops/sec |
| --- | --- |
| `isValid` | ~`25,400,000` |
| `extractHostname` | ~`400,000` |
| `tldExists` | ~`310,000` |
| `getPublicSuffix` | ~`240,000` |
| `getDomain` | ~`240,000` |
| `getSubdomain` | ~`240,000` |
| `parse` | ~`230,000` |


You can measure the performance of `tld.js` on your hardware by running the following command:

```bash
npx tldjs -c './bin/benchmark.js'
npm run benchmark
```

_Notice_: if this is not fast enough for your use-case, keep in mind that you can
provide your own `extractHostname` function (which is the bottleneck in
this benchmark) to `tld.js`.
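
For illustration only, a custom extractor could look like the sketch below. It is a simplified fast path under stated assumptions (input is either a bare hostname with no path, or a `scheme://host[:port]/...` URL without credentials or IPv6 literals), and `fastExtractHostname` is a hypothetical name. How the function is handed to `tld.js` is not shown in this diff, so that wiring is left out.

```javascript
// Hypothetical fast-path extractor (not the library's implementation).
// Assumes: bare hostname, or "scheme://host[:port]/path?query#hash",
// with no "user:password@" part and no bracketed IPv6 literal.
function fastExtractHostname(url) {
  // Skip the scheme if there is one.
  var schemeEnd = url.indexOf('://');
  var start = schemeEnd === -1 ? 0 : schemeEnd + 3;

  // The hostname ends at the first '/', ':', '?' or '#'.
  var end = url.length;
  for (var i = start; i < url.length; i += 1) {
    var code = url.charCodeAt(i);
    if (code === 47 || code === 58 || code === 63 || code === 35) {
      end = i;
      break;
    }
  }

  return url.slice(start, end).toLowerCase();
}

console.log(fastExtractHostname('http://192.168.0.1/'));       // 192.168.0.1
console.log(fastExtractHostname('HTTP://Example.COM:8080/x')); // example.com
```

A narrower extractor like this trades generality for speed, so it is only appropriate when the shape of the input is known in advance.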

## Contributors

This project exists thanks to all the people who contribute. [[Contribute]](CONTRIBUTING.md).
@@ -255,7 +292,6 @@ Support this project by becoming a sponsor. Your logo will show up here with a l
<a href="https://opencollective.com/tldjs/sponsor/9/website" target="_blank"><img src="https://opencollective.com/tldjs/sponsor/9/avatar.svg"></a>



# License

[MIT License](LICENSE).
70 changes: 48 additions & 22 deletions bin/benchmark.js
@@ -6,7 +6,7 @@ var tld = require('../index.js');
var Benchmark = require('benchmark');


var DOMAINS = [
var HOSTNAMES = [
// No public suffix
'example.foo.edu.au', // null
'example.foo.edu.sh', // null
@@ -30,7 +30,10 @@ var DOMAINS = [
'example.www.ck', // !www.ck
'foo.bar.baz.city.yokohama.jp', // !city.yokohama.jp
'example.city.kobe.jp', // !city.kobe.jp
];


var URLS = [
// IDN labels
'example.北海道.jp', // 北海道.jp
'example.和歌山.jp', // 和歌山.jp
@@ -44,54 +47,62 @@ var DOMAINS = [
'FOO.bar.BAZ.ortsinfo.AT', // null

// Full URLs
// '2001:0DB8:0100:F101:0210:A4FF:FEE3:9566',
// 'http://user:pass@www.examplegoogle.com:21/blah#baz',
// 'http://iris.test.ing/&#x1E0B;&#x0323;/?&#x1E0B;&#x0323;#&#x1E0B;&#x0323;',
// 'http://0000000000000300.0xffffffffFFFFFFFF.3022415481470977',
'2001:0DB8:0100:F101:0210:A4FF:FEE3:9566',
'http://user:pass@www.examplegoogle.com:21/blah#baz',
'http://iris.test.ing/&#x1E0B;&#x0323;/?&#x1E0B;&#x0323;#&#x1E0B;&#x0323;',
'http://0000000000000300.0xffffffffFFFFFFFF.3022415481470977',
'http://192.168.0.1/',
'http://%30%78%63%30%2e%30%32%35%30.01%2e',
'http://user:pass@[::1]/segment/index.html?query#frag',
'https://[::1]',
];


// TODO - Compare to other libraries
function main() {
function bench(values) {
console.log(
'While interpreting the results, keep in mind that each "op" reported' +
' by the benchmark is processing ' + DOMAINS.length + ' domains'
' by the benchmark is processing ' + values.length + ' domains'
);

new Benchmark.Suite()
.add('tldjs#isIp', () => {
for (var i = 0; i < values.length; i += 1) {
tld.isIp(values[i]);
}
})
.add('tldjs#isValid', () => {
for (var i = 0; i < DOMAINS.length; i += 1) {
tld.isValid(DOMAINS[i]);
for (var i = 0; i < values.length; i += 1) {
tld.isValid(values[i]);
}
})
.add('tldjs#extractHostname', () => {
for (var i = 0; i < DOMAINS.length; i += 1) {
tld.extractHostname(DOMAINS[i]);
for (var i = 0; i < values.length; i += 1) {
tld.extractHostname(values[i]);
}
})
.add('tldjs#tldExists', () => {
for (var i = 0; i < DOMAINS.length; i += 1) {
tld.tldExists(DOMAINS[i]);
for (var i = 0; i < values.length; i += 1) {
tld.tldExists(values[i]);
}
})
.add('tldjs#getPublicSuffix', () => {
for (var i = 0; i < DOMAINS.length; i += 1) {
tld.getPublicSuffix(DOMAINS[i]);
for (var i = 0; i < values.length; i += 1) {
tld.getPublicSuffix(values[i]);
}
})
.add('tldjs#getDomain', () => {
for (var i = 0; i < DOMAINS.length; i += 1) {
tld.getDomain(DOMAINS[i]);
for (var i = 0; i < values.length; i += 1) {
tld.getDomain(values[i]);
}
})
.add('tldjs#getSubdomain', () => {
for (var i = 0; i < DOMAINS.length; i += 1) {
tld.getSubdomain(DOMAINS[i]);
for (var i = 0; i < values.length; i += 1) {
tld.getSubdomain(values[i]);
}
})
.add('tldjs#parse', () => {
for (var i = 0; i < DOMAINS.length; i += 1) {
tld.parse(DOMAINS[i]);
for (var i = 0; i < values.length; i += 1) {
tld.parse(values[i]);
}
})
.on('cycle', function (event) {
@@ -101,4 +112,19 @@ function main() {
}


// TODO - Compare to other libraries
function main() {
console.log('>>> -------------------- <<<');
console.log('>>> Only valid hostnames <<<');
console.log('>>> -------------------- <<<');
bench(HOSTNAMES);

console.log();
console.log('>>> ----------- <<<');
console.log('>>> Random URLs <<<');
console.log('>>> ----------- <<<');
bench(URLS);
}


main();
18 changes: 17 additions & 1 deletion index.js
@@ -1,5 +1,6 @@
'use strict';


// Load rules
var Trie = require('./lib/suffix-trie.js');
var allRules = Trie.fromJson(require('./rules.json'));
@@ -10,6 +11,7 @@ var getDomain = require('./lib/domain.js');
var getPublicSuffix = require('./lib/public-suffix.js');
var getSubdomain = require('./lib/subdomain.js');
var isValid = require('./lib/is-valid.js');
var isIp = require('./lib/is-ip.js');
var tldExists = require('./lib/tld-exists.js');


@@ -50,12 +52,26 @@ function factory(options) {
var result = {
hostname: _extractHostname(url),
isValid: null,
tldExists: null,
isIp: null,
tldExists: false,
publicSuffix: null,
domain: null,
subdomain: null,
};

if (result.hostname === null) {
result.isIp = false;
result.isValid = false;
return result;
}

// Check if `hostname` is a valid ip address
result.isIp = isIp(result.hostname);
if (result.isIp) {
result.isValid = true;
return result;
}

// Check if `hostname` is valid
result.isValid = isValid(result.hostname);
if (result.isValid === false) return result;
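
The early-return order introduced in this hunk can be sketched in isolation as follows. This is a minimal sketch, not the library's code: `isIpv4` and `looksLikeHostname` are simplified stand-ins for `./lib/is-ip.js` and `./lib/is-valid.js` (whose contents are not shown in this diff), `parseSketch` is a hypothetical name, and the public suffix lookup that follows in the real function is omitted.

```javascript
'use strict';

// Simplified stand-in (assumption): IPv4 dotted-quad check only.
function isIpv4(hostname) {
  var parts = hostname.split('.');
  if (parts.length !== 4) return false;
  return parts.every(function (part) {
    return /^\d{1,3}$/.test(part) && Number(part) <= 255;
  });
}

// Simplified stand-in (assumption): lowercase ASCII labels, at least one dot.
function looksLikeHostname(hostname) {
  var label = '[a-z0-9](?:[a-z0-9-]*[a-z0-9])?';
  return new RegExp('^' + label + '(?:\\.' + label + ')+$').test(hostname);
}

// Mirrors the order of checks in the hunk: null hostname, then IP, then validity.
function parseSketch(hostname) {
  var result = {
    hostname: hostname,
    isValid: null,
    isIp: null,
    tldExists: false,
    publicSuffix: null,
    domain: null,
    subdomain: null
  };

  // 1. No hostname could be extracted: neither an IP nor valid.
  if (result.hostname === null) {
    result.isIp = false;
    result.isValid = false;
    return result;
  }

  // 2. IP addresses are valid, but have no suffix, domain or subdomain.
  result.isIp = isIpv4(result.hostname);
  if (result.isIp) {
    result.isValid = true;
    return result;
  }

  // 3. Otherwise fall back to hostname validation.
  result.isValid = looksLikeHostname(result.hostname);
  return result; // Public suffix lookup intentionally omitted.
}

console.log(parseSketch('192.168.0.0').isIp); // true
console.log(parseSketch(null).isValid);       // false
```

Returning as soon as an IP address is detected avoids running the suffix lookup on input that cannot have a public suffix, which is consistent with the `null` fields shown in the README example for `https://192.168.0.0`.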