Add ip validation #108

Merged
merged 6 commits into from
Feb 2, 2018
90 changes: 63 additions & 27 deletions README.md
@@ -2,7 +2,7 @@

> `tld.js` is a Node.js module written in JavaScript to work against complex domain names, subdomains and well-known TLDs.

It answers with accuracy to questions like _what is `mail.google.com` domain?_, _what is `a.b.ide.kyoto.jp` subdomain?_ and _is `https://big.data` TLD a well-known one?_.
It answers with accuracy to questions like _what is `mail.google.com`'s domain?_, _what is `a.b.ide.kyoto.jp`'s subdomain?_ and _is `https://big.data`'s TLD a well-known one?_.

`tld.js` [runs fast](#performances), is fully tested and is safe to use in the browser (with [browserify][], webpack and others). Because it relies on Mozilla's [public suffix list][], now is a good time to say _thank you_ Mozilla!

@@ -43,23 +43,33 @@ This methods returns handy **properties about a URL or a hostname**.
const tldjs = require('tldjs');

tldjs.parse('https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv');
// {
// "hostname": "spark-public.s3.amazonaws.com",
// "isValid": true,
// "tldExists": true,
// "publicSuffix": "s3.amazonaws.com",
// "domain": "spark-public.s3.amazonaws.com",
// "subdomain": ""
// { hostname: 'spark-public.s3.amazonaws.com',
// isValid: true,
// isIp: false,
// tldExists: true,
// publicSuffix: 's3.amazonaws.com',
// domain: 'spark-public.s3.amazonaws.com',
// subdomain: ''
// }

tldjs.parse('gopher://domain.unknown/');
// {
// "hostname": "domain.unknown",
// "isValid": true,
// "tldExists": false,
// "publicSuffix": "unknown",
// "domain": "domain.unknown",
// "subdomain": ""
// { hostname: 'domain.unknown',
// isValid: true,
// isIp: false,
// tldExists: false,
// publicSuffix: 'unknown',
// domain: 'domain.unknown',
// subdomain: ''
// }

tldjs.parse('https://192.168.0.0');
// { hostname: '192.168.0.0',
// isValid: true,
// isIp: true,
// tldExists: false,
// publicSuffix: null,
// domain: null,
// subdomain: null
// }
```
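
Because `publicSuffix`, `domain` and `subdomain` are `null` for IP addresses, callers may want to check `isIp` before reading them. A minimal sketch, assuming result objects shaped like the ones above: `describeHost` is a hypothetical helper (not part of `tld.js`), and the two objects are copied from the examples rather than produced by calling the library.

```js
// Hypothetical helper, not part of tld.js: labels a parse()-shaped result.
function describeHost(parsed) {
  if (!parsed.isValid) return 'invalid';
  // For IP addresses `domain` is null, so fall back to the hostname.
  if (parsed.isIp) return 'ip:' + parsed.hostname;
  return 'domain:' + parsed.domain;
}

// Result objects copied from the README examples above.
var ipResult = {
  hostname: '192.168.0.0',
  isValid: true,
  isIp: true,
  tldExists: false,
  publicSuffix: null,
  domain: null,
  subdomain: null
};
var hostResult = {
  hostname: 'domain.unknown',
  isValid: true,
  isIp: false,
  tldExists: false,
  publicSuffix: 'unknown',
  domain: 'domain.unknown',
  subdomain: ''
};

console.log(describeHost(ipResult));   // ip:192.168.0.0
console.log(describeHost(hostResult)); // domain:domain.unknown
```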

@@ -154,6 +164,7 @@ isValid('.google.com'); // returns `false`
isValid('my.fake.domain'); // returns `true`
isValid('localhost'); // returns `false`
isValid('https://user:password@example.co.uk:8080/some/path?and&query#hash'); // returns `true`
isValid('192.168.0.0'); // returns `true`
```

# Troubleshooting
@@ -209,23 +220,49 @@ Issues may be awaiting for help so feel free to give a hand, with code or ideas.

# Performances

```
While interpreting the results, keep in mind that each "op" reported by the benchmark is processing 24 domains
tldjs#isValid x 230,353 ops/sec ±10.99% (44 runs sampled)
tldjs#extractHostname x 42,333 ops/sec ±2.82% (85 runs sampled)
tldjs#tldExists x 15,083 ops/sec ±8.76% (54 runs sampled)
tldjs#getPublicSuffix x 14,334 ops/sec ±8.00% (80 runs sampled)
tldjs#getDomain x 15,092 ops/sec ±1.92% (84 runs sampled)
tldjs#getSubdomain x 13,202 ops/sec ±3.66% (72 runs sampled)
tldjs#parse x 8,561 ops/sec ±11.78% (55 runs sampled)
```
`tld.js` is fast, but keep in mind that performance varies with your
use-case. Because the library tries to be smart, the speed can differ
drastically depending on the input (it is faster if you provide an already
cleaned hostname than a random URL).

On an Intel i7-6600U (2.60-3.40 GHz):

## For already cleaned hostnames

| Methods | ops/sec |
| --- | --- |
| `isValid` | ~`8,700,000` |
| `extractHostname` | ~`8,100,000` |
| `tldExists` | ~`2,000,000` |
| `getPublicSuffix` | ~`1,130,000` |
| `getDomain` | ~`1,000,000` |
| `getSubdomain` | ~`1,000,000` |
| `parse` | ~`850,000` |


## For random URLs

| Methods | ops/sec |
| --- | --- |
| `isValid` | ~`25,400,000` |
| `extractHostname` | ~`400,000` |
| `tldExists` | ~`310,000` |
| `getPublicSuffix` | ~`240,000` |
| `getDomain` | ~`240,000` |
| `getSubdomain` | ~`240,000` |
| `parse` | ~`230,000` |


You can measure the performance of `tld.js` on your hardware by running the following command:

```bash
npx tldjs -c './bin/benchmark.js'
npm run benchmark
```

_Notice_: if this is not fast enough for your use-case, keep in mind that you can
provide your own `extractHostname` function (which is the bottleneck in
this benchmark) to `tld.js`.
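
As a rough illustration of why pre-cleaned input is cheaper, here is a sketch of the kind of lightweight extractor one might supply when every input is already known to be a bare hostname. `fastExtractHostname` is a hypothetical name, and its behavior (lowercasing, dropping one trailing dot, returning `null` for unusable input) is an assumption made for this example. How the function is handed to `tld.js` is not shown in this diff, so that step is left out.

```js
// Hypothetical extractor for inputs already known to be bare hostnames.
// It does not parse schemes, credentials, ports, paths, queries or fragments.
function fastExtractHostname(value) {
  if (typeof value !== 'string' || value.length === 0) return null;

  var hostname = value.toLowerCase();

  // Drop a single trailing dot (fully-qualified form), e.g. 'example.com.'
  if (hostname.charAt(hostname.length - 1) === '.') {
    hostname = hostname.slice(0, -1);
  }

  // Signal "no hostname" with null when nothing usable is left.
  return hostname.length > 0 ? hostname : null;
}

console.log(fastExtractHostname('Example.COM.')); // example.com
console.log(fastExtractHostname(''));             // null
```

Skipping full URL parsing is only safe if the caller can guarantee the input shape; a URL passed to this function would be returned almost unchanged.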

## Contributors

This project exists thanks to all the people who contribute. [[Contribute]](CONTRIBUTING.md).
@@ -255,7 +292,6 @@ Support this project by becoming a sponsor. Your logo will show up here with a l
<a href="https://opencollective.com/tldjs/sponsor/9/website" target="_blank"><img src="https://opencollective.com/tldjs/sponsor/9/avatar.svg"></a>



# License

[MIT License](LICENSE).
70 changes: 48 additions & 22 deletions bin/benchmark.js
@@ -6,7 +6,7 @@ var tld = require('../index.js');
var Benchmark = require('benchmark');


var DOMAINS = [
var HOSTNAMES = [
// No public suffix
'example.foo.edu.au', // null
'example.foo.edu.sh', // null
@@ -30,7 +30,10 @@ var DOMAINS = [
'example.www.ck', // !www.ck
'foo.bar.baz.city.yokohama.jp', // !city.yokohama.jp
'example.city.kobe.jp', // !city.kobe.jp
];


var URLS = [
// IDN labels
'example.北海道.jp', // 北海道.jp
'example.和歌山.jp', // 和歌山.jp
@@ -44,54 +47,62 @@ var DOMAINS = [
'FOO.bar.BAZ.ortsinfo.AT', // null

// Full URLs
// '2001:0DB8:0100:F101:0210:A4FF:FEE3:9566',
// 'http://user:pass@www.examplegoogle.com:21/blah#baz',
// 'http://iris.test.ing/&#x1E0B;&#x0323;/?&#x1E0B;&#x0323;#&#x1E0B;&#x0323;',
// 'http://0000000000000300.0xffffffffFFFFFFFF.3022415481470977',
'2001:0DB8:0100:F101:0210:A4FF:FEE3:9566',
'http://user:pass@www.examplegoogle.com:21/blah#baz',
'http://iris.test.ing/&#x1E0B;&#x0323;/?&#x1E0B;&#x0323;#&#x1E0B;&#x0323;',
'http://0000000000000300.0xffffffffFFFFFFFF.3022415481470977',
'http://192.168.0.1/',
'http://%30%78%63%30%2e%30%32%35%30.01%2e',
'http://user:pass@[::1]/segment/index.html?query#frag',
'https://[::1]',
];


// TODO - Compare to other libraries
function main() {
function bench(values) {
console.log(
'While interpreting the results, keep in mind that each "op" reported' +
' by the benchmark is processing ' + DOMAINS.length + ' domains'
' by the benchmark is processing ' + values.length + ' domains'
);

new Benchmark.Suite()
.add('tldjs#isIp', () => {
for (var i = 0; i < values.length; i += 1) {
tld.isIp(values[i]);
}
})
.add('tldjs#isValid', () => {
for (var i = 0; i < DOMAINS.length; i += 1) {
tld.isValid(DOMAINS[i]);
for (var i = 0; i < values.length; i += 1) {
tld.isValid(values[i]);
}
})
.add('tldjs#extractHostname', () => {
for (var i = 0; i < DOMAINS.length; i += 1) {
tld.extractHostname(DOMAINS[i]);
for (var i = 0; i < values.length; i += 1) {
tld.extractHostname(values[i]);
}
})
.add('tldjs#tldExists', () => {
for (var i = 0; i < DOMAINS.length; i += 1) {
tld.tldExists(DOMAINS[i]);
for (var i = 0; i < values.length; i += 1) {
tld.tldExists(values[i]);
}
})
.add('tldjs#getPublicSuffix', () => {
for (var i = 0; i < DOMAINS.length; i += 1) {
tld.getPublicSuffix(DOMAINS[i]);
for (var i = 0; i < values.length; i += 1) {
tld.getPublicSuffix(values[i]);
}
})
.add('tldjs#getDomain', () => {
for (var i = 0; i < DOMAINS.length; i += 1) {
tld.getDomain(DOMAINS[i]);
for (var i = 0; i < values.length; i += 1) {
tld.getDomain(values[i]);
}
})
.add('tldjs#getSubdomain', () => {
for (var i = 0; i < DOMAINS.length; i += 1) {
tld.getSubdomain(DOMAINS[i]);
for (var i = 0; i < values.length; i += 1) {
tld.getSubdomain(values[i]);
}
})
.add('tldjs#parse', () => {
for (var i = 0; i < DOMAINS.length; i += 1) {
tld.parse(DOMAINS[i]);
for (var i = 0; i < values.length; i += 1) {
tld.parse(values[i]);
}
})
.on('cycle', function (event) {
@@ -101,4 +112,19 @@ function main() {
}


// TODO - Compare to other libraries
function main() {
console.log('>>> -------------------- <<<');
console.log('>>> Only valid hostnames <<<');
console.log('>>> -------------------- <<<');
bench(HOSTNAMES);

console.log();
console.log('>>> ----------- <<<');
console.log('>>> Random URLs <<<');
console.log('>>> ----------- <<<');
bench(URLS);
}


main();
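
Each benchmark "op" loops over the whole input array, so a reported ops/sec figure has to be multiplied by the array length to get per-value throughput. A small sketch of that arithmetic follows; `valuesPerSecond` is a hypothetical helper, and the numbers are the previous README's `tldjs#parse` line (8,561 ops/sec with 24 domains per op).

```js
// Hypothetical helper: converts a Benchmark.js ops/sec figure into the number
// of individual values processed per second, given the batch size per op.
function valuesPerSecond(opsPerSec, batchSize) {
  return opsPerSec * batchSize;
}

// Old README figure: tldjs#parse x 8,561 ops/sec, 24 domains per op.
console.log(valuesPerSecond(8561, 24)); // 205464
```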
18 changes: 17 additions & 1 deletion index.js
@@ -1,5 +1,6 @@
'use strict';


// Load rules
var Trie = require('./lib/suffix-trie.js');
var allRules = Trie.fromJson(require('./rules.json'));
@@ -10,6 +11,7 @@ var getDomain = require('./lib/domain.js');
var getPublicSuffix = require('./lib/public-suffix.js');
var getSubdomain = require('./lib/subdomain.js');
var isValid = require('./lib/is-valid.js');
var isIp = require('./lib/is-ip.js');
var tldExists = require('./lib/tld-exists.js');


@@ -50,12 +52,26 @@ function factory(options) {
var result = {
hostname: _extractHostname(url),
isValid: null,
tldExists: null,
isIp: null,
tldExists: false,
publicSuffix: null,
domain: null,
subdomain: null,
};

if (result.hostname === null) {
result.isIp = false;
result.isValid = false;
return result;
}

// Check if `hostname` is a valid ip address
result.isIp = isIp(result.hostname);
if (result.isIp) {
result.isValid = true;
return result;
}

// Check if `hostname` is valid
result.isValid = isValid(result.hostname);
if (result.isValid === false) return result;
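
The order of the early returns in this hunk can be sketched in isolation. This is an illustration only: `isIpv4Sketch` and `isValidSketch` are simplified stand-ins invented for the example (the real `lib/is-ip.js` and `lib/is-valid.js` are not shown in this diff, and IPv6 is ignored here), and the public-suffix, domain and subdomain steps are omitted.

```js
// Stand-in for isIp (assumption): dotted-quad IPv4 only, no IPv6.
function isIpv4Sketch(hostname) {
  var parts = hostname.split('.');
  if (parts.length !== 4) return false;
  return parts.every(function (part) {
    return /^\d{1,3}$/.test(part) && Number(part) <= 255;
  });
}

// Stand-in for isValid (assumption): needs a dot that is not the first character.
function isValidSketch(hostname) {
  return hostname.indexOf('.') > 0;
}

// Mirrors the control flow of the hunk above, taking an already-extracted hostname.
function parseSketch(hostname) {
  var result = {
    hostname: hostname,
    isValid: null,
    isIp: null,
    tldExists: false,
    publicSuffix: null,
    domain: null,
    subdomain: null
  };

  // 1. No hostname could be extracted: neither an IP nor valid.
  if (result.hostname === null) {
    result.isIp = false;
    result.isValid = false;
    return result;
  }

  // 2. IP addresses are valid, but have no suffix, domain or subdomain.
  result.isIp = isIpv4Sketch(result.hostname);
  if (result.isIp) {
    result.isValid = true;
    return result;
  }

  // 3. Otherwise fall back to the hostname validity check.
  result.isValid = isValidSketch(result.hostname);
  return result; // suffix, domain and subdomain lookups are omitted here
}

console.log(parseSketch('192.168.0.0'));
console.log(parseSketch(null));
```

Checking for an IP before the validity check means an address such as `192.168.0.0` returns early with `domain: null` instead of going through the suffix lookup.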