Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KBucket.distance #29

Closed
fanatid opened this issue May 15, 2016 · 11 comments
Closed

KBucket.distance #29

fanatid opened this issue May 15, 2016 · 11 comments

Comments

@fanatid
Copy link
Contributor

fanatid commented May 15, 2016

Should not be here index.js#L98 value from largest buffer, not 255?

@fanatid
Copy link
Contributor Author

fanatid commented May 15, 2016

JFYI, buffer-xor uses value, not 255

@tristanls
Copy link
Owner

I don't understand. Could you provide a code example of what outcome you expect and what you're getting instead?

@fanatid
Copy link
Contributor Author

fanatid commented May 16, 2016

id1 = [ 0x04 ]
id2 = [ 0x44, 0x04 ]
output: 16639
expected: 16388

@tristanls
Copy link
Owner

Ah, I understand now.

The short answer is that it is a convention I picked 😃 .

The long answer is that when I was exploring the references (ones from README), they all made an assumption that the node ids would be of equal length. At the time I was creating k-bucket, I had a need to compare node ids with variable lengths. The choice I made was to treat difference in length as maximum difference. Another way to illustrate that choice is through the following truth table (where x stands for "does not exist"):

bit1 bit2 xor
0 0 0
0 1 1
0 x 1
1 0 1
1 1 0
1 x 1

Since we are doing ops at level of bytes, then that would result in using 255 as the distance between a byte that exists and a byte that doesn't exist.

This test case informally documents this convention:

test['distance between 00000000 and 0000000000000000 is 0000000011111111'] = function (test) {
  test.expect(1)
  test.equal(KBucket.distance(new Buffer([ 0x00 ]), new Buffer([ 0x00, 0x00 ])), 255)
  test.done()
}

In addition what I mentioned above, the other distance choice I made is that the shorter buffer is assumed to be the most significant bits (and the filling happens in the least significant part). This was done to align with use cases where node id's are named using namespaces like:

.com
.com.mystuff
.com.mystuff.detailed

This way, it "makes sense" to compare the distance between .com.mystuff and .com.mystuff.detailed because in that case the distance calculation will say that .com.mystuff portions match, and assigns maximum difference to the .detailed portion of the name. Saying this differently, .com.mystuff is closer to .com.mystuff.detailed than .com is to .com.mystuff.detailed.

> KBucket.distance(new Buffer(".com.mystuff"), new Buffer(".com.mystuff.detailed"))
4.722366482869645e+21
> KBucket.distance(new Buffer(".com"), new Buffer(".com.mystuff.detailed"))
8.711228593176025e+40

@fanatid
Copy link
Contributor Author

fanatid commented May 16, 2016

Sorry, can't understand your truth table, how bit can not exists in a middle?

I understand that usually length of ids is equal, so this question almost doesn't make sense, but I still think that we should take value from longest id, not 255. For example, if we have 3 ids:
a = '.com'
b = '.com.em'
c = '.com.me'

xor(a, b) should not be equal to xor(a, c), right? but with current implementation we receive that distance is equal..

your example with modified distance:

var xor = require('buffer-xor')

function distance (firstId, secondId) {
  var buffer = xor(firstId, secondId)
  return buffer.reduce(function (d, v) { return d * 256 + v }, 0)
}

distance(new Buffer(".com.mystuff"), new Buffer(".com.mystuff.detailed"))
// => 855784543728809300000
distance(new Buffer(".com"), new Buffer(".com.mystuff.detailed"))
// => 1.5798505339527512e+40

@tristanls
Copy link
Owner

With the example you provided:

a = '.com'
b = '.com.em'
c = '.com.me'

I think distance(a,b) should equal distance(a,c).

Consider the following example:

a = '.com'
b = '.com.em'
c = '.com.emem'

In this case, .com is just as far from .com.em as it is from .com.emem, which is as intended for the use case where node ids are hierarchical namespaces.

@fanatid
Copy link
Contributor Author

fanatid commented May 16, 2016

I think distance(a,b) should equal distance(a,c).

Why? b not equal c!

In this case, .com is just as far from .com.em as it is from .com.emem, which is as intended for the use case where node ids are hierarchical namespaces.

no, distance is different:

var xor = require('buffer-xor')

function distance (firstId, secondId) {
  var buffer = xor(firstId, secondId)
  return buffer.reduce(function (d, v) { return d * 256 + v }, 0)
}

distance(new Buffer(".com"), new Buffer(".com.em"))
// => 3040621
distance(new Buffer(".com"), new Buffer(".com.emem"))
// => 199270163821

@tristanls
Copy link
Owner

no, distance is different

You're right.

Looks like hierarchical namespacing use case works only if each namespace in the hierarchy is of equal length.

Why? b not equal c!

The answer to the why is the truth table. I defined XOR when bit1 exists and bit2 is undefined as 1. Whereas the buffer-xor implementation defined XOR when bit1 exists and bit2 is undefined as bit1 (I think).

The fact that b is not equal to c does not require the distance from a to each of them to be different. Example:

b----a----c

I don't think the definition of KBucket.distance is incorrect. I think it's different from how it is defined in buffer-xor. The cause of the difference is how XOR is defined for buffers of different lengths.

@fanatid
Copy link
Contributor Author

fanatid commented May 16, 2016

The answer to the why is the truth table. I defined XOR when bit1 exists and bit2 is undefined as 1. Whereas the buffer-xor implementation defined XOR when bit1 exists and bit2 is undefined as bit1 (I think).

Yes, you're correct.

I don't think the definition of KBucket.distance is incorrect. I think it's different from how it is defined in buffer-xor. The cause of the difference is how XOR is defined for buffers of different lengths.

Agreed. Then maybe you consider using buffer-xor for code lines optimization? With buffer-xor, KBucket.dinstance is one line function:

KBucket.distance = function (firstId, secondId) {
  return xor(firstId, secondId).reduce(function (d, v) { return d * 256 + v }, 0)
}

@tristanls
Copy link
Owner

I benchmarked:

'use strict'
var KBucket = require('../index')
var xor = require('buffer-xor');

var _0000000100100100 = new Buffer('0124', 'hex')
var _0100000000100100 = new Buffer('4024', 'hex')

var kBucketDistance = KBucket.distance;
var bufferXorDistance = function (firstId, secondId) {
  return xor(firstId, secondId).reduce(function (d, v) { return d * 256 + v }, 0)
};

var hrtime = process.hrtime()
for (var i = 0; i < 1e7; i++) {
  kBucketDistance(_0000000100100100, _0100000000100100)
}
var kBucketDistanceDiff = process.hrtime(hrtime)
// console.log(diff[0] * 1e9 + diff[1])

hrtime = process.hrtime()
for (var i = 0; i < 1e7; i++) {
  kBucketDistance(_0000000100100100, _0100000000100100)
}
kBucketDistanceDiff = process.hrtime(hrtime)


hrtime = process.hrtime()
for (var i = 0; i < 1e7; i++) {
  bufferXorDistance(_0000000100100100, _0100000000100100)
}
var bufferXorDistanceDiff = process.hrtime(hrtime)

hrtime = process.hrtime()
for (var i = 0; i < 1e7; i++) {
  bufferXorDistance(_0000000100100100, _0100000000100100)
}
bufferXorDistanceDiff = process.hrtime(hrtime)

console.log("KBucket.distance   : ", kBucketDistanceDiff[0] * 1e9 + kBucketDistanceDiff[1])
console.log("buffer-xor distance: ", bufferXorDistanceDiff[0] * 1e9 + bufferXorDistanceDiff[1])

Results on my machine:

$ npm run benchmark:distance

> k-bucket@3.0.1 benchmark:distance /Volumes/github/tristanls/k-bucket
> node benchmarks/distance.js

KBucket.distance   :  108860569
buffer-xor distance:  6535118886
$ npm run benchmark:distance

> k-bucket@3.0.1 benchmark:distance /Volumes/github/tristanls/k-bucket
> node benchmarks/distance.js

KBucket.distance   :  106777094
buffer-xor distance:  6579524318

@fanatid
Copy link
Contributor Author

fanatid commented May 16, 2016

it was expected, buffer-xor create new buffer every time

@fanatid fanatid closed this as completed May 16, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants