Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Buffers #45

Closed
seishun opened this issue Nov 15, 2012 · 13 comments
Closed

Support for Buffers #45

seishun opened this issue Nov 15, 2012 · 13 comments

Comments

@seishun
Copy link

seishun commented Nov 15, 2012

Sometimes you need to send urlencoded binary data. However, the following:
qs.stringify({a: new Buffer([0x00, 0x30, 0x7f, 0x80])})
does not produce meaningful output. One would expect it to return something like:
'a=%000%7F%80'

It should be fairly easy to implement. I would submit a pull request, but there are multiple ways to urlencode a Buffer. The simplest one is probably the following:
escape(buf.toString('binary'))

Unfortunately, the 'binary' encoding is deprecated and is planned to be removed according to the docs, so you might not want to rely on it. The other two ways I know are a bit ugly:
Array.prototype.map.call(buf, function(byte) { return '%' + byte.toString(16); }).join('')
and
buf.toString('hex').match(/../g).map(function(byte) { return '%' + byte; }).join('')

Both produce the same output as the 'escape' method, except they percent-encode every byte, not only those with the value higher than 127.

What do you think? Perhaps someone knows a better method?

@buschtoens
Copy link
Collaborator

We'd probably have to do performance tests on those two methods, though I'd predict, that the first one should be a lot faster than the second one. It also would allow for a simple way to check for >127 values.

Array.prototype.map.call(buf, function(byte) {
  return byte > 127 ? '%' + byte.toString(16) : String.fromCharCode(byte);
}).join('');

However, running this code shows, that we'd have to do a more sophisticated check.

var arr = [];
for(var i = 0; i < 128; i++) arr.push(String.fromCharCode(i));
console.log(arr);

We should only allow bytes from these ranges to be not encoded with %:

  • 48-57: Integers
  • 65-90: Uppercase letters
  • 97-122: Lowercase letters

This leaves us with this snippet:

function padLeft(str, pad) {
  return pad.substring(0, pad.length - str.length) + str;
}

function encodeByte(byte) {
  return (byte >= 49 && byte <= 57)
      || (byte >= 65 && byte <= 90)
      || (byte >= 97 && byte <= 122)
      ? String.fromCharCode(byte) : '%' + padLeft(byte.toString(16), "00");
}

function encodeBuffer(buf) {
  return Array.prototype.map.call(buf, encodeByte).join('');
}

This will also be one way only, meaning that we'll only stringify Buffers. Trying to detect binary querystrings can go horribly wrong and would make the API ambiguous.

@tj
Copy link
Owner

tj commented Sep 9, 2013

personally I would just .toString() it, I don't think shipping binary in a querystring is all that useful

@buschtoens
Copy link
Collaborator

Seems like it isn't this simple...

encodeURIComponent(Buffer([0xEF]));
// => '%EF%BF%BD'

@seishun
Copy link
Author

seishun commented Sep 9, 2013

@silvinci A Buffer can't have >127 values. At this point you might as well call String.fromCharCode on each byte, then call escape on the resulting string:

escape(String.fromCharCode.apply(null, buf))

@buschtoens
Copy link
Collaborator

@seishun wat. new Buffer([0xFF]) is equal to new Buffer([255]). A Buffer is a collection of octets (bytes). One byte goes from 0 to 255.

We wanna make sure, that binary data is encoded, when neccessary (%EF). This way normal alphanum chars won't take up to much space.

Anyways: Why would you want to send binary data? I can't come up with a real use case for this, however Buffer is a standard type in Node, so we should support it.

@seishun
Copy link
Author

seishun commented Sep 9, 2013

Okay, had a brain fart. My snippet works anyway though.

Some APIs expect binary data, for example, logging in with Steam. See https://github.com/seishun/node-steam/blob/master/lib/handlers/user.js#L84.

@buschtoens
Copy link
Collaborator

On a second thought, we should leave the optional encoding out, so a Buffer will always be completely encoded with %. This would make it look more consistent and should be easier to parse for the receiver.

Your Array.prototype.map snippet misses propper padding. Otherwise it's fine.

However, as buf.toString('hex') does a native C binding call, this should be the fastest method of retrieving the encoded Buffer. The remaining question is how to insert the %. I'll do a benchmark on different approaches.

@buschtoens
Copy link
Collaborator

Okay, we have a winner. 1,000,000 rounds took me under 20 s.

var buf = new Buffer(256);
for(var b = 0; b < buf.length; b++) buf[b] = b;

console.log(buf.toString('hex').replace(/../g, '%$&').toUpperCase()); // %00%01 ... %FE%FF

Array.prototype.map took 65 s.

@tj
Copy link
Owner

tj commented Sep 9, 2013

@seishun I would base64 or hex-encode them personally, closing for now I don't think this is necessary

@tj tj closed this as completed Sep 9, 2013
@seishun
Copy link
Author

seishun commented Sep 9, 2013

@visionmedia

I would base64 or hex-encode them personally

Are you suggesting me to change the API? I have no control over that. Otherwise your module would be unnecessary because everyone would use JSON.

@buschtoens
Copy link
Collaborator

You can always use your own fork or manually encode the Buffer. The code's free to use. :)

@tj
Copy link
Owner

tj commented Sep 9, 2013

@seishun true, fair enough, just seems like a pretty extreme edge-case to store binary in a querystring

@seishun
Copy link
Author

seishun commented Sep 23, 2013

In case someone finds this issue and needs to implement it:

Do not use escape. It doesn't escape '+', which stands for a space character in x-www-form-urlencoded. Instead, use @silvinci's method above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants