moved: Body parsing bug due to special characters/encoding? #7

tj opened this Issue Apr 29, 2011 · 2 comments

tj commented Apr 29, 2011

I was working on a bookmarklet that, among other things, form-posts the title of whatever page you're on to my server running Express, and I'm seeing Connect's body parser choke on some pages from Amazon.

Here's a super simple test case:

Run that website locally, drag the bookmarklet to your toolbar, and click it on any of the provided Amazon links. You should see an error message like this one:

URIError: URI malformed
    at decodeURIComponent (native)
    at /usr/local/lib/node/.npm/qs/0.1.0/package/lib/querystring.js:28:18
    at Array.reduce (native)
    at /usr/local/lib/node/.npm/qs/0.1.0/package/lib/querystring.js:27:6
    at IncomingMessage.<anonymous> (/usr/local/lib/node/.npm/connect/1.3.0/package/lib/middleware/bodyParser.js:74:15)
    at IncomingMessage.emit (events.js:61:17)
    at HTTPParser.onMessageComplete (http.js:132:23)
    at Socket.ondata (http.js:1007:22)
    at Socket._onReadable (net.js:677:27)
    at IOWatcher.onReadable [as callback] (net.js:177:10)

This happens on Amazon pages where the title has special characters, like é or ü. You can change the title of an Amazon page (e.g. by setting document.title in the console) to just é, for example, and it will cause the bug.

I've done some investigating and can give you some more info, but at a high level, it seems that the browser in this case encodes the form differently than encodeURIComponent() does, which causes decodeURIComponent() — used by Connect's body parser — to choke.

For example, calling encodeURIComponent() on that é yields %C3%A9 everywhere, but what the server receives in the form body from these Amazon pages is %E9. Attempting to decodeURIComponent() on %E9 causes this error.
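
The mismatch described above can be reproduced in plain Node, without a browser or Amazon page. This is a minimal sketch that uses only built-in functions; the variable names are illustrative and do not come from Connect or qs.

```javascript
// UTF-8 percent-encoding of "é" (U+00E9) is two bytes: C3 A9.
var utf8Encoded = encodeURIComponent('\u00e9');

// decodeURIComponent only accepts valid UTF-8 byte sequences. A lone %E9
// (the ISO-8859-1 byte for "é") is not valid UTF-8, so it throws.
var latin1Error = null;
try {
  decodeURIComponent('%E9');
} catch (err) {
  latin1Error = err;
}

console.log(utf8Encoded);       // %C3%A9
console.log(latin1Error.name);  // URIError
```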

I tried to reproduce this with a sample page of my own, but there the form post matched encodeURIComponent(). I'm guessing the behavior on Amazon is related to the page's encoding, but I haven't been able to confirm that, maybe because Express sends a Content-Type header that specifies utf-8.

All said, it seems that Connect's body parser shouldn't break on these encodings. Hope this info helps. Thanks!
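
One way a parser could avoid crashing on such input is to catch the decoding failure and fall back to the raw text. The following is only a sketch under that assumption, not Connect's or qs's actual code; `safeDecode` is a hypothetical helper name.

```javascript
// Hypothetical helper: decode one form-encoded component without throwing.
function safeDecode(str) {
  // In application/x-www-form-urlencoded bodies, "+" stands for a space.
  var s = str.replace(/\+/g, ' ');
  try {
    return decodeURIComponent(s);
  } catch (err) {
    // Not valid UTF-8 percent-encoding: return the text undecoded so the
    // caller can still handle (or reject) the request cleanly.
    return s;
  }
}

console.log(safeDecode('caf%C3%A9')); // café
console.log(safeDecode('%E9'));       // %E9
```

Returning the undecoded string keeps the helper from guessing at a charset; a caller that knows the page encoding could swap in a charset-specific decoder instead.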

tj commented Apr 29, 2011

^ moved from senchalabs/connect


I ran into a similar case: when a form is POSTed as Shift_JIS, decodeURIComponent cannot decode the body.

This is because decodeURIComponent only handles UTF-8. Other charsets need an appropriate decoding function.

For example, this is a Shift_JIS decoder library.

How about something like this?


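var express = require('express');
// Proposed hook: let the body parser's query-string module use a custom decoder.
// UnescapeSJIS is the decode function from the Shift_JIS library mentioned above.
express.bodyParser.qs.decoder = UnescapeSJIS;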

However, I could not find an ISO-8859-1 decoder.
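
Because ISO-8859-1 maps each byte value directly to the Unicode code point with the same number, a decoder for it can be sketched in a few lines. `decodeLatin1` is a hypothetical name, and the `qs.decoder` hook shown above is a proposal rather than an existing API, so treat this as an illustration only.

```javascript
// Hypothetical ISO-8859-1 decoder for form-encoded components.
function decodeLatin1(str) {
  return str
    // "+" stands for a space in form bodies; literal pluses arrive as %2B.
    .replace(/\+/g, ' ')
    // Each %XX is one byte, and in ISO-8859-1 that byte is the code point.
    .replace(/%([0-9A-Fa-f]{2})/g, function (match, hex) {
      return String.fromCharCode(parseInt(hex, 16));
    });
}

console.log(decodeLatin1('%E9'));       // é
console.log(decodeLatin1('M%FCnchen')); // München
```

Note that browsers commonly treat pages labeled ISO-8859-1 as windows-1252, where the bytes 0x80 to 0x9F map to different characters; this sketch does not handle that range.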
