Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

utf8 redirect breaks stuff #376

Open
rlidwka opened this issue Sep 13, 2021 · 6 comments
Open

utf8 redirect breaks stuff #376

rlidwka opened this issue Sep 13, 2021 · 6 comments

Comments

@rlidwka
Copy link

rlidwka commented Sep 13, 2021

Part 1: error

There is a server on the internet that shows following headers:

$ curl -I http://59.gibdd.ru
HTTP/1.1 301 Moved Permanently
Date: Mon, 13 Sep 2021 15:51:29 GMT
Server: Apache
Location: https://гибдд.рф/r/59/
Vary: Accept-Encoding
Content-Type: text/html; charset=iso-8859-1

I'm trying to follow that redirect, and it fails:

require('needle')('get','http://59.gibdd.ru/',{follow:10}).
  then(()=>console.log('ok'),()=>console.log('error'))

It produces this:

> Uncaught TypeError [ERR_INVALID_URL]: Invalid URL: https://гибдд.ÑÑ
                                                                          /r/59/
    at onParseError (internal/url.js:259:9)
    at new URL (internal/url.js:335:5)
    at resolve_url (/tmp/node_modules/needle/lib/needle.js:165:12)
    at ClientRequest.<anonymous> (/tmp/node_modules/needle/lib/needle.js:582:28)
    at Object.onceWrapper (events.js:422:26)
    at ClientRequest.emit (events.js:315:20)
    at ClientRequest.EventEmitter.emit (domain.js:529:15)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:641:27)
    at HTTPParser.parserOnHeadersComplete (_http_common.js:126:17)
    at Socket.socketOnData (_http_client.js:509:22) {
  input: 'https://гибдд.Ñ\x80Ñ\x84/r/59/',
  code: 'ERR_INVALID_URL'
}

This error is not intercepted by needle and is thrown into the void (or rather node.js own unhandledException handler).

Promise is never resolved or rejected, and user has no way of intercepting it.

Part 2: header encoding

Sometimes someone upstream makes a decision that every single downstream developer has to fix on their own. Sad fact of life. So...

Turns out, node.js itself returns headers in binary encoding, as explained here: nodejs/node#7928

> require('needle')('get','http://59.gibdd.ru/').then(res=>{console.log(res.headers)})
Promise { <pending> }
> {
  date: 'Mon, 13 Sep 2021 15:55:57 GMT',
  server: 'Apache',
  location: 'https://гибдд.Ñ\x80Ñ\x84/r/59/',
  vary: 'Accept-Encoding',
  'content-length': '298',
  connection: 'close',
  'content-type': 'text/html; charset=iso-8859-1'
}

But if you try to parse that URL, you'll get either incorrect result (with url.parse) or an exception (with new URL).

Got resolved this by using Buffer.from(res.headers.location, 'binary').toString() instead of res.headers.location as shown here: sindresorhus/got#214

Needle fails here:

var redirect_url = resolve_url(headers.location, uri);

So instead of:

var redirect_url = resolve_url(headers.location, uri);

It should be:

var location = Buffer.from(headers.location, 'binary').toString();
var redirect_url = resolve_url(location, uri);

User should probably get fixed location header as well (shouldn't break api because it was never working in the first place):

out.emit('redirect', headers.location);

Part 3: intercepting an error

Even if we fix encoding on a legitimate redirect, someone can intentionally or accidentally throw in something that breaks new URL syntax.

> new url.URL('http://\xB8', 'http://google.com');
Uncaught TypeError [ERR_INVALID_URL]: Invalid URL: http://¸

So I believe the call to new URL should be wrapped in try..catch and error thrown whenever request errors would normally go.

@sessionsKrammer
Copy link

test

@puzrin
Copy link

puzrin commented Sep 18, 2021

@tomas , could you take a look? Unhandled error is a serious problem. If you have no time - we can do PR. We need new release with fix. Error handling - mandatory, redirect encoding - if possible.

@puzrin
Copy link

puzrin commented Sep 24, 2021

@tomas can we help anyhow? Fix of this issue is important for us.

@yetzt
Copy link
Contributor

yetzt commented Oct 19, 2021

http headers are expected to be ISO-8859-1. the server sends the location header in utf-8, the http module interprets it as latin1 and we end up with mojibake. most browsers these days gracefully accept headers in utf-8 and don't break. i think it's worth doing the same.

@puzrin
Copy link

puzrin commented Oct 19, 2021

@yetzt note, bad headers is not a big deal. Problem is in unhandled exception and loss of promise end. First post contains all details how to fix with minimal efforts.

@yetzt
Copy link
Contributor

yetzt commented Oct 19, 2021

if so, open a pull request. @puzrin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants