Skip to content

Incorrect encoding handling in Content-Disposition header #58240

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
horsley opened this issue May 9, 2025 · 1 comment
Open

Incorrect encoding handling in Content-Disposition header #58240

horsley opened this issue May 9, 2025 · 1 comment

Comments

@horsley
Copy link

horsley commented May 9, 2025

Version

v22.15.0

Platform

Linux horsleyli-1q8kg06lua 5.4.119-19.0009.44 #1 SMP Tue May 7 20:08:55 CST 2024 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

http

What steps will reproduce the bug?

const net = require('net');
const http = require('http');

const contentLengthComesFirst = process.argv.includes('--content-length-comes-first');

const backend = net.createServer((socket) => {
  socket.once('data', (data) => {
    // Build raw HTTP response with binary-safe headers
    const filename = '漏洞.txt';
    const response =
      'HTTP/1.1 200 OK\r\n' +
      (contentLengthComesFirst ? `Content-Length: 12\r\n`:'') +
      // Raw UTF-8 bytes for Chinese filename (without URL encoding)
      `Content-Disposition: attachment; filename="${Buffer.from(filename).toString('binary')}"; filename*=UTF-8''${encodeURIComponent(filename)}\r\n` +
      `Content-Type: application/octet-stream\r\n` +
      (!contentLengthComesFirst ? `Content-Length: 12\r\n`:'') +
      'Connection: close\r\n\r\n' +
      'file content';

    const responseBuffer = Buffer.from(response, 'binary');
    socket.end(responseBuffer);
  });
});

backend.listen(() => {
  const proxy = http.createServer((req, res) => {
    const options = {
      hostname: 'localhost',
      port: backend.address().port,
      method: req.method,
      headers: req.headers,
      path: '/backend'
    };
    const proxyReq = http.request(options, (proxyRes) => {
      res.statusCode = proxyRes.statusCode;
      res.statusMessage = proxyRes.statusMessage;
      for (const header in proxyRes.headers) {
        res.setHeader(header, proxyRes.headers[header]);
      }
      
      // Handle the 'data' event to ensure the response is sent correctly
      proxyRes.on('data', (chunk) => {
        res.write(chunk);
      });
      // Handle the 'end' event to finish the response
      proxyRes.on('end', () => {
        res.end();
      });
    });
    
    req.pipe(proxyReq);
  }).listen(() => {
    const client = net.connect(proxy.address().port, () => {
      client.write(`GET /proxy HTTP/1.1\r\nHost: localhost:${backend.address().port}\r\n\r\n`);
    });

    let responseData = Buffer.alloc(0);
    client.on('data', (chunk) => {
      responseData = Buffer.concat([responseData, chunk]);
    });
    client.on('end', () => {
      const startFlag = Buffer.from('filename="');
      const endFlag = Buffer.from('"');
      const startIndex = responseData.indexOf(startFlag) + startFlag.length;
      const endIndex = responseData.indexOf(endFlag, startIndex);
      const filenameBuffer = responseData.slice(startIndex, endIndex);
      console.log('filename Buffer:', filenameBuffer.toString('hex'));
      console.log('filename utf8:', filenameBuffer.toString('utf8'));

      proxy.close(() => backend.close());
    });
  });
});

Image

When the Content-Length header appears after Content-Disposition header in an HTTP response, Node.js' encoder behaves correctly. However, when Content-Length is placed BEFORE Content-Disposition, the encoder exhibits abnormal behavior which corruption utf8 char in header processing.

How often does it reproduce? Is there a required condition?

always

What is the expected behavior? Why is that the expected behavior?

Expected Behavior:
Regardless of header order (Content-Length before or after Content-Disposition), the HTTP parser should preserve raw byte values in filename parameter

Technical Justification:

  1. RFC 7230 Section 3.2.2:

    • "The order in which header fields with differing field names are received is not significant"
    • Content-Length position should not affect header parsing
  2. RFC 6266 Section 4.1:

    • "If both filename and filename* are present, filename* should be used"
    • UTF-8 encoding must be properly handled
  3. Binary Safety:

    • filename parameter's raw bytes (e6 bc 8f e6 b4 a9) should remain intact
    • No double-encoding (Latin-1 → UTF-8 conversion) should occur
  4. Test Case Consistency:
    Both scenarios should produce identical outputs:

    • Hex dump of original UTF-8 bytes
    • Proper Chinese character decoding

What do you see instead?

Utf8 character e6bc8fe6b49e (漏洞) was encoded to 0f1e when Content-Length is placed BEFORE Content-Disposition

Additional information

node/lib/_http_outgoing.js

Lines 598 to 607 in 3f5899f

if (isContentDispositionField(key) && self._contentLength) {
// The value could be an array here
if (ArrayIsArray(value)) {
for (let i = 0; i < value.length; i++) {
value[i] = Buffer.from(value[i], 'latin1');
}
} else {
value = Buffer.from(value, 'latin1');
}
}

This code snippet in Node.js demonstrates a special handling for the Content-Disposition header's encoding when the Content-Length header is present. The logic converts the Content-Disposition header value into a Latin1-encoded Buffer ​​only if Content-Length has already been processed​​ and exists in the headers (self._contentLength is truthy).

Since headers are processed sequentially in caller, the outcome depends on the ​​order of header definitions​​:

  • If Content-Length is set ​​before​​ Content-Disposition, the condition self._contentLength is met, and the encoding logic applies.
  • If Content-Length is set ​​after​​ Content-Disposition, self._contentLength is undefined during the Content-Disposition processing, so the encoding step is skipped
@mag123c
Copy link

mag123c commented May 10, 2025

Through tests similar to the provided reproduction steps, I've confirmed that when a server responds with a header like Content-Disposition: attachment; filename="한글파일.txt" (where "한글파일.txt" is comprised of UTF-8 bytes), the filename appears corrupted (likely misinterpreted as latin1) in res.headers['content-disposition'] when received by the Node.js HTTP client (http.request).

I've considered a couple of approaches to address this and would appreciate the community's guidance on which area might be most appropriate for a fix:

  1. Node.js C++ Bindings (handling llhttp callbacks):
  • The idea is to modify the C++ code that handles the on_header_value callback from llhttp. When converting the header value byte sequence to a JavaScript string, if the current header is Content-Disposition (and notably, if a filename* parameter is not present), we could attempt to interpret the bytes as UTF-8 first. If that fails (e.g., invalid UTF-8 sequence), it would fall back to the current default behavior (presumably latin1).
  1. Node.js JavaScript HTTP Library (e.g., in lib/_http_incoming.js):
  • This approach involves post-processing the string giá trị (which might have already been converted to a string, potentially as latin1, at the C++ level) within the JavaScript layer. When the res.headers object is being populated, we could apply a heuristic to the content-disposition header value. If the filename parameter exhibits a pattern indicative of UTF-8 misinterpreted as latin1, it could be corrected back to UTF-8.

It seems llhttp itself, being a low-level parser, might not be the direct place for this semantic fix. I'm wondering which of the two approaches above (or perhaps an alternative) would be more aligned with Node.js's design philosophy, performance considerations, and maintainability.

If there are existing efforts or preferred strategies for handling such encoding ambiguities, I would be grateful for any pointers.
Based on the feedback, I'm prepared to work on the fix, including tests, and submit a PR.

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants