Incorrect encoding handling in Content-Disposition header #58240

Open
@horsley

Version

v22.15.0

Platform

Linux horsleyli-1q8kg06lua 5.4.119-19.0009.44 #1 SMP Tue May 7 20:08:55 CST 2024 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

http

What steps will reproduce the bug?

const net = require('net');
const http = require('http');

const contentLengthComesFirst = process.argv.includes('--content-length-comes-first');

const backend = net.createServer((socket) => {
  socket.once('data', (data) => {
    // Build raw HTTP response with binary-safe headers
    const filename = '漏洞.txt';
    const response =
      'HTTP/1.1 200 OK\r\n' +
      (contentLengthComesFirst ? `Content-Length: 12\r\n`:'') +
      // Raw UTF-8 bytes for Chinese filename (without URL encoding)
      `Content-Disposition: attachment; filename="${Buffer.from(filename).toString('binary')}"; filename*=UTF-8''${encodeURIComponent(filename)}\r\n` +
      `Content-Type: application/octet-stream\r\n` +
      (!contentLengthComesFirst ? `Content-Length: 12\r\n`:'') +
      'Connection: close\r\n\r\n' +
      'file content';

    const responseBuffer = Buffer.from(response, 'binary');
    socket.end(responseBuffer);
  });
});

backend.listen(() => {
  const proxy = http.createServer((req, res) => {
    const options = {
      hostname: 'localhost',
      port: backend.address().port,
      method: req.method,
      headers: req.headers,
      path: '/backend'
    };
    const proxyReq = http.request(options, (proxyRes) => {
      res.statusCode = proxyRes.statusCode;
      res.statusMessage = proxyRes.statusMessage;
      for (const header in proxyRes.headers) {
        res.setHeader(header, proxyRes.headers[header]);
      }
      
      // Handle the 'data' event to ensure the response is sent correctly
      proxyRes.on('data', (chunk) => {
        res.write(chunk);
      });
      // Handle the 'end' event to finish the response
      proxyRes.on('end', () => {
        res.end();
      });
    });
    
    req.pipe(proxyReq);
  }).listen(() => {
    const client = net.connect(proxy.address().port, () => {
      client.write(`GET /proxy HTTP/1.1\r\nHost: localhost:${backend.address().port}\r\n\r\n`);
    });

    let responseData = Buffer.alloc(0);
    client.on('data', (chunk) => {
      responseData = Buffer.concat([responseData, chunk]);
    });
    client.on('end', () => {
      const startFlag = Buffer.from('filename="');
      const endFlag = Buffer.from('"');
      const startIndex = responseData.indexOf(startFlag) + startFlag.length;
      const endIndex = responseData.indexOf(endFlag, startIndex);
      // subarray() is the non-deprecated equivalent of Buffer#slice()
      const filenameBuffer = responseData.subarray(startIndex, endIndex);
      console.log('filename Buffer:', filenameBuffer.toString('hex'));
      console.log('filename utf8:', filenameBuffer.toString('utf8'));

      proxy.close(() => backend.close());
    });
  });
});

When the Content-Length header appears after the Content-Disposition header in an HTTP response, Node.js encodes the header value correctly. However, when Content-Length is placed BEFORE Content-Disposition, the UTF-8 characters in the Content-Disposition value are corrupted during header processing.

How often does it reproduce? Is there a required condition?

always

What is the expected behavior? Why is that the expected behavior?

Expected Behavior:
Regardless of header order (Content-Length before or after Content-Disposition), the HTTP implementation should preserve the raw byte values of the filename parameter.

Technical Justification:

  1. RFC 7230 Section 3.2.2:

    • "The order in which header fields with differing field names are received is not significant"
    • Content-Length position should not affect header parsing
  2. RFC 6266 Section 4.3:

    • When both filename and filename* are present, recipients SHOULD pick filename* and ignore filename
    • UTF-8 encoding must be properly handled
  3. Binary Safety:

    • filename parameter's raw bytes (e6 bc 8f e6 b4 9e) should remain intact
    • No double-encoding (Latin-1 → UTF-8 conversion) should occur
  4. Test Case Consistency:
    Both scenarios should produce identical outputs:

    • Hex dump of original UTF-8 bytes
    • Proper Chinese character decoding
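As a rough illustration of the filename* form referenced in item 2, the sketch below builds the same extended value the reproduction script sends and decodes it again. `decodeExtValue` is a hypothetical helper written for this example, not a Node.js API, and it only handles the UTF-8 charset. Note that `encodeURIComponent` is sufficient for this particular filename but does not escape every character the extended-value grammar requires.

```javascript
// Build the extended parameter value the way the reproduction script does.
const filename = '漏洞.txt';
const extValue = `UTF-8''${encodeURIComponent(filename)}`;

// Hypothetical helper: decode a UTF-8 ext-value of the form charset'language'value.
function decodeExtValue(value) {
  const match = /^UTF-8'[^']*'(.*)$/i.exec(value);
  return match ? decodeURIComponent(match[1]) : null;
}

console.log(extValue);                 // UTF-8''%E6%BC%8F%E6%B4%9E.txt
console.log(decodeExtValue(extValue)); // 漏洞.txt
```

Because the extended value is pure ASCII on the wire, it is unaffected by the corruption described in this issue; only the plain filename parameter is.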

What do you see instead?

The UTF-8 bytes e6bc8fe6b49e (漏洞) are re-encoded as 0f1e when Content-Length is placed BEFORE Content-Disposition.
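The byte values above can be reproduced in isolation. The sketch below is a minimal illustration, not Node.js source: it assumes the corruption comes from a latin1-built Buffer being stringified as UTF-8 and the result then being written out as latin1. All variable names are illustrative.

```javascript
// Raw UTF-8 bytes of the filename as sent by the backend.
const original = Buffer.from('漏洞', 'utf8');

// What the proxy sees in proxyRes.headers: one latin1 character per raw byte.
const headerString = original.toString('latin1');

// Byte-preserving path: writing the string back as latin1 restores the bytes.
const preserved = Buffer.from(headerString, 'latin1');

// Assumed corrupting path: the latin1 Buffer is stringified as UTF-8,
// producing the two real CJK characters, which latin1 then truncates
// to their low bytes when the header block is written.
const reinterpreted = Buffer.from(headerString, 'latin1').toString('utf8');
const corrupted = Buffer.from(reinterpreted, 'latin1');

console.log(original.toString('hex'));  // e6bc8fe6b49e
console.log(preserved.toString('hex')); // e6bc8fe6b49e
console.log(corrupted.toString('hex')); // 0f1e
```

The truncation happens because 漏 is U+6F0F and 洞 is U+6D1E, and latin1 encoding keeps only the low byte of each code unit.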

Additional information

node/lib/_http_outgoing.js

Lines 598 to 607 in 3f5899f

if (isContentDispositionField(key) && self._contentLength) {
  // The value could be an array here
  if (ArrayIsArray(value)) {
    for (let i = 0; i < value.length; i++) {
      value[i] = Buffer.from(value[i], 'latin1');
    }
  } else {
    value = Buffer.from(value, 'latin1');
  }
}

This snippet from Node.js applies special encoding handling to the Content-Disposition header when a Content-Length is present. The value is converted into a Buffer using latin1 only if Content-Length has already been processed at that point (self._contentLength is truthy).

Since the caller processes headers sequentially, the outcome depends on the order in which the headers are defined:

  • If Content-Length is set before Content-Disposition, the condition self._contentLength is met and the conversion is applied.
  • If Content-Length is set after Content-Disposition, self._contentLength is still unset while Content-Disposition is processed, so the conversion is skipped.
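The order dependence described in the two bullets can be modelled with a small standalone sketch. This is a simplified model and not the actual Node.js implementation: `serializeHeaders` and `filenameHex` are hypothetical helpers, `contentLengthSeen` stands in for self._contentLength, and the UTF-8 stringification of the Buffer is an assumption about what happens downstream.

```javascript
// Simplified model of sequential, order-dependent header processing.
function serializeHeaders(entries) {
  let contentLengthSeen = false; // stands in for self._contentLength
  let headerText = '';
  for (const [name, rawValue] of entries) {
    let value = rawValue;
    const lower = name.toLowerCase();
    if (lower === 'content-length') contentLengthSeen = true;
    if (lower === 'content-disposition' && contentLengthSeen) {
      value = Buffer.from(value, 'latin1'); // same conversion as the quoted snippet
    }
    // Assumption: a Buffer value is stringified with the default UTF-8 decoding.
    headerText += `${name}: ${value}\r\n`;
  }
  return Buffer.from(headerText, 'latin1'); // header block goes out as latin1
}

// Hypothetical helper: hex dump of the bytes between filename=" and the closing quote.
function filenameHex(wire) {
  const start = wire.indexOf('filename="') + 'filename="'.length;
  return wire.subarray(start, wire.indexOf('"', start)).toString('hex');
}

// Header value as received by the proxy: raw UTF-8 bytes decoded as latin1.
const disposition = `attachment; filename="${Buffer.from('漏洞.txt').toString('latin1')}"`;

const dispositionFirst = serializeHeaders([
  ['Content-Disposition', disposition],
  ['Content-Length', '12'],
]);
const lengthFirst = serializeHeaders([
  ['Content-Length', '12'],
  ['Content-Disposition', disposition],
]);

console.log(filenameHex(dispositionFirst)); // e6bc8fe6b49e2e747874
console.log(filenameHex(lengthFirst));      // 0f1e2e747874
```

In this model the same input produces different bytes purely because of header order, which matches the behavior reported above.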
