Description
Version
v22.15.0
Platform
Linux horsleyli-1q8kg06lua 5.4.119-19.0009.44 #1 SMP Tue May 7 20:08:55 CST 2024 x86_64 x86_64 x86_64 GNU/Linux
Subsystem
http
What steps will reproduce the bug?
```js
const net = require('net');
const http = require('http');

// Pass --content-length-comes-first to make the backend send Content-Length
// before Content-Disposition.
const contentLengthComesFirst = process.argv.includes('--content-length-comes-first');

// Backend: writes a raw HTTP response whose Content-Disposition filename
// contains raw (not URL-encoded) UTF-8 bytes.
const backend = net.createServer((socket) => {
  socket.once('data', () => {
    const filename = '漏洞.txt';
    // Buffer.from(filename).toString('binary') maps each UTF-8 byte to one
    // character, so Buffer.from(response, 'binary') below writes the raw bytes.
    const response =
      'HTTP/1.1 200 OK\r\n' +
      (contentLengthComesFirst ? 'Content-Length: 12\r\n' : '') +
      `Content-Disposition: attachment; filename="${Buffer.from(filename).toString('binary')}"; filename*=UTF-8''${encodeURIComponent(filename)}\r\n` +
      'Content-Type: application/octet-stream\r\n' +
      (!contentLengthComesFirst ? 'Content-Length: 12\r\n' : '') +
      'Connection: close\r\n\r\n' +
      'file content';
    socket.end(Buffer.from(response, 'binary'));
  });
});

backend.listen(() => {
  // Proxy: forwards the request to the backend and copies the backend's
  // response headers with res.setHeader(), in the order they were received.
  const proxy = http.createServer((req, res) => {
    const options = {
      hostname: 'localhost',
      port: backend.address().port,
      method: req.method,
      headers: req.headers,
      path: '/backend',
    };
    const proxyReq = http.request(options, (proxyRes) => {
      res.statusCode = proxyRes.statusCode;
      res.statusMessage = proxyRes.statusMessage;
      for (const header in proxyRes.headers) {
        res.setHeader(header, proxyRes.headers[header]);
      }
      // Forward the body chunk by chunk, then finish the response.
      proxyRes.on('data', (chunk) => res.write(chunk));
      proxyRes.on('end', () => res.end());
    });
    req.pipe(proxyReq);
  }).listen(() => {
    // Client: a raw TCP connection, so the response bytes are inspected
    // exactly as the proxy sent them.
    const client = net.connect(proxy.address().port, () => {
      client.write(`GET /proxy HTTP/1.1\r\nHost: localhost:${backend.address().port}\r\n\r\n`);
    });
    let responseData = Buffer.alloc(0);
    client.on('data', (chunk) => {
      responseData = Buffer.concat([responseData, chunk]);
    });
    client.on('end', () => {
      // Extract the bytes between filename=" and the next ".
      const startFlag = Buffer.from('filename="');
      const startIndex = responseData.indexOf(startFlag) + startFlag.length;
      const endIndex = responseData.indexOf('"', startIndex);
      const filenameBuffer = responseData.subarray(startIndex, endIndex);
      console.log('filename Buffer:', filenameBuffer.toString('hex'));
      console.log('filename utf8:', filenameBuffer.toString('utf8'));
      proxy.close(() => backend.close());
    });
  });
});
```
Run the script once without arguments and once with `--content-length-comes-first`. When the backend sends Content-Length after Content-Disposition, the proxy forwards the filename correctly. When Content-Length comes before Content-Disposition, Node.js's outgoing header encoding corrupts the UTF-8 characters in the Content-Disposition header.
How often does it reproduce? Is there a required condition?
Always, provided Content-Length is set before Content-Disposition on the outgoing response.
What is the expected behavior? Why is that the expected behavior?
Expected Behavior:
Regardless of header order (Content-Length before or after Content-Disposition), Node.js should forward the raw byte values of the filename parameter unchanged.
Technical Justification:
- RFC 7230 Section 3.2.2:
  - "The order in which header fields with differing field names are received is not significant."
  - The position of Content-Length should therefore not affect how Content-Disposition is encoded.
- RFC 6266 Section 4.3:
  - When both filename and filename* are present, recipients SHOULD pick filename* and ignore filename.
  - UTF-8 encoding must still be handled correctly.
- Binary safety:
  - The raw bytes of the filename parameter (e6 bc 8f e6 b4 9e) should remain intact.
  - No double encoding (Latin-1 → UTF-8 conversion) should occur.
- Test case consistency: both scenarios should produce identical output:
  - the hex dump of the original UTF-8 bytes (filename Buffer: e6bc8fe6b49e)
  - correctly decoded Chinese characters (filename utf8: 漏洞)
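The binary-safety requirement can be illustrated with plain Buffer calls. This is a minimal sketch that assumes header values reach the proxy as Latin-1 strings (one JavaScript character per byte); under that assumption, writing the value back as Latin-1 restores every original byte. The variable names are illustrative only.

```javascript
// Raw UTF-8 bytes of the filename, as sent by the backend.
const rawBytes = Buffer.from('漏洞', 'utf8');

// Assumption: the incoming header value is a Latin-1 string,
// i.e. one character (U+0000..U+00FF) per byte.
const headerString = rawBytes.toString('latin1');

// Writing that string back as Latin-1 restores every byte,
// so forwarding the header value unchanged is lossless.
const forwarded = Buffer.from(headerString, 'latin1');

console.log(rawBytes.toString('hex'));   // e6bc8fe6b49e
console.log(forwarded.toString('hex'));  // e6bc8fe6b49e
console.log(forwarded.equals(rawBytes)); // true
```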
What do you see instead?
The UTF-8 bytes e6bc8fe6b49e (漏洞) in the filename parameter reach the client as 0f1e when Content-Length is placed before Content-Disposition; the script prints filename Buffer: 0f1e.
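The 0f1e result is consistent with the round trip sketched below using only Buffer calls. Steps 2 and 3 are assumptions about what happens after the Latin-1 conversion (the Buffer being implicitly turned into a string, which decodes it as UTF-8, and the header block then being written as Latin-1); they are not a quote of Node's code.

```javascript
const original = Buffer.from('漏洞', 'utf8');     // e6 bc 8f e6 b4 9e
const headerValue = original.toString('latin1');  // assumed: one char per byte

// Step 1: the Content-Disposition special case converts the value into a
// Latin-1 Buffer, which reproduces the original bytes.
const asBuffer = Buffer.from(headerValue, 'latin1');

// Step 2 (assumed): concatenating the Buffer into a string calls
// Buffer#toString(), which decodes as UTF-8 and yields '漏洞'.
const decoded = '' + asBuffer;

// Step 3 (assumed): writing as Latin-1 keeps only the low byte of each
// code point: U+6F0F -> 0x0f, U+6D1E -> 0x1e.
const onWire = Buffer.from(decoded, 'latin1');

console.log(onWire.toString('hex')); // 0f1e
```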
Additional information
Lines 598 to 607 in 3f5899f
This code in Node.js applies special handling to the encoding of the Content-Disposition header when a Content-Length header is present. It converts the Content-Disposition value into a Latin-1-encoded Buffer only if Content-Length has already been processed and stored (self._contentLength is truthy).
Because the headers are processed one at a time, in the order the caller set them, the outcome depends on header order:
- If Content-Length is set before Content-Disposition, self._contentLength is already truthy, so the Latin-1 Buffer conversion is applied.
- If Content-Length is set after Content-Disposition, self._contentLength is not yet set when Content-Disposition is processed, so the conversion is skipped.
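To make the order dependence concrete, here is a hypothetical, heavily simplified model of the behavior described above. `serializeHeaders` and `filenameHex` are made-up names and this is not Node's actual implementation; the model only mirrors the conditional Latin-1 conversion and assumes the finished header block is written as Latin-1. The ASCII-only filename* parameter is omitted because it is unaffected.

```javascript
// Hypothetical model: NOT Node's _http_outgoing.js, just the described logic.
function serializeHeaders(entries) {
  let contentLength; // stands in for self._contentLength
  let header = 'HTTP/1.1 200 OK\r\n';
  for (const [name, rawValue] of entries) {
    let value = rawValue;
    const key = name.toLowerCase();
    if (key === 'content-length') {
      contentLength = Number(value);
    }
    // Conversion applies only if Content-Length was already seen.
    if (key === 'content-disposition' && contentLength) {
      value = Buffer.from(value, 'latin1');
    }
    // A Buffer in a template literal is stringified via toString() (UTF-8).
    header += `${name}: ${value}\r\n`;
  }
  // Assumption: the header block is written as Latin-1.
  return Buffer.from(header + '\r\n', 'latin1');
}

// Hex of the bytes between filename=" and the next ".
function filenameHex(wire) {
  const start = wire.indexOf('filename="') + 'filename="'.length;
  const end = wire.indexOf('"', start);
  return wire.subarray(start, end).toString('hex');
}

// Filename as the proxy receives it: a Latin-1 string, one char per byte.
const receivedFilename = Buffer.from('漏洞', 'utf8').toString('latin1');
const disposition = ['Content-Disposition', `attachment; filename="${receivedFilename}"`];
const length = ['Content-Length', '12'];

const dispositionFirst = filenameHex(serializeHeaders([disposition, length]));
const lengthFirst = filenameHex(serializeHeaders([length, disposition]));
console.log(dispositionFirst); // e6bc8fe6b49e
console.log(lengthFirst);      // 0f1e
```

Only the order of the two entries differs between the calls, which matches the two cases listed above.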