You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Linux horsleyli-1q8kg06lua 5.4.119-19.0009.44 #1 SMP Tue May 7 20:08:55 CST 2024 x86_64 x86_64 x86_64 GNU/Linux
Subsystem
http
What steps will reproduce the bug?
constnet=require('net');consthttp=require('http');constcontentLengthComesFirst=process.argv.includes('--content-length-comes-first');constbackend=net.createServer((socket)=>{socket.once('data',(data)=>{// Build raw HTTP response with binary-safe headersconstfilename='漏洞.txt';constresponse='HTTP/1.1 200 OK\r\n'+(contentLengthComesFirst ? `Content-Length: 12\r\n`:'')+// Raw UTF-8 bytes for Chinese filename (without URL encoding)`Content-Disposition: attachment; filename="${Buffer.from(filename).toString('binary')}"; filename*=UTF-8''${encodeURIComponent(filename)}\r\n`+`Content-Type: application/octet-stream\r\n`+(!contentLengthComesFirst ? `Content-Length: 12\r\n`:'')+'Connection: close\r\n\r\n'+'file content';constresponseBuffer=Buffer.from(response,'binary');socket.end(responseBuffer);});});backend.listen(()=>{constproxy=http.createServer((req,res)=>{constoptions={hostname: 'localhost',port: backend.address().port,method: req.method,headers: req.headers,path: '/backend'};constproxyReq=http.request(options,(proxyRes)=>{res.statusCode=proxyRes.statusCode;res.statusMessage=proxyRes.statusMessage;for(constheaderinproxyRes.headers){res.setHeader(header,proxyRes.headers[header]);}// Handle the 'data' event to ensure the response is sent correctlyproxyRes.on('data',(chunk)=>{res.write(chunk);});// Handle the 'end' event to finish the responseproxyRes.on('end',()=>{res.end();});});req.pipe(proxyReq);}).listen(()=>{constclient=net.connect(proxy.address().port,()=>{client.write(`GET /proxy HTTP/1.1\r\nHost: localhost:${backend.address().port}\r\n\r\n`);});letresponseData=Buffer.alloc(0);client.on('data',(chunk)=>{responseData=Buffer.concat([responseData,chunk]);});client.on('end',()=>{conststartFlag=Buffer.from('filename="');constendFlag=Buffer.from('"');conststartIndex=responseData.indexOf(startFlag)+startFlag.length;constendIndex=responseData.indexOf(endFlag,startIndex);constfilenameBuffer=responseData.slice(startIndex,endIndex);console.log('filename Buffer:',filenameBuffer.toString('hex'));console.log('filename utf8:',filenameBuffer.toString('utf8'));proxy.close(()=>backend.close());});});});
When the Content-Length header appears after Content-Disposition header in an HTTP response, Node.js' encoder behaves correctly. However, when Content-Length is placed BEFORE Content-Disposition, the encoder exhibits abnormal behavior which corruption utf8 char in header processing.
How often does it reproduce? Is there a required condition?
always
What is the expected behavior? Why is that the expected behavior?
Expected Behavior:
Regardless of header order (Content-Length before or after Content-Disposition), the HTTP parser should preserve raw byte values in filename parameter
Technical Justification:
RFC 7230 Section 3.2.2:
"The order in which header fields with differing field names are received is not significant"
Content-Length position should not affect header parsing
RFC 6266 Section 4.1:
"If both filename and filename* are present, filename* should be used"
UTF-8 encoding must be properly handled
Binary Safety:
filename parameter's raw bytes (e6 bc 8f e6 b4 a9) should remain intact
No double-encoding (Latin-1 → UTF-8 conversion) should occur
Test Case Consistency:
Both scenarios should produce identical outputs:
Hex dump of original UTF-8 bytes
Proper Chinese character decoding
What do you see instead?
Utf8 character e6bc8fe6b49e (漏洞) was encoded to 0f1e when Content-Length is placed BEFORE Content-Disposition
This code snippet in Node.js demonstrates a special handling for the Content-Disposition header's encoding when the Content-Length header is present. The logic converts the Content-Disposition header value into a Latin1-encoded Buffer only if Content-Length has already been processed and exists in the headers (self._contentLength is truthy).
Since headers are processed sequentially in caller, the outcome depends on the order of header definitions:
If Content-Length is set before Content-Disposition, the condition self._contentLength is met, and the encoding logic applies.
If Content-Length is set after Content-Disposition, self._contentLength is undefined during the Content-Disposition processing, so the encoding step is skipped
The text was updated successfully, but these errors were encountered:
Through tests similar to the provided reproduction steps, I've confirmed that when a server responds with a header like Content-Disposition: attachment; filename="한글파일.txt" (where "한글파일.txt" is comprised of UTF-8 bytes), the filename appears corrupted (likely misinterpreted as latin1) in res.headers['content-disposition'] when received by the Node.js HTTP client (http.request).
I've considered a couple of approaches to address this and would appreciate the community's guidance on which area might be most appropriate for a fix:
Node.js C++ Bindings (handling llhttp callbacks):
The idea is to modify the C++ code that handles the on_header_value callback from llhttp. When converting the header value byte sequence to a JavaScript string, if the current header is Content-Disposition (and notably, if a filename* parameter is not present), we could attempt to interpret the bytes as UTF-8 first. If that fails (e.g., invalid UTF-8 sequence), it would fall back to the current default behavior (presumably latin1).
Node.js JavaScript HTTP Library (e.g., in lib/_http_incoming.js):
This approach involves post-processing the string giá trị (which might have already been converted to a string, potentially as latin1, at the C++ level) within the JavaScript layer. When the res.headers object is being populated, we could apply a heuristic to the content-disposition header value. If the filename parameter exhibits a pattern indicative of UTF-8 misinterpreted as latin1, it could be corrected back to UTF-8.
It seems llhttp itself, being a low-level parser, might not be the direct place for this semantic fix. I'm wondering which of the two approaches above (or perhaps an alternative) would be more aligned with Node.js's design philosophy, performance considerations, and maintainability.
If there are existing efforts or preferred strategies for handling such encoding ambiguities, I would be grateful for any pointers.
Based on the feedback, I'm prepared to work on the fix, including tests, and submit a PR.
Version
v22.15.0
Platform
Subsystem
http
What steps will reproduce the bug?
When the Content-Length header appears after Content-Disposition header in an HTTP response, Node.js' encoder behaves correctly. However, when Content-Length is placed BEFORE Content-Disposition, the encoder exhibits abnormal behavior which corruption utf8 char in header processing.
How often does it reproduce? Is there a required condition?
always
What is the expected behavior? Why is that the expected behavior?
Expected Behavior:
Regardless of header order (Content-Length before or after Content-Disposition), the HTTP parser should preserve raw byte values in filename parameter
Technical Justification:
RFC 7230 Section 3.2.2:
RFC 6266 Section 4.1:
Binary Safety:
Test Case Consistency:
Both scenarios should produce identical outputs:
What do you see instead?
Utf8 character e6bc8fe6b49e (漏洞) was encoded to 0f1e when Content-Length is placed BEFORE Content-Disposition
Additional information
node/lib/_http_outgoing.js
Lines 598 to 607 in 3f5899f
This code snippet in Node.js demonstrates a special handling for the Content-Disposition header's encoding when the Content-Length header is present. The logic converts the Content-Disposition header value into a Latin1-encoded Buffer only if Content-Length has already been processed and exists in the headers (self._contentLength is truthy).
Since headers are processed sequentially in caller, the outcome depends on the order of header definitions:
The text was updated successfully, but these errors were encountered: