diff --git a/CHANGES/12689.breaking.rst b/CHANGES/12689.breaking.rst new file mode 100644 index 00000000000..9d807e00769 --- /dev/null +++ b/CHANGES/12689.breaking.rst @@ -0,0 +1,7 @@ +Tightened outbound header serialization to reject all ASCII control +characters forbidden by :rfc:`9110#section-5.5` and :rfc:`9112#section-4` +(``0x00``\-``0x08``, ``0x0A``\-``0x1F``, ``0x7F``) in status lines, +header field-names, and field-values. Previously only CR, LF and NUL were +rejected. HTAB (``0x09``) remains permitted in field values. Applications +that placed bare control characters in outbound headers will now raise +:exc:`ValueError` instead of emitting non-RFC-compliant bytes -- by :user:`rodrigobnogueira`. diff --git a/THREAT_MODEL.md b/THREAT_MODEL.md index 3b6648685d5..2834202427f 100644 --- a/THREAT_MODEL.md +++ b/THREAT_MODEL.md @@ -332,7 +332,8 @@ called out where the writer's safety depends on them). **Components covered.** - `aiohttp/_http_writer.pyx` — Cython `_serialize_headers` and - `_write_str_raise_on_nlcr` (the CR/LF/NUL bytewise rejector). + `_write_str_raise_on_nlcr` (the forbidden-CTL bytewise rejector: rejects + `0x00-0x08`, `0x0A-0x1F`, `0x7F`; HTAB and SP remain permitted). - `aiohttp/http_writer.py` — `StreamWriter` (the `AbstractStreamWriter` implementation) plus the pure-Python `_py_serialize_headers` / `_safe_header` fallback and the Cython/pure-Python switch at @@ -345,7 +346,9 @@ called out where the writer's safety depends on them). implementation; if `_http_writer` (Cython) imports successfully and `AIOHTTP_NO_EXTENSIONS` is unset, the Cython implementation replaces it (`http_writer.py:_py_serialize_headers`). Both implementations apply the same -CR / LF / NUL rejection on names *and* values *and* the status/request line. +RFC 9110 §5.5 / RFC 9112 §4 forbidden-CTL rejection (`0x00-0x08`, +`0x0A-0x1F`, `0x7F`; HTAB and SP permitted) on names *and* values *and* +the status/request line. **Trust boundaries & data flow.** @@ -353,7 +356,7 @@ CR / LF / NUL rejection on names *and* values *and* the status/request line. flowchart LR Handler([User handler / ClientRequest]) -->|status_line, headers, body| SW[StreamWriter] SW --> Serialize[_serialize_headers] - Serialize -->|reject CR/LF/NUL| Bytes[Wire bytes] + Serialize -->|reject forbidden CTLs| Bytes[Wire bytes] SW --> Body[write / write_eof / write_chunked] Body --> Bytes Bytes --> Transport[(asyncio Transport)] @@ -380,38 +383,38 @@ on the wire. The wire-side consumer is the **untrusted** counterparty | # | Component / Vector | STRIDE | Threat | Risk | | :--- | :--- | :--- | :--- | :--- | -| 2.1 | Header name/value with CR / LF / NUL | T / I | Response-splitting / header injection allowing the next "header" or even a complete second response/request to be appended on the wire. | High | +| 2.1 | Header name/value with forbidden CTL | T / I | Response-splitting / header injection (CR / LF) or non-RFC-compliant CTLs (`0x01-0x08`, `0x0B-0x1F`, `0x7F`) that downstream agents historically treat inconsistently. | High | | 2.2 | Status-line `reason` with CR / LF | T | Same family as 2.1 but on the status line; could let an attacker-controlled reason inject a body or a second status line. | High | -| 2.3 | Request-line path/method | T | Path-side smuggling via CR / LF / NUL or whitespace inside the path the writer emits. | Medium | +| 2.3 | Request-line path/method | T | Path-side smuggling via forbidden CTLs or whitespace inside the path the writer emits. | Medium | | 2.4 | `Content-Length` ≠ actual body length | T | If a handler / ClientRequest emits a body whose length disagrees with declared `Content-Length`, an intermediary may interpret framing differently from the writer's peer (smuggling). | Medium | | 2.5 | `Content-Length` *and* `Transfer-Encoding: chunked` | T | Both headers reach the wire if user code constructs them via the raw headers dict; intermediaries disagree on which wins. | Medium | | 2.6 | Body emission on HEAD / 1xx / 204 / 304 | T | Writer strips CL/TE for empty-body responses but **does not block the application from writing a body**; bytes after the `\r\n\r\n` confuse the next pipelined request. | Medium | -| 2.7 | `Set-Cookie` / `Cookie` value | T | Cookie name or value containing CR / LF / NUL passes through `SimpleCookie.output()` unchanged; only caught by writer's header validation. | Medium | +| 2.7 | `Set-Cookie` / `Cookie` value | T | Cookie name or value containing forbidden CTLs passes through `SimpleCookie.output()` unchanged; only caught by writer's header validation. | Medium | | 2.8 | Compression / `Content-Encoding` | T | Body double-compression when user sets `Content-Encoding` manually and also enables `compress=...`. Intermediaries may reject or mis-decode a doubly-compressed body. | Low | | 2.9 | Drain / backpressure on slow readers | D | Slow consumer (or `Sec-WebSocket-Key`-style hold) keeps `transport.write()` queued; writer drains at 64 KiB threshold (`http_writer.py:StreamWriter.write`). A handler that doesn't await `drain()` can blow up. | Medium | | 2.10 | Single oversized chunk | D | `write(b)` with a multi-GB blob is handed straight to `transport.write`; memory pressure shifts to asyncio's buffer. | Low | | 2.11 | Chunked encoding hex framing | T | Malformed chunk-size lines (negative values, leading-`+`, leading zeros, hex obfuscation) would let a non-aiohttp peer reframe the body differently and smuggle. | Low | -| 2.12 | Header insertion validation timing | T | CR/LF/NUL rejection is *write-time*, not *insert-time*. A handler that sets a malicious header and then aborts before `write_headers()` will not raise. (Documented; not a recommended change.) | Low | -| 2.13 | Cython ⇄ pure-Python parity | T | Divergence between the two `_serialize_headers` implementations could let one backend silently pass CR/LF/NUL that the other rejects, weakening egress safety asymmetrically. | Low | +| 2.12 | Header insertion validation timing | T | Forbidden-CTL rejection is *write-time*, not *insert-time*. A handler that sets a malicious header and then aborts before `write_headers()` will not raise. (Documented; not a recommended change.) | Low | +| 2.13 | Cython ⇄ pure-Python parity | T | Divergence between the two `_serialize_headers` implementations could let one backend silently pass a forbidden CTL that the other rejects, weakening egress safety asymmetrically. | Low | | 2.14 | Trailers asymmetry | T | The writer never emits trailers, but the parser accepts incoming trailers; not a writer-side threat in itself, just a documentation point for completeness. | Low | **Mitigations.** | # | Threat | Existing | Recommended | | :--- | :--- | :--- | :--- | -| 2.1 | Header CR / LF / NUL injection | Both backends reject these bytes via `_write_str_raise_on_nlcr` (`_http_writer.pyx:_write_str_raise_on_nlcr`) and `_safe_header` (`http_writer.py:_safe_header`), raising `ValueError` from `_serialize_headers` before any byte hits the transport. Applied symmetrically to names, values, and the status line. | **The current tests import whichever `_serialize_headers` won the import, so only one backend is exercised. Parameterise like `tests/test_http_parser.py` does (cross-cuts [§6.1](#61-highest-leverage-recommendations) #3).** | -| 2.2 | Status-line `reason` injection | `web_response.Response._set_status` (`web_response.py:StreamResponse._set_status`) rejects `\r` / `\n` in `reason` *at set-time*. The writer also rejects them at write-time as part of the status-line validation. | None. | -| 2.3 | Request-line path / method | The full status line (`{method} {path} HTTP/{v}.{v}`) goes through `_write_str_raise_on_nlcr` / `_safe_header`, so CR / LF / NUL are caught regardless of whether `path` came from `yarl` or `method` was a caller-supplied string. yarl additionally rejects these bytes earlier per RFC 3986. | None. | +| 2.1 | Header forbidden-CTL injection | Both backends reject the full RFC 9110 §5.5 / RFC 9112 §4 forbidden set (`0x00-0x08`, `0x0A-0x1F`, `0x7F`; HTAB and SP permitted) via `_write_str_raise_on_nlcr` (`_http_writer.pyx:_write_str_raise_on_nlcr`) and `_safe_header` (`http_writer.py:_safe_header`), raising `ValueError` from `_serialize_headers` before any byte hits the transport. Applied symmetrically to names, values, and the status line. | **The current tests import whichever `_serialize_headers` won the import, so only one backend is exercised. Parameterise like `tests/test_http_parser.py` does (cross-cuts [§6.1](#61-highest-leverage-recommendations) #3).** | +| 2.2 | Status-line `reason` injection | `web_response.Response._set_status` (`web_response.py:StreamResponse._set_status`) rejects `\r` / `\n` in `reason` *at set-time*. The writer also rejects the full forbidden-CTL set at write-time as part of the status-line validation. | None. | +| 2.3 | Request-line path / method | The full status line (`{method} {path} HTTP/{v}.{v}`) goes through `_write_str_raise_on_nlcr` / `_safe_header`, so forbidden CTLs are caught regardless of whether `path` came from `yarl` or `method` was a caller-supplied string. yarl additionally rejects CR/LF/NUL earlier per RFC 3986. | None. | | 2.4 | CL / body-length mismatch | None at write-time. `web.Response.write_eof` and the chunked writer write what they're given. | **Recommended hardening: in DEBUG mode, assert / warn when actual bytes-written disagrees with declared `Content-Length` at `write_eof()`. Useful for catching smuggling-adjacent bugs in user handlers.** | | 2.5 | CL + TE simultaneous | Server-side `enable_chunked_encoding()` (`web_response.py:StreamResponse.enable_chunked_encoding`) raises if `Content-Length` is already set; client-side `_update_transfer_encoding()` (`client_reqrep.py:ClientRequest._update_transfer_encoding`) raises if user sets `chunked=True` while `Content-Length` is in headers. Manual user injection into the raw headers dict is *not* caught. | **Consider a write-time assert in `StreamWriter` that rejects `Content-Length` and `Transfer-Encoding: chunked` coexisting.** | | 2.6 | Body-suppression edge cases | `web_response.py:StreamResponse._prepare_headers` strips `Content-Length` and `Transfer-Encoding` for HEAD / 1xx / 204 / 304 (`EMPTY_BODY_STATUS_CODES`, `helpers.py:EMPTY_BODY_METHODS`). The framework's own machinery doesn't write a body for these. | **User**: Do not call `resp.write(...)` in a handler responding HEAD / 1xx / 204 / 304 — framing strips CL / TE but does not block the byte write. **Optional aiohttp change: have `StreamWriter` short-circuit body writes when `length == 0` and the response was framed as empty-body.** | -| 2.7 | Cookie injection | `populate_with_cookies` (`helpers.py:populate_with_cookies`) routes the cookie through `SimpleCookie.output()` and then into a regular header, where the CR / LF / NUL check at write-time catches anything `SimpleCookie` happened to pass through. | Documented design decision: rely on writer-level validation rather than tightening `set_cookie` / `populate_with_cookies` further. Keep regression tests covering cookie name/value with CR / LF / NUL across both backends. | +| 2.7 | Cookie injection | `populate_with_cookies` (`helpers.py:populate_with_cookies`) routes the cookie through `SimpleCookie.output()` and then into a regular header, where the forbidden-CTL check at write-time catches anything `SimpleCookie` happened to pass through. | Documented design decision: rely on writer-level validation rather than tightening `set_cookie` / `populate_with_cookies` further. Keep regression tests covering cookie name/value with forbidden CTLs across both backends. | | 2.8 | Manual `Content-Encoding` | Server side: `enable_compression()` (`web_response.py:StreamResponse.enable_compression`) returns early if `Content-Encoding` already present, so the body is not double-compressed. Client side: `ClientRequest._update_content_encoding` raises `ValueError("compress can not be set if Content-Encoding header is set")` — symmetric guard. | None. | | 2.9 | Drain / backpressure | `StreamWriter.write` drains at `LIMIT = 0x10000` bytes (`http_writer.py:StreamWriter.write`) when `drain=True` is set by the caller. Application code is expected to `await write(...)` to honour backpressure. | **User**: `await write(...)` in handlers; tight `for` loops without `await` can starve the event loop. Cross-reference [§5.7](#57-server-connection-lifecycle) for connection-level read/write timeouts that mitigate slow consumers. | | 2.10 | Oversized single chunk | None at the writer layer — bytes go straight to `transport.write`. asyncio applies its own high-water marks via the transport. | **User**: Relies on application-level bounds (use streaming, generators, `FileResponse`, etc., for large bodies). | | 2.11 | Chunked hex framing | The writer always uses `f"{len(chunk):x}\r\n"` followed by the chunk and `\r\n` (`http_writer.py:StreamWriter._write_chunked_payload`). | None. | | 2.12 | Insert-time vs write-time validation | Headers are validated at write-time only; `set_status` validates `reason` at set-time. | Documented design decision: late validation is acceptable; keep behaviour as-is. | -| 2.13 | Cython ⇄ pure-Python parity | Both backends share the same logic and test surface; the Cython version uses a fast bytewise check, the Python version uses `in` on three sentinel characters. | **Parameterise the writer tests over both backends so egress equivalence on malicious inputs is exercised under both (see [§6.1](#61-highest-leverage-recommendations) #3).** | +| 2.13 | Cython ⇄ pure-Python parity | Both backends share the same logic and test surface; the Cython version uses a fast per-codepoint range check (`ch < 0x20 and ch != 0x09`, plus `0x7F`), the Python version uses a precompiled `re` over the same forbidden set. | **Parameterise the writer tests over both backends so egress equivalence on malicious inputs is exercised under both (see [§6.1](#61-highest-leverage-recommendations) #3).** | | 2.14 | Trailers asymmetry | Writer does not emit trailers; parser accepts trailers on incoming. Documented for completeness. | None. | **Past advisories / hardening (recap).** @@ -424,11 +427,19 @@ on the wire. The wire-side consumer is the **untrusted** counterparty the status-line `reason`. Fixed by rejecting CR/LF in `reason` at `_set_status` set-time, on top of the existing writer-side check (threat 2.2). - -Writer-level CR / LF / NUL rejection via `_safe_header` and +- **[#12689](https://github.com/aio-libs/aiohttp/pull/12689)** (hardening, no + CVE) — outbound header serialization only rejected CR/LF/NUL; other + RFC 9110 §5.5 / RFC 9112 §4 forbidden CTLs (`0x01-0x08`, `0x0B-0x1F`, + `0x7F`) could be emitted on the wire if a handler placed them into a + header. Tightened `_safe_header` and `_write_str_raise_on_nlcr` to + reject the full forbidden set (threat 2.1). + +Writer-level forbidden-CTL rejection via `_safe_header` and `_write_str_raise_on_nlcr` has been in place since the header-injection family of issues was first surfaced (well before CVE-2023-37276, which -was a parser-side fix). +was a parser-side fix); the rejected set was broadened from +{CR, LF, NUL} to the full RFC 9110 forbidden set in +[#12689](https://github.com/aio-libs/aiohttp/pull/12689). --- diff --git a/aiohttp/_http_writer.pyx b/aiohttp/_http_writer.pyx index ee8dcd7d2e9..7859ebd6682 100644 --- a/aiohttp/_http_writer.pyx +++ b/aiohttp/_http_writer.pyx @@ -111,9 +111,11 @@ cdef inline int _write_str_raise_on_nlcr(Writer* writer, object s): out_str = str(s) for ch in out_str: - if ch in {0x0D, 0x0A, 0x00}: + # https://www.rfc-editor.org/info/rfc9110/#section-5.5-5 + # https://www.rfc-editor.org/info/rfc9112/#section-4-3 + if (ch < 0x20 and ch != 0x09) or ch == 0x7F: raise ValueError( - "Newline, carriage return, or null byte detected in headers. " + "Forbidden control character detected in headers. " "Potential header injection attack." ) if _write_utf8(writer, ch) < 0: diff --git a/aiohttp/http_writer.py b/aiohttp/http_writer.py index d8fca34ee84..a1168cfdebb 100644 --- a/aiohttp/http_writer.py +++ b/aiohttp/http_writer.py @@ -1,6 +1,7 @@ """Http related parsers and protocol.""" import asyncio +import re import sys from typing import ( # noqa TYPE_CHECKING, @@ -363,10 +364,15 @@ async def drain(self) -> None: await protocol._drain_helper() +# https://www.rfc-editor.org/info/rfc9110/#section-5.5-5 +# https://www.rfc-editor.org/info/rfc9112/#section-4-3 +_FORBIDDEN_HEADER_CHARS_RE = re.compile(r"[\x00-\x08\x0a-\x1f\x7f]") + + def _safe_header(string: str) -> str: - if "\r" in string or "\n" in string or "\x00" in string: + if _FORBIDDEN_HEADER_CHARS_RE.search(string) is not None: raise ValueError( - "Newline, carriage return, or null byte detected in headers. " + "Forbidden control character detected in headers. " "Potential header injection attack." ) return string diff --git a/requirements/constraints.txt b/requirements/constraints.txt index d29fcac2147..317b891ba58 100644 --- a/requirements/constraints.txt +++ b/requirements/constraints.txt @@ -163,7 +163,7 @@ pip-tools==7.5.3 # via -r requirements/dev.in pkgconfig==1.6.0 # via -r requirements/test-common.in -platformdirs==4.9.6 +platformdirs==4.10.0 # via # python-discovery # virtualenv @@ -234,7 +234,7 @@ pytest-xdist==3.8.0 # via -r requirements/test-common.in python-dateutil==2.9.0.post0 # via freezegun -python-discovery==1.3.1 +python-discovery==1.4.0 # via virtualenv python-on-whales==0.81.0 # via @@ -326,7 +326,7 @@ uvloop==0.22.1 ; platform_system != "Windows" # -r requirements/lint.in valkey==6.1.1 # via -r requirements/lint.in -virtualenv==21.3.3 +virtualenv==21.4.1 # via pre-commit wait-for-it==2.3.0 # via -r requirements/test-common.in diff --git a/requirements/dev.txt b/requirements/dev.txt index 762e5d0a829..a746c370577 100644 --- a/requirements/dev.txt +++ b/requirements/dev.txt @@ -160,7 +160,7 @@ pip-tools==7.5.3 # via -r requirements/dev.in pkgconfig==1.6.0 # via -r requirements/test-common.in -platformdirs==4.9.6 +platformdirs==4.10.0 # via # python-discovery # virtualenv @@ -229,7 +229,7 @@ pytest-xdist==3.8.0 # via -r requirements/test-common.in python-dateutil==2.9.0.post0 # via freezegun -python-discovery==1.3.1 +python-discovery==1.4.0 # via virtualenv python-on-whales==0.81.0 # via @@ -316,7 +316,7 @@ uvloop==0.22.1 ; platform_system != "Windows" and implementation_name == "cpytho # -r requirements/lint.in valkey==6.1.1 # via -r requirements/lint.in -virtualenv==21.3.3 +virtualenv==21.4.1 # via pre-commit wait-for-it==2.3.0 # via -r requirements/test-common.in diff --git a/requirements/lint.txt b/requirements/lint.txt index d731a1ae94e..b24f04640e6 100644 --- a/requirements/lint.txt +++ b/requirements/lint.txt @@ -84,7 +84,7 @@ packaging==26.2 # via pytest pathspec==1.1.1 # via mypy -platformdirs==4.9.6 +platformdirs==4.10.0 # via # python-discovery # virtualenv @@ -127,7 +127,7 @@ pytest-mock==3.15.1 # via -r requirements/lint.in python-dateutil==2.9.0.post0 # via freezegun -python-discovery==1.3.1 +python-discovery==1.4.0 # via virtualenv python-on-whales==0.81.0 # via -r requirements/lint.in @@ -165,7 +165,7 @@ uvloop==0.22.1 ; platform_system != "Windows" # via -r requirements/lint.in valkey==6.1.1 # via -r requirements/lint.in -virtualenv==21.3.3 +virtualenv==21.4.1 # via pre-commit yarl==1.24.2 # via aiohttp diff --git a/tests/test_http_writer.py b/tests/test_http_writer.py index a4877977789..baab763c297 100644 --- a/tests/test_http_writer.py +++ b/tests/test_http_writer.py @@ -943,10 +943,11 @@ async def test_write_headers_with_compression_coalescing( [ "\n", "\r", + "\x00", ], ) def test_serialize_headers_raises_on_new_line_or_carriage_return(char: str) -> None: - """Verify serialize_headers raises on cr or nl in the headers.""" + """Verify serialize_headers raises on cr, nl, or null byte in the headers.""" status_line = "HTTP/1.1 200 OK" headers = CIMultiDict( { @@ -961,21 +962,69 @@ def test_serialize_headers_raises_on_new_line_or_carriage_return(char: str) -> N _serialize_headers(status_line, headers) -def test_serialize_headers_raises_on_null_byte() -> None: +@pytest.mark.parametrize( + "char", + [chr(c) for c in (*range(0x01, 0x09), *range(0x0B, 0x20), 0x7F)], +) +def test_serialize_headers_raises_on_forbidden_control_chars_in_value( + char: str, +) -> None: + """Verify serialize_headers rejects RFC 9110-forbidden CTLs in values.""" status_line = "HTTP/1.1 200 OK" - headers = CIMultiDict( - { - hdrs.CONTENT_TYPE: "text/plain\x00", - } - ) + headers = CIMultiDict({hdrs.CONTENT_TYPE: f"text/plain{char}"}) with pytest.raises( ValueError, - match="null byte detected in headers", + match="Forbidden control character detected in headers", ): _serialize_headers(status_line, headers) +@pytest.mark.parametrize( + "char", + [chr(c) for c in (*range(0x01, 0x09), *range(0x0B, 0x20), 0x7F)], +) +def test_serialize_headers_raises_on_forbidden_control_chars_in_name( + char: str, +) -> None: + """Verify serialize_headers rejects RFC 9110-forbidden CTLs in names.""" + status_line = "HTTP/1.1 200 OK" + headers = CIMultiDict({f"X-Bad{char}Header": "value"}) + + with pytest.raises( + ValueError, + match="Forbidden control character detected in headers", + ): + _serialize_headers(status_line, headers) + + +@pytest.mark.parametrize( + "char", + [chr(c) for c in (*range(0x01, 0x09), *range(0x0B, 0x20), 0x7F)], +) +def test_serialize_headers_raises_on_forbidden_control_chars_in_status_line( + char: str, +) -> None: + """Verify serialize_headers rejects RFC 9110-forbidden CTLs in status line.""" + status_line = f"HTTP/1.1 200 OK{char}" + headers: CIMultiDict[str] = CIMultiDict() + + with pytest.raises( + ValueError, + match="Forbidden control character detected in headers", + ): + _serialize_headers(status_line, headers) + + +def test_serialize_headers_allows_htab_in_value() -> None: + """Verify HTAB (0x09) remains permitted in field values per RFC 9110.""" + status_line = "HTTP/1.1 200 OK" + headers = CIMultiDict({hdrs.CONTENT_TYPE: "text/plain\tcharset=utf-8"}) + + result = _serialize_headers(status_line, headers) + assert b"text/plain\tcharset=utf-8" in result + + async def test_write_compressed_data_with_headers_coalescing( buf: bytearray, protocol: BaseProtocol, transport: asyncio.Transport ) -> None: