Fix slice-index panic in DtdParser on chunked unknown markup (#957)#958
Conversation
The `UndecidedMarkup(skipped)` arm of `DtdParser::feed` stages bytes into a fixed `[u8; 9]` keyword-resolution buffer and, when no keyword matched and `>` was not yet seen, updated the state to `UndecidedMarkup(skipped + cur.len())`. With chunked input that keeps hitting `<!`-shaped tokens but never matches one of `<!--`, `<![CDATA[`, `<!ELEMENT`, `<!ATTLIST`, `<!ENTITY`, `<!NOTATION`, the saved `skipped` could grow past `bytes.len() == 9`. The next entry into the arm panicked on `bytes[..skipped].copy_from_slice(...)` with `range end index N out of range for slice of length 9`. Once the 9-byte window is full and `switch()` returned `None`, the markup is definitively unknown — there is no benefit to waiting for more data, and `switch`'s "skip until `>`" fallback already exists in the `InElementDecl` state. So when `end == bytes.len()`, drop the already-examined window and transition to `InElementDecl`. Adds an `issue957` regression test in the same style as `issue950`, using a 25-byte malformed DTD and `BufReader::with_capacity(4)` to force enough re-entries for `skipped` to overflow the work buffer. Full `cargo test` suite stays green.
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #958 +/- ##
==========================================
+ Coverage 55.08% 56.48% +1.39%
==========================================
Files 44 44
Lines 16911 17653 +742
==========================================
+ Hits 9316 9971 +655
- Misses 7595 7682 +87
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Thanks! Yes, that is elegant solution. It is bad that this was not catched by our fuzzing tests. I haven't good understanding how they work. Improving the testset would be good. You say, that this bug was catched by fuzzer. Could you share the details of how it works for you, or maybe you even want to do PR to improve our fuzzing procedure? |
|
Happy to help, and good question. Nothing fancy on our side — it's just I went ahead and opened #959 with a small Hope it's useful! |
#960) Sibling of #957. The fix in #958 prevents `skipped` from growing across multiple `UndecidedMarkup` re-entries inside `DtdParser::feed`, but the same panic was still reachable via the *initial* assignment one arm earlier: *self = Self::UndecidedMarkup(cur.len() - i - 1); When a chunk delivered `<` followed by 9+ bytes of unknown markup (so `switch()` returned `None` on a window long enough to definitively rule out all keywords), `skipped` was set to that long value in one shot. The next entry into `UndecidedMarkup` panicked on the very first line: bytes[..skipped].copy_from_slice(&buf[buf.len() - skipped..]); // range end index 24 out of range for slice of length 9 CIFuzz on PR #959 (a new chunked-`BufRead` fuzz target) reproduced this within ~3s, 192,123 executions. Local minimal reproducer: let reader = BufReader::with_capacity( 24, "<!DOCTYPE r[<aaaaaaaaaaaaaaaaaaaaaaaaaaaaa>]>".as_bytes(), ); Mirrors the bail-out already present in the `UndecidedMarkup` arm: when `cur.len() - i - 1 >= 9`, the markup is definitively not one of `<!--`, `<![CDATA[`, `<!ELEMENT`, `<!ATTLIST`, `<!ENTITY`, `<!NOTATION`, so we transition to `InElementDecl` (skip until `>`) instead of staging more bytes than the 9-byte work buffer can hold. Adds `issue960` regression test in the same style as `issue957`. Full `cargo test` stays green.
tafia#960) Sibling of tafia#957. The fix in tafia#958 prevents `skipped` from growing across multiple `UndecidedMarkup` re-entries inside `DtdParser::feed`, but the same panic was still reachable via the *initial* assignment one arm earlier: *self = Self::UndecidedMarkup(cur.len() - i - 1); When a chunk delivered `<` followed by 9+ bytes of unknown markup (so `switch()` returned `None` on a window long enough to definitively rule out all keywords), `skipped` was set to that long value in one shot. The next entry into `UndecidedMarkup` panicked on the very first line: bytes[..skipped].copy_from_slice(&buf[buf.len() - skipped..]); // range end index 24 out of range for slice of length 9 CIFuzz on PR tafia#959 (a new chunked-`BufRead` fuzz target) reproduced this within ~3s, 192,123 executions. Local minimal reproducer: let reader = BufReader::with_capacity( 24, "<!DOCTYPE r[<aaaaaaaaaaaaaaaaaaaaaaaaaaaaa>]>".as_bytes(), ); Mirrors the bail-out already present in the `UndecidedMarkup` arm: when `cur.len() - i - 1 >= 9`, the markup is definitively not one of `<!--`, `<![CDATA[`, `<!ELEMENT`, `<!ATTLIST`, `<!ENTITY`, `<!NOTATION`, so we transition to `InElementDecl` (skip until `>`) instead of staging more bytes than the 9-byte work buffer can hold. Adds `issue960` regression test in the same style as `issue957`. Full `cargo test` stays green. (cherry picked from commit 0672dfa)
Fixes #957.
Bug
In
src/parser/dtd.rs, theSelf::UndecidedMarkup(skipped)arm ofDtdParser::feedstages the prefix of an unresolved markup token into a fixed[u8; 9]work buffer (long enough for the longest recognised keyword,!NOTATION). When the buffered window did not match any keyword and did not contain a>, the state was updated toUndecidedMarkup(skipped + cur.len())andfeedreturned to wait for more bytes.With chunked input that keeps hitting
<!-shaped tokens that never match one of<!--,<![CDATA[,<!ELEMENT,<!ATTLIST,<!ENTITY,<!NOTATION,skippedcould grow pastbytes.len() == 9. The next re-entry into the arm panicked on the very first line,bytes[..skipped].copy_from_slice(&buf[buf.len() - skipped..]), withThis survived #954's fix for #950 (different panic, same state machine).
Fix
Once the 9-byte window is full (
end == bytes.len()) andswitch()returnedNone, the markup is definitively not one of the recognised keywords. There is no benefit to waiting for more data; the parser can drop the examined window and skip until the closing>. TheInElementDeclstate already does exactly that (no quote awareness needed — we only reach this point if we failed to match the keywords that promote intoInQuoteSensitive), so the new branch reuses it rather than introducing a new state.The shape of the existing comments — "Buffer is long enough to store the longest possible keyword
!NOTATION" — already impliesskippedwas meant to be bounded bybytes.len(); this just enforces it.Test
Added
issue957intests/issues.rsin the same style asissue950. A 25-byte malformed DTD plusBufReader::with_capacity(4)is enough to force the failure mode (each<aa,<bb, … straddles a chunk boundary, soUndecidedMarkupis re-entered repeatedly):Same disposition as
issue950— we don't assert on the returned event since proper DTD validity reporting is a future improvement; we just pin that the parser doesn't panic.Verification
cargo test(full suite, all features default) stays green: doc tests + everytests/*binary, no failures.Why this matters
Any consumer reading untrusted XML through
Reader::read_event_into(with aBufReaderor any chunkedBufRead) can be panicked by ~25 bytes of crafted input. We hit it via a coverage-guided fuzzer feedingBufReader<File>over arbitrary bytes; the workaround downstream is to pre-reject<!-shaped tokens that aren't comments / CDATA before the reader sees them, but the state-machine invariant should be enforced in the library.