
Fix slice-index panic in DtdParser on chunked unknown markup (#957) #958

Merged
Mingun merged 1 commit into tafia:master from dobermai:fix/issue-957-undecided-markup-overflow
May 6, 2026

Conversation

@dobermai
Contributor

@dobermai dobermai commented May 6, 2026

Fixes #957.

Bug

In src/parser/dtd.rs, the Self::UndecidedMarkup(skipped) arm of DtdParser::feed stages the prefix of an unresolved markup token into a fixed [u8; 9] work buffer (long enough for the longest recognised keyword, !NOTATION). When the buffered window did not match any keyword and did not contain a >, the state was updated to UndecidedMarkup(skipped + cur.len()) and feed returned to wait for more bytes.

With chunked input that keeps hitting <!-shaped tokens that never match one of <!--, <![CDATA[, <!ELEMENT, <!ATTLIST, <!ENTITY, <!NOTATION, skipped could grow past bytes.len() == 9. The next re-entry into the arm panicked on the very first line, bytes[..skipped].copy_from_slice(&buf[buf.len() - skipped..]), with

range end index N out of range for slice of length 9

This survived #954's fix for #950 (different panic, same state machine).
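The overflow arithmetic can be sketched as a minimal, self-contained model (names and chunk sizes hypothetical, not the actual quick-xml code): each re-entry that fails to resolve a keyword saves `skipped + cur.len()`, and nothing bounds it by the 9-byte window.

```rust
// Longest recognised keyword is "!NOTATION" (9 bytes).
const WINDOW: usize = 9;

// One modeled re-entry of the UndecidedMarkup arm. Returns Err when the
// restore step `bytes[..skipped].copy_from_slice(..)` would index past
// the fixed 9-byte buffer, which in the real parser is a panic.
fn reenter(skipped: usize, chunk_len: usize) -> Result<usize, usize> {
    if skipped > WINDOW {
        return Err(skipped); // models "range end index N out of range for slice of length 9"
    }
    Ok(skipped + chunk_len) // state saved as UndecidedMarkup(skipped + cur.len())
}

fn main() {
    // Four 4-byte chunks of "<aa<bb<cc<dd": skipped grows 0 -> 4 -> 8 -> 12,
    // and the fourth re-entry trips the out-of-range restore.
    let mut skipped = 0;
    for _ in 0..4 {
        match reenter(skipped, 4) {
            Ok(s) => skipped = s,
            Err(s) => {
                println!("panic: range end index {s} out of range for slice of length {WINDOW}");
                return;
            }
        }
    }
}
```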

Fix

Once the 9-byte window is full (end == bytes.len()) and switch() returned None, the markup is definitively not one of the recognised keywords. There is no benefit to waiting for more data; the parser can drop the examined window and skip until the closing >. The InElementDecl state already does exactly that (no quote awareness needed — we only reach this point if we failed to match the keywords that promote into InQuoteSensitive), so the new branch reuses it rather than introducing a new state.

if end == bytes.len() {
    cur = &cur[end - skipped..];
    *self = Self::InElementDecl;
    continue;
}

The shape of the existing comments — "Buffer is long enough to store the longest possible keyword !NOTATION" — already implies skipped was meant to be bounded by bytes.len(); this just enforces it.
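With the bail-out in place, the same model stays bounded: at most 9 bytes are ever staged, and a full window with no keyword match leaves `UndecidedMarkup` for good. This is a toy state machine illustrating the invariant, not the library's actual types.

```rust
const WINDOW: usize = 9; // longest keyword, "!NOTATION"

#[derive(Debug, PartialEq)]
enum State {
    UndecidedMarkup(usize), // bytes staged so far, always <= WINDOW
    InElementDecl,          // skip until '>'
}

// Modeled re-entry with the fix applied: stage at most WINDOW bytes;
// once the window is full and no keyword matched, bail to InElementDecl
// instead of growing `skipped` past the buffer.
fn feed_once(skipped: usize, chunk_len: usize) -> State {
    let end = (skipped + chunk_len).min(WINDOW);
    if end == WINDOW {
        State::InElementDecl
    } else {
        State::UndecidedMarkup(end)
    }
}

fn main() {
    // The third 4-byte chunk fills the window and triggers the bail-out.
    assert_eq!(feed_once(8, 4), State::InElementDecl);
}
```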

Test

Added issue957 in tests/issues.rs in the same style as issue950. A 25-byte malformed DTD plus BufReader::with_capacity(4) is enough to force the failure mode (each <aa, <bb, … straddles a chunk boundary, so UndecidedMarkup is re-entered repeatedly):

let reader = BufReader::with_capacity(4, "<!DOCTYPE r[<aa<bb<cc<dd>".as_bytes());
//                                chunks: ____----____----____----_

Same disposition as issue950 — we don't assert on the returned event since proper DTD validity reporting is a future improvement; we just pin that the parser doesn't panic.

Verification

  • New regression test passes on the fix; reverting the fix reproduces the panic.
  • cargo test (full suite, default features) stays green: doc tests plus every tests/* binary, no failures.

Why this matters

Any consumer reading untrusted XML through Reader::read_event_into (with a BufReader or any chunked BufRead) can be made to panic with ~25 bytes of crafted input. We hit it via a coverage-guided fuzzer feeding BufReader<File> over arbitrary bytes; the downstream workaround is to pre-reject <!-shaped tokens that aren't comments / CDATA before the reader sees them, but the state-machine invariant should be enforced in the library.

The `UndecidedMarkup(skipped)` arm of `DtdParser::feed` stages bytes
into a fixed `[u8; 9]` keyword-resolution buffer and, when no keyword
matched and `>` was not yet seen, updated the state to
`UndecidedMarkup(skipped + cur.len())`. With chunked input that keeps
hitting `<!`-shaped tokens but never matches one of `<!--`,
`<![CDATA[`, `<!ELEMENT`, `<!ATTLIST`, `<!ENTITY`, `<!NOTATION`, the
saved `skipped` could grow past `bytes.len() == 9`. The next entry
into the arm panicked on `bytes[..skipped].copy_from_slice(...)` with
`range end index N out of range for slice of length 9`.

Once the 9-byte window is full and `switch()` returned `None`, the
markup is definitively unknown — there is no benefit to waiting for
more data, and `switch`'s "skip until `>`" fallback already exists in
the `InElementDecl` state. So when `end == bytes.len()`, drop the
already-examined window and transition to `InElementDecl`.

Adds an `issue957` regression test in the same style as `issue950`,
using a 25-byte malformed DTD and `BufReader::with_capacity(4)` to
force enough re-entries for `skipped` to overflow the work buffer.
Full `cargo test` suite stays green.
@codecov-commenter


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.48%. Comparing base (a759d65) to head (9fd1d58).
⚠️ Report is 25 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #958      +/-   ##
==========================================
+ Coverage   55.08%   56.48%   +1.39%     
==========================================
  Files          44       44              
  Lines       16911    17653     +742     
==========================================
+ Hits         9316     9971     +655     
- Misses       7595     7682      +87     
Flag Coverage Δ
unittests 56.48% <100.00%> (+1.39%) ⬆️

@Mingun Mingun merged commit 2c46d24 into tafia:master May 6, 2026
7 checks passed
@Mingun
Collaborator

Mingun commented May 6, 2026

Thanks! Yes, that is an elegant solution.

It is unfortunate that this was not caught by our fuzzing tests. I don't have a good understanding of how they work, and improving the test set would be welcome. You say this bug was caught by a fuzzer. Could you share the details of how it works for you, or perhaps even open a PR to improve our fuzzing procedure?

@dobermai
Contributor Author

dobermai commented May 6, 2026

Happy to help, and good question.

Nothing fancy on our side — it's just cargo-fuzz/libFuzzer, same as yours. The one thing that's different is that we feed the parser via BufReader<File> rather than Cursor<&[u8]>. <Cursor as BufRead>::fill_buf always hands back the whole remaining input in one go, so a Cursor-backed harness never gets to exercise parser states that span fill_buf calls. That's the whole reason #950 and #957 slipped through — both regression tests reproduce only with BufReader::with_capacity(small_N, ..).

I went ahead and opened #959 with a small fuzz_chunked_reader target. It just takes the first input byte as a BufReader capacity and feeds the rest as XML — same input shape as fuzz_target_1, so corpora are interchangeable. I sanity-checked it by reverting the #957 fix locally and seeding the corpus with the regression-test input prefixed with the capacity byte; libFuzzer reproduced the panic on the seed within one run, and with the fix back in place 5000 runs stay clean. All the details are in the PR.
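The input-splitting convention described above can be sketched with std only (the helper name and exact clamping are hypothetical; the real harness is in #959 and hands the reader to quick-xml's event loop):

```rust
use std::io::BufReader;

// Hypothetical sketch of the #959 harness input convention: byte 0 selects
// the BufReader capacity, the remainder is the XML payload. Same input
// shape as fuzz_target_1, so corpora stay interchangeable.
fn split_input(data: &[u8]) -> Option<(usize, &[u8])> {
    let (&cap, xml) = data.split_first()?;
    // A zero capacity would defeat the point of chunking, so clamp to 1.
    Some((cap.max(1) as usize, xml))
}

fn main() {
    // 0x04 => 4-byte chunks over the #957 regression input.
    let data = b"\x04<!DOCTYPE r[<aa<bb<cc<dd>";
    if let Some((cap, xml)) = split_input(data) {
        let reader = BufReader::with_capacity(cap, xml);
        // In the real target this reader feeds Reader::read_event_into;
        // here we only check the capacity plumbing.
        assert_eq!(reader.capacity(), 4);
    }
}
```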

Hope it's useful!

Mingun pushed a commit that referenced this pull request May 7, 2026
#960)

Sibling of #957. The fix in #958 prevents `skipped` from growing across
multiple `UndecidedMarkup` re-entries inside `DtdParser::feed`, but the
same panic was still reachable via the *initial* assignment one arm
earlier:

    *self = Self::UndecidedMarkup(cur.len() - i - 1);

When a chunk delivered `<` followed by 9+ bytes of unknown markup (so
`switch()` returned `None` on a window long enough to definitively rule
out all keywords), `skipped` was set to that long value in one shot. The
next entry into `UndecidedMarkup` panicked on the very first line:

    bytes[..skipped].copy_from_slice(&buf[buf.len() - skipped..]);
    // range end index 24 out of range for slice of length 9

CIFuzz on PR #959 (a new chunked-`BufRead` fuzz target) reproduced this
within ~3s, 192,123 executions. Local minimal reproducer:

    let reader = BufReader::with_capacity(
        24,
        "<!DOCTYPE r[<aaaaaaaaaaaaaaaaaaaaaaaaaaaaa>]>".as_bytes(),
    );

Mirrors the bail-out already present in the `UndecidedMarkup` arm: when
`cur.len() - i - 1 >= 9`, the markup is definitively not one of `<!--`,
`<![CDATA[`, `<!ELEMENT`, `<!ATTLIST`, `<!ENTITY`, `<!NOTATION`, so we
transition to `InElementDecl` (skip until `>`) instead of staging more
bytes than the 9-byte work buffer can hold.

Adds `issue960` regression test in the same style as `issue957`. Full
`cargo test` stays green.
Mingun pushed a commit to Mingun/quick-xml that referenced this pull request May 8, 2026
tafia#960)

Sibling of tafia#957. The fix in tafia#958 prevents `skipped` from growing across
multiple `UndecidedMarkup` re-entries inside `DtdParser::feed`, but the
same panic was still reachable via the *initial* assignment one arm
earlier:

    *self = Self::UndecidedMarkup(cur.len() - i - 1);

When a chunk delivered `<` followed by 9+ bytes of unknown markup (so
`switch()` returned `None` on a window long enough to definitively rule
out all keywords), `skipped` was set to that long value in one shot. The
next entry into `UndecidedMarkup` panicked on the very first line:

    bytes[..skipped].copy_from_slice(&buf[buf.len() - skipped..]);
    // range end index 24 out of range for slice of length 9

CIFuzz on PR tafia#959 (a new chunked-`BufRead` fuzz target) reproduced this
within ~3s, 192,123 executions. Local minimal reproducer:

    let reader = BufReader::with_capacity(
        24,
        "<!DOCTYPE r[<aaaaaaaaaaaaaaaaaaaaaaaaaaaaa>]>".as_bytes(),
    );

Mirrors the bail-out already present in the `UndecidedMarkup` arm: when
`cur.len() - i - 1 >= 9`, the markup is definitively not one of `<!--`,
`<![CDATA[`, `<!ELEMENT`, `<!ATTLIST`, `<!ENTITY`, `<!NOTATION`, so we
transition to `InElementDecl` (skip until `>`) instead of staging more
bytes than the 9-byte work buffer can hold.

Adds `issue960` regression test in the same style as `issue957`. Full
`cargo test` stays green.

(cherry picked from commit 0672dfa)

Successfully merging this pull request may close these issues.

DtdParser panics with slice OOB on chunked DTD input that never matches a keyword (regression after #954)
