
Fix slice-index panic in DtdParser on chunked unknown markup (#957) #958

Merged
Mingun merged 1 commit into tafia:master from dobermai:fix/issue-957-undecided-markup-overflow
May 6, 2026

Conversation

@dobermai
Contributor

@dobermai dobermai commented May 6, 2026

Fixes #957.

Bug

In src/parser/dtd.rs, the Self::UndecidedMarkup(skipped) arm of DtdParser::feed stages the prefix of an unresolved markup token into a fixed [u8; 9] work buffer (long enough for the longest recognised keyword, !NOTATION). When the buffered window did not match any keyword and did not contain a >, the state was updated to UndecidedMarkup(skipped + cur.len()) and feed returned to wait for more bytes.

With chunked input that keeps hitting <!-shaped tokens that never match one of <!--, <![CDATA[, <!ELEMENT, <!ATTLIST, <!ENTITY, <!NOTATION, skipped could grow past bytes.len() == 9. The next re-entry into the arm panicked on the very first line, bytes[..skipped].copy_from_slice(&buf[buf.len() - skipped..]), with

range end index N out of range for slice of length 9

This survived #954's fix for #950 (different panic, same state machine).
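The overflow arithmetic can be sketched as a minimal, self-contained model (names and chunk sizes hypothetical, not the actual quick-xml code): each re-entry that fails to resolve a keyword saves `skipped + cur.len()`, and nothing bounds it by the 9-byte window.

```rust
// Longest recognised keyword is "!NOTATION" (9 bytes).
const WINDOW: usize = 9;

// One modeled re-entry of the UndecidedMarkup arm. Returns Err when the
// restore step `bytes[..skipped].copy_from_slice(..)` would index past
// the fixed 9-byte buffer, which in the real parser is a panic.
fn reenter(skipped: usize, chunk_len: usize) -> Result<usize, usize> {
    if skipped > WINDOW {
        return Err(skipped); // models "range end index N out of range for slice of length 9"
    }
    Ok(skipped + chunk_len) // state saved as UndecidedMarkup(skipped + cur.len())
}

fn main() {
    // Four 4-byte chunks of "<aa<bb<cc<dd": skipped grows 0 -> 4 -> 8 -> 12,
    // and the fourth re-entry trips the out-of-range restore.
    let mut skipped = 0;
    for _ in 0..4 {
        match reenter(skipped, 4) {
            Ok(s) => skipped = s,
            Err(s) => {
                println!("panic: range end index {s} out of range for slice of length {WINDOW}");
                return;
            }
        }
    }
}
```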

Fix

Once the 9-byte window is full (end == bytes.len()) and switch() returned None, the markup is definitively not one of the recognised keywords. There is no benefit to waiting for more data; the parser can drop the examined window and skip until the closing >. The InElementDecl state already does exactly that (no quote awareness needed — we only reach this point if we failed to match the keywords that promote into InQuoteSensitive), so the new branch reuses it rather than introducing a new state.

if end == bytes.len() {
    cur = &cur[end - skipped..];
    *self = Self::InElementDecl;
    continue;
}

The shape of the existing comments — "Buffer is long enough to store the longest possible keyword !NOTATION" — already implies skipped was meant to be bounded by bytes.len(); this just enforces it.
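With the bail-out in place, the same model stays bounded: at most 9 bytes are ever staged, and a full window with no keyword match leaves `UndecidedMarkup` for good. This is a toy state machine illustrating the invariant, not the library's actual types.

```rust
const WINDOW: usize = 9; // longest keyword, "!NOTATION"

#[derive(Debug, PartialEq)]
enum State {
    UndecidedMarkup(usize), // bytes staged so far, always <= WINDOW
    InElementDecl,          // skip until '>'
}

// Modeled re-entry with the fix applied: stage at most WINDOW bytes;
// once the window is full and no keyword matched, bail to InElementDecl
// instead of growing `skipped` past the buffer.
fn feed_once(skipped: usize, chunk_len: usize) -> State {
    let end = (skipped + chunk_len).min(WINDOW);
    if end == WINDOW {
        State::InElementDecl
    } else {
        State::UndecidedMarkup(end)
    }
}

fn main() {
    // The third 4-byte chunk fills the window and triggers the bail-out.
    assert_eq!(feed_once(8, 4), State::InElementDecl);
}
```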

Test

Added issue957 in tests/issues.rs in the same style as issue950. A 25-byte malformed DTD plus BufReader::with_capacity(4) is enough to force the failure mode (each <aa, <bb, … straddles a chunk boundary, so UndecidedMarkup is re-entered repeatedly):

let reader = BufReader::with_capacity(4, "<!DOCTYPE r[<aa<bb<cc<dd>".as_bytes());
//                                chunks: ____----____----____----_

Same disposition as issue950 — we don't assert on the returned event since proper DTD validity reporting is a future improvement; we just pin that the parser doesn't panic.

Verification

  • New regression test passes on the fix; reverting the fix reproduces the panic.
  • cargo test (full suite, default features) stays green: doc tests plus every tests/* binary, no failures.

Why this matters

Any consumer reading untrusted XML through Reader::read_event_into (with a BufReader or any chunked BufRead) can be made to panic with ~25 bytes of crafted input. We hit it via a coverage-guided fuzzer feeding BufReader<File> over arbitrary bytes; the downstream workaround is to pre-reject <!-shaped tokens that aren't comments / CDATA before the reader sees them, but the state-machine invariant should be enforced in the library.

The `UndecidedMarkup(skipped)` arm of `DtdParser::feed` stages bytes
into a fixed `[u8; 9]` keyword-resolution buffer and, when no keyword
matched and `>` was not yet seen, updated the state to
`UndecidedMarkup(skipped + cur.len())`. With chunked input that keeps
hitting `<!`-shaped tokens but never matches one of `<!--`,
`<![CDATA[`, `<!ELEMENT`, `<!ATTLIST`, `<!ENTITY`, `<!NOTATION`, the
saved `skipped` could grow past `bytes.len() == 9`. The next entry
into the arm panicked on `bytes[..skipped].copy_from_slice(...)` with
`range end index N out of range for slice of length 9`.

Once the 9-byte window is full and `switch()` returned `None`, the
markup is definitively unknown — there is no benefit to waiting for
more data, and `switch`'s "skip until `>`" fallback already exists in
the `InElementDecl` state. So when `end == bytes.len()`, drop the
already-examined window and transition to `InElementDecl`.

Adds an `issue957` regression test in the same style as `issue950`,
using a 25-byte malformed DTD and `BufReader::with_capacity(4)` to
force enough re-entries for `skipped` to overflow the work buffer.
Full `cargo test` suite stays green.
@codecov-commenter


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.48%. Comparing base (a759d65) to head (9fd1d58).
⚠️ Report is 25 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #958      +/-   ##
==========================================
+ Coverage   55.08%   56.48%   +1.39%     
==========================================
  Files          44       44              
  Lines       16911    17653     +742     
==========================================
+ Hits         9316     9971     +655     
- Misses       7595     7682      +87     
Flag Coverage Δ
unittests 56.48% <100.00%> (+1.39%) ⬆️

@Mingun Mingun merged commit 2c46d24 into tafia:master May 6, 2026
7 checks passed
@Mingun
Collaborator

Mingun commented May 6, 2026

Thanks! Yes, that is an elegant solution.

It is unfortunate that this was not caught by our fuzzing tests. I don't have a good understanding of how they work, and improving the test set would be welcome. You say this bug was caught by a fuzzer. Could you share the details of how it works for you, or perhaps even open a PR to improve our fuzzing procedure?

@dobermai
Contributor Author

dobermai commented May 6, 2026

Happy to help, and good question.

Nothing fancy on our side — it's just cargo-fuzz/libFuzzer, same as yours. The one thing that's different is that we feed the parser via BufReader<File> rather than Cursor<&[u8]>. <Cursor as BufRead>::fill_buf always hands back the whole remaining input in one go, so a Cursor-backed harness never gets to exercise parser states that span fill_buf calls. That's the whole reason #950 and #957 slipped through — both regression tests reproduce only with BufReader::with_capacity(small_N, ..).

I went ahead and opened #959 with a small fuzz_chunked_reader target. It just takes the first input byte as a BufReader capacity and feeds the rest as XML — same input shape as fuzz_target_1, so corpora are interchangeable. I sanity-checked it by reverting the #957 fix locally and seeding the corpus with the regression-test input prefixed with the capacity byte; libFuzzer reproduced the panic on the seed within one run, and with the fix back in place 5000 runs stay clean. All the details are in the PR.
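The input-splitting convention described above can be sketched with std only (the helper name and exact clamping are hypothetical; the real harness is in #959 and hands the reader to quick-xml's event loop):

```rust
use std::io::BufReader;

// Hypothetical sketch of the #959 harness input convention: byte 0 selects
// the BufReader capacity, the remainder is the XML payload. Same input
// shape as fuzz_target_1, so corpora stay interchangeable.
fn split_input(data: &[u8]) -> Option<(usize, &[u8])> {
    let (&cap, xml) = data.split_first()?;
    // A zero capacity would defeat the point of chunking, so clamp to 1.
    Some((cap.max(1) as usize, xml))
}

fn main() {
    // 0x04 => 4-byte chunks over the #957 regression input.
    let data = b"\x04<!DOCTYPE r[<aa<bb<cc<dd>";
    if let Some((cap, xml)) = split_input(data) {
        let reader = BufReader::with_capacity(cap, xml);
        // In the real target this reader feeds Reader::read_event_into;
        // here we only check the capacity plumbing.
        assert_eq!(reader.capacity(), 4);
    }
}
```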

Hope it's useful!

Mingun pushed a commit that referenced this pull request May 7, 2026
#960)

Sibling of #957. The fix in #958 prevents `skipped` from growing across
multiple `UndecidedMarkup` re-entries inside `DtdParser::feed`, but the
same panic was still reachable via the *initial* assignment one arm
earlier:

    *self = Self::UndecidedMarkup(cur.len() - i - 1);

When a chunk delivered `<` followed by 9+ bytes of unknown markup (so
`switch()` returned `None` on a window long enough to definitively rule
out all keywords), `skipped` was set to that long value in one shot. The
next entry into `UndecidedMarkup` panicked on the very first line:

    bytes[..skipped].copy_from_slice(&buf[buf.len() - skipped..]);
    // range end index 24 out of range for slice of length 9

CIFuzz on PR #959 (a new chunked-`BufRead` fuzz target) reproduced this
within ~3s, 192,123 executions. Local minimal reproducer:

    let reader = BufReader::with_capacity(
        24,
        "<!DOCTYPE r[<aaaaaaaaaaaaaaaaaaaaaaaaaaaaa>]>".as_bytes(),
    );

Mirrors the bail-out already present in the `UndecidedMarkup` arm: when
`cur.len() - i - 1 >= 9`, the markup is definitively not one of `<!--`,
`<![CDATA[`, `<!ELEMENT`, `<!ATTLIST`, `<!ENTITY`, `<!NOTATION`, so we
transition to `InElementDecl` (skip until `>`) instead of staging more
bytes than the 9-byte work buffer can hold.

Adds `issue960` regression test in the same style as `issue957`. Full
`cargo test` stays green.
Mingun pushed a commit to Mingun/quick-xml that referenced this pull request May 8, 2026
tafia#960)

Sibling of tafia#957. The fix in tafia#958 prevents `skipped` from growing across
multiple `UndecidedMarkup` re-entries inside `DtdParser::feed`, but the
same panic was still reachable via the *initial* assignment one arm
earlier:

    *self = Self::UndecidedMarkup(cur.len() - i - 1);

When a chunk delivered `<` followed by 9+ bytes of unknown markup (so
`switch()` returned `None` on a window long enough to definitively rule
out all keywords), `skipped` was set to that long value in one shot. The
next entry into `UndecidedMarkup` panicked on the very first line:

    bytes[..skipped].copy_from_slice(&buf[buf.len() - skipped..]);
    // range end index 24 out of range for slice of length 9

CIFuzz on PR tafia#959 (a new chunked-`BufRead` fuzz target) reproduced this
within ~3s, 192,123 executions. Local minimal reproducer:

    let reader = BufReader::with_capacity(
        24,
        "<!DOCTYPE r[<aaaaaaaaaaaaaaaaaaaaaaaaaaaaa>]>".as_bytes(),
    );

Mirrors the bail-out already present in the `UndecidedMarkup` arm: when
`cur.len() - i - 1 >= 9`, the markup is definitively not one of `<!--`,
`<![CDATA[`, `<!ELEMENT`, `<!ATTLIST`, `<!ENTITY`, `<!NOTATION`, so we
transition to `InElementDecl` (skip until `>`) instead of staging more
bytes than the 9-byte work buffer can hold.

Adds `issue960` regression test in the same style as `issue957`. Full
`cargo test` stays green.

(cherry picked from commit 0672dfa)

Successfully merging this pull request may close these issues.

DtdParser panics with slice OOB on chunked DTD input that never matches a keyword (regression after #954)
