Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trim next element after DocType #603

Merged
merged 2 commits into from Jun 10, 2023
Merged

Trim next element after DocType #603

merged 2 commits into from Jun 10, 2023

Conversation

danjpgriffin
Copy link

Allow comment to exist between DocType and root element

With the new EntityResolver/Doctype changes, putting an xml comment between a doctype and the root element causes the parse to fail. Trimming the next element fixes this.

Copy link
Collaborator

@Mingun Mingun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment and update changelog. I would like also to have some more formal tests for that situation (can be added in de/mod.rs). Preferably to test each possible situation that you can imagine. I have a feeling, that the same problem can be happened not only for DocType + Comment events.

@@ -2880,7 +2880,7 @@ impl StartTrimmer {
#[inline(always)]
fn trim<'a>(&mut self, event: Event<'a>) -> Option<PayloadEvent<'a>> {
let (event, trim_next_event) = match event {
Event::DocType(e) => (PayloadEvent::DocType(e), false),
Event::DocType(e) => (PayloadEvent::DocType(e), true),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment describing, why we need trim here because this is not obvious (otherwise it would be done initially)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Immediately following a Doctype event, the Serde parser expects to read an initial structure from the XML file - it expects a 'Start' tag event. Any comments following the doctype in the stream is parsed to Text/CData events, which leads the parse to fail with an 'ExpectedStart' panic. By skipping (trimming) any Text/CData events the next event the serde deserializer sees will be a Start event, which is always required to start building up a new Struct from the xml.

The Doctype should trim events here just like the Start/End events. Only the CData/Text events should not trim (otherwise any CData/Text following them will be ignored)

@Mingun Mingun added bug serde Issues related to mapping from Rust types to XML labels May 25, 2023
@danjpgriffin danjpgriffin requested a review from Mingun June 9, 2023 04:50
Dan Griffin added 2 commits June 10, 2023 02:16
Otherwise consequent `Text` events (which is possible if their delimited by
Comment or PI events, which is skipped) will be merged but not trimmed.
That will lead to returning a `Text` event when try to call `deserialize_struct`
or `deserialize_map` which will trigger `DeError::ExpectedStart` error.

The incorrect trim behavior was introduced in tafia#581, when DocType event began to be processed
Copy link
Collaborator

@Mingun Mingun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work! After investigating situation I realized, that actually comment is not needed, because trimming Text events after DocType event just follow common rule. Returning false was an oversight in #583, which introduced this regression.

After investigating I realized that we should add tests for deserialization that uses deserialize_struct or deserialize_map methods (HashMap uses the last) and an XML contains comments or processing instructions in XML prolog

@codecov-commenter
Copy link

codecov-commenter commented Jun 9, 2023

Codecov Report

Merging #603 (5634a53) into master (5fc695e) will increase coverage by 0.05%.
The diff coverage is 100.00%.

❗ Current head 5634a53 differs from pull request most recent head d49f2d5. Consider uploading reports for the commit d49f2d5 to get more accurate results

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@            Coverage Diff             @@
##           master     #603      +/-   ##
==========================================
+ Coverage   64.38%   64.44%   +0.05%     
==========================================
  Files          36       36              
  Lines       16914    16916       +2     
==========================================
+ Hits        10890    10901      +11     
+ Misses       6024     6015       -9     
Flag Coverage Δ
unittests 64.44% <100.00%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/de/mod.rs 69.01% <100.00%> (-0.05%) ⬇️

... and 3 files with indirect coverage changes

@Mingun Mingun merged commit 358cc58 into tafia:master Jun 10, 2023
6 checks passed
crapStone added a commit to Calciumdibromid/CaBr2 that referenced this pull request Jun 29, 2023
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [quick-xml](https://github.com/tafia/quick-xml) | dependencies | minor | `0.28.2` -> `0.29.0` |

---

### Release Notes

<details>
<summary>tafia/quick-xml</summary>

### [`v0.29.0`](https://github.com/tafia/quick-xml/blob/HEAD/Changelog.md#&#8203;0290----2023-06-13)

[Compare Source](tafia/quick-xml@v0.28.2...v0.29.0)

##### New Features

-   [#&#8203;601]: Add `serde_helper` module to the crate root with some useful utility
    functions and document using of enum's unit variants as a text content of element.
-   [#&#8203;606]: Implement indentation for `AsyncWrite` trait implementations.

##### Bug Fixes

-   [#&#8203;603]: Fix a regression from [#&#8203;581] that an XML comment or a processing
    instruction between a <!DOCTYPE> and the root element in the file brokes
    deserialization of structs by returning `DeError::ExpectedStart`
-   [#&#8203;608]: Return a new error `Error::EmptyDocType` on empty doctype instead
    of crashing because of a debug assertion.

##### Misc Changes

-   [#&#8203;594]: Add a helper macro to help deserialize internally tagged enums
    with Serde, which doesn't work out-of-box due to serde limitations.

[#&#8203;581]: tafia/quick-xml#581

[#&#8203;594]: tafia/quick-xml#594

[#&#8203;601]: tafia/quick-xml#601

[#&#8203;603]: tafia/quick-xml#603

[#&#8203;606]: tafia/quick-xml#606

[#&#8203;608]: tafia/quick-xml#608

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNS4xMTguMCIsInVwZGF0ZWRJblZlciI6IjM1LjExOC4wIiwidGFyZ2V0QnJhbmNoIjoiZGV2ZWxvcCJ9-->

Co-authored-by: cabr2-bot <cabr2.help@gmail.com>
Co-authored-by: crapStone <crapstone01@gmail.com>
Reviewed-on: https://codeberg.org/Calciumdibromid/CaBr2/pulls/1940
Reviewed-by: crapStone <crapstone01@gmail.com>
Co-authored-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>
Co-committed-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug serde Issues related to mapping from Rust types to XML
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants