New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix errors in sequence deserialization #387
Conversation
This is a long PR so it might take me a few days to get through it. Going through the documentation first since it gives me a better picture of what to expect from the code. |
failures (36): seq::fixed_name::fixed_size::field_after_list::overlapped seq::fixed_name::fixed_size::field_before_list::overlapped seq::fixed_name::fixed_size::two_lists::overlapped seq::fixed_name::fixed_size::unknown_items::overlapped seq::fixed_name::variable_size::field_after_list::overlapped seq::fixed_name::variable_size::field_before_list::overlapped seq::fixed_name::variable_size::two_lists::overlapped seq::fixed_name::variable_size::unknown_items::overlapped seq::variable_name::fixed_size::field_after_list::after seq::variable_name::fixed_size::field_after_list::before seq::variable_name::fixed_size::field_after_list::overlapped seq::variable_name::fixed_size::field_before_list::after seq::variable_name::fixed_size::field_before_list::before seq::variable_name::fixed_size::field_before_list::overlapped seq::variable_name::fixed_size::two_lists::choice_and_fixed::fixed_after seq::variable_name::fixed_size::two_lists::choice_and_fixed::fixed_before seq::variable_name::fixed_size::two_lists::choice_and_fixed::overlapped_fixed_after seq::variable_name::fixed_size::two_lists::choice_and_fixed::overlapped_fixed_before seq::variable_name::fixed_size::two_lists::fixed_and_choice::fixed_after seq::variable_name::fixed_size::two_lists::fixed_and_choice::fixed_before seq::variable_name::fixed_size::two_lists::fixed_and_choice::overlapped_fixed_after seq::variable_name::fixed_size::two_lists::fixed_and_choice::overlapped_fixed_before seq::variable_name::variable_size::field_after_list::after seq::variable_name::variable_size::field_after_list::before seq::variable_name::variable_size::field_after_list::overlapped seq::variable_name::variable_size::field_before_list::after seq::variable_name::variable_size::field_before_list::before seq::variable_name::variable_size::field_before_list::overlapped seq::variable_name::variable_size::two_lists::choice_and_fixed::fixed_after seq::variable_name::variable_size::two_lists::choice_and_fixed::fixed_before seq::variable_name::variable_size::two_lists::choice_and_fixed::overlapped_fixed_after seq::variable_name::variable_size::two_lists::choice_and_fixed::overlapped_fixed_before seq::variable_name::variable_size::two_lists::fixed_and_choice::fixed_after seq::variable_name::variable_size::two_lists::fixed_and_choice::fixed_before seq::variable_name::variable_size::two_lists::fixed_and_choice::overlapped_fixed_after seq::variable_name::variable_size::two_lists::fixed_and_choice::overlapped_fixed_before
@dralley , I've address all your suggestions except one (I have a question about it). Could you finish your review this week? I have a tons of plans and unfortunately they all are a strictly coupled with each other. (Use Compare button against forse-pushed line to see the difference from the previous version) |
@Mingun Yeah, I'll take another look tonight |
src/de/seq.rs
Outdated
/// get a string representation of a tag. | ||
/// | ||
/// Returns `true`, if `start` is not in the `fields` list and `false` otherwise. | ||
pub fn is_unknown( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be named more descriptively e.g. is_unknown_{field|tag|element}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
edit: I guess "unknown" here is referring to lookahead and what element will come next rather than what elements should be accepted by the serializer?
"unknown" also sounds a little strange in this context since we theoretically do known the names of the enum variants?
re:
enum Choice { one, two, three } struct Root { #[serde(rename = "$value")] item: [Choice; 3], node: (), }
and this struct should be able to deserialized from
<root> <one/> <two/> <three/> <node/> </root>
Maybe tag_maps_to_field
? (That name sounds a bit wrong as well, so if you prefer the former names I'm fine with that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose is_unknown
simple in contrary to is_known
(i.e. if tag is listed in fields
it is known for a map), but because each usage of is_known()
would be !is_known()
I inverted it. The method does not have any specific cognitive load, so just not_in
should be enough.
src/de/seq.rs
Outdated
/// so we take an any element. But because in XML sequences a flattened into a | ||
/// maps, then we could take an elements that have their own dedicated fields | ||
/// in a struct. To prevent this we use an `Exclude` filter, that filters out | ||
/// any known names of a struct fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// In the second case we don't know what tag name should be expected as a
/// sequence element, so we accept any element. Since the sequence are flattened
/// into maps, we collect elements which have dedicated fields in a struct by using an
/// `Exclude` filter that filters out elements with names matching field names from the struct.
This language is kind of subtle so let me know if I butchered the meaning. Again though I guess I'm not sure I understand why we can't know the names of the enum fields - not familiar with Serde's API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then
/// In the second case we don't know what tag name should be expected as a
/// sequence element, so we accept any element. Since the sequence are flattened
/// into maps, we skip elements which have dedicated fields in a struct by using an
///Exclude
filter that filters out elements with names matching field names from the struct.
This describes that fact, that struct
struct AnyName {
field: (),
list: Vec<()>,
}
in XML would be represented as
<any-tag>
<field/>
<list/>
<list/>
<any-tag>
The JSON equivalent would be
{
"field": null,
"list": null,
"list": null
}
instead of correct
{
"field": null,
"list": [null, null]
}
Because collection element could be an enum
and enums in XML could be represented as
<variant-name .../>
and because the type of each struct
field is opaque to us at the AnyName
deserialization level, we should consider all tags as a potential list items, if one of our struct fields will be a sequence (for example, a Vec<T>
), with one exception: if we are know with guarantee that a tag with that name is mapped to another field. We list all names of that tags in the Exclude
variant and this list is a fields
list of a struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I always thought the most natural XML representation was akin to the JSON one, e.g.
<any-tag>
<field/>
<list>
<item/>
<item/>
<item/>
</list>
<any-tag>
There would then be no ambiguity about which tags map to which fields. But given we have to process whatever the user provided, and their schema might not be structured this way, I see your point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it would be nice for this to be the case, but I often see such XML. If you think about it, this is natural, given how easy it is to describe such a structure in XSD:
<xs:complexType>
<xs:sequence>
<xs:element name="field"/>
<xs:element name="list" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
Example of a struct, that should be a valid definition: enum Choice { one, two, three } struct Root { #[serde(rename = "$value")] item: [Choice; 3], node: (), } and this struct should be able to deserialized from <root> <one/> <two/> <three/> <node/> </root> The following tests were fixed (10): seq::variable_name::fixed_size::field_after_list::after seq::variable_name::fixed_size::field_after_list::before seq::variable_name::fixed_size::field_before_list::after seq::variable_name::fixed_size::field_before_list::before seq::variable_name::fixed_size::two_lists::choice_and_fixed::fixed_after seq::variable_name::fixed_size::two_lists::choice_and_fixed::fixed_before seq::variable_name::fixed_size::two_lists::fixed_and_choice::fixed_after seq::variable_name::fixed_size::two_lists::fixed_and_choice::fixed_before seq::variable_name::variable_size::field_after_list::before seq::variable_name::variable_size::field_before_list::before failures (26): seq::fixed_name::fixed_size::field_after_list::overlapped seq::fixed_name::fixed_size::field_before_list::overlapped seq::fixed_name::fixed_size::two_lists::overlapped seq::fixed_name::fixed_size::unknown_items::overlapped seq::fixed_name::variable_size::field_after_list::overlapped seq::fixed_name::variable_size::field_before_list::overlapped seq::fixed_name::variable_size::two_lists::overlapped seq::fixed_name::variable_size::unknown_items::overlapped seq::variable_name::fixed_size::field_after_list::overlapped seq::variable_name::fixed_size::field_before_list::overlapped seq::variable_name::fixed_size::two_lists::choice_and_fixed::overlapped_fixed_after seq::variable_name::fixed_size::two_lists::choice_and_fixed::overlapped_fixed_before seq::variable_name::fixed_size::two_lists::fixed_and_choice::overlapped_fixed_after seq::variable_name::fixed_size::two_lists::fixed_and_choice::overlapped_fixed_before seq::variable_name::variable_size::field_after_list::after seq::variable_name::variable_size::field_after_list::overlapped seq::variable_name::variable_size::field_before_list::after seq::variable_name::variable_size::field_before_list::overlapped seq::variable_name::variable_size::two_lists::choice_and_fixed::fixed_after seq::variable_name::variable_size::two_lists::choice_and_fixed::fixed_before seq::variable_name::variable_size::two_lists::choice_and_fixed::overlapped_fixed_after seq::variable_name::variable_size::two_lists::choice_and_fixed::overlapped_fixed_before seq::variable_name::variable_size::two_lists::fixed_and_choice::fixed_after seq::variable_name::variable_size::two_lists::fixed_and_choice::fixed_before seq::variable_name::variable_size::two_lists::fixed_and_choice::overlapped_fixed_after seq::variable_name::variable_size::two_lists::fixed_and_choice::overlapped_fixed_before Co-authored-by: Daniel Alley <dalley@redhat.com>
Codecov Report
@@ Coverage Diff @@
## master #387 +/- ##
==========================================
+ Coverage 58.80% 60.73% +1.93%
==========================================
Files 19 19
Lines 9537 9843 +306
==========================================
+ Hits 5608 5978 +370
+ Misses 3929 3865 -64
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
src/de/map.rs
Outdated
/// Determines, should [`Deserializer::next_text_impl()`] expand the second | ||
/// level of tags or not. | ||
/// | ||
/// If this field is `true`, we processes the following XML shape: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// If this field is `true`, we processes the following XML shape: | |
/// If this field is `true`, we process the following XML shape: |
- name: Run tests (all features) | ||
env: | ||
LLVM_PROFILE_FILE: coverage/all-features-%p-%m.profraw | ||
run: cargo test --all-features |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this should cover testing both with and without overlapped-lists
which is a concern because of all the alternate code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is why I've added this.
@@ -42,6 +42,52 @@ default = [] | |||
## [standard compliant]: https://www.w3.org/TR/xml11/#charencoding | |||
encoding = ["encoding_rs"] | |||
|
|||
## This feature enables support for deserializing lists where tags are overlapped | |||
## with tags that do not correspond to the list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious about the use case, is this basically for situations where you're parsing messy handwritten XML / HTML? Generally I would think machine-generated XML wouldn't have this problem.
It might be worth providing that as an example of a situation where you might want to enable it if that is the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I cannot remember that would I worked with such XMLs, but all parsers, that I know, very tolerant to overlapped lists (even in xs:sequence
, although XSD requires exact order of tags. I never seen using xs:all
that allows arbitrary order). So I thought there should be an option to allow this behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dralley I'm trying to parse KML file (from Google Earth Pro) which can have elements of types Folder, Document and Point interleaved in a single list. I'd like to ideally get them in the same order, so I tried to create an enum covering all three cases, but it didn't work. So this at least gives me an option to parse it as three lists without the preservation of order in-between the items.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@k-bx, if you case still didn't work, could you describe it in more detail? That is what that definitely should work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Mingun it works good enough, but not perfectly. The desired way to handle things is:
input: <TagOne></TagOne><TagTwo></TagTwo><TagOne></TagOne>
parsed as:
struct TagOne {};
struct TagTwo {};
enum Tag {
TagOne(TagOne),
TagTwo(TagTwo)
}
-- result should be
vec![Tag::TagOne(TagOne {}), Tag::TagTwo(TagTwo {}), Tag::TagOne(TagOne {})]
What you get is a perfect preservation of the order in which you get your mixed tags.
I didn't see a way to do something like this via the current lib ^
Do I need to create a separate issue to make something like this possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is an error, I filled #500 for it
The following tests were fixed (6): seq::variable_name::variable_size::field_after_list::after seq::variable_name::variable_size::field_before_list::after seq::variable_name::variable_size::two_lists::choice_and_fixed::fixed_after seq::variable_name::variable_size::two_lists::choice_and_fixed::fixed_before seq::variable_name::variable_size::two_lists::fixed_and_choice::fixed_after seq::variable_name::variable_size::two_lists::fixed_and_choice::fixed_before failures (20): seq::fixed_name::fixed_size::field_after_list::overlapped seq::fixed_name::fixed_size::field_before_list::overlapped seq::fixed_name::fixed_size::two_lists::overlapped seq::fixed_name::fixed_size::unknown_items::overlapped seq::fixed_name::variable_size::field_after_list::overlapped seq::fixed_name::variable_size::field_before_list::overlapped seq::fixed_name::variable_size::two_lists::overlapped seq::fixed_name::variable_size::unknown_items::overlapped seq::variable_name::fixed_size::field_after_list::overlapped seq::variable_name::fixed_size::field_before_list::overlapped seq::variable_name::fixed_size::two_lists::choice_and_fixed::overlapped_fixed_after seq::variable_name::fixed_size::two_lists::choice_and_fixed::overlapped_fixed_before seq::variable_name::fixed_size::two_lists::fixed_and_choice::overlapped_fixed_after seq::variable_name::fixed_size::two_lists::fixed_and_choice::overlapped_fixed_before seq::variable_name::variable_size::field_after_list::overlapped seq::variable_name::variable_size::field_before_list::overlapped seq::variable_name::variable_size::two_lists::choice_and_fixed::overlapped_fixed_after seq::variable_name::variable_size::two_lists::choice_and_fixed::overlapped_fixed_before seq::variable_name::variable_size::two_lists::fixed_and_choice::overlapped_fixed_after seq::variable_name::variable_size::two_lists::fixed_and_choice::overlapped_fixed_before Co-authored-by: Daniel Alley <dalley@redhat.com>
…outside it This fixes a data race with nested maps in sequence fields. Also, top-level sequences always have `has_value_field == false`, so no need to check it Co-authored-by: Daniel Alley <dalley@redhat.com>
Example of such XML: <item/> <another-item/> <item/> <item/> Here we need to skip `<another-item/>` in order to collect all `<item/>`s. So ability to skip events and replay them later was added This fixes all remaining tests Co-authored-by: Daniel Alley <dalley@redhat.com>
…onsumption Co-authored-by: Daniel Alley <dalley@redhat.com>
Replaces Mingun/fast-xml#12
Fixes #342
This PR adds a bunch of tests to ensure, that deserialization of sequences is correct in all situations.
Structs with
$value
field can now deserialize their other fields from elements, not only from attributes, as it was before:can be deserialized into
Added a new cargo feature
overlapped-lists
which allows also interleave sequencetags with non-sequence tags in the XML. When use that feature the following XML
can be deserialized to the struct shown above: