Skip to content

Conversation

@Mingun
Copy link
Collaborator

@Mingun Mingun commented Jun 19, 2022

This PR solves several highly-coupled problems, that blocks changing for safe namespace handling (#59). The main reason in that various decode methods accepts a reference to the Reader, but if we move namespace buffer inside the Reader, we could face with borrowing problems (actually, namespace buffer was decoupled from Reader due to this reason in #69).

First, I've fixed #191 by introducing a new StartText event which generated when reader encounters "text" before the XML markup. In well-formed XMLs that text could represent a BOM.

Secondly, I've removed all functions that handle BOM in inappropriate time. For example, there a no sense to handle BOM when you decode attribute values, because you will never get BOM there. For that reason I've moved all BOM handling code to the BytesStartText struct.

Because after that change you can handle BOM only as the first event from the reader, that allowed me to move the automatic BOM-based encoding detection to the reader instead of triggering it by calling BytesText::decode_without_bom (where the name is also incorrect, since the function removes BOM from text and decodes the rest content, so correct one is decode_with_bom_removal).

As encoding API already changed very much, I've also solved the #180 by unify function signature with and without encoding feature enabled.

@Mingun Mingun added bug encoding Issues related to support of various encodings of the XML documents labels Jun 19, 2022
@Mingun Mingun requested a review from dralley June 19, 2022 19:22
@codecov-commenter
Copy link

codecov-commenter commented Jun 19, 2022

Codecov Report

Merging #399 (d49a4b8) into master (3b37c0e) will decrease coverage by 0.42%.
The diff coverage is 75.20%.

@@            Coverage Diff             @@
##           master     #399      +/-   ##
==========================================
- Coverage   61.72%   61.29%   -0.43%     
==========================================
  Files          20       20              
  Lines       10233    10148      -85     
==========================================
- Hits         6316     6220      -96     
- Misses       3917     3928      +11     
Flag Coverage Δ
unittests 61.29% <75.20%> (-0.43%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/de/escape.rs 65.15% <ø> (-1.28%) ⬇️
src/errors.rs 9.52% <0.00%> (-2.85%) ⬇️
src/events/attributes.rs 94.19% <0.00%> (+3.52%) ⬆️
src/lib.rs 21.09% <0.00%> (-4.92%) ⬇️
src/reader.rs 88.38% <79.44%> (-0.02%) ⬇️
src/de/mod.rs 76.14% <80.00%> (+0.87%) ⬆️
src/events/mod.rs 75.57% <80.85%> (+1.23%) ⬆️
src/de/seq.rs 91.83% <100.00%> (-0.76%) ⬇️
src/se/mod.rs 93.81% <100.00%> (-0.01%) ⬇️
src/writer.rs 90.36% <100.00%> (+0.02%) ⬆️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3b37c0e...d49a4b8. Read the comment docs.

Mingun and others added 4 commits June 20, 2022 00:53
Because `Decoder` type was private, hardly ever that someone use it
The method `Reader::decoder()` is public anyway, but its result type is not, which means
that it cannot be used as method argument, and that is not good
BOM cannot arise in the attribute values - this is by definition a mark that
can be located only at the begin of the stream and every attribute value is
inside the stream
Holding reference to a reader prevents some usages in the future

Co-authored-by: Daniel Alley <dalley@redhat.com>
Copy link
Collaborator

@dralley dralley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do a great job with testing, documentation, splitting up commits and such - I can't find too much to complain about :)

All the changes here look sensible

@Mingun
Copy link
Collaborator Author

Mingun commented Jun 20, 2022

@dralley, I tried to reword some sentences in changelog as best as I can, and also add a couple of comments to the TagState. Could you take a look at whether is all ok?

@dralley
Copy link
Collaborator

dralley commented Jun 20, 2022

I left some suggestions

@Mingun
Copy link
Collaborator Author

Mingun commented Jun 20, 2022

Thanks! In the final version I've updated benchmarks (they has been broken since #393) and applied you suggestions. Merging now

@Mingun Mingun merged commit 8fa6f1e into tafia:master Jun 20, 2022
@Mingun Mingun deleted the bom branch June 20, 2022 18:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug encoding Issues related to support of various encodings of the XML documents

Projects

None yet

Development

Successfully merging this pull request may close these issues.

reduce code redundancy for the encoding feature quick-xml captures UTF BOM as Event::Text encoding feature breaks API

3 participants