Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename reading methods, fix some encoding errors in error-processing paths and tests #412

Merged
merged 8 commits into from Jul 9, 2022

Conversation

Mingun
Copy link
Collaborator

@Mingun Mingun commented Jul 8, 2022

This PR includes some of the stuff, improving code here and there, including:

  • fix incorrect encoding used when handle some errors

  • remove excess BufRead constraints which opens door for the async support fo the same Reader struct

  • some minor improvements in tests

  • some documentation updates

  • rename core methods to better reflect their purpose:

    Old Name New Name
    read_event read_event_into
    read_to_end read_to_end_into
    read_text read_text_into
    read_event_unbuffered read_event
    read_to_end_unbuffered read_to_end

    (read_namespaced_event is not renamed because I plan to remove it in the namespace PR)

@Mingun Mingun added enhancement encoding Issues related to support of various encodings of the XML documents documentation Issues about improvements or bugs in documentation labels Jul 8, 2022
@Mingun Mingun requested a review from dralley July 8, 2022 16:55
src/reader.rs Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Jul 8, 2022

Codecov Report

Merging #412 (ae458cb) into master (0febc2b) will decrease coverage by 0.45%.
The diff coverage is 24.41%.

@@            Coverage Diff             @@
##           master     #412      +/-   ##
==========================================
- Coverage   50.03%   49.58%   -0.46%     
==========================================
  Files          22       22              
  Lines       13870    13967      +97     
==========================================
- Hits         6940     6925      -15     
- Misses       6930     7042     +112     
Flag Coverage Δ
unittests 49.58% <24.41%> (-0.46%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
benches/macrobenches.rs 0.00% <0.00%> (ø)
benches/microbenches.rs 0.00% <0.00%> (ø)
examples/custom_entities.rs 0.00% <0.00%> (ø)
examples/nested_readers.rs 0.00% <0.00%> (ø)
examples/read_texts.rs 0.00% <0.00%> (ø)
src/events/attributes.rs 93.58% <0.00%> (-0.62%) ⬇️
src/lib.rs 12.26% <0.00%> (-0.07%) ⬇️
src/writer.rs 49.01% <0.00%> (ø)
src/reader.rs 60.07% <24.63%> (-3.39%) ⬇️
src/de/mod.rs 77.29% <75.00%> (-0.39%) ⬇️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0febc2b...ae458cb. Read the comment docs.

from_utf8(a.key.as_ref()).unwrap(),
from_utf8(&*a.unescaped_value().unwrap()).unwrap()
decoder.decode(a.key.as_ref()).unwrap(),
decoder.decode(&*a.unescaped_value().unwrap()).unwrap()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it would be another instance of the incorrect "unescape, then decode" behavior mentioned, yes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you right. Need to fix that too

@dralley
Copy link
Collaborator

dralley commented Jul 9, 2022

In the future, something like https://crates.io/crates/buffered-reader might be nice - the internal buffer can dynamically resize which would allow avoiding the copy between the BufReader buffer and an external one.

Unfortunately, at the moment, it is LGPL licensed which makes it a bit challenging to depend on.

Mingun and others added 7 commits July 9, 2022 18:13
Also, Text events was incorrectly unescaped before decoding instead of
correct decode, then unescape
- read_event             => read_event_into
- read_to_end            => read_to_end_into
- read_text              => read_text_into
- read_event_unbuffered  => read_event
- read_to_end_unbuffered => read_to_end
- read_event_buffered    => read_event_impl
…sRef<[u8]>`

AsRef is too unsafe because you accidentally could pass wrong parameters

Also fixes using incorrect encoding if `read_to_end` family of methods or `read_text`
method would not find a corresponding end tag and the reader has a non-UTF-8 encoding
Co-authored-by: Daniel Alley <dalley@redhat.com>
@Mingun Mingun merged commit 2eecf00 into tafia:master Jul 9, 2022
@Mingun Mingun deleted the soundness branch July 9, 2022 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Issues about improvements or bugs in documentation encoding Issues related to support of various encodings of the XML documents enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants