Skip to content

Conversation

@andreivasiliu
Copy link
Contributor

@andreivasiliu andreivasiliu commented Apr 7, 2020

Add a new reader.read_event_unbuffered() method that does not require a user-given Vec buffer, and borrows from the input, implemented if the input is a &[u8].

Still needs more polishing, and it needs the buf_position branch to be pushed first, since it is based on those changes. I had to move all of the input-reading methods into a trait, and make them return a reference to the text that was read. Because of that, there's now a new requirement of at most 1 input-reading method being called per read_event(), so I had to rework whitespace skipping, and to move all of the bang element processing into yet another read_until-like function which doesn't return until it has all of the text.

Next up is making a deserializer that can use this to remove the DeserializedOwned restriction and allow user structs to borrow from the input when possible, allowing for truly zero-copy parsing and deserialization.

The only user-facing change is the 1 new method, the rest is completely hidden. I'm not fond of the new method's name, so I'd appreciate any help with figuring out a better name for it.

Bikeshedding for the rest of the names would be appreciated too.

@andreivasiliu andreivasiliu marked this pull request as draft April 10, 2020 09:00
@andreivasiliu andreivasiliu changed the title Add Vec-less reading of events Add Vec-less reading of events and borrowing deserialization Apr 18, 2020
@andreivasiliu
Copy link
Contributor Author

Managed to make a prototype of a deserializer that can borrow strings from the input. I had to make a new public trait, and make it a requirement for the input, so this is a major-version-breaking kinda change, but I think most users won't notice it.

Closes #195 once finished. Requires #201 to be closed first.

Needs a lot more polish, I'm especially not liking the naming and the excessive duplication on the read_until-like methods.

Copy link
Owner

@tafia tafia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for this PR. This is awesome, I like it very much!!

I have made a few comments, nothing major except perhaps checking if encoding feature works fine for serde related part.

I have 2 higher level comments:

  1. As per serde documentation (https://serde.rs/lifetimes.html) we are NOT supposed to use the 'de lifetime in our deserializer. I've tried to update the bench-serde.rs file using &str instead of String but I got lot of errors. I am no serde expert so I may be doing something wrong.

  2. Speaking of benchmarks, we have a 14% regression on regular reader but a 35% amelioration on serde related one.

name                   old ns/iter  new ns/iter  diff ns/iter   diff %  speedup 
 bench_quick_xml        233,063      264,648            31,585   13.55%   x 0.88 
 bench_serde_quick_xml  1,121,517    728,877          -392,640  -35.01%   x 1.54 

I believe part of the regression may come from the checks done twice in the read_bang(_element) functions.

Again thanks a lot for this work, it is massive. I am very sorry I couldn't review it sooner.

visitor.visit_string(value)
let text = self.next_text()?;
let unescaped = text.unescaped()?;
let decoded = self.reader.decoder().decode(&unescaped)?;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal but I got it wrong the first time and we actually need to decode before unescape.

visitor.visit_string(value)
let text = self.next_text()?;
let unescaped = text.unescaped()?;
let decoded = self.reader.decoder().decode(&unescaped)?;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it needs to be adapted to support the encoding feature.

self.deserialize_string(visitor)
}

fn deserialize_str<V: de::Visitor<'de>>(self, visitor: V) -> Result<V::Value, DeError> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments as deserialize_string


/// Skips until end element is found. Unlike `next()` it will not allocate
/// when it cannot satisfy the lifetime.
fn read_to_end(&mut self, name: &[u8]) -> Result<(), DeError>;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need to be in the trait? Can't we implement it automatically?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep hitting an NLL limitation (I mentioned it in the issue), I can only ever touch the generic buf at most once in every branch; in the Vec<u8> implementation I need to clear it multiple times, and I can't do that while it's generic. Once Rust fixes that, or I find a way to bypass the issue, I can probably improve this.

}

impl<'i, R: BufRead + 'i> BorrowingReader<'i> for IoReader<R> {
fn next(&mut self) -> Result<Event<'static>, DeError> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be Event<'i> as per the trait definition? Or shouldn't we implement BorrowingReader<'static> for IoReader<R>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both work, since 'static can be used anywhere a different lifetime is required. I could change it if you want, but I like making it obvious why it is able to satisfy 'i (i.e., it's static so it doesn't care about it).

};

if skip_text {
return self.read_event_buffered(buf);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returns can be omitted


// Note: Do not update position, so the error points to a sane place
// rather than at the EOF.
Err(Error::UnexpectedEof("Element".to_string()))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we just return Ok(None) and expect the calling function to handle associated error. (UnexpectedEOF for instance).

}
}
let len = buf.len();
fn read_bang<'a, 'b>(&'a mut self, buf: &'b [u8]) -> Result<Event<'b>> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're checking the starts twice between this read_bang and read_bang_element fn. We could probably find a way to do it only once (either merge the fn but you may need to repeat it for each trait, or have a proper read_comment, read_cdata and read_doctype).

@tafia
Copy link
Owner

tafia commented May 17, 2020

As per serde documentation (https://serde.rs/lifetimes.html) we are NOT supposed to use the 'de lifetime in our deserializer. I've tried to update the bench-serde.rs file using &str instead of String but I got lot of errors. I am no serde expert so I may be doing something wrong.

Actually using Cow<'a, str> worked but it was slower than using regular String ...

@andreivasiliu
Copy link
Contributor Author

andreivasiliu commented May 17, 2020

Cows, interestingly, default to being deserialized as owned by default (even when the deserializer gives a borrowed string to the visitor). If the field is prefixed with #[serde(borrow)] then it'll borrow when it can, or take owned when decoding/unescaping says it can't.

I'm currently in the middle of another project so it'll take a while week or two until I get back to this, apologies!

@tafia
Copy link
Owner

tafia commented May 17, 2020

No worries! I took far too long to review.

@andreivasiliu
Copy link
Contributor Author

That regression is actually really bugging me, I'd like to know at least where it's from.

Benchmarks just confuse me, they're consistently saying that only text processing got worse... except I'm pretty sure nothing changed in text processing... o.O

  name                                old ns/iter  new ns/iter  diff ns/iter   diff %  speedup
- bench_quick_xml                     232,610      272,305            39,695   17.07%   x 0.85
- bench_quick_xml_escaped             278,140      308,536            30,396   10.93%   x 0.90
+ bench_quick_xml_namespaced          414,775      396,945           -17,830   -4.30%   x 1.04
+ bench_quick_xml_read_cdata_event    82           71                    -11  -13.41%   x 1.15
+ bench_quick_xml_read_comment_event  83           72                    -11  -13.25%   x 1.15
+ bench_quick_xml_read_start_event    111          96                    -15  -13.51%   x 1.16
- bench_quick_xml_read_text_event     54           67                     13   24.07%   x 0.81

I'll need to come up with better benchmarks.

@tafia
Copy link
Owner

tafia commented Jun 4, 2020

I agree benchmarks are extremely unsatisfying. Still it repeats consistently.Maybe it is linked to o that read_bang comment

@andreivasiliu
Copy link
Contributor Author

Looks like my previous benchmark was mostly right, it's mostly text that is slower; interestingly, if whitespace is stripped (and thus some events elided) via trim_text, it is sometimes even faster than before.

Half of it comes from a reindexing in read_bytes_until, which causes a bounds check now since it doesn't know that the buffer's length can only ever grow:

Ok(Some(&buf[start..]))

I don't know how to fix that yet. Still looking for the other half.

Benchmarks:

  name                                          old ns/iter  new ns/iter  diff ns/iter   diff %  speedup
- bench_quick_xml                               248,385      273,660            25,275   10.18%   x 0.91
- bench_quick_xml_escaped                       272,957      306,278            33,321   12.21%   x 0.89
+ bench_quick_xml_escaped_trimmed               257,206      248,625            -8,581   -3.34%   x 1.03
- bench_quick_xml_mostly_cdata                  21,851       22,985              1,134    5.19%   x 0.95
- bench_quick_xml_mostly_empty_tags             453,845      499,480            45,635   10.06%   x 0.91
- bench_quick_xml_mostly_namespaced_tags        62,985       76,694             13,709   21.77%   x 0.82
- bench_quick_xml_mostly_tags                   204,582      239,886            35,304   17.26%   x 0.85
- bench_quick_xml_mostly_tags_and_text          213,720      233,603            19,883    9.30%   x 0.91
- bench_quick_xml_mostly_tags_and_text_trimmed  217,949      219,513             1,564    0.72%   x 0.99
+ bench_quick_xml_mostly_tags_trimmed           163,403      144,813           -18,590  -11.38%   x 1.13
- bench_quick_xml_namespaced                    372,500      428,576            56,076   15.05%   x 0.87
+ bench_quick_xml_namespaced_trimmed            384,055      344,745           -39,310  -10.24%   x 1.11
+ bench_quick_xml_read_cdata_event_trimmed      84           80                     -4   -4.76%   x 1.05
+ bench_quick_xml_read_comment_event_trimmed    95           81                    -14  -14.74%   x 1.17
- bench_quick_xml_read_start_event              126          132                     6    4.76%   x 0.95
+ bench_quick_xml_read_start_event_trimmed      111          107                    -4   -3.60%   x 1.04
- bench_quick_xml_read_text_event               58           68                     10   17.24%   x 0.85
+ bench_quick_xml_trimmed                       238,913      234,001            -4,912   -2.06%   x 1.02

@SergioBenitez
Copy link
Contributor

What prevented this from seeing movement? Being able to deserialize borrowed data blocks Rocket from using this library for its xml support. See rwf2/Rocket#1606.

@andreivasiliu
Copy link
Contributor Author

There's a 15%-20% degradation in performance for people that don't use this, if we were to merge this; I spent months trying to figure out something here, and every time I found something that would work, it was blocked by an NLL limitation (this, described here). It got really frustrating, and other projects got more interesting.

Also, it was only blocking a private hobby project of mine, nothing as important as Rocket.

If @tafia is okay to make things slower for non-borrowing users, I can probably pick this up again.

Also, the lack of GATs makes the API kinda weird, since I have to use an output lifetime as a sort of input that the user has to somehow hint at to even be able to use it; so I just made all the traits and machinery private, but it would've been nice to have GATs, and was hoping they'd come out faster. This is not essential, I can just keep that stuff private.

@SergioBenitez
Copy link
Contributor

I've rebased and fixed a few issues. See #290. Feel free to take the work and continue here or continue there.

@tafia
Copy link
Owner

tafia commented Aug 10, 2021

Closing it as #290 got merged

@tafia tafia closed this Aug 10, 2021
@dralley
Copy link
Collaborator

dralley commented Sep 2, 2021

@andreivasiliu GATs are very very close to being ready now rust-lang/rust#44265

@andreivasiliu
Copy link
Contributor Author

I know! I'm very excited about it. Although even once released, it'll probably take a while before I rewrite this to use them, to keep the minimum supported Rust version still accessible for users.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants