-
Notifications
You must be signed in to change notification settings - Fork 266
Add Vec-less reading of events and borrowing deserialization #208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cf038ac to
0591583
Compare
0591583 to
3c637bf
Compare
|
Managed to make a prototype of a deserializer that can borrow strings from the input. I had to make a new public trait, and make it a requirement for the input, so this is a major-version-breaking kinda change, but I think most users won't notice it. Closes #195 once finished. Requires #201 to be closed first. Needs a lot more polish, I'm especially not liking the naming and the excessive duplication on the |
tafia
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much for this PR. This is awesome, I like it very much!!
I have made a few comments, nothing major except perhaps checking if encoding feature works fine for serde related part.
I have 2 higher level comments:
-
As per serde documentation (https://serde.rs/lifetimes.html) we are NOT supposed to use the
'delifetime in our deserializer. I've tried to update the bench-serde.rs file using &str instead of String but I got lot of errors. I am no serde expert so I may be doing something wrong. -
Speaking of benchmarks, we have a 14% regression on regular reader but a 35% amelioration on serde related one.
name old ns/iter new ns/iter diff ns/iter diff % speedup
bench_quick_xml 233,063 264,648 31,585 13.55% x 0.88
bench_serde_quick_xml 1,121,517 728,877 -392,640 -35.01% x 1.54
I believe part of the regression may come from the checks done twice in the read_bang(_element) functions.
Again thanks a lot for this work, it is massive. I am very sorry I couldn't review it sooner.
| visitor.visit_string(value) | ||
| let text = self.next_text()?; | ||
| let unescaped = text.unescaped()?; | ||
| let decoded = self.reader.decoder().decode(&unescaped)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal but I got it wrong the first time and we actually need to decode before unescape.
| visitor.visit_string(value) | ||
| let text = self.next_text()?; | ||
| let unescaped = text.unescaped()?; | ||
| let decoded = self.reader.decoder().decode(&unescaped)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it needs to be adapted to support the encoding feature.
| self.deserialize_string(visitor) | ||
| } | ||
|
|
||
| fn deserialize_str<V: de::Visitor<'de>>(self, visitor: V) -> Result<V::Value, DeError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments as deserialize_string
|
|
||
| /// Skips until end element is found. Unlike `next()` it will not allocate | ||
| /// when it cannot satisfy the lifetime. | ||
| fn read_to_end(&mut self, name: &[u8]) -> Result<(), DeError>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it need to be in the trait? Can't we implement it automatically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I keep hitting an NLL limitation (I mentioned it in the issue), I can only ever touch the generic buf at most once in every branch; in the Vec<u8> implementation I need to clear it multiple times, and I can't do that while it's generic. Once Rust fixes that, or I find a way to bypass the issue, I can probably improve this.
| } | ||
|
|
||
| impl<'i, R: BufRead + 'i> BorrowingReader<'i> for IoReader<R> { | ||
| fn next(&mut self) -> Result<Event<'static>, DeError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be Event<'i> as per the trait definition? Or shouldn't we implement BorrowingReader<'static> for IoReader<R>?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both work, since 'static can be used anywhere a different lifetime is required. I could change it if you want, but I like making it obvious why it is able to satisfy 'i (i.e., it's static so it doesn't care about it).
| }; | ||
|
|
||
| if skip_text { | ||
| return self.read_event_buffered(buf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
returns can be omitted
|
|
||
| // Note: Do not update position, so the error points to a sane place | ||
| // rather than at the EOF. | ||
| Err(Error::UnexpectedEof("Element".to_string())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we just return Ok(None) and expect the calling function to handle associated error. (UnexpectedEOF for instance).
| } | ||
| } | ||
| let len = buf.len(); | ||
| fn read_bang<'a, 'b>(&'a mut self, buf: &'b [u8]) -> Result<Event<'b>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're checking the starts twice between this read_bang and read_bang_element fn. We could probably find a way to do it only once (either merge the fn but you may need to repeat it for each trait, or have a proper read_comment, read_cdata and read_doctype).
Actually using |
|
Cows, interestingly, default to being deserialized as owned by default (even when the deserializer gives a borrowed string to the visitor). If the field is prefixed with I'm currently in the middle of another project so it'll take a |
|
No worries! I took far too long to review. |
|
That regression is actually really bugging me, I'd like to know at least where it's from. Benchmarks just confuse me, they're consistently saying that only text processing got worse... except I'm pretty sure nothing changed in text processing... o.O name old ns/iter new ns/iter diff ns/iter diff % speedup
- bench_quick_xml 232,610 272,305 39,695 17.07% x 0.85
- bench_quick_xml_escaped 278,140 308,536 30,396 10.93% x 0.90
+ bench_quick_xml_namespaced 414,775 396,945 -17,830 -4.30% x 1.04
+ bench_quick_xml_read_cdata_event 82 71 -11 -13.41% x 1.15
+ bench_quick_xml_read_comment_event 83 72 -11 -13.25% x 1.15
+ bench_quick_xml_read_start_event 111 96 -15 -13.51% x 1.16
- bench_quick_xml_read_text_event 54 67 13 24.07% x 0.81I'll need to come up with better benchmarks. |
|
I agree benchmarks are extremely unsatisfying. Still it repeats consistently.Maybe it is linked to o that read_bang comment |
|
Looks like my previous benchmark was mostly right, it's mostly text that is slower; interestingly, if whitespace is stripped (and thus some events elided) via Half of it comes from a reindexing in Ok(Some(&buf[start..]))I don't know how to fix that yet. Still looking for the other half. Benchmarks: name old ns/iter new ns/iter diff ns/iter diff % speedup
- bench_quick_xml 248,385 273,660 25,275 10.18% x 0.91
- bench_quick_xml_escaped 272,957 306,278 33,321 12.21% x 0.89
+ bench_quick_xml_escaped_trimmed 257,206 248,625 -8,581 -3.34% x 1.03
- bench_quick_xml_mostly_cdata 21,851 22,985 1,134 5.19% x 0.95
- bench_quick_xml_mostly_empty_tags 453,845 499,480 45,635 10.06% x 0.91
- bench_quick_xml_mostly_namespaced_tags 62,985 76,694 13,709 21.77% x 0.82
- bench_quick_xml_mostly_tags 204,582 239,886 35,304 17.26% x 0.85
- bench_quick_xml_mostly_tags_and_text 213,720 233,603 19,883 9.30% x 0.91
- bench_quick_xml_mostly_tags_and_text_trimmed 217,949 219,513 1,564 0.72% x 0.99
+ bench_quick_xml_mostly_tags_trimmed 163,403 144,813 -18,590 -11.38% x 1.13
- bench_quick_xml_namespaced 372,500 428,576 56,076 15.05% x 0.87
+ bench_quick_xml_namespaced_trimmed 384,055 344,745 -39,310 -10.24% x 1.11
+ bench_quick_xml_read_cdata_event_trimmed 84 80 -4 -4.76% x 1.05
+ bench_quick_xml_read_comment_event_trimmed 95 81 -14 -14.74% x 1.17
- bench_quick_xml_read_start_event 126 132 6 4.76% x 0.95
+ bench_quick_xml_read_start_event_trimmed 111 107 -4 -3.60% x 1.04
- bench_quick_xml_read_text_event 58 68 10 17.24% x 0.85
+ bench_quick_xml_trimmed 238,913 234,001 -4,912 -2.06% x 1.02 |
|
What prevented this from seeing movement? Being able to deserialize borrowed data blocks Rocket from using this library for its |
|
There's a 15%-20% degradation in performance for people that don't use this, if we were to merge this; I spent months trying to figure out something here, and every time I found something that would work, it was blocked by an NLL limitation (this, described here). It got really frustrating, and other projects got more interesting. Also, it was only blocking a private hobby project of mine, nothing as important as Rocket. If @tafia is okay to make things slower for non-borrowing users, I can probably pick this up again. Also, the lack of GATs makes the API kinda weird, since I have to use an output lifetime as a sort of input that the user has to somehow hint at to even be able to use it; so I just made all the traits and machinery private, but it would've been nice to have GATs, and was hoping they'd come out faster. This is not essential, I can just keep that stuff private. |
|
I've rebased and fixed a few issues. See #290. Feel free to take the work and continue here or continue there. |
|
Closing it as #290 got merged |
|
@andreivasiliu GATs are very very close to being ready now rust-lang/rust#44265 |
|
I know! I'm very excited about it. Although even once released, it'll probably take a while before I rewrite this to use them, to keep the minimum supported Rust version still accessible for users. |
Add a new
reader.read_event_unbuffered()method that does not require a user-given Vec buffer, and borrows from the input, implemented if the input is a&[u8].Still needs more polishing, and it needs the buf_position branch to be pushed first, since it is based on those changes. I had to move all of the input-reading methods into a trait, and make them return a reference to the text that was read. Because of that, there's now a new requirement of at most 1 input-reading method being called per
read_event(), so I had to rework whitespace skipping, and to move all of the bang element processing into yet anotherread_until-like function which doesn't return until it has all of the text.Next up is making a deserializer that can use this to remove the
DeserializedOwnedrestriction and allow user structs to borrow from the input when possible, allowing for truly zero-copy parsing and deserialization.The only user-facing change is the 1 new method, the rest is completely hidden. I'm not fond of the new method's name, so I'd appreciate any help with figuring out a better name for it.
Bikeshedding for the rest of the names would be appreciated too.