
Performance test against toml-rs #132

Closed
2 of 3 tasks
Tracked by #133
epage opened this issue Aug 26, 2021 · 14 comments

Comments

@epage
Member

epage commented Aug 26, 2021

The cargo team is concerned about negatively impacting cargo's performance by switching to toml_edit. Specifically, they are concerned about parsing large dependency trees.

It is unclear how much impact toml-rs has on the whole operation and how much would be solved by

  • Storing immutable Cargo.toml files in binary
  • Moving strings in cargo to smol_str or kstring

But we should at least be able to say how we compare with the same API (#129)

@epage
Member Author

epage commented Sep 12, 2021

With the base API, we take about twice as long; with the easy API, we take even longer.

Ideas

  • Build up a table before inserting it, rather than looking up the table for each key-value pair
  • Dig through parser, looking for unneeded allocations
  • kstring or something like it
  • change our hash function
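
The first idea can be sketched in plain std Rust. This is a hypothetical model (a document as nested maps, nothing from toml_edit's actual types): instead of resolving the target table once per key-value pair, group the pairs by table and resolve each table once. It assumes, as in TOML source order, that a table's key-value pairs arrive contiguously.

```rust
use std::collections::BTreeMap;

// Hypothetical model: a document maps a table name (e.g. "dependencies")
// to that table's key-value pairs.
type Doc = BTreeMap<String, BTreeMap<String, String>>;

// Naive: walk to the table once per key-value pair.
fn insert_per_key(pairs: &[(&str, &str, &str)]) -> Doc {
    let mut doc = Doc::new();
    for (table, key, value) in pairs {
        doc.entry(table.to_string()) // table lookup on every pair
            .or_default()
            .insert(key.to_string(), value.to_string());
    }
    doc
}

// Batched: build each table up locally, then insert the finished table once.
fn insert_batched(pairs: &[(&str, &str, &str)]) -> Doc {
    let mut doc = Doc::new();
    let mut i = 0;
    while i < pairs.len() {
        let table = pairs[i].0;
        let mut kv = BTreeMap::new();
        // Consume the contiguous run of pairs belonging to this table.
        while i < pairs.len() && pairs[i].0 == table {
            kv.insert(pairs[i].1.to_string(), pairs[i].2.to_string());
            i += 1;
        }
        doc.insert(table.to_string(), kv); // one table lookup per table
    }
    doc
}

fn main() {
    let pairs = [
        ("dependencies", "serde", "1.0"),
        ("dependencies", "combine", "4.6"),
        ("package", "name", "demo"),
    ];
    assert_eq!(insert_per_key(&pairs), insert_batched(&pairs));
}
```

Both produce the same document; the batched version just does one map traversal per table instead of one per pair.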

@epage
Member Author

epage commented Sep 22, 2021

My plan is to write up a bench for cargo to see how it performs in a more real-world use case and work with others on setting an acceptable performance target. I'll then look at holistic performance improvements, like using a String type in cargo and toml_edit with small-string optimization, or caching parsed results for immutable Cargo.tomls (i.e. those in the registry) in a more optimizable format, like JSON or CBOR.

rust-lang/cargo#9935 is relevant for the benchmark

@epage
Member Author

epage commented Sep 24, 2021

I've done small-string optimization for formatting and keys, along with various other improvements, and brought parse time on a medium Cargo.toml from 250us to 200us.

Potential next steps

  • Switch parsers (I saw a lot of overhead from combine)
  • Do small string optimization for values (hesitant to do this)

I guess I'll stick with the above plan of moving from micro-benchmarks to real-world resolver benchmarks.

@epage
Member Author

epage commented Sep 29, 2021

We've now dropped from the original 250us to 180us. Still a good distance from the goal of 120us.

combine's Errors::merge is one of our biggest time sinks and seems like an "easy" one to cut. Easy because, technically, a successful parse produces no user-facing errors, so we shouldn't need to call it at all. Getting combine to recognize that, however, is another matter.

@epage
Member Author

epage commented Sep 29, 2021

Current breakdown for running my example, without printing

  • 30% of time is spent in process startup (included since it skews the other percentages)
  • 10% of time is spent copying strings (unaligned copies due to small-string optimization)
  • 1.34% of time is spent in std::str::from_utf8 (I was paranoid and didn't use from_utf8_unchecked in some places when switching to a byte parser; this looks small enough not to be worth analyzing)
  • 8% of time is malloc
  • 1.5% of time is realloc
  • 7.8% of time is spent in a combine call Errors::merge (which includes calling into malloc and realloc)

@epage
Member Author

epage commented Sep 29, 2021

Errors::merge seems like the lowest-hanging fruit because it's an error-handling cost that hits us even when no user-visible errors occur. From what I understand, this mostly comes from choice trying multiple parsers and returning their failures. Once a parser "commits" (the default behavior), it no longer does merging. If we use attempt, the parser doesn't commit and errors get merged.

These look like the only options for dropping the Errors::merge overhead:

  • Replace choice with dispatch
    • This can really muck up the grammar and break abstractions, because the dispatch call needs knowledge of what the underlying parsers accept
    • This is a game of whack-a-mole
  • Find ways to remove attempt
    • This is hard
    • This is a game of whack-a-mole
  • Create a custom stream (rather than combine::easy) that still gives good errors but doesn't cost so much
    • Biggest risk but biggest potential payoff, since it's a fix-once change that applies everywhere
  • Switch to nom since it doesn't have these weird cases
    • This would also make profiling easier because costs would be associated with functions, rather than combinators returned by those functions
    • A decent amount of work (1-2 days)
    • A bit uglier because nom relies on free functions more (no impl Parser for tuples, far fewer methods on Parser)
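
To make the choice/attempt trade-off concrete, here is a toy model in plain std Rust (not combine's actual API): a backtracking choice merges every failed branch's errors even when a later branch succeeds, while a committed dispatch peeks at the input to pick exactly one branch and never merges.

```rust
// Toy model of the choice/attempt distinction; not combine's real types.
#[derive(Debug, PartialEq)]
enum Value {
    Bool(bool),
    Int(i64),
}

type PResult = Result<Value, Vec<String>>;

fn parse_bool(s: &str) -> PResult {
    match s {
        "true" => Ok(Value::Bool(true)),
        "false" => Ok(Value::Bool(false)),
        _ => Err(vec!["expected `true` or `false`".into()]),
    }
}

fn parse_int(s: &str) -> PResult {
    s.parse()
        .map(Value::Int)
        .map_err(|_| vec!["expected integer".into()])
}

// Backtracking choice (like `attempt` + `choice`): every failed branch's
// errors are accumulated, paying the merge cost even on overall success.
fn backtracking(s: &str) -> PResult {
    let mut merged = Vec::new();
    for p in [parse_bool as fn(&str) -> PResult, parse_int] {
        match p(s) {
            Ok(v) => return Ok(v),
            Err(mut e) => merged.append(&mut e), // the Errors::merge-style cost
        }
    }
    Err(merged)
}

// Committed dispatch: peek one byte to pick a single branch up front,
// so the success path never touches error machinery.
fn committed(s: &str) -> PResult {
    match s.bytes().next() {
        Some(b't') | Some(b'f') => parse_bool(s),
        _ => parse_int(s),
    }
}

fn main() {
    // Same answer either way, but `backtracking` allocated an error vector
    // for the failed bool branch before the int branch succeeded.
    assert_eq!(backtracking("42"), committed("42"));
    assert_eq!(committed("true"), Ok(Value::Bool(true)));
}
```

Note how `backtracking("42")` builds up error strings for the failed bool branch even though the input is valid; that is exactly the hidden cost being profiled here.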

@ordian
Member

ordian commented Sep 29, 2021

Another option I considered is rewriting the parser entirely using recursive descent, as is done in toml-rs. That's also a decent amount of work, but we'd gain more control this way over both error messages and overhead.
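
As a rough illustration of what hand-rolled recursive descent buys, here is a sketch for a tiny fragment of the grammar (a hypothetical simplification: bare keys and integer values only, nothing from toml-rs itself). Each function corresponds to an ABNF-style rule, and error messages and positions are entirely under our control, with no combinator machinery in between.

```rust
// Hand-rolled recursive-descent sketch; each method is named after the
// grammar rule it implements (keyval = key keyval-sep val).
struct Parser<'a> {
    input: &'a [u8],
    pos: usize,
}

impl<'a> Parser<'a> {
    fn new(input: &'a str) -> Self {
        Parser { input: input.as_bytes(), pos: 0 }
    }

    // Full control over error formatting: byte offset plus message.
    fn error(&self, msg: &str) -> String {
        format!("at byte {}: {}", self.pos, msg)
    }

    fn ws(&mut self) {
        while self.input.get(self.pos) == Some(&b' ') {
            self.pos += 1;
        }
    }

    // key = unquoted-key (simplified to ASCII alphanumerics, `-`, `_`)
    fn key(&mut self) -> Result<String, String> {
        let start = self.pos;
        while matches!(self.input.get(self.pos),
                       Some(c) if c.is_ascii_alphanumeric() || *c == b'-' || *c == b'_')
        {
            self.pos += 1;
        }
        if self.pos == start {
            return Err(self.error("expected key"));
        }
        Ok(String::from_utf8_lossy(&self.input[start..self.pos]).into_owned())
    }

    // keyval = key keyval-sep val
    fn keyval(&mut self) -> Result<(String, i64), String> {
        let k = self.key()?;
        self.ws();
        if self.input.get(self.pos) != Some(&b'=') {
            return Err(self.error("expected `=`"));
        }
        self.pos += 1;
        self.ws();
        let v = self.integer()?;
        Ok((k, v))
    }

    // val = integer (simplified)
    fn integer(&mut self) -> Result<i64, String> {
        let start = self.pos;
        while matches!(self.input.get(self.pos), Some(c) if c.is_ascii_digit()) {
            self.pos += 1;
        }
        std::str::from_utf8(&self.input[start..self.pos])
            .ok()
            .and_then(|s| s.parse().ok())
            .ok_or_else(|| self.error("expected integer"))
    }
}

fn main() {
    let mut p = Parser::new("answer = 42");
    assert_eq!(p.keyval(), Ok(("answer".to_string(), 42)));
}
```

Because each rule is an ordinary function, profiling costs also attribute cleanly to grammar rules, addressing the profiling complaint raised about combinators above.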

@epage
Member Author

epage commented Sep 29, 2021

Yes, it's good to call that one out. I had originally dismissed it because I assumed it would be a significant amount of work to rewrite from scratch, or even to figure out how to reuse parts of toml-rs while preserving formatting.

@epage
Member Author

epage commented Sep 29, 2021

Also, one thing I've appreciated about toml_edit is that it's easy to map the code to the ABNF; toml-rs, at least, doesn't maintain that mapping. I'm unsure how difficult keeping it would be if we started completely from scratch.

@killercup

Wow, this and your blog post were a fun read, @epage! I'm very happy to see someone as dedicated and talented as you taking on this issue :)

One maybe stupid question: Why didn't you benchmark parsing big Cargo.lock files? Or is the plan to continue to use toml-rs for those?

@epage
Member Author

epage commented Oct 6, 2021

Good question and glad you asked so I can get these things recorded.

When talking with Alex on Zulip, his main performance concern was the resolver; specifically, they've seen the parsing of Cargo.tomls show up when profiling it. I asked specifically about Cargo.lock and he didn't seem concerned.

The reason writing Cargo.lock doesn't matter is that they've hand-implemented it.

@epage
Member Author

epage commented Oct 6, 2021

On a related note, others have asked me why I don't do small-string optimizations for Values.

  • This is a trade-off between keeping the grammar simple and close to the ABNF versus performance. We have to build up quoted strings gradually to handle escaping and newline normalization. We could create a fast path for strings with no escapes and no newlines (or whose newlines don't need normalization), which would then let us use small-string optimization.
  • Our primary performance target is serde. Keys are assumed to map to struct fields, so nothing gets allocated for small keys. Values, however, will frequently map to a String, and serde lets us forward the String we allocated directly to the struct. If we introduced small-string optimization, at best we'd have one unaligned memcpy (for the small string) plus one allocation, and at worst two allocations, compared to the single allocation if we use a String for values.

A non-reason to use String for Value is end-user mutability. This is an API for editing, so the first thought is that we might expose Formatted::value_mut so people can edit values in place. We can't provide that API safely, because it would let the value get out of sync with its Repr.
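
The allocation trade-off in the second bullet can be sketched with a toy small-string type (hypothetical; real crates like kstring or smol_str differ in detail): handing an inline value to serde as a String forces an allocation at that point, while a value stored as a plain String is simply moved.

```rust
// Toy small-string type to illustrate the allocation trade-off.
#[derive(Debug)]
enum SmallStr {
    Inline { len: u8, buf: [u8; 22] }, // stored inline, no heap allocation
    Heap(String),                      // too long, spilled to the heap
}

impl SmallStr {
    fn from(s: &str) -> SmallStr {
        if s.len() <= 22 {
            let mut buf = [0u8; 22];
            buf[..s.len()].copy_from_slice(s.as_bytes()); // the unaligned memcpy
            SmallStr::Inline { len: s.len() as u8, buf }
        } else {
            SmallStr::Heap(s.to_string())
        }
    }

    // Forwarding the value to a deserialized struct field of type String:
    // an Inline value must allocate *now*, while a Heap value (or a plain
    // String stored from the start) is moved for free.
    fn into_string(self) -> String {
        match self {
            SmallStr::Inline { len, buf } => {
                // Extra allocation that a plain String would have avoided.
                String::from_utf8_lossy(&buf[..len as usize]).into_owned()
            }
            SmallStr::Heap(s) => s, // no copy: ownership is forwarded
        }
    }
}

fn main() {
    assert_eq!(SmallStr::from("serde").into_string(), "serde");
    assert!(matches!(SmallStr::from("serde"), SmallStr::Inline { .. }));
}
```

Since serde deserialization typically wants an owned String for value fields, the inline representation only defers the allocation rather than eliminating it, which is the point made above.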

@epage
Member Author

epage commented Oct 6, 2021

Here is the start of Cargo's resolver benchmark: rust-lang/cargo#9955

I've not dug into it yet, but I've been told it's not expected to help in our case. As it gets merged, we'll need to profile it to see whether it covers a relevant case, and possibly add new cases to help.

@epage epage added the cargo label Nov 16, 2021
@epage
Member Author

epage commented Jan 21, 2022

Benchmarks came in around 5% slower, which is still acceptable for cargo.

@epage epage closed this as completed Jan 21, 2022