
Performance test against toml-rs #132

Closed
2 of 3 tasks
Tracked by #133
epage opened this issue Aug 26, 2021 · 14 comments

Comments

@epage
Member

epage commented Aug 26, 2021

The cargo team is concerned about negatively impacting cargo's performance by switching to toml_edit. Specifically, they are concerned about parsing large dependency trees.

It is unclear how much impact toml-rs has on the whole operation and how much would be solved by

  • Storing immutable Cargo.toml files in binary
  • Moving strings in cargo to smol_str or kstring

But we should at least be able to say how we compare with the same API (#129)

@epage
Member Author

epage commented Sep 12, 2021

With the base API, we take about twice as long; with the easy API, we take even longer.

Ideas

  • Build up a table before inserting it, rather than looking up the table for each key-value pair
  • Dig through parser, looking for unneeded allocations
  • kstring or something like it
  • change our hash function
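
The first idea can be sketched in plain std Rust. This is a hypothetical model (a document as nested maps, nothing from toml_edit's actual types): instead of resolving the target table once per key-value pair, group the pairs by table and resolve each table once. It assumes, as in TOML source order, that a table's key-value pairs arrive contiguously.

```rust
use std::collections::BTreeMap;

// Hypothetical model: a document maps a table name (e.g. "dependencies")
// to that table's key-value pairs.
type Doc = BTreeMap<String, BTreeMap<String, String>>;

// Naive: walk to the table once per key-value pair.
fn insert_per_key(pairs: &[(&str, &str, &str)]) -> Doc {
    let mut doc = Doc::new();
    for (table, key, value) in pairs {
        doc.entry(table.to_string()) // table lookup on every pair
            .or_default()
            .insert(key.to_string(), value.to_string());
    }
    doc
}

// Batched: build each table up locally, then insert the finished table once.
fn insert_batched(pairs: &[(&str, &str, &str)]) -> Doc {
    let mut doc = Doc::new();
    let mut i = 0;
    while i < pairs.len() {
        let table = pairs[i].0;
        let mut kv = BTreeMap::new();
        // Consume the contiguous run of pairs belonging to this table.
        while i < pairs.len() && pairs[i].0 == table {
            kv.insert(pairs[i].1.to_string(), pairs[i].2.to_string());
            i += 1;
        }
        doc.insert(table.to_string(), kv); // one table lookup per table
    }
    doc
}

fn main() {
    let pairs = [
        ("dependencies", "serde", "1.0"),
        ("dependencies", "combine", "4.6"),
        ("package", "name", "demo"),
    ];
    assert_eq!(insert_per_key(&pairs), insert_batched(&pairs));
}
```

Both produce the same document; the batched version just does one map traversal per table instead of one per pair.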

@epage
Member Author

epage commented Sep 22, 2021

My plan is to write up a bench for cargo to see how it performs in a more real-world use case and work with others on setting an acceptable performance target. I'll then look at holistic performance improvements, like using a String type in cargo and toml_edit with small-string optimization, or caching parsed results for immutable Cargo.tomls (i.e. those in the registry) in a more optimizable format, like JSON or CBOR.

rust-lang/cargo#9935 is relevant for the benchmark

@epage
Member Author

epage commented Sep 24, 2021

I've done small-string optimization for formatting and keys, along with various other improvements, and brought parse time on a medium Cargo.toml from 250us to 200us.

Potential next steps

  • Switch parsers (I saw a lot of overhead from combine)
  • Do small string optimization for values (hesitant to do this)

I guess I'll stick with the above plan of moving from micro-benchmarks to real-world resolver benchmarks.

@epage
Member Author

epage commented Sep 29, 2021

We've now dropped from the original 250us to 180us. Still a good distance from the goal of 120us.

combine's Errors::merge is one of our biggest time sinks and seems like an "easy" one to cut. Easy because, technically, a successful parse produces no user-facing errors, so we shouldn't need to call it at all. Getting combine to recognize that, however, is another matter.

@epage
Member Author

epage commented Sep 29, 2021

Current breakdown for running my example, without printing

  • 30% of time is spent in process startup (included since it skews the other percentages)
  • 10% of time is spent copying strings (unaligned copies due to small-string optimization)
  • 1.34% of time is spent in std::str::from_utf8 (I was paranoid and didn't use from_utf8_unchecked in some places when switching to a byte parser; this looks small enough not to be worth analyzing)
  • 8% of time is malloc
  • 1.5% of time is realloc
  • 7.8% of time is spent in a combine call Errors::merge (which includes calling into malloc and realloc)

@epage
Member Author

epage commented Sep 29, 2021

Errors::merge seems like the lowest-hanging fruit because it's an error-handling cost that hits us even when no user-visible errors occur. From what I understand, this mostly comes from choice trying multiple parsers and returning their failures. Once a parser "commits" (the default behavior), it no longer does merging. If we use attempt, the parser doesn't commit and errors get merged.

These look like the only options for dropping the Errors::merge overhead:

  • Replace choice with dispatch
    • This can really muck up the grammar and break abstractions, because the dispatch call needs knowledge of what the underlying parsers accept
    • This is a game of whack-a-mole
  • Find ways to remove attempt
    • This is hard
    • This is a game of whack-a-mole
  • Create a custom stream (rather than combine::easy) that still gives good errors but doesn't cost so much
    • Biggest risk but biggest potential payoff, since it's a fix-once change that applies everywhere
  • Switch to nom since it doesn't have these weird cases
    • This would also make profiling easier because costs would be associated with functions, rather than combinators returned by those functions
    • A decent amount of work (1-2 days)
    • A bit uglier because nom relies on free functions more (no impl Parser for tuples, far fewer methods on Parser)
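
To make the choice/attempt trade-off concrete, here is a toy model in plain std Rust (not combine's actual API): a backtracking choice merges every failed branch's errors even when a later branch succeeds, while a committed dispatch peeks at the input to pick exactly one branch and never merges.

```rust
// Toy model of the choice/attempt distinction; not combine's real types.
#[derive(Debug, PartialEq)]
enum Value {
    Bool(bool),
    Int(i64),
}

type PResult = Result<Value, Vec<String>>;

fn parse_bool(s: &str) -> PResult {
    match s {
        "true" => Ok(Value::Bool(true)),
        "false" => Ok(Value::Bool(false)),
        _ => Err(vec!["expected `true` or `false`".into()]),
    }
}

fn parse_int(s: &str) -> PResult {
    s.parse()
        .map(Value::Int)
        .map_err(|_| vec!["expected integer".into()])
}

// Backtracking choice (like `attempt` + `choice`): every failed branch's
// errors are accumulated, paying the merge cost even on overall success.
fn backtracking(s: &str) -> PResult {
    let mut merged = Vec::new();
    for p in [parse_bool as fn(&str) -> PResult, parse_int] {
        match p(s) {
            Ok(v) => return Ok(v),
            Err(mut e) => merged.append(&mut e), // the Errors::merge-style cost
        }
    }
    Err(merged)
}

// Committed dispatch: peek one byte to pick a single branch up front,
// so the success path never touches error machinery.
fn committed(s: &str) -> PResult {
    match s.bytes().next() {
        Some(b't') | Some(b'f') => parse_bool(s),
        _ => parse_int(s),
    }
}

fn main() {
    // Same answer either way, but `backtracking` allocated an error vector
    // for the failed bool branch before the int branch succeeded.
    assert_eq!(backtracking("42"), committed("42"));
    assert_eq!(committed("true"), Ok(Value::Bool(true)));
}
```

Note how `backtracking("42")` builds up error strings for the failed bool branch even though the input is valid; that is exactly the hidden cost being profiled here.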

@ordian
Member

ordian commented Sep 29, 2021

Another option I considered is rewriting the parser entirely using recursive descent, as is done in toml-rs. That's also a decent amount of work, but we'd gain more control this way over both error messages and overhead.
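
As a rough illustration of what hand-rolled recursive descent buys, here is a sketch for a tiny fragment of the grammar (a hypothetical simplification: bare keys and integer values only, nothing from toml-rs itself). Each function corresponds to an ABNF-style rule, and error messages and positions are entirely under our control, with no combinator machinery in between.

```rust
// Hand-rolled recursive-descent sketch; each method is named after the
// grammar rule it implements (keyval = key keyval-sep val).
struct Parser<'a> {
    input: &'a [u8],
    pos: usize,
}

impl<'a> Parser<'a> {
    fn new(input: &'a str) -> Self {
        Parser { input: input.as_bytes(), pos: 0 }
    }

    // Full control over error formatting: byte offset plus message.
    fn error(&self, msg: &str) -> String {
        format!("at byte {}: {}", self.pos, msg)
    }

    fn ws(&mut self) {
        while self.input.get(self.pos) == Some(&b' ') {
            self.pos += 1;
        }
    }

    // key = unquoted-key (simplified to ASCII alphanumerics, `-`, `_`)
    fn key(&mut self) -> Result<String, String> {
        let start = self.pos;
        while matches!(self.input.get(self.pos),
                       Some(c) if c.is_ascii_alphanumeric() || *c == b'-' || *c == b'_')
        {
            self.pos += 1;
        }
        if self.pos == start {
            return Err(self.error("expected key"));
        }
        Ok(String::from_utf8_lossy(&self.input[start..self.pos]).into_owned())
    }

    // keyval = key keyval-sep val
    fn keyval(&mut self) -> Result<(String, i64), String> {
        let k = self.key()?;
        self.ws();
        if self.input.get(self.pos) != Some(&b'=') {
            return Err(self.error("expected `=`"));
        }
        self.pos += 1;
        self.ws();
        let v = self.integer()?;
        Ok((k, v))
    }

    // val = integer (simplified)
    fn integer(&mut self) -> Result<i64, String> {
        let start = self.pos;
        while matches!(self.input.get(self.pos), Some(c) if c.is_ascii_digit()) {
            self.pos += 1;
        }
        std::str::from_utf8(&self.input[start..self.pos])
            .ok()
            .and_then(|s| s.parse().ok())
            .ok_or_else(|| self.error("expected integer"))
    }
}

fn main() {
    let mut p = Parser::new("answer = 42");
    assert_eq!(p.keyval(), Ok(("answer".to_string(), 42)));
}
```

Because each rule is an ordinary function, profiling costs also attribute cleanly to grammar rules, addressing the profiling complaint raised about combinators above.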

@epage
Member Author

epage commented Sep 29, 2021

Yes, it's good to call that one out. I had originally dismissed it because I assumed it would be a significant amount of work to rewrite from scratch, or even to figure out how to reuse parts of toml-rs while preserving formatting.

@epage
Member Author

epage commented Sep 29, 2021

Also, one thing I've appreciated about toml_edit is that it's easy to map the code to the ABNF; toml-rs, at least, doesn't maintain that mapping. I'm unsure how difficult keeping it would be if we started completely from scratch.

@killercup

Wow, this and your blog post were a fun read, @epage! I'm very happy to see someone as dedicated and talented as you taking on this issue :)

One maybe stupid question: Why didn't you benchmark parsing big Cargo.lock files? Or is the plan to continue to use toml-rs for those?

@epage
Member Author

epage commented Oct 6, 2021

Good question and glad you asked so I can get these things recorded.

When talking with Alex on Zulip, his main performance concern was the resolver; specifically, they've seen the parsing of Cargo.tomls show up when profiling it. I asked specifically about Cargo.lock and he didn't seem concerned.

The reason writing Cargo.lock doesn't matter is that they've hand-implemented it.

@epage
Member Author

epage commented Oct 6, 2021

On a related note, others have asked me why I don't do small-string optimizations for Values.

  • This is a trade-off between keeping the grammar simple and close to the ABNF versus performance. We have to build up quoted strings gradually to handle escaping and newline normalization. We could create a fast path for strings with no escapes and no newlines (or whose newlines don't need normalization), which would then let us use small-string optimization.
  • Our primary performance target is serde. Keys are assumed to map to struct fields, so nothing gets allocated for small keys. Values, however, will frequently map to a String, and serde lets us forward the String we allocated directly to the struct. If we introduced small-string optimization, at best we'd have one unaligned memcpy (for the small string) plus one allocation, and at worst two allocations, compared to the single allocation if we use a String for values.

A non-reason to use String for Value is end-user mutability. This is an API for editing, so the first thought is that we might expose Formatted::value_mut so people can edit values in place. We can't provide that API safely, because it would let the value get out of sync with its Repr.
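
The allocation trade-off in the second bullet can be sketched with a toy small-string type (hypothetical; real crates like kstring or smol_str differ in detail): handing an inline value to serde as a String forces an allocation at that point, while a value stored as a plain String is simply moved.

```rust
// Toy small-string type to illustrate the allocation trade-off.
#[derive(Debug)]
enum SmallStr {
    Inline { len: u8, buf: [u8; 22] }, // stored inline, no heap allocation
    Heap(String),                      // too long, spilled to the heap
}

impl SmallStr {
    fn from(s: &str) -> SmallStr {
        if s.len() <= 22 {
            let mut buf = [0u8; 22];
            buf[..s.len()].copy_from_slice(s.as_bytes()); // the unaligned memcpy
            SmallStr::Inline { len: s.len() as u8, buf }
        } else {
            SmallStr::Heap(s.to_string())
        }
    }

    // Forwarding the value to a deserialized struct field of type String:
    // an Inline value must allocate *now*, while a Heap value (or a plain
    // String stored from the start) is moved for free.
    fn into_string(self) -> String {
        match self {
            SmallStr::Inline { len, buf } => {
                // Extra allocation that a plain String would have avoided.
                String::from_utf8_lossy(&buf[..len as usize]).into_owned()
            }
            SmallStr::Heap(s) => s, // no copy: ownership is forwarded
        }
    }
}

fn main() {
    assert_eq!(SmallStr::from("serde").into_string(), "serde");
    assert!(matches!(SmallStr::from("serde"), SmallStr::Inline { .. }));
}
```

Since serde deserialization typically wants an owned String for value fields, the inline representation only defers the allocation rather than eliminating it, which is the point made above.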

@epage
Member Author

epage commented Oct 6, 2021

Here is the start of Cargo's resolver benchmark: rust-lang/cargo#9955

I've not dug into it yet, but I've been told it's not expected to help in our case. As it gets merged, we'll need to profile it to see whether it covers a relevant case, and possibly add new cases to help.

@epage epage added the cargo label Nov 16, 2021
@epage
Member Author

epage commented Jan 21, 2022

Benchmarks came in around 5% slower, which is still acceptable for cargo.

@epage epage closed this as completed Jan 21, 2022