
Investigate degradation of performance for medium-size codebase (5~10 kLoc) #1622

Closed · 4 tasks done
yannham opened this issue Sep 21, 2023 · 4 comments · Fixed by #1631
Assignees: yannham
Labels: P0 blocker: fix immediately!, type: bug

Comments


yannham commented Sep 21, 2023

Describe the bug
While I can't share the code here for privacy reasons, I've personally been given a Nickel codebase where the export time is very long (between 10 and 20s on my laptop for 5~10 kLoC). There is no reason for a program of this size to be that slow. The code looks rather reasonable, which means we are at risk of a similar performance degradation for any codebase that reaches this size.

Let's investigate and profile, relying on #1484, to see what's happening and bring that time down to something reasonable.

To Reproduce
Can't share the offending code here for privacy reasons, unfortunately.

Expected behavior
The export time should be low (as a starting point, say under a second).

Tasks

  1. type: feature request (yannham)
  2. type: feature request (vkleen)
  3. type: feature request (yannham)
  4. (yannham)
@yannham yannham added P0 blocker: fix immediately! type: bug labels Sep 21, 2023
@yannham yannham self-assigned this Sep 21, 2023

yannham commented Sep 26, 2023

Update

The flamegraph isn't very informative; there is no single obvious contention point:

(flamegraph image: dev-flamegraph)

@vkleen's plan is to start adding tracing to various parts of the Nickel interpreter to gather finer stats and timings for these kinds of use cases.

In the meantime, I noticed that:

  1. A lot of contract applications were duplicated. Printing the result (with the default nickel invocation, without export) shows things such as { some_field | ContractFoo | ContractFoo | ContractFoo | ContractFoo ...} (see the sketch after this list). I implemented contract deduplication using the contract equality check from the typechecker, which brought the evaluation time down by around 50%. I still think we can reduce the original time by a good order of magnitude.
  2. A bazillion generated variables are created. This probably stresses the environment (insertions and lookups) a lot, and in the worst case the environment can behave like a linked list, which could explain the overall performance. The next step is to add a Closure node to the AST instead of using generated variables, that is, to use pointers directly without the environment indirection.
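
To illustrate point 1, here is a minimal, hypothetical sketch (the contract and field names are made up) of the kind of pattern that produces the duplication: several record pieces each attach the same contract to a field, and merging stacks the applications, so the field ends up being checked by the same contract once per piece. The deduplication mentioned above uses the typechecker's contract equality check to collapse those repeated applications into a single one.

# Hypothetical example; ContractFoo and the field names are made up.
let ContractFoo = { foo | Number } in
let defaults = { some_field | ContractFoo = { foo = 0 } } in
let piece1 = { some_field | ContractFoo } in
let piece2 = { some_field | ContractFoo } in
# Before deduplication, merging attaches ContractFoo to some_field three
# times, once per record piece, and each application is checked on evaluation.
defaults & piece1 & piece2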


yannham commented Oct 9, 2023

Some news: I got rid of closurization (or at least of generated variables) on a branch, which was a non-trivial change. It eliminated almost 1M environment operations (insertions and fetches). But much to my surprise, this wasn't the dominant cost, and it didn't win much. Some metrics:

# on branch master 09e51fba6a08fa269bd32c024e2600b8699e72eb
# before getting rid of generated variables
$ cargo run --bin nickel --release --features metrics -- eval opl-config-test/config.ncl --metrics
Environment::get: 14655437
LocIdent::fresh: 1153805
Environment::insert: 7555685
Program::new: 1
Thunk::new_rev: 873
Environment::clone: 33178242
Thunk::new: 7163086
Environment.curr_layer_size_at_clone: min 1, max 923, avg 1.5; samples 4566981
Environment.hashmap_size_get: min 0, max 923, avg 1.9; samples 47239539
Environment.get_layers_traversed: min 1, max 924, avg 3.2; samples 14655437

# after removal of generated variables
$ cargo run --bin nickel --release --features metrics -- eval opl-config-test/config.ncl --metrics
Environment::get: 11808112
LocIdent::fresh: 369
Environment::insert: 6369042
Program::new: 1
Thunk::new_rev: 873
Environment::clone: 34544815
Thunk::new: 7234437
Environment.curr_layer_size_at_clone: min 1, max 357, avg 1.4; samples 4267139
Environment.hashmap_size_get: min 0, max 110, avg 1.2; samples 33297774
Environment.get_layers_traversed: min 1, max 35, avg 2.8; samples 11808112

We saved 1M environment operations, but the order of magnitude didn't change, and it's still huge. I started gathering many more metrics, typically the number of executions per primitive operation and per contract application (the data are verbose and raw, so I attached the file).

out.metrics.master.log

We are applying 1.3M contracts (!) and, it seems, a lot of array contracts, in particular polymorphic contracts. That is a bit of a shame: for example, the map function's contract forall a b. (a -> b) -> Array a -> Array b applies 4 contracts to each and every element of the array (sealing a, unsealing a, sealing b, unsealing b), which is linear in the size of the argument. But as long as our type system is sound (in particular, as long as blame safety holds), those contracts are all useless! They check things that map must ensure anyway, and since map is statically typed, they are guaranteed to hold.
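
As a concrete illustration, here is a small, hypothetical call site (the array is kept tiny for readability). Because the call happens in untyped code, the contract derived from map's type is applied at the boundary, and its polymorphic parts seal and unseal every element of the array:

# Hypothetical call site in untyped code. The contract derived from
# forall a b. (a -> b) -> Array a -> Array b seals and unseals a and b for
# each element of the array, even though map, being statically typed, already
# guarantees the result.
std.array.map (fun x => x + 1) [1, 2, 3, 4, 5]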

The next step I took was thus to implement a form of contract simplification for static type annotations, to get constant-time behavior as much as possible. This resulted in another 50% win: we're not yet back under a second, but we're down to under 4s, which is not bad. I will set up the corresponding PR soon, and continue from there to understand why we still have so many contracts being applied.
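
For a sense of the kind of boundary this simplification targets, here is a hypothetical snippet: increment_all is statically typed, so a contract derived from its type guards calls coming from untyped code. The codomain part of that contract re-checks every element of the result when it is forced (for instance at export), even though typechecking already guarantees it; that is the sort of redundant, linear-time work the simplification aims to avoid.

# Hypothetical example of a typed/untyped boundary. The annotation turns the
# static type into a contract at runtime; the check on the result
# (Array Number) duplicates what the typechecker already guarantees.
let increment_all : Array Number -> Array Number =
  fun xs => std.array.map (fun x => x + 1) xs
in
increment_all [1, 2, 3, 4, 5]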

@marijetamh

@yannham is this done, as the due date has passed?


yannham commented Oct 17, 2023

While it's not back "under a second", it's consistently down to around 3 seconds after several optimizations. This is not the end of the story, but I think the work coming next can be done as part of a new tracking issue/epic. We did investigate, and applied the first round of optimizations somewhat successfully.
