
Investigate degradation of performance for medium-size codebase (5~10 kLoc) #1622

Closed · 4 tasks done
yannham opened this issue Sep 21, 2023 · 4 comments · Fixed by #1631
Assignees: yannham
Labels: P0 blocker: fix immediately!, type: bug

Comments


yannham commented Sep 21, 2023

Describe the bug
While I can't share the code here for privacy reasons, I've personally been given a Nickel codebase where the export time is very long (between 10 and 20s on my laptop for 5~10 kLoC). There is no reason for a program of this size to be that slow. The code looks rather reasonable, which means we are at risk of a similar performance degradation for any codebase that reaches this size.

Let's investigate and profile, relying on #1484, to see what's happening and bring that time down to something reasonable.

To Reproduce
Can't share the offending code here for privacy reasons, unfortunately.

Expected behavior
The export time should be low (as a starting point, say under a second).

Tasks

  1. type: feature request (yannham)
  2. type: feature request (vkleen)
  3. type: feature request (yannham)
  4. (yannham)
@yannham yannham added P0 blocker: fix immediately! type: bug labels Sep 21, 2023
@yannham yannham self-assigned this Sep 21, 2023

yannham commented Sep 26, 2023

Update

The flamegraph isn't very informative; there is no single obvious contention point:

(flamegraph image: dev-flamegraph)

@vkleen's plan is to start adding tracing to various parts of the Nickel interpreter to gather finer stats and timings for these kinds of use cases.

In the meantime, I noticed that:

  1. A lot of contract applications were duplicated. Printing the result (with the default nickel invocation, without export) shows things such as { some_field | ContractFoo | ContractFoo | ContractFoo | ContractFoo ...} (see the sketch after this list). I implemented contract deduplication using the contract equality check from the typechecker, which brought the evaluation time down by around 50%. I still think we can reduce the original time by a good order of magnitude.
  2. A bazillion generated variables are created. This probably stresses the environment (insertions and lookups) a lot, and in the worst case the environment can behave like a linked list, which could explain the overall performance. The next step is to add a Closure node to the AST instead of using generated variables, that is, to use pointers directly without the environment indirection.
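
To illustrate point 1, here is a minimal, hypothetical sketch (the contract and field names are made up) of the kind of pattern that produces the duplication: several record pieces each attach the same contract to a field, and merging stacks the applications, so the field ends up being checked by the same contract once per piece. The deduplication mentioned above uses the typechecker's contract equality check to collapse those repeated applications into a single one.

# Hypothetical example; ContractFoo and the field names are made up.
let ContractFoo = { foo | Number } in
let defaults = { some_field | ContractFoo = { foo = 0 } } in
let piece1 = { some_field | ContractFoo } in
let piece2 = { some_field | ContractFoo } in
# Before deduplication, merging attaches ContractFoo to some_field three
# times, once per record piece, and each application is checked on evaluation.
defaults & piece1 & piece2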


yannham commented Oct 9, 2023

Some news: I got rid of closurization (or at least of generated variables) on a branch, which was a non-trivial change. It eliminated almost 1M environment operations (insertions and fetches). But much to my surprise, this wasn't the dominant cost, and it didn't win much. Some metrics:

# on branch master 09e51fba6a08fa269bd32c024e2600b8699e72eb
# before getting rid of generated variables
$ cargo run --bin nickel --release --features metrics -- eval opl-config-test/config.ncl --metrics
Environment::get: 14655437
LocIdent::fresh: 1153805
Environment::insert: 7555685
Program::new: 1
Thunk::new_rev: 873
Environment::clone: 33178242
Thunk::new: 7163086
Environment.curr_layer_size_at_clone: min 1, max 923, avg 1.5; samples 4566981
Environment.hashmap_size_get: min 0, max 923, avg 1.9; samples 47239539
Environment.get_layers_traversed: min 1, max 924, avg 3.2; samples 14655437

# after removal of generated variables
$ cargo run --bin nickel --release --features metrics -- eval opl-config-test/config.ncl --metrics
Environment::get: 11808112
LocIdent::fresh: 369
Environment::insert: 6369042
Program::new: 1
Thunk::new_rev: 873
Environment::clone: 34544815
Thunk::new: 7234437
Environment.curr_layer_size_at_clone: min 1, max 357, avg 1.4; samples 4267139
Environment.hashmap_size_get: min 0, max 110, avg 1.2; samples 33297774
Environment.get_layers_traversed: min 1, max 35, avg 2.8; samples 11808112

We saved 1M environment operations, but the order of magnitude didn't change, and it's still huge. I started gathering many more metrics, typically the number of executions per primitive operation and per contract application (the data are verbose and raw, so I attached the file).

out.metrics.master.log

We are applying 1.3M contracts (!) and, it seems, a lot of array contracts, in particular polymorphic contracts. That is a bit of a shame: for example, the map function's contract forall a b. (a -> b) -> Array a -> Array b applies 4 contracts to each and every element of the array (sealing a, unsealing a, sealing b, unsealing b), which is linear in the size of the argument. But as long as our type system is sound (in particular, as long as blame safety holds), those contracts are all useless! They check things that map must ensure anyway, and since map is statically typed, they are guaranteed to hold.
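
As a concrete illustration, here is a small, hypothetical call site (the array is kept tiny for readability). Because the call happens in untyped code, the contract derived from map's type is applied at the boundary, and its polymorphic parts seal and unseal every element of the array:

# Hypothetical call site in untyped code. The contract derived from
# forall a b. (a -> b) -> Array a -> Array b seals and unseals a and b for
# each element of the array, even though map, being statically typed, already
# guarantees the result.
std.array.map (fun x => x + 1) [1, 2, 3, 4, 5]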

The next step I took was thus to implement a form of contract simplification for static type annotations, to get constant-time behavior as much as possible. This resulted in another 50% win: we're not yet back under a second, but we're down to under 4s, which is not bad. I will set up the corresponding PR soon, and continue from there to understand why we still have so many contracts being applied.
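
For a sense of the kind of boundary this simplification targets, here is a hypothetical snippet: increment_all is statically typed, so a contract derived from its type guards calls coming from untyped code. The codomain part of that contract re-checks every element of the result when it is forced (for instance at export), even though typechecking already guarantees it; that is the sort of redundant, linear-time work the simplification aims to avoid.

# Hypothetical example of a typed/untyped boundary. The annotation turns the
# static type into a contract at runtime; the check on the result
# (Array Number) duplicates what the typechecker already guarantees.
let increment_all : Array Number -> Array Number =
  fun xs => std.array.map (fun x => x + 1) xs
in
increment_all [1, 2, 3, 4, 5]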

@marijetamh

@yannham is this done, as the due date has passed?


yannham commented Oct 17, 2023

While it's not back "under a second", it's consistently down to around 3 seconds after several optimizations. This is not the end of the story, but I think the work coming next can be done as part of a new tracking issue/epic. We did investigate, and applied the first round of optimizations somewhat successfully.
