We currently have two main performance problems in finitedomain when scaling up. They both relate to `indexOf`. The reason is variable names. We do a lot of lookups to translate variable names to their internal id. This is a side effect of other optimizations, in particular the one where all domains are stored in one large array. We currently translate the variables to their index in this array. So internally we only work with numbers, `varIndex`es. But this sometimes requires a lookup, and that was fine on "small" tests.
But when the number of variables grows from a few thousand to more than 20k, the `indexOf` overhead becomes a problem. To illustrate, here are two prints of profiling the curator test which runs, at the time of writing this, in about 20s in total. This is after already optimizing it down from a runtime of 70s by eliminating obvious unnecessary `indexOf` occurrences.
There are two phases: the preparation phase and the solving phase. There are two different ways in which `indexOf` is used, so the solutions are probably different for each.
Preparation phase (the top six items, sans the anon, are all internals to the browser's `indexOf`):

![image](https://cloud.githubusercontent.com/assets/209817/16526632/65f07faa-3fb2-11e6-9453-10fb096b6d78.png)
Solving phase:

![image](https://cloud.githubusercontent.com/assets/209817/16526646/7f01acc6-3fb2-11e6-99af-615c30ecd7d8.png)
The preparation phase is haunted by `indexOf` because it needs to translate each variable name to its final index. And while this is "free" when declaring the variable, the pattern is usually something like `solver.decl('A', 5); solver.decl('B', 6); solver.eq('A', 'B')`. So for the last call the index needs to be looked up, which incurs the `indexOf` wrath of a growing list of variable names.
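To illustrate the cost pattern with hypothetical internals (these names are illustrative, not the actual finitedomain code): the index is known for free at declaration time, but every later call that takes names has to rediscover it by scanning the whole list:

```javascript
// Sketch of the problematic pattern; `varNames`, `decl`, `eq` are illustrative.
const varNames = []; // grows to 20k+ entries, never shrinks

function decl(name /*, domain */) {
  varNames.push(name);
  return varNames.length - 1; // the varIndex is known here for "free"
}

function eq(nameA, nameB) {
  // each call scans the ever-growing list: O(n) per lookup
  const indexA = varNames.indexOf(nameA);
  const indexB = varNames.indexOf(nameB);
  // ...register the propagator on (indexA, indexB)...
  return [indexA, indexB];
}

decl('A'); // -> 0
decl('B'); // -> 1
eq('A', 'B'); // -> [0, 1], but via two full scans of varNames
```

With 20k+ names, each such scan touches tens of thousands of string comparisons, and later calls are always slower than earlier ones.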
The solving phase is slightly different. A recent optimization changed the way "propagation" works from actively maintaining a list of solved propagators to dynamically tracking a list of vars that changed and revisiting only those propagators that can be affected by those vars.
This last optimization works great on the small and relatively small test cases but went kapooya when we tried scaling up the number of variables.
The two cases are different since the first one is about an ever-growing list of variables that never shrinks and goes all the way up to n. This list persists throughout the search, so calls later in the preparation phase always take more time than earlier calls. The solve phase uses constantly growing lists, which tend to be smaller than the big list most of the time, and which are quite quickly discarded and regenerated (with different values). Also, the first list is an array of strings and the second is a list of numbers.
Here is a print of the second case. You're seeing the number of elements in the array where the culprit `indexOf` happens:
Let's try to solve this. The solution in both cases is simply using a different data structure. An unguided search in a flat list of pseudo unordered strings is obviously not going to scale well. It's like doing bubble sort on big data. Good luck.
The actual data model may be different, though maybe not, I'm not sure. For the main list of variables that persists we'll need a structure that performs well on `search()`. The `delete()` and `findAll()` cases are not very relevant for it since we never remove elements, and we could also track all vars in an actual array for trivial `findAll()`. Since `insert()` only happens once per var per search, the (constant) overhead and GC concerns are not very relevant.
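One shape that fits these constraints, sketched under the assumption that names map one-to-one to indexes (the `VarRegistry` name is made up for this sketch): a hash map for `search()` next to a plain array for trivial `findAll()`, both written once at `insert()` time:

```javascript
// Hypothetical sketch of a persistent name->index structure.
class VarRegistry {
  constructor() {
    this.names = [];              // findAll() is just this.names
    this.indexByName = new Map(); // search() becomes a hash lookup
  }
  insert(name) { // called once per var per search; small constant overhead
    const index = this.names.length;
    this.names.push(name);
    this.indexByName.set(name, index);
    return index;
  }
  search(name) { // replaces this.names.indexOf(name), O(n) -> O(1) average
    const index = this.indexByName.get(name);
    return index === undefined ? -1 : index;
  }
}

const reg = new VarRegistry();
reg.insert('A'); // -> 0
reg.insert('B'); // -> 1
reg.search('B'); // -> 1
reg.search('nope'); // -> -1
```

Since elements are never deleted, the array and the map can never drift apart, so the duplication costs memory but no bookkeeping.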
This last bit is different for the solve time problem because those arrays are constantly growing and destroyed. So it needs a structure that has good `insert()` and `findAll()` performance and which doesn't bog down the GC too much.
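For these short-lived lists of numeric varIndexes, one candidate is a `Set`: `insert()` and membership checks are O(1) on average, iteration covers `findAll()`, and a single instance can be cleared and reused between propagation steps instead of being discarded and regenerated, easing GC pressure. A sketch with made-up function names, not the actual finitedomain code:

```javascript
// Sketch: track changed vars per propagation step with one reusable Set.
const changedVars = new Set();

function markChanged(varIndex) {
  changedVars.add(varIndex); // O(1); duplicates are ignored automatically,
}                            // so no O(n) indexOf dedupe scan is needed

function drainChanged(visit) {
  for (const varIndex of changedVars) visit(varIndex); // findAll()
  changedVars.clear(); // reuse the structure instead of allocating a new one
}

markChanged(3);
markChanged(7);
markChanged(3); // no-op
const visited = [];
drainChanged(i => visited.push(i));
// visited is now [3, 7]
```

Whether the `Set` overhead beats a plain array for the typically small lists would need measuring; that's exactly the kind of JS-engine-dependent trade-off mentioned below.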
Anyways. There are a few different structures (somehow missing Trie) to pick from, but I'm not convinced there's a clear winner between them. Especially when implemented in JS, taking the memory and GC overhead into account. A worst case search of `O(log(n))` is definitely better than `O(n)`, so this must be figured out and tested asap.
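Since the Trie is the candidate missing from that list: a minimal sketch of one keyed on name characters, where search cost depends on the length of the name rather than the number of variables. Illustrative only; memory behavior in JS (one object per node) would need real measurement:

```javascript
// Minimal trie sketch: lookup is O(key length), independent of var count.
class Trie {
  constructor() {
    this.root = {};
  }
  insert(name, varIndex) {
    let node = this.root;
    for (const ch of name) node = node[ch] || (node[ch] = {});
    node.value = varIndex; // single-char child keys can't collide with 'value'
  }
  search(name) {
    let node = this.root;
    for (const ch of name) {
      node = node[ch];
      if (!node) return -1;
    }
    return node.value === undefined ? -1 : node.value;
  }
}

const trie = new Trie();
trie.insert('A', 0);
trie.insert('AB', 1);
trie.search('AB'); // -> 1
trie.search('AC'); // -> -1
```

Compared to a `Map`, a trie trades hash computation for per-character pointer chasing; which wins likely depends on name lengths and engine internals, hence the need to test.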
pvdz changed the title from "Idea: eliminate indexofs" to "Idea: Address two major indexOf performance issues" on Jul 2, 2016