Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Idea: Address two major indexOf performance issues #106

Closed
pvdz opened this issue Jul 1, 2016 · 1 comment
Closed

Idea: Address two major indexOf performance issues #106

pvdz opened this issue Jul 1, 2016 · 1 comment

Comments

@pvdz
Copy link
Contributor

pvdz commented Jul 1, 2016

We currently have two main performance problems in finitedomain when scaling up. They both related to indexOf. The reason is variable names. We do a lot of lookups for translating variable names to their internal id. This is a side effect of other optimizations, in particular the one where all domains are stored in a large domain. We currently translate the variables to the index of this array. So internally we only work with numbers, varIndexes. But this sometimes requires a lookup and that was fine on "small" tests.

But when the number of variables exceeds a few thousand to more than 20k the indexOf overhead becomes a problem. To illustrate here are two prints of profiling the curator test which runs, at the time of writing this, in about 20s in total. This is after already optimizing it down from a runtime of 70s by eliminating obvious unnecessary indexOf occurrences.

There are two phases; the preparation phase and the solving phase. There are two different ways in which indexOf is used so the solutions are probably different for each;

Preparation phase (the top six items, sans the anon, are all internals to the browser's indexOf):
image

Solving phase:
image

The preparation phase is haunted by indexOf because it needs to translate each variable name to its final index. And while this is "free" when declaring the variable, the pattern is usually doing something like solver.decl('A', 5); solver.decl('B', 6); solver.eq('A', 'B'). So for the last call the index would need to be looked up which incurs the indexOf wrath of a growing list of variable names.

The solving phase is slightly different. A recent optimization changed the way "propagation" works from actively maintaining a list of solved propagators to dynamically tracking a list of vars that changed and revisiting only those propagators that can be affected by those vars.

This last optimization works great on the small and relatively small test cases but went kapooya when we tried scaling up the number of variables.

The two cases are difference since the first one is about an ever growing list of variables that never decreases and goes all the way up to n. This list persists throughout the search. So calls later in the preparation phase always take more time than earlier calls. The solve phase uses constantly growing lists, which tend to be smaller than the big list most of the time, that are quite quickly discarded and regenerated (with different values). Also the first list is an array of strings and the second is a list of numbers.

Here is a print of the second case. You're seeing the number of elements in the array where the culprit indexOf happens:

image

Let's try to solve this. The solution in both cases is simply using a different data structure. An unguided search in a flat list of pseudo unordered strings is obviously not going to scale well. It's like doing bubble sort on big data. Good luck.

The actual data model may be different, though maybe not I'm not sure. For the main list of variables that persists we'll need a structure that performs well on search(). The delete() and findAll() cases are not very relevant for it since we never remove elements and we could also track all vars in an actual array for trivial findAll(). Since insert() only happens once per var per search the (constant) overhead and GC concerns are not very relevant.

This last bit is different for the solve time problem because those arrays are constantly growing and destroyed. So it needs a structure that has good insert() and findAll() performance and which doesn't bog down the GC too much.

Anyways. There's a few different structures (somehow missing Trie) to pick from but I'm not convinced there's a clear winner between them. Especially when implemented in JS, taking the memory and GC overhead into account. A worst case search of O(log(n)) is definitely better than O(n) so this must be figured out and tested asap.

@pvdz pvdz changed the title Idea: eliminate indexofs Idea: Address two major indexOf performance issues Jul 2, 2016
@pvdz
Copy link
Contributor Author

pvdz commented Jul 2, 2016

Added a "Trie" for the config.all_var_names array and made all indexOf calls on it now search the Trie instead.

Before:

    - TC: curator solving...
      - TC: compiling...
      - MV: Matrix.compileConstraints() started...
      - MV: Matrix.compileConstraints(): 5249ms
      - TC: compiling: 5249ms
      - TC: pre-processing...
      - TC: pre-processing: 3423ms
      - FD Preparing...
      - FD Prepare Time: 5192ms
      - FD Var Count: 23488
      - FD Constraint Count: 18886
      - FD Propagator Count: 35166
      - FD Solving...
      - FD Solving Time: 4544ms
      - FD Solutions: 1
      - FD Solution Construction Time: 4ms
    - TC: total curator solve time: 18416ms
    - TC: mapping...
    - TC: mapping: 1ms

After:

    - TC: curator solving...
      - TC: compiling...
      - MV: Matrix.compileConstraints() started...
      - MV: Matrix.compileConstraints(): 272ms
      - TC: compiling: 272ms
      - TC: pre-processing...
      - TC: pre-processing: 3426ms
      - FD Preparing...
      - FD Prepare Time: 524ms
      - FD Var Count: 23488
      - FD Constraint Count: 18886
      - FD Propagator Count: 35166
      - FD Solving...
      - FD Solving Time: 3809ms
      - FD Solutions: 1
      - FD Solution Construction Time: 8ms
    - TC: total curator solve time: 8041ms
    - TC: mapping...
    - TC: mapping: 1ms

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant