We currently have two main performance problems in finitedomain when scaling up. They both relate to `indexOf`. The reason is variable names. We do a lot of lookups to translate variable names to their internal id. This is a side effect of other optimizations, in particular the one where all domains are stored in one large array. We currently translate the variables to their index in this array. So internally we only work with numbers, `varIndex`es. But this sometimes requires a lookup, and that was fine on "small" tests.
But when the number of variables grows from a few thousand to more than 20k, the `indexOf` overhead becomes a problem. To illustrate, here are two prints of profiling the curator test which runs, at the time of writing this, in about 20s in total. This is after already optimizing it down from a runtime of 70s by eliminating obvious unnecessary `indexOf` occurrences.
There are two phases: the preparation phase and the solving phase. There are two different ways in which `indexOf` is used, so the solutions are probably different for each.
Preparation phase (the top six items, sans the anon, are all internals to the browser's `indexOf`):

![image](https://cloud.githubusercontent.com/assets/209817/16526632/65f07faa-3fb2-11e6-9453-10fb096b6d78.png)
Solving phase:

![image](https://cloud.githubusercontent.com/assets/209817/16526646/7f01acc6-3fb2-11e6-99af-615c30ecd7d8.png)
The preparation phase is haunted by `indexOf` because it needs to translate each variable name to its final index. And while this is "free" when declaring the variable, the pattern is usually something like `solver.decl('A', 5); solver.decl('B', 6); solver.eq('A', 'B')`. So for the last call the index needs to be looked up, which incurs the `indexOf` wrath of a growing list of variable names.
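To illustrate the cost pattern with hypothetical internals (these names are illustrative, not the actual finitedomain code): the index is known for free at declaration time, but every later call that takes names has to rediscover it by scanning the whole list:

```javascript
// Sketch of the problematic pattern; `varNames`, `decl`, `eq` are illustrative.
const varNames = []; // grows to 20k+ entries, never shrinks

function decl(name /*, domain */) {
  varNames.push(name);
  return varNames.length - 1; // the varIndex is known here for "free"
}

function eq(nameA, nameB) {
  // each call scans the ever-growing list: O(n) per lookup
  const indexA = varNames.indexOf(nameA);
  const indexB = varNames.indexOf(nameB);
  // ...register the propagator on (indexA, indexB)...
  return [indexA, indexB];
}

decl('A'); // -> 0
decl('B'); // -> 1
eq('A', 'B'); // -> [0, 1], but via two full scans of varNames
```

With 20k+ names, each such scan touches tens of thousands of string comparisons, and later calls are always slower than earlier ones.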
The solving phase is slightly different. A recent optimization changed the way "propagation" works from actively maintaining a list of solved propagators to dynamically tracking a list of vars that changed and revisiting only those propagators that can be affected by those vars.
This last optimization works great on the small and relatively small test cases but went kapooya when we tried scaling up the number of variables.
The two cases are different since the first one is about an ever-growing list of variables that never shrinks and goes all the way up to n. This list persists throughout the search, so calls later in the preparation phase always take more time than earlier calls. The solve phase uses constantly growing lists, which tend to be smaller than the big list most of the time, and which are quite quickly discarded and regenerated (with different values). Also, the first list is an array of strings and the second is a list of numbers.
Here is a print of the second case. You're seeing the number of elements in the array where the culprit `indexOf` happens:
Let's try to solve this. The solution in both cases is simply using a different data structure. An unguided search in a flat list of pseudo unordered strings is obviously not going to scale well. It's like doing bubble sort on big data. Good luck.
The actual data model may be different, though maybe not, I'm not sure. For the main list of variables that persists we'll need a structure that performs well on `search()`. The `delete()` and `findAll()` cases are not very relevant for it since we never remove elements, and we could also track all vars in an actual array for trivial `findAll()`. Since `insert()` only happens once per var per search, the (constant) overhead and GC concerns are not very relevant.
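One shape that fits these constraints, sketched under the assumption that names map one-to-one to indexes (the `VarRegistry` name is made up for this sketch): a hash map for `search()` next to a plain array for trivial `findAll()`, both written once at `insert()` time:

```javascript
// Hypothetical sketch of a persistent name->index structure.
class VarRegistry {
  constructor() {
    this.names = [];              // findAll() is just this.names
    this.indexByName = new Map(); // search() becomes a hash lookup
  }
  insert(name) { // called once per var per search; small constant overhead
    const index = this.names.length;
    this.names.push(name);
    this.indexByName.set(name, index);
    return index;
  }
  search(name) { // replaces this.names.indexOf(name), O(n) -> O(1) average
    const index = this.indexByName.get(name);
    return index === undefined ? -1 : index;
  }
}

const reg = new VarRegistry();
reg.insert('A'); // -> 0
reg.insert('B'); // -> 1
reg.search('B'); // -> 1
reg.search('nope'); // -> -1
```

Since elements are never deleted, the array and the map can never drift apart, so the duplication costs memory but no bookkeeping.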
This last bit is different for the solve time problem because those arrays are constantly growing and destroyed. So it needs a structure that has good `insert()` and `findAll()` performance and which doesn't bog down the GC too much.
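For these short-lived lists of numeric varIndexes, one candidate is a `Set`: `insert()` and membership checks are O(1) on average, iteration covers `findAll()`, and a single instance can be cleared and reused between propagation steps instead of being discarded and regenerated, easing GC pressure. A sketch with made-up function names, not the actual finitedomain code:

```javascript
// Sketch: track changed vars per propagation step with one reusable Set.
const changedVars = new Set();

function markChanged(varIndex) {
  changedVars.add(varIndex); // O(1); duplicates are ignored automatically,
}                            // so no O(n) indexOf dedupe scan is needed

function drainChanged(visit) {
  for (const varIndex of changedVars) visit(varIndex); // findAll()
  changedVars.clear(); // reuse the structure instead of allocating a new one
}

markChanged(3);
markChanged(7);
markChanged(3); // no-op
const visited = [];
drainChanged(i => visited.push(i));
// visited is now [3, 7]
```

Whether the `Set` overhead beats a plain array for the typically small lists would need measuring; that's exactly the kind of JS-engine-dependent trade-off mentioned below.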
Anyways. There are a few different structures (somehow missing Trie) to pick from, but I'm not convinced there's a clear winner between them. Especially when implemented in JS, taking the memory and GC overhead into account. A worst case search of `O(log(n))` is definitely better than `O(n)`, so this must be figured out and tested asap.
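Since the Trie is the candidate missing from that list: a minimal sketch of one keyed on name characters, where search cost depends on the length of the name rather than the number of variables. Illustrative only; memory behavior in JS (one object per node) would need real measurement:

```javascript
// Minimal trie sketch: lookup is O(key length), independent of var count.
class Trie {
  constructor() {
    this.root = {};
  }
  insert(name, varIndex) {
    let node = this.root;
    for (const ch of name) node = node[ch] || (node[ch] = {});
    node.value = varIndex; // single-char child keys can't collide with 'value'
  }
  search(name) {
    let node = this.root;
    for (const ch of name) {
      node = node[ch];
      if (!node) return -1;
    }
    return node.value === undefined ? -1 : node.value;
  }
}

const trie = new Trie();
trie.insert('A', 0);
trie.insert('AB', 1);
trie.search('AB'); // -> 1
trie.search('AC'); // -> -1
```

Compared to a `Map`, a trie trades hash computation for per-character pointer chasing; which wins likely depends on name lengths and engine internals, hence the need to test.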
pvdz changed the title from "Idea: eliminate indexofs" to "Idea: Address two major indexOf performance issues" on Jul 2, 2016