Reactivity: optimize full array iterations #673

jods4 · 2020-01-30T00:54:50Z

What problem does this feature solve?

I propose a performance optimization.

Applications use plenty of arrays. Full array iteration is extremely common:

v-for performs a for (i = 0; i < length; i ) array[i] loop when rendering;
computed or watch often call .filter(), .sort(), .map() and co. All of these result in a full iteration;
user code can also do a for loop or call .forEach().

This top use-case is tracked like any normal object, key by key. If I have a reactive array containing 100 items, then calling .filter in a computed will result in 102 new tracking entries: one for filter, one for length and 100 more from [0] to [99].
Each one results in a Set allocation one entry in the reverse deps array the required upkeep every time the effect runs again.

Because both (1) full array iteration is very common; and (2) arrays can be quite big, making tracking expensive; I think this may be worthy of a special optimization.

Here's one idea: detect full enumeration and only register a single symbol ALL_KEY in response.
It could work like this:

Add arrayRanges: Map<Array, number> on effect.
When track is called on an array target, if the key is numeric:
- if target is not in activeEffect.arrayRanges and key is 0, add it with value 0.
- if target is in arrayRanges and key === arrayRanges.get(target) 1 then increment arrayRanges.
At the end of run:
- for each arrayRanges that is equals to target.length - 1, create a dependency with symbol ALL_KEY.
- Clear arrayRanges
When calling trigger, if target is an array and key is numeric, additionally trigger ALL_KEY.

What does the proposed API look like?

No new public API

The text was updated successfully, but these errors were encountered:

yyx990803 · 2020-01-30T01:26:55Z

I would suggest doing a PR with benchmarks, since this feels like premature optimization to me. The cost of managing arrayRanges isn't necessarily cheaper than normal track calls (which really aren't that expensive).

jods4 · 2020-01-30T01:37:56Z

@yyx990803 Sure, can do. Maybe we should define some kind of target size?
I can make it look as bad as I want since track allocates in O(n) and my proposal in O(1) (for ideal pattern that fit a full iteration, random access is still bad).

yyx990803 · 2020-01-30T18:40:48Z

I think around 1000 would be the upside of what we typically see? Reopening since I think this is worth pursuing if a benchmark shows significant improvement.

jods4 · 2020-02-02T13:48:02Z

@yyx990803 I built a benchmark to evaluate what kind of improvement we are talking about.

I would like to engage in a discussion about what the best solution is here, as there are many possible designs and trade-offs. The benchmark is based on optimized arrays in userland.

App under test

I built a fake process explorer, with a little sparkline for each entry. Of course, this scenario is made specifically to stress test array observation. Source code is here:
https://github.com/jods4/vue-array-stress

You can tweak the size of arrays and pick up an implementation amongst 3:

default is Vue 3 alpha.4 reactive around an array;
fast is a reactive array that only tracks access globally, with a single track() entry. It's similar to my original proposition above, except it doesn't try to detect full iteration, it's in full iteration mode all the time.
faster is an additional API that allow direct readonly access to underlying array during iteration. Basically same as previous one, but bypassing proxy and track overhead during a readonly iteration.

Implementation notes

My arrays are shallow (they don't make their contents reactive magically) but it's something that can be factored in. Goal here is to have ballpark numbers to discuss. Sparklines are arrays of numbers, so it would make sense to be shallow in this use-case anyway.

I don't account for length being special, or any methods, non-numeric fields, etc. Maybe I should. There are lots of possible tweaks, e.g. optimizing the built-in methods (in the way faster does).

There's no solution that is best in every scenario. This benchmark focuses on full array iteration, which is in my opinion the most common usage of reactive arrays in applications, either directly for loops, in templates v-for or through array methods like map, filter, sort, reduce, etc.

Results

Average results on my machine, minified production build, Edge 81 64 bits.

	small 20x10		medium 60x50		large 100x100		Memory usage (large)
Default	4.1ms	100%	21.8ms	100%	66ms	100%	4.0Mo	100%
Fast	4.1ms	100%	18.5ms	85%	51ms	77%	2.8Mo	70%
Faster	3.9ms	95%	14.3ms	66%	38ms	57%	2.8Mo	70%

So what I'm targeting here is the overhead of tracking every array entry individually. On an array of size n, it means n Set allocated by track, and a deps array of size n on effect. As arrays can grow large, this ends up being a lot of bookkeeping and puts pressure on GC.

This isn't strictly a micro-benchmark as some "real work" is done: painting a canvas, Vue updates the DOM. This explains why at very small sizes (20 arrays of 10 elements), there is no visible improvement: it doesn't matter compared to diffing and actual DOM work.

But at larger sizes, the improvements can be big, as you can see above. Both in speed and memory allocation.
In the 100x100 case, there were 12k sets allocated (math tells me 10k is caused by sparklines tracking), but only 1.8k in optimized array (math tells me 100 were caused by sparklines tracking).

Conclusion

I think there are significant gains to be had, at least for scenarios handling large quantity of data such as Dataviz, Charts, Dashboards, Datagrids, ...

I am not sure what the best API would be, though, as there are trade-offs everywhere.

My original idea of magically detecting full iteration would work but it has limitations. It wouldn't work with random access (reverse iteration, full iteration starting at position 1, sorting, ...) which means lots of missed opportunities. Moreover it could be in the ballpark of fast but couldn't be pushed to faster, a different API is required for that.

What do you think?

yyx990803 · 2020-02-02T19:53:53Z

This is very informative, thanks!

So here's my takeaways from the benchmark:

It's impractical to have an "automatic-best-case-for-all" solution because we can never be sure of the user's intentions
However, if we were to provide dedicated APIs for users to optimize such cases, it can lead to APIs that are difficult to use correctly. If any of the assumptions are broken it may lead to incorrect behavior that is hard to debug.
The performance gains from the "fast" mode isn't that drastic IMO. But the "faster" mode requires additional API (again can be hard to understand and use correctly).

On the other hand, I noticed that you are using stateful objects that updates themselves (the Process class) - which leads to the need for this types of optimizations.

If I were building the same application, I'd simply treat the entire data set as immutable and construct it fresh on every tick, and with markNonReactive(). So the timer will look something like:

useTimer(500, () => {
  data.processes = markNonReactive(data.processes.map(p => {
    return p.tick() // returns a new copy of itself
  })
})

Btw, to properly encapsulate a timer that auto-stops on component unmount:

// timer.js
import { onUmounted } from 'vue'

function useTimer(interval, cb) {
  const handle = setInterval(cb, interval)
  onUnmounted(() => {
    clearInterval(handle)
  })
}

To sum up: I don't think we should introduce separate APIs for this type of Array access optimization. Rather, we should document and recommend strategies for optimizing large Array iterations (i.e. construct fresh non-reactive arrays on each update).

jods4 · 2020-02-03T16:24:25Z

Playing with this benchmark got me thinking.

The beauty of having explicit reactivity is the flexibility it gives.
It is not something you can do when everything is magically observed (e.g. Angular).

I think it would be nice if:

Vue exposes easy-to-use, simple apis that work well enough for most cases.
While allowing power-users to have fine control over reactivity if they need to. For example creating the faster collections in this benchmark.

This philosophy would tie in with #675.
In that issue you said you'd prefer limiting the API surface and provide optimized collections in Vue itself but I don't think that's the way to go.

You wouldn't want to implement all possible primitives, much less maintain it in the public API surface of Vue. Not providing the required APIs stiffles innovations from the community.

The faster array I used here (or a derivative of it) could be a NPM library. People who mutate large arrays may benefit from importing it. Not saying I'm gonna create a NPM package, just pointing out how such improvements could be opted-in by the community in specific cases.

About swapping in an immutable array: yes it's a way to optimize this specific example.
That is not always possible, though. Doing so implies changing the identity of every Process. Here it's ok but a more complex application might have them referenced elsewhere. Immutability is not always a good fit, so I think having a way to optimize mutations is nice.

Thanks for the tips! ;)

jods4 · 2020-03-02T13:05:44Z

I'm closing this issue because I don't think there's more to say about it.

With access to @vue/reactivity, it's really easy to build an array that only tracks read/writes globally. Using one more simple api not shown in the benchmark it is 33% to 50% faster than Vuejs built-in array, depending on size.

I think it's a better choice for most use cases, but in the end you can easily use the one you want in your code.

The one change I'm still hoping we can have is an official way to mark our objects as being "reactive" without using the built-ins, so that they mesh well with Vue core. This is tracked by vuejs/rfcs#129

jods4 mentioned this issue Jan 30, 2020

Reactivity: expose track and trigger #675

Closed

yyx990803 closed this as completed Jan 30, 2020

yyx990803 reopened this Jan 30, 2020

jods4 mentioned this issue Feb 10, 2020

Open Reactive API a little more vuejs/rfcs#129

Closed

ShenQingchuan mentioned this issue Feb 12, 2020

fix(reactivity): fix #709 issue, effect should observe for of iteration in Map #716

Closed

jods4 closed this as completed Mar 2, 2020

github-actions bot locked and limited conversation to collaborators Nov 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reactivity: optimize full array iterations #673

Reactivity: optimize full array iterations #673

jods4 commented Jan 30, 2020

yyx990803 commented Jan 30, 2020

jods4 commented Jan 30, 2020

yyx990803 commented Jan 30, 2020

jods4 commented Feb 2, 2020

yyx990803 commented Feb 2, 2020 •

edited

jods4 commented Feb 3, 2020 •

edited

jods4 commented Mar 2, 2020

Reactivity: optimize full array iterations #673

Reactivity: optimize full array iterations #673

Comments

jods4 commented Jan 30, 2020

What problem does this feature solve?

What does the proposed API look like?

yyx990803 commented Jan 30, 2020

jods4 commented Jan 30, 2020

yyx990803 commented Jan 30, 2020

jods4 commented Feb 2, 2020

App under test

Implementation notes

Results

Conclusion

yyx990803 commented Feb 2, 2020 • edited

jods4 commented Feb 3, 2020 • edited

jods4 commented Mar 2, 2020

yyx990803 commented Feb 2, 2020 •

edited

jods4 commented Feb 3, 2020 •

edited