
feat(effect): refactorings to improve performance, reduce memory usage #2345

Closed
wants to merge 11 commits

Conversation

basvanmeurs (Contributor) commented Oct 9, 2020

This PR is a refactoring of the reactivity module.

  1. Make use of object prototypes for effects to reduce memory usage
  2. Ref-specific high-performance implementation for track and trigger
  3. Miscellaneous memory- and performance-related tweaks

Results

  • Big runtime performance improvement for ref, computed, watch and watchEffect (30%-80%, depending on the number of dependencies)
  • Memory usage decreased by about 30% when creating ref, computed, watch and watchEffect
  • Creation time performance improvement, most notably for watchers and watchEffects

Performance

You can compare my benchmarks on your own machine:
This PR
3.0.0

Or view my own results:
performance-comparison

Memory

In terms of memory, I only did a fairly rudimentary test:

const arr = [];
for (let i = 0; i < 1e5; i++) {
  const v = ref(100);
  const c = computed(() => v.value * 2);
  const w = watch(c, (v) => {});
  const we = watchEffect(() => {
    const v = c.value;
  });
  arr.push(v);
  arr.push(c);
  arr.push(w);
  arr.push(we);
}
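
One way to reproduce comparable numbers (my own assumption about methodology; the PR doesn't state how the figures were collected) is to wrap the same loop in a small Node script and compare heap usage around it:

// Hypothetical harness: run under Node with --expose-gc to get a rough
// heapUsed delta for the same loop as above.
import { ref, computed, watch, watchEffect } from 'vue'

const gc = (globalThis as any).gc as undefined | (() => void)

gc?.()
const before = process.memoryUsage().heapUsed

const arr: unknown[] = []
for (let i = 0; i < 1e5; i++) {
  const v = ref(100)
  const c = computed(() => v.value * 2)
  const w = watch(c, () => {})
  const we = watchEffect(() => { c.value })
  arr.push(v, c, w, we)
}

gc?.()
const after = process.memoryUsage().heapUsed
console.log(`retained: ~${((after - before) / 1e6).toFixed(1)} MB for 1e5 iterations`)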

memory-comparison

I managed to save some memory by using more properties on the prototypes, and I moved some parameters out of the options object, because those ReactiveEffectOptions objects have a memory overhead as well. Left is the PR in its current state, right is 3.0.0: a reduction of about 30%. I think we're approaching the limit of what we can do.
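
A minimal sketch of the defaults-on-prototype idea (the class and field names are illustrative, not the PR's actual code):

class FakeEffect {
  // Genuinely per-instance state:
  deps: unknown[] = []
  // Declared so TypeScript knows about them, but deliberately not initialized
  // here: initializing would create an own property on every single instance.
  declare active: boolean
  declare allowRecurse: boolean
  constructor(public fn: () => void) {}
}
// Shared defaults live on the prototype: one slot in total, not one per instance.
FakeEffect.prototype.active = true
FakeEffect.prototype.allowRecurse = false

const e = new FakeEffect(() => {})
console.log(e.active)   // true, read from the prototype
e.active = false        // only now does this instance get its own slot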

@basvanmeurs basvanmeurs marked this pull request as draft October 9, 2020 21:16
basvanmeurs (Contributor, Author) commented Oct 12, 2020

I think we need to keep track of the performance of the Vue core, especially the reactivity part. To verify my own work, as well as to support future development, I created this little repository: https://github.com/basvanmeurs/vue-next-benchmarks

You can run some reactivity benchmarks live here: https://basvanmeurs.github.io/vue-next-benchmarks/

You can either enter a specific version or a specific URL.

The work in this PR can be compared to the current release using the following URLs:
https://basvanmeurs.github.io/vue-next-benchmarks/?v=vue.global.js
https://basvanmeurs.github.io/vue-next-benchmarks/?v=3.0.0

basvanmeurs added a commit to basvanmeurs/vue-next-benchmarks that referenced this pull request Oct 13, 2020
@basvanmeurs basvanmeurs changed the title from "feat(effect): improve performance, reduce memory usage" to "feat(effect): refactorings to improve performance, reduce memory usage" Oct 13, 2020
@basvanmeurs basvanmeurs marked this pull request as ready for review October 13, 2020 10:46
@yyx990803 (Member)

This is great work! However, it seems it contains some non-trivial changes to the usage of effect, which is technically part of the public API of @vue/reactivity. Can you summarize a list of changed APIs (both internally and externally)?

Regarding the benchmarks - this is very helpful, it'd be awesome to maybe extend this to component instance creation and vdom updates as well. We've had the thought of running something similar on CI but never got time to actually work on it.

dsonet (Contributor) commented Oct 13, 2020

It would be great to implement #2195 after this.

@basvanmeurs (Contributor, Author)

Can you summarize a list of changed APIs (both internally and externally)?

Reactivity API changes

  1. ReactiveEffect is now an object instead of a function. It must now be invoked using the run method (effect.run() vs effect()).

  2. The effect function has changed. The properties lazy, allowRecurse and scheduler have been extracted out of the ReactiveEffectOptions and are now passed as function parameters.
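
A rough before/after sketch of the usage change (the exact parameter order of the new effect signature is my assumption; the description above only names the extracted options):

import { effect } from '@vue/reactivity'

// 3.0.0: the effect *is* a callable runner, and its behavior is configured
// through an allocated ReactiveEffectOptions object.
const jobs: Array<() => void> = []
const runner = effect(() => { /* reads reactive values */ }, {
  lazy: true,
  scheduler: job => jobs.push(job),
})
runner()

// This PR (approximate): the same call would become roughly
//   const eff = effect(fn, /* lazy */ true, /* scheduler */ job => jobs.push(job))
//   eff.run()
// i.e. an object invoked via effect.run(), with no options object allocated.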

Vue API changes

The external Vue API has not changed, as the effect function is not exported. The ReactiveEffect and ReactiveEffectOptions types are exported from the vue package, but they aren't mentioned in the official documentation.

@basvanmeurs basvanmeurs marked this pull request as draft October 14, 2020 14:03
@basvanmeurs (Contributor, Author)

Moving back to draft as I have some additional optimizations.

@basvanmeurs basvanmeurs marked this pull request as ready for review October 14, 2020 14:27
@basvanmeurs basvanmeurs marked this pull request as draft October 14, 2020 14:28
@basvanmeurs (Contributor, Author)

After the latest updates:

vue-perf-comparison

@basvanmeurs basvanmeurs marked this pull request as ready for review October 15, 2020 15:33
@dsonet dsonet mentioned this pull request Nov 14, 2020

ReactiveEffect.prototype.active = true
ReactiveEffect.prototype.allowRecurse = false
ReactiveEffect.prototype.options = EMPTY_OBJ

Contributor:

@basvanmeurs The code might be cleaner if you declare those three as getters on the class?
public get active() { return true } would end up on the prototype, and I don't think using a getter rather than a field would impact perf / memory in any noticeable way (we have your benchmarks to verify).

Other than coding style, it's also easier for tools to determine that a module is side-effect free if it doesn't contain statements, just class declarations.
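
For illustration, that suggestion would look roughly like this (hypothetical class, not the PR's code):

class FakeEffect {
  // Compiled onto FakeEffect.prototype, so the default values are still shared
  // by all instances, and the module body contains no top-level statements.
  public get active() { return true }
  public get allowRecurse() { return false }
}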

Contributor Author:

I was initially enthusiastic. But when I tried changing this, all property assignments started to cause TypeScript errors (Cannot assign to 'options' because it is a read-only property.). I could silence them for every assignment by casting to any: (this as any).options = {}. But personally I think the resulting code would be worse than what it is now.
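
Concretely, with a getter every write site stops compiling (illustrative snippet, not the actual PR code):

class FakeEffect {
  public get active() { return true }

  stop() {
    // this.active = false
    //   -> error TS2540: Cannot assign to 'active' because it is a read-only property.
    // The only way through is a cast at every write site:
    (this as any).active = false
  }
}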

Contributor:

Ah, I did not see that one coming.
I agree your original code is a lot better than using any in callers.
I see no way to "idiomatically" have TS infer that we have fields with default values stored on the prototype.

If we did things in a "less-JS" way, we would either mark those fields optional and provide the default value in the code that reads them, or always initialize the fields with the default value.
Not sure about the perf/memory impact of those approaches; it would need some benchmarking.

Always assigning a value to a field would increase memory usage but might lead to monomorphic code, which may execute faster. Not sure if it would be noticeable. You have a benchmark available, maybe you could try.
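
Sketches of those two alternatives (class and field names are illustrative):

// A: mark the field optional and apply the default wherever it is read.
class EffectA {
  allowRecurse?: boolean
  canRecurse() { return this.allowRecurse ?? false }
}

// B: always initialize the field, paying a little memory per instance in the
// hope that uniformly shaped instances give monomorphic (possibly faster) access.
class EffectB {
  allowRecurse = false
}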

Contributor Author:

I tried setting the properties in the constructor during development for the reason you mentioned. It did not affect performance at all, at least in Chromium. It might be different in other browsers, but probably not by much.

Because it really saved some memory, I decided to put them on the prototype.

jods4 (Contributor) commented Nov 14, 2020

@basvanmeurs Thank you very much for tackling this. ❤️
Reactive perf is at the heart of everything in Vue. Even small improvements will compound to larger benefits in big, intensive apps.

Like you, I noticed there was room for improvement in the reactivity package and wanted to create a PR. I couldn't manage to actually do it because I had too much work.
So I'll share one more idea that you may try out:

During a run, the dependency tracking logic simply throws out all previous dependencies and builds up new arrays and maps with the detected dependencies. I see you've optimized those, but it's still a non-trivial amount of allocations.
In busy systems with lots going on, that puts pressure on the GC.

A key observation is that most effects have a constant, or nearly constant, list of dependencies.

So prior art uses a different way to update those dependency lists:

  • Associate a run id with each registered effect dependency.
  • The run id is a simple global counter that is incremented every time an effect runs.
  • Dependencies are not reset on each run; instead:
      • Track with existing dependencies simply assigns the current run id but does nothing more.
      • Track on new dependencies adds them as usual, with the current run id.
      • At the end of the effect run, a clean-up phase iterates all dependencies and removes those whose run id is less than the current run (i.e. those that did not call track during this run).
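
A minimal sketch of how this run-id scheme could look (Source, Effect and activeEffect are illustrative names, not Vue's actual internals):

// Global counter, bumped once per effect run, and the currently running effect.
let runId = 0
let activeEffect: Effect | null = null

class Effect {
  // source -> run id at which this effect last tracked that source
  tracked = new Map<Source, number>()
  constructor(public fn: () => void) {}

  run() {
    const currentRun = ++runId
    const prev = activeEffect
    activeEffect = this
    try {
      this.fn()
    } finally {
      activeEffect = prev
      // Cleanup: anything that did not call track() during this run is stale.
      for (const [source, lastRun] of this.tracked) {
        if (lastRun < currentRun) {
          source.subscribers.delete(this)
          this.tracked.delete(source)
        }
      }
    }
  }
}

class Source {
  subscribers = new Set<Effect>()

  track() {
    const eff = activeEffect
    if (!eff) return
    if (!eff.tracked.has(this)) {
      this.subscribers.add(eff)       // new dependency: register as usual
    }
    eff.tracked.set(this, runId)      // existing or new: just stamp the current run id
  }

  trigger() {
    for (const eff of [...this.subscribers]) eff.run()
  }
}

// Usage: an effect whose dependency set changes between runs.
const a = new Source()
const b = new Source()
let useA = true
const e = new Effect(() => (useA ? a : b).track())
e.run()          // tracks a
useA = false
e.run()          // tracks b; the cleanup phase drops the stale link to a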

jods4 (Contributor) commented Nov 14, 2020

Looked at your code and there's another thing that you could do on top of my previous comment.

trigger currently snapshots the dependencies (i.e. copies them into an array), so that it's not impacted by their unlikely mutation.
With some additional cleverness the copy is not required.

When a triggered effect runs, other effects may be triggered in cascade and run on top of it, possibly adding or removing themselves from the dependency list currently being iterated.

I'll explain why the "concept" of adding or removing dependencies doesn't matter:

  • Newly added dependencies can be skipped based on having a higher run id than the value when trigger started.
  • Notice that "updated" dependencies, which would have run because of a nested trigger, also have an updated run id and will be skipped. And that's actually great and better than the snapshot option: since they already ran, there's no need to run them a second time, as they already saw this trigger's updated value.
  • By the same logic, removed dependencies can be ignored: since they just ran, there's no need to run them a second time, especially now that the trigger source is not a dependency of those effects anymore.

Removed dependencies can create a mutation problem in the array, though. If we're at index i and a dependency before i is spliced out, that would cause us to skip one element in the array, which is a bug.

This can be solved by not splicing removed dependencies, but rather setting them to null. That way, the array never shrinks and iteration won't skip any element.

One short-term benefit here is that setting an array item to null is more efficient than splicing, which must copy all subsequent items and is O(n).

One problem is that now we may have holes in the array, and over time that may become a memory leak if a dependency is added/removed in a cyclic fashion.
That problem can be solved in two ways:

  1. Create a global Set of arrays that need compacting. Use requestIdleCallback to schedule compaction of all those arrays at a time when nothing happens (neither our code nor user activity). That's great because it moves the array-shrinking work out of the immediate feedback loop into a time where we don't care about CPU.
    Not all browsers have requestIdleCallback (https://caniuse.com/?search=requestIdleCallback); we can fall back to end-of-JS-turn for the old browsers. Same behavior, at a slightly less convenient time.

  2. I like option 1 better, but another design is to keep the index of the first free slot in each dependencies array. Instead of pushing new deps, those slots are reused by doing deps[freeSlot] = newDep, which also works past the end of the array. Once added, the next freeSlot must be found by iterating down the array.

All of this might sound a bit tricky, but if you abstract dependency-list management into a single class, module, or derived array, then it's self-contained and well encapsulated, and the calling code stays clean.
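
A sketch of option 1 above, the null-slot plus idle-compaction idea (requestIdleCallback and the setTimeout fallback are real browser APIs; everything else is illustrative):

// Dependency lists that never splice: removed entries become null "holes",
// and hole-y arrays are compacted later, off the hot path.
const needsCompaction = new Set<Array<object | null>>()
let compactionScheduled = false

function removeDep(deps: Array<object | null>, dep: object) {
  const i = deps.indexOf(dep)
  if (i >= 0) {
    deps[i] = null               // O(1); indices of ongoing iterations stay valid
    needsCompaction.add(deps)
    scheduleCompaction()
  }
}

function scheduleCompaction() {
  if (compactionScheduled) return
  compactionScheduled = true
  const run = () => {
    compactionScheduled = false
    for (const deps of needsCompaction) {
      let write = 0
      for (let read = 0; read < deps.length; read++) {
        if (deps[read] !== null) deps[write++] = deps[read]
      }
      deps.length = write        // shrink in place, dropping the holes
    }
    needsCompaction.clear()
  }
  // Compact when the browser is idle; fall back to the end of the current turn.
  if (typeof requestIdleCallback === 'function') requestIdleCallback(run)
  else setTimeout(run, 0)
}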

basvanmeurs (Contributor, Author) commented Nov 16, 2020

Hi @jods4

Thanks for your review and feedback!

I also realized the importance of improving the performance on this, as in my own app I found that too much time was spent in triggering and tracking stuff.

Let me first respond to your first idea (effect ids).

During a run, the dependency tracking logic is simply throwing out all previous dependencies, and builds up new arrays, maps, with the detected dependencies. I see you've optimized those, but it's still a non-trivial amount of allocations.
In busy systems with lots going on, that puts pressure on the GC.

That's a claim that can be easily investigated with my vue-next-benchmarks ;-) I've used my own PR for benchmarking.

Benchmarks

The 'computed: write ref, read 1000 computeds' benchmark is a good test to see the impact of this tracking/triggering in practice. The code looks like this:

() => {
    const v = ref(100);
    const computeds = [];
    for (let i = 0, n = 1000; i < n; i++) {
      const c = computed(() => {
        return v.value * 2
      });
      const cv = c.value;
      computeds.push(c);
    }
    let i = 0;
    return suite.add("write ref, read 1000 computeds", () => {
      v.value = i++;
      computeds.forEach(c => c.value);
    });
  }

Of the total time, about 30% is spent in the 'cleanup' of effects, 18% (surprisingly, less than the cleanup) is spent in trackRefValue, and 3% is spent in triggerRefValue. More than half of the total time is spent on the process that you described. GC only takes about 1 to 2%, by the way. Chrome is very good at reusing those many small temporary arrays and sets efficiently.

What if we reverse the test, having a single computed with many (1000) dependencies?

() => {
    const refs = [];
    for (let i = 0, n = 1000; i < n; i++) {
      refs.push(ref(i));
    }
    const c = computed(() => {
      let total = 0;
      refs.forEach(ref => total += ref.value);
      return total;
    });
    let i = 0;
    const n = refs.length;
    return suite.add("1000 refs, 1 computed", () => {
      refs[i++ % n].value++;
      const v = c.value;
    });
  }

This test is twice as fast as the first one, but cleanup now takes 47% of all time! trackEffects now takes 18%, while triggerRefValue only amounts to 0.5%.

Looking at reactive objects (reactiveObject: write reactive obj, read 1000 computeds):

() => {
    const r = reactive({a: 1});
    const computeds = [];
    for (let i = 0, n = 1000; i < n; i++) {
      const c = computed(() => {
        return r.a * 2
      });
      computeds.push(c);
    }
    let i = 0;
    return suite.add("write reactive obj, read 1000 computeds", () => {
      r.a = i++;
      computeds.forEach(c => c.value);
    });
  }

There are 1000 computeds referencing a single reactive property.

The figures here are 18% cleanup, 17% tracking, 6% triggering. Tracking/triggering is relatively more expensive here because of the dependency sub-set selection, and the relative time spent in tracking/triggering is lower because the proxy itself causes a lot of overhead as well. Still, it amounts to 41%.

Profiling results for the reactiveObject:1000 reactive objs, 1 computed test are:
21% cleanup, 17% tracking and only 0.2% triggering. GC is again below 2%.

Reactivity benchmark conclusions

So I think we can roughly gain about 50% performance by avoiding the deps cleanup/retrack process. Very significant, and way too much to ignore if you ask me, especially as it is, as you mentioned, at the heart of everything.

As GC time is not significant, I think we shouldn't focus on preventing the allocation of new Sets and arrays. We should focus on preventing the tracking and cleanup work, or at least making it smarter, and then re-run the performance tests to see where we're at :-)

Cleanup performance

You mention some interesting ideas. To tell you the truth, I had an idea of my own, but it relied on the order of tracked effects, which complicated things.

Will your ideas help performance? I believe that even checking whether an effect already exists in a set takes about as much time as actually adding it, so your idea will not help us much in terms of tracking performance. But not having to clean up beforehand will really help a lot, based on the profiling information: up to 20 to 30%.

Another risk is introducing (hard-to-find) bugs. Your algorithm is simple, but we have to take into account that effects are triggered recursively. So during the run of an effect, other effects will be run as well, and the run id is incremented and decremented multiple times before the cleanup takes place.

Let's say we have effect A. It's run, so we record the current run id and invoke the effect. If one of the original dependencies of effect A (let's call it B) is no longer used by A, but is still used by one of the recursively triggered effects of A, then B will still be registered as a dependency of A because its run id is higher than A's run id. The end result is that A 'falsely' retains its dependency on B.

We could fix this by maintaining an array of run ids, but that has performance and complexity downsides. So let's take a step back: is it a problem that A retains the dependency on B? First off, this is really an edge case. And as far as I can tell, it's not really a problem, because when that dependency is triggered, it would recursively trigger the subject as well anyway. It won't lead to memory leaks, as it will be removed once it is no longer a dependency of a dependency of A.

This idea is quite easy to implement and test. It will cost only a couple of lines of code and gain us 20 to 30% in performance. I think that's worth it.

Having said all that, I must also share my disappointment. Since the release of vue-next, it seems that all progress on the core has come to a halt. I'm reluctant to spend any time on implementing this or other improvements right now, as long as this PR is not merged; I would only be making the merge more difficult if I did. And there's no guarantee that it will be merged at all.

This project is great, but now is the time to streamline the core. If we don't do it now, it will only become more and more difficult.

jods4 (Contributor) commented Nov 16, 2020

Your algorithm is simple, but we have to take into account that effects are triggered recursively. So during the run of an effect, other effects will be run as well, and the run id is incremented and decremented multiple times before the cleanup takes place.

To clarify: the run id is a monotonic sequence, always increasing.
At the beginning of each run you increment by one and take note of the current value. You never decrease its value.

Effects are not re-entrant. The code checks if the effect is already on the stack and doesn't run it a second time.
Different effects can stack, but each effect has its own distinct dependencies (internally, of course; two effects may depend on the same source), so there's no overlap between two effects.
The technique described in my first comment (run ids to remove unused deps) was not invented by me. I stole it from Steven Sanderson, who used it in Knockout, where it's been working for years. My second comment is more uncharted territory.

Reduces memory usage for (computed) refs by 25% and improves creation performance
Move out of the ReactiveEffectOptions in an effort to remove the necessity of creating this object for most use cases, saving memory and improving performance.
Move several options to params, so that an object is not required for watchers or computeds. This reduces memory. On top of that, place some optional params on the prototype unless non-default.
There's no point in underscoring the effect property of the runner.
…body

1. One variable less on the stack
2. Smaller functions mean less time to optimize
Out-of-bounds access should always be avoided, for performance reasons. In practice, some benchmarks (especially 'write ref, read 1000 computeds') showed a significant (30%) improvement after this change.
Explain that refs store their deps in a local property.
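
For illustration, a sketch of what "refs store their deps in a local property" means (names approximated, not the PR's exact code):

// Instead of looking the dep set up in a global WeakMap keyed by the ref,
// the ref object carries its subscribers directly in an own property.
class SketchRef<T> {
  dep?: Set<object>                  // created lazily on first track
  constructor(private _value: T) {}

  get value() {
    // trackRefValue(this): register the running effect straight into this.dep,
    // e.g. (this.dep ??= new Set()).add(activeEffect) in the real implementation
    return this._value
  }
  set value(v: T) {
    this._value = v
    // triggerRefValue(this): iterate this.dep and run/schedule each effect
  }
}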
@yyx990803 yyx990803 added the ❗ p4-important Priority 4: this fixes bugs that violate documented behavior, or significantly improves perf. label Dec 4, 2020
@LinusBorg LinusBorg added this to Planned in Next Patch Feb 1, 2021
@LinusBorg LinusBorg moved this from Planned to Guidance needed in Next Patch Feb 1, 2021
@LinusBorg LinusBorg removed this from Guidance needed in Next Patch Feb 5, 2021
@LinusBorg LinusBorg added the need guidance The approach/solution in the PR is unclear and requires guidance from maintainer to proceed further. label Feb 6, 2021
@yyx990803 yyx990803 added this to In progress in 3.2.0 via automation Feb 25, 2021
@basvanmeurs (Contributor, Author)

@yyx990803 would you prefer a new PR which only includes the performance improvements that don't affect the API?

@yyx990803 yyx990803 changed the base branch from master to 3.2 June 22, 2021 19:24
@yyx990803 (Member)

@basvanmeurs yes, I'm actually reviewing the changes for 3.2 right now and it would be great to have a new PR with a perf-only changeset (and rebased).

Also, I apologize for letting this sit for so long - and really appreciate your patience. Thanks again!

@basvanmeurs (Contributor, Author)

Thanks Evan. See #3995

3.2.0 automation moved this from In progress to Done Jun 23, 2021
yyx990803 added a commit that referenced this pull request Jun 24, 2021
Based on #2345 , but with smaller API change

- Use class implementation for `ReactiveEffect`
- Switch internal creation of effects to use the class constructor
- Avoid options object allocation
- Avoid creating bound effect runner function (used in schedulers) when not necessary.
- Consumes ~17% less memory compared to last commit
- Introduces a very minor breaking change: the `scheduler` option passed to `effect` no longer receives the runner function.
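
Roughly, the class-based creation pattern this commit describes looks like this (the constructor signature is approximated from the bullet points, not copied from the commit):

import { ReactiveEffect, ref } from '@vue/reactivity'

const count = ref(0)
const queue: ReactiveEffect[] = []

// Construct the class directly and pass the scheduler as a constructor argument
// instead of an options object, then invoke the effect via .run().
// Per the breaking change above, the scheduler no longer receives the runner.
const eff: ReactiveEffect = new ReactiveEffect(
  () => console.log(count.value),
  () => queue.push(eff),
)
eff.run()        // runs the effect once and tracks count
count.value++    // the scheduler fires: eff is queued instead of re-run immediately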

yyx990803 added a commit that referenced this pull request Jul 1, 2021
yyx990803 added a commit that referenced this pull request Jul 5, 2021
yyx990803 added a commit that referenced this pull request Jul 6, 2021
yyx990803 added a commit that referenced this pull request Jul 15, 2021
yyx990803 added a commit that referenced this pull request Jul 16, 2021