Make ScaleContinuous$map() faster
#5513
Closed
These are 2 small changes:

- Running `palette(x)` directly is faster than running `palette()` on only the unique values of `x` and then matching all values back to those.
- Using `vec_assign()` instead of `ifelse()` is faster and is the main speed boost.

In the benchmarks below, 'classic' is the current algorithm, 'direct' replaces the matching with `palette(x)`, and 'assigned' is the method proposed in this PR.

To throw a single number at it, the `ScaleContinuous$map()` method in this PR is ~3.7x faster than the current one at 100,000 values to map. Memory allocation is also 20-25% lower.

If we have mostly unique values, `palette(x)` has an advantage over the current algorithm. If there are only 10 possible unique values, the `palette(x)` method is no longer faster, but on par with the current method. Replacing the matching strategy therefore seems beneficial regardless of how unique the data is.

Created on 2023-11-08 with reprex v2.0.2
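As a minimal sketch of the two strategies being compared (not the actual PR diff; `pal`, `na_value`, `map_classic`, and `map_assigned` are illustrative stand-ins for the ggplot2 internals):

```r
library(vctrs)

# Illustrative palette that, like scales palettes, returns NA for NA
# input; `na_value` stands in for `self$na.value`.
pal <- function(x) {
  out <- rep(NA_character_, length(x))
  ok <- !is.na(x)
  out[ok] <- grDevices::grey(x[ok])
  out
}
na_value <- "grey50"

# 'classic' (current): palette on unique values only, match all values
# back, then fill NA slots with ifelse().
map_classic <- function(x) {
  uniq <- unique(x)
  scaled <- pal(uniq)[match(x, uniq)]
  ifelse(!is.na(scaled), scaled, na_value)
}

# 'assigned' (this PR): palette on all values directly, then replace
# the NA slots in one vectorised assignment.
map_assigned <- function(x) {
  scaled <- pal(x)
  vec_assign(scaled, is.na(scaled), na_value)
}

x <- c(0, 0.25, NA, 0.25, 1)
identical(map_classic(x), map_assigned(x))
```

Both versions produce the same mapping; the `vec_assign()` route avoids `ifelse()`'s extra allocations by only touching the positions that hold `NA`.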
A caveat to these benchmarks is that there are no `NA`s to replace, but I've checked that the `vec_assign()` advantage persists when there are actual `NA` values.
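A check along those lines can be sketched as follows (the NA-propagating `pal` and `na_value` are the same illustrative stand-ins as above, restated so the block is self-contained):

```r
library(vctrs)

# Illustrative NA-propagating palette and placeholder for self$na.value.
pal <- function(x) {
  out <- rep(NA_character_, length(x))
  out[!is.na(x)] <- grDevices::grey(x[!is.na(x)])
  out
}
na_value <- "grey50"

set.seed(42)
x <- runif(1e5)
x[sample(length(x), 1e3)] <- NA  # inject actual NA values
scaled <- pal(x)

# Both NA-replacement strategies must agree on the result.
old <- ifelse(!is.na(scaled), scaled, na_value)
new <- vec_assign(scaled, is.na(scaled), na_value)
stopifnot(identical(old, new))

# Rough timing of just the NA-replacement step:
system.time(for (i in 1:100) ifelse(!is.na(scaled), scaled, na_value))
system.time(for (i in 1:100) vec_assign(scaled, is.na(scaled), na_value))
```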