perf: unsafe stringify now uses concatenation #676

Cadienvan · 2023-01-09T19:55:28Z

The only change made was moving away from the + operator, using string concatenation instead.
An invisible, yet valuable change IMO.

The explanation is pretty simple: The + operator creates a new string on each operation, which can be inefficient when concatenating a large number of strings, or in this case, concatenating different "chunks" multiple times.

@broofa I just saw your comment on my other pull request, yet I was already preparing this PR, so please let me know if this can be valid or performance isn't taken in consideration at all even in these cases, I'll stop going for it if it is the case ofc :)

Before:

Starting. Tests take ~1 minute to run ...
uuid.stringify() x 1,599,033 ops/sec ±1.74% (83 runs sampled)
uuid.parse() x 1,668,695 ops/sec ±2.76% (90 runs sampled)
---

uuid.v1() x 1,053,405 ops/sec ±1.34% (89 runs sampled)
uuid.v1() fill existing array x 2,140,536 ops/sec ±1.81% (92 runs sampled)
uuid.v4() x 8,738,755 ops/sec ±1.92% (86 runs sampled)
uuid.v4() fill existing array x 2,978,811 ops/sec ±0.34% (92 runs sampled)
uuid.v3() x 156,645 ops/sec ±2.54% (70 runs sampled)
uuid.v5() x 177,174 ops/sec ±2.65% (68 runs sampled)
Fastest is uuid.v4()

After:

Starting. Tests take ~1 minute to run ...
uuid.stringify() x 1,726,349 ops/sec ±0.56% (91 runs sampled)
uuid.parse() x 1,704,779 ops/sec ±0.50% (95 runs sampled)
---

uuid.v1() x 1,089,921 ops/sec ±0.34% (93 runs sampled)
uuid.v1() fill existing array x 2,210,764 ops/sec ±0.39% (89 runs sampled)
uuid.v4() x 9,479,529 ops/sec ±2.45% (86 runs sampled)
uuid.v4() fill existing array x 3,063,681 ops/sec ±1.19% (96 runs sampled)
uuid.v3() x 156,433 ops/sec ±2.42% (73 runs sampled)
uuid.v5() x 173,674 ops/sec ±2.46% (71 runs sampled)
Fastest is uuid.v4()

broofa · 2023-01-09T20:47:51Z

This is more in line with the sort of change I'd be willing to consider (relatively modest impact on codebase). Running the benchmarks on my own system, for both node as well as Chrome and Firefox, I see similar numbers.

(You can run the benchmarks in the browser via npm run build && npx http-server -o examples/benchmark/benchmark.html)

The + operator creates a new string on each operation

Do you have a link or article explaining why the += is faster than + operator? Doesn't += also create a new string?

@ctavan @Rush @LinusU : Thoughts? I'm on the fence here. This is a ~8% perf gain, with minimal code change. But as I said to @Cadienvan in his other PR, I'm increasingly leery of making changes simply for the sake of performance.

Cadienvan · 2023-01-09T21:09:59Z

I'm sorry I can't find any specific article about it, yet I'm sure it works like that as I struggled with some performance issues on a Node application I was working on and this solved the issue. I had a look (We are talking about more than a year ago) at some source code but can't find it right now.

Here's a proper explanation of the phenomenon:

In JavaScript, the + operator is used for concatenation when talking about strings.

Concatenation involves creating a new string by combining the two operands (+ and =), which requires allocating new memory and copying the strings into the new memory.

On the other hand, the += operator is used specifically for concatenation. It works by taking the left operand (a string), appending the right operand to it, and then assigning the resulting string back to the left operand. This can be more efficient than using the + operator for concatenation, because it avoids the need to allocate new memory and copy the strings. Instead, it modifies the existing string in-place.

I'm pretty sure using something like node-clinic can provide some information about it, but never used it so I can't tell for sure.

broofa · 2023-01-09T21:51:47Z

struggled with some performance issues on a Node application

I'd be curious to know how this change impacts perf on non-Node (V8) runtimes.

Cadienvan · 2023-01-09T22:03:07Z

I have to be honest, tried some popular browsers with small benchmarks out of curiosity, couldn't find anything relevant as whichever engine is involved, the worst it can happen is that both operations create a new string as performances always got better or stayed equal, never gone down.

Rush · 2023-01-09T22:27:10Z

@broofa I will share my thoughts since I am highlighted and I did make performance contributions in the past :)

Software is not made in a vacuum and it does not run on abstract machines but on real hardware and in terms of JS it runs on larger software stacks (V8, Mozilla).

Just like the Linux kernel optimizes its code to run more efficiently on real, tangible hardware it is our responsibility as JS developers to write code that runs efficiently on JS engines that people use in real life.

So yes, an 8% improvement - if confirmed on multiple hardware/JS engine setups sounds meaningful enough to invest in.

LinusU · 2023-01-10T08:52:49Z

It seems like this is ~27% slower on Safari 16.2 on an M1 MacBook Air. On Chrome 108.0.5359.124 (Official Build) (arm64) on the same computer, it's ~8% slower.

I've created a test case that anyone can run on JSBench:

https://jsbench.me/j6lcpzr2u3/1

Cadienvan · 2023-01-10T09:32:19Z

It isn't the case on my M1 Max (Chrome 108.0.5359.124 as yours) where the new version is faster, yet of course I'll accept whatever you decide.

broofa · 2023-01-10T16:59:16Z

Thanks @LinusU. I ran your test case on a variety of browsers using BrowserStack. Results below.

Note: JSBench and the uuid benchmark test both use BenchmarkJS.

Edge, Chrome, Yandex, and Opera all use Chromium (and therefore V8, I assume?). Firefox uses SpiderMonkey. Safari uses Nitro.

This benchmark is consistently slower on all platforms. Which is ... interesting, since uuids built-in test shows it as being faster. FWIW, I swapped the tests around on JSBench to make sure run order wasn't a factor (it wasn't).

I gotta get to work here, but I'd like to understand why we seem to be getting different results between JSBench and the built-in test before moving forward with this.

Edge - 4.9% slower:

Firefox - 5.4% slower:

Chrome - 3.8% slower:

Safari - 26.8% slower. 'Tested on my laptop because BrowserStack's Safari refused to load the JSBench site for some reason:

Yandex - Didn't load on BrowserStack, and I'm not about to install it on my laptop. Uses Chromium though, so 🤷

Opera - 2% slower, not providing a screenshot as I don't consider it a supported platform. Uses Chromium, too, so 🤷

Cadienvan · 2023-01-10T18:34:54Z

I guess Node acts differently than the V8 engine itself, it's the only thing I can think of. Now the question is: how impactful is the browser usage in such a library?

LinusU · 2023-01-11T16:56:17Z

btw. I don't think that this is correct:

This can be more efficient than using the + operator for concatenation, because it avoids the need to allocate new memory and copy the strings. Instead, it modifies the existing string in-place.

I'm personally surprised that the new code is faster (if it is), since I'd assume that it'd be easier for the compiler/runtime to optimize the older code. Not that it shouldn't be able to optimize both codes into the same instructions for the computer...

That it isn't a question of modifying in place or not can be seen in this simple example:

> a = "hello"
'hello'
> b = a
'hello'
> b += ', world!'
'hello, world!'
> a
'hello'
> b
'hello, world!'

Strings are treated as "primitives" in JavaScript, which means that the runtime will have to use a "copy on write" scheme and keep track of which strings can be modified in place, and which must first be copied to a new location in the memory.

That being said, I believe that the only way to know is to measure so it doesn't really matter 😄

When seeing the initial post I was already a bit skeptical about the results/our benchmarks since some numbers that shouldn't have changed changed (e.g. parse) and some things that should have changed didn't (e.g. v3, v5 that got marginally slower). Maybe this is normal fluctuation though, but I wanted to measure just the stringifying.

So I brought out my favourite benchmark tool, hyperfine, and pasted the code into two separate files:

v1.js

// Code from "Setup JavaScript"

for (let i = 0; i < 4_000_000; i++) {
  unsafeStringifyV1(arr0)
  unsafeStringifyV1(arr1)
  unsafeStringifyV1(arr2)
  unsafeStringifyV1(arr3)
  unsafeStringifyV1(arr4)
  unsafeStringifyV1(arr5)
  unsafeStringifyV1(arr6)
  unsafeStringifyV1(arr7)
  unsafeStringifyV1(arr8)
  unsafeStringifyV1(arr9)
}

v2.js

// Code from "Setup JavaScript"

for (let i = 0; i < 4_000_000; i++) {
  unsafeStringifyV2(arr0)
  unsafeStringifyV2(arr1)
  unsafeStringifyV2(arr2)
  unsafeStringifyV2(arr3)
  unsafeStringifyV2(arr4)
  unsafeStringifyV2(arr5)
  unsafeStringifyV2(arr6)
  unsafeStringifyV2(arr7)
  unsafeStringifyV2(arr8)
  unsafeStringifyV2(arr9)
}

Then I ran this with Node.js v18.12.1 on my arm64 M1 MacBook Air:

$ hyperfine --warmup 1 'node v1.js' 'node v2.js'
Benchmark 1: node v1.js
  Time (mean ± σ):      5.424 s ±  0.202 s    [User: 5.264 s, System: 0.055 s]
  Range (min … max):    5.313 s …  5.993 s    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: node v2.js
  Time (mean ± σ):      5.964 s ±  0.007 s    [User: 5.794 s, System: 0.058 s]
  Range (min … max):    5.953 s …  5.974 s    10 runs
 
Summary
  'node v1.js' ran
    1.10 ± 0.04 times faster than 'node v2.js'

Note that there were statistical outliers even with a warmup. I believe that this is because I'm running this on my laptop on battery 😅

I did run this multiple time (changing 4_000_000 to different values, I started at 100_000 but the difference was too small, and 4_000_000 was the longest I had time to wait right now) and v1 was always faster.

Now the question is: how impactful is the browser usage in such a library?

It would be great to have some numbers, but I think that there are many many people using this in browsers. In fact, since all LTS versions of Node.js now have crypto.randomUUID (uuid v4, which I believe is the most popular) built in, then maybe even more people than use it in Node.js.

I also run the npm run test:benchmark scripts on my computer and this is the results:

Name	Before change	After change
uuid.stringify()	x 4,494,323 ops/sec ±0.14%	x 4,478,745 ops/sec ±0.16%
uuid.parse()	x 4,021,155 ops/sec ±0.05%	x 3,997,423 ops/sec ±0.15%
uuid.v1()	x 4,612,946 ops/sec ±0.18%	x 4,499,286 ops/sec ±0.16%
uuid.v1() fill existing array	x 10,069,463 ops/sec ±0.13%	x 10,048,499 ops/sec ±0.16%
uuid.v4()	x 22,898,259 ops/sec ±1.66%	x 22,940,018 ops/sec ±1.73%
uuid.v4() fill existing array	x 8,156,566 ops/sec ±0.23%	x 7,539,778 ops/sec ±0.19%
uuid.v3()	x 560,691 ops/sec ±0.28%	x 643,689 ops/sec ±0.44%
uuid.v5()	x 665,117 ops/sec ±0.43%	x 671,356 ops/sec ±0.42%

Stringify is marginally slower, but then "uuid.v4() fill existing array" is a lot slower so that's a bit strange 🤔

And then v3 is a lot faster, and v5 is marginally faster, so that's very interesting 🤔

Cadienvan · 2023-01-11T18:18:49Z

This is starting to become a scientific paper ahah I like it, it seems like even specific CPUs or combinations of specs are influencing this benchmark as running those again on my pc still seems to prove the changed version is faster.

LinusU · 2023-01-11T19:25:54Z

This is starting to become a scientific paper ahah I like it, it seems like even specific CPUs or combinations of specs are influencing this benchmark as running those again on my pc still seems to prove the changed version is faster.

Haha, yeah, it's amazing!

Would you be able to run the hyperfine test that I specified above? I'm really interested in seeing what it yields on your computer!

I also found another possible code change whilst looking at this, and some very interesting behaviour regarding toLowerCase and Regex#test on V8! 👉 #677

broofa · 2023-01-11T19:27:00Z

Thanks @LinusU. As always, your insight is very helpful.

Note: I think you mixed up the output in your "simple example", above:

> a
'hello, world!' // <-- should be 'hello'
> b
'hello' // <-- should be 'hello, world!'

LinusU · 2023-01-11T19:29:32Z

Note: I think you mixed up the output in your "simple example", above:

Argh, yeah I did, thanks for pointing it out. I originally wrote it wrong in my terminal, and then figured I'd just fix it in the terminal, but I should have re-run it! 😅

I've updated the comment!

Cadienvan · 2023-01-12T09:16:48Z

Would you be able to run the hyperfine test that I specified above? I'm really interested in seeing what it yields on your computer!

@LinusU I'd love to, are you able to provide the exact gist you tested so we can do some tests?

jeremyfiel · 2023-02-08T18:54:09Z

@mcollina this would be a great stream topic!

broofa · 2023-10-11T23:26:29Z

@Cadienvan, my apologies for letting this stagnate. Having just had to revert a prior change to this code due to the impact on memory allocation, I'm going to say that the code as it stands is "good enough" and try to avoid further bit-swizzling on it.

perf: unsafe stringify now uses concatenation

f87d878

LinusU mentioned this pull request Jan 11, 2023

perf: remove superfluous call to toLowerCase #677

Merged

broofa closed this Oct 11, 2023

Cadienvan deleted the perf/stringify-improved branch October 12, 2023 09:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: unsafe stringify now uses concatenation #676

perf: unsafe stringify now uses concatenation #676

Cadienvan commented Jan 9, 2023 •

edited

broofa commented Jan 9, 2023

Cadienvan commented Jan 9, 2023

broofa commented Jan 9, 2023

Cadienvan commented Jan 9, 2023

Rush commented Jan 9, 2023

LinusU commented Jan 10, 2023

Cadienvan commented Jan 10, 2023

broofa commented Jan 10, 2023

Cadienvan commented Jan 10, 2023

LinusU commented Jan 11, 2023 •

edited

Cadienvan commented Jan 11, 2023

LinusU commented Jan 11, 2023

broofa commented Jan 11, 2023

LinusU commented Jan 11, 2023 •

edited

Cadienvan commented Jan 12, 2023

jeremyfiel commented Feb 8, 2023

broofa commented Oct 11, 2023

perf: unsafe stringify now uses concatenation #676

perf: unsafe stringify now uses concatenation #676

Conversation

Cadienvan commented Jan 9, 2023 • edited

broofa commented Jan 9, 2023

Cadienvan commented Jan 9, 2023

broofa commented Jan 9, 2023

Cadienvan commented Jan 9, 2023

Rush commented Jan 9, 2023

LinusU commented Jan 10, 2023

Cadienvan commented Jan 10, 2023

broofa commented Jan 10, 2023

Cadienvan commented Jan 10, 2023

LinusU commented Jan 11, 2023 • edited

Cadienvan commented Jan 11, 2023

LinusU commented Jan 11, 2023

broofa commented Jan 11, 2023

LinusU commented Jan 11, 2023 • edited

Cadienvan commented Jan 12, 2023

jeremyfiel commented Feb 8, 2023

broofa commented Oct 11, 2023

Cadienvan commented Jan 9, 2023 •

edited

LinusU commented Jan 11, 2023 •

edited

LinusU commented Jan 11, 2023 •

edited