
Improve performance by using v8 serialization #150

Closed · wants to merge 7 commits

Conversation

@onigoetz (Contributor) commented Dec 27, 2023

Hi @JounQin

As promised, here is a PR that uses v8 serialization.

I tried using the same benchmark as before, and it does indeed improve the performance, but not by as much as I hoped:

❯ node benchmark/benchmark.mjs
Load time
┌─────────┬────────────────────────┬───────────┬────────────────────┬───────────┬─────────┐
│ (index) │       Task Name        │  ops/sec  │ Average Time (ns)  │  Margin   │ Samples │
├─────────┼────────────────────────┼───────────┼────────────────────┼───────────┼─────────┤
│    0    │        'native'        │ '648 285' │ 1542.5299744942151 │ '±10.87%' │  64830  │
│    1    │ 'sync-threads (PR #3)' │ '921 905' │ 1084.7097943601377 │ '±2.76%'  │  92191  │
│    2    │ 'sync-threads (1.0.1)' │ '919 676' │ 1087.3388626049523 │ '±1.95%'  │  91968  │
│    3    │   'synckit (0.8.8)'    │ '987 481' │ 1012.6775366805905 │ '±0.82%'  │  98749  │
│    4    │     'synckit (PR)'     │ '957 444' │ 1044.4468048460658 │ '±1.75%'  │  95745  │
└─────────┴────────────────────────┴───────────┴────────────────────┴───────────┴─────────┘
Execution time
┌─────────┬────────────────────────┬───────────┬────────────────────┬───────────┬─────────┐
│ (index) │       Task Name        │  ops/sec  │ Average Time (ns)  │  Margin   │ Samples │
├─────────┼────────────────────────┼───────────┼────────────────────┼───────────┼─────────┤
│    0    │        'native'        │ '148 628' │ 6728.164121295051  │ '±2.96%'  │  14863  │
│    1    │ 'sync-threads (PR #3)' │ '11 446'  │  87365.1000610085  │ '±6.94%'  │  1145   │
│    2    │ 'sync-threads (1.0.1)' │   '795'   │ 1256925.8536821532 │ '±42.65%' │   81    │
│    3    │   'synckit (0.8.8)'    │  '1 659'  │ 602447.0128208757  │ '±34.16%' │   166   │
│    4    │     'synckit (PR)'     │  '2 249'  │ 444501.6712612576  │ '±11.02%' │   225   │
└─────────┴────────────────────────┴───────────┴────────────────────┴───────────┴─────────┘
Total time
┌─────────┬────────────────────────┬──────────┬────────────────────┬───────────┬─────────┐
│ (index) │       Task Name        │ ops/sec  │ Average Time (ns)  │  Margin   │ Samples │
├─────────┼────────────────────────┼──────────┼────────────────────┼───────────┼─────────┤
│    0    │        'native'        │ '94 942' │ 10532.723105914218 │ '±3.71%'  │  9495   │
│    1    │ 'sync-threads (PR #3)' │ '13 445' │ 74373.32819828756  │ '±1.95%'  │  1345   │
│    2    │ 'sync-threads (1.0.1)' │ '1 419'  │ 704270.0790223622  │ '±45.77%' │   168   │
│    3    │   'synckit (0.8.8)'    │ '1 340'  │ 745890.9609738517  │ '±59.90%' │   136   │
│    4    │     'synckit (PR)'     │ '2 799'  │ 357143.01722390315 │ '±7.90%'  │   280   │
└─────────┴────────────────────────┴──────────┴────────────────────┴───────────┴─────────┘

As you can see, the performance is not as good as what I was able to get on sync-threads.

I did not do any advanced benchmarking yet, but I have one or two leads:

  • The amount of code/modules to load might play a role; I've seen it have quite an impact on benchmarks.
  • If it were possible to reuse the SharedArrayBuffer between runs instead of initializing a new one each time, I have the impression we could squeeze more performance out of it.
  • The content of the serialized message differs between sync-threads and synckit (one more object wrapping the message); maybe this has an impact?

I can make a few more tests, but I'll have to do that a bit later.

changeset-bot commented Dec 27, 2023

⚠️ No Changeset found

Latest commit: 5f5e515

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types



codesandbox-ci bot commented Dec 27, 2023

This pull request is automatically built and testable in CodeSandbox.


@JounQin (Member) commented Dec 27, 2023

> I can make a few more tests but I'll have to do that a bit later

Don't push yourself too hard. ❤️

@onigoetz (Contributor, Author)

Made another change which seems to bring a nice boost

❯ node benchmark/benchmark.mjs
Load time
┌─────────┬───────────────────────────────┬───────────┬────────────────────┬──────────┬─────────┐
│ (index) │           Task Name           │  ops/sec  │ Average Time (ns)  │  Margin  │ Samples │
├─────────┼───────────────────────────────┼───────────┼────────────────────┼──────────┼─────────┤
│    0    │           'native'            │ '792 745' │ 1261.4390091428393 │ '±4.54%' │  79275  │
│    1    │    'sync-threads (PR #3)'     │ '952 414' │ 1049.9627944495455 │ '±1.50%' │  95242  │
│    2    │    'sync-threads (1.0.1)'     │ '951 001' │ 1051.5226540488425 │ '±1.25%' │  95101  │
│    3    │       'synckit (0.8.8)'       │ '989 859' │ 1010.243983666929  │ '±2.25%' │  98986  │
│    4    │        'synckit (PR)'         │ '953 254' │ 1049.0382515278395 │ '±2.70%' │  95326  │
│    5    │ 'synckit (PR, 2nd iteration)' │ '979 876' │ 1020.5364501753178 │ '±0.80%' │  97988  │
└─────────┴───────────────────────────────┴───────────┴────────────────────┴──────────┴─────────┘
Execution time
┌─────────┬───────────────────────────────┬───────────┬────────────────────┬───────────┬─────────┐
│ (index) │           Task Name           │  ops/sec  │ Average Time (ns)  │  Margin   │ Samples │
├─────────┼───────────────────────────────┼───────────┼────────────────────┼───────────┼─────────┤
│    0    │           'native'            │ '147 791' │ 6766.295110421187  │ '±1.29%'  │  14780  │
│    1    │    'sync-threads (PR #3)'     │ '11 327'  │ 88277.88326872836  │ '±3.01%'  │  1133   │
│    2    │    'sync-threads (1.0.1)'     │   '823'   │ 1214413.6658634047 │ '±42.21%' │   83    │
│    3    │       'synckit (0.8.8)'       │  '2 028'  │  493011.089381326  │ '±29.53%' │   203   │
│    4    │        'synckit (PR)'         │  '2 144'  │ 466257.3925284452  │ '±7.59%'  │   215   │
│    5    │ 'synckit (PR, 2nd iteration)' │  '5 626'  │ 177726.22811434325 │ '±12.70%' │   563   │
└─────────┴───────────────────────────────┴───────────┴────────────────────┴───────────┴─────────┘
Total time
┌─────────┬───────────────────────────────┬───────────┬───────────────────┬───────────┬─────────┐
│ (index) │           Task Name           │  ops/sec  │ Average Time (ns) │  Margin   │ Samples │
├─────────┼───────────────────────────────┼───────────┼───────────────────┼───────────┼─────────┤
│    0    │           'native'            │ '116 544' │ 8580.42560284458  │ '±2.94%'  │  11655  │
│    1    │    'sync-threads (PR #3)'     │ '12 995'  │ 76948.63906273474 │ '±1.69%'  │  1300   │
│    2    │    'sync-threads (1.0.1)'     │  '1 396'  │ 715948.3111496513 │ '±39.84%' │   141   │
│    3    │       'synckit (0.8.8)'       │  '1 704'  │ 586677.8552183631 │ '±18.25%' │   171   │
│    4    │        'synckit (PR)'         │  '1 678'  │ 595893.0566197351 │ '±25.07%' │   168   │
│    5    │ 'synckit (PR, 2nd iteration)' │  '1 479'  │ 676000.0064049239 │ '±29.49%' │   162   │
└─────────┴───────────────────────────────┴───────────┴───────────────────┴───────────┴─────────┘

Indeed, reusing the same buffer gives a significant performance boost.
I'm not sure that my other leads will bring a significant improvement at this stage.

I'm not sure why the linting fails on README.md; maybe there is something in the ESLint configuration?

@JounQin (Member) commented Dec 27, 2023

For the linting part, I'll take a look later.

@JounQin (Member) commented Dec 27, 2023

[image]

@onigoetz

synckit is used by eslint-mdx; the error means the current refactor is not compatible with the previous behavior.

That means the user would have to set the bufferSize accordingly afterwards.

SYNCKIT_BUFFER_SIZE=1024000 yarn lint:es may do the trick.

@onigoetz (Contributor, Author)

Got it, I think I know what it is then, I'll change the readme accordingly.

@JounQin (Member) commented Dec 27, 2023

> Got it, I think I know what it is then, I'll change the readme accordingly.

Hmm... I don't think a readme change alone is enough; it's not easy for developers to guess the final output size. We'd better hide this detail internally, as before.

@onigoetz (Contributor, Author)

Yes, there is a possibility.

This happens because SYNCKIT_BUFFER_SIZE / bufferSize defines the size of the SharedArrayBuffer, but in the current version it could actually be set to 1, because the buffer is only used for the notification on the atomic wait and never to transfer the content from the worker to the main thread.

We could leverage the SharedArrayBuffer.prototype.grow method to grow it automatically, but it's not supported below Node 20.

What do you think?

@onigoetz (Contributor, Author)

I was able to get a bit more details on the error. I don't understand how/why it's running the code block and outputting the content of the file itself.

synckit/README.md
  0:0  error  Parsing error: Worker response is bigger than the allowed transfer size. SharedArrayBuffer is 65536 bytes big, response is 72910 {"id":0,"result":{"root":{"type":"root","children":[{"type":"heading","depth":1,"children":[{"type":"text","value":"synckit","position":{"start":{"line":1,"column":3,"offset":2},"end":{"line":1,"colum

@JounQin (Member) commented Dec 28, 2023

> I was able to get a bit more details on the error, I don't understand how/why it's running the code block and outputting the content of the file itself.
>
> synckit/README.md
>   0:0  error  Parsing error: Worker response is bigger than the allowed transfer size. SharedArrayBuffer is 65536 bytes big, response is 72910 {"id":0,"result":{"root":{"type":"root","children":[{"type":"heading","depth":1,"children":[{"type":"text","value":"synckit","position":{"start":{"line":1,"column":3,"offset":2},"end":{"line":1,"colum

It's the virtual file feature of ESLint; in eslint-mdx, both the whole file content and the code blocks are linted at the same time.

@JounQin (Member) commented Dec 28, 2023

> We could leverage the SharedArrayBuffer.grow method to automatically grow it but it's not supported below Node 20

Then we could have different strategies for different Node versions: keep the previous solution on Node < 20 for compatibility, but switch to the SharedArrayBuffer on Node >= 20 for better performance.

@onigoetz (Contributor, Author)

Hi, I changed the behaviour to only use the SharedArrayBuffer if it can be automatically grown, which does not require adding any new option.

codecov bot commented Dec 29, 2023

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison is base (7f453ca) 100.00% compared to head (f3623d6) 99.52%.
Report is 1 commit behind head on main.

❗ Current head f3623d6 differs from pull request most recent head 5f5e515. Consider uploading reports for the commit 5f5e515 to get more accurate results

Files Patch % Lines
src/index.ts 88.88% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##              main     #150      +/-   ##
===========================================
- Coverage   100.00%   99.52%   -0.48%     
===========================================
  Files            1        1              
  Lines          207      212       +5     
  Branches       103      106       +3     
===========================================
+ Hits           207      211       +4     
- Misses           0        1       +1     


src/index.ts

@@ -501,7 +511,7 @@ function startWorkerThread<R, T extends AnyAsyncFn<R>>(
        : workerPathUrl,
      {
        eval: useEval,
-       workerData: { workerPort },
+       workerData: { workerPort, sharedBuffer, useBuffer },
@JounQin (Member) suggested a change:

-       workerData: { workerPort, sharedBuffer, useBuffer },
+       workerData: useBuffer ? { sharedBuffer } : { workerPort },

@onigoetz (Contributor, Author) replied:

No, we need the buffer anyway for the Atomics.wait.
Although we could indeed infer useBuffer on the other side, I don't know the relative performance cost of checking a boolean on an existing object versus passing one more property through an object.

I personally prefer the stability of knowing that the properties are here anyway and simplifying the type definition.

@JounQin (Member) replied on Dec 30, 2023:

const useBuffer = 'growable' in SharedArrayBuffer.prototype

The above can be used outside, no need to pass it at all.

We're not using maxByteLength at all, so it should always be growable if this property exists.

@JounQin (Member) commented Dec 29, 2023

Thanks @onigoetz for your great work! And sorry for my nitpicking.

@JounQin (Member) commented Dec 29, 2023

And don't worry about the failing test coverage; I'll add some ignore comments accordingly. Of course, you can also add them yourself.

@JounQin (Member) commented Dec 29, 2023

Besides, I want to invite you to join this package.

Three options:

  1. join the @un-ts org
  2. be an outside collaborator on this repo only
  3. decline and keep things as-is

Feel free to decline.

@onigoetz (Contributor, Author)

> And sorry for my nitpicking.

No worries, it's your project and you'll be the one maintaining what I contribute.

> Besides, I want to invite you to join this package.

Thanks a lot for the proposition, but I have to decline for now. I regularly make small pull requests here and there for specific things and don't have much time to dedicate beyond that. If I end up contributing more regularly to your packages, I would be happy to reconsider the offer.

> And don't worry about the test coverage failing, I'll add some ignore comments accordingly

Oh, that's curious. I checked, and the uncovered branch is the one that uses v8 serialization.

Apparently I misunderstood the documentation: a SharedArrayBuffer can only be growable if the maxByteLength option is specified when creating it.

I am experimenting with this option and will update this PR accordingly. My idea at this stage is:

  • Always serialize on the worker side
  • If the serialized size is smaller than the buffer size, send it through the SAB
  • If the serialized size is bigger but still within the growth limit (if supported), grow the buffer and send it through the SAB
  • Otherwise, fall back to postMessage

I was able to get this to work, but I'm still stuck on the yarn lint command on README.md; I have the impression something weird is going on there.

@JounQin (Member) commented Dec 30, 2023

> Thanks a lot for the proposition but I have to decline for now.

Sure, never mind.

> Oh, that's curious. I checked and the branch that is uncovered is the one that is using v8 serialization.

Hah, I was thinking it could be related to Node compatibility, because we could not test on Node 18 and Node 20 at the same time? (I'm still doubting this.)

@onigoetz (Contributor, Author)

RangeError: "length" is outside of buffer bounds
    at fromArrayBuffer (node:buffer:502:15)
    at Function.from (node:buffer:302:14)
    at syncFn (synckit/lib/index.cjs:357:16)
    at Parser.parseForESLint (/synckit/node_modules/eslint-mdx/lib/parser.js:32:49)
    at parse (synckit/node_modules/eslint/lib/linter/linter.js:849:22)
    at Linter._verifyWithoutProcessors (synckit/node_modules/eslint/lib/linter/linter.js:1326:33)
    at Linter._verifyWithoutProcessors (synckit/node_modules/@eslint-community/eslint-plugin-eslint-comments/lib/utils/patch.js:183:42)
    at synckit/node_modules/eslint/lib/linter/linter.js:1928:29
    at Array.map (<anonymous>)

Oh, I understand something now: eslint-mdx is using synckit! The ultimate test 😮!
I really didn't get why this error happened; it was as if it were executing the code inside the readme's examples.

I'm really puzzled as to why it does not work yet; I will figure it out :)

@onigoetz (Contributor, Author) commented Jan 2, 2024

I think the PR is ready to be merged at this stage, but I believe it should not be merged, because I made a discovery about the benchmark I used:

Run 1

Execution time
......
Fastest is native
┌─────────┬────────────────────────┬───────────┬────────────┬─────────┐
│ (index) │       Task Name        │  ops/sec  │   Margin   │ Samples │
├─────────┼────────────────────────┼───────────┼────────────┼─────────┤
│    0    │        'native'        │ '154,208' │ '± 0.21%'  │   98    │
│    1    │ 'sync-threads (PR #3)' │ '16,280'  │ '± 0.27%'  │   92    │
│    2    │   'synckit (0.8.8)'    │  '1,442'  │ '± 62.94%' │   34    │
│    3    │  'synckit (PR #150)'   │   '437'   │ '± 37.86%' │   32    │
│    4    │ 'sync-threads (1.0.1)' │   '412'   │ '± 30.07%' │   13    │
└─────────┴────────────────────────┴───────────┴────────────┴─────────┘

Run 2

┌─────────┬────────────────────────┬───────────┬────────────┬─────────┐
│ (index) │       Task Name        │  ops/sec  │   Margin   │ Samples │
├─────────┼────────────────────────┼───────────┼────────────┼─────────┤
│    0    │        'native'        │ '153,057' │ '± 0.20%'  │   97    │
│    1    │ 'sync-threads (PR #3)' │ '16,132'  │ '± 0.76%'  │   92    │
│    2    │  'synckit (PR #150)'   │  '1,686'  │ '± 13.07%' │   47    │
│    3    │   'synckit (0.8.8)'    │   '884'   │ '± 49.34%' │   40    │
│    4    │ 'sync-threads (1.0.1)' │   '341'   │ '± 48.13%' │    9    │
└─────────┴────────────────────────┴───────────┴────────────┴─────────┘

As you can see, the positions of synckit (PR #150) and synckit (0.8.8) get switched between the two runs, each with a significant margin of error, whereas sync-threads (PR #3) stays stable.

It puzzled me for a while, but I discovered the cause when I started looking at memory usage:

Memory usage after native
Memory usage by rss, 139.59168MB 

Memory usage after sync-threads (PR #3)
Memory usage by rss, 149.962752MB 

Memory usage after sync-threads (1.0.1)
Memory usage by rss, 5437.128704MB 

Memory usage after synckit (0.8.8)
Memory usage by rss, 3357.376512MB 

Memory usage after synckit (PR #150)
Memory usage by rss, 3224.764416MB 

Memory usage after synckit (PR new)
Memory usage by rss, 2942.451712MB 

As you can see, it jumps from about 150MB to over 5400MB when running the benchmark for sync-threads (1.0.1), and this causes the process to start swapping like crazy, which penalizes every benchmark running after it.

I switched the benchmarks around, and here are the new results:

Execution time
Memory usage after: native
Memory usage by rss, 137.101312MB 

Memory usage after: sync-threads (PR #3)
Memory usage by rss, 146.587648MB 

Memory usage after: synckit (0.8.8)
Memory usage by rss, 195.166208MB 

Memory usage after: synckit (PR #150)
Memory usage by rss, 235.388928MB 

Memory usage after: synckit (PR new)
Memory usage by rss, 269.9264MB 

Memory usage after: sync-threads (1.0.1)
Memory usage by rss, 6417.2032MB 
┌─────────┬────────────────────────┬───────────┬────────────┬─────────┐
│ (index) │       Task Name        │  ops/sec  │   Margin   │ Samples │
├─────────┼────────────────────────┼───────────┼────────────┼─────────┤
│    0    │        'native'        │ '152,872' │ '± 0.37%'  │   101   │
│    1    │   'synckit (PR new)'   │ '16,408'  │ '± 0.27%'  │   98    │
│    2    │ 'sync-threads (PR #3)' │ '16,007'  │ '± 0.29%'  │   96    │
│    3    │   'synckit (0.8.8)'    │ '16,012'  │ '± 0.45%'  │   99    │
│    4    │  'synckit (PR #150)'   │ '14,699'  │ '± 1.06%'  │   92    │
│    5    │ 'sync-threads (1.0.1)' │   '396'   │ '± 39.70%' │   10    │
└─────────┴────────────────────────┴───────────┴────────────┴─────────┘

As you can see, the margin of error is very low for most runs, which indicates that most runs complete within a similar timeframe.

The other thing we can see is that this PR is actually slower than the current version of synckit. Looking at the comparison between sync-threads (PR #3) and synckit (0.8.8), I gather that postMessage probably uses an equivalent of a SharedArrayBuffer under the hood, but without size limitations, which makes all the juggling with buffer sizes unnecessary.

Fear not!
I have found a way to still make synckit a bit faster, and it will come in a separate PR.
You can already see its results in the benchmark above (synckit (PR new)).


For documentation's sake, this PR does the following:

  • Create a single SharedArrayBuffer and pass it to the worker (instead of one per function run)
  • Once the function has run on the worker, serialize the result
  • If it's smaller than the allowed size, send it through the SAB
  • If it's bigger than the allowed size but still within the growth limit (if supported), resize the buffer and send it through the SAB
  • If it is still too big, send it through postMessage (and store -1 as the buffer length)
  • On the receiving side, choose how to receive the message based on the length stored at the start of the SAB

I also added two tests that ensure we can grow the buffer when relevant, or fall back if the result does not fit.
