
perf(localSearch): add concurrency pooling, cleanup logic, improve performance #3374

Merged: 5 commits into vuejs:main, Dec 30, 2023

Conversation

@zhangyx1998 (Contributor) commented Dec 23, 2023

fixes #3377

Highlights

This PR enables the local search plugin to work in true parallel. While keeping its API backward compatible, it now allows asynchronous user-provided callbacks for both HTML generation and section splitting (which is very expensive).

In addition, by refactoring the task-dispatching logic, the plugin no longer accumulates content for all pages and locales before indexing. Therefore, this change benefits all local-search users out of the box.
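To make the dispatching change concrete, here is an illustrative contrast (not the plugin's actual code; `render` and `index` are hypothetical stand-ins) between accumulating all rendered pages before indexing and dispatching each page's indexing task as soon as its HTML is ready:

```typescript
type Render = (path: string) => Promise<string>
type Index = (path: string, html: string) => Promise<void>

// Before: every page is rendered and held in memory, then indexed in a
// second pass.
async function accumulateThenIndex(paths: string[], render: Render, index: Index) {
  const htmls = await Promise.all(paths.map(render))
  for (let i = 0; i < paths.length; i++) await index(paths[i], htmls[i])
}

// After: each page flows straight from render to index; nothing is
// accumulated, and work on independent pages overlaps.
async function dispatchPerPage(paths: string[], render: Render, index: Index) {
  await Promise.all(paths.map(async (p) => index(p, await render(p))))
}
```

The second shape is what lets a concurrency pool keep all cores busy instead of waiting for the slowest page in each phase.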

What's next

I've implemented multiprocessing that works on top of this PR. That implementation is not yet included here because (1) I am not sure whether adding a new package (JSDOM) is acceptable, and (2) I do not know whether people would agree with the idea of multiprocessing.

As you can see in this commit, the new miniSearch.splitIntoSections() config option allows an AsyncGenerator as its return value, and the AsyncGenerator can be used to proxy a worker thread using the JS event model.
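A minimal sketch of how an AsyncGenerator-shaped splitter decouples the consumer from where sections come from (the `Section` shape and the splitting logic here are illustrative, not the plugin's actual implementation):

```typescript
interface Section {
  anchor: string
  titles: string[]
  text: string
}

// Hypothetical splitter: yields sections one by one as they become
// available. In a worker-backed setup, each `yield` would await a
// message event from the worker instead of computing inline.
async function* splitIntoSections(html: string): AsyncGenerator<Section> {
  const parts = html.split('<h2>')
  for (const part of parts.slice(1)) {
    const title = part.slice(0, part.indexOf('</h2>'))
    yield { anchor: title.toLowerCase(), titles: [title], text: part }
  }
}

// The consumer just `for await`s; it cannot tell whether sections are
// produced locally or proxied from a worker thread.
async function collect(html: string): Promise<Section[]> {
  const out: Section[] = []
  for await (const s of splitIntoSections(html)) out.push(s)
  return out
}
```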

One important reason I chose the local-search plugin to start experimenting with is data isolation. The splitting task only needs the HTML string, so it is easy to strip out of the build context. In my project, the IPC overhead is reduced to a minimum since the main process only sends a path, not the entire content.

However, if done correctly, many other stages can also be refactored to run in parallel. For example, renderPage() looks very promising: it currently takes relatively long to complete, especially for large volumes of content. If we can make that part parallel, I believe almost everyone would see a noticeable speedup.

The speedup is crazy

Both builds were performed on an Apple M1 Max, building this project.

Before: 16008.00 seconds (4.4 hours)

  vitepress v1.0.0-rc.32

⠹ building client + server bundles...
(!) Some chunks are larger than 500 kB after minification. Consider:
- Using dynamic import() to code-split the application
- Use build.rollupOptions.output.manualChunks to improve chunking: https://rollupjs.org/configuration-options/#output-manualchunks
- Adjust chunk size limit for this warning via build.chunkSizeWarningLimit.
/Users/Yuxuan/Lab/xorg-doc/docs/index.md
✓ building client + server bundles...
✓ rendering pages...
build complete in 16008.00s.

After: 248.53 seconds (about 4 minutes)

  vitepress v1.0.0-rc.32

🔍️ Indexing files for search...
✅ Indexing finished...
⠹ building client + server bundles...
(!) Some chunks are larger than 500 kB after minification. Consider:
- Using dynamic import() to code-split the application
- Use build.rollupOptions.output.manualChunks to improve chunking: https://rollupjs.org/configuration-options/#output-manualchunks
- Adjust chunk size limit for this warning via build.chunkSizeWarningLimit.
🔍️ Indexing files for search...
✅ Indexing finished...
✓ building client + server bundles...
✓ rendering pages...
build complete in 248.53s.

Problem to be fixed

The default section splitting function splitPageIntoSections() uses two regular expressions to detect and strip headers from HTML sources. However, it assumes that each <h*> element has an <a> element as its first child. This is only true for headings generated from pure markdown sources. Markdown allows embedded HTML elements (vitepress even makes them reactive), and users would expect their own headers to be detected normally, even when they do not follow the RegExp patterns.
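A simplified illustration of the failure mode (this is not the plugin's exact regex, just a pattern with the same first-child-anchor assumption):

```typescript
// Assumes every heading wraps an <a> anchor as its first child, as
// markdown-generated headings do.
const headingWithAnchor = /<h(\d)[^>]*>\s*<a[^>]*>.*?<\/a>(.*?)<\/h\1>/

// Markdown-generated heading: matches.
const fromMarkdown = '<h2 id="setup"><a href="#setup">#</a> Setup</h2>'

// Hand-written heading embedded in markdown: silently skipped, so the
// section never makes it into the search index.
const handWritten = '<h2 id="setup">Setup</h2>'
```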

In addition, the original splitter used a sparse array to track the current header hierarchy, which fails in some very simple cases. For example:

.       <!-- expected | actual   -->
#   A1  <!-- A1       | A1       -->
##  A2  <!-- A1 A2    | A1 A2    -->
#   B1  <!-- B1       | B1 A2    -->
### B3  <!-- B1 B3    | B1 A2 B3 -->

All these problems have been fixed in the worker implementation in my project. I can spend some time porting them back (with or without multiprocessing).

@zhangyx1998 (Contributor, Author) commented:

Force pushed to keep up with main branch (use pMap instead).
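For readers unfamiliar with p-map: it runs a promise-returning mapper over an iterable with a bounded number of tasks in flight, which is the pooling behavior this PR relies on. A minimal sketch of that idea (illustrative only; the PR uses the actual p-map package):

```typescript
// Run at most `concurrency` mappers at once, preserving result order.
async function pool<T, R>(
  items: T[],
  mapper: (item: T) => Promise<R>,
  concurrency: number
): Promise<R[]> {
  const results: R[] = new Array(items.length)
  let next = 0
  // Each worker repeatedly claims the next unclaimed index. The claim is
  // synchronous between awaits, so no two workers get the same index.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++
      results[i] = await mapper(items[i])
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(concurrency, items.length) }, worker)
  )
  return results
}
```

With something like `await pool(pages, indexPage, 8)`, heavy per-page splitting overlaps instead of running strictly one page at a time.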

@brc-dd brc-dd self-assigned this Dec 27, 2023
types/default-theme.d.ts — review comment (outdated, resolved)
@brc-dd brc-dd changed the title refactor(localSearch): add concurrency pooling, cleanup logic, improve performance perf(localSearch): add concurrency pooling, cleanup logic, improve performance Dec 30, 2023
@brc-dd (Member) left a comment:

LGTM, I'll merge this after testing once with unocss docs or something.

@zhangyx1998 (Contributor, Author) commented:

> LGTM, I'll merge this after testing once with unocss docs or something.

Sure, I will create another PR to port the JSDOM-based section splitter back after this one gets merged. After that, all users should see the performance benefit as well as improved indexing robustness.

@brc-dd brc-dd merged commit ac5881e into vuejs:main Dec 30, 2023
7 checks passed
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 7, 2024
Successfully merging this pull request may close these issues.

[local search] indexing too slow to be usable for large sites