
Poor performance on large monorepos #115

Open
kunickiaj opened this issue May 25, 2023 · 3 comments


@kunickiaj

When trying to use this on a large monorepo, performance is very poor -- one docs site with ~30 pages has gone from taking a few seconds to build to taking a few minutes.

I'm testing a change that parallelizes the calls to git log up front in on_files, once we have the full list of files, since MkDocs itself can't run on_page_markdown in a multi-threaded fashion.
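
Roughly, something like the following (a sketch only, not the actual patch; the `git log -1 --format=%at` call here is just one way to get a per-file timestamp):

```python
# Sketch: fan one `git log` call per file out to a thread pool up front.
from concurrent.futures import ThreadPoolExecutor
import subprocess


def last_commit_timestamp(path):
    """Unix timestamp of the most recent commit touching `path`."""
    out = subprocess.run(
        ["git", "log", "-1", "--format=%at", "--", path],
        capture_output=True, text=True, check=True,
    )
    return path, int(out.stdout.strip() or 0)


def precompute_timestamps(paths, max_workers=10):
    """Run the git calls concurrently and return a path -> timestamp map."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return dict(executor.map(last_commit_timestamp, paths))
```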

Curious whether anyone else has run into this, and whether this would be a useful contribution via a PR.

@timvink
Owner

timvink commented May 26, 2023

@squidfunk I know you did this in several of your plugins. Any wisdom to share?

@squidfunk
Sponsor Collaborator

squidfunk commented May 26, 2023

In Material for MkDocs, the privacy plugin, optimize plugin and new social plugin make heavy use of concurrent futures and caching of (partial) results. It's a new technique I learned when first writing the optimize plugin. The general idea is to split off work into threads where possible, and only reconcile jobs when necessary. Examples:

  • The new social plugin entirely offloads image creation in on_page_markdown to threads: it deduplicates the layers, generates them all in parallel, and reconciles them when compositing the final image. In on_post_page it reconciles the composited images, copying the generated image returned from the future to ensure a consistent state for other plugins that run after on_post_page.

  • The privacy plugin searches for external assets in on_page_content and enqueues them for downloading, moving that into concurrent threads as well, since some assets require further downloads (e.g. Google Fonts CSS links to web font files that must be fetched too). Then, in on_post_page and on_post_template, external assets are replaced, and any further external assets discovered at that point (added by other plugins) are downloaded synchronously. The plugin still does as much work as possible asynchronously.

  • The privacy plugin and optimize plugin can actually work together (!), downloading external assets and pushing them through the optimization pipeline: the privacy plugin reconciles downloaded assets in on_env, where they can be picked up by the optimize plugin. This makes it possible to build documentation with external assets (e.g. screenshots) hosted outside the repository, while inlining heavily optimized versions of them into the build.
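
In rough pseudo-plugin form the pattern looks like this (a minimal sketch of the split-off/reconcile idea, not the actual Material for MkDocs code; `render_card` is just a stand-in for whatever the expensive work is):

```python
# Minimal sketch of "split work into threads, reconcile only when necessary".
from concurrent.futures import ThreadPoolExecutor

from mkdocs.plugins import BasePlugin


def render_card(page):
    # Stand-in for the expensive work (card rendering, asset downloads, ...).
    return f"<!-- generated for {page.file.src_path} -->"


class ConcurrentSketchPlugin(BasePlugin):
    def on_config(self, config):
        self.pool = ThreadPoolExecutor(max_workers=8)
        self.jobs = {}  # deduplicated work: src_path -> Future

    def on_page_markdown(self, markdown, page, config, files):
        # Split off: enqueue the job and return immediately.
        key = page.file.src_path
        if key not in self.jobs:
            self.jobs[key] = self.pool.submit(render_card, page)
        return markdown

    def on_post_page(self, output, page, config):
        # Reconcile: block only when the result is actually needed, so plugins
        # running after on_post_page see a consistent state.
        generated = self.jobs[page.file.src_path].result()
        return output + generated
```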

I plan to write a blog post about my learnings in writing MkDocs plugins in the future.

@kunickiaj
Author

I have a prototype patch for the timestamp computation that does something similar. I'm using on_files to compute timestamps for all files up front, and I've found that about 10 is the maximum parallelism that works reliably.

As a test case, a site with 78 Markdown files in a large monorepo takes 5 seconds to generate with the Backstage TechDocs CLI. With the timestamp plugin this increased to 378 seconds. With the parallel, precomputed timestamp patch it goes down to 69 seconds -- still quite a big hit, but a big improvement.
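
The shape of the prototype is roughly the following (again just a sketch, reusing the same git log helper as above; the formatting and injection of the timestamp into the page is left out):

```python
# Rough shape of the prototype: compute every timestamp once in on_files,
# then on_page_markdown only does a dictionary lookup instead of calling git.
from concurrent.futures import ThreadPoolExecutor
import subprocess

from mkdocs.plugins import BasePlugin


def last_commit_timestamp(path):
    # Same per-file `git log` helper as in the earlier sketch.
    out = subprocess.run(
        ["git", "log", "-1", "--format=%at", "--", path],
        capture_output=True, text=True, check=True,
    )
    return path, int(out.stdout.strip() or 0)


class PrecomputedTimestampsPlugin(BasePlugin):
    def on_files(self, files, config):
        paths = [f.abs_src_path for f in files if f.is_documentation_page()]
        # ~10 workers was about the highest parallelism that worked reliably.
        with ThreadPoolExecutor(max_workers=10) as executor:
            self.timestamps = dict(executor.map(last_commit_timestamp, paths))
        return files

    def on_page_markdown(self, markdown, page, config, files):
        timestamp = self.timestamps.get(page.file.abs_src_path)
        # ...format `timestamp` and inject it into the page as before...
        return markdown
```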
