Precompute all last commit timestamps in on_files #116

Open
wants to merge 1 commit into
base: master

kunickiaj commented May 31, 2023

Precompute all last commit timestamps in parallel in on_files so that the build runs more quickly. This has to happen in on_files, where the full set of files is available and the work can be parallelized, rather than in on_page_markdown, which handles one page at a time.

This does not precompute the first commit timestamp. It can significantly improve wall time (ref: #115).

Looking for some feedback on this approach. If this looks reasonable, we can figure out support for the first commit timestamp as well as a way to configure parallelism. The worker count is currently the minimum of 10 and the number of CPUs reported.
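A minimal sketch of the idea, not the code in this PR: it assumes a thread pool and a hypothetical get_timestamp callable standing in for whatever git lookup the plugin performs per file. The names default_worker_count and precompute_timestamps are illustrative only.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def default_worker_count(cap: int = 10) -> int:
    """Return min(cap, reported CPU count), falling back to 1 if unknown."""
    return min(cap, os.cpu_count() or 1)


def precompute_timestamps(
    paths: Iterable[str],
    get_timestamp: Callable[[str], int],  # hypothetical per-file git lookup
    max_workers: Optional[int] = None,
) -> Dict[str, int]:
    """Call get_timestamp for every path concurrently and collect the results."""
    path_list = list(paths)
    workers = max_workers or default_worker_count()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its result.
        return dict(zip(path_list, pool.map(get_timestamp, path_list)))
```

Threads are a reasonable fit if the per-file lookup mostly waits on a git subprocess; a process pool would be the alternative if the lookup were CPU-bound in Python.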

On an M1 Max MacBook Pro (8 performance, 2 efficiency cores) this gave a speedup of ~5.5x on a large monorepo: processing time dropped from 378 seconds to 69 seconds. Tested on 78 rendered markdown files in a repo of approximately 700k commits and 500k files.

timvink (Owner) commented Oct 15, 2023

Sorry for the very late reply; this project has not been a priority.

Very cool PR, a 5.5x improvement is considerable!

One problem I see, however, is using the files collection in on_files() instead of the page in on_page_markdown(). The reason is that some other plugins move files around. One example is mkdocs-monorepo.

They basically create a new docs_dir from several source folders:

https://github.com/backstage/mkdocs-monorepo-plugin/blob/c778b3010eb986a2f3b719bc7a3d29d86236c238/mkdocs_monorepo_plugin/plugin.py#L54-L61

And then they update the page's source path (page.file.abs_src_path):

https://github.com/backstage/mkdocs-monorepo-plugin/blob/c778b3010eb986a2f3b719bc7a3d29d86236c238/mkdocs_monorepo_plugin/plugin.py#L65-L72

So this bit from the PR will need some more edge case handling:

https://github.com/timvink/mkdocs-git-revision-date-localized-plugin/pull/116/files#diff-38d392fd1ac6a39ad46a5d047e294c69fe0f1b6aa8fc7fea3a35c1846925d21cR166-R172
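One possible way to handle that edge case, sketched under assumptions: keep the precomputed values in a cache keyed by absolute source path, and fall back to an on-demand lookup whenever the path a page reports in on_page_markdown() was not seen during precompute. TimestampCache and its compute callable are hypothetical names, not part of this plugin or of MkDocs.

```python
from typing import Callable, Dict


class TimestampCache:
    """Hypothetical cache: precomputed values with an on-demand fallback."""

    def __init__(self, precomputed: Dict[str, int], compute: Callable[[str], int]):
        self._values = dict(precomputed)
        self._compute = compute  # hypothetical per-file git lookup
        self.misses = 0

    def get(self, abs_src_path: str) -> int:
        if abs_src_path not in self._values:
            # Path was not seen during precompute, e.g. because another
            # plugin rewrote it afterwards; compute once and remember it.
            self.misses += 1
            self._values[abs_src_path] = self._compute(abs_src_path)
        return self._values[abs_src_path]
```

With this shape, pages whose paths are untouched hit the precomputed values, and moved pages cost one sequential lookup each instead of failing.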

timvink (Owner) commented Oct 15, 2023

Another promising avenue might be to tweak git a bit. There are a couple of settings for large repos that might make git blame operations much faster:

https://www.git-tower.com/blog/git-performance/

Have you tried something like that? It might be worth documenting in this plugin.

kunickiaj (Author) commented

Yeah, we're well aware of all those git features that make monorepos less of a pain, but it is still incredibly slow. To be fair, when updating docs for a single project or two, the time hit is probably still acceptable, since the application CI is going to take longer in most cases. But a bulk update across many docs in the repo is going to time out CI (not to mention the $ cost of longer-running CI in general).
